The cellular environment is a complex and dynamic system of functional molecules. These molecules interact, either transiently or stably, in pathways that remain poorly mapped despite extensive study. High-throughput experimental methods have filled some of the gaps in pathway knowledge, but they have also created new algorithmic challenges in integrating large, disparate sources of data. Here, we present algorithmic approaches to the integration of heterogeneous datasets. We examine algorithms that analyze networks built from a single data source, methods that analyze networks with more complex data types, and finally algorithms that capture the hierarchical flow of biological information. We also examine a critical issue that most network approaches have largely ignored: to what extent can gene expression data be used as a proxy for protein levels? Although most studies conflate the two, high-throughput analyses have now made it abundantly clear that the correlation between transcript and protein levels is very poor. We close by speculating on where more realistic modeling of different types of data will lead the field.