Improved Duplication Models for Proteome Network Evolution
Lecture Notes in Computer Science
https://doi.org/10.1007/978-3-540-48540-7_11…
Abstract
Protein-protein interaction networks, particularly that of the yeast S. cerevisiae, have recently been studied extensively. These networks seem to satisfy the small world property and their (1-hop) degree distribution seems to form a power law. More recently, a number of duplication based random graph models have been proposed with the aim of emulating the evolution of protein-protein interaction networks and satisfying these two graph theoretical properties. In this paper, we show that the proposed model of Pastor-Satorras et al. does not generate the power law degree distribution with exponential cutoff as claimed and the more restrictive model by Chung et al. cannot be interpreted unconditionally. It is possible to slightly modify these models to ensure that they generate a power law degree distribution. However, even after this modification, the more general k-hop degree distribution achieved by these models, for k > 1, is very different from that of the yeast proteome network. We address this problem by introducing a new network growth model that takes into account the sequence similarity between pairs of proteins (as a binary relationship) as well as their interactions. The new model captures not only the k-hop degree distribution of the yeast protein interaction network for all k > 0, but also the 1-hop degree distribution of the sequence similarity network, which again seems to form a power law.
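To make the two central notions of the abstract concrete, the following is a minimal sketch of a generic duplication step and of a k-hop degree computation. It rests on assumptions that are not in the abstract: the k-hop degree of a node is taken to be the number of distinct other nodes within at most k hops, and the duplication step (copy a uniformly chosen node, keep each of its edges independently with probability p_keep) is a textbook simplification, not the model of Pastor-Satorras et al., of Chung et al., or the new model proposed in the paper. The names duplication_step, k_hop_degree, grow and p_keep are hypothetical.

```python
import random
from collections import deque


def duplication_step(adj, p_keep, rng):
    """Duplicate one uniformly chosen node.

    The copy inherits each edge of its parent independently with
    probability p_keep (an assumed, generic divergence rule).
    """
    parent = rng.choice(sorted(adj))
    child = len(adj)  # nodes are labelled 0..n-1, so this label is unused
    adj[child] = set()
    for nb in sorted(adj[parent]):
        if rng.random() < p_keep:
            adj[child].add(nb)
            adj[nb].add(child)
    return child


def k_hop_degree(adj, source, k):
    """Number of distinct nodes other than source within at most k hops.

    This definition is an assumption; for k = 1 it is the ordinary degree.
    """
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == k:
            continue  # do not expand beyond k hops
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return len(dist) - 1


def grow(n_nodes, p_keep, seed=0):
    """Grow a graph from a single edge by repeated duplication steps."""
    rng = random.Random(seed)
    adj = {0: {1}, 1: {0}}
    while len(adj) < n_nodes:
        duplication_step(adj, p_keep, rng)
    return adj
```

Collecting k_hop_degree(adj, v, k) over all nodes v of a grown graph gives an empirical k-hop degree distribution for that graph, which is the kind of quantity the abstract compares against the yeast network.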
Related papers
Molecular Biology and Evolution, 2005
The impact of biological network structures on the divergence between the two copies of a duplicate gene pair involved in the networks has not been documented on a genome scale. Having analyzed the most recently updated Database of Interacting Proteins (DIP) by incorporating the information for duplicate genes of the same age in yeast, we find that there was a highly significant positive correlation between the level of connectivity of ancient genes and the number of shared partners of their duplicates in the protein-protein interaction networks. This suggests that duplicate genes with a low ancestral connectivity tend to provide raw materials for functional novelty, whereas those duplicate genes with a high ancestral connectivity tend to create functional redundancy for a genome during the same evolutionary period. Moreover, the difference in the number of partners between two copies of a duplicate pair was found to follow a power-law distribution. This suggests that loss and gain of interacting partners for most duplicate genes with a lower level of ancestral connectivity is largely symmetrical, whereas the "hub duplicate genes" with a higher level of ancient connectivity display an asymmetrical divergence pattern in protein-protein interactions. Thus, it is clear that the protein-protein interaction network structures affect the divergence pattern of duplicate genes. Our findings also provide insights into the origin and development of biological networks.
2006
In this paper we propose a generalized growth model for biological interaction networks, including a set of biological features which have been inspired by a long tradition of simulations of immune system and chemical reaction networks. In our models we include characteristics such as the heterogeneity of biological nodes, the existence of natural hubs, the nodes binding by mutual affinity and the significance of type-based networks as compared with instance-based networks.
Bioinformatics, 2011
Motivation: Much of the large-scale molecular data from living cells can be represented in terms of networks. Such networks occupy a central position in cellular systems biology. In the protein–protein interaction (PPI) network, nodes represent proteins and edges represent connections between them, based on experimental evidence. As PPI networks are rich and complex, a mathematical model is sought to capture their properties and shed light on PPI evolution. The mathematical literature contains various generative models of random graphs. It is a major, still largely open question, which of these models (if any) can properly reproduce various biologically interesting networks. Here, we consider this problem where the graph at hand is the PPI network of Saccharomyces cerevisiae. We try to distinguish between a model family which performs a process of copying neighbors, represented by the duplication–divergence (DD) model, and models which do not copy neighbors, with the Barab...
Journal of Proteomics & Bioinformatics, 2008
The proper theoretical description of the node degree distribution for the yeast protein-protein interaction network was investigated to deal with the observed discrepancy between usually proposed models and the existing data. The power law and the generalized power law with exponential cutoff were shown to be inaccurate within a wide range of degree values. The proposed linear-combination-of-exponential-decays method, which exactly characterizes the distribution by the spectrum of decay constants, revealed two separate parameter domains. A consequent hypothesis that the node degree distribution could follow the universal double exponential law was successfully verified by selected model comparison using the AIC criterion. BIND and DIP data for H. pylori, E. coli, S. cerevisiae, D. melanogaster, C. elegans and A. thaliana were used for this purpose. A linear change in the magnitude of the distribution components with proteome size was observed, manifesting the evolutionary stability of the process of developing the protein interaction network. The proposed kinetic model of protein evolution, which considers two hypothetical protein classes, the first with a relatively rapid emerging rate and a short characteristic residence time and the second with the opposite properties, analytically described the nature of the bi-exponential pattern. The model presents a situation in which evolutionarily conserved proteins increase their interactions due to specific kinetic conditions. Thus, we oppose the opinion that the majority of such interactions are biologically significant and that, therefore, the older parts of the interactome are more complex. We believe that our interactome results support the hypothesis of Stuart Kauffman, presented in his book "The Origins of Order", that random mutations and natural selection constitute the origin of order and complexity.
Journal of the Royal Society, Interface / the Royal Society, 2012
We present an analysis of protein interaction network data via the comparison of models of network evolution to the observed data. We take a Bayesian approach and perform posterior density estimation using an approximate Bayesian computation with sequential Monte Carlo method. Our approach allows us to perform model selection over a selection of potential network growth models. The methodology we apply uses a distance defined in terms of graph spectra which captures the network data more naturally than previously used summary statistics such as the degree distribution. Furthermore, we include the effects of sampling into the analysis, to properly correct for the incompleteness of existing datasets, and have analysed the performance of our method under various degrees of sampling. We consider a number of models focusing not only on the biologically relevant class of duplication models, but also including models of scale-free network growth that have previously been claimed to describe such data. We find a preference for a duplication-divergence with linear preferential attachment model in the majority of the interaction datasets considered. We also illustrate how our method can be used to perform multi-model inference of network parameters to estimate properties of the full network from sampled data.
Journal of Statistical Mechanics: Theory and Experiment, 2007
Background: Duplication of genes is important for evolution of molecular networks. Many authors have therefore considered gene duplication as a driving force in shaping the topology of molecular networks. In particular it has been noted that growth via duplication would act as an implicit way of preferential attachment, and thereby provide the observed broad degree distributions of molecular networks. Results: We extend current models of gene duplication and rewiring by including directions and the fact that molecular networks are not a result of unidirectional growth. We introduce upstream sites and downstream shapes to quantify potential links during duplication and rewiring. We find that this in itself generates the observed scaling of transcription factors for genome sites in prokaryotes. The dynamical model can generate a scale-free degree distribution, p(k) ∝ 1/k^γ, with exponent γ = 1 in the non-growing case, and with γ > 1 when the network is growing. Conclusions: We find that duplication of genes followed by substantial recombination of upstream regions could generate main features of genetic regulatory networks. Our steady state degree distribution is however too broad to be consistent with data, thereby suggesting that selective pruning acts as a main additional constraint on duplicated genes. Our analysis shows that gene duplication can only be a main cause for the observed broad degree distributions if there is also substantial recombination between upstream regions of genes.
New Journal of Physics, 2006
We introduce a minimalistic model based on dynamic node deletion and node duplication with heterodimerization. The model is intended to capture the essential features of the evolution of protein interaction networks. We derive an exact two-step rate equation to describe the evolution of the degree distribution. We present results for the case of a fixed-size network. The results are based on the exact numerical solution to the rate equation, which is consistent with Monte Carlo simulations of the model's dynamics. Power-law degree distributions with apparent exponents <1 were observed for generic parameter choices. However, a proper finite-size scaling analysis revealed that the actual critical exponent in such cases is equal to one. We present a mean-field argument to determine the asymptotic value of the average degree, illustrating the existence of an attractive fixed point, and corroborate this result with numerical simulations of the first moment of the degree distribution as described by the two-step rate equation. Using the above results, we show that the apparent exponent is determined by the heterodimerization probability. Our preliminary results are consistent with empirical data for a wide range of organisms, and we believe that through implementing some of the suggested modifications, the model could be well suited to other types of biological and non-biological networks.
2005
The degree distribution of many biological and technological networks has been described as a power-law distribution. While the degree distribution does not capture all aspects of a network, it has often been suggested that its functional form contains important clues as to underlying evolutionary processes that have shaped the network. Generally, the functional form for the degree distribution has been determined in an ad-hoc fashion, with clear power-law like behaviour often only extending over a limited range of connectivities. Here we apply formal model selection techniques to decide which probability distribution best describes the degree distributions of protein interaction networks. Contrary to previous studies this well defined approach suggests that the degree distribution of many molecular networks is often better described by distributions other than the popular power-law distribution. This, in turn, suggests that simple, if elegant, models may not necessarily help in the quantitative understanding of complex biological processes.
Proceedings of the National …, 2008
Genomic duplication-divergence processes are the primary source of new protein functions and thereby contribute to the evolutionary expansion of functional molecular networks. Yet, it is still unclear to what extent such duplication-divergence processes also restrict by construction the emerging properties of molecular networks, regardless of any specific cellular functions. We address this question, here, focusing on the evolution of protein-protein interaction (PPI) networks. We solve a general duplication-divergence model, based on the statistically necessary deletions of protein-protein interactions arising from stochastic duplications at various genomic scales, from single-gene to whole-genome duplications. Major evolutionary scenarios are shown to depend on two global parameters only: (i) a protein conservation index (M), which controls the evolutionary history of PPI networks, and (ii) a distinct topology index (M′) controlling their resulting structure. We then demonstrate that conserved, nondense networks, which are of prime biological relevance, are also necessarily scale-free by construction, irrespective of any evolutionary variations or fluctuations of the model parameters. It is shown to result from a fundamental linkage between individual protein conservation and network topology under general duplication-divergence evolution. By contrast, we find that conservation of network motifs with two or more proteins cannot be indefinitely preserved under general duplication-divergence evolution (independently from any network rewiring dynamics), in broad agreement with empirical evidence between phylogenetically distant species. All in all, these evolutionary constraints, inherent to duplication-divergence processes, appear to have largely controlled the overall topology and scale-dependent conservation of PPI networks, regardless of any specific biological function. Keywords: evolutionary constraint | scale-free graph | functional motif | orthology | statistical model
Princeton University Press eBooks, 2011
We introduce a graph generating model aimed at representing the evolution of protein interaction networks. The model is based on the hypothesis of evolution by duplication and divergence of the genes which produce proteins. The obtained graphs show multifractal properties, recovering the absence of a characteristic connectivity as found in real data of protein interaction networks. The error tolerance of the model to random or targeted damage is in very good agreement with the behavior obtained in the analysis of real protein networks. The proposed model is a first step in the identification of the evolutionary dynamics leading to the development of protein functions and interactions.
