Papers by Joseph Bozorgmehr

Current Genomics, Sep 1, 2013
Protein interactions play an important role in the discovery of protein functions and pathways in biological processes. This is especially true of diseases caused by the loss of specific protein-protein interactions in the organism. The accuracy of experimental methods for finding protein-protein interactions, however, is rather doubtful, and high-throughput experiments have shown high rates of both false positives and false negatives. Computational methods have attracted tremendous attention among biologists because of their ability to predict protein-protein interactions and to validate experimental results. In this study, we review several computational methods for protein-protein interaction prediction, describe major databases that store both predicted and detected protein-protein interactions, and survey the tools used for analyzing protein interaction networks and improving the reliability of protein-protein interaction data.

Research Square, May 31, 2023
Despite being previously regarded as extremely unlikely, the idea that entirely novel protein-coding genes can emerge from non-coding sequences has gradually become accepted over the past two decades. Examples of "de novo origination", resulting in lineage-specific "orphan" genes that lack orthologs, are now reported every year. However, many are likely duplicates that are difficult to recognize. Here, I reexamine the claims and show that four very well-known examples of genes alleged to have emerged de novo "from scratch", namely FLJ33706 in humans, Goddard in fruit flies, BSC4 in baker's yeast and AFGP2 in codfish, all have plausible evolutionary ancestors in pre-existing genes. In the case of the first two, highly diverged retrogenes that code for regulatory proteins may have been misidentified as orphans. The antifreeze glycoproteins in cod, moreover, are shown to have likely evolved not from repetitive non-genic sequences but, as in other related cases, from an apolipoprotein that may well have been pseudogenized before later being reactivated. These findings detract from various claims made about de novo gene birth and show that there has been a tendency not to invest the necessary effort in searching for homologs beyond a very limited syntenic or phylostratigraphic methodology. The approach used here to improve homology detection draws on similarities not only in statistical sequence analysis but also in biochemistry and function, so as to avoid such failures.

Four classic “de novo” genes all have plausible homologs and likely evolved from retro-duplicated or pseudogenic sequences
Response to Zhang et al. (2019): De novo protein-coding genes in Oryza may not be unique or functional

Journal of Theoretical Biology, 2015
Although the analysis of protein molecules is extensive, their primary sequences have yet to be quantified in the way their mass or size is. The composition and particular arrangement of amino acids in proteins confer their distinct biochemical functionality, but it remains unclear why only a tiny proportion of possible character combinations are potentially functional. Here, I offer a simple but effective technique, utilizing the assignment of codons in the genetic code, that permits the quantification of polypeptide sequences and establishes statistical parameters through which they can be numerically compared. Two main tests were conducted, one analyzing the composition and the other the specific order of the amino acids within the primary sequence. The results confirm that natural proteins are significantly different from random heteropolymers of equivalent size, although the difference is much more marginal for the arrangement than for the composition. Moreover, they reveal key patterns, not previously identified, that are relevant to the study of protein evolution and that raise doubts about the plausibility of some purported cases of de novo origination of protein-coding genes from intergenic DNA. Although the applicability of quantification to the design of novel proteins is probably limited, it nonetheless provides a useful guideline that could complement much more precise methods.
The origin of the Homeobox at the C-terminus of MraY in Lokiarchaea
Adaptive Landscapes in Light of Co‐Option and Exaptation: How the Darwin–Mivart Dispute Continues to Shape Evolutionary Biology
BioEssays
The Origin of Chromosomal Histones in a 30S Ribosomal Protein
Gene
Molecular BioSystems, 2014
In high-dimensional genome-wide association (GWA) data, a key challenge is to detect genomic variants that interact nonlinearly in their association with disease.

In Silico Pharmacology, 2013
With the growing understanding of complex diseases, the focus of drug discovery has shifted away from the well-accepted "one target, one drug" model to a new "multi-target, multi-drug" model, aimed at systemically modulating multiple targets. Identifying interactions between drugs and target proteins plays an important role in genomic drug discovery, both for discovering new drugs and for finding novel targets for existing drugs. Because the experimental determination of drug-target interactions is laborious and costly, in silico prediction could be an efficient way of providing useful information to support experimental interaction data. An important notion that has emerged in postgenomic drug discovery is that the large-scale integration of genomic, proteomic, signaling and metabolomic data can allow us to construct complex networks of the cell, providing a new framework for understanding the molecular basis of physiological or pathophysiological states. An emerging paradigm of polypharmacology in the postgenomic era is that drug, target and disease spaces can be correlated to study the effect of drugs on different spaces, and their interrelationships can be exploited to design drugs or cocktails that effectively target one or more disease states. The future goal, therefore, is to create a computational platform that integrates genome-scale metabolic pathways, protein-protein interaction networks and gene transcriptional analysis in order to build a comprehensive network for multi-target, multi-drug discovery.

Genomics, 2014
Protein-protein interaction (PPI) detection is one of the central goals of functional genomics and systems biology. Knowledge about the nature of PPIs can help fill the widening gap between sequence information and functional annotations. Although experimental methods have produced valuable PPI data, they also suffer from significant limitations. Computational PPI prediction methods have attracted tremendous attention. Despite considerable efforts, PPI prediction is still in its infancy in complex multicellular organisms such as humans. Here, we propose a novel ensemble learning method, LocFuse, for human PPI prediction. The method uses eight different genomic and proteomic features along with four different types of classifiers. The prediction performance of this classifier selection method was found to be considerably better than that of methods employed hitherto. This confirms the complex nature of the PPI prediction problem and the necessity of using biological information for classifier fusion. LocFuse is available at: http://lbb.ut.ac.ir/Download/LBBsoft/LocFuse.
Biological significance: The results revealed that if the proteome space is divided according to the cellular localization of proteins, the utility of some classifiers in PPI prediction can be improved. Therefore, to predict the interaction for any given protein pair, the most accurate classifier can be selected based on cellular localization information. The results also indicate that the importance of different features for PPI prediction varies between differently localized proteins; in general, however, our novel features, which were extracted from position-specific scoring matrices (PSSMs), are the most important ones, and the Random Forest (RF) classifier performs best in most cases.
LocFuse was developed with a user-friendly graphical interface and is freely available for Linux, Mac OS X and MS Windows.

Theory in Biosciences, 2014
In developmental and evolutionary biology, particular emphasis has been given to the relationship between transcription factors and the cognate cis-regulatory elements of their target genes. These constitute the gene regulatory networks that control expression and are assumed to causally determine the formation of structures and body plans. Comparative analysis has, however, established broad sequence homology among species that nonetheless display quite different anatomies. Transgenic experiments have also confirmed that many developmentally important elements are, in fact, functionally interchangeable. Although dependent upon the appropriate degree of gene expression, the actual construction of specific structures does not appear to be directly linked to the functions of gene products alone. Instead, the self-formation of complex patterns, due in large part to epigenetic and nongenetic determinants, remains a persistent theme in the study of ontogeny and regenerative medicine. Recent evidence indeed points to the existence of a self-organizing process, operating through a set of intrinsic rules and forces, that imposes coordination and a holistic order upon cells and tissues. This has been repeatedly demonstrated in experiments on regeneration as well as in the autonomous formation of structures in vitro. The process cannot be wholly attributed to the functional outcome of protein-protein interactions or to concentration gradients of diffusible chemicals. This phenomenon is examined here, along with some of the methodological and theoretical approaches now used to understand the causal basis for self-organization in development and its evolution.
Keywords: Self-organization; Gene regulatory networks; Morphogenesis; Evo-devo; Regeneration

Cancer systems biology and modeling: Microscopic scale and multiscale approaches
Seminars in Cancer Biology, 2014
Cancer has become known as a complex and systemic disease on macroscopic, mesoscopic and microscopic scales. Systems biology employs state-of-the-art computational theories and high-throughput experimental data to model and simulate complex biological processes such as cancer, which involves complex genetic and epigenetic, as well as intracellular and extracellular, interaction networks. In this paper, different systems biology modeling techniques, such as systems of differential equations, stochastic methods, Boolean networks, Petri nets, cellular automata and agent-based systems, are concisely discussed. We have compared these formalisms and tried to assess how applicable they are to emerging cancer modeling and simulation approaches. The different scales of cancer modeling, namely the microscopic, mesoscopic and macroscopic scales, are explained, followed by an illustration of angiogenesis at the microscopic scale. The modeling of cancer cell proliferation and survival is then examined at the microscopic scale, and the modeling of multiscale tumor growth is explained along with its advantages.

Journal of Genetics, 2012
Gene duplicates have the inherent property of initially being functionally redundant. This means that they can compensate for the effect of deleterious variation occurring at one or more sister sites. Here, I present data bearing on evolutionary theory that illustrate how functional adaptation in duplicate genes is markedly constrained by the compensatory utility of sustained genetic redundancy. Specifically, a two-locus epistatic model of paralogous genes was simulated to investigate the degree of purifying selection imposed and whether this would impede possible biochemical innovation. Three population sizes were considered to see whether, as expected, selection for robustness differed significantly among them. Interestingly, physical linkage between tandem duplicates was found to increase both the probability of neofunctionalization and the efficacy of selection, contrary to what is expected for singleton genes. The results indicate that an evolutionary trade-off often exists between functional change under either positive or relaxed selection and the need to compensate for failures due to degenerative mutations, thereby guaranteeing the reliability of protein production. [Bozorgmehr J. E. H. 2012 The effect of functional compensation among duplicate genes can constrain their evolutionary divergence.]

Journal of Bioeconomics, 2012
The success of extant species is largely due to their ability to adapt in the face of constantly changing environmental conditions. Natural selection is the biological mechanism that takes advantage of opportunities to promote spontaneous variations and facilitate evolutionary development. The character of this biological opportunism is considered here, placing it firmly within the context of various social and economic principles (notably individualism, industrialism, utilitarianism and consequentialism) that have characterised the philosophy of the modern era. However, this purely opportunistic approach, with its myopic emphasis on immediate problem solving, has serious shortcomings in both life and business practice. These are examined here in contrast to some of the alternative approaches found in biology and economic theory. The nature and relationship of function to utility in biology is also given particular consideration, as is the issue of incrementalism in the development of complex adaptive features. The methodological reductionism at the heart of evolutionary biology certainly yields insightful empirical results, as reported in the scientific literature. Nonetheless, natural selection is observed to be a purely reflexive mechanism, not one capable of producing the kind of innovation necessary for the more revolutionary changes in an organism's systems.
To the Editors of IUBMB Life: Correspondence concerning an alleged erratum in a review on the genetic code
IUBMB Life, 2012

Current Genomics, 2014
In recent years, in silico studies and trial simulations have complemented experimental procedures. A model is a description of a system, and a system is any collection of interrelated objects; an object, moreover, is some elemental unit upon which observations can be made but whose internal structure either does not exist or is ignored. Network analysis is therefore critical for successful quantitative modeling of biological systems. This review highlights, in five sections, some of the most popular and important modeling algorithms, tools and emerging standards for representing, simulating and analyzing cellular networks. We also illustrate these concepts with simple examples, images and graphs. Overall, systems biology aims at a holistic description and understanding of biological processes by integrating analytical experimental approaches with synthetic computational models. Indeed, biological networks have been developed as a platform for integrating information from high- to low-throughput experiments in the analysis of biological systems. We provide an overview of all the processes used in modeling and simulating biological networks, in a way that should be easily understandable to researchers with either biological or mathematical backgrounds. Given the complexity of the experimental data generated and of cellular networks, it is no surprise that researchers have turned to computer simulation and more theory-based approaches to help develop a fully quantitative understanding of cellular dynamics.

Current Genomics, 2013
The trypanosomatid parasites Leishmania braziliensis, Leishmania major and Leishmania infantum are important human pathogens. Despite years of study and the availability of their genomes, no effective vaccine has yet been developed, and chemotherapy is highly toxic. It is therefore clear that only integrated, interdisciplinary studies are likely to succeed in identifying new targets for vaccine and drug development. An essential part of this rationale is the study of protein-protein interaction (PPI) networks, which can provide a better understanding of complex protein interactions in biological systems. We therefore modeled PPIs for these trypanosomatids computationally, predicting interactions by sequence comparison against public databases of protein or domain interactions (interolog mapping), and developed a dedicated combined score to assess the robustness of the predictions. The confidence of the network prediction approach was evaluated using gold-standard positive and negative datasets, yielding an AUC of 0.94. As a result, 39,420, 43,531 and 45,235 interactions were predicted for L. braziliensis, L. major and L. infantum, respectively. For each predicted network, the top 20 proteins were ranked by the MCC topological index. In addition, information on immunological potential, the degree of protein sequence conservation among orthologs and the degree of identity to proteins of potential parasite hosts was integrated. This integration improves the interpretability and usefulness of the predicted networks, which can be valuable in selecting new potential biological targets for drug and vaccine development.
Network modularity analysis, which is key when the aim is to destabilize PPIs for drug or vaccine purposes, was performed along with multiple alignments of the predicted PPIs, revealing patterns associated with protein turnover. In addition, around 50% of the hypothetical proteins present in the networks received some degree of functional annotation, an important contribution given that approximately 60% of the predicted Leishmania proteomes has no predicted function.

Complexity, 2011
All life depends on the biological information encoded in DNA with which to synthesize and regulate the various peptide sequences required by an organism's cells. Hence, an evolutionary model accounting for the diversity of life needs to demonstrate how novel exonic regions that code for distinctly different functions can emerge. Natural selection tends to conserve the basic functionality, sequence and size of genes and, although beneficial and adaptive changes are possible, these serve only to improve or adjust the existing type. However, gene duplication allows for a respite in selection and so can provide a molecular substrate for the development of biochemical innovation. Reference is made here to several well-known examples of gene duplication, and to the major means of resulting evolutionary divergence, to examine the plausibility of this assumption. The totality of the evidence reveals that, although duplication can and does facilitate important adaptations by tinkering with existing compounds, molecular evolution is nonetheless constrained in every case. Therefore, although the process of gene duplication and subsequent random mutation has certainly contributed to the size and diversity of the genome, it is alone insufficient to explain the origination of the highly complex information pertinent to the essential functioning of living organisms.

Biosystems, 2011
One of the prevailing arguments in evolutionary theory is that duplicates of genes can acquire novel functionality. This is because only one of the paralogs need maintain the ancestral function, leaving room for natural experimentation due to a respite in purifying selection. Although many duplicates can subsequently become disabled by nullifying mutations, a few may also go on to diverge along a novel evolutionary trajectory. Here, evidence is provided that this scenario may not always hold. Rather, in the case of the highly conserved KPNA importin family, an initial relaxation in selection induced a frameshift that was later suppressed and heavily compensated for as part of a reparative and optimizing process. Despite the resulting divergence, there remains a distinct preservation of both sequence and functionality among the paralogs. This would indicate that duplicates can be retained by selection for reasons related to their redundant functionality. It also shows that, even when positive selection is inferred in duplicate genes, this may be of a compensatory nature rather than representing biochemical innovation. In general, this may be a more common outcome of gene duplication than is currently maintained.