The continually increasing number of sequenced plant genomes imposes a high demand for computational analysis to retrieve information from them. Although various promoter prediction methods have been developed to date, they have not yet reached satisfactory predictive performance. The limitations include the challenge of selecting appropriate promoter features that distinguish promoters from non-promoters, and the predictive ability of the machine-learning algorithms. In this paper, a novel approach is proposed in which n-mer sequences, along with nonlinear machine-learning algorithms such as the support vector machine, are used to distinguish between promoter and non-promoter DNA sequences. The basic principle of the proposed method is to observe the occurrences of the most frequent n-mer sequences (FDAFSA) and to analyse random triplet-triplet pairs based on a genetic algorithm (RTFSGA) in promoter and non-promoter sequences, which serve as discriminating factors between the two classes. The classification model showed a 10-fold cross-validation accuracy of 87.21%, and 90% accuracy was achieved on a randomly selected dataset of 100. The high sensitivity and selectivity indicate that n-mer frequencies and random triplet-pair analysis, along with a supervised machine-learning method, can be useful in the identification of plant RNA polymerase II promoters.
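The n-mer occurrence features described in this abstract can be illustrated with a minimal sketch. It is not the FDAFSA or RTFSGA procedure itself: the function names `nmer_counts`, `top_nmers` and `feature_vector` are hypothetical, the choice of relative frequency as the feature value is an assumption, and the feature selection and SVM training steps are not shown.

```python
from collections import Counter


def nmer_counts(sequence, n):
    """Count every overlapping n-mer in a DNA sequence."""
    seq = sequence.upper()
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


def top_nmers(sequences, n, k):
    """Return the k most frequent n-mers pooled over a set of sequences."""
    total = Counter()
    for seq in sequences:
        total.update(nmer_counts(seq, n))
    return [nmer for nmer, _ in total.most_common(k)]


def feature_vector(sequence, vocabulary, n):
    """Relative frequency of each vocabulary n-mer within one sequence.

    Assumption: relative frequency is used as the feature value; the
    abstract does not state how occurrences are encoded.
    """
    counts = nmer_counts(sequence, n)
    total = max(sum(counts.values()), 1)  # avoid division by zero
    return [counts[nmer] / total for nmer in vocabulary]
```

Vectors of this kind, computed for promoter and non-promoter sequences over a shared vocabulary, could then be passed to any supervised classifier, such as a support vector machine.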
Integrated studies of copy number aberration (CNA) and gene expression (GE) have already been applied successfully to characterize various cancer-related problems computationally. Decoding gene-gene relationships in cancer datasets is receiving increasing attention, but the concept has so far seen limited use for integrating heterogeneous datasets. The majority of existing algorithms for detecting functional modules rely on topological information, i.e. protein-protein interaction (PPI), whereas observations from integrated datasets can also be applied to ensure that the modules found are sound both biologically and topologically. To integrate copy number and gene expression data, we built a gene-gene relationship network by enumerating all types of pair-wise correlations, e.g. CNA-CNA, CNA-GE and GE-GE, depending on the availability of data for particular gene-pairs. We calculated correlations over multiple patients for whom both copy number and gene expression data are available in the TCGA Glioblastoma Multiforme (GBM) datasets. To reconstruct the network, we took the maximum correlation value among all three types as the pair-wise entry (direct relation) if it was above a threshold. Otherwise, that entry was updated with the indirect relationship value, calculated as the geometric mean of all direct relationship values along a significant path between those two genes. Next, we proposed a novel algorithm to find functional modules using the reconstructed network as the source of biological information, along with protein-protein interaction as topological information. We found 77 modules that have significant overlap (FDR-corrected p-values from the hypergeometric test ≤ 0.05) with KEGG and BioCarta pathways and GO terms. Moreover, most of these modules were found to contain well-known GBM driver genes, including TP53, RB1, PTEN, EGFR, SKP2, NF1, CDK4, TEP1, GRM3 and CDKN2A, and they also recovered 147 out of 457 genes in the Cancer Gene Census.
Our proposed methods for integrating copy number and gene expression suggest that gene-gene relationship values (direct and indirect) can be useful, along with PPI, for identifying functional modules in biological networks. Our proposed module detection algorithm performed better than hierarchical clustering in terms of both pathway and cancer gene set enrichments. Moreover, the existence of all three types of relations in the modules found can help explain cancer-related activities with greater insight.
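The rule for filling one pair-wise entry of the gene-gene relationship network (maximum direct correlation if above a threshold, otherwise the geometric mean of direct values along a path) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, relationship values are assumed to be non-negative magnitudes, a missing correlation type is represented by `None`, and how the "significant path" is chosen is left to the caller.

```python
import math


def indirect_relationship(path_values):
    """Geometric mean of the direct relationship values along one path."""
    if not path_values:
        raise ValueError("path must contain at least one edge")
    return math.prod(path_values) ** (1.0 / len(path_values))


def pairwise_entry(cna_cna, cna_ge, ge_ge, path_values, threshold):
    """Return the network entry for one gene pair.

    cna_cna, cna_ge, ge_ge: correlation values, or None when that data
    type is unavailable for the pair (assumed encoding).
    path_values: direct relationship values along a path between the two
    genes, supplied by the caller (path selection is not modelled here).
    """
    available = [v for v in (cna_cna, cna_ge, ge_ge) if v is not None]
    direct = max(available) if available else None
    if direct is not None and direct > threshold:
        return direct  # direct relation
    return indirect_relationship(path_values)  # indirect relation
```

For example, with a threshold of 0.5, a pair whose best direct correlation is 0.3 but which is connected through a path with direct values 0.9 and 0.4 would receive the indirect value sqrt(0.9 × 0.4) = 0.6.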
Background: Initial success of inhibitors targeting oncogenes is often followed by tumor relapse due to acquired resistance. In addition to mutations in targeted oncogenes, signaling cross-talks among pathways play a vital role in such drug inefficacy. These include activation of compensatory pathways by other receptor tyrosine kinases, and altered activity of key effectors in cell survival and growth-associated pathways by other signaling pathways.
- by A. K. M. Azad
Background: With an increasing number of plant genome sequences, it has become important to develop a robust computational method for detecting plant promoters. Although a wide variety of programs are currently available, their prediction accuracy still requires improvement. The limitations of these methods can be addressed by selecting appropriate features for distinguishing promoters from non-promoters.
Recently, computational approaches integrating copy number aberrations (CNAs) and gene expression (GE) have been extensively studied to identify cancer-related genes and pathways. In this work, we integrate these two data sets with protein-protein interaction (PPI) information to find cancer-related functional modules. To integrate CNA and GE data, we first built a gene-gene relationship network from a set of seed genes by enumerating all types of pairwise correlations, e.g. GE-GE, CNA-GE, and CNA-CNA, over multiple patients. Next, we propose a voting-based cancer module identification algorithm by combining topological and data-driven properties (VToD algorithm) by using the gene-gene relationship network as a source of data-driven information, and the PPI data as topological information. We applied the VToD algorithm to 266 glioblastoma multiforme (GBM) and 96 ovarian carcinoma (OVC) samples that have both expression and copy number measurements, and identified 22 GBM modules and 23 OVC modules. Among 22 GBM modules, 15, 12, and 20 modules were significantly enriched with cancer-related KEGG, BioCarta pathways, and GO terms, respectively. Among 23 OVC modules, 19, 18, and 23 modules were significantly enriched with cancer-related KEGG, BioCarta pathways, and GO terms, respectively. Similarly, we also observed that 9 and 2 GBM modules and 15 and 18 OVC modules were enriched with cancer gene census (CGC) and specific cancer driver genes, respectively. Our proposed module-detection algorithm significantly outperformed other existing methods in terms of both functional and cancer gene set enrichments. Most of the cancer-related pathways from both cancer data sets found in our algorithm contained more than two types of gene-gene relationships, showing strong positive correlations between the number of different types of relationship and CGC enrichment q-values (0.64 for GBM and 0.49 for OVC). 
This study suggests that identified modules containing both expression changes and CNAs can explain cancer-related activities with greater insights.
Small molecule inhibitors, such as lapatinib, are effective against breast cancer in clinical trials, but tumor cells ultimately acquire resistance to the drug. Maintaining sensitization to drug action is essential for durable growth inhibition. Recently, adaptive reprogramming of signaling circuitry has been identified as a major cause of acquired resistance. We developed a computational framework using a Bayesian statistical approach to model signal rewiring in acquired resistance. We used the p1-model to infer potential aberrant gene-pairs with differential posterior probabilities of appearing in resistant-vs-parental networks. Results were obtained using matched gene expression profiles under resistant and parental conditions. Using two lapatinib-treated ErbB2-positive breast cancer cell-lines, SKBR3 and BT474, our method identified similar dysregulated signaling pathways, including EGFR-related pathways as well as other receptor-related pathways, many of which were reported previously as compensatory pathways of EGFR inhibition via signaling cross-talk. A manual literature survey provided strong evidence that aberrant signaling activities in dysregulated pathways are closely related to acquired resistance to EGFR tyrosine kinase inhibitors (EGFR-TKIs). Our approach predicted literature-supported dysregulated pathways complementary to both node-centric (SPIA, DAVID, and GATHER) and edge-centric (ESEA and PAGI) methods. Moreover, by proposing a novel pattern of aberrant signaling called V-structures, we observed that genes were dysregulated in resistant-vs-sensitive conditions when they were involved in the switch of dependencies from targeted to bypass signaling events.
A literature survey of some important V-structures suggested they play a role in breast cancer metastasis and/or acquired resistance to EGFR-TKIs, where the mRNA changes of TGFBR2, LEF1 and TP53 in resistant-vs-sensitive conditions were related to the dependency switch from targeted to bypass signaling links. Our results suggest many signaling pathway structures are compromised in acquired resistance, and V-structures of aberrant signaling within or among those pathways may provide further insights into the bypass mechanism of targeted inhibition.
Data-driven models of signalling networks are becoming increasingly important in systems biology in order to reflect the dynamic patterns of signalling activities in a context-specific manner. State-of-the-art approaches for categorising and detecting signalling cross-talks may not be suitable for such models since they rely on static topologies of cell signalling networks and prior biological knowledge. In this chapter, we review state-of-the-art approaches that categorise all possible cross-talks in signalling networks and propose a novel categorisation specific to data-driven network models. Considering such models as undirected networks, we propose two categories of signalling cross-talks between any two given signalling pathways. In a Type-I cross-talk, a signalling link {g_i, g_j} connects two signalling pathways, where g_i and g_j are signalling nodes that belong to two distinct pathways. In a Type-II cross-talk, two signalling links {g_i, g_j} and {g_j, g_k} meet at the intersection of two signalling pathways at a shared signalling node g_j. We compared our categorisation approach with others and found that all the types of cross-talks defined by those approaches can be mapped to Type-I and Type-II cross-talks when underlying signalling activities are considered as non-causal relationships. Next, we provided a simple but intuitive algorithm called XDaMoSiN (cross-talks in data-driven models of signalling networks) to detect both Type-I and Type-II cross-talks between any two given signalling pathways in a data-driven network model. By detecting cross-talks in such network models, our approach can be used to analyse and decipher latent mechanisms of various cell phenotypes, such as cancer or acquired drug resistance, that may evolve due to the highly adaptable and dynamic nature of signal transduction networks.
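The two cross-talk categories defined above can be made concrete with a small sketch. It is not the XDaMoSiN algorithm itself, whose details are not given in this abstract. The function name `find_cross_talks` is hypothetical, and one reading of the definitions is assumed: a Type-I link joins a node found only in the first pathway to a node found only in the second, and a Type-II pattern has its middle node g_j in both pathways, with g_i exclusive to the first pathway and g_k exclusive to the second.

```python
def find_cross_talks(edges, pathway_a, pathway_b):
    """Detect Type-I and Type-II cross-talks in an undirected network.

    edges: iterable of (u, v) pairs, treated as undirected links.
    pathway_a, pathway_b: collections of node names for the two pathways.
    Returns (type_one, type_two), where type_one holds (g_i, g_j) pairs
    and type_two holds (g_i, g_j, g_k) triples with g_j shared.
    """
    a, b = set(pathway_a), set(pathway_b)
    only_a, only_b, shared = a - b, b - a, a & b
    edges = list(edges)

    # Undirected adjacency map built from the link list.
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)

    # Type-I: a single link bridging the two pathways.
    type_one = set()
    for u, v in edges:
        if u in only_a and v in only_b:
            type_one.add((u, v))
        elif v in only_a and u in only_b:
            type_one.add((v, u))

    # Type-II: two links meeting at a node shared by both pathways.
    type_two = set()
    for g_j in shared:
        neighbours = adjacency.get(g_j, set())
        for g_i in neighbours & only_a:
            for g_k in neighbours & only_b:
                type_two.add((g_i, g_j, g_k))
    return type_one, type_two
```

Links that stay inside one pathway are ignored under this reading, which keeps the output limited to interactions that actually bridge the two pathways.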
The availability of multiple heterogeneous high-throughput datasets provides an enabling resource for cancer systems biology. Types of data include gene expression (GE), copy number aberration (CNA), miRNA expression, methylation, and protein-protein interactions (PPI). One important problem that can potentially be solved using such data is to determine which of the possible pair-wise interactions among genes contribute to a range of cancer-related events, from tumorigenesis to metastasis. Various studies have shown that applying integrated knowledge from multi-omics datasets elucidates such complex phenomena with higher statistical significance than using a single type of dataset individually. However, computational methods for processing multiple data types simultaneously are needed. This chapter reviews some of the computational methods that use integrated approaches to find cancer-related modules.
Background: Initial success of inhibitors targeting oncogenes is often followed by tumor relapse due to acquired resistance. In addition to mutations in targeted oncogenes, signaling cross-talks among pathways play a vital role in such drug inefficacy. These include activation of compensatory pathways and altered activities of key effectors in other cell survival and growth-associated pathways. Results: We propose a computational framework using Bayesian modeling to systematically characterize potential cross-talks among breast cancer signaling pathways. We employed a fully Bayesian approach known as the p1-model to infer posterior probabilities of gene-pairs in networks derived from the gene expression datasets of ErbB2-positive breast cancer cell-lines (parental, lapatinib-sensitive cell-line SKBR3 and the lapatinib-resistant cell-line SKBR3-R, derived from SKBR3). Using this computational framework, we searched for cross-talks between EGFR/ErbB and other signaling pathways from Reactome, KEGG and WikiPathway databases that contribute to lapatinib resistance. We identified 104, 188 and 299 gene-pairs as putative drug-resistant cross-talks, respectively, each comprised of a gene in the EGFR/ErbB signaling pathway and a gene from another signaling pathway, that appear to be interacting in resistant cells but not in parental cells. In 168 of these (distinct) gene-pairs, both of the interacting partners are up-regulated in resistant conditions relative to parental conditions. These gene-pairs are prime candidates for novel cross-talks contributing to lapatinib resistance. They associate EGFR/ErbB signaling with six other signaling pathways: Notch, Wnt, GPCR, hedgehog, insulin receptor/IGF1R and TGF-β receptor signaling. We conducted a literature survey to validate these cross-talks, and found evidence supporting a role for many of them in contributing to drug resistance.
We also analyzed an independent study of lapatinib resistance in the BT474 breast cancer cell-line and found the same signaling pathways making cross-talks with the EGFR/ErbB signaling pathway as in the primary dataset. Conclusions: Our results indicate that the activation of compensatory pathways can potentially cause up-regulation of EGFR/ErbB pathway genes (counteracting the inhibiting effect of lapatinib) via signaling cross-talk. Thus, the up-regulated members of these compensatory pathways along with the members of the EGFR/ErbB signaling pathway are interesting as potential targets for designing novel anti-cancer therapeutics.
Getting stuck in local maxima is a problem that arises while learning Bayesian network (BN) structures. In this paper, we studied a recently proposed Markov chain Monte Carlo (MCMC) sampler, called the Neighbourhood sampler (NS), and examined how efficiently it can sample BNs when local maxima are present. We assume that a posterior distribution f(N, E | D) has been defined, where D represents data relevant to the inference, and N and E are the sets of nodes and directed edges, respectively. We illustrate the new approach by sampling from such a distribution, and inferring some BNs. The simulations conducted in this paper show that the new learning approach substantially avoids getting stuck in local modes of the distribution, and achieves a more rapid rate of convergence, compared to the MCMC Metropolis-Hastings sampler and other heuristic algorithms.
- by A. K. M. Azad and +1
- Bayesian Networks
Many applications in graph analysis require a space of graphs or networks to be sampled uniformly at random. For example, one may need to efficiently draw a small representative sample of graphs from a particular large target space. We assume that a uniform distribution f(N, E) = 1/|X| has been defined, where N is a set of nodes, E is a set of edges, (N, E) is a graph in the target space X and |X| is the (finite) total number of graphs in the target space. We propose a new approach to sample graphs at random from such a distribution. The new approach uses a Markov chain Monte Carlo method called the Neighbourhood Sampler. We validate the new sampling technique by simulating from feasible spaces of directed or undirected graphs, and compare its computational efficiency with the conventional Metropolis-Hastings sampler. The simulation results indicate efficient uniform sampling of the target spaces, and a more rapid rate of convergence than the Metropolis-Hastings sampler.
- by A. K. M. Azad and +1
- Bayesian Networks
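The conventional Metropolis-Hastings baseline mentioned in the graph-sampling abstract above can be sketched for a uniform target f(N, E) = 1/|X|. This is not the Neighbourhood Sampler, whose details are not given here. The target space X is a hypothetical choice made for illustration: all simple undirected graphs on a fixed node set with at most `max_edges` edges. The function name and parameters are likewise assumptions.

```python
import itertools
import random


def mh_uniform_graphs(n_nodes, max_edges, n_steps, seed=0):
    """Metropolis-Hastings chain targeting the uniform distribution over
    undirected graphs on n_nodes nodes with at most max_edges edges.

    The proposal toggles one uniformly chosen node pair, so it is
    symmetric; with a uniform target the acceptance probability is 1
    for proposals inside the space and 0 for proposals outside it.
    """
    rng = random.Random(seed)
    pairs = list(itertools.combinations(range(n_nodes), 2))
    edges = set()  # start from the empty graph, which is always in X
    samples = []
    for _ in range(n_steps):
        # Lazy step: propose only half the time, which keeps the chain
        # aperiodic even when no proposal would be rejected.
        if rng.random() < 0.5:
            proposal = edges ^ {rng.choice(pairs)}  # toggle one edge
            if len(proposal) <= max_edges:
                edges = proposal  # accept: proposal lies in X
            # otherwise reject and keep the current graph
        samples.append(frozenset(edges))
    return samples
```

Because every graph in this space can reach the empty graph by removing edges, the chain is irreducible, and its stationary distribution is uniform over X. Consecutive samples are correlated, so in practice a burn-in period and thinning would normally be applied before treating the output as a representative sample.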
Background: Initial success of inhibitors targeting oncogenes is often followed by tumor relapse due to acquired resistance. In addition to mutations in targeted oncogenes, signaling cross-talks among pathways play a vital role in such drug inefficacy. These include activation of compensatory pathways by other receptor tyrosine kinases, and altered activity of key effectors in cell survival and growth-associated pathways by other signaling pathways.
- by A. K. M. Azad and +1
- Bioinformatics, Molecular Biology
Background: Small molecule inhibitors, such as lapatinib, are effective treatments for breast cancer. Lapatinib typically produces early clinical benefits, but after prolonged use, tumours develop acquired resistance (AR). Recently, adaptive reprogramming of signaling circuitry was reported as a major cause of AR, hence maintaining sensitization of tumours to drug action is essential for durable growth inhibition.
- by A. K. M. Azad and +1
- Acquired Drug Resistance