Papers by Vanathi Gopalakrishnan
White Rose Research Online (University of Leeds), Aug 9, 2016
This is a repository copy of Regulation of Inflammatory and anti-apoptotic responses through the IL-1RI/TILRR complex.


PLOS ONE, Aug 5, 2019
Finding the optimal blood pressure (BP) target and BP treatment after acute ischemic or hemorrhagic stroke is an area of controversy and a significant unmet need in the critical care of stroke victims. Numerous large prospective clinical trials have addressed this question but have generated neutral or conflicting results. One major limitation that may have contributed to these results is the "one-size-fits-all" approach to BP targets, while the optimal BP target likely varies between individuals. We address this problem with the Acute Intervention Model of Blood Pressure (AIM-BP) framework: an individualized, human-interpretable model of BP and its control in the acute care setting. The framework consists of two components: first, a model of BP homeostasis and the various effects that perturb it; and second, a parameter estimator that can learn clinically important model parameters on a patient-by-patient basis. By estimating the parameters of the AIM-BP model for a given patient, the effectiveness of antihypertensive medication can be quantified separately from the patient's spontaneous BP trends. We hypothesize that AIM-BP is a sufficient framework for estimating parameters of a homeostasis perturbation model of a stroke patient's BP time course, and that the AIM-BP parameter estimator can do so as accurately and consistently as a state-of-the-art maximum likelihood estimation method. We demonstrate that this is the case in a proof of concept of the AIM-BP framework, using simulated clinical scenarios modeled on stroke patients from real-world intensive care datasets.
Parallel experiment planning: macromolecular crystallization case study

Computer Aided Knowledge Discovery in Biomedicine
IGI Global eBooks, Jan 18, 2011
This chapter provides a perspective on 3 important collaborative areas in systems biology research. These areas represent biological problems of clinical significance. The first area deals with macromolecular crystallization, which is a crucial step in protein structure determination. The second area deals with proteomic biomarker discovery from high-throughput mass spectral technologies, while the third area is protein structure prediction and complex fold recognition from sequence and prior knowledge of structure properties. For each area, successful case studies are revisited from the perspective of computer-aided knowledge discovery using machine learning and statistical methods. Information about protein sequence, structure, and function is slowly accumulating in standardized forms within databases. Methods are needed to maximize the use of this prior information for prediction and analysis purposes. This chapter provides insights into such methods by which available information in existing databases can be processed and combined with systems biology expertise to expedite biomedical discoveries.
Challenges and opportunities for omics-based precision medicine in chronic low back pain
European Spine Journal, Dec 24, 2022

International Archives of Occupational and Environmental Health, May 12, 2022
Purpose: Exposures related to beryllium (Be) are an enduring concern among workers in the nuclear weapons and other high-tech industries, calling for regular and rigorous biological monitoring. Conventional biomonitoring of Be in urine is informative of neither cumulative exposure nor health outcomes. Biomarkers of exposure to Be based on non-invasive biomonitoring could help refine disease risk assessment. In a cohort of workers with Be exposure, we employed blood plasma extracellular vesicles (EVs) to discover novel biomarkers of exposure to Be. Methods: EVs were isolated from plasma using size-exclusion chromatography and subjected to mass spectrometry-based proteomics. A protein-based classifier was developed using LASSO regression and validated by ELISA. Results: We discovered a dual biomarker signature comprising zymogen granule protein 16B and putative protein FAM10A4 that differentiated between Be-exposed and -unexposed subjects. ELISA-based quantification of the biomarkers in an independent cohort of samples confirmed higher expression of the signature in the Be-exposed group, displaying high predictive accuracy (AUROC = 0.919). Furthermore, the biomarkers efficiently discriminated high- and low-exposure groups (AUROC = 0.749). Conclusions: This is the first report of EV biomarkers associated with Be exposure and exposure levels. The biomarkers could be implemented in resource-limited settings for Be exposure assessment.
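The AUROC figures quoted in this abstract can be read as the probability that a randomly chosen exposed subject receives a higher classifier score than a randomly chosen unexposed subject. The sketch below computes that quantity by direct pairwise comparison. The function name `auroc` and the example labels and scores are hypothetical, chosen only for illustration; they are not data from the study, and this is not presented as the authors' evaluation code.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    labels: iterable of 1 (e.g. exposed) or 0 (e.g. unexposed).
    scores: classifier outputs, higher meaning "more likely positive".
    Tied pairs count as half a correct ranking.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical scores for three exposed (1) and three unexposed (0) subjects.
example_labels = [1, 1, 1, 0, 0, 0]
example_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
example_auc = auroc(example_labels, example_scores)  # 8 of 9 pairs ranked correctly
```

The pairwise form is quadratic in the number of samples, which is acceptable for small cohorts; a rank-based formulation gives the same value more efficiently on large datasets.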

World Journal of Clinical Oncology, 2018
AIM: To develop a framework to incorporate background domain knowledge into classification rule learning for knowledge discovery in biomedicine. METHODS: Bayesian rule learning (BRL) is a rule-based classifier that uses a greedy best-first search over a space of Bayesian belief-networks (BN) to find the optimal BN to explain the input dataset, and then infers classification rules from this BN. BRL uses a Bayesian score to evaluate the quality of BNs. In this paper, we extended the Bayesian score to include informative structure priors, which encode our prior domain knowledge about the dataset. We call this extension BRLp. The structure prior has a λ hyperparameter that allows the user to tune the degree of incorporation of the prior knowledge in the model learning process. We studied the effect of λ on model learning using a simulated dataset and a real-world lung cancer prognostic biomarker dataset, by measuring the degree of incorporation of our specified prior knowledge. We also monitored its effect on the model predictive performance. Finally, we compared BRLp to other state-of-the-art classifiers commonly used in biomedicine. RESULTS: We evaluated the degree of incorporation of prior knowledge into BRLp, with simulated data by measuring the Graph Edit Distance between the true data-generating model and the model learned by BRLp. We specified the true model using informative structure

Background: Ongoing molecular profiling studies enabled by advances in biomedical technologies are producing vast amounts of ‘omic’ data for early detection, monitoring, and prognosis of diverse diseases. A major common limitation is the scarcity of biological samples, necessitating integrative modeling frameworks that can make optimal use of available data for disease classification tasks. Related data sets are often available from different studies, but may have been generated using different technology platforms. Thus, there is a critical need for flexible modeling methods that can handle data from diverse sources to facilitate the discovery of robust biomarkers that underlie disease regulatory processes. Results: In this paper, we introduce a novel framework called Knowledge Augmented Rule Learning (KARL), which incorporates two sources of knowledge, domain and data, for pattern discovery from small and high-dimensional datasets, such as transcriptomic data. We propose KARL as ...

1) Introduction: Brain parcellation is an important processing step in the analysis of structural brain MRI. Existing software implementations are optimized for fully developed adult brains, and provide inadequate results when applied to neonatal brain imaging. 2) Methods: We developed a semi-automated pipeline, NeBSS, for extracting 50 discrete brain structures from neonatal brain MRI, using an atlas registration method that leverages the existing ALBERT neonatal atlas. 3) Results: We demonstrate a simple linear workflow for neonatal brain parcellation. NeBSS is robust to variation in imaging acquisition protocol and magnet field strength. 4) Conclusion: NeBSS is a robust pipeline capable of parcellating neonatal brain MRIs using a simple processing workflow. NeBSS fills a need in clinical translational research in neonatal imaging, where existing automated or semi-automated implementations are too rigid to be successfully applied to multi-center neuroprotection studies and clinical...

A Simple Hidden Markov Model Could Prevent Physician Error in Failure To Diagnose Infectious Mononucleosis
Infectious mononucleosis (Mono) is mostly caused by the Epstein-Barr virus (EBV), and can spread through infected people sharing food and drinks with others. Once the virus enters the body, it remains there for life. The virus can be activated when a person has low immunity and can cause major complications. Furthermore, if physicians miss the diagnosis of this disease and prescribe penicillin-based antibiotics, it can cause severe rash and adverse reactions that compromise patient safety. This paper develops a simple Hidden Markov Model in which the Viterbi algorithm provides the maximum a posteriori probability estimate for the most likely hidden state path, given a sequence of symptoms arising as observations from a patient with hidden EBV positive or negative states. Apart from bringing awareness to help reduce missed diagnoses and subsequent adverse events, this work provides a tool for health care systems to better incorporate prompts during electronic medical record (EMR)...
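The decoding step named in this abstract, Viterbi decoding over a two-state Hidden Markov Model, can be sketched as follows. This is a minimal illustration only: the state names, the symptom vocabulary, and every probability in `START`, `TRANS`, and `EMIT` are hypothetical placeholders, not parameters from the paper.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (most likely hidden state path, log-probability of that path)."""
    # best[t][s]: highest log-probability of any state path ending in s at time t.
    best = [{
        s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
        for s in states
    }]
    # back[t][s]: predecessor of s on that best path.
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score + math.log(emit_p[s][observations[t]])
            back[t][s] = prev
    # Trace back from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[-1][last]


# Hypothetical model: all numbers below are illustrative assumptions.
STATES = ("EBV+", "EBV-")
START = {"EBV+": 0.1, "EBV-": 0.9}
TRANS = {
    "EBV+": {"EBV+": 0.95, "EBV-": 0.05},
    "EBV-": {"EBV+": 0.05, "EBV-": 0.95},
}
EMIT = {
    "EBV+": {"fatigue": 0.30, "sore_throat": 0.30, "swollen_nodes": 0.35, "none": 0.05},
    "EBV-": {"fatigue": 0.20, "sore_throat": 0.20, "swollen_nodes": 0.05, "none": 0.55},
}

visits = ["swollen_nodes", "swollen_nodes", "swollen_nodes"]
decoded_path, decoded_logp = viterbi(visits, STATES, START, TRANS, EMIT)
```

Working in log space avoids numerical underflow on long observation sequences; with these placeholder numbers, repeated reports of swollen lymph nodes outweigh the low prior on the EBV-positive state.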

Journal of Biomedical Informatics
Modeling factors influencing disease phenotypes from biomarker profiling study datasets is a critical task in biomedicine. Such datasets are typically generated from high-throughput 'omic' technologies, which help examine disease mechanisms at an unprecedented resolution. These datasets are challenging because they are high-dimensional. The disease mechanisms they study are also complex because many diseases are multifactorial, resulting from the collective activity of several factors, each with a small effect. Bayesian rule learning (BRL) is a rule model inferred from learning Bayesian networks from data, and has been shown to be effective in modeling high-dimensional datasets. However, BRL is not efficient at modeling multifactorial diseases since it suffers from data fragmentation during learning. In this paper, we overcome this limitation by implementing and evaluating three types of ensemble model combination strategies with BRL: uniform combination (UC; same as Bagging), Bayesian model averaging (BMA), and Bayesian model combination (BMC), collectively called Ensemble Bayesian Rule Learning (EBRL). We also introduce a novel method to visualize EBRL models, called the Bayesian Rule Ensemble Visualizing tool (BREVity), which helps extract and interpret the most important rule patterns guiding the predictions made by the ensemble model. Our results, using twenty-five public, high-dimensional gene expression datasets of multifactorial diseases, suggest that both EBRL models using UC and BMC achieve better predictive performance than BMA and other classic machine learning methods. Furthermore, BMC is found to be more reliable than UC when the ensemble includes sub-optimal models resulting from the stochasticity of the model search process. Together, EBRL and BREVity provide researchers a promising and novel tool for modeling multifactorial diseases from high-dimensional datasets that leverages strengths of ensemble methods for predictive performance, while also providing interpretable explanations for its predictions.
associations. This increases the chance of making false positive discoveries, where spurious factors may be found to be associated with the target variable at random. Moreover, in multifactorial diseases, many hundreds or thousands of factors may be associated with the disease outcome that further increases the chance of selecting false positive
In this paper, we present a new algorithm for parallel and anti-parallel beta-sheet prediction using conditional random fields. In recent years, various approaches have been proposed to capture the long-range interactions of beta-sheets. However, most of them are not very successful: either the learning models are not general enough to capture the non-local information, or the features they use, for example window-based profiles, do not contain that information. Our new method has advantages over previous methods in two respects: (1) it takes into account both local information and long-range interaction information; (2) conditional random fields are powerful models that are able to capture long-range interaction features. The experimental results show that our algorithm performs significantly better than state-of-the-art secondary structure prediction methods.

BMC Cancer, 2016
Background: Adenocarcinoma (ADC) and squamous cell carcinoma (SCC) are the most prevalent histological types among lung cancers. Distinguishing between these subtypes is critically important because they have different implications for prognosis and treatment. Normally, histopathological analyses are used to distinguish between the two, where the tissue samples are collected based on small endoscopic samples or needle aspirations. However, the lack of cell architecture in these small tissue samples hampers the process of distinguishing between the two subtypes. Molecular profiling can also be used to discriminate between the two lung cancer subtypes, on condition that the biopsy is composed of at least 50 % tumor cells. However, in some cases, the tissue composition of a biopsy might be a mix of tumor and tumor-adjacent histologically normal tissue (TAHN). When this happens, a new biopsy is required, with associated cost, risks and discomfort to the patient. To avoid this problem, we hypothesize that a computational method can distinguish between lung cancer subtypes given tumor and TAHN tissue. Methods: Using publicly available datasets for gene expression and DNA methylation, we applied four classification tasks, depending on the possible combinations of tumor and TAHN tissue. First, we used a feature selector (ReliefF/Limma) to select relevant variables, which were then used to build a simple naïve Bayes classification model. Then, we evaluated the classification performance of our models by measuring the area under the receiver operating characteristic curve (AUC). Finally, we analyzed the relevance of the selected genes using hierarchical clustering and IPA® software for gene functional analysis. Results: All Bayesian models achieved high classification performance (AUC > 0.94), which was confirmed by hierarchical cluster analysis. From the genes selected, 25 (93 %) were found to be related to cancer (19 were associated with ADC or SCC), confirming the biological relevance of our method. Conclusions: The results from this study confirm that computational methods using tumor and TAHN tissue can serve as a prognostic tool for lung cancer subtype classification. Our study complements results from other studies where TAHN tissue has been used as a prognostic tool for prostate cancer. The clinical implications of this finding could greatly benefit lung cancer patients.
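The classification step in this abstract, a naïve Bayes model built on a small set of selected features, can be illustrated with a minimal Gaussian naïve Bayes sketch. The Gaussian likelihood, the function names, and the two-feature toy data are assumptions made for illustration; the abstract does not state which naïve Bayes variant was used, and the feature-selection step (ReliefF/Limma) is not reproduced here.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y, var_floor=1e-9):
    """Estimate, per class, the prior and each feature's mean and variance."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # var_floor keeps the variance strictly positive for constant features.
        variances = [
            sum((v - m) ** 2 for v in col) / n + var_floor
            for col, m in zip(columns, means)
        ]
        model[label] = (n / len(X), means, variances)
    return model


def log_scores(model, row):
    """Unnormalized log-posterior of each class, assuming independent features."""
    scores = {}
    for label, (prior, means, variances) in model.items():
        score = math.log(prior)
        for v, m, var in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        scores[label] = score
    return scores


def predict(model, row):
    """Return the class with the highest log-posterior."""
    scores = log_scores(model, row)
    return max(scores, key=scores.get)


# Hypothetical expression values for two already-selected genes.
toy_X = [[5.1, 1.0], [4.9, 1.2], [5.3, 0.8], [1.0, 4.8], [1.2, 5.1], [0.9, 5.0]]
toy_y = ["ADC", "ADC", "ADC", "SCC", "SCC", "SCC"]
toy_model = fit_gaussian_nb(toy_X, toy_y)
```

Calling `predict(toy_model, [5.0, 1.1])` on this toy data returns the class whose training samples sit closest to the query in both features; the per-class scores from `log_scores` could be fed to an AUC computation to mirror the evaluation the abstract describes.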

Cardiovascular Magnetic Resonance Imaging (CMRI) has become a powerful, popular non-invasive tool for detecting biomarkers of various types of subtle pediatric cardiomyopathies, yielding big, temporal, high-resolution data. The complexities associated with the annotation of images and extraction of markers necessitate the development of efficient workflows to acquire, manage and transform this data into actionable knowledge for patient care. We develop and test a novel framework called CMRI-BED for biomarker extraction and discovery from pediatric cardiac MRI data involving the use of a suite of tools for image processing, marker extraction and predictive modeling. We applied the workflow to obtain and analyze a small dataset containing CMRI-derived biomarkers for classifying positive versus negative findings of cardiomyopathy in children. Preliminary results show the feasibility of our framework for processing such data while also yielding actionable predictive classification rule...
Additional file 1: of On Predicting lung cancer subtypes using 'omic' data from tumor and tumor-adjacent histologically-normal tissue
Formatted TCGA dataset used in this study, along with sample IDs for classification task TAHNADC vs. TumorADC in gene expression. (CSV 5182 kb)
Additional file 4: of On Predicting lung cancer subtypes using 'omic' data from tumor and tumor-adjacent histologically-normal tissue
Appendix A shows the Cancer Genome Atlas annotations to identify the types of samples used in this study. Appendix B shows additional performance measures for the models described. (DOCX 106 kb)
Automatic annotation of protein motif function with Gene Ontology terms-0
Copyright information: Taken from "Automatic annotation of protein motif function with Gene Ontology terms". BMC Bioinformatics 2004;5:122. Published online 2 Sep 2004. PMCID: PMC517493. Copyright © 2004 Lu et al; licensee BioMed Central Ltd. Precision of assignment is plotted vs. M.I. cutoff value. The Pearson correlation coefficient between the precision and the cutoff is 0.837.