Papers by Raviprasad Aduri

arXiv (Cornell University), Nov 1, 2023
RNA protein Interactions (RPIs) play an important role in biological systems. Recently, we have enumerated the RPIs at residue level and have elucidated the minimum structural unit (MSU) in these interactions to be a stretch of five residues (nucleotides/amino acids). Pseudouridine is the most frequent modification in RNA. The conversion of uridine to pseudouridine involves interactions between pseudouridine synthase and RNA. The existing models to predict the pseudouridine sites in a given RNA sequence mainly depend on user-defined features such as mono- and dinucleotide composition/propensities of RNA sequences. Predicting pseudouridine sites is a non-linear classification problem with limited data points. Deep learning models are efficient discriminators when the data set is reasonably large but fail when data are scarce (< 1000 samples). To mitigate this problem, we propose a Support Vector Machine (SVM) kernel based on utility theory from economics, using data-driven parameters (i.e., the MSU) as features. For this purpose, we have used position-specific tri/quad/pentanucleotide composition/propensity (PSPC/PSPP) besides nucleotide and dinucleotide composition as features. SVMs are known to work well in the small-data regime, and kernels in SVMs are designed to classify non-linear data. The proposed model outperforms the existing state-of-the-art models significantly (10%-15% on average). Keywords: pseudouridine • RNA protein interactions • Utility Kernel (UK) • Small data Machine learning (ML)
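The composition features mentioned in this abstract can be illustrated with a minimal sketch. It assumes that "composition" means the fraction of each k-mer among all overlapping windows of a sequence, and that "position-specific" means the frequency of each k-mer at each start position across a set of equal-length sequences. Both definitions, and the function names `kmer_composition` and `position_specific_composition`, are assumptions made for illustration; the paper's exact feature definitions and its utility kernel are not reproduced here.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGU"  # RNA nucleotides


def kmer_composition(seq, k):
    """Fraction of each possible k-mer among all overlapping k-mers in seq."""
    windows = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(windows)
    total = len(windows)
    features = {}
    for letters in product(ALPHABET, repeat=k):
        kmer = "".join(letters)
        features[kmer] = counts[kmer] / total if total else 0.0
    return features


def position_specific_composition(seqs, k):
    """For equal-length sequences, frequency of each k-mer at each start position.

    Returns one dict per position; k-mers never seen at a position are omitted.
    """
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same length")
    table = []
    for i in range(length - k + 1):
        counts = Counter(s[i:i + k] for s in seqs)
        table.append({kmer: c / len(seqs) for kmer, c in counts.items()})
    return table
```

With k set to 3, 4, or 5 the same functions give tri-, quad-, and pentanucleotide features; the resulting vectors could then be passed to any SVM implementation.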

Proceedings of the AAAI Conference on Artificial Intelligence
Large Language Models (LLMs) can be used as repositories of biological and chemical information to generate pharmacological lead compounds. However, for LLMs to focus on specific drug targets typically requires experimentation with progressively more refined prompts. Results thus become dependent not just on what is known about the target, but also on what is known about the prompt-engineering. In this paper, we separate the prompt into domain-constraints that can be written in a standard logical form and a simple text-based query. We investigate whether LLMs can be guided, not by refining prompts manually, but by refining the logical component automatically, keeping the query unchanged. We describe an iterative procedure LMLF ("Language Model with Logical Feedback") in which the constraints are progressively refined using a logical notion of generalisation. On any iteration, newly generated instances are verified against the constraint, providing "logical-feedback" for t...
An in silico discovery of potential 3CL protease inhibitors of SARS-CoV-2 based upon inactivation of the cysteine 145-Histidine 41 catalytic dyad
Figshare, 2022

The physicochemical properties of a drug molecule determine its metabolic properties. Hybrid quantum mechanics approaches combined with computer-aided drug design, and more recent supervised machine-learning approaches, have been used to predict these properties for small-molecule drugs. However, these methods are low in accuracy and computationally expensive. To address this problem and improve the performance of a model that predicts the properties of drug molecules, we developed a novel architecture that uses a "bond order matrix" and structural information to improve molecular graph representations. Message-passing neural networks (MPNNs) are a framework for learning local and global features from irregularly structured data in a permutation-invariant way. We take advantage of the MPNN architecture and introduce a "semi-master node," a unique way of representing the functional groups in a small molecule and aggregating features obtained from the functional groups, in anticipation of reverse engineering small molecules given the desired physicochemical properties. This novel architecture and molecule representation were evaluated on the QM9 dataset, which has 133,000 stable small organic molecules with up to nine heavy atoms (C, O, N, F) out of the GDB-17 chemical universe. The metric for evaluating the model's performance is DFT error, an estimated average error of the properties of each molecule. Our models have shown a performance gain of ~10%.

arXiv (Cornell University), Nov 20, 2023
Schizophrenia is a complicated mental illness characterized by a broad spectrum of symptoms affecting cognition, behavior, and emotion. Identifying reliable biomarkers to classify schizophrenia accurately continues to be a challenge in the field of psychiatry. We investigate the temporal patterns within motor activity data as a potential key to improving the classification of individuals with schizophrenia, using a dataset of motor activity recordings from 22 schizophrenia patients and 32 control subjects. The dataset contains per-minute motor activity measurements collected for an average of 12.7 consecutive days for each participant. We divide each day into segments (twelve, eight, six, four, three, and two parts) and evaluate their impact on classification. We compute sixteen statistical features within these temporal segments and use them to train seven machine learning models. The LightGBM model outperforms the other six models. Our results indicate that temporal segmentation significantly improves the classification, with AUC-ROC = 0.93 and F1 score = 0.84 (LightGBM without any segmentation) versus AUC-ROC = 0.98 and F1 score = 0.93 (LightGBM with segmentation). Distinguishing between diurnal and nocturnal segments amplifies the differences between schizophrenia patients and controls. However, further subdivision into smaller time segments does not affect the AUC-ROC significantly. Morning, afternoon, evening, and night partitioning gives similar classification performance to day-night partitioning. These findings are valuable as they indicate that temporal segmentation finer than day versus night does not yield substantial gains, offering an efficient approach for further classification, early diagnosis, and monitoring of schizophrenia.
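The segmentation step described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: a day is taken to be 1440 per-minute readings starting at midnight, segments are equal in length, and four simple statistics stand in for the sixteen features the abstract mentions but does not list. The names `segment_day` and `segment_features` are hypothetical.

```python
from statistics import mean, pstdev

MINUTES_PER_DAY = 1440


def segment_day(activity, n_segments):
    """Split one day of per-minute readings into n equal, contiguous segments."""
    if len(activity) != MINUTES_PER_DAY:
        raise ValueError("expected 1440 per-minute readings")
    if MINUTES_PER_DAY % n_segments != 0:
        raise ValueError("n_segments must divide 1440 evenly")
    size = MINUTES_PER_DAY // n_segments
    return [activity[i:i + size] for i in range(0, MINUTES_PER_DAY, size)]


def segment_features(activity, n_segments):
    """Concatenate a few summary statistics computed per segment.

    The four statistics (mean, population std, max, fraction of zero
    readings) are illustrative stand-ins, not the paper's feature set.
    """
    features = []
    for seg in segment_day(activity, n_segments):
        zero_fraction = sum(1 for v in seg if v == 0) / len(seg)
        features.extend([mean(seg), pstdev(seg), max(seg), zero_fraction])
    return features
```

Each of the segment counts named in the abstract (twelve, eight, six, four, three, two) divides 1440 evenly, so the same function covers all of them; the resulting feature vector per day could then be fed to a classifier such as LightGBM.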

bioRxiv (Cold Spring Harbor Laboratory), Sep 16, 2023
Large Language Models (LLMs) can be used as repositories of biological and chemical information to generate pharmacological lead compounds. However, for LLMs to focus on specific drug targets typically requires experimentation with progressively more refined prompts. Results thus become dependent not just on what is known about the target, but also on what is known about the prompt-engineering. In this paper, we separate the prompt into domain-constraints that can be written in a standard logical form, and a simple text-based query. We investigate whether LLMs can be guided, not by refining prompts manually, but by refining the logical component automatically, keeping the query unchanged. We describe an iterative procedure LMLF ("Language Models with Logical Feedback") in which the constraints are progressively refined using a logical notion of generalisation. On any iteration, newly generated instances are verified against the constraint, providing "logical-feedback" for the next iteration's refinement of the constraints. We evaluate LMLF using two well-known targets (inhibition of the Janus Kinase 2; and Dopamine Receptor D2); and two different LLMs (GPT-3 and PaLM). We show that LMLF, starting with the same logical constraints and query text, can guide both LLMs to generate potential leads. We find: (a) Binding affinities of LMLF-generated molecules are skewed towards higher binding affinities than those from existing baselines; (b) LMLF results in generating molecules that are skewed towards higher binding affinities than without logical feedback; (c) Assessment by a computational chemist suggests that LMLF-generated compounds may be novel inhibitors. These findings suggest that LLMs with logical feedback may provide a mechanism for generating new leads without requiring the domain-specialist to acquire sophisticated skills in prompt-engineering.
ARL15, a GTPase implicated in rheumatoid arthritis, potentially repositions its truncated N-terminus as a function of guanine nucleotide binding
International Journal of Biological Macromolecules, Dec 31, 2023

Global analysis of RNA–protein interactions in TNF‐α induced alternative splicing in metabolic disorders
FEBS Letters, Jan 24, 2021
In this report, using the database of RNA‐binding protein specificities (RBPDB) and our previously published RNA‐seq data, we analyzed the interactions between RNA and RNA‐binding proteins to decipher the role of alternative splicing in metabolic disorders induced by TNF‐α. We identified 13 395 unique RNA–RBP interactions, including 385 unique RNA motifs and 35 RBPs, some of which (including MBNL‐1 and 3, ZFP36, ZRANB2, and SNRPA) are transcriptionally regulated by TNF‐α. In addition to some previously reported RBPs, such as RBMX and HuR/ELAVL1, we found a few novel RBPs, such as ZRANB2 and SNRPA, to be involved in the regulation of metabolic syndrome‐associated genes that contain an enrichment of tetrameric RNA sequences (AUUU). Taken together, this study paves the way for novel RNA–protein interaction‐based therapeutics for treating metabolic syndromes.
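As a small illustration of what scanning for the tetrameric AUUU motif involves, the sketch below counts overlapping occurrences of a motif in an RNA sequence and reports the fraction of windows that match. It is not the RBPDB-based analysis used in the study, and a real enrichment test would compare against a background set of sequences; the function names `count_motif` and `motif_density` are hypothetical.

```python
def count_motif(seq, motif="AUUU"):
    """Count occurrences of motif in seq, allowing overlaps."""
    last_start = len(seq) - len(motif)
    return sum(1 for i in range(last_start + 1) if seq.startswith(motif, i))


def motif_density(seq, motif="AUUU"):
    """Fraction of motif-length windows in seq that match the motif."""
    windows = len(seq) - len(motif) + 1
    if windows <= 0:
        return 0.0
    return count_motif(seq, motif) / windows
```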

Nucleic Acids Research, Dec 26, 2012
Human immunodeficiency virus genome dimerization is initiated through an RNA-RNA kissing interaction formed via the dimerization initiation site (DIS) loop sequence, which has been proposed to be converted to a more thermodynamically stable linkage by the viral p7 form of the nucleocapsid protein (NC). Here, we systematically probed the role of specific amino acids of NCp7 in its chaperone activity in the DIS conversion using 2-aminopurine (2-AP) fluorescence and nuclear magnetic resonance spectroscopy. Through comparative analysis of NCp7 mutants, the presence of positively charged residues in the N-terminus was found to be essential for both helix destabilization and strand transfer functions. It was also observed that the presence and type of the Zn finger is important for NCp7 chaperone activity, but not the order of the Zn fingers. Swapping single aromatic residues between Zn fingers had a significant effect on NCp7 activity; however, these mutants did not exhibit the same activity as mutants in which the order of the Zn fingers was changed, indicating a functional role for other flanking residues. RNA chaperone activity is further correlated with NCp7 structure and interaction with RNA through comparative analysis of nuclear magnetic resonance spectra of NCp7 variants, and complexes of these proteins with the DIS dimer.
Synthesis, Optical Properties and DNA‐Binding Behavior of a Quinoxaline Ring‐Fused π‐Elongated Chlorin – Efforts Towards Preparation of Long Wavelength Absorbing Porphyrinoids
ChemistrySelect, May 3, 2022
A Biophysical Investigation of DNA-Binding Interactions of Push-Pull Dibenzodioxins and Implications for in Vitro anti-Cancer Activity
Polycyclic Aromatic Compounds, May 27, 2022

International Journal of Infectious Diseases, Apr 1, 2016
Background: RNA-RNA interactions, central to many biological processes, are often mediated by various secondary structural elements of the RNA. In the context of single-stranded RNA viruses such as Dengue virus (DENV) and other flaviviruses, such RNA-RNA interactions may be the key to switching between translation and replication. DENV (serotypes 1-4) is the causative agent of Dengue fever (DF), Dengue hemorrhagic fever (DHF), and Dengue shock syndrome (DSS). Each of the DENV serotypes is further classified into several genotypes having varying degrees of pathogenicity and virulence. One of the conserved features of DENV and other flaviviruses is the presence of complementary sequences in the 5'- and 3'-untranslated regions (UTRs) that participate in long-range RNA-RNA interactions leading to the circularization of the genome. We hypothesized that the differences in secondary structures (and the corresponding three-dimensional orientation) of the 5' and 3' UTRs of the DENV RNA genome may underpin differences in virulence and pathogenicity of the different genotypes. Currently, there is no global-scale analysis of DENV genomes correlating the RNA secondary structure with pathogenicity and virulence. Methods & Materials: Towards this end, we have curated full-length DENV genomes from the NCBI database and classified them according to their respective genotypes. Using mFOLD, we derived the putative RNA secondary structures of the 5'-end of the RNA genome (encompassing the 5' UTR, the capsid hairpin (cHP), and the 5'-cyclization sequence (5' CS)) and the final 106 nucleotides of the 3'-UTR (comprising the 3'-SL and 3'-CS). Comparative analysis of the secondary structure elements of different genotypes was done using in-house software packages. We have also performed comparative analysis of these RNA structural elements across the serotypes.
Results: Our work has led to the observation of subtle but significant RNA secondary structure variations not only among the serotypes but also among genotypes of a given serotype. Conclusion: By carrying out an extensive global analysis of DENV genomic RNA secondary structure, we were able to identify serotype- and genotype-specific RNA secondary structural elements and relate them to their possible role in pathogenicity and virulence.

Dengue outbreak and severity prediction: current methods and the future scope
VirusDisease

Effect of ACGT motif in spatiotemporal regulation of AtAVT6D, which improves tolerance to osmotic stress and nitrogen-starvation
Plant Molecular Biology, 2022
KEY MESSAGE Plasma membrane-localized AtAVT6D importing aspartic acid can be targeted to develop plants with enhanced osmotic and nitrogen-starvation tolerance. AtAVT6D promoter can be exploited as a stress-inducible promoter for genetic improvements to raise stress-resilient crops. The AtAVT6 family of amino acid transporters in Arabidopsis thaliana has been predicted to export amino acids like aspartate and glutamate. However, the functional characterization of these amino acid transporters in plants remains unexplored. The present study investigates the expression patterns of AtAVT6 genes in different tissues and under various abiotic stress conditions using quantitative real-time PCR. The expression analysis demonstrated that the member AtAVT6D was significantly induced in response to the phytohormone ABA and to stresses like osmotic stress and drought. The tissue-specific expression analysis showed that AtAVT6D was strongly expressed in the siliques. Taken together, these results suggest that AtAVT6D might play a vital role in silique development and abiotic stress tolerance. Further, a subcellular localization study showed that AtAVT6D was localized to the plasma membrane. The heterologous expression of AtAVT6D in yeast cells conferred significant tolerance to nitrogen-deficient and osmotic stress conditions. The Xenopus oocyte studies revealed that AtAVT6D is involved in the uptake of aspartic acid. Overexpression of AtAVT6D, however, resulted in smaller siliques in Arabidopsis thaliana. Additionally, transient expression studies were performed with the full-length AtAVT6D promoter and its deletion constructs to study the effect of ACGT-N24-ACGT motifs on the reporter gene expression in response to abiotic stresses and ABA treatment.
The fluorometric GUS analyses revealed that the promoter deletion construct-2 (Pro.C2), possessing a single copy of the ACGT-N24-ACGT motif, directed the strongest GUS expression under all the abiotic conditions tested. These results suggest that Pro.C2 can be used as a stress-inducible promoter to drive significant transgene expression.

An in silico discovery of potential 3CL protease inhibitors of SARS-CoV-2 based upon inactivation of the cysteine 145-Histidine 41 catalytic dyad
Journal of Biomolecular Structure and Dynamics, 2022
Coronavirus disease 19 (COVID19) is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Currently, several countries are at risk from the pandemic caused by this virus. In the absence of any vaccine or virus-specific antiviral treatments, there is a need to fast-track the search for potential drug candidates to combat the virus. Though known drugs are being repurposed to fight SARS-CoV-2, virus-specific drugs are required at the earliest. One of the main drug targets of SARS-CoV-2 is an essential non-structural protein, 3CL protease, critical for the life cycle of the virus. We have used molecular docking studies to screen a chemically diverse set of small molecules to identify potential drug candidates to target this protein. Of the 22,630 molecules from varied small molecule libraries, based on the binding affinities and physicochemical properties, we finalized 30 molecules as potential drug candidates. Eight of these molecules bind in a manner that allows a nearly orthogonal backside nucleophilic attack on their suitably placed electrophilic carbonyl groups by the thiol group of cysteine residue 145, while remaining inside a 4 Å distance range. This is of interest because carbonyl groups are known to be attacked in a similar fashion by external nucleophiles, which can be relevant when considering these molecules as potential mechanism-based irreversible inhibitors of the 3CLPro. Further, ADMET analysis, molecular dynamics simulations, and available bioactivity assays led to the identification of three molecules with high potential to be explored as drug candidates/lead molecules to target 3CLPro of SARS-CoV-2. Communicated by Ramaswamy H. Sarma.