Academia.edu

Gene Normalization

10 papers
1 follower
About this topic
Gene normalization is the process of standardizing gene names and identifiers across different databases and studies to ensure consistency and accuracy in genomic research. This involves mapping various nomenclatures to a unified system, facilitating data integration, comparison, and interpretation in bioinformatics and molecular biology.
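The core mapping step described above, resolving many surface forms of a gene name to a single identifier, can be sketched as a dictionary lookup with simple string folding. This is a minimal illustration only; the synonym table and "GENE:..." identifiers below are invented, not real database records.

```python
# Minimal sketch of dictionary-based gene name normalization: fold case
# and punctuation, then look the mention up in a synonym table.
# The synonyms and "GENE:..." identifiers are invented for illustration.
import re

SYNONYMS = {
    "tp53": "GENE:0001",
    "p53": "GENE:0001",
    "brca1": "GENE:0002",
}

def normalize_mention(mention: str):
    # Lowercase and drop whitespace, hyphens, underscores, and slashes,
    # a common first step before fuzzier disambiguation.
    key = re.sub(r"[\s\-_/]+", "", mention.lower())
    return SYNONYMS.get(key)
```

A real system would layer species disambiguation and machine-learned filtering on top of such a lookup, as several of the papers listed below describe.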

Key research themes

1. What are effective data normalization strategies for accurate microRNA and gene expression quantification in qPCR and RNA-seq experiments?

This theme addresses the critical challenge of data normalization in gene expression quantification techniques such as quantitative real-time PCR (qPCR) and RNA sequencing (RNA-seq), with a focus on microRNAs and mRNAs. Normalization is fundamental to correcting for technical variability (e.g., differing RNA input, sequencing depth, or batch effects) to ensure accurate, reproducible, and biologically meaningful expression measurements. The lack of consensus on optimal endogenous or exogenous reference genes and normalization procedures leads to variability and complicates cross-study comparison. The theme explores selecting reference genes with stable expression across conditions, normalization algorithms for RNA-seq counts, and new approaches integrating genomic information to improve normalization robustness.

Key finding: This comprehensive review highlights the lack of consensus on optimal normalization strategies for microRNA quantification using qPCR and microarrays. The authors analyze endogenous small RNAs commonly used as normalizers and...
Key finding: This study systematically evaluates the expression stability of 12 previously recommended reference genes across two sub-clones of the MCF-7 breast cancer cell line over multiple passages, including under nutrient stress...
Key finding: By combining RNA-seq transcriptomic datasets from diverse Chinese hamster ovary (CHO) cell lines and culture conditions with qPCR validation, the study identifies four mRNAs (Gnb1, Fkbp1a, Tmed2, and Mmadhc) exhibiting highly...
Key finding: This paper systematically evaluates multiple RNA-seq read count normalization methods, including established approaches (DESeq median-of-ratios, TMM, Upper Quartile) and novel per-gene normalization after per-sample global...
Key finding: Introducing a novel normalization method for RNA-seq data, this study integrates DNA copy number alteration (CNA) information to adjust gene expression measurements, recognizing that CNAs explain a significant fraction (~15%)...
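The count normalization methods compared in this theme can be illustrated with the median-of-ratios idea used by DESeq: each sample's size factor is the median ratio of its counts to a geometric-mean reference sample. This is a simplified sketch with an invented count matrix, not code from any of the cited studies.

```python
# Sketch of DESeq-style median-of-ratios normalization for RNA-seq counts.
# The counts matrix below is invented illustration data (genes x samples).
import numpy as np

def median_of_ratios_size_factors(counts):
    """counts: genes x samples array of raw read counts."""
    log_counts = np.log(counts.astype(float))
    # Keep only genes with nonzero counts in every sample, as in DESeq.
    finite = np.all(np.isfinite(log_counts), axis=1)
    # Geometric mean per gene across samples (the reference pseudo-sample).
    log_geo_mean = log_counts[finite].mean(axis=1)
    # Size factor per sample: median ratio of its counts to the reference.
    ratios = log_counts[finite] - log_geo_mean[:, None]
    return np.exp(np.median(ratios, axis=0))

counts = np.array([
    [100, 200, 50],
    [ 30,  60, 15],
    [ 10,  20,  5],
    [ 90, 180, 45],
])
sf = median_of_ratios_size_factors(counts)   # sequencing-depth estimates
normalized = counts / sf                     # depth-corrected counts
```

In this toy example sample 2 was sequenced at twice the depth of sample 1 and sample 3 at half, so the size factors recover [1, 2, 0.5] and the normalized columns agree.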

2. How can integrated bioinformatics frameworks and knowledgebases enhance gene normalization by providing standardized and context-aware reference gene annotations for RT-qPCR and gene set management?

Accurate gene normalization not only depends on the appropriate experimental design but also on the availability of standardized, well-curated reference gene annotations and gene sets that consider species, tissue specificity, developmental stages, and experimental conditions. This theme revolves around the development of community-curated databases and computational platforms that aggregate experimentally validated internal control genes and gene sets. Such resources enable reproducible normalization across diverse biological contexts, facilitating proper interpretation and cross-study comparisons.

Key finding: ICG provides a publicly editable wiki-based knowledgebase integrating over 750 experimentally validated internal control genes across 73 animal species, 115 plants, fungi, and bacteria. It includes detailed application...
Key finding: MyGeneset.info offers integrated access to curated and user-submitted gene sets from multiple sources (e.g., Wikipathways, Reactome, GO) along with up-to-date gene annotations via APIs, supporting species across humans and...
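Resources like these catalogue internal control genes validated for stable expression. As a toy illustration of what "stable" means, candidate reference genes can be ranked by the coefficient of variation of their measurements; the gene names and values below are invented, and real studies use richer measures such as geNorm's M-value.

```python
# Rank candidate reference genes by coefficient of variation (CV),
# a simple stand-in for dedicated stability measures like geNorm's M.
# Gene names and per-sample values are invented illustration data.
import statistics

expression = {
    "GAPDH": [24.1, 24.3, 24.2, 24.4],
    "ACTB":  [20.0, 22.5, 19.1, 23.8],
    "RPL13": [18.2, 18.2, 18.3, 18.2],
}

def cv(values):
    # Coefficient of variation: relative spread around the mean.
    return statistics.stdev(values) / statistics.mean(values)

# Lower CV = more stable candidate reference gene.
ranked = sorted(expression, key=lambda g: cv(expression[g]))
```

Here the hypothetical RPL13 varies least across samples and ACTB most, so RPL13 would be the preferred normalizer among the three.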

3. How do statistical machine learning and computational approaches contribute to addressing batch effects, gene clustering, and orthology-independent gene normalization in expression data?

Batch effects and heterogeneity in high-throughput gene expression data pose significant challenges for normalization and downstream analysis. Advanced computational approaches, such as artificial intelligence-based normalization, block mixture models for eQTL-driven gene clustering, and orthogonal shared basis factorization for cross-species expression comparison, enhance gene normalization by capturing underlying biological and technical structure without relying solely on physical gene homology or simplistic assumptions. These methods improve the accuracy of gene expression interpretation and facilitate comparative transcriptomic analyses.

Key finding: The authors introduce an artificial intelligence-driven normalization method aiming to reduce batch effects in transcriptome data without imposing assumptions on gene expression distribution. Unlike traditional normalization...
Key finding: This study presents a Gaussian block mixture model integrating gene clustering, genetic mapping, and network reconstruction by simultaneously modeling genotype-specific gene expression clustering patterns. Applied to C....
Key finding: The paper introduces the orthogonal shared basis factorization (OSBF) method, a joint matrix factorization approach estimating a common expression subspace across species that captures conserved gene co-expression patterns...
Key finding: Challenging the conventional wisdom that better batch normalization (BatchNorm) statistics arise from larger mini-batches, this study shows that GhostNorm, which normalizes smaller ‘ghost batches’ independently within...
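As a reduced illustration of batch-effect correction in the spirit of this theme, the sketch below mean-centers expression values within each batch. Real methods such as ComBat add empirical-Bayes shrinkage and covariate handling; the data and batch labels here are invented.

```python
# Simplest batch-effect adjustment: per-batch mean-centering of
# expression values, then restoring the global mean so overall levels
# stay interpretable. Data and batch labels are invented illustrations.
import numpy as np

def center_by_batch(X, batches):
    """X: samples x genes matrix; batches: per-sample batch labels."""
    Xc = X.astype(float).copy()
    for b in np.unique(batches):
        idx = batches == b
        # Remove this batch's own mean profile.
        Xc[idx] -= Xc[idx].mean(axis=0)
    # Add back the global per-gene mean.
    return Xc + X.mean(axis=0)

X = np.array([[5.0, 2.0], [6.0, 3.0],     # batch A
              [9.0, 6.0], [10.0, 7.0]])   # batch B, shifted upward
batches = np.array(["A", "A", "B", "B"])
adjusted = center_by_batch(X, batches)
```

After adjustment the two batches share the same per-gene mean, while per-gene global means are unchanged; this removes the additive batch shift but, unlike the methods above, ignores batch-specific variance and any confounding with biology.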

All papers in Gene Normalization

Background: We report the Gene Normalization (GN) challenge in BioCreative III where participating teams were asked to return a ranked list of identifiers of the genes detected in full-text articles. For training, 32 fully and 500...
Gene expression analysis is fundamental for understanding biological processes, and quantitative real-time PCR (qRT-PCR) has become a widely used method for validating expression measurements. Proper normalization across multiple samples and...
Studies on the expression of genes in different contexts are essential to our understanding of the functioning of organisms and their adaptations to the environment. Gene expression studies require steps of normalization, which are done...
Given the large amount of data stored in biological databases, the management of uncertainty and incompleteness in them is a non-trivial problem. To cope with the large amount of sequences being produced, a significant number of genes and...
For our participation in the CDR task of BioCreative 5, we have adapted the OntoGene system and optimized it for disease recognition (DNER Task) and identification of chemical-disease relationships (CID Task). For the DNER Task we have...
Text mining services are rapidly becoming a crucial component of various knowledge management pipelines, for example in the process of database curation, or for exploration and enrichment of biomedical data within the pharmaceutical...
The BioCreative National Library of Medicine (NLM)-Chem track calls for a community effort to fine-tune automated recognition of chemical names in the biomedical literature. Chemicals are one of the most searched biomedical entities in...
Background We prepared and evaluated training and test materials for an assessment of text mining methods in molecular biology. The goal of the assessment was to evaluate the ability of automated systems to generate a list of unique gene...
Background The biological research literature is a major repository of knowledge. As the amount of literature increases, it will get harder to find the information of interest on a particular topic. There has been an increasing amount of...
Biology has now become an information science, and researchers are increasingly dependent on expert-curated biological databases to organize the findings from the published literature. We report here on a series of experiments related to...
Hepatitis C Virus (HCV) causes significant morbidity worldwide with restricted treatment options and lack of a universal cure, which necessitates the design of novel drugs. Researchers face an enormous growth of literature with very small...
The automatic assignment of species information to the corresponding genes in a research article is a critically important step in the gene normalization task, whereby a gene mention is normalized and linked to a database record or an...
Background: The goal of the gene normalization task is to link genes or gene products mentioned in the literature to biological databases. This is a key step in an accurate search of the biological literature. It is a challenging task,...
Background: This article describes the approaches taken by the OntoGene group at the University of Zurich in dealing with two tasks of the BioCreative III competition: classification of articles which contain curatable protein-protein...
We present an approach towards the automatic detection of names of proteins, genes, species, etc. in biomedical literature and their grounding to widely accepted identifiers. The annotation is based on a large term list that contains the...
The rapidly increasing number of available PubMed documents calls for an automatic approach to the identification and normalization of disease mentions in order to increase the precision and effectiveness of information retrieval....
Several research results have shown that specifying the information about certain entities is the most common information demand of information retrieval users. These needs should be answered by returning specific entities, their properties...
Recent advances in biological research reveal that the majority of the experiments strive for comprehensive exploration of the biological system rather than targeting specific biological entities. The qualitative and quantitative findings...
This article describes a high-recall, high-precision approach for the extraction of biomedical entities from scientific articles. The approach uses a two-stage pipeline, combining a dictionary-based entity recognizer with a...
The Second BioCreAtIvE Challenge provided an ideal opportunity to evaluate biomedical NLP techniques. Prior to the Challenge, an information extraction pipeline was developed to extract entities and relations relevant to the biomedical...
Background: Identifying protein-protein interactions (PPIs) from literature is an important step in mining the function of individual proteins as well as their biological network. Since it is known that PPIs have distinctive patterns in...
A considerable effort has been made to extract biological and chemical entities, as well as their relationships, from the scientific literature, either manually through traditional literature curation or by using information extraction...
Gene mention normalization (GN) refers to the automated mapping of gene names to a unique identifier, such as an NCBI Entrez Gene ID. Such knowledge helps in indexing and retrieval, linkage to additional information (such as sequences),...
This paper presents an approach towards high performance extraction of biomedical entities from the literature, which is built by combining a high-recall dictionary-based technique with a high-precision machine learning filtering step. The...
The Rat Genome Database (RGD) is the premier repository of rat genomic, genetic and physiologic data. Converting data from free text in the scientific literature to a structured format is one of the main tasks of all model organism...
Background: With the biomedical literature continually expanding, searching PubMed for information about specific genes becomes increasingly difficult. Not only can thousands of results be returned, but gene name ambiguity leads to many...
As suggested in recent studies, species recognition and disambiguation is one of the most critical and challenging steps in many downstream text-mining applications such as the gene normalization task and protein-protein interaction...
Kinases are enzymes that mediate phosphate transfer. Extracting information on kinases from biomedical literature is an important task which has direct implications for applications such as drug design. In this work, we develop KinDER,...
Background: The task of recognizing and identifying species names in biomedical literature has recently been regarded as critical for a number of applications in text and data mining, including gene name recognition, species-specific...
Biological knowledge is represented in scientific literature that often describes the function of genes/proteins (bioentities) in terms of their interactions (biointeractions). Such bioentities are often related to biological concepts of...
Evidence in support of relationships among biomedical entities, such as protein-protein interactions, can be gathered from a multiplicity of sources. The larger the pool of evidence, the more likely a given interaction can be considered...
The evolving complexity of genome-scale experiments has increasingly centralized the role of a highly computable, accurate, and comprehensive resource spanning multiple biological scales and viewpoints. To provide a resource to meet this...
Background: Reliable information extraction applications have been a long sought goal of the biomedical text mining community, a goal that if reached would provide valuable tools to benchside biologists in their increasingly difficult...
Biomedical text mining methods and technologies have improved significantly in the last decade. Considerable efforts have been invested in understanding the main challenges of biomedical literature retrieval and extraction and proposing...
Background The overall goal of the BioCreative Workshops is to promote the development of text mining and text processing tools which are useful to the communities of researchers and database curators in the biological sciences. To this...
BACKGROUND: Determining usefulness of biomedical text mining systems requires realistic task definition and data selection criteria without artificial constraints, measuring performance aspects that go beyond traditional metrics. The...
Background Within the emerging field of text mining and statistical natural language processing (NLP) applied to biomedical articles, a broad variety of techniques have been developed during the past years. Nevertheless, there is still a...
An important research topic in Bioinformatics involves the exploration of vast amounts of biological and biomedical scientific literature (BioLiterature). Over the last few decades, text-mining systems have exploited this BioLiterature to...
Background: The selection of relevant articles for curation, and linking those articles to experimental techniques confirming the findings became one of the primary subjects of the recent BioCreative III contest. The contest's...
Background: The BioCreative challenge evaluation is a community-wide effort for evaluating text mining and information extraction systems applied to the biological domain. The biocurator community, as an active user of biomedical...