Use and misuse of the gene ontology annotations
2008, Nature Reviews Genetics
https://doi.org/10.1038/NRG2363
Abstract
The accumulation of data produced by genome-scale research requires explicitly defined vocabularies to describe the biological attributes of genes in order to allow integration, retrieval and computation of the data 1. Arguably, the most successful example of systematic description of biology is the Gene Ontology (GO) project 2. GO is widely used in biological databases, annotation projects and computational analyses (there are 2,960 citations for GO in version 3.0 of the ISI Web of Knowledge) for annotating newly sequenced genomes 3, text mining 4,5, network modelling 6 and clinical applications 7, among others. GO has two components: the ontologies themselves, which are the defined terms and the structured relationships between them (GO ontology); and the associations between gene products and the terms (GO annotations). GO provides both ontologies and annotations for three distinct areas of cell biology: molecular function, biological process, and cellular component or location.
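The two components described above can be pictured as two separate data structures. The following is a minimal Python sketch, not the GO Consortium's own data model: the `Term` class, the `terms_by_aspect` helper and the gene product names `geneA` and `geneB` are hypothetical, and the parent links are deliberately simplified so that each term points straight to the root of its ontology.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    """One ontology term: an identifier, a name, an aspect, and its parent terms."""
    go_id: str
    name: str
    aspect: str          # one of the three areas covered by GO
    parents: tuple = ()  # simplified: real GO paths have intermediate terms


# Component 1: the ontology (terms plus relationships between them).
ONTOLOGY = {
    "GO:0003674": Term("GO:0003674", "molecular_function", "molecular_function"),
    "GO:0005575": Term("GO:0005575", "cellular_component", "cellular_component"),
    "GO:0003677": Term("GO:0003677", "DNA binding", "molecular_function",
                       parents=("GO:0003674",)),
    "GO:0005634": Term("GO:0005634", "nucleus", "cellular_component",
                       parents=("GO:0005575",)),
}

# Component 2: the annotations (gene product -> term associations).
# The gene product names are made up for illustration.
ANNOTATIONS = [
    ("geneA", "GO:0003677"),
    ("geneA", "GO:0005634"),
    ("geneB", "GO:0005634"),
]


def terms_by_aspect(gene, annotations, ontology):
    """Group the names of the terms annotated to `gene` by ontology aspect."""
    grouped = defaultdict(list)
    for annotated_gene, go_id in annotations:
        if annotated_gene == gene:
            term = ontology[go_id]
            grouped[term.aspect].append(term.name)
    return dict(grouped)


if __name__ == "__main__":
    print(terms_by_aspect("geneA", ANNOTATIONS, ONTOLOGY))
```

Keeping the ontology and the annotations in separate structures mirrors the distinction drawn in the abstract: the terms and their relationships can be revised without touching the gene product associations, and vice versa.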
References (44)
- Bard, J. B. & Rhee, S. Y. Ontologies in biology: design, applications and future challenges. Nature Rev. Genet. 5, 213-222 (2004). This paper provides a more detailed overview of types and uses of ontologies in biology, with an emphasis on GO.
- Ashburner, M. et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nature Genet. 25, 25-29 (2000). This paper includes more details about the Gene Ontology.
- Adams, M. D. et al. The genome sequence of Drosophila melanogaster. Science 287, 2185-2195 (2000).
- Hirschman, L., Yeh, A., Blaschke, C. & Valencia, A. Overview of BioCreAtIvE: critical assessment of information extraction for biology. BMC Bioinformatics 6 (Suppl. 1), S1 (2005).
- Camon, E. B. et al. An evaluation of GO annotation retrieval for BioCreAtIvE and GOA. BMC Bioinformatics 6 (Suppl. 1), S17 (2005).
- Liu, M. et al. Network-based analysis of affected biological processes in type 2 diabetes models. PLoS Genet. 3, e96 (2007).
- Dressman, H. K. et al. Gene expression signatures that predict radiation exposure in mice and humans. PLoS Med. 4, e106 (2007).
- The Gene Ontology Consortium. Creating the gene ontology resource: design and implementation. Genome Res. 11, 1425-1433 (2001). This paper describes in more detail how the GO ontology is built and maintained.
- Camon, E., Barrell, D., Lee, V., Dimmer, E. & Apweiler, R. The Gene Ontology Annotation (GOA) Database: an integrated resource of GO annotations to the UniProt Knowledgebase. In Silico Biol. 4, 5-6 (2004).
- Cai, S. & Lashbrook, C. C. Stamen abscission zone transcriptome profiling reveals new candidates for abscission control: enhanced retention of floral organs in transgenic plants overexpressing Arabidopsis zinc finger protein 2. Plant Physiol. 146, 1305-1321 (2008).
- Datu, B. J. et al. Transcriptional changes in the hookworm, Ancylostoma caninum, during the transition from a free-living to a parasitic larva. PLoS Negl. Trop. Dis. 2, e130 (2008).
- Faustino, R. S., Behfar, A., Perez-Terzic, C. & Terzic, A. Genomic chart guiding embryonic stem cell cardiopoiesis. Genome Biol. 9, R6 (2008).
- Ginos, M. A. et al. Identification of a gene expression signature associated with recurrent disease in squamous cell carcinoma of the head and neck. Cancer Res. 64, 55-63 (2004).
- Li, Y. & Sarkar, F. H. Gene expression profiles of genistein-treated PC3 prostate cancer cells. J. Nutr. 132, 3623-3631 (2002).
- Okada, H. et al. Genome-wide expression of azoospermia testes demonstrates a specific profile and implicates ART3 in genetic susceptibility. PLoS Genet. 4, e26 (2008).
- Uddin, M. et al. Sister grouping of chimpanzees and humans as revealed by genome-wide phylogenetic analysis of brain gene expression profiles. Proc. Natl Acad. Sci. USA 101, 2957-2962 (2004).
- van der Pouw Kraan, T. C. et al. Expression of a pathogen-response program in peripheral blood cells defines a subgroup of rheumatoid arthritis patients. Genes Immun. 9, 16-22 (2008).
- Zhang, X. et al. Whole-genome analysis of histone H3 lysine 27 trimethylation in Arabidopsis. PLoS Biol. 5, e129 (2007).
- Draghici, S., Khatri, P., Martins, R. P., Ostermeier, G. C. & Krawetz, S. A. Global functional profiling of gene expression. Genomics 81, 98-104 (2003). This paper describes how the significance of enriched or depleted terms is calculated using a number of alternative models in GO profiling.
- Man, M. Z., Wang, X. & Wang, Y. POWER_SAGE: comparing statistical tests for SAGE experiments. Bioinformatics 16, 953-959 (2000).
- Alexa, A., Rahnenfuhrer, J. & Lengauer, T. Improved scoring of functional groups from gene expression data by decorrelating GO graph structure. Bioinformatics 22, 1600-1607 (2006). This paper explains some of the problems related to the structure of GO and proposes an approach that can be used to address them.
- Grossmann, S., Bauer, S., Robinson, P. N. & Vingron, M. Improved detection of overrepresentation of Gene Ontology annotations with parent child analysis. Bioinformatics 23, 3024-3031 (2007).
- Schlicker, A., Rahnenfuhrer, J., Albrecht, M., Lengauer, T. & Domingues, F. S. GOTax: investigating biological processes and biochemical activities along the taxonomic tree. Genome Biol. 8, R33 (2007).
- McCarthy, F. M., Bridges, S. M. & Burgess, S. C. GOing from functional genomics to biological significance. Cytogenet. Genome Res. 117, 278-287 (2007).
- Khatri, P. & Draghici, S. Ontological analysis of gene expression data: current tools, limitations, and open problems. Bioinformatics 21, 3587-3595 (2005). This includes a detailed comparison of 14 functional profiling tools using a number of different criteria, including scope of the analysis, visualization capabilities, statistical model(s) used, correction for multiple comparisons, reference microarrays available, installation issues and sources of annotation data.
- Holm, S. A simple sequentially rejective multiple test procedure. Scand. J. Stat. 6, 65-70 (1979).
- Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. (Ser. B) 57, 289-300 (1995).
- Draghici, S. Data Analysis Tools for DNA Microarrays (Chapman & Hall/CRC, Boca Raton, Florida, 2003).
- Farcomeni, A. A review of modern multiple hypothesis testing, with particular attention to the false discovery proportion. Stat. Methods Med. Res. 14 Aug 2007 (doi:10.1177/0962280206079046).
- Marcotte, E. & Date, S. Exploiting big biology: integrating large-scale biological data for function inference. Brief. Bioinform. 2, 363-374 (2001).
- Markowetz, F. & Troyanskaya, O. G. Computational identification of cellular networks and pathways. Mol. Biosyst. 3, 478-482 (2007).
- Srinivasan, B. S. et al. Current progress in network research: toward reference networks for key model organisms. Brief. Bioinform. 8, 318-332 (2007).
- Khatri, P., Done, B., Rao, A., Done, A. & Draghici, S. A semantic analysis of the annotations of the human genome. Bioinformatics 21, 3416-3421 (2005).
- Wong, S. L., Zhang, L. V. & Roth, F. P. Discovering functional relationships: biochemistry versus genetics. Trends Genet. 21, 424-427 (2005).
- Myers, C. L., Barrett, D. R., Hibbs, M. A., Huttenhower, C. & Troyanskaya, O. G. Finding function: evaluation methods for functional genomic data. BMC Genomics 7, 187 (2006).
- Yu, J. et al. A draft sequence of the rice genome (Oryza sativa L. ssp. indica). Science 296, 79-92 (2002).
- Kawai, J. et al. Functional annotation of a full-length mouse cDNA collection. Nature 409, 685-690 (2001).
- Whitfield, C. W. et al. Annotated expressed sequence tags and cDNA microarrays for studies of brain and behavior in the honey bee. Genome Res. 12, 555-566 (2002).
- Perrin, R. M. et al. Transcriptional regulation of chemical diversity in Aspergillus fumigatus by LaeA. PLoS Pathog. 3, e50 (2007).
- Qin, X., Ahn, S., Speed, T. P. & Rubin, G. M. Global analyses of mRNA translational control during early Drosophila embryogenesis. Genome Biol. 8, R63 (2007).
- Bender, M. A., Farach-Colton, M., Pemmasani, G., Skiena, S. & Sumazin, P. Lowest common ancestors in trees and directed acyclic graphs. J. Algorithms 57, 75-94 (2005).
- Seung Yon Rhee's homepage: http://carnegiedpb.stanford.edu/research/research_rhee.php
- Sorin Draghici's homepage: http://vortex.cs.wayne.edu
- An Introduction to the Gene Ontology: http://www.geneontology.org/GO.doc.shtml#term-term-relationships
- Gene Ontology (GO) project: http://www.geneontology.org
- GO annotation conventions: http://www.geneontology.org/GO.annotation.conventions.shtml#qual
- GO annotation project at the European Bioinformatics Institute (GOA): http://www.ebi.ac.uk/GOA
- GO downloads: http://www.geneontology.org/GO.downloads.shtml
- GO Slim and Subset Guide: http://www.geneontology.org/GO.slims.shtml?all
- Interpro database: http://www.ebi.ac.uk/interpro
- ISI Web of Knowledge: http://apps.isiknowledge.com
- Map2slim: http://search.cpan.org/~cmungall/go-perl/scripts/map2slim
- Princeton University's GO Term Mapper: http://go.princeton.edu/cgi-bin/GOTermMapper/GOTermMapper
- Reference genome annotation project at GO: http://www.geneontology.org/GO.refgenome.shtml