Use and misuse of the gene ontology annotations

2008, Nature Reviews Genetics

https://doi.org/10.1038/NRG2363

Abstract

The accumulation of data produced by genome-scale research requires explicitly defined vocabularies to describe the biological attributes of genes in order to allow integration, retrieval and computation of the data [1]. Arguably, the most successful example of systematic description of biology is the Gene Ontology (GO) project [2]. GO is widely used in biological databases, annotation projects and computational analyses (there are 2,960 citations for GO in version 3.0 of the ISI Web of Knowledge) for annotating newly sequenced genomes [3], text mining [4,5], network modelling [6] and clinical applications [7], among others. GO has two components: the ontologies themselves, which are the defined terms and the structured relationships between them (GO ontology); and the associations between gene products and the terms (GO annotations). GO provides both ontologies and annotations for three distinct areas of cell biology: molecular function, biological process, and cellular component or location.
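To make the two components concrete, a minimal Python sketch follows. It assumes a simplified model in which each term belongs to one of the three namespaces and lists its parent term IDs, with all relationship types collapsed into a single "parent" link, and it assumes annotations propagate along every such link. The class names, methods, term IDs ("T:…") and gene name are hypothetical illustrations, not the GO file formats or the API of any GO library.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# The three areas of cell biology covered by GO, as named in the text.
NAMESPACES = {"molecular_function", "biological_process", "cellular_component"}


@dataclass
class Term:
    """One ontology term (hypothetical simplified model)."""
    term_id: str
    name: str
    namespace: str
    parents: set[str] = field(default_factory=set)  # IDs of directly related parent terms

    def __post_init__(self) -> None:
        if self.namespace not in NAMESPACES:
            raise ValueError(f"unknown namespace: {self.namespace}")


@dataclass
class GeneOntology:
    terms: dict[str, Term] = field(default_factory=dict)            # component 1: the ontology
    annotations: dict[str, set[str]] = field(default_factory=dict)  # component 2: gene product -> term IDs

    def add_term(self, term: Term) -> None:
        self.terms[term.term_id] = term

    def annotate(self, gene_product: str, term_id: str) -> None:
        if term_id not in self.terms:
            raise KeyError(f"annotation to unknown term: {term_id}")
        self.annotations.setdefault(gene_product, set()).add(term_id)

    def ancestors(self, term_id: str) -> set[str]:
        """All terms reachable by following parent links (a term may have several parents)."""
        seen: set[str] = set()
        stack = list(self.terms[term_id].parents)
        while stack:
            current = stack.pop()
            if current not in seen:  # shared ancestors are visited only once
                seen.add(current)
                stack.extend(self.terms[current].parents)
        return seen

    def implied_terms(self, gene_product: str) -> set[str]:
        """Direct annotations plus every ancestor of each annotated term."""
        direct = self.annotations.get(gene_product, set())
        implied = set(direct)
        for term_id in direct:
            implied |= self.ancestors(term_id)
        return implied


# Usage with made-up terms: T:c has two parents, both under a common root.
go = GeneOntology()
go.add_term(Term("T:root", "hypothetical process root", "biological_process"))
go.add_term(Term("T:a", "hypothetical process A", "biological_process", {"T:root"}))
go.add_term(Term("T:b", "hypothetical process B", "biological_process", {"T:root"}))
go.add_term(Term("T:c", "hypothetical process C", "biological_process", {"T:a", "T:b"}))
go.annotate("geneX", "T:c")
print(sorted(go.ancestors("T:c")))  # ['T:a', 'T:b', 'T:root']
```

Because a term can have more than one parent, the ontology is a directed acyclic graph rather than a tree, so the traversal records visited terms to avoid reaching a shared ancestor such as T:root twice.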

References

  1. Bard, J. B. & Rhee, S. Y. Ontologies in biology: design, applications and future challenges. Nature Rev. Genet. 5, 213-222 (2004). This paper provides a more detailed overview of types and uses of ontologies in biology, with an emphasis on GO.
  2. Ashburner, M. et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nature Genet. 25, 25-29 (2000). This paper includes more details about the Gene Ontology.
  3. Adams, M. D. et al. The genome sequence of Drosophila melanogaster. Science 287, 2185-2195 (2000).
  4. Hirschman, L., Yeh, A., Blaschke, C. & Valencia, A. Overview of BioCreAtIvE: critical assessment of information extraction for biology. BMC Bioinformatics 6 (Suppl. 1), S1 (2005).
  5. Camon, E. B. et al. An evaluation of GO annotation retrieval for BioCreAtIvE and GOA. BMC Bioinformatics 6 (Suppl. 1), S17 (2005).
  6. Liu, M. et al. Network-based analysis of affected biological processes in type 2 diabetes models. PLoS Genet. 3, e96 (2007).
  7. Dressman, H. K. et al. Gene expression signatures that predict radiation exposure in mice and humans. PLoS Med. 4, e106 (2007).
  8. The Gene Ontology Consortium. Creating the gene ontology resource: design and implementation. Genome Res. 11, 1425-1433 (2001). This paper describes in more detail how the GO ontology is built and maintained.
  9. Camon, E., Barrell, D., Lee, V., Dimmer, E. & Apweiler, R. The Gene Ontology Annotation (GOA) Database - an integrated resource of GO annotations to the UniProt Knowledgebase. In Silico Biol. 4, 5-6 (2004).
  10. Cai, S. & Lashbrook, C. C. Stamen abscission zone transcriptome profiling reveals new candidates for abscission control: enhanced retention of floral organs in transgenic plants overexpressing Arabidopsis zinc finger protein 2. Plant Physiol. 146, 1305-1321 (2008).
  11. Datu, B. J. et al. Transcriptional changes in the hookworm, Ancylostoma caninum, during the transition from a free-living to a parasitic larva. PLoS Negl. Trop. Dis. 2, e130 (2008).
  12. Faustino, R. S., Behfar, A., Perez-Terzic, C. & Terzic, A. Genomic chart guiding embryonic stem cell cardiopoiesis. Genome Biol. 9, R6 (2008).
  13. Ginos, M. A. et al. Identification of a gene expression signature associated with recurrent disease in squamous cell carcinoma of the head and neck. Cancer Res. 64, 55-63 (2004).
  14. Li, Y. & Sarkar, F. H. Gene expression profiles of genistein-treated PC3 prostate cancer cells. J. Nutr. 132, 3623-3631 (2002).
  15. Okada, H. et al. Genome-wide expression of azoospermia testes demonstrates a specific profile and implicates ART3 in genetic susceptibility. PLoS Genet. 4, e26 (2008).
  16. Uddin, M. et al. Sister grouping of chimpanzees and humans as revealed by genome-wide phylogenetic analysis of brain gene expression profiles. Proc. Natl Acad. Sci. USA 101, 2957-2962 (2004).
  17. van der Pouw Kraan, T. C. et al. Expression of a pathogen-response program in peripheral blood cells defines a subgroup of rheumatoid arthritis patients. Genes Immun. 9, 16-22 (2008).
  18. Zhang, X. et al. Whole-genome analysis of histone H3 lysine 27 trimethylation in Arabidopsis. PLoS Biol. 5, e129 (2007).
  19. Draghici, S., Khatri, P., Martins, R. P., Ostermeier, G. C. & Krawetz, S. A. Global functional profiling of gene expression. Genomics 81, 98-104 (2003). This paper describes how the significance of enriched or depleted terms is calculated using a number of alternative models in GO profiling.
  20. Man, M. Z., Wang, X. & Wang, Y. POWER_SAGE: comparing statistical tests for SAGE experiments. Bioinformatics 16, 953-959 (2000).
  21. Alexa, A., Rahnenfuhrer, J. & Lengauer, T. Improved scoring of functional groups from gene expression data by decorrelating GO graph structure. Bioinformatics 22, 1600-1607 (2006). This paper explains some of the problems related to the structure of GO and proposes an approach that can be used to address them.
  22. Grossmann, S., Bauer, S., Robinson, P. N. & Vingron, M. Improved detection of overrepresentation of Gene Ontology annotations with parent child analysis. Bioinformatics 23, 3024-3031 (2007).
  23. Schlicker, A., Rahnenfuhrer, J., Albrecht, M., Lengauer, T. & Domingues, F. S. GOTax: investigating biological processes and biochemical activities along the taxonomic tree. Genome Biol. 8, R33 (2007).
  24. McCarthy, F. M., Bridges, S. M. & Burgess, S. C. GOing from functional genomics to biological significance. Cytogenet. Genome Res. 117, 278-287 (2007).
  25. Khatri, P. & Draghici, S. Ontological analysis of gene expression data: current tools, limitations, and open problems. Bioinformatics 21, 3587-3595 (2005). This includes a detailed comparison of 14 functional profiling tools using a number of different criteria, including scope of the analysis, visualization capabilities, statistical model(s) used, correction for multiple comparisons, reference microarrays available, installation issues and sources of annotation data.
  26. Holm, S. A simple sequentially rejective multiple test procedure. Scand. J. Stat. 6, 65-70 (1979).
  27. Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. (Ser. B) 57, 289-300 (1995).
  28. Draghici, S. Data Analysis Tools for DNA Microarrays (Chapman & Hall/CRC, Boca Raton, Florida, 2003).
  29. Farcomeni, A. A review of modern multiple hypothesis testing, with particular attention to the false discovery proportion. Stat. Methods Med. Res. 14 Aug 2007 (doi:10.1177/0962280206079046).
  30. Marcotte, E. & Date, S. Exploiting big biology: integrating large-scale biological data for function inference. Brief. Bioinform. 2, 363-374 (2001).
  31. Markowetz, F. & Troyanskaya, O. G. Computational identification of cellular networks and pathways. Mol. Biosyst. 3, 478-482 (2007).
  32. Srinivasan, B. S. et al. Current progress in network research: toward reference networks for key model organisms. Brief. Bioinform. 8, 318-332 (2007).
  33. Khatri, P., Done, B., Rao, A., Done, A. & Draghici, S. A semantic analysis of the annotations of the human genome. Bioinformatics 21, 3416-3421 (2005).
  34. Wong, S. L., Zhang, L. V. & Roth, F. P. Discovering functional relationships: biochemistry versus genetics. Trends Genet. 21, 424-427 (2005).
  35. Myers, C. L., Barrett, D. R., Hibbs, M. A., Huttenhower, C. & Troyanskaya, O. G. Finding function: evaluation methods for functional genomic data. BMC Genomics 7, 187 (2006).
  36. Yu, J. et al. A draft sequence of the rice genome (Oryza sativa L. ssp. indica). Science 296, 79-92 (2002).
  37. Kawai, J. et al. Functional annotation of a full-length mouse cDNA collection. Nature 409, 685-690 (2001).
  38. Whitfield, C. W. et al. Annotated expressed sequence tags and cDNA microarrays for studies of brain and behavior in the honey bee. Genome Res. 12, 555-566 (2002).
  39. Perrin, R. M. et al. Transcriptional regulation of chemical diversity in Aspergillus fumigatus by LaeA. PLoS Pathog. 3, e50 (2007).
  40. Qin, X., Ahn, S., Speed, T. P. & Rubin, G. M. Global analyses of mRNA translational control during early Drosophila embryogenesis. Genome Biol. 8, R63 (2007).
  41. Bender, M. A., Farach-Colton, M., Pemmasani, G., Skiena, S. & Sumazin, P. Lowest common ancestors in trees and directed acyclic graphs. J. Algorithms 57, 75-94 (2005).

Further information

  Seung Yon Rhee's homepage: http://carnegiedpb.stanford.edu/research/research_rhee.php
  Sorin Draghici's homepage: http://vortex.cs.wayne.edu
  An Introduction to the Gene Ontology: http://www.geneontology.org/GO.doc.shtml#term-term-relationships
  Gene Ontology (GO) project: http://www.geneontology.org
  GO annotation conventions: http://www.geneontology.org/GO.annotation.conventions.shtml#qual
  GO annotation project at the European Bioinformatics Institute (GOA): http://www.ebi.ac.uk/GOA
  GO downloads: http://www.geneontology.org/GO.downloads.shtml
  GO Slim and Subset Guide: http://www.geneontology.org/GO.slims.shtml?all
  InterPro database: http://www.ebi.ac.uk/interpro
  ISI Web of Knowledge: http://apps.isiknowledge.com
  Map2slim: http://search.cpan.org/~cmungall/go-perl/scripts/map2slim
  Princeton University's GO Term Mapper: http://go.princeton.edu/cgi-bin/GOTermMapper/GOTermMapper
  Reference genome annotation project at GO: http://www.geneontology.org/GO.refgenome.shtml