

TXTGate: profiling gene groups with text-based information

2004, Genome Biology

https://doi.org/10.1186/GB-2004-5-6-R43

Abstract

We implemented a framework called TXTGate that combines literature indices of selected public biological resources in a flexible text-mining system designed for the analysis of groups of genes. Using tailored vocabularies, it offers both term-centric and gene-centric views on selected textual fields and MEDLINE abstracts used in LocusLink and the Saccharomyces Genome Database. Subclustering and links to external resources support in-depth analysis of the resulting term profiles.
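The abstract does not say how a group's term profile is computed. The sketch below assumes a vector-space model: each gene is represented by TF-IDF weights over a small controlled vocabulary, computed from text linked to that gene, and a group's profile is the average of its genes' L2-normalized vectors. The vocabulary, the gene identifiers (`GENE_A` and so on), the example texts, and all function names are hypothetical placeholders, and naive substring matching stands in for real tokenization and stemming. It illustrates the general idea only, not TXTGate's actual implementation.

```python
import math
from collections import Counter, defaultdict

# A sparse term-weight vector: vocabulary term -> weight.
Vector = dict[str, float]


def term_counts(text: str, vocabulary: list[str]) -> Counter:
    """Count occurrences of each vocabulary term in a gene's text (naive substring match)."""
    lowered = text.lower()
    return Counter({term: lowered.count(term) for term in vocabulary if term in lowered})


def tfidf_vectors(gene_texts: dict[str, str], vocabulary: list[str]) -> dict[str, Vector]:
    """Build one L2-normalized TF-IDF vector per gene, restricted to the vocabulary."""
    counts = {gene: term_counts(text, vocabulary) for gene, text in gene_texts.items()}
    n_genes = len(gene_texts)
    # Document frequency: number of genes whose text mentions the term at least once.
    df = Counter(term for c in counts.values() for term in c)
    vectors: dict[str, Vector] = {}
    for gene, c in counts.items():
        weights = {t: tf * math.log(n_genes / df[t]) for t, tf in c.items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        # Genes with no informative terms get an empty vector instead of dividing by zero.
        vectors[gene] = {t: w / norm for t, w in weights.items()} if norm > 0 else {}
    return vectors


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two L2-normalized sparse vectors (dot product over shared terms)."""
    return sum(w * v[t] for t, w in u.items() if t in v)


def group_profile(vectors: dict[str, Vector], group: list[str], top_k: int = 3) -> list[tuple[str, float]]:
    """Average the group's gene vectors and return the top_k terms (ties broken alphabetically)."""
    totals: defaultdict[str, float] = defaultdict(float)
    for gene in group:
        for term, weight in vectors[gene].items():
            totals[term] += weight / len(group)
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]


# Hypothetical vocabulary and gene texts, for illustration only.
vocabulary = ["cell cycle", "mitosis", "dna replication", "kinase", "transcription"]
gene_texts = {
    "GENE_A": "Kinase activity is required for mitosis and cell cycle progression.",
    "GENE_B": "This cell cycle regulator controls mitosis.",
    "GENE_C": "A transcription factor with a role in DNA replication.",
    "GENE_D": "Transcription of ribosomal genes.",
}

vectors = tfidf_vectors(gene_texts, vocabulary)
profile = group_profile(vectors, ["GENE_A", "GENE_B"])
print(profile)  # terms ranked: cell cycle, mitosis, kinase
```

Ranking terms by their averaged weight yields a short, readable profile for the group. The same normalized vectors could also drive a subclustering step, for example by grouping genes whose pairwise `cosine` similarity exceeds a chosen threshold; that is one possible design, not necessarily the one TXTGate uses.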
