TXTGate: profiling gene groups with text-based information
Genome Biology 2004, 5:R43
https://doi.org/10.1186/GB-2004-5-6-R43

Abstract
We implemented a framework called TXTGate that combines literature indices of selected public biological resources in a flexible text-mining system designed for the analysis of groups of genes. By means of tailored vocabularies, it offers term-centric as well as gene-centric views on selected textual fields and on the MEDLINE abstracts used in LocusLink and the Saccharomyces Genome Database. Subclustering and links to external resources allow in-depth analysis of the resulting term profiles.
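The abstract does not spell out how a term profile is scored. The sketch below assumes a vector-space model with TF-IDF weights restricted to a controlled vocabulary, a common choice in text-based gene clustering but not necessarily TXTGate's exact weighting. Each gene's linked abstracts are pooled into one weighted term vector (a gene-centric view), a group's profile is the average of its members' vectors, and cosine similarity between gene vectors is the kind of measure a subclustering step could use. All names (`gene_vectors`, `group_profile`, `cosine`, `VOCABULARY`, `GENE_DOCS`) and the toy texts are hypothetical illustrations, not real MEDLINE abstracts or TXTGate code.

```python
import math
import re
from collections import Counter

# Hypothetical controlled vocabulary: only these terms are indexed
# (a stand-in for the tailored vocabularies mentioned in the abstract).
VOCABULARY = {"cell", "cycle", "mitosis", "kinase", "glucose", "metabolism"}

# Hypothetical toy texts standing in for the abstracts linked to each gene.
GENE_DOCS = {
    "CLN2": ["G1 cyclin required for cell cycle entry; activates a kinase at START."],
    "CDC28": ["Cyclin-dependent kinase that drives the cell cycle and mitosis."],
    "HXK2": ["Hexokinase involved in glucose metabolism."],
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def gene_vectors(gene_docs, vocabulary):
    """Build one unit-length TF-IDF vector (term -> weight) per gene.

    All abstracts linked to a gene are pooled into a single bag of words,
    and only vocabulary terms are counted (assumed weighting scheme).
    """
    tf = {
        gene: Counter(t for doc in docs for t in tokenize(doc) if t in vocabulary)
        for gene, docs in gene_docs.items()
    }
    n_genes = len(gene_docs)
    # Document frequency: in how many genes' pooled texts each term occurs.
    df = Counter(term for counts in tf.values() for term in counts)
    vectors = {}
    for gene, counts in tf.items():
        vec = {t: c * math.log(n_genes / df[t]) for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors[gene] = {t: w / norm for t, w in vec.items()} if norm > 0 else {}
    return vectors


def cosine(u, v):
    """Cosine similarity of two unit-length sparse vectors (their dot product)."""
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def group_profile(vectors, genes, top_k=5):
    """Average the vectors of a gene group and return its top-weighted terms."""
    summed = {}
    for gene in genes:
        for term, weight in vectors[gene].items():
            summed[term] = summed.get(term, 0.0) + weight
    profile = {t: w / len(genes) for t, w in summed.items() if w > 0}
    # Sort by descending weight, breaking ties alphabetically.
    return sorted(profile.items(), key=lambda tw: (-tw[1], tw[0]))[:top_k]


if __name__ == "__main__":
    vecs = gene_vectors(GENE_DOCS, VOCABULARY)
    print(group_profile(vecs, ["CLN2", "CDC28"], top_k=3))
    print(round(cosine(vecs["CLN2"], vecs["CDC28"]), 3))
```

In this toy example CLN2 and CDC28 share cell-cycle terms and get a positive cosine similarity, while HXK2 shares no indexed terms with either and scores zero, which is the kind of separation a subclustering step on term profiles would exploit.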