eGIFT: Mining Gene Information from the Literature
2010, BMC Bioinformatics
https://doi.org/10.1186/1471-2105-11-418
Abstract
Background: With the biomedical literature continually expanding, searching PubMed for information about specific genes becomes increasingly difficult. Not only can thousands of results be returned, but gene name ambiguity leads to many irrelevant hits. As a result, it is difficult for life scientists and gene curators to rapidly get an overall picture about a specific gene from documents that mention its names and synonyms.
Cite this article as: Tudor et al.: eGIFT: Mining Gene Information from the Literature. BMC Bioinformatics 2010, 11:418.