Overview of the BioCreative III workshop

2011

Abstract

Background: The overall goal of the BioCreative Workshops is to promote the development of text mining and text processing tools that are useful to researchers and database curators in the biological sciences. To this end, BioCreative I was held in 2004, BioCreative II in 2007, and BioCreative II.5 in 2009. Each of these workshops involved human-annotated test data for several basic tasks in text mining applied to the biomedical literature. Participants were invited to compete by constructing software systems to perform the tasks automatically and were given scores based on their performance. The results of these workshops have benefited the community in several ways. They have 1) provided evidence for the most effective methods currently available to solve specific problems; 2) revealed the current state of the art for performance on those problems; and 3) provided gold standard data and results on that data by which future advances can be gauged. This special issue contains overview papers for the three tasks of BioCreative III.

Results: The BioCreative III Workshop was held in September 2010 and continued the tradition of a challenge evaluation on several tasks judged basic to effective text mining in biology, including a gene normalization (GN) task and two protein-protein interaction (PPI) tasks. In total, the Workshop involved the work of twenty-three teams. Thirteen teams participated in the GN task, which required the assignment of EntrezGene IDs to all named genes in full text papers without any species information being provided to a system. Ten teams participated in the PPI article classification task (ACT), which required a system to classify and rank a PubMed® record as belonging to an article either having or not having "PPI relevant" information. Eight teams participated in the PPI interaction method task (IMT), in which systems were given full text documents and were required to extract the experimental methods used to establish PPIs and a text segment supporting each such method. Gold standard data was compiled for each of these tasks, and participants competed in developing systems to perform the tasks automatically. BioCreative III also introduced a new interactive task (IAT), run as a demonstration task. The goal was to develop an interactive system to facilitate a user's annotation of the unique database identifiers for all the genes appearing in an article. This task included ranking genes by importance (based preferably on the amount of described experimental information regarding genes). There was also an optional task to assist the user in finding the most relevant articles about a given gene. For BioCreative III, a user advisory group (UAG) was assembled and played an important role 1) in producing some of the gold standard annotations for the GN task, 2) in critiquing IAT systems, and 3) in providing guidance for a future, more rigorous evaluation of IAT systems. Six teams participated in the IAT demonstration task and received feedback on their systems from the UAG. Besides innovations in the GN and PPI tasks that made them more realistic and practical, and the introduction of the IAT, discussions were begun on community data standards to promote interoperability and on user requirements and evaluation metrics to address the utility and usability of systems.

Cite this article as: Arighi et al.: Overview of the BioCreative III Workshop. BMC Bioinformatics 2011, 12(Suppl 8):S1. doi:10.1186/1471-2105-12-S8-S1