DIMACS at the TREC 2004 Genomics Track
Abstract
DIMACS participated in the text categorization and ad hoc retrieval tasks of the TREC 2004 Genomics track. For the categorization task, we tackled the triage and annotation hierarchy subtasks.

[...] and biology of the laboratory mouse. In particular, the Mouse Genome Database (MGD) contains information on the characteristics and functions of genes in the mouse, and on where this information appeared in the scientific literature. Human curators encode this information using controlled vocabulary terms from the Gene Ontology (GO), and provide citations to documents that report each piece of information. GO consists of three structured networks (Biological Process (BP), Molecular Function (MF), and Cellular Component (CC)) of terms describing attributes of genes and gene products.

The TREC 2004 Genomics track defined a categorization task with three subtasks based on simplified versions of this curation process. DIMACS participated in two of those subtasks, triage and annotation h...