Automatically inducing ontologies from corpora
2002, Corpus
Abstract
The emergence of vast quantities of on-line information has raised the importance of methods for automatic cataloguing of information in a variety of domains, including electronic commerce and bioinformatics. Ontologies can play a critical role in such cataloguing. In this paper, we describe a system that automatically induces an ontology from any large on-line text collection in a specific domain. The ontology that is induced consists of domain concepts, related by kind-of and part-of links. To achieve domain-independence, ...
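The abstract describes the induced ontology only as domain concepts related by kind-of and part-of links. As a minimal sketch of such a structure, and not the authors' representation, the Python below stores each link type as a directed graph. The class name `Ontology`, its methods, and the tax-related example concepts are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# The two link types named in the abstract.
KIND_OF = "kind-of"
PART_OF = "part-of"


class Ontology:
    """Hypothetical container: concepts joined by typed, directed links."""

    def __init__(self):
        # links[relation][child] -> set of parent concepts
        self.links = {KIND_OF: defaultdict(set), PART_OF: defaultdict(set)}
        self.concepts = set()

    def add_link(self, child, relation, parent):
        """Record that `child` is a kind-of or part-of `parent`."""
        if relation not in self.links:
            raise ValueError(f"unknown relation: {relation}")
        self.concepts.update((child, parent))
        self.links[relation][child].add(parent)

    def ancestors(self, concept, relation):
        """Return every concept reachable from `concept` via `relation` links."""
        seen = set()
        stack = [concept]
        while stack:
            node = stack.pop()
            # .get avoids creating empty entries in the defaultdict
            for parent in self.links[relation].get(node, ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


# Made-up example concepts, not taken from the paper.
onto = Ontology()
onto.add_link("itemized deduction", KIND_OF, "deduction")
onto.add_link("deduction", KIND_OF, "tax concept")
onto.add_link("schedule A", PART_OF, "tax return")
```

Keeping the two relations in separate graphs means a kind-of query never follows part-of links, so the two hierarchies can be traversed independently.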
References (18)
- Abney, S. 1996. Partial Parsing via Finite-State Cascades. Proceedings of the ESSLLI '96 Robust Parsing Workshop.
- Caraballo, S. A. 1999. Automatic Construction of a hypernym-labeled noun hierarchy from text. In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics (ACL'1999), 120-122.
- Cohen, P. R., Chaudhri, V., Pease, A. and Schrag, R. 1999. Does Prior Knowledge Facilitate the Development of Knowledge-based Systems? The Sixteenth National Conference on Artificial Intelligence (AAAI-99).
- Craven, M. and Kumlien, J. 1999. Constructing biological knowledge bases by extracting information from text sources. Proc Int Conf Intell Syst Mol Biol., 77-86.
- Craven, M., DiPasquo, D., Freitag, D., McCallum, A., Mitchell, T., Nigam, K., and Slattery, S. 1998. Learning to Extract Symbolic Knowledge from the World Wide Web. Proceedings of AAAI-98, 509-516.
- Daude, J., Padro, L. and Rigau, G. 2001. A Complete WN1.5 to WN1.6 Mapping. NAACL-2001 Workshop on WordNet and Other Lexical Resources: Applications, Extension, and Customization, 83-88.
- Doan, A., Madhavan, J., Domingos, P. and Halevy, A. 2002. Learning to Map between Ontologies on the Semantic Web. WWW'2002.
- Dunning, T. 1993. Accurate Methods for the Statistics of Surprise and Coincidence. Computational Linguistics, 19(1):61-74.
- Girju, R., Badulescu, A., and Moldovan, D. 2003. Learning Semantic Constraints for the Automatic Discovery of Part-Whole Relations. Proceedings of HLT'2003, Edmonton.
- Grefenstette, G. 1997. Explorations in Automatic Thesaurus Discovery. Kluwer International Series in Engineering and Computer Science, Vol 278.
- Hearst, M. 1992. Automatic Acquisition of Hyponyms from Large Text Corpora. Proceedings of the Fourteenth International Conference on Computational Linguistics, Nantes, France, July 1992.
- Hull, R. and Gomez, F. 1993. Inferring Heuristic Classification Hierarchies from Natural Language Input. Telematics and Informatics, 9(3/4), pp. 265-281.
- IRS (Internal Revenue Service). 2001. Tax Guide 2001. Publication 17. http://www.irs.gov/pub/irs-pdf/p17.pdf
- Lawrie, D., Croft, W. B., and Rosenberg, A. 2001. Finding topic words for hierarchical summarization. 24th ACM Intl. Conf. on Research and Development in Information Retrieval, 349-357.
- Miller, G. 1995. WordNet: A Lexical Database for English. Communications of the Association for Computing Machinery (CACM) 38, 39-41.
- Sanderson, M. and Croft, B. 1995. Deriving concept hierarchies from text. Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 160-170.
- Sekine, S., Sudo, K. and Ogino, T. 1999. Statistical Matching of Two Ontologies. Proceedings of ACL SIGLEX99 Workshop: Standardizing Lexical Resources.
- Zhang, K., Wang, J. T. L. and Shasha, D. 1996. On the Editing Distance between Undirected Acyclic Graphs and Related Problems. International Journal of Foundations of Computer Science 7, 43-58.