

Mining concepts from Wikipedia for ontology construction

2009

Abstract

An ontology is a structured knowledge base of concepts organized by the relations among them. In the corpora used for knowledge extraction, however, concepts are usually mixed with their instances, and because concepts and their corresponding instances share similar features, the two are difficult to distinguish. This paper proposes a novel approach that comprehensively obtains concepts with the help of definition sentences and Category Labels in Wikipedia pages. N-gram statistics and other NLP knowledge are used to help extract appropriate concepts. The proposed method identified nearly 50,000 concepts from about 700,000 Wikipedia pages. Its precision of 78.5% makes it an effective approach for mining concepts from Wikipedia for ontology construction.
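The abstract names the ingredients (definition sentences, Category Labels, N-gram statistics) but not how they are combined. The Python sketch below shows one hypothetical way such signals could interact, and it is not the authors' method: the "is a/an" definition pattern, the stop-word list, the naive singularization, the `min_count` threshold and every function name are assumptions made for illustration. A definition phrase (or a suffix n-gram of it) is kept as a candidate concept only if the same n-gram also appears often as the head of category labels. A real system would use a POS tagger and proper lemmatization instead of these regex and suffix heuristics.

```python
import re
from collections import Counter

# Assumed words that end a category-label head or a definition phrase,
# e.g. "Cities in France" -> head "cities".
STOP_WORDS = {"in", "of", "by", "from", "for", "on", "at", "that", "which", "who"}

# Assumed definition pattern "X is a/an <phrase>", where the phrase ends at a
# stop word, at punctuation, or at the end of the sentence.
DEFINITION = re.compile(
    r"\bis (?:a|an) ([a-z][a-z ]*?)(?= (?:"
    + "|".join(sorted(STOP_WORDS))
    + r")\b|[.,;]|$)"
)


def singular(word):
    """Very naive English singularization (an assumption, not a lemmatizer)."""
    if word.endswith("ies"):
        return word[:-3] + "y"
    if word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def normalize(words):
    """Lowercase a phrase and singularize its last (head) word."""
    words = [w.lower() for w in words]
    if words:
        words[-1] = singular(words[-1])
    return words


def label_head(label):
    """Normalized words of a category label before the first stop word."""
    head = []
    for word in label.lower().split():
        if word in STOP_WORDS:
            break
        head.append(word)
    return normalize(head)


def head_ngrams(words, max_n=3):
    """N-grams (n <= max_n) ending at the head word: head plus modifiers."""
    return [" ".join(words[len(words) - n:])
            for n in range(1, min(max_n, len(words)) + 1)]


def definition_phrase(sentence):
    """Normalized phrase after 'is a/an' in a definition sentence, or None."""
    match = DEFINITION.search(sentence.lower())
    return normalize(match.group(1).split()) if match else None


def candidate_concepts(pages, min_count=2):
    """pages: list of (definition sentence, [category labels]) pairs.

    Keeps an n-gram of a definition phrase only if it also occurs as a
    category-label head n-gram at least min_count times over all pages.
    """
    label_counts = Counter()
    for _, labels in pages:
        for label in labels:
            label_counts.update(head_ngrams(label_head(label)))
    concepts = set()
    for sentence, _ in pages:
        phrase = definition_phrase(sentence)
        if phrase is None:
            continue
        for gram in head_ngrams(phrase):
            if label_counts[gram] >= min_count:  # Counter returns 0 if unseen
                concepts.add(gram)
    return concepts


# Toy input, made up for illustration (not data from the paper).
pages = [
    ("Paris is a city in France.", ["Cities in France", "Capitals in Europe"]),
    ("Lyon is a city in France.", ["Cities in France"]),
    ("The Seine is a river in France.", ["Rivers of France"]),
    ("The Rhone is a major river of Europe.", ["Rivers of Switzerland"]),
]
print(sorted(candidate_concepts(pages)))  # ['city', 'river']
```

Here "major river" is extracted from a definition sentence but rejected, because only its head n-gram "river" is backed by enough category labels; this is one simple way frequency statistics can separate general concept names from overly specific phrases.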
