Harvesting Domain Specific Ontologies from Text
2014 IEEE International Conference on Semantic Computing
Abstract
Ontologies are a vital component of most knowledge-based applications, including semantic web search, intelligent information integration, and natural language processing. In particular, we need effective tools for generating in-depth ontologies that achieve comprehensive coverage of specific application domains of interest, while minimizing the time and cost of this process. Therefore, we cannot rely on the manual or highly supervised approaches often used in the past, since they do not scale well. We instead propose a new approach that automatically generates domain-specific ontologies from a small corpus of documents using deep NLP-based text mining. Starting from a small initial seed of domain concepts, our OntoHarvester system iteratively extracts ontological relations connecting existing concepts to other terms in the text, and adds strongly connected terms to the current ontology. As a result, OntoHarvester (i) remains focused on the application domain, (ii) is resistant to noise, and (iii) generates very comprehensive ontologies from modest-size document corpora. In fact, starting from a small seed, OntoHarvester produces ontologies that outperform both manually generated ontologies and ontologies generated by current techniques, even those that require very large, well-focused data sets.
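The iterative seed-expansion loop described in the abstract can be sketched in Python. This is a minimal sketch under stated assumptions, not OntoHarvester's actual method: the deep-NLP extraction step is assumed to have already produced (subject, relation, object) triples, and "strongly connected" is approximated by a hypothetical rule that a term joins the ontology once at least `min_links` distinct triples link it to concepts already inside. The function name `harvest`, the parameters `min_links` and `max_iters`, and the toy triples are illustrative only; two of the triples are the <…, part of, expression> examples that appeared on the original page.

```python
from collections import defaultdict


def harvest(triples, seed, min_links=2, max_iters=10):
    """Grow a set of ontology concepts outward from a seed set.

    Hypothetical rule (an assumption, not OntoHarvester's scoring): a term
    outside the ontology is added when at least `min_links` distinct triples
    connect it to concepts already inside. Returns the final concept set and
    the triples whose subject and object both belong to it.
    """
    ontology = set(seed)
    for _ in range(max_iters):
        # candidate term -> distinct triples linking it to the current ontology
        support = defaultdict(set)
        for s, r, o in triples:
            if s in ontology and o not in ontology:
                support[o].add((s, r, o))
            elif o in ontology and s not in ontology:
                support[s].add((s, r, o))
        new_terms = {t for t, links in support.items() if len(links) >= min_links}
        if not new_terms:
            break  # fixed point: no strongly connected candidates remain
        ontology |= new_terms
    relations = [(s, r, o) for s, r, o in triples if s in ontology and o in ontology]
    return ontology, relations


# Toy triples standing in for the output of an assumed NLP extraction step.
TRIPLES = [
    ("variables", "part of", "expression"),
    ("finite number of algebraic operations", "part of", "expression"),
    ("finite number of algebraic operations", "applied to", "variables"),
    ("constants", "part of", "expression"),
    ("constants", "type of", "term"),
    ("variables", "type of", "term"),
    ("term", "part of", "expression"),
    ("stock price", "related to", "variables"),  # off-domain noise, one link only
]
SEED = {"expression", "term"}

if __name__ == "__main__":
    concepts, relations = harvest(TRIPLES, SEED)
    print(sorted(concepts))
    for triple in relations:
        print(triple)
```

With `min_links=2`, the weakly linked term `stock price` never enters the ontology, a much simplified echo of the focus and noise resistance claimed above; lowering the threshold to 1 admits it. The loop stops at a fixed point rather than after a set number of rounds, so `max_iters` only acts as a safety cap.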