URES : an Unsupervised Web Relation Extraction System

2006, ACL

https://doi.org/10.3115/1273073.1273159

Abstract

Most information extraction systems either use hand-written extraction patterns or use a machine learning algorithm that is trained on a manually annotated corpus. Both of these approaches require massive human effort and hence prevent information extraction from becoming more widely applicable. In this paper we present URES (Unsupervised Relation Extraction System), which extracts relations from the Web in a totally unsupervised way. It takes as input the descriptions of the target relations, which include the names of the predicates, the types of their attributes, and several seed instances of the relations. Then the system downloads from the Web a large collection of pages that are likely to contain instances of the target relations. From those pages, utilizing the known seed instances, the system learns the relation patterns, which are then used for extraction. We present several experiments in which we learn patterns and extract instances of a set of several common IE relations, comparing several pattern learning and filtering setups. We demonstrate that using a simple noun phrase tagger is sufficient as a base for accurate patterns. However, having a named entity recognizer that is able to recognize the types of the relation attributes significantly enhances the extraction performance. We also compare our approach with KnowItAll's fixed generic patterns.
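The seed-driven workflow described in the abstract can be illustrated with a toy sketch. This is not the URES algorithm: it assumes patterns are just the literal text between the two attributes, it uses a capitalized-word regular expression as a crude stand-in for a noun phrase tagger, it works on an in-memory list of sentences instead of downloaded Web pages, and it does no pattern filtering or scoring. All names (`RelationSpec`, `learn_patterns`, `extract`) and the example sentences are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Iterable, Set, Tuple


@dataclass(frozen=True)
class RelationSpec:
    """Hypothetical container for a target relation description."""
    predicate: str                         # name of the predicate
    attribute_types: Tuple[str, str]       # types of the two attributes
    seeds: Tuple[Tuple[str, str], ...]     # known seed instances


# Assumption: a run of capitalized words approximates a noun phrase.
NP = r"\b[A-Z]\w*(?: [A-Z]\w*)*"


def learn_patterns(spec: RelationSpec, sentences: Iterable[str]) -> Set[str]:
    """Collect the text between two seed attributes as a crude pattern."""
    patterns = set()
    for sentence in sentences:
        for first, second in spec.seeds:
            i = sentence.find(first)
            j = sentence.find(second)
            # Both attributes present, first one ends before second starts.
            if i != -1 and j != -1 and i + len(first) <= j:
                infix = sentence[i + len(first):j].strip()
                if infix:
                    patterns.add(infix)
    return patterns


def extract(patterns: Iterable[str], sentences: Iterable[str]) -> Set[Tuple[str, str]]:
    """Apply each learned infix between two noun-phrase slots."""
    sentences = list(sentences)
    found = set()
    for infix in patterns:
        regex = re.compile(rf"({NP}) {re.escape(infix)} ({NP})")
        for sentence in sentences:
            for match in regex.finditer(sentence):
                found.add((match.group(1), match.group(2)))
    return found


if __name__ == "__main__":
    # Made-up seed and sentences, for illustration only.
    spec = RelationSpec(
        predicate="Acquisition",
        attribute_types=("Company", "Company"),
        seeds=(("Oracle", "PeopleSoft"),),
    )
    corpus = [
        "Oracle has acquired PeopleSoft for $10 billion.",
        "Last year Adobe has acquired Macromedia.",
    ]
    patterns = learn_patterns(spec, corpus)
    print(patterns)
    print(extract(patterns, corpus))
```

In this sketch the single seed yields one infix pattern, which then also matches the second sentence and proposes a new candidate instance. A real system would additionally need to validate such candidates, for example by checking attribute types with a named entity recognizer, as the abstract suggests.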
