Webnox: web knowledge extraction

2008

Abstract

The paper describes and evaluates a system for extracting knowledge from the web that uses a domain-independent fact extraction approach and a self-supervised learning algorithm. Using a trust algorithm, the precision of the system is improved to over 70%, compared with a baseline of 52%.
