An Ontology-Based Focused Crawler

2008, Lecture Notes in Computer Science

https://doi.org/10.1007/978-3-540-69858-6_48

Abstract

In this paper we present a novel approach to building a focused crawler. The goal of our crawler is to identify web pages that relate to a set of predefined topics and to download them regardless of their web topology or their connectivity with other popular pages on the web. Our study addresses two main challenges. First, we need to identify a page's topical content before the page is fully downloaded and processed. Second, we need to obtain a well-balanced set of training examples that the crawler will regularly consult in its subsequent web visits.
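The first challenge, estimating a page's topic before downloading it, can be illustrated with a small sketch. The abstract does not say how the crawler scores pages, so the code below substitutes a simple stand-in: it rates each outgoing link by how many topic terms appear in its anchor text and keeps unvisited URLs in a priority queue. The names `link_score`, `Frontier`, and `rank_links`, the topic terms, and the example.org URLs are all assumptions made for illustration, not part of the paper.

```python
import heapq
import re
from typing import Iterable, List, Set, Tuple


def tokenize(text: str) -> Set[str]:
    """Lowercase a string and return its set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def link_score(anchor_text: str, topic_terms: Set[str]) -> float:
    """Fraction of topic terms found in the anchor text (assumed heuristic)."""
    if not topic_terms:
        return 0.0
    return len(tokenize(anchor_text) & topic_terms) / len(topic_terms)


class Frontier:
    """Priority queue of unvisited URLs, highest estimated relevance first."""

    def __init__(self) -> None:
        self._heap: List[Tuple[float, int, str]] = []
        self._seen: Set[str] = set()
        self._counter = 0  # tie-breaker that preserves insertion order

    def push(self, url: str, score: float) -> None:
        """Queue a URL once; later pushes of the same URL are ignored."""
        if url in self._seen:
            return
        self._seen.add(url)
        # heapq is a min-heap, so store the negated score.
        heapq.heappush(self._heap, (-score, self._counter, url))
        self._counter += 1

    def pop(self) -> Tuple[str, float]:
        """Return the URL with the highest estimated score."""
        neg_score, _, url = heapq.heappop(self._heap)
        return url, -neg_score

    def __len__(self) -> int:
        return len(self._heap)


def rank_links(links: Iterable[Tuple[str, str]], topic_terms: Set[str]) -> List[str]:
    """Order (url, anchor_text) pairs by estimated relevance, without fetching them."""
    frontier = Frontier()
    for url, anchor in links:
        frontier.push(url, link_score(anchor, topic_terms))
    ordered = []
    while len(frontier):
        ordered.append(frontier.pop()[0])
    return ordered


if __name__ == "__main__":
    topic = {"ontology", "crawler", "semantic"}  # hypothetical topic terms
    links = [
        ("http://example.org/a", "Cheap flights"),
        ("http://example.org/b", "Ontology crawler tutorial"),
        ("http://example.org/c", "Semantic web"),
    ]
    print(rank_links(links, topic))
```

A real focused crawler would combine several signals and fetch pages in the order the frontier yields them; the sketch only shows how relevance can be estimated from information available before the download.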
