A SURVEY OF FOCUSED CRAWLER APPROACHES

2012, Journal of Global Research in Computer Science

Abstract

As the content of the World Wide Web grows rapidly, the challenges facing focused crawlers increase. The problem that focused crawlers address is to selectively download relevant pages from the World Wide Web. In this paper, we present a survey of the different approaches to focused crawling, which we classify into three categories. Ontology-based focused crawlers depend on an ontology to determine the relevant pages. Structure-based focused crawlers take the structure of web pages into account when evaluating page relevance. Other focused crawlers have their own features and cannot be assigned to either of the previous two categories.

Key takeaways

  1. Focused crawlers selectively download relevant web pages, improving precision for specific topics.
  2. The paper surveys focused crawler approaches, classifying them into three categories: ontology-based, structure-based, and others.
  3. Ontology-based crawlers evaluate page relevance using domain concepts to enhance retrieval accuracy.
  4. Structure-based crawlers analyze hyperlink relationships to determine relevance and prioritize downloads.
  5. Other crawlers leverage unique features, such as context-based approaches and social bookmarking, to assess page relevance.
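
The selective-download idea summarized above can be illustrated with a minimal sketch. It is not the method of any surveyed crawler: the in-memory `TOY_WEB` graph, the `CONCEPT_WEIGHTS` table (a stand-in for a domain ontology), the `threshold` value, and the rule that a link inherits the relevance score of the page it was found on are all assumptions made for illustration.

```python
import heapq

# Hypothetical toy "web": URL -> (page text, outgoing links). Not real data.
TOY_WEB = {
    "a": ("focused crawler survey ontology", ["b", "c"]),
    "b": ("cooking recipes pasta", ["d"]),
    "c": ("ontology based crawler relevance", ["d", "e"]),
    "d": ("football scores", []),
    "e": ("web crawler hyperlink structure", []),
}

# Hypothetical concept weights standing in for a domain ontology.
CONCEPT_WEIGHTS = {"crawler": 1.0, "ontology": 0.8, "hyperlink": 0.5}


def relevance(text, weights=CONCEPT_WEIGHTS):
    """Weighted share of concepts that appear in the text, in [0, 1]."""
    words = set(text.lower().split())
    total = sum(weights.values())
    return sum(w for term, w in weights.items() if term in words) / total


def focused_crawl(seed, web=TOY_WEB, threshold=0.3, max_pages=10):
    """Best-first crawl that keeps relevant pages and expands only their links."""
    frontier = [(-1.0, seed)]  # negated priority turns heapq into a max-heap
    seen = {seed}
    relevant = []
    fetched = 0
    while frontier and fetched < max_pages:
        _, url = heapq.heappop(frontier)
        text, links = web[url]  # stands in for downloading the page
        fetched += 1
        score = relevance(text)
        if score < threshold:
            continue  # irrelevant page: do not follow its links
        relevant.append((url, round(score, 2)))
        for link in links:
            if link not in seen:
                seen.add(link)
                # Assumption: a link inherits the score of its parent page.
                heapq.heappush(frontier, (-score, link))
    return relevant


if __name__ == "__main__":
    print(focused_crawl("a"))
```

Starting from seed `a`, the sketch fetches every page reachable through relevant pages but keeps only those that score above the threshold; links found on an off-topic page such as `b` are never queued. A real crawler would replace the dictionary lookup with HTTP fetching and would need politeness controls, which are out of scope here.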
