Keyword query based focused Web crawler

2018, Procedia Computer Science

https://doi.org/10.1016/J.PROCS.2017.12.075

Abstract

This paper presents a keyword query-based focused web crawler that improves crawling efficiency by targeting relevant web pages. The crawler uses page metadata and a dynamic keyword list to refine its queries, allowing it to operate independently of a web page's hierarchical structure. Comparisons with a traditional breadth-first search (BFS) crawler show improvements in time efficiency and precision. Key methods discussed include the K level method for intra-domain crawling and the max ancestor method for inter-domain relevancy calculations.
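
The contrast between a BFS crawler and a keyword-driven focused crawler can be illustrated with a minimal sketch. It is not the paper's implementation and does not reproduce the K level or max ancestor methods. It only assumes a generic best-first frontier in which a page is scored by the overlap between its metadata terms and a keyword list that grows as relevant pages are found. The in-memory `TOY_WEB` graph, the `relevance` function, and the `threshold` value are all hypothetical.

```python
import heapq
from collections import deque

# Hypothetical in-memory "web": url -> (metadata keywords, outgoing links).
TOY_WEB = {
    "a": ({"web", "crawler"}, ["b", "c"]),
    "b": ({"cooking", "recipes"}, ["d"]),
    "c": ({"crawler", "search"}, ["e"]),
    "d": ({"travel"}, []),
    "e": ({"search", "ranking"}, []),
}


def relevance(meta, keywords):
    """Fraction of a page's metadata terms found in the keyword list (assumed score)."""
    return len(meta & keywords) / len(meta) if meta else 0.0


def bfs_crawl(web, seed, limit):
    """Baseline: fetch pages in breadth-first order, ignoring their content."""
    seen, fetched, queue = {seed}, [], deque([seed])
    while queue and len(fetched) < limit:
        url = queue.popleft()
        fetched.append(url)
        for link in web[url][1]:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return fetched


def focused_crawl(web, seed, query, limit, threshold=0.5):
    """Best-first crawl that follows links only from pages judged relevant."""
    keywords = set(query)          # dynamic keyword list, grown during the crawl
    seen, fetched, relevant = {seed}, [], []
    counter = 0                    # tie-breaker so equal scores pop in insertion order
    frontier = [(-1.0, counter, seed)]
    while frontier and len(fetched) < limit:
        _, _, url = heapq.heappop(frontier)
        meta, links = web[url]
        fetched.append(url)
        score = relevance(meta, keywords)
        if score < threshold:
            continue               # prune: do not follow links from irrelevant pages
        relevant.append(url)
        keywords |= meta           # refine the query with the page's metadata terms
        for link in links:
            if link not in seen:
                seen.add(link)
                counter += 1
                # A link inherits the score of the page it was found on.
                heapq.heappush(frontier, (-score, counter, link))
    return fetched, relevant


if __name__ == "__main__":
    print("BFS fetched:    ", bfs_crawl(TOY_WEB, "a", limit=4))
    fetched, relevant = focused_crawl(TOY_WEB, "a", {"crawler"}, limit=10)
    print("Focused fetched:", fetched)
    print("Focused kept:   ", relevant)
```

On this toy graph the focused crawl never fetches page `d`, because its only parent `b` is judged irrelevant, and page `e` is kept only because `search` was added to the keyword list after page `c` was accepted. A real crawler would additionally need HTTP fetching, metadata parsing, and politeness controls, none of which are shown here.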
