A Framework for Incremental Hidden Web Crawler

International Journal

Abstract

The Hidden Web offers broad, relevant coverage of dynamic, high-quality content, but its pages change frequently, which makes it difficult to fetch and maintain up-to-date information. To do so, it is necessary to verify whether a web page has changed, which is another challenge. A mechanism is therefore needed that adjusts the interval between two successive revisits according to the probability that the page has been updated. This paper proposes an architecture that introduces a technique to continuously update and refresh the Hidden Web repository.

References (20)

  1. Rosy Madaan et al. / (IJCSE) International Journal on Computer Science and Engineering, Vol. 02, No. 03, 2010, 753-758
  2. Arvind Arasu, Junghoo Cho, Hector Garcia-Molina, Andreas Paepcke, and Sriram Raghavan, "Searching the Web", ACM Transactions on Internet Technology (TOIT), 1(1):2-43, August 2001.
  3. Michael K. Bergman, "The deep web: Surfacing hidden value", Journal of Electronic Publishing, 7(1), 2001.
  4. Sergey Brin and Lawrence Page, "The anatomy of a large-scale hypertextual Web search engine", Computer Networks and ISDN Systems, 30(1-7):107-117, April 199
  5. J. Cho and H. Garcia-Molina. "The Evolution of the Web and Implications for an Incremental Crawler." In Proceedings of the Twenty-Sixth VLDB Conference, pp. 200-209, Cairo, Egypt, 2000.
  6. J. Cho and H. Garcia-Molina. "Estimating Frequency of Change." Technical report, DB Group, Stanford University, Nov 2001.
  7. Czajkowski, K., Fitzgerald, S., Foster, I., Kesselman, C.: Grid Information Services for Distributed Resource Sharing. In: 10th IEEE International Symposium on High Performance Distributed Computing, pp. 181-184. IEEE Press, New York (2001)
  8. S. Raghavan and H. Garcia-Molina, "Crawling the Hidden Web", In Proc. of VLDB, pages 129-138, 2001.
  9. Komal Kumar Bhatia, A.K. Sharma, "A Framework for an Extensible Domain-specific Hidden Web Crawler (DSHWC)", communicated to IEEE TKDE Journal, Dec 2008.
  10. Komal Kumar Bhatia, A.K. Sharma, "A Framework for Domain-Specific Interface Mapper (DSIM)", International Journal of Computer Science and Network Security (IJCSNS 2008).
  11. Komal Kumar Bhatia, A.K. Sharma, "Merging Query Interfaces in Domain-specific Hidden Web Databases", accepted in International Journal of Computer Science, 2008.
  12. A.K. Sharma, Komal Kumar Bhatia, "Crawling the hidden web resources", Proc. of NCIT-2007, Delhi.
  13. Ashutosh Dixit and A.K. Sharma, "Self Adjusting Refresh Time Based Architecture For Incremental Web Crawler", International Journal of Computer Science and Network Security (IJCSNS), Vol. 8, No. 12, Dec 2008.
  14. Mike Burner, "Crawling towards Eternity: Building an archive of the World Wide Web", Web Techniques Magazine, 2(5), May 1997.
  15. Junghoo Cho and Hector Garcia-Molina. 2000, "The evolution of the web and implications for an incremental crawler". In Proceedings of the 26th International Conference on Very Large Databases
  16. A. K. Sharma, J. P. Gupta, D. P. Agarwal, "A novel approach towards management of Volatile Information", Journal of CSI, Vol. 33, No. 1, pp. 18-27, Sept. 2003.
  17. Junghoo Cho and Hector Garcia-Molina. Estimating frequency of change, 2000. Submitted to VLDB 2000, Research track.
  18. Brian E. Brewington and George Cybenko. "How dynamic is the web.", In Proceedings of the Ninth International World-Wide Web Conference, Amsterdam, Netherlands, May 2000.

Rosy Madaan received the B.E. degree in Computer Science & Engineering with Hons. from Maharshi Dayanand University in 2005 and is pursuing an M.Tech. in Computer. Presently, she is working as Senior Lecturer in the Computer Engineering department at B.S.A. Institute of Technology & Management, Faridabad. Her areas of interest are Search Engines, Crawlers and Hidden Web.

Ashutosh Dixit received the B.E. and M.Tech. degrees in Computer Science Engineering with Hons. from Maharshi Dayanand University in 2001 and 2004, respectively. Presently, he is working as Senior Lecturer in the Computer Engineering department at YMCA University of Science & Technology, Faridabad. He is also pursuing a Ph.D. in Computer Engineering, and his areas of interest are Search Engines and Crawlers.

Prof. A. K. Sharma received his M.Tech. (Computer Science & Technology) with Hons. from University of Roorkee in 1989 and his Ph.D. (Fuzzy Expert Systems) from JMI, New Delhi in 2000. From July 1992 to April 2002, he served as Assistant Professor, and he became Professor in Computer Engg. at YMCA University of Science & Technology, Faridabad in April 2002. He obtained his second Ph.D. in IT from IIIT & M, Gwalior in 2004. His research interests include Fuzzy Systems, Object Oriented Programming, Knowledge Representation and Internet Technologies.

Dr. Komal Kumar Bhatia received the B.E., M.Tech. and Ph.D. degrees in Computer Science Engineering with Hons. from Maharshi Dayanand University in 2001, 2004 and 2009, respectively. Presently, he is working as Assistant Professor in the Computer Engineering department at YMCA University of Science & Technology, Faridabad. He is also guiding Ph.D. candidates in Computer Engineering, and his areas of interest are Search Engines, Crawlers and Hidden Web.