Existing focused crawlers (FCs) are based upon a fixed model of the web and are thus deficient in exploiting available information. The premise of this paper is that an ontology can play an important role in enhancing the efficiency of existing agent...
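A minimal sketch of how an ontology can sharpen a crawler's page-relevance estimate: weight term frequencies by concept importance taken from the ontology. The concept list, weights, and scoring rule below are illustrative assumptions, not the paper's actual model.

```python
# Sketch: scoring a page against an ontology's weighted concept terms.
from collections import Counter
import re

# Hypothetical ontology fragment: concept -> weight (broader concepts weigh less)
ONTOLOGY = {"crawler": 1.0, "search engine": 0.8, "indexing": 0.6, "web": 0.3}

def relevance(text: str) -> float:
    """Weighted term-frequency score of a page against the ontology."""
    tokens = Counter(re.findall(r"[a-z]+", text.lower()))
    score = 0.0
    for concept, weight in ONTOLOGY.items():
        # Multi-word concepts are counted as phrases, single words as tokens
        hits = text.lower().count(concept) if " " in concept else tokens[concept]
        score += weight * hits
    return score / max(1, sum(tokens.values()))  # normalize by page length

print(relevance("A focused crawler feeds the search engine indexing pipeline."))
```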
The World Wide Web is growing rapidly, and data today is stored in a distributed manner; hence the need to develop a search-engine-based architectural model for people to search the Web. Broad web search engines...
Electrical submersible pumps (ESPs) of high power rating are mainly used in oil-field applications. A long cable is required to connect the ESP system, so harmonics and high-voltage transients are produced due to series...
Abstract: In this paper we describe Agro Explorer, a language-independent search engine with a multilingual information access facility. Instead of searching plain text, it searches a meaning representation, an Interlingua...
Today, the Internet is an integral part of human life, but its growth creates major problems for users: download speed, the quality of downloaded web pages, and finding relevant content among millions of...
Search engines are the instruments for website navigation and search, because the Internet is vast and has expanded greatly. By continuously downloading web pages for processing, search engines provide search facilities and maintain...
Building an efficient search structure is very important at the current scale of the web. Search engines mine information from the web using a program called a web crawler, which efficiently traverses the web. A distributed crawler...
E-Government applications in developing countries still lag behind e-Governments in advanced countries. For example, the use of information integration for Web portal content is still very limited. This paper proposes an automated...
The World Wide Web has grown rapidly, scaling beyond our imagination. Search engines are used to surmount the resulting challenges. One of the most important types of crawler is the focused crawler, which is used to index...
Indexing the Web is becoming a laborious task for search engines as the Web grows exponentially in size and distribution. Presently, the most effective known approach to this problem is the use of focused crawlers. A focused...
A Query based Approach to Reduce the Web Crawler Traffic using HTTP Get Request and Dynamic Web Page
The function of a web crawler is to download information from the web for a search engine. Web pages change without notice, so the crawler has to revisit web sites to download updated and new pages. It is estimated that 40% of current web traffic is...
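The standard HTTP mechanism that traffic-reduction schemes of this kind build on is the conditional GET: the crawler revisits a page, but the server returns a body only if the page has changed since the last fetch. A sketch using the requests library, with the URL and validator storage simplified; this illustrates the general mechanism, not necessarily the paper's exact method.

```python
# Revisit a page, downloading the body only if it changed (HTTP 304 otherwise).
import requests

def refresh(url, etag=None, last_modified=None):
    headers = {}
    if etag:
        headers["If-None-Match"] = etag            # validator from a prior fetch
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    resp = requests.get(url, headers=headers, timeout=10)
    if resp.status_code == 304:                    # unchanged: no body was sent
        return None, etag, last_modified
    # changed (or first visit): keep the new validators for the next cycle
    return resp.text, resp.headers.get("ETag"), resp.headers.get("Last-Modified")
```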
To search for information on the web, users rely extensively on search engines. As the growth of the World Wide Web has exceeded all expectations, search engines rely on web crawlers to maintain an index of billions of pages for efficient...
The Internet is a vast collection of web pages containing vast amounts of information spread across multiple servers. The sheer size of this collection is a daunting obstacle to retrieving necessary and relevant information. This is where...
The Hidden Web's broad and relevant coverage of dynamic, high-quality content, coupled with the high change frequency of web pages, poses a challenge for maintaining and fetching up-to-date information. For this purpose, it is required to...
Middleware is an important part of many search engine web crawling processes. We developed a middleware, the Crawl Document Importer (CDI), which selectively imports documents and the associated metadata to the digital library CiteSeerX...
Cloud computing has become an important paradigm, as it is reliable and provides a cost-effective way of storing data and hosting applications. Cloud storage is growing exponentially, and the data must be monitored in a secure manner. Cloud...
The rapid growth of the World Wide Web poses unprecedented scaling challenges for general-purpose crawlers and search engines. In the personalized search domain, an alternative to general-purpose crawlers, called focused crawlers, is...
We have developed a web-repository crawler that is used for reconstructing websites when backups are unavailable. Our crawler retrieves web resources from the Internet Archive, Google, Yahoo and MSN. We examine the challenges of crawling...
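One repository lookup a crawler like this can perform today is a query against the Internet Archive's public Wayback "availability" API, which returns the archived snapshot closest to a requested date. A sketch under that assumption; the paper's own interfaces to Google, Yahoo and MSN caches may differ.

```python
# Ask the Wayback Machine for the snapshot of a URL closest to a given date.
import requests

def closest_snapshot(url, timestamp="20060101"):
    resp = requests.get(
        "https://archive.org/wayback/available",
        params={"url": url, "timestamp": timestamp},
        timeout=10,
    )
    snap = resp.json().get("archived_snapshots", {}).get("closest")
    if snap and snap.get("available"):
        return snap["url"], snap["timestamp"]
    return None  # nothing archived for this URL

print(closest_snapshot("example.com"))
```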
A web-crawler is a program that automatically and systematically tracks the links of a website and extracts information from its pages. Due to the different formats of websites, the crawling scheme for different sites can differ...
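The core loop such a crawler runs is small: fetch a page, extract its links, and enqueue the unseen ones. A minimal single-site sketch using requests and BeautifulSoup; the start URL, page limit, and "extract the title" step are placeholders for whatever a concrete system scrapes.

```python
# Breadth-first crawl of one host, printing each page's title.
from collections import deque
from urllib.parse import urljoin, urlparse
import requests
from bs4 import BeautifulSoup

def crawl(start, max_pages=20):
    host, seen, queue = urlparse(start).netloc, {start}, deque([start])
    while queue and len(seen) <= max_pages:
        url = queue.popleft()
        try:
            soup = BeautifulSoup(requests.get(url, timeout=10).text, "html.parser")
        except requests.RequestException:
            continue  # skip unreachable pages
        print(url, "->", soup.title.string if soup.title else "(no title)")
        for a in soup.find_all("a", href=True):
            link = urljoin(url, a["href"]).split("#")[0]  # resolve + drop fragment
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
```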
JavaScript client-side hidden web (CSHW) pages contain dynamic material created as a result of specific user activities. The number of CSHW websites is increasing. Crawling the so-called Hidden Web is challenging, particularly when...
Text classification, also called text categorization or text tagging, is a crucial and extensively used approach in Natural Language Processing (NLP) for assigning unseen documents to predefined categories. In this paper, we...
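For orientation, here is the generic baseline that text-classification papers commonly compare against: TF-IDF features with a linear classifier. The tiny dataset is made up for illustration; it is not this paper's corpus or proposed method.

```python
# TF-IDF + logistic regression: a standard text-classification baseline.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

docs = ["stock markets fell today", "the team won the match",
        "shares rallied on earnings", "the striker scored twice"]
labels = ["finance", "sports", "finance", "sports"]

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(docs, labels)                                   # learn from labeled docs
print(model.predict(["quarterly earnings beat forecasts"]))  # -> ['finance']
```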
When crawling, resources such as the number of machines, crawl time, and so on are limited, so a crawler has to decide an optimal order in which to crawl and recrawl web pages. Ideally, crawlers should request only those web pages that...
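A common recrawl policy in this setting, sketched below under assumed inputs rather than as the paper's algorithm, is to estimate each page's change rate from past observations and refresh pages in order of expected staleness (change rate times time since last crawl).

```python
# Order pages for recrawl by expected staleness = change_rate * age.
import heapq, time

def schedule(pages, budget):
    """pages: list of (url, change_rate, last_crawl_ts). Returns crawl order."""
    now = time.time()
    heap = [(-rate * (now - last), url) for url, rate, last in pages]
    heapq.heapify(heap)  # max-heap via negated priorities
    return [heapq.heappop(heap)[1] for _ in range(min(budget, len(heap)))]

order = schedule([("a.com", 0.9, 0), ("b.com", 0.1, 0), ("c.com", 0.5, 0)], 2)
print(order)  # most change-prone pages first: ['a.com', 'c.com']
```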
Online notepad services allow users to upload and share free text anonymously. Reviewing Pastebin, one of the most popular online notepad service websites, it is possible to find textual content that could be related to illegal...
Focused crawlers are programs designed to browse the Web and download pages on a specific topic. They are used for answering user queries or for building digital libraries on a topic specified by the user. In this article we will show how...
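The mechanism shared by most focused crawlers is a best-first frontier: URLs are dequeued by predicted topical relevance rather than discovery order. In this sketch, score_page is a stand-in for whatever relevance classifier a concrete system uses; the topic terms are assumptions.

```python
# Best-first frontier: pop the URL whose anchor text looks most on-topic.
import heapq

TOPIC_TERMS = {"crawler", "index", "search"}  # illustrative topic vocabulary

def score_page(text):
    words = text.lower().split()
    return sum(w in TOPIC_TERMS for w in words) / max(1, len(words))

frontier = []  # max-heap via negated scores

def enqueue(url, anchor_text):
    heapq.heappush(frontier, (-score_page(anchor_text), url))

enqueue("http://example.com/a", "focused crawler and search index design")
enqueue("http://example.com/b", "holiday photos")
print(heapq.heappop(frontier)[1])  # -> the topically relevant URL first
```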
One of the main challenges in the domain of competitive intelligence is to harvest large volumes of information from the web and extract the most valuable pieces of information. As the amount of information available on the web grows...
Specialized dictionaries are used to understand concepts in specific domains, especially where those concepts are not part of the general vocabulary or have meanings that differ from ordinary language. The first step in creating a...
In this paper we present two methodologies for rapidly inducing multiple subject-specific taxonomies from crawled data. The first method uses sentence-level word co-occurrence frequencies to build the taxonomy, while the...
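The first method's basic signal is easy to compute: count how often word pairs co-occur within the same sentence, and treat frequent pairs as candidate taxonomy edges. A sketch on a toy corpus; the corpus and any thresholds are assumptions, as the abstract does not give them.

```python
# Sentence-level word co-occurrence counting over a toy crawled corpus.
from collections import Counter
from itertools import combinations
import re

corpus = ["the crawler downloads web pages", "a focused crawler ranks web pages"]
pair_counts = Counter()
for sentence in corpus:
    words = sorted(set(re.findall(r"[a-z]+", sentence.lower())))
    pair_counts.update(combinations(words, 2))  # every unordered pair, once

# Frequent pairs, e.g. ('crawler', 'pages'), are candidate taxonomy edges.
print(pair_counts.most_common(3))
```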
This paper presents a tracked robot composed of the proposed crawler mechanism, in which a planetary gear reducer is employed as the transmission device and provides two outputs in different forms with only one actuator. When the crawler...
The World Wide Web is a huge source of hyperlinked information contained in hypertext documents. Search engines use web crawlers to collect these documents from the web for the purpose of storage and indexing. However, many of these documents...
A focused crawler aims at discovering as many web pages relevant to a target topic as possible, while avoiding irrelevant ones. Reinforcement Learning (RL) has been utilized to optimize focused crawling. In this paper, we propose TRES, an...
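This is not the TRES algorithm itself, only the generic RL framing it builds on: the crawler is an agent whose action is choosing the next URL to fetch, and the reward is whether the fetched page turned out to be on-topic. A minimal epsilon-greedy sketch of that framing.

```python
# Epsilon-greedy URL selection with running-mean value estimates.
import random

q = {}       # url -> estimated value of crawling it
visits = {}  # url -> number of reward updates so far

def pick(frontier, eps=0.1):
    if random.random() < eps:                          # explore a random URL
        return random.choice(frontier)
    return max(frontier, key=lambda u: q.get(u, 0.0))  # exploit the best estimate

def update(url, reward):
    """reward: 1.0 if the fetched page was on-topic, else 0.0."""
    n = visits[url] = visits.get(url, 0) + 1
    q[url] = q.get(url, 0.0) + (reward - q.get(url, 0.0)) / n  # running mean
```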
The rapid growth of the World Wide Web poses unprecedented scaling challenges for general-purpose crawlers and search engines. In this paper we describe a new hypertext resource discovery system called a Focused Crawler. The goal of a...
We describe the architecture of a hypertext resource discovery system using a relational database. Such a system can answer questions that combine page contents, metadata, and hyperlink structure in powerful ways, such as "find the number...
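A toy version of the idea: keep pages and hyperlinks in relational tables so that content and link structure can be combined in a single SQL query. The schema, data, and query below are illustrative, not the system described in the paper.

```python
# Pages and links as tables; one query mixes content filters with link structure.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE page(url TEXT PRIMARY KEY, title TEXT);
    CREATE TABLE link(src TEXT, dst TEXT);
    INSERT INTO page VALUES ('a', 'crawler survey'), ('b', 'cooking tips');
    INSERT INTO link VALUES ('b', 'a'), ('a', 'b'), ('b', 'b');
""")
# e.g. "pages whose title mentions 'crawler', ranked by in-degree"
rows = db.execute("""
    SELECT p.url, p.title, COUNT(l.src) AS indegree
    FROM page p LEFT JOIN link l ON l.dst = p.url
    WHERE p.title LIKE '%crawler%'
    GROUP BY p.url ORDER BY indegree DESC
""").fetchall()
print(rows)  # [('a', 'crawler survey', 1)]
```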
The rapid growth of the World Wide Web has made it difficult for general-purpose search engines, e.g. Google and Yahoo, to retrieve the most relevant results in response to user queries. A vertical search engine specialized in a...
Crawler-based search engines are the most widely used search engines among Internet users; they involve web crawling, storing pages in a database, ranking, indexing, and displaying results to the user. But it is noteworthy that because of increasing...
Previous studies have highlighted the high arrival rate of new content on the web. We study the extent to which this new content can be efficiently discovered by a crawler. Our study has two parts. First, we study the inherent difficulty...
Design inspiration comes from the continuous stimulation of external information and the continuous accumulation of knowledge. In order to obtain ideal design inspiration from nature, researchers have proposed a large number of...