Academia.edu

Web Search Engine

2,037 papers
8 followers
About this topic
A web search engine is a software system designed to search for information on the World Wide Web. It indexes web pages and retrieves relevant results based on user queries, utilizing algorithms to rank the relevance and authority of the content.
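The index-then-rank loop described above can be illustrated with a minimal sketch. It assumes a tiny in-memory collection and a simple TF-IDF style score; the function names `build_index` and `search` and the sample pages are hypothetical, and production engines use far more elaborate indexing and ranking.

```python
import math
from collections import Counter, defaultdict


def build_index(docs):
    """Build an inverted index: term -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(text.lower().split()).items():
            index[term][doc_id] = tf
    return index


def search(index, n_docs, query):
    """Return doc ids ordered by a simple TF-IDF score, best first."""
    scores = defaultdict(float)
    for term in query.lower().split():
        postings = index.get(term, {})
        if not postings:
            continue  # term does not occur in any document
        # Rarer terms get a higher weight (smoothed inverse document frequency).
        idf = math.log(n_docs / len(postings)) + 1.0
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    # Sort by descending score, breaking ties by doc id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical sample collection, for illustration only.
pages = {
    "a": "web search engine ranking",
    "b": "web crawler",
    "c": "cooking recipes",
}
index = build_index(pages)
print(search(index, len(pages), "web ranking"))
```

Here page "a" matches both query terms and page "b" only one, so "a" is ranked first; page "c" matches neither and is not returned.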

Key research themes

1. What are effective user search strategies and behaviors for locating specific information on the Web?

This research area investigates how users approach Web information searching, their success rates, search patterns, duration, step counts, and strategy effectiveness. Understanding these user-centric aspects is vital because Web search involves complex cognitive and technical skills, and user behavior directly impacts search outcomes and frustration levels.

Key finding: Through an empirical study involving 54 graduate students performing search tasks, this work identified typical characteristics of Web search processes including search duration, number of steps, and identified common search...
by YM Chu
Key finding: Using a controlled pretest-posttest design, this study showed that explicit teaching of 'Technology Strategic Usefulness (TSU)' significantly enhanced high-school students' perceived usefulness and strategic intent in using...
Key finding: This work developed and empirically evaluated the Relation Browser++ (RB++), a novel interface combining faceted category overviews and dynamic filtering to tightly couple browsing and searching across large information sets....
Key finding: The study proposed a proxy server architecture which caches search results in a domain-specific Web log to expedite repeated user queries. Experimental evaluations with duplicate queries across domains showed significant...

2. How do search engine architectures and algorithms address scalability and efficiency in crawling and indexing vast Web content?

Research under this theme explores the design, implementation, and optimization of crawling architectures and indexing strategies that enable search engines to efficiently gather and organize vast and dynamic Web content. Scaling to billions of pages requires algorithms for distributed crawling, handling AJAX-based dynamic content, load balancing, and incremental updating while improving speed and reliability.

Key finding: Introduced a combined approach using an enhanced Hefty algorithm and bandwidth optimization to implement a distributed Web crawler minimizing redundant crawling and maximizing crawl throughput for drug-related websites. The...
Key finding: This study presented a machine learning-based web crawler designed to extract article-like content from diverse web pages by leveraging visual, trivial HTML, and text-based features. The approach specifically addresses...
Key finding: Provided a comprehensive survey of the evolution and methodologies in crawling AJAX-based Web applications, which present unique challenges due to dynamic content and multiple states per URL. Identified key issues such as...
Key finding: Additionally, by emphasizing user difficulties in locating relevant information and highlighting the explosion of Web content, this work indirectly underscores the necessity for efficient crawling and indexing architectures...

3. What are contemporary techniques in Search Engine Optimization (SEO), ranking algorithms, and semantic search that improve search relevance and page ranking?

This theme encapsulates advanced methodologies for optimizing Web page ranking and retrieval relevance, focusing on both technical page optimization (e.g., page speed, audit rules) and semantic understanding through ontology and knowledge representation. These approaches inform search engines to serve more accurate, relevant, and quality results to users, overcoming shortcomings of simple keyword matching.

Key finding: The study systematically analyzed the impact of Google PageSpeed audit rules on website performance, identifying a prioritized sequence of audit rules that, when applied, yielded over 80% performance improvement after...
Key finding: Proposed an ontology-based semantic search engine framework for the tourism domain leveraging WordNet to construct synonym sets enabling deeper semantic query matching beyond keywords. Experiments showed improved retrieval...
Key finding: Surveyed web mining techniques (structure, content, usage mining) and their application in developing ranking algorithms. Highlighted that combining hyperlink analysis (e.g., PageRank), content analysis, and user behavior...
Key finding: This work evaluated how search engine query modifiers (operators) can be employed to effectively refine user queries, significantly reducing result set size and increasing precision. Empirical results with Google and Bing...
Key finding: Discusses advanced Google search operators and techniques for optimizing query specificity, such as field searches (title:, link:), truncation, exclusion, proximity operators, and leveraging services like image and flight...
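
As an illustration of the hyperlink analysis mentioned among these findings, the following is a minimal sketch of PageRank computed by power iteration. It is not taken from any of the listed papers; the function name `pagerank`, the damping factor of 0.85, the fixed iteration count, and the three-page link graph are all assumptions chosen for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank by power iteration.

    links maps each page to the list of pages it links to.
    """
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every page receives a baseline share from random jumps.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                # Split this page's rank evenly among its outgoing links.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                share = damping * rank[node] / n
                for other in nodes:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical link graph: a links to b and c, b links to c, c links to a.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(graph)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy graph page "c" collects links from both other pages and ends up with the highest score, while "b", linked only from "a", ends up lowest. A ranking function in practice would combine such a link score with content and usage signals, as the survey above suggests.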

All papers in Web Search Engine

Abstract. The Internet has become for many the most important medium for staying informed about current news events. Some events cause heightened interest on a topic, which in turn yields a higher frequency of the search queries related... more
Information Retrieval Systems (IRS) are very popular on the World Wide Web. The availability of text information related to all types of objects, such as documents, web pages, images, videos and audio files, on the web is increasing day by day in an... more
This paper reports the results of a study conducted to explore and compare the features of independently built Malaysian Web search engines, as well as evaluate their performance and search capabilities. Four Malaysian independently built... more
In online affiliate marketing networks advertising web sites offer their affiliates revenues based on provided web site traffic and associated leads and sales. Advertising web sites can have a network of thousands of affiliates providing... more
AOIS is a multi-agent system that supports the sharing of information among a community of users connected through the Internet. In respect to Web search engines, this system enhances the search through domain ontologies, avoids the... more
Search Engine for South-East Europe (SE4SEE) is a socio-cultural search engine running on the grid infrastructure. It offers a personalized, on-demand, country-specific, category-based Web search facility. The main goal of SE4SEE is to... more
Search Engine for South-East Europe (SE4SEE) is an application project aiming to develop a grid-enabled search engine that specifically targets the countries in the South-East Europe. It is one of the two selected regional applications... more
In recent years, we have witnessed the proliferation of applications that generate thousands of terabytes of data per day, due to the explosive increase in storage capacity across various devices. As a consequence, a new concept called... more
During the last years web search engines have moved from the simple but inefficient syntactical analysis (first generation) to the more robust and usable web graph analysis (second generation). Much of the current research is focussed on... more
Today's web search scenario, which is mainly keyword based, leads to the need for the effective and meaningful search provided by the Semantic Web. Existing search engines often fail to provide relevant answers to users' queries due to... more
Online questionnaire-based research is growing at a fast pace. Mouse-tracking methods provide a potentially important data source for this research by enabling the capture of respondents' online behaviour while answering questionnaire... more
There is growing interest in the field of human-computer interaction in the use of mouse movement data to infer e.g. user's interests, preferences and personality. Previous work has defined various patterns of mouse movement behavior.... more
The volume of the World Wide Web (WWW) is increasing enormously due to a worldwide move to migrate information to online sources. To search for information on the WWW, search engines are used, which, when presented with queries, return a list... more
The Internet is increasingly used to find health information worldwide. Online health information search can be beneficial for novice users but due to the overwhelming quantity and uneven quality of online health information it may also... more
One of the major problems in the process of Information Retrieval (IR) arises at the stage where the user reviews the results list. This paper presents the latest research in a series of research works that aims at finding the most... more
Frequent requests from users to search engines on the World Wide Web are to search for information about people using personal names. Current search engines only return sets of documents containing the name queried, but, as several people... more
In this paper we present a semi-supervised learning method for a problem of learning to rank where we exploit Markov random walks and graph regularization in order to incorporate not only "labeled" web pages but also plenty of "unlabeled"... more
Search engines have become an integral part of our lives. To augment the power of such engines, even while offline, was our goal. To accomplish this, a distinct offline search engine was created to retrieve data from archives. An updated UI... more
World Wide Web is a collection of online resources and websites, including e-commerce, social sites, educational content, etc. To find relevant online resources, people search these by using search engines by providing their desired... more
The World Wide Web is growing rapidly, and data in the present-day scenario is stored in a distributed manner. There is therefore a need to develop a search-engine-based architectural model for people to search through the Web. Broad web search engines... more
We propose a simple approach to search engine personalization based on Web communities. User information, in particular the Web communities whose neighborhoods the user has selected in the past, is used to change the order of the... more
Federated search has the potential of improving web search: the user becomes less dependent on a single search provider and parts of the deep web become available through a unified interface, leading to a wider variety in the retrieved... more
Many experts predict that the next huge step forward in Web information technology will be achieved by adding semantics to Web data, and will possibly consist of (some form of) the Semantic Web. In this paper, we present a novel approach... more
This paper introduces the reader to the approach we are taking to develop an ontology that could be used to represent the knowledge inherent in filmed materials. Such an ontology could be used as the semantic basis for multimedia... more
Most of the ontology alignment tools use terminological techniques as the initial step and then apply the structural techniques to refine the results. Since each terminological similarity measure considers some features of similarity,... more
A new indexing method, called Compressed Multi-Framed Signature File (C-MFSF), that uses a partial query evaluation strategy with compressed signature bit slices is presented. In C-MFSF, a signature file is divided into variable sized... more
In this paper, we address the problem of collection selection. This is important for locating responses in digital libraries. The aim of methods, which deal with the area of information retrieval, is to reduce the amount of the exchanged... more
The Semantic Web is usually envisaged as a collection of Web accessible RDF documents that re-use RDF schemas. These schemas are expected to be most often independently designed and hence not sharing many categories. We are unconvinced... more
In this paper we propose a novel parallel algorithm for frequent itemset mining. The algorithm is based on the filter-stream programming model, in which the frequent itemset mining process is represented as a data flow controlled by a... more
Search engines, as information retrieval tools, are nowadays the main source of information for users' information needs. For every query, the search engine explores its repository and/or indexer to find the relevant documents/URLs for that... more
Purpose: Due to the exponential growth of internet users and internet traffic, information seekers are highly dependent upon search engines to extract relevant information. Due to the accessibility of a large amount of textual, audio,... more
In this paper, we put forward a technique for parallel crawling of the web. The World Wide Web today is growing at a phenomenal rate. It has enabled a publishing explosion of useful online information, which has produced the unfortunate... more
The World Wide Web (WWW) is a huge repository of interlinked hypertext documents known as web pages. Users access these hypertext documents via the Internet. Since its inception in 1990, the WWW has grown manyfold in size, and now it contains more... more
The World Wide Web has a huge amount of information that is retrieved using information retrieval tools such as search engines. The page repository of a search engine contains the web documents downloaded by the crawler. This repository contains... more
Accurate gaze region estimation on the web is important for the purpose of placing marketing advertisements in web pages and monitoring authenticity of user’s response in web forms. To identify gaze region on the web, we need cheap, less... more
Existing search engines use web crawlers to gather web pages. The extracted information is used to build indexes, which are later used to answer user queries. This approach is useful for general queries, but ignores the special properties... more
The essence of a web page is an inherently predisposed issue, one that is built on behaviors, interests, and intelligence. There are relatively a ton of reasons web pages are critical to the new world, as the matter cannot be... more
Abstract. Search engines on the Web are valuable tools for searching information according to a user's interests whether an individual or a software agent. In the present article we describe the design and the operation mode of... more
This paper considers the use of controlled languages for query translation in a legislative document retrieval system. Problem statement and analysis of the approach are described. The use of controlled languages is motivated by the fact... more
World Wide Web content continuously grows in size and importance. Furthermore, users ask Web search engines to satisfy increasingly disparate information needs. New techniques and tools are constantly developed aimed at assisting users in... more
Abstract. Through their interaction with search engines, users provide implicit feedback that can be used to extract useful knowledge and improve the quality of the search process. This feedback is encoded in the form of a query log that... more