Expert document retrieval via semantic measurement

Joel Jeffrey

doi:10.1016/0957-4174(91)90040-L

1991, Expert Systems With Applications

https://doi.org/10.1016/0957-4174(91)90040-L

8 pages

Abstract

A new technology for intelligent full text document retrieval is presented. The retrieval of a document is treated as an expert system problem, recognizing that human document retrieval is expert behavior. The technology is semantic measurement. A working prototype system, LIBRARY, has been built based on the technology. Input is a request for information, in unrestricted technical English; output is all documents with measured content similar to that of the request, ranked in order of relevance. Retrieval is unaffected by similarity or dissimilarity of terms between request and document. LIBRARY's performance is comparable to that of an expert human librarian, representing a significant improvement over traditional document retrieval systems.

Evelyne Mounier

Proceedings of the 21st annual international conference on Documentation - SIGDOC '03, 2003

Information retrieval systems within voluminous textual documents raise specific problems, such as the choice of the retrieval unit and the relevance of each response. For the selection of the retrieval unit, several solutions have been proposed, such as the exploitation of the document's logical structure. In most cases, the relevance of a retrieval unit is assessed using criteria such as the number of occurrences of query terms in the document and their position in the document. Few systems are designed in a user-centered way and adapted to the task they are supposed to assist: usually, these systems are based on paper documentation recorded electronically, with a standard information retrieval module. Sysrit (technical information retrieval system), a system under development, is aimed at users who are expert in searching technical documents. The design of Sysrit is based on observations made on these users. In this system, a technical document is automatically segmented into paragraphs (called information units). In order to improve the relevance of the responses given to the users, Sysrit proposes to tag the information units. Indeed, we make the assumption that a response is all the more relevant when it belongs to the same category as the query. We show that queries and information units can first be categorized into two types: the OBJECT type (which corresponds to object descriptions) and the PRO type (which concerns procedural descriptions). A detailed study of the OBJECT type shows that it is heterogeneous and covers different sub-types: object descriptions (DO), definitions (DFI) and specification descriptions (DF). Upon experimental validation with expert users, we first proposed to categorize each information unit as either OBJECT or PRO, and second to subcategorize the OBJECT units as DO, DFI or DF. We focus here on queries more than on information units.
A corpus analysis and a validation by expert users confirm that this categorization can also be used to characterize queries. Moreover, the results of this analysis enable us to propose rules to automatically recognize and tag each type of query.

A Hybrid Model for Document Retrieval Systems

Donald Kraft

2022

A methodology for the design of document retrieval systems is presented. First, a composite index term weighting model is developed based on term frequency statistics, including document frequency, relative frequency within document and relative frequency within collection; its coefficients can be adjusted to fit different indexing environments. Then, a composite retrieval model is proposed to process a user's information request, expressed as a weighted Phrase-Oriented Fixed-Level Expression (POFLE) that may apply more than Boolean operators, in two phases. The first phase searches for documents that are topically relevant to the information request by means of a descriptor matching mechanism, which incorporates a partial matching facility based on a structurally restricted relationship imposed by the indexing model and is more general than the matching functions of the traditional Boolean model and vector space model. The second phase ranks these topically relevant documents, by means of two types of heuristic-based selection rules and a knowledge-based evaluation function, in descending order of a preference score that predicts the combined effect of user preference for quality, recency, fitness and reachability of documents.
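The composite weighting idea in this abstract can be illustrated with a minimal sketch. The abstract names the three statistics but not how they are combined, so the linear form below, the function name `composite_weights`, and the coefficients `a`, `b`, `c` are all assumptions made for illustration, not the paper's actual model.

```python
from collections import Counter
import math


def composite_weights(docs, a=1.0, b=1.0, c=1.0):
    """Return {doc_index: {term: weight}} for a list of tokenized documents.

    Hypothetical combination (an assumption, not taken from the paper):
        weight = a * rel_freq_in_doc + b * log(N / doc_freq) - c * rel_freq_in_collection
    """
    n_docs = len(docs)
    doc_counts = [Counter(doc) for doc in docs]

    # Collection-wide term counts and total number of tokens.
    collection = Counter()
    for counts in doc_counts:
        collection.update(counts)
    total_tokens = sum(collection.values())

    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for counts in doc_counts for term in counts)

    weights = {}
    for i, counts in enumerate(doc_counts):
        length = sum(counts.values())
        weights[i] = {}
        for term, tf in counts.items():
            rel_doc = tf / length
            inverse_df = math.log(n_docs / doc_freq[term])
            rel_coll = collection[term] / total_tokens
            weights[i][term] = a * rel_doc + b * inverse_df - c * rel_coll
    return weights
```

Setting `b=0` and `c=0` reduces the weight to plain relative frequency within the document, which is one way to see how adjusting coefficients adapts the same model to a different indexing environment.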

Intelligent techniques for effective information retrieval

TANVEER IRSHAD SIDDIQUI

ACM SIGIR Forum, 2006

With the explosive growth of information, it is becoming increasingly difficult to retrieve the relevant documents with statistical means only. This poses new challenges to the IR community and motivates researchers to look for intelligent Information Retrieval (IR) systems that search and/or filter information automatically based on some higher level of understanding. This higher level of understanding can only be achieved through processing of text based on semantics, which is not possible by considering a document as a "bag of words". We make a humble effort in this direction by investigating techniques that attempt to utilize semantics to improve effectiveness in IR. The hypothesis is that with an improved representation of documents and by incorporating limited semantic knowledge, it is possible to improve the effectiveness of an IR system. We propose the use of Conceptual Graph (CG) formalism for representing text. The level of semantic details to be capture...

A logic basis for information retrieval

Carolyn Watters

Information Processing & Management, 1987

This paper examines the potential of recent work in artificial intelligence for the development of more effective information retrieval systems. The primary task in this research has been to examine and define the role of an expert system in the domain of bibliographic retrieval. Once such a goal can be described the available knowledge representations and techniques can be evaluated. This paper examines the role of an expert bibliographic retrieval system, examines an artificial intelligence view of information retrieval, and then describes a prototype expert information retrieval system that has been designed and implemented.

ERSE: an expert retrieval system for electronic databases

Peretz Shoval

This paper describes ERSE, an expert system for information retrieval in electronic databases. The objective of the system is to support engineering professionals in formulating proper queries and submitting them to a retrieval database. The system consists of: (a) a knowledge base, which is a thesaurus of terms and semantic relationships, implemented as a semantic network; (b) a search and evaluation mechanism: the inference engine, which conducts a guided search aimed at finding appropriate query terms. While doing so it invokes relevant knowledge, evaluates it and suggests final findings to the user; (c) a database of patents in the domain of error-correction codes, implemented with a relational database management system; (d) a retrieval mechanism, which measures the similarity between the system-generated weighted query and the index terms of patents, and returns a rank-ordered set of patents. The user is then able to provide feedback and improve the query accordingly; (e) user interfaces, including system capability to explain its findings/decisions. The system is implemented in Prolog, C and Ingres DBMS, under Unix. The system design is described, and examples of its operation and an evaluation of its performance are given.

Document Information Retrieval

Stefan Agne

Advances in Pattern Recognition, 2007

Document Retrieval, Automatic

Elizabeth Liddy

Encyclopedia of Language & Linguistics, 2006

Document Retrieval is the computerized process of producing a relevance ranked list of documents in response to an inquirer's request by comparing their request to an automatically produced index of the documents in the system. Everyone uses such systems today in the form of web-based search engines. While evolving from a fairly small discipline in the 1940s, to a large, profitable industry today, the field has maintained a healthy research focus, supported by test collections and large-scale annual comparative tests of systems. A document retrieval system is comprised of three core modules: document processor, query analyzer, and matching function. There are several theoretical models on which document retrieval systems are based: Boolean, Vector Space, Probabilistic, and Language Model.
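The three core modules named in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions: raw term-frequency vectors and cosine similarity stand in for the vector space model, and the function names (`process_documents`, `analyze_query`, `search`) are hypothetical, not drawn from the encyclopedia entry.

```python
from collections import Counter
import math
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def process_documents(docs):
    """Document processor: build a term-frequency vector per document."""
    return {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}


def analyze_query(query):
    """Query analyzer: represent the request in the same vector form."""
    return Counter(tokenize(query))


def cosine(u, v):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(
        sum(x * x for x in v.values())
    )
    return dot / norm if norm else 0.0


def search(index, query):
    """Matching function: rank documents by similarity to the query."""
    q = analyze_query(query)
    scored = [(cosine(q, vec), doc_id) for doc_id, vec in index.items()]
    # Highest score first; ties broken by document id for determinism.
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for score, doc_id in ranked if score > 0]
```

A caller would build the index once with `process_documents` and then pass it to `search` for each request. Documents sharing no terms with the query score zero and are left out of the ranked list, which is the term-matching limitation that several of the semantic approaches listed on this page set out to address.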

Improving retrieval experience exploiting semantic representation of documents

Annalina Caputo, Giovanni Semeraro

… Web Applications and …, 2008

The traditional strategy performed by Information Retrieval (IR) systems is ranked keyword search: for a given query, a list of documents, ordered by relevance, is returned. Relevance computation is primarily driven by a basic string-matching operation. To date, several attempts have been made to deviate from the traditional keyword search paradigm, often by introducing some techniques to capture word meanings in documents and queries. The general feeling is that dealing explicitly with only semantic information does not improve significantly the performance of text retrieval systems. This paper presents SENSE (SEmantic N-levels Search Engine), an IR system that tries to overcome the limitations of the ranked keyword approach, by introducing semantic levels which integrate (and not simply replace) the lexical level represented by keywords. Semantic levels provide information about word meanings, as described in a reference dictionary, and named entities. We show how SENSE is able to manage documents indexed at three separate levels, keywords, word meanings, and entities, as well as to combine keyword search with semantic information provided by the two other indexing levels.

User modeling in intelligent information retrieval

Carlo Tasso

Information Processing & Management, 1987

The issue of exploiting user modeling techniques in the framework of cooperative interfaces to complex artificial systems has recently received increasing attention. In this paper we present the IR-NLI II system, an expert interface that allows casual users to access online information retrieval systems and encompasses user modeling capabilities. More specifically, an illustration of the user modeling subsystem is given by describing the organization of the user model proposed for the particular application area, together with its use during system operation. The techniques utilized for the construction of the model are presented as well. They are based on the use of stereotypes, which are descriptions of typical classes of users. More specifically, they include both declarative and procedural knowledge for describing the features of the class to which the stereotype is related, for assigning a user to that class, and for acquiring and validating the necessary information during system operation.

I. INTRODUCTION

The development of cooperative user interfaces for supporting information retrieval systems has become a well-defined research and application field. It comprises both traditional systems, including menu-driven interaction, extensive online help, and keyword recognition and extraction (consider, for example, the work of Marcus [1,2] and Doszkos and Rapp [3]), and more advanced interfaces based on artificial intelligence techniques. This class includes, among others, the work of Pollitt [4-6], the systems developed by Croft and Thompson [7] and by Defude [8,9], and the IR-NLI interface designed and implemented by the authors [10-12]. From a general viewpoint, the design of an expert interface to an information retrieval system encompasses two major tasks:

- How to overcome the linguistic gap between the user and the system.
- How to support the user at the conceptual level in the analysis of his information needs, in the formulation of an appropriate search strategy, and in the evaluation of the obtained results.

These issues, in turn, pose several technical problems, which include:

- Natural language understanding and dialogue management.
- Representation of subject knowledge in the domain of the search (including available data bases, their content, terminology and organization).
- Representation of technical knowledge about information retrieval (session structure, query language, techniques for strategy construction).
- Elicitation and representation of the intermediary's skill and expertise.
- Design of appropriate problem-solving methods for knowledge processing and inference management.

An expert intermediary system for interactive document retrieval

Carlo Tasso

Automatica, 1983

Constructing natural language interfaces to computer systems often requires achievement of advanced reasoning and expert capabilities in addition to basic natural language understanding. In this paper the above issue is tackled in the frame of an actual application concerning the design of a natural language interface for interactive document retrieval. After a short discussion of the peculiarities of this application, which requires both natural language understanding and reasoning capabilities, the general architecture and fundamental design criteria of a system presently being developed at the University of Udine are presented. The system, named IR-NLI, is aimed at allowing non-technical users to directly access through natural language the services offered by online databases. Attention is later focused on the basic functions of IR-NLI, namely understanding, dialogue and reasoning. An example of interaction with IR-NLI is fully worked out to introduce the main features of the system. Knowledge representation methods and algorithms adopted are then illustrated. Perspectives and direction for future research are also discussed.

Peretz Shoval

Information Processing & Management, 1985

An expert system was developed in the area of information retrieval, with the objective of performing the job of an information specialist who assists users in selecting the right vocabulary terms for a database search. The system is composed of two components. The first is the knowledge base, represented as a semantic network, in which the nodes are words, concepts and phrases comprising a vocabulary of the application area, and the links express semantic relationships between those nodes. The second component is the rules, or procedures, which operate upon the knowledge base, analogous to the decision rules or work patterns of the information specialist. Two major stages comprise the consulting process of the system. During the “search” stage, relevant knowledge in the semantic network is activated, and search and evaluation rules are applied in order to find appropriate vocabulary terms to represent the user's need. During the “suggest” stage, those terms are further evaluated, dynamically rank-ordered according to relevancy, and suggested to the user. Explanations of the findings can be provided by the system, and backtracking is possible in order to find alternatives in case some suggested term is rejected by the user. This article presents the principles, procedures and rules that are utilized in the expert system.
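The "search" and "suggest" stages described in this abstract can be sketched roughly as activation spreading over a weighted term graph. The abstract does not give the system's actual rules, so the dictionary representation, the link strengths, the `decay` factor and the `max_hops` limit are all assumptions made for illustration.

```python
def suggest_terms(network, seeds, decay=0.5, max_hops=2):
    """Rank vocabulary terms related to the user's seed terms.

    network: {term: {neighbor: link_strength}} with strengths in (0, 1].
    All scoring choices here are hypothetical, not the paper's rules.
    """
    # "Search" stage: activate the seed terms and spread activation
    # along the links, weakening it at every hop.
    activation = {term: 1.0 for term in seeds}
    frontier = dict(activation)
    for _ in range(max_hops):
        next_frontier = {}
        for term, level in frontier.items():
            for neighbor, strength in network.get(term, {}).items():
                gain = level * strength * decay
                # Keep only the strongest activation reaching each node.
                if gain > activation.get(neighbor, 0.0):
                    activation[neighbor] = gain
                    next_frontier[neighbor] = gain
        frontier = next_frontier

    # "Suggest" stage: rank-order the activated terms, excluding the seeds.
    ranked = sorted(
        ((term, level) for term, level in activation.items() if term not in seeds),
        key=lambda pair: (-pair[1], pair[0]),
    )
    return [term for term, _ in ranked]
```

Because the result is an ordered list, a rejected suggestion can simply be skipped in favor of the next candidate, which is a simplified stand-in for the backtracking the abstract mentions.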

An expert system for searching in full-text

Susan Gauch

1989

This project applies expert system technology to the task of searching online full-text documents. We are developing an intelligent search intermediary to help end-users locate relevant passages in large full-text databases. Our expert system automatically reformulates contextual Boolean queries to improve search results and presents retrieved passages in decreasing order of estimated relevance. It differs from other intelligent database functions in two ways: it works with semantically unprocessed text, and the expert system contains a knowledge base of search strategies independent of any particular content domain. The goals for our current project are to demonstrate the feasibility of the approach and to evaluate the effectiveness of the system through a controlled experiment. While the work we report here has limited objectives, the system and techniques are general and can be extended to large, real-world databases.

Improving The Effectiveness of Texts Retrieval using Knowledge-Based Approach

WARSE The World Academy of Research in Science and Engineering

International Journal of Emerging Trends in Engineering Research, 2020

A predicate-based document query language is proposed to allow users to specify, precisely and reliably, both the search criteria and their knowledge of the documents to be retrieved. A guided search tool is built as an intelligent, natural-language-oriented user interface to help users formulate queries; it is supported by a generator of intelligent questions, an inference engine, and a query base. Modern IR systems face the vocabulary problem: the terms used to describe documents are inconsistent with the terms that searchers use to describe their need for knowledge. The researcher built an automated thesaurus using the Vector Space Model (VSM), with cosine similarity as the similarity measure. The study used 242 selected abstracts of Arabic documents, all from the field of information and computer science. This paper aimed at designing and building automated Arabic thesauri based on term similarity, which could be employed in any particular domain or field to improve query expansion and obtain a greater number of relevant documents for the user query. In terms of recall and precision rates, the similarity thesaurus was found to be more capable than the conventional information retrieval system.

A Knowledge-Based Approach to Effective Document Retrieval

Gary Thomas

Journal of Systems Integration, 2001

This paper presents a knowledge-based approach to effective document retrieval. This approach is based on a dual document model that consists of a document type hierarchy and a folder organization. A predicate-based document query language is proposed to enable users to precisely and accurately specify the search criteria and their knowledge about the documents to be retrieved. A guided search tool is developed as an intelligent natural-language-oriented user interface to assist users in formulating queries. Supported by an intelligent question generator, an inference engine, a question base, and a predicate-based query composer, the guided search collects the most important information known to the user to retrieve the documents that satisfy users' particular interests. A knowledge-based query processing and search engine is devised as the core component in this approach. Algorithms are developed for the search engine to effectively and efficiently retrieve the documents that match the query.

A simple solution for an intelligent librarian system

Serge Linckels

In this paper, we describe a method to retrieve documents based on the semantics of a user's question, rather than on keywords. It is based on domain ontology and on RDF (Resource Description Framework) described knowledge. The user enters his question in natural language, which is then converted into an interpreted sentence of semantically relevant words. Each word is classified in a hierarchy of concepts in order to compute the exact semantics and the degree of importance for each of the used words. The interpreted user question is then mapped to a general assertion. Finally, a semantic query is generated, according to the mapped general assertion and the object values derived from the user question. Hence, only semantically relevant documents (resources) are found. This semantic retrieval method was implemented in an e-learning tool called CHESt (Computer History Expert System). We also present in this paper the results drawn from experiments with this tool, which was used as...

An Information Retrieval using weighted Index Terms in Natural Language document collections

Osman Ali

IBIMA 2005, 2005

Indexing a document is the method of describing its content for the sake of easier subsequent retrieval from a document store. This paper describes the implementation of automatic indexing with various term weighting schemes in an IR (Information Retrieval) system, using the CISI document collection, which consists of abstracts of information retrieval papers, and the NPL collection, which consists of abstracts of electronic engineering documents. The system starts with a simple form of text representation in which it extracts keywords and represents documents as vectors of weights reflecting the importance of each keyword in the documents of the collection. It then evaluates and compares the retrieval effectiveness of various search models based on automatic text-word indexing, and presents experimental results from a study of the improvements in text retrieval effectiveness obtained by successively applying these approaches.

Latent Semantic Indexing based Intelligent Information Retrieval System for Digital Libraries

Aswani Kumar Cherukuri

Journal of Computing and Information Technology, 2006

To the information retrieval research community, a digital library can be viewed as an extended information retrieval system. The primary goal of an information retrieval system is to retrieve all the documents that are relevant to the user query. Disparities between the vocabulary of the system's authors and that of their users pose difficulties when information is processed without human intervention. In this paper, we present a novel approach to enhancing the efficiency of the information retrieval system using an intelligent information processing technique. The experiments carried out give encouraging results.

New Information Retrieval Approach Based on Semantic Indexing by Meaning

Lobna Hlaoua

Proceedings of the 16th International Conference on Applied Computing 2019, 2019

An Information Retrieval System (IRS) offers a number of tools and techniques that make it possible to locate and visualize the relevant information needed. This information need is expressed by the user in the form of a natural language query. However, the representation of documents and the query in a traditional IRS leads to a lexical-centered relevance estimation which is, in fact, less efficient than a semantic-focused estimation. As a consequence, documents that are actually relevant are not retrieved if they do not share words with the query, while non-relevant documents that have words in common with the query are retrieved even though at times they do not have the intended meaning. This paper tackles this problem by suggesting a solution at the indexing level of an IRS that improves its performance. To be more precise, we suggest a new approach to semantic indexing that identifies the exact meaning of each term in a document or query through a contextual analysis at the sentence level. In fact, if the system is able to comprehend the need of the user, then it is capable of responding to it. In addition, we suggest a simple method that allows any IR model to be applied to our new index table without changing its original bases, making it faster. In order to validate the proposed approach, the new system is evaluated on several collections, namely "TIME", "BBC", "The Guardian" and "BigThink". The experimental results support our hypothesis when compared with traditional IR approaches.

Computational model for the processing of documents and support to the decision making in systems of information retrieval

Juan Febles

2017

Having or lacking the necessary information at the right time can mean the success or failure of any operation. Since its inception in the 1950s, the field of information retrieval has provided tools that allow users to find answers to their needs and questions. Information retrieval systems are the most widely used internationally, since their interfaces and functionalities are easy to understand. The main function of these systems is to crawl the web, store the information found and then respond to user queries. Because of the large amount of information they hold, search engines are a rich source of knowledge and support decision-making on information published on the web. Companies like Google do not provide concrete information about which models they use to develop the components of their search engines. In addition, the calculation of the relevance of their documents responds to commercial and governmental policies, which is why it is difficult to develop systems as complex as search engines without a computational model that supports their development process. The present article gives the design of a computational model for document processing and decision-making support in information retrieval systems, used in the design, development and deployment of search engines at the national and international level.

Using semantic components to search for domain-specific documents: An evaluation from the system perspective and the user perspective

Peter Vedsted

Information Systems, 2009

We seek to leverage an expert user's knowledge about how information is organized in a domain and how information is presented in typical documents within a particular domain-specific collection, to effectively and efficiently meet the expert's targeted information needs. We have developed the semantic components model to describe important semantic content within documents. The semantic components model for a given collection (based on a general understanding of the type of information needs expected) consists of a set of document classes, where each class has an associated set of semantic components. Each semantic component instance consists of segments of text about a particular aspect of the main topic of the document and may not correspond to structural elements in the document. The semantic components model represents document content in a manner that is complementary to full text and keyword indexing. This paper describes how the semantic components model can be used to improve an information retrieval system. We present experimental evidence from a large interactive searching study that compared the use of semantic components in a system with full text and keyword indexing, where we extended the query language to allow users to search using semantic components, to a base system that did not have semantic components. We evaluate the systems from a system perspective, where semantic components were shown to improve document ranking for precision-oriented searches, and from a user perspective. We also evaluate the systems from a session-based perspective, evaluating not only the results of individual queries but also the results of multiple queries during a single interactive query session.
