Expert document retrieval via semantic measurement
1991, Expert Systems With Applications
https://doi.org/10.1016/0957-4174(91)90040-L…
8 pages
Abstract
A new technology for intelligent full text document retrieval is presented. The retrieval of a document is treated as an expert system problem, recognizing that human document retrieval is expert behavior. The technology is semantic measurement. A working prototype system, LIBRARY, has been built based on the technology. Input is a request for information, in unrestricted technical English; output is all documents with measured content similar to that of the request, ranked in order of relevance. Retrieval is unaffected by similarity or dissimilarity of terms between request and document. LIBRARY's performance is comparable to that of an expert human librarian, representing a significant improvement over traditional document retrieval systems.
Related papers
Proceedings of the 21st annual international conference on Documentation - SIGDOC '03, 2003
Information retrieval systems within voluminous textual documents raise specific problems, such as the choice of the retrieval unit and the relevance of each response. For the selection of the retrieval unit, several solutions have been proposed, such as exploiting the document's logical structure. In most cases, the relevance of a retrieval unit is assessed using criteria such as the number of occurrences of query terms in the document and their position in the document. Few systems are designed in a user-centered way and adapted to the task they are supposed to assist: usually, these systems are based on paper-based documentation recorded electronically, with a standard information retrieval module. Sysrit (technical information retrieval system), a system under development, is aimed at users expert in the search of technical documents. The design of Sysrit is based on observations made on these users. In this system, a technical document is automatically segmented into paragraphs (called information units). In order to improve the relevance of the responses given to the users, Sysrit proposes to tag the information units. Indeed, we make the assumption that a response is all the more relevant when it belongs to the same category as the query. We show that queries and information units can first be categorized into two types: the OBJECT type (which corresponds to object descriptions) and the PRO type (which concerns procedural descriptions). A detailed study of the OBJECT type shows that it is heterogeneous and covers different sub-types: object descriptions (DO), definitions (DFI) and specification descriptions (DF). Upon experimental validation with expert users, we first proposed to categorize each information unit as either OBJECT or PRO, and second to subcategorize the OBJECT units as DO, DFI or DF. We focus here on queries more than on information units.
A corpus analysis and a validation by expert users confirm that this categorization can also be used to characterize queries. Moreover, the results of this analysis enable us to propose rules for automatically recognizing and tagging each type of query.
2022
A methodology for the design of document retrieval systems is presented. First, a composite index term weighting model is developed based on term frequency statistics, including document frequency, relative frequency within document and relative frequency within collection, which can be adjusted by selecting various coefficients to fit different indexing environments. Then, a composite retrieval model is proposed to process a user's information request, expressed as a weighted Phrase-Oriented Fixed-Level Expression (POFLE) that may apply more than Boolean operators, in two phases. The first phase searches for documents that are topically relevant to the information request by means of a descriptor matching mechanism, which incorporates a partial matching facility based on a structurally restricted relationship imposed by the indexing model and is more general than the matching functions of the traditional Boolean model and vector space model. The second phase ranks these topically relevant documents, by means of two types of heuristic-based selection rules and a knowledge-based evaluation function, in descending order of a preference score that predicts the combined effect of user preference for quality, recency, fitness and reachability of documents.
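As a rough illustration of the composite index term weighting idea described above, the following Python sketch combines the three statistics the abstract names with adjustable coefficients. The linear form, the logarithmic treatment of document frequency, and the names `composite_weights`, `a`, `b` and `c` are all assumptions made for this example; the abstract does not give the actual formula.

```python
import math
from collections import Counter


def composite_weights(docs, a=1.0, b=1.0, c=1.0):
    """Return one {term: weight} dict per document.

    Hypothetical combination (not the paper's formula):
        weight = a * relative frequency within the document
               + b * log(N / document frequency)
               - c * relative frequency within the collection
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    doc_freq = Counter()   # number of documents containing each term
    coll_freq = Counter()  # total occurrences of each term in the collection
    for tokens in tokenized:
        doc_freq.update(set(tokens))
        coll_freq.update(tokens)
    total_tokens = sum(coll_freq.values())

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        doc_weights = {}
        for term, n in counts.items():
            rel_in_doc = n / len(tokens)
            rarity = math.log(n_docs / doc_freq[term])  # 0 if the term is in every document
            rel_in_coll = coll_freq[term] / total_tokens
            doc_weights[term] = a * rel_in_doc + b * rarity - c * rel_in_coll
        weights.append(doc_weights)
    return weights


docs = ["fuzzy retrieval retrieval", "expert retrieval system"]
weights = composite_weights(docs)
# "fuzzy" appears in only one document, so it outweighs the ubiquitous "retrieval".
print(weights[0]["fuzzy"] > weights[0]["retrieval"])
```

Changing `a`, `b` and `c` shifts the balance between within-document frequency and collection-level rarity, which is one plausible reading of "selecting various coefficients to fit different indexing environments".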
ACM SIGIR Forum, 2006
With the explosive growth of information, it is becoming increasingly difficult to retrieve the relevant documents with statistical means only. This poses new challenges to the IR community and motivates researchers to look for intelligent Information Retrieval (IR) systems that search and/or filter information automatically based on some higher level of understanding. This higher level of understanding can only be achieved through processing of text based on semantics, which is not possible by considering a document as a "bag of words". We make a humble effort in this direction by investigating techniques that attempt to utilize semantics to improve effectiveness in IR. The hypothesis is that with an improved representation of documents and by incorporating limited semantic knowledge, it is possible to improve the effectiveness of an IR system. We propose the use of Conceptual Graph (CG) formalism for representing text. The level of semantic details to be capture...
Information Processing & Management, 1987
This paper examines the potential of recent work in artificial intelligence for the development of more effective information retrieval systems. The primary task in this research has been to examine and define the role of an expert system in the domain of bibliographic retrieval. Once such a goal can be described the available knowledge representations and techniques can be evaluated. This paper examines the role of an expert bibliographic retrieval system, examines an artificial intelligence view of information retrieval, and then describes a prototype expert information retrieval system that has been designed and implemented.
This paper describes ERSE, an expert system for information retrieval in electronic databases. The objective of the system is to support engineering professionals in formulating proper queries and submitting them to a retrieval database. The system consists of: (a) a knowledge base, which is a thesaurus of terms and semantic relationships, implemented as a semantic network; (b) a search and evaluation mechanism: the inference engine, which conducts a guided search aimed at finding appropriate query terms. While doing so it invokes relevant knowledge, evaluates it and suggests final findings to the user; (c) a database of patents in the domain of error-correction codes, implemented with a relational database management system; (d) a retrieval mechanism, which measures the similarity between the system-generated weighted query and the index terms of patents, and returns a rank-ordered set of patents. The user is then able to provide feedback and improve the query accordingly; (e) user interfaces, including system capability to explain its findings and decisions. The system is implemented in Prolog, C and Ingres DBMS, under Unix. The system design is described, and examples of its operation and evaluation of its performance are given.
Advances in Pattern Recognition, 2007
Encyclopedia of Language & Linguistics, 2006
Document Retrieval is the computerized process of producing a relevance ranked list of documents in response to an inquirer's request by comparing their request to an automatically produced index of the documents in the system. Everyone uses such systems today in the form of web-based search engines. While evolving from a fairly small discipline in the 1940s, to a large, profitable industry today, the field has maintained a healthy research focus, supported by test collections and large-scale annual comparative tests of systems. A document retrieval system is comprised of three core modules: document processor, query analyzer, and matching function. There are several theoretical models on which document retrieval systems are based: Boolean, Vector Space, Probabilistic, and Language Model.
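The three core modules named above can be sketched in a few lines of Python using the Vector Space model, one of the theoretical models the entry lists. This is a minimal illustration under simplifying assumptions (whitespace tokenization, raw term counts, cosine similarity); the function names `process_document`, `analyze_query`, `cosine` and `rank` are made up for this example.

```python
import math
from collections import Counter


def process_document(text):
    """Document processor: turn raw text into a term-frequency vector."""
    return Counter(text.lower().split())


def analyze_query(text):
    """Query analyzer: represent the request in the same vector form."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Matching function: cosine similarity between two sparse vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank(query, docs):
    """Return ids of documents with non-zero similarity, most relevant first."""
    q = analyze_query(query)
    index = {doc_id: process_document(text) for doc_id, text in docs.items()}
    scored = [(cosine(q, vec), doc_id) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # ties broken by id
    return [doc_id for score, doc_id in scored if score > 0]


docs = {
    "a": "expert systems for document retrieval",
    "b": "fuzzy set theory",
    "c": "document retrieval",
}
print(rank("document retrieval", docs))  # ['c', 'a']
```

Document "b" shares no terms with the request and is dropped, which is exactly the dependence on term overlap that purely lexical matching implies.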
… Web Applications and …, 2008
The traditional strategy performed by Information Retrieval (IR) systems is ranked keyword search: for a given query, a list of documents, ordered by relevance, is returned. Relevance computation is primarily driven by a basic string-matching operation. To date, several attempts have been made to deviate from the traditional keyword search paradigm, often by introducing some techniques to capture word meanings in documents and queries. The general feeling is that dealing explicitly with only semantic information does not improve significantly the performance of text retrieval systems. This paper presents SENSE (SEmantic N-levels Search Engine), an IR system that tries to overcome the limitations of the ranked keyword approach, by introducing semantic levels which integrate (and not simply replace) the lexical level represented by keywords. Semantic levels provide information about word meanings, as described in a reference dictionary, and named entities. We show how SENSE is able to manage documents indexed at three separate levels, keywords, word meanings, and entities, as well as to combine keyword search with semantic information provided by the two other indexing levels.
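One simple way to picture semantic levels that integrate with, rather than replace, the keyword level is a weighted sum of per-level relevance scores. The Python sketch below is only an illustration of that idea: the weighted-sum rule, the level names, the scores and the weights are hypothetical and are not taken from SENSE itself.

```python
def combine_levels(level_scores, level_weights):
    """Merge per-level relevance scores into one ranked list of document ids.

    level_scores:  {level name: {doc id: score}}
    level_weights: {level name: weight}; levels without a weight contribute 0.
    """
    combined = {}
    for level, scores in level_scores.items():
        weight = level_weights.get(level, 0.0)
        for doc_id, score in scores.items():
            combined[doc_id] = combined.get(doc_id, 0.0) + weight * score
    # Highest combined score first; ties broken by document id.
    return sorted(combined, key=lambda doc_id: (-combined[doc_id], doc_id))


# Hypothetical scores from three indexing levels.
level_scores = {
    "keyword": {"d1": 0.9, "d2": 0.4},
    "meaning": {"d2": 0.8},
    "entity": {"d2": 0.5, "d3": 0.4},
}
level_weights = {"keyword": 0.5, "meaning": 0.3, "entity": 0.2}
print(combine_levels(level_scores, level_weights))  # ['d2', 'd1', 'd3']
```

Here "d1" wins on keywords alone, but "d2" overtakes it once the word-meaning and entity levels contribute, and "d3" is retrieved even though it matches no keyword.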
Information Processing & Management, 1987
The issue of exploiting user modeling techniques in the framework of cooperative interfaces to complex artificial systems has recently received increasing attention. In this paper we present the IR-NLI II system, an expert interface that allows casual users to access online information retrieval systems and encompasses user modeling capabilities. More specifically, an illustration of the user modeling subsystem is given by describing the organization of the user model proposed for the particular application area, together with its use during system operation. The techniques utilized for the construction of the model are presented as well. They are based on the use of stereotypes, which are descriptions of typical classes of users. More specifically, they include both declarative and procedural knowledge for describing the features of the class to which the stereotype is related, for assigning a user to that class, and for acquiring and validating the necessary information during system operation.
I. INTRODUCTION
The development of cooperative user interfaces for supporting information retrieval systems has become a well-defined research and application field. It comprises both traditional systems, including menu-driven interaction, extensive online help, and keyword recognition and extraction (consider, for example, the work of Marcus [1,2] and Doszkos and Rapp [3]), and more advanced interfaces based on artificial intelligence techniques. This class includes, among others, the work of Pollitt [4-6], the systems developed by Croft and Thompson [7] and by Defude [8,9], and the IR-NLI interface designed and implemented by the authors [10-12]. From a general viewpoint, the design of an expert interface to an information retrieval system encompasses two major tasks:
- How to overcome the linguistic gap between the user and the system.
- How to support the user at the conceptual level in the analysis of his information needs, in the formulation of an appropriate search strategy, and in the evaluation of the obtained results.
These issues, in turn, pose several technical problems, which include:
- Natural language understanding and dialogue management.
- Representation of subject knowledge in the domain of the search (including available data bases, their content, terminology and organization).
- Representation of technical knowledge about information retrieval (session structure, query language, techniques for strategy construction).
- Elicitation and representation of the intermediary's skill and expertise.
- Design of appropriate problem-solving methods for knowledge processing and inference management.
Automatica, 1983
Constructing natural language interfaces to computer systems often requires achievement of advanced reasoning and expert capabilities in addition to basic natural language understanding. In this paper the above issue is tackled in the frame of an actual application concerning the design of a natural language interface for interactive document retrieval. After a short discussion of the peculiarities of this application, which requires both natural language understanding and reasoning capabilities, the general architecture and fundamental design criteria of a system presently being developed at the University of Udine are presented. The system, named IR-NLI, is aimed at allowing non-technical users to directly access through natural language the services offered by online databases. Attention is later focused on the basic functions of IR-NLI, namely understanding, dialogue and reasoning. An example of interaction with IR-NLI is fully worked out to introduce the main features of the system. Knowledge representation methods and algorithms adopted are then illustrated. Perspectives and direction for future research are also discussed.

References (7)
- Blair, D.C., & Maron, M.E. (1985). An evaluation of retrieval effectiveness for a full-text document-retrieval system. Communications of the ACM, 28, 289-299.
- Thompson, W.B., & Thompson, R.H. (1987). I3R: A new approach to the design of document retrieval systems. Journal of the American Society for Information Science, 38(6), 389-404.
- De Salvo, D.A., Glamm, A.E., & Liebowitz, J. (1987). Structured design of an expert system prototype at the National Archives. In B.G. Silverman (Ed.), Expert systems for business (pp. 40-77). Reading, MA: Addison-Wesley.
- Ossorio, P.G. (1966). Classification space. Multivariate Behavioral Research, 1, 479-524.
- Radecki, T. (1979). Fuzzy set theoretical approach to document retrieval. Information Processing and Management, 15(5), 247-259.
- Salton, G. Automatic text processing. Reading, MA: Addison-Wesley.