The SRI MUC-5 JV-FASTUS Information Extraction System
1993, MUC
15 pages
Abstract
The SRI MUC-5 JV-FASTUS information extraction system extracts task-relevant information from natural-language text. Originally developed for the MUC-4 terrorism-reports task, FASTUS was adapted for MUC-5 to the joint-venture domain: identifying the partners, products, and sites of business tie-ups reported in news articles. Unlike comprehensive text-understanding systems, FASTUS uses cascaded nondeterministic finite-state transducers to perform only the task-specific analysis required, which permits rapid development and high performance across domains. The MUC-5 evaluation confirmed the system's effectiveness and its readiness for large-scale, real-world extraction applications.
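As a rough, hypothetical illustration of the finite-state idea (this is not SRI's code; the chunk tags, pattern, and slot names are invented for the example), the sketch below walks a transition table over phrase-level chunks and records slot fillers for a joint-venture event:

```python
# Minimal sketch of finite-state pattern extraction (hypothetical,
# not SRI's implementation). States advance over phrase-level chunks;
# transitions that match may record a slot filler for the event template.

# Each chunk is (tag, text), as an earlier chunking stage might produce.
CHUNKS = [
    ("CN", "BRIDGESTONE SPORTS"), ("VG", "SAID"), ("NG", "IT"),
    ("VG", "HAS SET UP"), ("NG", "A JOINT VENTURE"),
    ("PREP", "IN"), ("LOC", "TAIWAN"),
]

# Transition table: state -> ordered (predicate, next_state, slot) options.
PATTERN = {
    0: [(lambda t, x: t == "CN", 1, "partner")],
    1: [(lambda t, x: t == "VG" and "SET UP" in x, 2, None),
        (lambda t, x: True, 1, None)],           # skip chunks until the verb
    2: [(lambda t, x: t == "NG" and "JOINT VENTURE" in x, 3, None)],
    3: [(lambda t, x: t == "PREP" and x == "IN", 4, None)],
    4: [(lambda t, x: t == "LOC", 5, "site")],
}
FINAL = 5

def match(chunks):
    state, slots = 0, {}
    for tag, text in chunks:
        for pred, nxt, slot in PATTERN.get(state, []):
            if pred(tag, text):
                if slot:
                    slots[slot] = text
                state = nxt
                break
        if state == FINAL:
            return slots
    return None

print(match(CHUNKS))  # -> {'partner': 'BRIDGESTONE SPORTS', 'site': 'TAIWAN'}
```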
Related papers
Computing Research Repository, 1997
FASTUS is a system for extracting information from natural language text for entry into a database and for other applications. It works essentially as a cascaded, nondeterministic finite-state automaton. There are five stages in the operation of FASTUS. In Stage 1, names and other fixed form expressions are recognized. In Stage 2, basic noun groups, verb groups, and prepositions and some other particles are recognized. In Stage 3, certain complex noun groups and verb groups are constructed. Patterns for events of interest are identified in Stage 4 and corresponding "event structures" are built. In Stage 5, distinct event structures that describe the same event are identified and merged, and these are used in generating database entries. This decomposition of language processing enables the system to do exactly the right amount of domain-independent syntax, so that domain-dependent semantic and pragmatic processing can be applied to the right larger-scale structures. FASTUS is very efficient and effective, and has been used successfully in a number of applications.
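Schematically, the five-stage cascade can be read as function composition, each stage a transducer over the previous stage's output. The skeleton below uses hypothetical stage names and elides all bodies; it is a sketch of the architecture, not the actual system:

```python
# Skeleton of the five-stage FASTUS cascade (stage functions are
# placeholders; the real transducers are far richer than this).

def recognize_names(text):         # Stage 1: names, fixed-form expressions
    ...

def chunk_phrases(stage1_out):     # Stage 2: noun groups, verb groups, particles
    ...

def build_complex_groups(chunks):  # Stage 3: complex noun and verb groups
    ...

def match_event_patterns(groups):  # Stage 4: domain patterns -> event structures
    ...

def merge_events(events):          # Stage 5: merge coreferring event structures
    ...

def extract(text):
    # Only Stages 4 and 5 carry domain-dependent knowledge; Stages 1-3
    # perform the domain-independent syntax described above.
    return merge_events(match_event_patterns(
        build_complex_groups(chunk_phrases(recognize_names(text)))))
```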
2013
In this paper we present a novel methodology for automatic information extraction from natural language texts, based on the integration of linguistic rules, multiple ontologies, and inference resources, together with an abstraction layer for linguistic annotation and data representation. The SAURON system was developed to implement and integrate the phases of the methodology. The legal domain was used as the case-study scenario, through a corpus collected from the State Superior Court website in Brazil. The main contribution is the exploration of the flexibility of linguistic rules and domain-knowledge representation through their manipulation and integration by a reasoning system. This makes it possible for the system to interact continuously with linguistic and domain experts in order to improve the set of linguistic rules or the ontology components. The results from the case study indicate that the proposed approach is effective for the l...
Intelligent Data Analysis, 2008
In today's information age, the amount of text documents available electronically (on the Web, on corporate intranets, on news wires and elsewhere) is overwhelming. Search engines and information retrieval, while useful to find documents that satisfy a certain query, offer little help with analyzing the unstructured documents themselves. Text Mining is the automated process of analyzing unstructured, natural language text in order to discover information and knowledge that are difficult to retrieve. Information Extraction (IE) centers on finding entities and relations in free text and provides a solid foundation for text mining. In this paper we present a modular IE system, based on the DIAL language. DIAL allows users to implement IE solutions for various domains rapidly, based on a common Natural Language Processing (NLP) infrastructure. We demonstrate in detail an implementation of a system for extracting relations in the intelligence news domain. We present an evaluation of our system and discuss enhancements for other domains, such as emails.
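DIAL's actual rule syntax is not shown in the abstract above; as a stand-in, here is a plain-Python analogue of one declarative relation rule (the pattern, the appointment relation, and the example text are all invented for illustration):

```python
import re

# A rule-based relation pattern in the spirit of a rule language like
# DIAL (this is ordinary Python; DIAL's real syntax differs).

# Hypothetical rule: PERSON "was appointed" ROLE "of" ORG
APPOINTMENT = re.compile(
    r"(?P<person>[A-Z][a-z]+ [A-Z][a-z]+) was appointed "
    r"(?P<role>[a-z ]+) of (?P<org>[A-Z][A-Za-z ]+)"
)

def extract_appointments(text):
    return [m.groupdict() for m in APPOINTMENT.finditer(text)]

print(extract_appointments(
    "John Smith was appointed chief executive of Acme Corp."))
# [{'person': 'John Smith', 'role': 'chief executive', 'org': 'Acme Corp'}]
```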
Multi-source, Multilingual Information Extraction and Summarization, 2012
In this chapter we present a brief overview of Information Extraction, an area of natural language processing that deals with finding factual information in free text. In formal terms, facts are structured objects, such as database records. Such a record may capture a real-world entity with its attributes mentioned in text, or a real-world event, occurrence, or state, with its arguments or actors: who did what to whom, where and when. Information is typically sought in a particular target setting, e.g., corporate mergers and acquisitions. Searching for specific, targeted factual information constitutes a large proportion of all searching activity on the part of information consumers. There has been sustained interest in Information Extraction for over two decades, due to its conceptual simplicity on the one hand and its potential utility on the other. Although the targeted nature of the task makes it more tractable than some of the more open-ended tasks in NLP, it is replete with challenges as the information landscape evolves, which also makes it an exciting research subject. From the chapter's introduction: recent decades have witnessed a rapid proliferation of textual information available in digital form in a myriad of repositories on the Internet and intranets; a significant part of this information, e.g., online news, government documents, corporate reports, legal acts, medical alerts and records, court rulings, and social media…
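Concretely, a "structured object" for the mergers-and-acquisitions setting mentioned above might look like the record below (field names are illustrative, not taken from the chapter):

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative target record: who did what to whom, where and when.
@dataclass
class AcquisitionEvent:
    acquirer: str                    # who
    target: str                      # to whom
    action: str = "acquired"         # did what
    location: Optional[str] = None   # where
    date: Optional[str] = None       # when

event = AcquisitionEvent(acquirer="Acme Corp", target="Widget Inc",
                         date="1993-07-01")
```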
2009
An Information Extraction activity is a complex process that can be decomposed into several tasks. This decomposition brings the following advantages: (i) for each task it becomes possible to choose the best technique independently of the other tasks; (ii) an Information Extraction program can be developed as a set of independent modules (one for each task), making it easy to perform local debugging; (iii) it becomes easy to customize the Information Extraction activity through reordering, selection, or even composition of the tasks. This paper presents a commonly used decomposition of Information Extraction activities and gives details about the most commonly used machine-learning and rule-based techniques for each task.
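Advantage (iii) falls out naturally if the pipeline is represented as an ordered list of independent task modules, as in this sketch (the task names and document-state layout are hypothetical):

```python
# Modular IE pipeline sketch: each task is an independent function from
# document state to document state, so tasks can be reordered, dropped,
# or swapped for a better technique without touching the others.

def tokenize(doc):
    doc["tokens"] = doc["text"].split()
    return doc

def tag_entities(doc):
    doc["entities"] = []      # plug in any NER technique here
    return doc

def extract_relations(doc):
    doc["relations"] = []     # rule-based or machine-learned, independently
    return doc

PIPELINE = [tokenize, tag_entities, extract_relations]

def run(text, pipeline=PIPELINE):
    doc = {"text": text}
    for task in pipeline:     # customization = editing this list
        doc = task(doc)
    return doc
```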
Advancing Information Management through Semantic Web Concepts and Ontologies, 2013
Natural Language Processing (NLP) provides tools to extract explicitly stated information from text documents; these include named entity recognition (NER) and part-of-speech (POS) tagging. The extracted information represents discrete entities in the text and some of the relationships that may exist among them. To perform intelligent analysis on the extracted information, a context has to exist in which this information is placed. The context provides an environment for linking information extracted from multiple documents and offers a big picture of the domain. Analysis can then be provided by adding inference capabilities to the environment. The ODIX platform provides an environment that brings together information extraction, ontology, and intelligent analysis. The platform design relies on existing NLP tools for its information extraction capabilities, and it uses a Web crawler to collect text documents from the Web. The context is provided by a domain ontology loaded at run time. The ontology offers limited inference capabilities, and external intelligent agents offer more advanced reasoning capabilities. User involvement is key to the success of the analysis process: at every step, the user has the opportunity to direct the system, set selection criteria, correct errors, or add additional information.
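For a concrete taste of the NER output such a platform consumes, here is a minimal example using an off-the-shelf tool (spaCy is chosen purely for illustration; the paper does not name specific tools):

```python
# Named entity recognition with an existing NLP tool (spaCy, used only
# as an example). Requires: pip install spacy
# and: python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Bridgestone Sports set up a joint venture in Taiwan.")

for ent in doc.ents:
    print(ent.text, ent.label_)  # e.g. "Bridgestone Sports" ORG, "Taiwan" GPE
```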
1994
Many natural language researchers are now turning their attention to a relatively new task orientation known as information extraction. Information extraction systems are predicated on an I/O orientation that makes it possible to conduct formal evaluations and meaningful cross-system comparisons. This paper presents the challenge of information extraction and shows how information extraction systems are currently being evaluated. We describe a specific system developed at the University of Massachusetts, identify key research issues of general interest, and conclude with some observations about the role of performance evaluations as a stimulus for basic research.
1992
This paper first briefly describes the architecture of PLUM, BBN's text processing system, and then reports on experiments evaluating the effectiveness of the design at the component level. Three features are unusual in PLUM's architecture: a domain-independent deterministic parser, processing of the resulting fragments at the semantic and discourse levels, and probabilistic models.

References (1)
- CN: "BRIDGESTONE " (0,1) Head: BRIDGESTONE NG: "SPORTS " (1,2) Head: SPORTS ACTIVE/PASSIVE: "SAID " (3,4) Head: SAID NG: "FRIDAY " (4,5) Head: FRIDAY NG: "IT " (5,6) Head: IT ACTIVE: "HAS SET " (6,8) Head: SET PREP: "UP " (8,9) Head: UP NG: "JOINT-VENTURE " (9,12) Head: JOINT-VENTURE PREP: "IN " (12,13) Head: IN LOC: "TAIWAN " (13,14) Head: TAIWAN PREP: "WITH " (14,15) Head: WITH NG: "LOCAL CONCERN " (15,18) Head: CONCERN CONJ: "AND " (18,19) Head: AND NG: "JAPANESE TRADING HOUSE " (19,23) Head: HOUSE INF: "TO PRODUCE " (23,25) Head: PRODUCE NG: "GOLF CLUBS " (25,27) Head: CLUBS INF: "TO BE " (27,29) Head: BE ACTIVE/PASSIVE: "SHIPPED " (29,30) Head: SHIPPED PREP: "TO " (30,31) Head: TO LOC: "JAPAN " (31,32) Head: JAPAN <ACTIVITY-592-22> := INDUSTRY: <INDUSTRY-592-22> ACTIVITY-SITE: (Taiwan (COUNTRY) -)