The SRI MUC-5 JV-FASTUS Information Extraction System
1993, MUC
15 pages
Abstract
The SRI MUC-5 JV-FASTUS information extraction system extracts task-relevant information from natural-language text. Originally developed for the MUC-4 terrorism-reports task, FASTUS was adapted for MUC-5 to the joint-venture domain: identifying the partners, products, and sites of business tie-ups reported in news articles. Unlike comprehensive text-understanding systems, FASTUS uses cascaded nondeterministic finite-state transducers to perform only the task-specific analysis required, which permits rapid development and high performance across domains. The MUC-5 evaluation confirmed the system's effectiveness and its readiness for large-scale, real-world extraction applications.
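As a rough, hypothetical illustration of the finite-state idea (this is not SRI's code; the chunk tags, pattern, and slot names are invented for the example), the sketch below walks a transition table over phrase-level chunks and records slot fillers for a joint-venture event:

```python
# Minimal sketch of finite-state pattern extraction (hypothetical,
# not SRI's implementation). States advance over phrase-level chunks;
# transitions that match may record a slot filler for the event template.

# Each chunk is (tag, text), as an earlier chunking stage might produce.
CHUNKS = [
    ("CN", "BRIDGESTONE SPORTS"), ("VG", "SAID"), ("NG", "IT"),
    ("VG", "HAS SET UP"), ("NG", "A JOINT VENTURE"),
    ("PREP", "IN"), ("LOC", "TAIWAN"),
]

# Transition table: state -> ordered (predicate, next_state, slot) options.
PATTERN = {
    0: [(lambda t, x: t == "CN", 1, "partner")],
    1: [(lambda t, x: t == "VG" and "SET UP" in x, 2, None),
        (lambda t, x: True, 1, None)],           # skip chunks until the verb
    2: [(lambda t, x: t == "NG" and "JOINT VENTURE" in x, 3, None)],
    3: [(lambda t, x: t == "PREP" and x == "IN", 4, None)],
    4: [(lambda t, x: t == "LOC", 5, "site")],
}
FINAL = 5

def match(chunks):
    state, slots = 0, {}
    for tag, text in chunks:
        for pred, nxt, slot in PATTERN.get(state, []):
            if pred(tag, text):
                if slot:
                    slots[slot] = text
                state = nxt
                break
        if state == FINAL:
            return slots
    return None

print(match(CHUNKS))  # -> {'partner': 'BRIDGESTONE SPORTS', 'site': 'TAIWAN'}
```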
Related papers
Computing Research Repository, 1997
FASTUS is a system for extracting information from natural language text for entry into a database and for other applications. It works essentially as a cascaded, nondeterministic finite-state automaton. There are five stages in the operation of FASTUS. In Stage 1, names and other fixed form expressions are recognized. In Stage 2, basic noun groups, verb groups, and prepositions and some other particles are recognized. In Stage 3, certain complex noun groups and verb groups are constructed. Patterns for events of interest are identified in Stage 4 and corresponding "event structures" are built. In Stage 5, distinct event structures that describe the same event are identified and merged, and these are used in generating database entries. This decomposition of language processing enables the system to do exactly the right amount of domain-independent syntax, so that domain-dependent semantic and pragmatic processing can be applied to the right larger-scale structures. FASTUS is very efficient and effective, and has been used successfully in a number of applications.
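Schematically, the five-stage cascade can be read as function composition, each stage a transducer over the previous stage's output. The skeleton below uses hypothetical stage names and elides all bodies; it is a sketch of the architecture, not the actual system:

```python
# Skeleton of the five-stage FASTUS cascade (stage functions are
# placeholders; the real transducers are far richer than this).

def recognize_names(text):         # Stage 1: names, fixed-form expressions
    ...

def chunk_phrases(stage1_out):     # Stage 2: noun groups, verb groups, particles
    ...

def build_complex_groups(chunks):  # Stage 3: complex noun and verb groups
    ...

def match_event_patterns(groups):  # Stage 4: domain patterns -> event structures
    ...

def merge_events(events):          # Stage 5: merge coreferring event structures
    ...

def extract(text):
    # Only Stages 4 and 5 carry domain-dependent knowledge; Stages 1-3
    # perform the domain-independent syntax described above.
    return merge_events(match_event_patterns(
        build_complex_groups(chunk_phrases(recognize_names(text)))))
```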
2013
In this paper we present a novel methodology for automatic information extraction from natural language texts, based on the integration of linguistic rules, multiple ontologies, and inference resources, together with an abstraction layer for linguistic annotation and data representation. The SAURON system was developed to implement and integrate the phases of the methodology. The legal domain was used as the case-study scenario, through a corpus collected from the State Superior Court website in Brazil. The main contribution is the exploration of the flexibility of linguistic rules and domain-knowledge representation through their manipulation and integration by a reasoning system. This makes it possible for the system to interact continuously with linguistic and domain experts in order to improve the set of linguistic rules or the ontology components. The results from the case study indicate that the proposed approach is effective for the l...
Intelligent Data Analysis, 2008
In today's information age, the amount of text documents available electronically (on the Web, on corporate intranets, on news wires and elsewhere) is overwhelming. Search engines and information retrieval, while useful to find documents that satisfy a certain query, offer little help with analyzing the unstructured documents themselves. Text Mining is the automated process of analyzing unstructured, natural language text in order to discover information and knowledge that are difficult to retrieve. Information Extraction (IE) centers on finding entities and relations in free text and provides a solid foundation for text mining. In this paper we present a modular IE system, based on the DIAL language. DIAL allows users to implement IE solutions for various domains rapidly, based on a common Natural Language Processing (NLP) infrastructure. We demonstrate in detail an implementation of a system for extracting relations in the intelligence news domain. We present an evaluation of our system and discuss enhancements for other domains, such as emails.
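DIAL's actual rule syntax is not shown in the abstract above; as a stand-in, here is a plain-Python analogue of one declarative relation rule (the pattern, the appointment relation, and the example text are all invented for illustration):

```python
import re

# A rule-based relation pattern in the spirit of a rule language like
# DIAL (this is ordinary Python; DIAL's real syntax differs).

# Hypothetical rule: PERSON "was appointed" ROLE "of" ORG
APPOINTMENT = re.compile(
    r"(?P<person>[A-Z][a-z]+ [A-Z][a-z]+) was appointed "
    r"(?P<role>[a-z ]+) of (?P<org>[A-Z][A-Za-z ]+)"
)

def extract_appointments(text):
    return [m.groupdict() for m in APPOINTMENT.finditer(text)]

print(extract_appointments(
    "John Smith was appointed chief executive of Acme Corp."))
# [{'person': 'John Smith', 'role': 'chief executive', 'org': 'Acme Corp'}]
```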
Multi-source, Multilingual Information Extraction and Summarization, 2012
In this chapter we present a brief overview of Information Extraction, an area of natural language processing that deals with finding factual information in free text. In formal terms, facts are structured objects, such as database records. Such a record may capture a real-world entity with its attributes mentioned in text, or a real-world event, occurrence, or state, with its arguments or actors: who did what to whom, where and when. Information is typically sought in a particular target setting, e.g., corporate mergers and acquisitions. Searching for specific, targeted factual information constitutes a large proportion of all searching activity on the part of information consumers. There has been sustained interest in Information Extraction for over two decades, due to its conceptual simplicity on the one hand and its potential utility on the other. Although the targeted nature of the task makes it more tractable than some of the more open-ended tasks in NLP, it is replete with challenges as the information landscape evolves, which also makes it an exciting research subject. From the chapter's introduction: recent decades have witnessed a rapid proliferation of textual information available in digital form in a myriad of repositories on the Internet and intranets; a significant part of this information, e.g., online news, government documents, corporate reports, legal acts, medical alerts and records, court rulings, and social media…
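Concretely, a "structured object" for the mergers-and-acquisitions setting mentioned above might look like the record below (field names are illustrative, not taken from the chapter):

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative target record: who did what to whom, where and when.
@dataclass
class AcquisitionEvent:
    acquirer: str                    # who
    target: str                      # to whom
    action: str = "acquired"         # did what
    location: Optional[str] = None   # where
    date: Optional[str] = None       # when

event = AcquisitionEvent(acquirer="Acme Corp", target="Widget Inc",
                         date="1993-07-01")
```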
2009
An Information Extraction activity is a complex process that can be decomposed into several tasks. This decomposition brings the following advantages: (i) for each task it becomes possible to choose the best technique independently of the other tasks; (ii) an Information Extraction program can be developed as a set of independent modules (one for each task), making it easy to perform local debugging; (iii) it becomes easy to customize the Information Extraction activity through reordering, selection, or even composition of the tasks. This paper presents a commonly used decomposition of Information Extraction activities and gives details about the most commonly used machine-learning and rule-based techniques for each task.
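Advantage (iii) falls out naturally if the pipeline is represented as an ordered list of independent task modules, as in this sketch (the task names and document-state layout are hypothetical):

```python
# Modular IE pipeline sketch: each task is an independent function from
# document state to document state, so tasks can be reordered, dropped,
# or swapped for a better technique without touching the others.

def tokenize(doc):
    doc["tokens"] = doc["text"].split()
    return doc

def tag_entities(doc):
    doc["entities"] = []      # plug in any NER technique here
    return doc

def extract_relations(doc):
    doc["relations"] = []     # rule-based or machine-learned, independently
    return doc

PIPELINE = [tokenize, tag_entities, extract_relations]

def run(text, pipeline=PIPELINE):
    doc = {"text": text}
    for task in pipeline:     # customization = editing this list
        doc = task(doc)
    return doc
```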
Advancing Information Management through Semantic Web Concepts and Ontologies, 2013
Natural Language Processing (NLP) provides tools to extract explicitly stated information from text documents; these include named entity recognition (NER) and part-of-speech (POS) tagging. The extracted information represents discrete entities in the text and some of the relationships that may exist among them. To perform intelligent analysis on the extracted information, a context has to exist in which this information is placed. The context provides an environment for linking information extracted from multiple documents and offers a big picture of the domain. Analysis can then be provided by adding inference capabilities to the environment. The ODIX platform provides an environment that brings together information extraction, ontology, and intelligent analysis. The platform design relies on existing NLP tools for its information extraction capabilities, and it uses a Web crawler to collect text documents from the Web. The context is provided by a domain ontology loaded at run time. The ontology offers limited inference capabilities, and external intelligent agents offer more advanced reasoning capabilities. User involvement is key to the success of the analysis process: at every step, the user has the opportunity to direct the system, set selection criteria, correct errors, or add additional information.
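For a concrete taste of the NER output such a platform consumes, here is a minimal example using an off-the-shelf tool (spaCy is chosen purely for illustration; the paper does not name specific tools):

```python
# Named entity recognition with an existing NLP tool (spaCy, used only
# as an example). Requires: pip install spacy
# and: python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Bridgestone Sports set up a joint venture in Taiwan.")

for ent in doc.ents:
    print(ent.text, ent.label_)  # e.g. "Bridgestone Sports" ORG, "Taiwan" GPE
```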
1994
Many natural language researchers are now turning their attention to a relatively new task orientation known as information extraction. Information extraction systems are predicated on an I/O orientation that makes it possible to conduct formal evaluations and meaningful cross-system comparisons. This paper presents the challenge of information extraction and shows how information extraction systems are currently being evaluated. We describe a specific system developed at the University of Massachusetts, identify key research issues of general interest, and conclude with some observations about the role of performance evaluations as a stimulus for basic research.
1992
This paper first briefly describes the architecture of PLUM, BBN's text processing system, and then reports on experiments evaluating the effectiveness of the design at the component level. Three features are unusual in PLUM's architecture: a domain-independent deterministic parser, processing of the resulting fragments at the semantic and discourse levels, and probabilistic models.

References (1)
- CN: "BRIDGESTONE " (0,1) Head: BRIDGESTONE NG: "SPORTS " (1,2) Head: SPORTS ACTIVE/PASSIVE: "SAID " (3,4) Head: SAID NG: "FRIDAY " (4,5) Head: FRIDAY NG: "IT " (5,6) Head: IT ACTIVE: "HAS SET " (6,8) Head: SET PREP: "UP " (8,9) Head: UP NG: "JOINT-VENTURE " (9,12) Head: JOINT-VENTURE PREP: "IN " (12,13) Head: IN LOC: "TAIWAN " (13,14) Head: TAIWAN PREP: "WITH " (14,15) Head: WITH NG: "LOCAL CONCERN " (15,18) Head: CONCERN CONJ: "AND " (18,19) Head: AND NG: "JAPANESE TRADING HOUSE " (19,23) Head: HOUSE INF: "TO PRODUCE " (23,25) Head: PRODUCE NG: "GOLF CLUBS " (25,27) Head: CLUBS INF: "TO BE " (27,29) Head: BE ACTIVE/PASSIVE: "SHIPPED " (29,30) Head: SHIPPED PREP: "TO " (30,31) Head: TO LOC: "JAPAN " (31,32) Head: JAPAN <ACTIVITY-592-22> := INDUSTRY: <INDUSTRY-592-22> ACTIVITY-SITE: (Taiwan (COUNTRY) -)