Specification of elaborated text structures
1999, AGILE deliverable …
Abstract
This report contains the TEXS2-Bu, TEXS2-Cz and TEXS2-Ru deliverables. The purpose of this set of deliverables is to present detailed descriptions of the text structures occurring in the AGILE corpora and in other sources used, and to specify the text structures and the style to be generated by the intermediate prototype. In addition, we present our initial ideas concerning the implementation of the text structuring module, which is to be described in detail in the deliverable TEXM2. Our approach to text structuring is set within the broader framework of systemic functional linguistics which is employed in the KPML system we use for tactical generation. The descriptions of text structures we present are based on an additional corpus study which extends the corpus study carried out in Work Package 3 and reported in [AGILE 3.1]. Our main aim in the corpus study has been to survey the variety of possible text structuring strategies and the corresponding repertoire of linguistic realisations used in each of the languages when expressing instructions. We concentrated on a selection of basic grammatical features and their deployment in the studied texts. In this report, we explain the choice of the analysed features and we present and discuss our observations.
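To make the notion of "text structure" concrete, the sketch below shows one simple way an instructional segment of the kind found in software manuals could be modelled as a data structure. The class and field names are purely illustrative and do not reproduce the representation specified in the deliverable itself.

# Illustrative sketch only: a minimal tree-like representation of an
# instructional text structure (goal, precondition, ordered steps), in the
# spirit of the structures surveyed in the deliverable.  All names invented.
from dataclasses import dataclass, field
from typing import List


@dataclass
class Step:
    action: str                                             # the user action to express
    side_effects: List[str] = field(default_factory=list)   # optional results of the action


@dataclass
class TaskSegment:
    goal: str                  # the purpose expressed by the segment
    precondition: str = ""     # optional enabling condition
    steps: List[Step] = field(default_factory=list)

    def outline(self) -> str:
        """Render the structure as a plain-text outline (a stand-in for
        handing it to a tactical generator such as KPML)."""
        lines = [f"Goal: {self.goal}"]
        if self.precondition:
            lines.append(f"  If {self.precondition}:")
        for i, step in enumerate(self.steps, 1):
            lines.append(f"  {i}. {step.action}")
            for effect in step.side_effects:
                lines.append(f"     -> {effect}")
        return "\n".join(lines)


if __name__ == "__main__":
    segment = TaskSegment(
        goal="draw a multiline",
        precondition="the Multiline tool is available",
        steps=[
            Step("choose Draw > Multiline"),
            Step("specify the start point", ["the prompt changes"]),
            Step("specify further points and press Enter"),
        ],
    )
    print(segment.outline())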
Related papers
The paper discusses some problems of presentation and processing of linguistic knowledge needed for the development of a real-size Bulgarian linguistic resource to be used in a multilingual text generation system covering the software manuals sub-language. The scope of the sub-language is determined by corpus analysis. The text generation is based on Systemic Functional Linguistics theory. A method for developing lexico-grammar resources by re-using an existing resource for another language is described and illustrated with three examples.
Proceedings of the workshop on Strategic computing natural language - HLT '86, 1986
The US military is an information-rich, computer-intensive organization. It needs easy, understandable access to a wide variety of information. Currently, information is often in obscure computer notations that are only understood after extensive training and practice. Although easy discourse between users and machines is an important objective in any situation, this issue is particularly critical with regard to automated decision aids such as expert-system-based battle management systems that have to carry on a dialog with a force commander. A commander cannot afford to miss important information, nor is it reasonable to expect force commanders to undergo highly specialized training to understand obscure computer dialects which differ from machine to machine.
1997
The syntactic generator in the dialog translation system Verbmobil is fed by a microplanning component which, after a lexical choice step, generates an annotated dependency structure for the selected words. In order to make maximal use of this input, the Head-driven Phrase Structure Grammar (HPSG) which is the basis for the syntactic generator is preprocessed to create the complete set of maximal projections from all lexical types in the grammar. With these projections, the generation task consists of finding a suitable combination of such projections. Although there remains a certain trade-off, this setup eliminates the need to apply the HPSG schemata online and allows the use of simpler and cheaper unification steps. The preprocessing we employ is also known as a 'compilation' of HPSG to a Tree Adjoining Grammar (TAG), since the resulting projections are the elementary trees of a TAG grammar.
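The following minimal sketch illustrates the general idea of precompiled projections being combined by substitution. It is a toy stand-in, not the Verbmobil HPSG-to-TAG machinery; all tree templates and lexical types are invented for the example.

# Toy illustration: elementary trees are "precompiled" per lexical type
# (hard-coded here), and generation reduces to combining them by substitution.
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Tree:
    label: str
    children: Optional[List["Tree"]] = None   # None + no word = open substitution node
    word: Optional[str] = None                 # anchor word for lexicalised leaves

    def substitute(self, label: str, subtree: "Tree") -> bool:
        """Plug `subtree` into the first open node carrying `label`."""
        for i, child in enumerate(self.children or []):
            if child.children is None and child.word is None and child.label == label:
                self.children[i] = subtree
                return True
            if child.substitute(label, subtree):
                return True
        return False

    def words(self) -> List[str]:
        if self.word:
            return [self.word]
        return [w for c in (self.children or []) for w in c.words()]


def projection(lex_type: str, word: str) -> Tree:
    """Return a fresh elementary tree for a lexical type (built offline in the
    approach described above; simple hard-coded templates here)."""
    templates: Dict[str, Tree] = {
        "trans-verb": Tree("S", [Tree("NP"),
                                 Tree("VP", [Tree("V", word=word), Tree("NP")])]),
        "proper-noun": Tree("NP", [Tree("N", word=word)]),
    }
    return templates[lex_type]


if __name__ == "__main__":
    sentence = projection("trans-verb", "meets")
    sentence.substitute("NP", projection("proper-noun", "Anna"))   # subject slot
    sentence.substitute("NP", projection("proper-noun", "Bert"))   # object slot
    print(" ".join(sentence.words()))                              # -> Anna meets Bert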
We are investigating computer-assisted methods for identifying plan operators at both the conversational strategy and surface generation levels. We are using standard-conforming SGML markup on our corpus in order to be able to process it mechanically. We are using C4.5 to identify rules of the form "when is goal x implemented with plan y?". We are currently testing these methods in the knowledge acquisition process for the text generation component of CIRCSIM-Tutor v. 3, a natural-language based intelligent tutoring system. Introduction: CIRCSIM-Tutor is a conversational intelligent tutoring system (ITS) which uses natural language for both input and output. The text generation component of CIRCSIM-Tutor v. 3, which we are currently implementing, uses a two-phase architecture consonant with the consensus architecture described by Reiter (1994). A global, top-down tutorial planner chooses and instantiates a logic form for the system to say based on available information. The t...
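As a rough illustration of the rule-induction step described above, the sketch below trains a decision tree over invented tutoring features. scikit-learn's DecisionTreeClassifier stands in for C4.5 (which is not part of scikit-learn), and the features and plan labels are hypothetical.

# Toy sketch of inducing "when is goal x implemented with plan y?" rules.
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical features per tutoring situation:
# [student_gave_wrong_answer, first_attempt, near_end_of_topic]
X = [
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [1, 0, 1],
    [0, 1, 1],
]
# Hypothetical plans chosen to implement the goal in each situation.
y = [
    "ask-leading-question",
    "give-hint",
    "acknowledge",
    "summarise",
    "give-hint",
    "acknowledge",
]

clf = DecisionTreeClassifier().fit(X, y)

# The induced tree can be read off as goal-to-plan rules.
print(export_text(clf, feature_names=["wrong_answer", "first_attempt", "near_end_of_topic"]))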
1985
The following paper concerns a general scheme for multilingual text generation, as opposed to just translation. Our system processes the text as a whole, from which it extracts a representation of the meaning of the text. From this representation, a new text is generated, using a text model and action rules.
LREC, 2008
The Teko corpus composing model offers a decentralized, dynamic way of collecting high-quality text corpora for linguistic research. The resulting corpus consists of independent text sets. The sets are composed in cooperation with linguistic research projects, so each of them responds to a specific research need. The corpora are morphologically annotated and XML-based, with built-in compatibility with the Kaino user interface used on the corpus server of the Research Institute for the Languages of Finland. Furthermore, software for extracting standard quantitative reports from the text sets has been created during the project. The paper describes the project and assesses its benefits and problems. It also gives an overview of the technical qualities of the corpora and of the corpus interface connected to the Teko project.
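The sketch below illustrates the kind of standard quantitative report such software might produce, counting tokens per morphological tag in an XML-annotated sample. The element and attribute names (<w>, pos="...") are invented here and need not match the Teko or Kaino annotation schemes.

# Illustrative only: a tiny quantitative report over an XML-annotated text set.
import xml.etree.ElementTree as ET
from collections import Counter

SAMPLE = """
<text>
  <s><w pos="N">korpus</w><w pos="V">koostuu</w><w pos="N">teksteistä</w></s>
  <s><w pos="N">aineisto</w><w pos="V">annotoidaan</w></s>
</text>
"""

root = ET.fromstring(SAMPLE)
counts = Counter(w.get("pos") for w in root.iter("w"))   # tokens per POS tag
tokens = sum(counts.values())

print(f"tokens: {tokens}")
for pos, n in counts.most_common():
    print(f"  {pos}: {n} ({n / tokens:.0%})")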
1983
Programming a computer to write text which meets a prior need is a challenging research task. As part of such research, Nigel, a large programmed grammar of English, has been created in the framework of systemic linguistics begun by Halliday. In addition to specifying functions and structures of English, Nigel has a novel semantic stratum which specifies the situations in which each grammatical feature should be used. The report consists of three papers on Nigel: an introductory overview, the script of a demonstration of its use in generation, and an exposition of how Nigel relates to the systemic framework. Although the effort to develop Nigel is significant both as computer science research and as linguistic inquiry, the outlook of the report is oriented to its linguistic significance.
Language, 1992
Proceedings of the 13th Conference on Computational Linguistics, 1990
The data flow in natural language generation (NLG) starts with a 'world' state, represented by structures of an application program (e.g., an expert system) that has text generation needs and an impetus to produce a natural language text. The output of generation is a natural language text. The generation process involves the tasks of a) delimiting the content of the eventual text, b) planning its structure, c) selecting lexical, syntactic and word order means of realizing this structure and d) actually realizing the text using the latter. In advanced generation systems these processes are treated not in a monolithic way, but rather as components of a large, modular generator. NLG researchers experiment with various ways of delimiting the modules of the generation process and control architectures to drive these modules (see, for instance, …). But regardless of the decisions about general (intermodular) or local (intramodular) control flow, knowledge structures have to be defined to support processing and facilitate communication among the modules.
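A deliberately simplified sketch of such a modular pipeline follows, with an explicit knowledge structure passed between stages a) to d). The module boundaries, message structures and templates are illustrative, not those of any particular system.

# Illustrative modular pipeline: each stage consumes and produces an explicit
# knowledge structure, so modules can be developed and replaced independently.
from typing import Dict, List

WORLD_STATE = {"device": "printer", "status": "out_of_paper", "tray": "tray 2"}


def select_content(world: Dict) -> List[Dict]:
    """a) delimit what the text should convey."""
    return [
        {"pred": "problem", "args": (world["device"], world["status"])},
        {"pred": "instruct", "args": ("refill", world["tray"])},
    ]


def plan_text(messages: List[Dict]) -> List[Dict]:
    """b) impose an ordering / structure on the messages."""
    return [dict(m, position=i) for i, m in enumerate(messages)]


def realise(plan: List[Dict]) -> str:
    """c) + d) choose lexical/syntactic means and produce the string."""
    templates = {
        "problem": "The {0} is {1}.",
        "instruct": "Please {0} {1}.",
    }
    sentences = [templates[m["pred"]].format(*m["args"]) for m in plan]
    return " ".join(sentences).replace("out_of_paper", "out of paper")


if __name__ == "__main__":
    print(realise(plan_text(select_content(WORLD_STATE))))
    # -> "The printer is out of paper. Please refill tray 2."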
1999
We present the RAGS (Reference Architecture for Generation Systems) framework: a specification of an abstract Natural Language Generation (NLG) system architecture to support sharing, re-use, comparison and evaluation of NLG technologies. We argue that the evidence from a survey of actual NLG systems calls for a different emphasis in a reference proposal from that seen in similar initiatives in information extraction and multimedia interfaces. We introduce the framework itself, in particular the two-level data model that allows us to support the complex data requirements of NLG systems in a flexible and coherent fashion, and describe our efforts to validate the framework through a range of implementations.
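As a loose illustration of what a two-level data model of this general kind can look like (typed abstract objects at one level, the relations that give them content at the other), consider the sketch below. It is not the RAGS specification itself, whose details go beyond this abstract; all type and role names are invented.

# Illustrative only: typed objects plus a separate store of relations ("arrows").
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class AbstractObject:
    """Level 1: an object known only by its identity and declared type."""
    oid: str
    otype: str            # e.g. "RhetRep", "SemRep", "SynRep" (hypothetical)


@dataclass
class DataStore:
    """Level 2: the relations that give the objects their content."""
    objects: Dict[str, AbstractObject] = field(default_factory=dict)
    arrows: List[Tuple[str, str, str]] = field(default_factory=list)   # (src, role, dst)

    def add(self, obj: AbstractObject) -> AbstractObject:
        self.objects[obj.oid] = obj
        return obj

    def relate(self, src: str, role: str, dst: str) -> None:
        self.arrows.append((src, role, dst))

    def related(self, src: str, role: str) -> List[AbstractObject]:
        return [self.objects[d] for s, r, d in self.arrows if s == src and r == role]


if __name__ == "__main__":
    store = DataStore()
    store.add(AbstractObject("r1", "RhetRep"))
    store.add(AbstractObject("s1", "SemRep"))
    store.add(AbstractObject("s2", "SemRep"))
    store.relate("r1", "nucleus", "s1")
    store.relate("r1", "satellite", "s2")
    print([o.oid for o in store.related("r1", "nucleus")])   # -> ['s1']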
