Specification of elaborated text structures
1999, AGILE deliverable …
Abstract
This report contains the TEXS2-Bu, TEXS2-Cz and TEXS2-Ru deliverables. The purpose of this set of deliverables is to present detailed descriptions of the text structures occurring in the AGILE corpora and in other sources used, and to specify the text structures and the style to be generated by the intermediate prototype. In addition, we present our initial ideas concerning the implementation of the text structuring module, which is to be described in detail in the deliverable TEXM2. Our approach to text structuring is set within the broader framework of systemic functional linguistics which is employed in the KPML system we use for tactical generation. The descriptions of text structures we present are based on an additional corpus study which extends the corpus study carried out in Work Package 3 and reported in [AGILE 3.1]. Our main aim in the corpus study has been to survey the variety of possible text structuring strategies and the corresponding repertoire of linguistic realisations used in each of the languages when expressing instructions. We concentrated on a selection of basic grammatical features and their deployment in the studied texts. In this report, we explain the choice of the analysed features and we present and discuss our observations.
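To make the notion of "text structure" concrete, the sketch below shows one simple way an instructional segment of the kind found in software manuals could be modelled as a data structure. The class and field names are purely illustrative and do not reproduce the representation specified in the deliverable itself.

# Illustrative sketch only: a minimal tree-like representation of an
# instructional text structure (goal, precondition, ordered steps), in the
# spirit of the structures surveyed in the deliverable.  All names invented.
from dataclasses import dataclass, field
from typing import List


@dataclass
class Step:
    action: str                                             # the user action to express
    side_effects: List[str] = field(default_factory=list)   # optional results of the action


@dataclass
class TaskSegment:
    goal: str                  # the purpose expressed by the segment
    precondition: str = ""     # optional enabling condition
    steps: List[Step] = field(default_factory=list)

    def outline(self) -> str:
        """Render the structure as a plain-text outline (a stand-in for
        handing it to a tactical generator such as KPML)."""
        lines = [f"Goal: {self.goal}"]
        if self.precondition:
            lines.append(f"  If {self.precondition}:")
        for i, step in enumerate(self.steps, 1):
            lines.append(f"  {i}. {step.action}")
            for effect in step.side_effects:
                lines.append(f"     -> {effect}")
        return "\n".join(lines)


if __name__ == "__main__":
    segment = TaskSegment(
        goal="draw a multiline",
        precondition="the Multiline tool is available",
        steps=[
            Step("choose Draw > Multiline"),
            Step("specify the start point", ["the prompt changes"]),
            Step("specify further points and press Enter"),
        ],
    )
    print(segment.outline())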
Related papers
The paper discusses some problems of presentation and processing of linguistic knowledge needed for the development of a real-size Bulgarian linguistic resource to be used in a multilingual text generation system covering the software manuals sub-language. The scope of the sub-language is determined by corpus analysis. The text generation is based on Systemic Functional Linguistics theory. A method for developing lexico-grammar resources by re-using an existing resource for another language is described and illustrated with three examples.
Proceedings of the workshop on Strategic computing natural language - HLT '86, 1986
The US military is an information-rich, computer-intensive organization. It needs easy, understandable access to a wide variety of information. Currently, information is often in obscure computer notations that are only understood after extensive training and practice. Although easy discourse between users and machines is an important objective in any situation, this issue is particularly critical with regard to automated decision aids such as expert-system-based battle management systems that have to carry on a dialog with a force commander. A commander cannot afford to miss important information, nor is it reasonable to expect force commanders to undergo highly specialized training to understand obscure computer dialects which differ from machine to machine.
1997
The syntactic generator in the dialog translation system Verbmobil is fed by a microplanning component which, after a lexical choice step, generates an annotated dependency structure for the selected words. In order to make maximal use of this input, the Head-driven Phrase Structure Grammar (HPSG) which is the basis for the syntactic generator is preprocessed to create the complete set of maximal projections from all lexical types in the grammar. With these projections, the generation task consists of finding a suitable combination of such projections. Although there remains a certain trade-off, this setup eliminates the need to apply the HPSG schemata online and allows the use of simpler and cheaper unification steps. The preprocessing we employ is also known as a 'compilation' of HPSG to a Tree Adjoining Grammar (TAG), since the resulting projections are the elementary trees of a TAG grammar.
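The following minimal sketch illustrates the general idea of precompiled projections being combined by substitution. It is a toy stand-in, not the Verbmobil HPSG-to-TAG machinery; all tree templates and lexical types are invented for the example.

# Toy illustration: elementary trees are "precompiled" per lexical type
# (hard-coded here), and generation reduces to combining them by substitution.
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Tree:
    label: str
    children: Optional[List["Tree"]] = None   # None + no word = open substitution node
    word: Optional[str] = None                 # anchor word for lexicalised leaves

    def substitute(self, label: str, subtree: "Tree") -> bool:
        """Plug `subtree` into the first open node carrying `label`."""
        for i, child in enumerate(self.children or []):
            if child.children is None and child.word is None and child.label == label:
                self.children[i] = subtree
                return True
            if child.substitute(label, subtree):
                return True
        return False

    def words(self) -> List[str]:
        if self.word:
            return [self.word]
        return [w for c in (self.children or []) for w in c.words()]


def projection(lex_type: str, word: str) -> Tree:
    """Return a fresh elementary tree for a lexical type (built offline in the
    approach described above; simple hard-coded templates here)."""
    templates: Dict[str, Tree] = {
        "trans-verb": Tree("S", [Tree("NP"),
                                 Tree("VP", [Tree("V", word=word), Tree("NP")])]),
        "proper-noun": Tree("NP", [Tree("N", word=word)]),
    }
    return templates[lex_type]


if __name__ == "__main__":
    sentence = projection("trans-verb", "meets")
    sentence.substitute("NP", projection("proper-noun", "Anna"))   # subject slot
    sentence.substitute("NP", projection("proper-noun", "Bert"))   # object slot
    print(" ".join(sentence.words()))                              # -> Anna meets Bert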
We are investigating computer-assisted methods for identifying plan operators at both the conversational strategy and surface generation levels. We are using standard-conforming SGML markup on our corpus in order to be able to process it mechanically. We are using C4.5 to identify rules of the form "when is goal x implemented with plan y?". We are currently testing these methods in the knowledge acquisition process for the text generation component of CIRCSIM-Tutor v. 3, a natural-language based intelligent tutoring system. Introduction: CIRCSIM-Tutor is a conversational intelligent tutoring system (ITS) which uses natural language for both input and output. The text generation component of CIRCSIM-Tutor v. 3, which we are currently implementing, uses a two-phase architecture consonant with the consensus architecture described by Reiter (1994). A global, top-down tutorial planner chooses and instantiates a logic form for the system to say based on available information. The t...
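As a rough illustration of the rule-induction step described above, the sketch below trains a decision tree over invented tutoring features. scikit-learn's DecisionTreeClassifier stands in for C4.5 (which is not part of scikit-learn), and the features and plan labels are hypothetical.

# Toy sketch of inducing "when is goal x implemented with plan y?" rules.
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical features per tutoring situation:
# [student_gave_wrong_answer, first_attempt, near_end_of_topic]
X = [
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [1, 0, 1],
    [0, 1, 1],
]
# Hypothetical plans chosen to implement the goal in each situation.
y = [
    "ask-leading-question",
    "give-hint",
    "acknowledge",
    "summarise",
    "give-hint",
    "acknowledge",
]

clf = DecisionTreeClassifier().fit(X, y)

# The induced tree can be read off as goal-to-plan rules.
print(export_text(clf, feature_names=["wrong_answer", "first_attempt", "near_end_of_topic"]))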
1985
The following paper concerns a general scheme for multilingual text generation, as opposed to just translation. Our system processes the text as a whole, from which it extracts a representation of the meaning of the text. From this representation, a new text is generated, using a text model and action rules.
LREC, 2008
The Teko corpus composing model offers a decentralized, dynamic way of collecting high-quality text corpora for linguistic research. The resulting corpus consists of independent text sets. The sets are composed in cooperation with linguistic research projects, so each of them responds to a specific research need. The corpora are morphologically annotated and XML-based, with built-in compatibility with the Kaino user interface used on the corpus server of the Research Institute for the Languages of Finland. Furthermore, software for extracting standard quantitative reports from the text sets has been created during the project. The paper describes the project and assesses its benefits and problems. It also gives an overview of the technical qualities of the corpora and of the corpus interface connected to the Teko project.
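The sketch below illustrates the kind of standard quantitative report such software might produce, counting tokens per morphological tag in an XML-annotated sample. The element and attribute names (<w>, pos="...") are invented here and need not match the Teko or Kaino annotation schemes.

# Illustrative only: a tiny quantitative report over an XML-annotated text set.
import xml.etree.ElementTree as ET
from collections import Counter

SAMPLE = """
<text>
  <s><w pos="N">korpus</w><w pos="V">koostuu</w><w pos="N">teksteistä</w></s>
  <s><w pos="N">aineisto</w><w pos="V">annotoidaan</w></s>
</text>
"""

root = ET.fromstring(SAMPLE)
counts = Counter(w.get("pos") for w in root.iter("w"))   # tokens per POS tag
tokens = sum(counts.values())

print(f"tokens: {tokens}")
for pos, n in counts.most_common():
    print(f"  {pos}: {n} ({n / tokens:.0%})")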
1983
Programming a computer to write text which meets a prior need is a challenging research task. As part of such research, Nigel, a large programmed grammar of English, has been created in the framework of systemic linguistics begun by Halliday. In addition to specifying functions and structures of English, Nigel has a novel semantic stratum which specifies the situations in which each grammatical feature should be used. The report consists of three papers on Nigel: an introductory overview, the script of a demonstration of its use in generation, and an exposition of how Nigel relates to the systemic framework. Although the effort to develop Nigel is significant both as computer science research and as linguistic inquiry, the outlook of the report is oriented to its linguistic significance.
Language, 1992
Proceedings of the 13th Conference on Computational Linguistics, 1990
The data flow in natural language generation (NLG) starts with a 'world' state, represented by structures of an application program (e.g., an expert system) that has text generation needs and an impetus to produce a natural language text. The output of generation is a natural language text. The generation process involves the tasks of a) delimiting the content of the eventual text, b) planning its structure, c) selecting lexical, syntactic and word order means of realizing this structure and d) actually realizing the text using the latter. In advanced generation systems these processes are treated not in a monolithic way, but rather as components of a large, modular generator. NLG researchers experiment with various ways of delimiting the modules of the generation process and control architectures to drive these modules (see, for instance, …). But regardless of the decisions about general (intermodular) or local (intramodular) control flow, knowledge structures have to be defined to support processing and facilitate communication among the modules.
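A deliberately simplified sketch of such a modular pipeline follows, with an explicit knowledge structure passed between stages a) to d). The module boundaries, message structures and templates are illustrative, not those of any particular system.

# Illustrative modular pipeline: each stage consumes and produces an explicit
# knowledge structure, so modules can be developed and replaced independently.
from typing import Dict, List

WORLD_STATE = {"device": "printer", "status": "out_of_paper", "tray": "tray 2"}


def select_content(world: Dict) -> List[Dict]:
    """a) delimit what the text should convey."""
    return [
        {"pred": "problem", "args": (world["device"], world["status"])},
        {"pred": "instruct", "args": ("refill", world["tray"])},
    ]


def plan_text(messages: List[Dict]) -> List[Dict]:
    """b) impose an ordering / structure on the messages."""
    return [dict(m, position=i) for i, m in enumerate(messages)]


def realise(plan: List[Dict]) -> str:
    """c) + d) choose lexical/syntactic means and produce the string."""
    templates = {
        "problem": "The {0} is {1}.",
        "instruct": "Please {0} {1}.",
    }
    sentences = [templates[m["pred"]].format(*m["args"]) for m in plan]
    return " ".join(sentences).replace("out_of_paper", "out of paper")


if __name__ == "__main__":
    print(realise(plan_text(select_content(WORLD_STATE))))
    # -> "The printer is out of paper. Please refill tray 2."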
1999
We present the RAGS (Reference Architecture for Generation Systems) framework: a specification of an abstract Natural Language Generation (NLG) system architecture to support sharing, re-use, comparison and evaluation of NLG technologies. We argue that the evidence from a survey of actual NLG systems calls for a different emphasis in a reference proposal from that seen in similar initiatives in information extraction and multimedia interfaces. We introduce the framework itself, in particular the two-level data model that allows us to support the complex data requirements of NLG systems in a flexible and coherent fashion, and describe our efforts to validate the framework through a range of implementations.
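As a loose illustration of what a two-level data model of this general kind can look like (typed abstract objects at one level, the relations that give them content at the other), consider the sketch below. It is not the RAGS specification itself, whose details go beyond this abstract; all type and role names are invented.

# Illustrative only: typed objects plus a separate store of relations ("arrows").
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class AbstractObject:
    """Level 1: an object known only by its identity and declared type."""
    oid: str
    otype: str            # e.g. "RhetRep", "SemRep", "SynRep" (hypothetical)


@dataclass
class DataStore:
    """Level 2: the relations that give the objects their content."""
    objects: Dict[str, AbstractObject] = field(default_factory=dict)
    arrows: List[Tuple[str, str, str]] = field(default_factory=list)   # (src, role, dst)

    def add(self, obj: AbstractObject) -> AbstractObject:
        self.objects[obj.oid] = obj
        return obj

    def relate(self, src: str, role: str, dst: str) -> None:
        self.arrows.append((src, role, dst))

    def related(self, src: str, role: str) -> List[AbstractObject]:
        return [self.objects[d] for s, r, d in self.arrows if s == src and r == role]


if __name__ == "__main__":
    store = DataStore()
    store.add(AbstractObject("r1", "RhetRep"))
    store.add(AbstractObject("s1", "SemRep"))
    store.add(AbstractObject("s2", "SemRep"))
    store.relate("r1", "nucleus", "s1")
    store.relate("r1", "satellite", "s2")
    print([o.oid for o in store.related("r1", "nucleus")])   # -> ['s1']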
