Morphosyntactic Resources for Automatic Speech Recognition
2008, Language Resources and Evaluation
Abstract
Texts generated by automatic speech recognition (ASR) systems have specific characteristics that make them harder to exploit than conventional written texts. These characteristics stem either from the idiosyncrasies of spoken language or from the principles on which ASR systems are built. This paper investigates the usefulness of morphosyntactic information as a resource for ASR. We show the ability of automatic
Related papers
2002
The purpose of this paper is to present the development of a morphosyntactic disambiguation system (a part-of-speech tagger) intended as a component of a Text-to-Speech (TTS) system for European Portuguese. In developing the tagger, we compared two approaches: a probabilistic approach and a hybrid approach. Besides comparing these two approaches, the paper considers how the different classes of tagging errors affect the performance of the complete TTS system.
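The abstract does not say which probabilistic model the tagger uses. A common choice is a hidden Markov model decoded with the Viterbi algorithm. The sketch below is a minimal bigram version under that assumption. The tag set, the word list, and every probability are invented toy values, not estimates from any Portuguese corpus.

```python
import math

# Toy bigram HMM: all tags, words and probabilities are illustrative assumptions.
TAGS = ["DET", "NOUN", "VERB"]
START = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
TRANS = {
    "DET": {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.1, "NOUN": 0.2, "VERB": 0.7},
    "VERB": {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1},
}
EMIT = {
    "DET": {"a": 0.5, "o": 0.5},
    "NOUN": {"casa": 0.4, "canta": 0.1, "menina": 0.5},
    "VERB": {"canta": 0.8, "casa": 0.2},
}
UNK = 1e-6  # floor probability for unseen (tag, word) pairs


def viterbi(words):
    """Return the most probable tag sequence under the toy bigram HMM."""
    # best[t] = (log-prob of the best path ending in tag t, that path)
    best = {t: (math.log(START[t]) + math.log(EMIT[t].get(words[0], UNK)), [t])
            for t in TAGS}
    for w in words[1:]:
        new = {}
        for t in TAGS:
            emit = math.log(EMIT[t].get(w, UNK))
            # extend every previous path by tag t and keep the best one
            score, path = max(
                (best[p][0] + math.log(TRANS[p][t]) + emit, best[p][1])
                for p in TAGS
            )
            new[t] = (score, path + [t])
        best = new
    return max(best.values())[1]
```

For example, `viterbi(["a", "menina", "canta"])` selects `["DET", "NOUN", "VERB"]`. The ambiguous word *canta* is resolved by the NOUN→VERB transition.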
Dictionary-based methods in morphological analysis can provide accurate lemmatization and rich annotation, including part of speech, number, gender, etc. A morphological guesser can be used to process out-of-vocabulary words. Industrial text-processing applications require high performance, which suggests merging these two components. In this paper we discuss the conversion of a pre-existing high-coverage morphosyntactic lexicon into a deterministic finite-state device that: preserves accurate lemmatization and annotation for vocabulary words; allows implicit morphological knowledge to be acquired from the dictionaries and exploited as ending-guessing rules for out-of-vocabulary words; and allows seamless integration of additional hand-crafted ending-guessing rules.
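The lookup-then-guess behaviour described above can be illustrated without a finite-state device. The sketch below uses plain Python dictionaries in place of the deterministic automaton. It learns ending-guessing rules by counting which tags co-occur with each word ending in the lexicon, and it falls back to the longest known ending for unknown words. The toy English lexicon and the function names are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical toy lexicon: word form -> (lemma, tag).
LEXICON = {
    "walked": ("walk", "VBD"),
    "talked": ("talk", "VBD"),
    "jumped": ("jump", "VBD"),
    "running": ("run", "VBG"),
    "talking": ("talk", "VBG"),
    "quickly": ("quick", "RB"),
    "slowly": ("slow", "RB"),
}


def learn_ending_rules(lexicon, max_len=4):
    """Count how often each word ending (up to max_len chars) carries each tag."""
    rules = defaultdict(Counter)
    for form, (_lemma, tag) in lexicon.items():
        for n in range(1, min(max_len, len(form) - 1) + 1):
            rules[form[-n:]][tag] += 1
    return rules


def analyse(word, lexicon, rules, max_len=4):
    """Dictionary lookup first; otherwise guess a tag from the longest known ending."""
    if word in lexicon:
        return lexicon[word]
    for n in range(min(max_len, len(word) - 1), 0, -1):
        counts = rules.get(word[-n:])  # .get does not add keys to the defaultdict
        if counts:
            tag, _ = counts.most_common(1)[0]
            return (None, tag)  # this sketch guesses no lemma
    return (None, None)
```

With `rules = learn_ending_rules(LEXICON)`, the in-vocabulary word `"walked"` returns `("walk", "VBD")`. The unknown word `"hopped"` is tagged `VBD` through the learned ending `-ped`.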
Lecture Notes in Computer Science, 2006
The aim of our paper is to study the usefulness of part-of-speech (POS) tagging for improving speech recognition. We first evaluate the proportion of misrecognized words that could be corrected using POS information. The analysis of a short extract of French radio broadcast news shows that an absolute decrease of 1.1% in word error rate can be expected. We also demonstrate quantitatively that traditional POS taggers are reliable when applied to spoken corpora, including automatic transcriptions. This new result enables us to use POS knowledge effectively in a post-processing stage to improve the quality of transcriptions, especially by correcting agreement errors.
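Word error rate, the metric quoted above, is the word-level edit distance between the reference and the hypothesis, divided by the reference length. An "absolute" decrease is therefore measured in percentage points of this ratio. The minimal sketch below computes WER with standard dynamic programming. The French sentence pair is an invented illustration of an agreement error: *partis* and *parti* are homophones, so an ASR system can confuse them.

```python
def word_errors(ref, hyp):
    """Minimum number of substitutions, deletions and insertions turning ref into hyp."""
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution or match
    return d[len(ref)][len(hyp)]


def wer(ref, hyp):
    """Word error rate: word-level edit distance divided by the reference length."""
    return word_errors(ref, hyp) / len(ref)


ref = "les enfants sont partis".split()
hyp = "les enfants sont parti".split()  # agreement error: one substitution
print(wer(ref, hyp))
```

Here the single substitution over four reference words gives a WER of 0.25. A POS-based post-processor that restored the plural agreement would remove exactly this kind of error.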
This paper describes the construction and usage of the MOR and GRASP programs for part-of-speech tagging and syntactic dependency analysis of the corpora in the CHILDES and TalkBank databases. We have written MOR grammars for 11 languages and GRASP analyses for three. For English data, the MOR tagger reaches 98% accuracy on adult corpora and 97% accuracy on child language corpora. The paper discusses the construction of MOR lexicons, with an emphasis on compounds and special conversational forms. The shape of the rules controlling allomorphy and morpheme concatenation is also discussed. The analysis of bilingual corpora is illustrated with the Cantonese-English bilingual corpora. Methods for preparing data for MOR analysis and for developing MOR grammars are discussed. We believe that recent computational work using this system is leading to significant advances in child language acquisition theory and, more generally, in theories of grammar identification.
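The following sketch shows what rules controlling allomorphy and morpheme concatenation do, using English plural formation as a familiar case. It does not reproduce MOR's own rule notation. The rule list and the `pluralize` helper are hypothetical. Each rule pairs a condition on the end of the stem with a spelling change and a suffix allomorph, and the first matching rule wins.

```python
import re

# Hypothetical concatenation rules for the English plural suffix.
# Each rule: (regex on the stem ending, replacement for the match, suffix allomorph).
PLURAL_RULES = [
    (r"(?<=[^aeiou])y$", "i", "es"),   # consonant + y: baby -> babies
    (r"(s|x|z|ch|sh)$", r"\1", "es"),  # sibilant ending: box -> boxes
]


def pluralize(stem):
    """Attach the plural suffix, choosing the allomorph by the first matching rule."""
    for pattern, repl, suffix in PLURAL_RULES:
        if re.search(pattern, stem):
            return re.sub(pattern, repl, stem) + suffix
    return stem + "s"  # default allomorph
```

Ordering the rules from most to least specific keeps the default `-s` as the fallback, as in `pluralize("cat")`.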
Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96
This paper describes ANGIE, a new system for speech analysis that characterizes word substructure in terms of a trainable grammar. ANGIE captures morpho-phonemic and phonological phenomena through a hierarchical framework. The terminal categories can be either letters or phone units, yielding a reversible letter-to-sound/sound-to-letter system. In conjunction with a segment network and acoustic phone models, the system can produce phonemic-to-phonetic alignments for speech waveforms. For speech recognition, ANGIE uses a one-pass, bottom-up, best-first search strategy. Evaluated in the ATIS domain, ANGIE achieved a phone error rate of 36%, compared with 40% for a baseline phone-bigram recognizer under similar conditions. ANGIE potentially offers many attractive features, including dynamic vocabulary adaptation and a framework for handling unknown words. Previous experiments have yielded improved pronunciation accuracy without this layer.
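The two phone error rates quoted above differ by 4 percentage points in absolute terms. Expressed relative to the baseline, the same gap is a 10% relative reduction. This figure is simple arithmetic on the reported numbers, not a value stated in the abstract:

```latex
\text{relative PER reduction}
  = \frac{\mathrm{PER}_{\text{baseline}} - \mathrm{PER}_{\text{ANGIE}}}{\mathrm{PER}_{\text{baseline}}}
  = \frac{40\% - 36\%}{40\%}
  = \frac{4}{40}
  = 10\%
```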
Eighth Annual Conference …, 2007
A coupled acoustic- and language-modeling approach is presented for the recognition of spontaneous speech, primarily in agglutinative languages. The effectiveness of the approach in large-vocabulary spontaneous speech recognition is demonstrated on the Hungarian MALACH corpus. Morphs are derived from word forms with a statistical morphological segmentation tool. Morphs are then mapped to graphemes trivially, by splitting each morph into its individual letters. Using morphs instead of words in language modeling gives significant WER reductions with both phoneme- and grapheme-based acoustic modeling. The improvements are larger after speaker adaptation of the acoustic models. In conclusion, the morpho-phonemic approach and the proposed morpho-graphemic ASR approach yield the same best WERs, both significantly lower than those of the word-based baselines. The morpho-graphemic approach achieves this essentially without language-dependent rules or pronunciation dictionaries.
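The morph-to-grapheme mapping described above is simple enough to show directly. In the minimal sketch below, the segmentation of the Hungarian word *házakban* ('in houses') into *ház + ak + ban* stands in for the output of the statistical segmentation tool, which is not shown. The data and function names are hypothetical.

```python
# Hypothetical segmenter output (the statistical segmentation tool itself is not shown).
SEGMENTED = {"házakban": ["ház", "ak", "ban"], "házban": ["ház", "ban"]}


def morph_vocabulary(segmented):
    """Collect the distinct morphs that the language model will use as units."""
    return sorted({m for morphs in segmented.values() for m in morphs})


def grapheme_lexicon(morphs):
    """Map each morph to its grapheme units by splitting it into individual letters."""
    return {m: list(m) for m in morphs}


lexicon = grapheme_lexicon(morph_vocabulary(SEGMENTED))
print(lexicon["ban"])  # the grapheme "pronunciation" of the morph -ban
```

Because the lexicon is generated mechanically, no hand-written pronunciation dictionary is needed, which matches the advantage the abstract attributes to the morpho-graphemic approach.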
2007
We analyze subword-based language models (LMs) in large-vocabulary continuous speech recognition across four "morphologically rich" languages: Finnish, Estonian, Turkish, and Egyptian Colloquial Arabic. Estimating n-gram LMs over sequences of morphs instead of words gives better vocabulary coverage and reduces data sparsity. Standard word LMs suffer from high out-of-vocabulary (OOV) rates, whereas morph LMs can recognize previously unseen word forms by concatenating morphs. We show that the morph LMs generally outperform the word LMs and that they perform fairly well on OOVs without compromising the accuracy obtained for in-vocabulary words.
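The coverage argument above can be made concrete by comparing OOV rates over words and over morphs. The minimal sketch below uses toy Finnish data and a hand-written segmentation that stands in for a real segmenter. The n-gram LM itself is not shown. *Autossa* ('in the car') never occurs as a word in training, yet both of its morphs do.

```python
# Hypothetical toy data: training words and a hand-written morph segmentation.
TRAIN_WORDS = {"talo", "talossa", "auto"}
SEGMENT = {"talo": ["talo"], "talossa": ["talo", "ssa"], "auto": ["auto"],
           "autossa": ["auto", "ssa"], "kissa": ["kissa"]}


def oov_rate(tokens, vocab):
    """Fraction of tokens not covered by the vocabulary."""
    return sum(t not in vocab for t in tokens) / len(tokens)


def to_morphs(words, segment):
    """Replace each word by its morph sequence."""
    return [m for w in words for m in segment[w]]


TEST_WORDS = ["talossa", "autossa", "kissa"]
MORPH_VOCAB = set(to_morphs(TRAIN_WORDS, SEGMENT))

word_oov = oov_rate(TEST_WORDS, TRAIN_WORDS)                       # 2 of 3 words unseen
morph_oov = oov_rate(to_morphs(TEST_WORDS, SEGMENT), MORPH_VOCAB)  # 1 of 5 morphs unseen
print(word_oov, morph_oov)
```

Only *kissa* remains uncovered at the morph level, because none of its parts appear in training.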
Analysis and Synthesis of Speech
In this contribution, MORPHON is outlined. This module supplies the text-to-speech system with phonological rules. It is argued that such rules are needed because the pronunciation of a sentence is not simply the concatenation of the pronunciations of its constituent morphemes: the pronunciation of a morpheme is modified in certain contexts. These rules can only apply properly if exceptions can be listed in a lexicon and if the rules can refer to morphological and morphosyntactic information. A lexicon-based approach to text-to-phoneme conversion was therefore chosen. Finally, the pronunciation accuracy of MORPHON is compared with that of two rule-based text-to-phoneme transcription systems.
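The design argued for above has two parts: context-dependent rules applied at morpheme boundaries, and an exception list kept in the lexicon. The minimal sketch below shows how the two interact. It is not MORPHON's rule set. The toy phoneme strings, the nasal-assimilation rule, and the exception flag are all invented for illustration.

```python
# Hypothetical morpheme lexicon: morpheme -> (toy phonemic form, rules it is exempt from).
MORPHEME_LEXICON = {
    "in": ("In", set()),
    "put": ("pUt", set()),
    "on": ("On", {"nasal_assimilation"}),  # invented listed exception
    "board": ("bO:d", set()),
}
LABIALS = set("pbm")


def pronounce(morphemes):
    """Concatenate morpheme pronunciations, applying one boundary rule unless exempt."""
    phones = []
    for i, m in enumerate(morphemes):
        form, exceptions = MORPHEME_LEXICON[m]
        form = list(form)
        nxt = morphemes[i + 1] if i + 1 < len(morphemes) else None
        # Invented rule: a final /n/ becomes /m/ before a labial-initial morpheme,
        # unless the lexicon lists this morpheme as an exception.
        if (nxt is not None and form[-1] == "n"
                and MORPHEME_LEXICON[nxt][0][0] in LABIALS
                and "nasal_assimilation" not in exceptions):
            form[-1] = "m"
        phones.extend(form)
    return "".join(phones)
```

With these toy entries, `pronounce(["in", "put"])` yields `"ImpUt"`, while the listed exception keeps `pronounce(["on", "board"])` as `"OnbO:d"`.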
We describe the Corpus of Spoken Icelandic (ÍS-TAL), which is made up of 15 hours of spontaneous, naturally occurring conversations, 31 conversations in all. The corpus comprises 184,080 tokens, 14,297 types and 9,221 lemmas. It has been transcribed using standard orthography. We present a list of the 30 most common lemmas in the corpus and compare it to a list of the most frequent lemmas in the written language, concluding that the differences between the two lists are smaller than expected. We have tagged the corpus morphologically with a statistical tagger trained on written texts. The results are much better than we expected: the tagging accuracy is at least as high as for written texts. The final part of the paper reports on work in progress. We have been experimenting with converting the morphological tagging into a shallow syntactic markup by applying a few simple hand-written rules. Although the analysis this procedure produces is bound to be incomplete and to contain some errors, we conclude that the results are promising and that this method can be used to build a simple yet useful treebank with minimal effort.
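The rule-based conversion from morphological tags to shallow syntactic markup can be sketched as a single left-to-right pass that brackets noun phrases. The authors' actual rules and tagset are not given here. This sketch assumes a hypothetical, heavily simplified tagset (DET, ADJ, N, PRON, VERB) and two invented rules: a bare pronoun is an NP, and a run of determiners and adjectives ending in a noun is an NP.

```python
def chunk_np(tagged):
    """Bracket noun phrases in a list of (word, simplified_tag) pairs."""
    out, buf = [], []
    for word, tag in tagged:
        if tag in ("DET", "ADJ"):
            buf.append(word)                           # possible start of an NP
        elif tag == "N":
            out.append("[NP " + " ".join(buf + [word]) + "]")
            buf = []
        elif tag == "PRON":
            out.extend(buf)                            # dangling modifiers stay unbracketed
            buf = []
            out.append("[NP " + word + "]")            # a bare pronoun is an NP
        else:
            out.extend(buf)
            buf = []
            out.append(word)
    out.extend(buf)                                    # flush modifiers left at the end
    return " ".join(out)


print(chunk_np([("ég", "PRON"), ("sá", "VERB"), ("stóran", "ADJ"), ("hest", "N")]))
```

For the Icelandic sentence *ég sá stóran hest* ('I saw a big horse'), the output is `[NP ég] sá [NP stóran hest]`. Rules this simple will miss many structures, which is consistent with the abstract's caveat that the analysis is incomplete.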
