Assessing Three Representation Methods for Sign Language Machine Translation and Evaluation
Having no standard written format, sign languages must be transcribed in some way in order to be processable for machine translation (MT). Previous research into MT for sign languages (SLs) has shown little consistency or agreement on the appropriate transcription methodology for the SLs. In this paper, we take a corpus of 200 SL utterances and explore the effects of three different representations on the MT process using a randomly selected and a specially selected test set. We use the DCU MATREX MT system and show that using an XML-based markup achieves the best results over other formats in terms of BLEU scores. We discuss the meaning of these results in the context of evaluating a representation of a language as opposed to the final form.
Morrissey, Sara. Experiments in Sign Language Machine Translation Using Examples. In CASCON 2006 Dublin Symposium, 17 October 2006, Dublin, Ireland, 2006.
Generously sponsored by a joint IBM-IRCSET scholarship: Research in the National Centre for Language Technology at Dublin City University: Sign languages (SLs) are the first language of the Deaf communities worldwide and, just like other minority languages, are poorly resourced and in many cases lack political and social recognition. As a result of this, users of minority languages are often required to have multilingual competencies in non-L1 languages. In the case of SLs, this causes considerable hindrance to Deaf people as the average literacy competencies of a Deaf adult are equated with those of a 10-year-old. To alleviate this, we propose the development of an automatic machine translation system to translate from spoken language text to SLs through the medium of a signing mannequin.
Morrissey, Sara. An Assessment of Appropriate Sign Language Representation for Machine Translation in the Healthcare Domain. In Sign Language Corpora: Linguistic Issues Workshop 2009, 24-25 July 2009, London, UK, 2009.
With ever increasing computing power and advances in 3D animation technologies it is no surprise that 3D avatars for sign language (SL) generation are advancing too. Traditionally these avatars have been driven by somewhat expensive and inflexible motion capture technologies and perhaps this is the reason avatars do not feature in all but a few user interfaces (UIs). SL synthesis is a competing technology that is less costly, more versatile and may prove to be the answer to the current lack of access for the Deaf in HCI. This paper outlines the current state of the art in SL synthesis for HCI and how we propose to advance this by improving avatar quality and realism with a view to ameliorating communication and computer interaction for the Deaf community as part of a wider localisation project.
Machine Translation (MT) for sign languages (SLs) can facilitate communication between Deaf and hearing people by translating information into the native and preferred language of the individuals. In this paper, we discuss automatic translation from English to Irish SL (ISL) in the domain of airport information. We describe our data collection processes and the architecture of the MaTrEx system used for our translation work. This is followed by an outline of the additional animation phase that transforms the translated output into animated ISL. Through a set of experiments, evaluated both automatically and manually, we show that MT has the potential to assist Deaf people by providing information in their first language.
I hereby certify that this material, which I now submit for assessment on the programme of study leading to the award of Ph.D. is entirely my own work, that I have exercised reasonable care to ensure that the work is original, and does not to the best of my knowledge breach any law of copyright, and has not been taken from the work of others save and to the extent that such work has been cited and acknowledged within the text of my work.
In this paper we consider the problems of applying corpus-based techniques to minority languages that are neither politically recognised nor have a formally accepted writing system, namely sign languages. We discuss the adoption of an annotated form of sign language data as a suitable corpus for the development of a data-driven machine translation (MT) system, and deal with issues that arise from its use. Useful software tools that facilitate easy annotation of video data are also discussed. Furthermore, we address the problems of using traditional MT evaluation metrics for sign language translation. Based on the candidate translations produced from our example-based machine translation system, we discuss why standard metrics fall short of providing an accurate evaluation and suggest more suitable evaluation methods.
Users of sign languages are often forced to use a language in which they have reduced competence simply because documentation in their preferred format is not available. While some research exists on translating between natural and sign languages, we present here what we believe to be the first attempt to tackle this problem using an example-based (EBMT) approach. Having obtained a set of English-Dutch Sign Language examples, we employ an approach to EBMT using the 'Marker Hypothesis' (Green, 1979), analogous to the successful system of (Way & Gough, 2003), (Gough & Way, 2004a) and (Gough & Way, 2004b). In a set of experiments, we show that encouragingly good translation quality may be obtained using such an approach.
In this paper we present a hybrid statistical machine translation (SMT)-example-based MT (EBMT) system that shows significant improvement over both SMT and EBMT baseline systems. First we present a runtime EBMT system using a subsentential translation memory (TM). The EBMT system is further combined with an SMT system for effective hybridization of the pair of systems. The hybrid system shows significant improvement in translation quality (0.82 and 2.75 absolute BLEU points) for two different language pairs ( ...
In recent years data-driven methods of machine translation (MT) have overtaken rule-based approaches as the predominant means of automatically translating between languages. A pre-requisite for such an approach is a parallel corpus of the source and target languages. Technological developments in sign language (SL) capturing, analysis and processing tools now mean that SL corpora are becoming increasingly available. With transcription and language analysis tools being mainly designed and used for linguistic purposes, we ...
Five years ago, a number of papers reported an experimental implementation of an Example Based Machine Translation (EBMT) system using proportional analogy. This approach, a type of analogical learning, was attractive because of its simplicity, and the papers reported considerable success with the method using various language pairs. In this paper, we describe our attempt to use this approach for tackling English-Hindi Named Entity (NE) Transliteration. We have implemented our own EBMT system using proportional analogy and have found that the analogy-based system on its own has low precision but a high recall due to the fact that a large number of names are untransliterated with the approach. However, mitigating problems in analogy-based EBMT with SMT and vice versa has shown considerable improvement over the individual approaches.
In this paper we present a novel way of integrating Translation Memory into an Example-based Machine Translation System (EBMT) to deal with the issue of low resources. We have used a dialogue of 380 sentences as the example-base for our system. The translation units in the Translation Memories are automatically extracted based on the aligned phrases (words) of a statistical machine translation (SMT) system. We attempt to use the approach to improve translation from English to Bangla as many statistical machine translation systems have difficulty with such small amounts of training data. We have found the approach shows improvement over a baseline SMT system.
In this paper, we describe the first data-driven automatic sign-language-to-speech translation system. While both sign language (SL) recognition and translation techniques exist, both use an intermediate notation system not directly intelligible for untrained users. We combine a SL recognizing framework with a state-of-the-art phrase-based machine translation (MT) system, using corpora of both American Sign Language and Irish Sign Language data. In a set of experiments we show the overall results and also illustrate the importance of including a vision-based knowledge source in the development of a complete SL translation system.
Systems that automatically process sign language rely on appropriate data. We therefore present the ATIS sign language corpus that is based on the domain of air travel information. It is available for five languages: English, German, Irish sign language, German sign language and South African sign language. The corpus can be used for different tasks like automatic statistical translation and automatic sign language recognition, and it allows the specific modelling of spatial references in signing space.
In this paper, we address the issue of applying example-based machine translation (EBMT) methods to overcome some of the difficulties encountered with statistical machine translation (SMT) techniques. We adopt two different EBMT approaches and present an approach to augment output quality by strategically combining both EBMT approaches with the SMT system to handle issues arising from the use of SMT. We use these approaches for English to Turkish translation using the IWSLT09 dataset. Improved evaluation scores (4% relative BLEU improvement) were achieved when EBMT was used to translate sentences for which SMT failed to produce an adequate translation.
Marker-Based EBMT: an approach to EBMT based on the Marker Hypothesis. "The Marker Hypothesis states that all natural languages have a closed set of specific words or morphemes which appear in a limited set of grammatical contexts and which signal that context." (Green, 1979). Universal psycholinguistic constraint: languages are marked for syntactic structure at surface level by a closed set of lexemes or morphemes.
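As a rough illustration of how the Marker Hypothesis can drive segmentation, the sketch below splits a sentence into chunks at closed-class marker words. It is only a minimal sketch under stated assumptions, not the system described in these papers: the MARKERS table is a tiny hypothetical word list, the function name marker_chunks is invented for this example, and the rule that a new chunk opens only once the current chunk holds a non-marker word is one common simplification.

```python
# Hypothetical, deliberately tiny marker table (closed-class words -> category).
# A real marker-based system would use much fuller lists per language.
MARKERS = {
    "the": "DET", "a": "DET", "an": "DET",
    "in": "PREP", "on": "PREP", "at": "PREP", "to": "PREP", "from": "PREP",
    "and": "CONJ", "or": "CONJ", "but": "CONJ",
    "i": "PRON", "you": "PRON", "he": "PRON", "she": "PRON",
    "we": "PRON", "they": "PRON",
}


def marker_chunks(sentence):
    """Split a sentence into chunks, opening a new chunk at a marker word.

    A marker opens a new chunk only if the current chunk already contains
    a non-marker word, so runs of consecutive markers stay together.
    """
    chunks = []
    current = []
    has_content = False  # True once the current chunk holds a non-marker word
    for token in sentence.split():
        is_marker = token.lower() in MARKERS
        if is_marker and has_content:
            chunks.append(" ".join(current))
            current = []
            has_content = False
        current.append(token)
        if not is_marker:
            has_content = True
    if current:
        chunks.append(" ".join(current))
    return chunks


if __name__ == "__main__":
    print(marker_chunks("the flight to Dublin leaves at nine"))
```

In an example-based setting, chunks produced this way on both sides of a sentence pair would then need to be aligned to build sub-sentential examples; that alignment step is not shown here.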
Papers by Sara Morrissey