Relating Translation Quality Barriers to Source-Text Properties
Abstract
This paper aims to automatically identify which linguistic phenomena represent barriers to better MT quality. We focus on the translation of news data for two bidirectional language pairs: EN↔ES and EN↔DE. Using the diagnostic MT evaluation toolkit DELiC4MT and a set of human reference translations, we relate translation quality barriers to a selection of 9 source-side PoS-based linguistic checkpoints. Using output from the winning SMT, RbMT, and hybrid systems of the WMT 2013 shared task, translation quality barriers are investigated (in relation to the selected linguistic checkpoints) according to two main variables: (i) the type of the MT approach, i.e. statistical, rule-based or hybrid, and (ii) the human evaluation of MT output, ranked into three quality groups corresponding to good, near miss and poor. We show that the combination of manual quality ranking and automatic diagnostic evaluation on a set of PoS-based linguistic checkpoints is able to identify the specific quality barriers of different MT system types across the four translation directions under consideration.
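To make the checkpoint-based diagnostic concrete, the following is a minimal sketch (not the DELiC4MT implementation itself) of how a single PoS-based checkpoint could be scored: reference tokens aligned to source tokens carrying the checkpoint's PoS tag are checked for presence in the MT output. The function name, data layout and the toy EN→DE example are assumptions for illustration only.

```python
# Illustrative sketch of scoring one PoS-based linguistic checkpoint
# (e.g. "source nouns") against an MT hypothesis, given a source-to-reference
# word alignment. Names and data structures are hypothetical.

from collections import Counter

def checkpoint_score(source_pos, alignment, reference, hypothesis, checkpoint_tag):
    """source_pos: list of (token, PoS) for the source sentence.
    alignment: list of (src_idx, ref_idx) word-alignment links.
    reference, hypothesis: tokenised target sentences.
    checkpoint_tag: PoS tag defining the checkpoint (e.g. 'NOUN')."""
    # Reference tokens aligned to source tokens that instantiate the checkpoint.
    relevant = [reference[j] for i, j in alignment
                if source_pos[i][1] == checkpoint_tag and j < len(reference)]
    if not relevant:
        return None  # checkpoint not instantiated in this sentence
    hyp_counts = Counter(hypothesis)
    matched = 0
    for tok in relevant:
        if hyp_counts[tok] > 0:
            matched += 1
            hyp_counts[tok] -= 1  # clipped matching, so repeated tokens are not over-counted
    return matched / len(relevant)

# Toy usage: EN->DE, checkpoint = source nouns.
src = [("the", "DET"), ("president", "NOUN"), ("spoke", "VERB")]
align = [(0, 0), (1, 1), (2, 2)]
ref = ["der", "Präsident", "sprach"]
hyp = ["der", "Präsident", "redete"]
print(checkpoint_score(src, align, ref, hyp, "NOUN"))  # 1.0
```

Averaging such per-checkpoint scores over a test set, and breaking them down by system type and human quality group, is the kind of diagnostic signal the paper builds on.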
Related papers
2014
We propose to facilitate the error annotation task of translation quality assessment by introducing an annotation process consisting of two separate steps, similar to those required by EN 15038, the European Standard for translation companies: an error analysis for errors relating to acceptability (where the target text is considered as a whole, as well as in context), and one for errors relating to adequacy (where source segments are compared to target segments). We present a fine-grained error taxonomy suitable for a diagnostic and comparative analysis of machine-translated texts, post-edited texts and human translations. Categories missing from existing metrics have been added, such as lexical issues, coherence issues, and text type-specific issues.
We present a method to estimate the quality of automatic translations when reference translations are not available. Quality estimation is addressed as a two-step regression problem where multiple features are combined to predict a quality score. Given a set of features, we aim to automatically extract the variables that best explain translation quality, and use them to predict the quality score. The soundness of our approach is assessed by the encouraging results obtained in exhaustive experimentation with several feature sets. Moreover, the approach is highly scalable, allowing us to employ hundreds of features to predict translation quality.
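As a rough sketch of the two-step idea described above, one could first select the features that best explain the quality score and then fit a regressor on the reduced set. The scikit-learn pipeline below is an assumption about one possible realization, not the authors' system, and the data are synthetic.

```python
# Minimal sketch of two-step quality estimation: feature selection followed by
# regression. Feature matrix and quality scores are toy, synthetic inputs.

import numpy as np
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LassoCV
from sklearn.svm import SVR
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))                                # 200 sentences, 50 candidate features
y = X[:, :5].sum(axis=1) + rng.normal(scale=0.1, size=200)    # toy quality scores

qe_model = make_pipeline(
    SelectFromModel(LassoCV(cv=5)),   # step 1: keep the explanatory features
    SVR(kernel="rbf"),                # step 2: predict the quality score
)
qe_model.fit(X, y)
print(qe_model.predict(X[:3]))
```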
2007
We present a hybrid MT architecture, combining state-of-the-art linguistic processing with advanced stochastic techniques. Grounded in a theoretical reflection on the division of labor between rule-based and probabilistic elements in the MT task, we summarize per-component approaches to ranking, including empirical results when evaluated in isolation. Combining component-internal scores and a number of additional sources of (probabilistic) information, we explore discriminative re-ranking of n-best lists of candidate translations through an eclectic combination of knowledge sources, and provide evaluation results for various configurations. 1 Background and Motivation: Machine Translation is back in fashion, with data-driven approaches and specifically Statistical MT (SMT) as the predominant paradigm, both in terms of scientific interest and evaluation results in MT competitions. But (fully automated) machine translation remains a hard, if not ultimately impossible, challenge. The task enco...
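The discriminative re-ranking of n-best lists can be illustrated with a small sketch: each candidate translation carries component-internal and additional feature scores, and a weighted linear combination decides the final order. The feature names and weights below are invented for illustration and do not reflect the paper's actual knowledge sources.

```python
# Illustrative n-best re-ranking with a linear combination of feature scores.
# Feature names and weights are hypothetical.

def rerank(nbest, weights):
    """nbest: list of (translation, feature_dict); weights: feature -> weight.
    Returns candidates sorted by the combined discriminative score."""
    def score(features):
        return sum(weights.get(name, 0.0) * value for name, value in features.items())
    return sorted(nbest, key=lambda cand: score(cand[1]), reverse=True)

nbest = [
    ("translation A", {"transfer_score": 0.8, "lm_logprob": -12.3, "length_ratio": 1.0}),
    ("translation B", {"transfer_score": 0.6, "lm_logprob": -9.1,  "length_ratio": 0.9}),
]
weights = {"transfer_score": 1.0, "lm_logprob": 0.2, "length_ratio": 0.5}
best, *_ = rerank(nbest, weights)
print(best)
```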
2006
This paper describes one phase of a large-scale machine translation (MT) quality assurance project. We explore a novel approach to discriminating MT-unsuitable source sentences by predicting the expected quality of the output. The resources required include a set of source/MT sentence pairs, human judgments on the output, a source parser, and an MT system. We extract a number of syntactic, semantic, and lexical features from the source sentences only and train a classifier that we call the "
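A minimal sketch of the classification step, assuming the source-side syntactic, semantic and lexical features have already been extracted into vectors and paired with binary human judgments; the toy features and the logistic-regression choice are illustrative assumptions, not the paper's setup.

```python
# Toy classifier flagging MT-unsuitable source sentences from source-side
# features only. Feature extraction (parsing etc.) is omitted.

import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([  # toy features: [sentence length, parse depth, OOV count]
    [12, 4, 0], [35, 9, 3], [8, 3, 0], [41, 11, 5],
])
y = np.array([1, 0, 1, 0])  # 1 = output judged acceptable, 0 = MT-unsuitable

clf = LogisticRegression().fit(X, y)
print(clf.predict([[30, 8, 2]]))  # predicted suitability of a new source sentence
```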
We present an alternative method of evaluating Quality Estimation systems, based on a linguistically motivated Test Suite. We create a test set consisting of 14 linguistic error categories and, for each of them, gather a set of samples with both correct and erroneous translations. We then measure the performance of 5 Quality Estimation systems by checking their ability to distinguish between the correct and the erroneous translations. The detailed results are much more informative about the abilities of each system. The fact that different Quality Estimation systems perform differently on various phenomena confirms the usefulness of the Test Suite.
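The evaluation logic of such a test suite can be sketched as follows: a QE system is credited for a sample when it assigns the correct translation a higher score than the erroneous one, and accuracy is reported per error category. The data layout, the toy QE scorer and the single example below are assumptions for illustration only.

```python
# Sketch of test-suite evaluation of a QE system: per-category accuracy at
# ranking the correct translation above the erroneous one.

from collections import defaultdict

def test_suite_accuracy(samples, qe_score):
    """samples: list of (category, source, correct_mt, erroneous_mt).
    qe_score: function (source, translation) -> higher-is-better quality."""
    correct, total = defaultdict(int), defaultdict(int)
    for category, src, good, bad in samples:
        total[category] += 1
        if qe_score(src, good) > qe_score(src, bad):
            correct[category] += 1
    return {cat: correct[cat] / total[cat] for cat in total}

# Toy QE system (illustrative only): penalise length mismatch with the source.
toy_qe = lambda src, mt: -abs(len(src.split()) - len(mt.split()))
samples = [("word order", "he has seen it", "er hat es gesehen", "er hat gesehen es zu")]
print(test_suite_accuracy(samples, toy_qe))
```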
We describe experiments on quality estimation to select the best translation among multiple options for a given source sentence. We consider a realistic and challenging setting where the translation systems used are unknown, and no relative quality assessments are available for the training of prediction models. Our findings indicate that prediction errors are higher in this blind setting. However, these errors do not have a negative impact on performance when the predictions are used to select the best translation, compared to non-blind settings. This holds even when test conditions (text domains, MT systems) differ from model-building conditions. In addition, we experiment with quality prediction for translations produced by both translation systems and human translators. Although the latter are on average of much higher quality, we show that automatically distinguishing the two types of translation is not a trivial problem.
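A minimal sketch of the selection step described above: among candidates from unknown systems (or human translators), pick the one with the highest predicted quality. The toy predictor below merely stands in for a trained QE model.

```python
# Selecting the best translation by predicted quality; `predict_quality`
# is a placeholder for any trained QE model.

def select_best(source, candidates, predict_quality):
    """candidates: translation strings from unknown systems or translators."""
    return max(candidates, key=lambda mt: predict_quality(source, mt))

# Toy predictor (illustrative): prefer candidates closer in length to the source.
predict_quality = lambda src, mt: -abs(len(src.split()) - len(mt.split()))
print(select_best("the cat sat", ["die Katze saß", "Katze"], predict_quality))
```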
Proceedings of Machine Translation Summit VIII, 2001
The DARPA MT evaluations of the early 1990s, along with subsequent work on the MT Scale, and the International Standards for Language Engineering (ISLE) MT Evaluation framework represent two of the principal efforts in Machine Translation Evaluation (MTE) over the past decade. We describe a research program that builds on both of these efforts. This paper focuses on the selection of MT output features suggested in the ISLE framework, as well as the development of metrics for the features to be used in the study. We define each metric and describe the rationale for its development. We also discuss several of the finer points of the evaluation measures that arose as a result of verification of the measures against sample output texts from three machine translation systems.
This study describes the first use of a particular implementation of Normalized Compression Distance (NCD) as a machine translation quality evaluation tool. NCD has been introduced and tested for clustering and classification of different types of data and has been found to be a reliable and general tool. As far as we know, NCD in its CompLearn implementation has not yet been evaluated as an MT quality tool, and we wish to show that it can also be used for this purpose. We show that NCD scores given for MT outputs in different languages correlate highly with the scores of a state-of-the-art MT evaluation metric, METEOR 0.6. Our experiments are based on translations from one source language into three target languages, using a small sample with available reference translations, the UN's Universal Declaration of Human Rights. Secondly, we briefly describe and discuss the results of a larger-scale evaluation of NCD as an MT metric on the WMT08 Shared Task evaluation data. These evaluations further confirm that NCD is a noteworthy MT metric, both on its own and when enriched with basic language tools such as stemming and WordNet.
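The underlying measure can be sketched directly. Assuming zlib as the compressor (the study itself uses the CompLearn implementation), the Normalized Compression Distance between two strings x and y is NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C(s) is the compressed length of s; a lower NCD between an MT output and its reference indicates a closer translation. The hypothesis sentence below is an invented near-miss of the reference.

```python
# Normalized Compression Distance with zlib as an approximation of the
# compressor-based similarity used for MT evaluation.

import zlib

def C(text: str) -> int:
    return len(zlib.compress(text.encode("utf-8")))

def ncd(x: str, y: str) -> float:
    cx, cy, cxy = C(x), C(y), C(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

reference = "all human beings are born free and equal in dignity and rights"
hypothesis = "all human beings are born free and equal in dignity and right"
print(ncd(hypothesis, reference))  # closer to 0 = more similar to the reference
```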
Machine Translation, 1993
Automatic evaluation of output quality for machine translation systems is a difficult task. The Institute of Computational Linguistics of Peking University has developed an automatic evaluation system called MTE. This paper introduces the basic principles of MTE, its implementation techniques, and practical experience with the system.
