Tuning Statistical Machine Translation Parameters
2004, IJIT
Abstract
Statistical Machine Translation (SMT) involves many tasks, including modeling, training, decoding, and evaluation. In this work, we present a methodology for optimizing the training process to obtain better translation quality using the well-known GIZA++ SMT toolkit. The methodology is based on adjusting the parameters of GIZA++ that affect the generation of the translation model. When applying the methodology, an average improvement of has been achieved in translation quality.
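As an illustration of the kind of parameter adjustment the abstract describes, the sketch below varies GIZA++ iteration counts (the `m1`/`mh`/`m3`/`m4` options control the number of IBM Model 1, HMM, Model 3, and Model 4 EM iterations) and launches a training run for each setting. The file names, grid values, and exact option spellings are assumptions about a typical GIZA++ setup, not the paper's actual configuration.

```python
import itertools
import subprocess

# Hypothetical corpus/vocabulary files prepared beforehand (e.g. with plain2snt.out).
SRC_VCB, TGT_VCB, CORPUS = "source.vcb", "target.vcb", "corpus.snt"

# Candidate iteration counts for IBM Model 1, the HMM model, Model 3, and Model 4.
# These option names (m1, mh, m3, m4) follow common GIZA++ usage, but verify them
# against your GIZA++ build before relying on this sketch.
grid = {
    "m1": [3, 5],
    "mh": [0, 5],
    "m3": [3, 5],
    "m4": [3, 5],
}

for values in itertools.product(*grid.values()):
    opts = dict(zip(grid.keys(), values))
    run_name = "run_" + "_".join(f"{k}{v}" for k, v in opts.items())
    cmd = ["GIZA++", "-S", SRC_VCB, "-T", TGT_VCB, "-C", CORPUS, "-o", run_name]
    for k, v in opts.items():
        cmd += [f"-{k}", str(v)]
    # Each run produces alignment/translation-model files whose quality can then
    # be compared downstream (e.g. via BLEU on a held-out set).
    subprocess.run(cmd, check=True)
```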
Related papers
International Journal of Advanced Research in Computer Science and Software Engineering
Machine translation is a combination of many complex sub-processes, and the quality of the results of each sub-process, executed in a well-defined sequence, determines the overall accuracy of the translation. The Statistical Machine Translation approach considers each sentence in the target language as a possible translation of any source-language sentence. This possibility is quantified as a probability, and the sentence with the highest probability is treated as the best translation. SMT is the most favoured approach not only because of its good results for corpus-rich language pairs, but also because of the tools the approach has been enhanced with over the past two and a half decades. The paper gives a brief introduction to SMT: its steps and the different tools available for each step.
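A minimal sketch of this "highest probability wins" selection, assuming a translation model score P(f|e) and a language model score P(e) are already available for each candidate; the candidate list and log-probabilities below are illustrative, not from the paper:

```python
def best_translation(candidates):
    """Pick the candidate e maximizing P(e|f), proportional to P(f|e) * P(e).

    `candidates` is a list of (sentence, tm_logprob, lm_logprob) tuples, where
    the log-probabilities come from a translation model and a language model
    respectively (hypothetical values here).
    """
    return max(candidates, key=lambda c: c[1] + c[2])[0]

# Illustrative candidate translations of one source sentence with made-up scores.
candidates = [
    ("the house is small", -4.2, -6.1),
    ("the house is little", -4.0, -7.3),
    ("small is the house",  -3.9, -9.0),
]
print(best_translation(candidates))  # -> "the house is small"
```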
Computational Linguistics and Intelligent Text Processing, 2015
In statistical machine translation systems, it is a common practice to use one set of weighting parameters in scoring the candidate translations from a source language to a target language. In this paper, we challenge the assumption that only one set of weights is sufficient to pick the best candidate translation for all source language sentences. We propose a new technique that generates a different set of weights for each input sentence. Our technique outperforms the popular tuning algorithm MERT on different datasets using different language pairs.
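The core idea, scoring each candidate with a weighted combination of feature values where the weight vector may differ per input sentence, can be sketched as follows; the feature names, weight values, and the `weights_for` selector are hypothetical placeholders for whatever per-sentence weighting scheme the paper actually uses:

```python
def score(features, weights):
    """Log-linear model score: weighted sum of feature values."""
    return sum(weights[name] * value for name, value in features.items())

def pick_best(candidates, weights):
    """Return the candidate translation with the highest weighted score."""
    return max(candidates, key=lambda c: score(c["features"], weights))

# Global weights, as a single MERT-style tuning run would produce ...
global_weights = {"tm": 1.0, "lm": 0.7, "length": -0.3}

# ... versus a hypothetical per-sentence selector that returns a different
# weight vector depending on properties of the input sentence.
def weights_for(source_sentence):
    if len(source_sentence.split()) > 20:
        return {"tm": 0.8, "lm": 1.0, "length": -0.1}  # illustrative values
    return global_weights

source = "a short input sentence"
candidates = [
    {"text": "candidate one", "features": {"tm": -2.0, "lm": -3.0, "length": 4}},
    {"text": "candidate two", "features": {"tm": -1.5, "lm": -4.0, "length": 5}},
]
print(pick_best(candidates, weights_for(source))["text"])
```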
Research on statistical machine translation has focused on particular translation directions, typically with English as the target language, e.g., from Arabic to English. When we reverse the translation direction, the multiple reference translations turn into multiple possible inputs, which offers both challenges and opportunities. We propose and evaluate several strategies for making use of these multiple inputs: (a) select one of the datasets, (b) select the best input for each sentence, and (c) synthesize an input for each sentence by fusing the available inputs. Surprisingly, we find that it is best to tune on the hardest available input, not on the one that yields the highest BLEU score. This finding has implications for how to pick good translators and how to select useful data for parameter optimization in SMT.
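A rough sketch of the per-sentence selection strategy, assuming a sentence-level difficulty score is available (the `difficulty` callable below is a hypothetical stand-in for the paper's actual criterion of which input is hardest to translate):

```python
def pick_hardest_inputs(multi_inputs, difficulty):
    """For each tuning sentence, keep the input variant that is hardest for the
    current system, following the paper's finding that tuning on the hardest
    input works best.

    `multi_inputs` is a list of lists: the alternative input versions of each
    sentence. `difficulty` returns a higher value for inputs the system
    translates worse (e.g. 1 minus a sentence-level BLEU of the system output
    against the references).
    """
    return [max(variants, key=difficulty) for variants in multi_inputs]

# Illustrative usage with a dummy difficulty proxy (longer input = harder).
tuning_inputs = [
    ["the meeting was postponed", "the meeting has been put off until later"],
    ["he arrived late", "he showed up much later than expected"],
]
print(pick_hardest_inputs(tuning_inputs, difficulty=lambda s: len(s.split())))
```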
Proceedings of the Sixth Workshop on Statistical Machine Translation
The focus of our workshop was the use of parallel corpora for machine translation. Recent experimentation has shown that the performance of SMT systems varies greatly with the source language. In this workshop we encouraged researchers to investigate ways to improve the performance of SMT systems for diverse languages, including morphologically complex languages, languages with partially free word order, and low-resource languages.
2016
Statistical Machine Translation (SMT) systems are based on bilingual sentence-aligned data, and the quality of translation depends on the data provided for translation learning. A large parallel corpus is required to perform statistical machine translation. The aim of this paper is to explore SMT using the Moses toolkit by building a German-English translator. To perform German-to-English translation, a parallel corpus for this language pair has been provided. The larger the data provided for training the Moses decoder, the more accurate the translated output.
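Before training, a sentence-aligned parallel corpus is typically cleaned so that both sides line up and overly long pairs are dropped; the sketch below shows that kind of preparation step under assumed file names. Moses itself ships a cleaning script (clean-corpus-n.perl) for this, so this Python version is only illustrative.

```python
def clean_parallel_corpus(src_path, tgt_path, out_prefix, max_len=80):
    """Keep only sentence pairs where both sides are non-empty and at most
    `max_len` tokens long -- the usual filtering applied before Moses training.
    File names and extensions here are assumptions for illustration.
    """
    kept = 0
    with open(src_path, encoding="utf-8") as fs, \
         open(tgt_path, encoding="utf-8") as ft, \
         open(out_prefix + ".de", "w", encoding="utf-8") as out_src, \
         open(out_prefix + ".en", "w", encoding="utf-8") as out_tgt:
        for src, tgt in zip(fs, ft):
            if 0 < len(src.split()) <= max_len and 0 < len(tgt.split()) <= max_len:
                out_src.write(src)
                out_tgt.write(tgt)
                kept += 1
    return kept

# Illustrative invocation for a German-English corpus.
# print(clean_parallel_corpus("corpus.de", "corpus.en", "corpus.clean"))
```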
International Journal of Press/Politics, 2009
This paper presents the results of the WMT09 shared tasks, which included a translation task, a system combination task, and an evaluation task. We conducted a large-scale manual evaluation of 87 machine translation systems and 22 system combination entries. We used the ranking of these systems to measure how strongly automatic metrics correlate with human judgments of translation quality, for more than 20 metrics. We present a new evaluation technique whereby system output is edited and judged for correctness.
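Measuring how strongly an automatic metric agrees with human judgments typically amounts to ranking the systems by each and computing a rank correlation; below is a minimal sketch using Spearman's rho (the system names and scores are made up for illustration):

```python
from scipy.stats import spearmanr

# Hypothetical scores for five systems: human-assessment values and an
# automatic metric's scores (e.g. BLEU). Higher means better for both.
human_scores  = {"sysA": 0.62, "sysB": 0.55, "sysC": 0.71, "sysD": 0.40, "sysE": 0.58}
metric_scores = {"sysA": 23.4, "sysB": 21.0, "sysC": 25.1, "sysD": 18.2, "sysE": 22.7}

systems = sorted(human_scores)
rho, _ = spearmanr([human_scores[s] for s in systems],
                   [metric_scores[s] for s in systems])
print(f"Spearman correlation between metric and human judgments: {rho:.2f}")
```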
2006
Most statistical machine translation systems are combinations of various models, and tuning the scaling factors is an important step. However, this optimisation problem is hard because the objective function has many local minima and the available algorithms cannot guarantee a global optimum. Consequently, optimisations started from different initial settings can converge to fairly different solutions. We present tuning experiments with the Simultaneous Perturbation Stochastic Approximation (SPSA) algorithm and compare them to tuning with the widely used downhill simplex method. On IWSLT 2006 Chinese-English data, both methods showed similar performance, but SPSA was more robust to the choice of initial settings.
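SPSA estimates the gradient from just two evaluations of the objective per iteration by perturbing all scaling factors simultaneously with a random ±1 vector. The sketch below shows that update rule; the quadratic objective is only a stand-in for the translation-quality score the tuning actually optimizes, and the gain constants are generic defaults.

```python
import numpy as np

def spsa_minimize(objective, theta, iters=100, a=0.1, c=0.1, alpha=0.602, gamma=0.101):
    """Simultaneous Perturbation Stochastic Approximation.

    Each iteration perturbs *all* components of `theta` at once with a random
    +/-1 vector, evaluates the objective twice, and forms a gradient estimate
    from the difference. Gain sequences follow the standard SPSA schedules.
    """
    theta = np.asarray(theta, dtype=float)
    for k in range(1, iters + 1):
        a_k = a / k ** alpha          # step-size gain
        c_k = c / k ** gamma          # perturbation size
        delta = np.random.choice([-1.0, 1.0], size=theta.shape)
        y_plus = objective(theta + c_k * delta)
        y_minus = objective(theta - c_k * delta)
        ghat = (y_plus - y_minus) / (2.0 * c_k * delta)  # simultaneous-perturbation gradient estimate
        theta = theta - a_k * ghat
    return theta

# Toy stand-in for "negative translation quality as a function of the scaling
# factors": a noisy quadratic with its minimum at (1, 2, 3).
target = np.array([1.0, 2.0, 3.0])
objective = lambda w: float(np.sum((w - target) ** 2)) + 0.01 * np.random.randn()
print(spsa_minimize(objective, theta=np.zeros(3), iters=500))
```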