Tuning Statistical Machine Translation Parameters
2004, IJIT
Abstract
Statistical Machine Translation (SMT) involves many tasks, including modeling, training, decoding, and evaluation. In this work, we present a methodology for optimizing the training process to obtain better translation quality using the well-known GIZA++ SMT toolkit. The methodology is based on adjusting the parameters of GIZA++ that affect the generation of the translation model. When applying the methodology, an average improvement of has been achieved in translation quality.
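As an illustration of the kind of parameter adjustment the abstract describes, the sketch below varies GIZA++ iteration counts (the `m1`/`mh`/`m3`/`m4` options control the number of IBM Model 1, HMM, Model 3, and Model 4 EM iterations) and launches a training run for each setting. The file names, grid values, and exact option spellings are assumptions about a typical GIZA++ setup, not the paper's actual configuration.

```python
import itertools
import subprocess

# Hypothetical corpus/vocabulary files prepared beforehand (e.g. with plain2snt.out).
SRC_VCB, TGT_VCB, CORPUS = "source.vcb", "target.vcb", "corpus.snt"

# Candidate iteration counts for IBM Model 1, the HMM model, Model 3, and Model 4.
# These option names (m1, mh, m3, m4) follow common GIZA++ usage, but verify them
# against your GIZA++ build before relying on this sketch.
grid = {
    "m1": [3, 5],
    "mh": [0, 5],
    "m3": [3, 5],
    "m4": [3, 5],
}

for values in itertools.product(*grid.values()):
    opts = dict(zip(grid.keys(), values))
    run_name = "run_" + "_".join(f"{k}{v}" for k, v in opts.items())
    cmd = ["GIZA++", "-S", SRC_VCB, "-T", TGT_VCB, "-C", CORPUS, "-o", run_name]
    for k, v in opts.items():
        cmd += [f"-{k}", str(v)]
    # Each run produces alignment/translation-model files whose quality can then
    # be compared downstream (e.g. via BLEU on a held-out set).
    subprocess.run(cmd, check=True)
```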
Related papers
International Journal of Advanced Research in Computer Science and Software Engineering
Machine translation is a combination of many complex sub-processes, and the quality of the results of each sub-process, executed in a well-defined sequence, determines the overall accuracy of the translation. The Statistical Machine Translation approach considers each sentence in the target language as a possible translation of any source-language sentence. This possibility is quantified as a probability, and the sentence with the highest probability is treated as the best translation. SMT is the most favoured approach not only because of its good results for corpus-rich language pairs, but also because of the tools the approach has been enhanced with over the past two and a half decades. The paper gives a brief introduction to SMT: its steps and the different tools available for each step.
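A minimal sketch of this "highest probability wins" selection, assuming a translation model score P(f|e) and a language model score P(e) are already available for each candidate; the candidate list and log-probabilities below are illustrative, not from the paper:

```python
def best_translation(candidates):
    """Pick the candidate e maximizing P(e|f), proportional to P(f|e) * P(e).

    `candidates` is a list of (sentence, tm_logprob, lm_logprob) tuples, where
    the log-probabilities come from a translation model and a language model
    respectively (hypothetical values here).
    """
    return max(candidates, key=lambda c: c[1] + c[2])[0]

# Illustrative candidate translations of one source sentence with made-up scores.
candidates = [
    ("the house is small", -4.2, -6.1),
    ("the house is little", -4.0, -7.3),
    ("small is the house",  -3.9, -9.0),
]
print(best_translation(candidates))  # -> "the house is small"
```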
Computational Linguistics and Intelligent Text Processing, 2015
In statistical machine translation systems, it is a common practice to use one set of weighting parameters in scoring the candidate translations from a source language to a target language. In this paper, we challenge the assumption that only one set of weights is sufficient to pick the best candidate translation for all source language sentences. We propose a new technique that generates a different set of weights for each input sentence. Our technique outperforms the popular tuning algorithm MERT on different datasets using different language pairs.
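The core idea, scoring each candidate with a weighted combination of feature values where the weight vector may differ per input sentence, can be sketched as follows; the feature names, weight values, and the `weights_for` selector are hypothetical placeholders for whatever per-sentence weighting scheme the paper actually uses:

```python
def score(features, weights):
    """Log-linear model score: weighted sum of feature values."""
    return sum(weights[name] * value for name, value in features.items())

def pick_best(candidates, weights):
    """Return the candidate translation with the highest weighted score."""
    return max(candidates, key=lambda c: score(c["features"], weights))

# Global weights, as a single MERT-style tuning run would produce ...
global_weights = {"tm": 1.0, "lm": 0.7, "length": -0.3}

# ... versus a hypothetical per-sentence selector that returns a different
# weight vector depending on properties of the input sentence.
def weights_for(source_sentence):
    if len(source_sentence.split()) > 20:
        return {"tm": 0.8, "lm": 1.0, "length": -0.1}  # illustrative values
    return global_weights

source = "a short input sentence"
candidates = [
    {"text": "candidate one", "features": {"tm": -2.0, "lm": -3.0, "length": 4}},
    {"text": "candidate two", "features": {"tm": -1.5, "lm": -4.0, "length": 5}},
]
print(pick_best(candidates, weights_for(source))["text"])
```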
Research on statistical machine translation has focused on particular translation directions, typically with English as the target language, e.g., from Arabic to English. When we reverse the translation direction, the multiple reference translations turn into multiple possible inputs, which offers both challenges and opportunities. We propose and evaluate several strategies for making use of these multiple inputs: (a) select one of the datasets, (b) select the best input for each sentence, and (c) synthesize an input for each sentence by fusing the available inputs. Surprisingly, we find that it is best to tune on the hardest available input, not on the one that yields the highest BLEU score. This finding has implications for how to pick good translators and how to select useful data for parameter optimization in SMT.
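A rough sketch of the per-sentence selection strategy, assuming a sentence-level difficulty score is available (the `difficulty` callable below is a hypothetical stand-in for the paper's actual criterion of which input is hardest to translate):

```python
def pick_hardest_inputs(multi_inputs, difficulty):
    """For each tuning sentence, keep the input variant that is hardest for the
    current system, following the paper's finding that tuning on the hardest
    input works best.

    `multi_inputs` is a list of lists: the alternative input versions of each
    sentence. `difficulty` returns a higher value for inputs the system
    translates worse (e.g. 1 minus a sentence-level BLEU of the system output
    against the references).
    """
    return [max(variants, key=difficulty) for variants in multi_inputs]

# Illustrative usage with a dummy difficulty proxy (longer input = harder).
tuning_inputs = [
    ["the meeting was postponed", "the meeting has been put off until later"],
    ["he arrived late", "he showed up much later than expected"],
]
print(pick_hardest_inputs(tuning_inputs, difficulty=lambda s: len(s.split())))
```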
Proceedings of the Sixth Workshop on Statistical Machine Translation
The focus of our workshop was the use of parallel corpora for machine translation. Recent experimentation has shown that the performance of SMT systems varies greatly with the source language. In this workshop we encouraged researchers to investigate ways to improve the performance of SMT systems for diverse languages, including morphologically complex languages, languages with partially free word order, and low-resource languages.
2016
Statistical Machine Translation (SMT) systems are based on bilingual sentence-aligned data, and the quality of translation depends on the data provided for translation learning. A large parallel corpus is required to perform statistical machine translation. The aim of this paper is to explore SMT using the Moses toolkit by building a German-English translator. To perform German-to-English translation, a parallel corpus for this language pair has been provided. The larger the data provided for training the Moses decoder, the more accurate the translated output.
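Before training, a sentence-aligned parallel corpus is typically cleaned so that both sides line up and overly long pairs are dropped; the sketch below shows that kind of preparation step under assumed file names. Moses itself ships a cleaning script (clean-corpus-n.perl) for this, so this Python version is only illustrative.

```python
def clean_parallel_corpus(src_path, tgt_path, out_prefix, max_len=80):
    """Keep only sentence pairs where both sides are non-empty and at most
    `max_len` tokens long -- the usual filtering applied before Moses training.
    File names and extensions here are assumptions for illustration.
    """
    kept = 0
    with open(src_path, encoding="utf-8") as fs, \
         open(tgt_path, encoding="utf-8") as ft, \
         open(out_prefix + ".de", "w", encoding="utf-8") as out_src, \
         open(out_prefix + ".en", "w", encoding="utf-8") as out_tgt:
        for src, tgt in zip(fs, ft):
            if 0 < len(src.split()) <= max_len and 0 < len(tgt.split()) <= max_len:
                out_src.write(src)
                out_tgt.write(tgt)
                kept += 1
    return kept

# Illustrative invocation for a German-English corpus.
# print(clean_parallel_corpus("corpus.de", "corpus.en", "corpus.clean"))
```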
International Journal of Press/Politics, 2009
This paper presents the results of the WMT09 shared tasks, which included a translation task, a system combination task, and an evaluation task. We conducted a large-scale manual evaluation of 87 machine translation systems and 22 system combination entries. We used the ranking of these systems to measure how strongly automatic metrics correlate with human judgments of translation quality, for more than 20 metrics. We present a new evaluation technique whereby system output is edited and judged for correctness.
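Measuring how strongly an automatic metric agrees with human judgments typically amounts to ranking the systems by each and computing a rank correlation; below is a minimal sketch using Spearman's rho (the system names and scores are made up for illustration):

```python
from scipy.stats import spearmanr

# Hypothetical scores for five systems: human-assessment values and an
# automatic metric's scores (e.g. BLEU). Higher means better for both.
human_scores  = {"sysA": 0.62, "sysB": 0.55, "sysC": 0.71, "sysD": 0.40, "sysE": 0.58}
metric_scores = {"sysA": 23.4, "sysB": 21.0, "sysC": 25.1, "sysD": 18.2, "sysE": 22.7}

systems = sorted(human_scores)
rho, _ = spearmanr([human_scores[s] for s in systems],
                   [metric_scores[s] for s in systems])
print(f"Spearman correlation between metric and human judgments: {rho:.2f}")
```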
2006
Most statistical machine translation systems are combinations of various models, and tuning the scaling factors is an important step. However, this optimisation problem is hard because the objective function has many local minima and the available algorithms cannot guarantee a global optimum. Consequently, optimisations started from different initial settings can converge to fairly different solutions. We present tuning experiments with the Simultaneous Perturbation Stochastic Approximation (SPSA) algorithm and compare them to tuning with the widely used downhill simplex method. On IWSLT 2006 Chinese-English data, both methods showed similar performance, but SPSA was more robust to the choice of initial settings.
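SPSA estimates the gradient from just two evaluations of the objective per iteration by perturbing all scaling factors simultaneously with a random ±1 vector. The sketch below shows that update rule; the quadratic objective is only a stand-in for the translation-quality score the tuning actually optimizes, and the gain constants are generic defaults.

```python
import numpy as np

def spsa_minimize(objective, theta, iters=100, a=0.1, c=0.1, alpha=0.602, gamma=0.101):
    """Simultaneous Perturbation Stochastic Approximation.

    Each iteration perturbs *all* components of `theta` at once with a random
    +/-1 vector, evaluates the objective twice, and forms a gradient estimate
    from the difference. Gain sequences follow the standard SPSA schedules.
    """
    theta = np.asarray(theta, dtype=float)
    for k in range(1, iters + 1):
        a_k = a / k ** alpha          # step-size gain
        c_k = c / k ** gamma          # perturbation size
        delta = np.random.choice([-1.0, 1.0], size=theta.shape)
        y_plus = objective(theta + c_k * delta)
        y_minus = objective(theta - c_k * delta)
        ghat = (y_plus - y_minus) / (2.0 * c_k * delta)  # simultaneous-perturbation gradient estimate
        theta = theta - a_k * ghat
    return theta

# Toy stand-in for "negative translation quality as a function of the scaling
# factors": a noisy quadratic with its minimum at (1, 2, 3).
target = np.array([1.0, 2.0, 3.0])
objective = lambda w: float(np.sum((w - target) ** 2)) + 0.01 * np.random.randn()
print(spsa_minimize(objective, theta=np.zeros(3), iters=500))
```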