Conference Presentations by MARIA STASIMIOTI
Crowdsourcing platforms are frequently used for the creation of parallel translation data for the training of Statistical Machine Translation (SMT) models (Zaidan & Callison-Burch, 2011). In an effort to shed some light on the effects of microtasking on the translation process and the translator’s role, the aim of the paper is to investigate how amateur and expert workers perform when faced with translation microtasks and how this affects quality. The research is based on the crowdsourcing activities carried out in the framework of the TraMOOC (Translation for Massive Open Online Courses) research and innovation project (www.tramooc.eu).
Publications by MARIA STASIMIOTI

1st Workshop on Post-Editing in Modern-Day Translation / The 14th Conference of the Association for Machine Translation in the Americas, 2020
Machine Translation (MT) has been increasingly used in industrial translation production scenarios thanks to the development of Neural Machine Translation (NMT) models and the improvement of MT output, especially at the level of fluency. In particular, in an effort to speed up the translation process and reduce costs, MT output is used as raw translation to be subsequently post-edited by translators. However, post-editing (PE) has been found to differ from both human translation and revision of human translation in terms of the cognitive processes and the practical goals and processes employed. In addition, translators remain sceptical towards PE and question its real benefits. The paper seeks to investigate the effort required for full PE and to compare it with the effort required for manual translation, focusing on the English-Greek language pair and NMT output. In particular, eye-tracking and keystroke logging data are used to measure the effort expended by translators while translating from scratch and while post-editing the NMT output. The findings indicate that the effort is lower when post-editing than when translating from scratch, and they also suggest that experience in PE plays a role.
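As a concrete illustration of how eye-tracking and keystroke-logging indicators of this kind are commonly operationalized, the following minimal sketch aggregates fixation data into per-condition effort measures. It is not the study's actual pipeline: the CSV layout, column names and condition labels ("PE" vs "HT") are assumptions made for the example.

```python
# Minimal sketch: aggregating eye-tracking fixations into per-condition effort indicators.
# Assumes a hypothetical fixations.csv with one row per fixation and the columns:
# participant, condition ("PE" or "HT"), segment_id, duration_ms, src_words, task_time_s
import pandas as pd

fixations = pd.read_csv("fixations.csv")  # assumed export format, for illustration only

per_segment = fixations.groupby(["participant", "condition", "segment_id"]).agg(
    fixation_count=("duration_ms", "size"),    # number of fixations on the segment
    mean_fixation_ms=("duration_ms", "mean"),  # average fixation duration
    src_words=("src_words", "first"),
    task_time_s=("task_time_s", "first"),
)

# Normalize by segment length so post-editing and from-scratch segments are comparable.
per_segment["fixations_per_word"] = per_segment["fixation_count"] / per_segment["src_words"]
per_segment["seconds_per_word"] = per_segment["task_time_s"] / per_segment["src_words"]

# Average indicators per condition: post-editing (PE) vs translation from scratch (HT).
print(per_segment.groupby("condition")[["fixations_per_word",
                                         "mean_fixation_ms",
                                         "seconds_per_word"]].mean())
```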
Proceedings of the 22nd Annual Conference of the European Association for Machine Translation, 2020
The present study aims to compare three systems: a generic statistical machine translation (SMT) system, a generic neural machine translation (NMT) system and a tailored-NMT system, focusing on the English to Greek language pair. The comparison is carried out following a mixed-methods approach, i.e. automatic metrics, as well as side-by-side ranking, adequacy and fluency rating, measurement of actual post-editing effort and human error analysis performed by 16 postgraduate Translation students. The findings reveal higher scores for both the generic NMT and the tailored-NMT outputs than for the SMT output, on automatic metrics as well as in human evaluation, with the tailored-NMT output faring even better than the generic NMT output.
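For reference, automatic metric comparisons of this kind are typically run with a toolkit such as sacreBLEU. The sketch below scores three system outputs against the same reference set; the file names and paths are assumptions, not the study's actual data.

```python
# Minimal sketch: scoring three MT systems' outputs against one reference set
# with sacreBLEU (BLEU, chrF, TER). File names are illustrative.
from sacrebleu.metrics import BLEU, CHRF, TER

def read_lines(path):
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]

refs = [read_lines("references.el")]          # one reference translation per segment
systems = {
    "generic-SMT": read_lines("smt_output.el"),
    "generic-NMT": read_lines("nmt_output.el"),
    "tailored-NMT": read_lines("tailored_nmt_output.el"),
}

bleu, chrf, ter = BLEU(), CHRF(), TER()
for name, hyps in systems.items():
    print(name,
          f"BLEU={bleu.corpus_score(hyps, refs).score:.1f}",
          f"chrF={chrf.corpus_score(hyps, refs).score:.1f}",
          f"TER={ter.corpus_score(hyps, refs).score:.1f}")
```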

Translating and the Computer 41 Proceedings, 2019
In an effort to meet the demands for speed, productivity and low cost, the translation industry has turned to Machine Translation (MT) and Post-editing (PE). Nowadays, MT output is used as raw translation to be further post-edited by a translator (Lommel and DePalma, 2016). Yet, translators still approach PE with caution and scepticism and question its real benefits (Koponen, 2012; Gaspari et al., 2014; Moorkens, 2018). In addition, attitudes to MT and PE seem to affect PE effort and performance (Witczak, 2016; Çetiner and İşisağ, 2019). In this light, the study aims to investigate the attitudes and perceptions of undergraduate translation students towards MT and PE, and their performance, before and after they receive training in MT and PE. Questionnaires are used to capture their attitudes and perceptions, the technical and temporal effort they expend while post-editing is calculated, and a human evaluation of the post-edited output is carried out to assess their performance and the quality of the post-edited texts. The analysis reveals a change in the students’ attitudes and perceptions: they report a more positive attitude towards MT and PE, they are more confident and faster, and they avoid over-editing.
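As an illustration only: technical effort is often approximated with an HTER-style normalized edit distance between the raw MT output and the post-edited text, and temporal effort as time per source word. The sketch below shows one such calculation under those assumptions; it is not the study's exact procedure, and the example values are invented.

```python
# Minimal sketch: HTER-style technical effort and seconds-per-word temporal effort.
# Token-level Levenshtein distance, normalized by post-edited length (as in HTER).

def edit_distance(a, b):
    # Classic dynamic-programming Levenshtein distance over token lists.
    dp = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, tok_b in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,               # deletion
                                     dp[j - 1] + 1,           # insertion
                                     prev + (tok_a != tok_b)) # substitution
    return dp[-1]

def technical_effort(mt_output, post_edited):
    mt, pe = mt_output.split(), post_edited.split()
    return edit_distance(mt, pe) / max(len(pe), 1)            # HTER-like score

def temporal_effort(seconds, source_text):
    return seconds / max(len(source_text.split()), 1)         # seconds per source word

# Hypothetical example values:
print(technical_effort("this is a exam answer", "this is an exam answer"))  # 0.2
print(temporal_effort(42.0, "this is an exam question"))                    # 8.4
```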

Fit-For-Market Translator and Interpreter Training in a Digital Age (Language and Linguistics), 2019
Recent technological advances have given rise to a wider availability of Machine Translation (MT) systems for various language pairs, while the advent of neural machine translation (NMT) models has led to improved MT quality, especially in terms of fluency and in comparison with statistical machine translation (SMT) models. MT is thus increasingly used in industrial settings, a fact that has also attracted interest in the ways that translators and post-editors are, or should be, trained. This case study seeks to explore the effort involved in post-editing NMT and SMT outputs, and the ways in which the quality of the MT systems used, as well as the errors found in the raw MT output, should be taken into consideration in post-editing (PE) training. To this end, the study examines, by means of eye-tracking and keystroke logging data, the performance of twenty professional Greek translators while post-editing SMT and NMT outputs.
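To illustrate the keystroke-logging side of such measurements, the short sketch below counts text-producing and text-eliminating key events per post-editing condition from a hypothetical key log. The log schema and condition labels are assumptions for the example, not the format of any particular logging tool or of this study's data.

```python
# Minimal sketch: summarizing keystroke-logging events per post-editing condition.
# Assumes a hypothetical keylog.csv with columns:
# participant, condition ("SMT-PE" or "NMT-PE"), event ("insert", "delete", "navigate"), timestamp_ms
import pandas as pd

log = pd.read_csv("keylog.csv")  # assumed schema, for illustration only

summary = (
    log[log["event"].isin(["insert", "delete"])]  # keep only edit keystrokes
    .groupby(["condition", "event"])
    .size()
    .unstack(fill_value=0)                        # one row per condition
)
summary["total_edits"] = summary.sum(axis=1)
print(summary)  # e.g. compare how many edit keystrokes each MT output required
```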

The limited availability of in-domain training data is a major issue in the training of application-specific neural machine translation models. Professional outsourcing of bilingual data collections is costly and often not feasible. In this paper we analyze the influence of using crowdsourcing as a scalable way to obtain translations of target in-domain data, bearing in mind that the translations may be of lower quality. We apply crowdsourcing with carefully designed quality controls to create parallel corpora for the educational domain by collecting translations of texts from MOOCs from English into eleven languages, which we then use to fine-tune neural machine translation models previously trained on general-domain data. The results of our research indicate that crowdsourced data collected with proper quality controls consistently yields performance gains over both general-domain baseline systems and systems fine-tuned with pre-existing in-domain corpora.
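As a rough illustration of the fine-tuning step described above, here is a minimal sketch using the Hugging Face transformers library to continue training a general-domain model on a handful of in-domain segment pairs. The model name, example sentence pair, file paths and hyperparameters are illustrative assumptions, not the project's actual training setup.

```python
# Minimal sketch: fine-tuning a general-domain NMT model on in-domain parallel data.
from datasets import Dataset
from transformers import (AutoTokenizer, AutoModelForSeq2SeqLM, DataCollatorForSeq2Seq,
                          Seq2SeqTrainer, Seq2SeqTrainingArguments)

model_name = "Helsinki-NLP/opus-mt-en-el"  # assumed generic EN->EL baseline
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Hypothetical crowdsourced in-domain segment pairs (source, target).
pairs = [{"src": "Welcome to the course forum.",
          "tgt": "Καλώς ήρθατε στο φόρουμ του μαθήματος."}]
dataset = Dataset.from_list(pairs)

def preprocess(batch):
    # Tokenize source text; target token ids become the training labels.
    model_inputs = tokenizer(batch["src"], truncation=True, max_length=128)
    labels = tokenizer(text_target=batch["tgt"], truncation=True, max_length=128)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = dataset.map(preprocess, batched=True, remove_columns=["src", "tgt"])

args = Seq2SeqTrainingArguments(
    output_dir="opus-mt-en-el-mooc",   # assumed output path
    per_device_train_batch_size=16,
    learning_rate=2e-5,
    num_train_epochs=3,
)
trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```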

We present a wikified data set of parallel texts in eleven language pairs from the educational domain. English sentences are aligned with sentences in eleven other languages (BG, CS, DE, EL, HR, IT, NL, PL, PT, RU, ZH), where names and noun phrases (entities) are manually annotated and linked to their respective Wikipedia pages. For every linked entity in English, the corresponding term or phrase in the target language is also marked and linked to its Wikipedia page in that language. The annotation process was performed via crowdsourcing. In this paper we present the task, the annotation process, the difficulties encountered with crowdsourcing for complex annotation, and the data set in more detail. We demonstrate the use of the data set for Wikification evaluation. The data set is valuable as it constitutes a rich resource of annotated English text linked to translations in eleven languages, including languages such as Bulgarian and Greek for which few language technology (LT) resources are available.
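To make the structure of such a wikified parallel entry concrete, here is a sketch of one possible record layout together with a simple link-level precision/recall computation for Wikification evaluation. The field names and the toy record are assumptions and do not reflect the released data format.

```python
# Minimal sketch: a hypothetical record layout for a wikified parallel segment,
# plus link-level precision/recall against gold annotations.
record = {
    "source": "Machine learning is covered in week two.",
    "target_lang": "EL",
    "target": "Η μηχανική μάθηση καλύπτεται τη δεύτερη εβδομάδα.",
    "entities": [
        {
            "source_span": "Machine learning",
            "source_link": "https://en.wikipedia.org/wiki/Machine_learning",
            "target_span": "μηχανική μάθηση",
            "target_link": "https://el.wikipedia.org/wiki/Μηχανική_μάθηση",
        }
    ],
}

def link_precision_recall(predicted_links, gold_links):
    # Compare sets of (span, wikipedia_url) pairs produced by a wikification
    # system with the crowdsourced gold annotations.
    predicted, gold = set(predicted_links), set(gold_links)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall

gold = [(e["source_span"], e["source_link"]) for e in record["entities"]]
print(link_precision_recall(gold, gold))  # (1.0, 1.0) on this toy example
```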
The present work describes a multilingual corpus of online content in the educational domain, i.e. Massive Open Online Course (MOOC) material, ranging from course forum text to subtitles of online video lectures, which has been developed via large-scale crowdsourcing. The English source text is manually translated into 11 European and BRIC languages using the CrowdFlower platform. During the process several challenges arose, mainly involving the in-domain text genre, the large text volume, the idiosyncrasies of each target language, the limitations of the crowdsourcing platform, and the quality assurance and workflow issues of the crowdsourcing process. The corpus is a product of the EU-funded TraMOOC project and is used within the project to train, tune and test machine translation engines.
Papers by MARIA STASIMIOTI

NoDeeLe: A Novel Deep Learning Schema for Evaluating Neural Machine Translation Systems
Proceedings of the Translation and Interpreting Technology Online Conference TRITON 2021

Translation vs Post-editing of NMT Output: Insights from the English-Greek language pair