This article investigates (a) whether register discrimination can successfully exploit linguistic information reflecting the evolution of a language (such as the diglossia phenomenon of the Modern Greek language) and (b) what kind of linguistic information and which statistical techniques may be employed to distinguish among individual styles within one register. Using clustering techniques and features reflecting the diglossia phenomenon, we have successfully discriminated registers in Modern
- by Marina Vassiliou and +1
In this paper an innovative approach is presented for MT, which is based on pattern matching techniques, relies on extensive target language monolingual corpora and employs a series of similarity weights between the source and the target language. Our system is based on the notion of 'patterns', which are viewed as 'models' of target language strings, whose final form is defined by the corpus. * Author names are given in alphabetical order.
- by Marina Vassiliou and +2
- Machine Translation
The innovative feature of the system presented in this paper is the use of pattern-matching techniques to retrieve translations, resulting in a flexible, language-independent approach that employs a limited amount of explicit a priori linguistic knowledge. Furthermore, while all state-of-the-art corpus-based approaches to Machine Translation (MT) rely on bitexts, this system relies on extensive target language monolingual corpora. The translation process distinguishes three phases: (1) pre-processing with 'light' rule-based and statistics-based NLP techniques, (2) search and retrieval, and (3) synthesising. In Phase 1, the source language sentence is mapped onto a lemma-to-lemma translated string. This string then forms the input to the search algorithm, which retrieves similar sentences from the corpus (Phase 2). This retrieval process is performed iteratively at increasing levels of detail, until the best match is detected. The best retrieved sentence is sent to the synthesising algorithm (Phase 3), which handles phenomena such as agreement.
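The first two phases described above can be illustrated with a minimal Python sketch. It is not the system's actual implementation: the lexicon, the corpus, the function names and the two similarity measures (bag-of-lemmas overlap, then `difflib.SequenceMatcher` on the ordered lemmas) are all assumptions chosen only to show the idea of lemma-to-lemma mapping followed by retrieval at increasing levels of detail. Phase 3 (synthesising) is left out.

```python
from difflib import SequenceMatcher

# Hypothetical toy lemma-to-lemma lexicon (source lemma -> target lemma).
LEXICON = {"o": "the", "skylos": "dog", "trogo": "eat", "psari": "fish"}

# Hypothetical toy target-language corpus, stored as lemma sequences.
CORPUS = [
    ["the", "cat", "eat", "fish"],
    ["the", "dog", "chase", "the", "cat"],
    ["a", "bird", "sing"],
]


def to_target_lemmas(source_lemmas):
    """Phase 1 (simplified): map each source lemma to a target lemma."""
    return [LEXICON.get(lemma, lemma) for lemma in source_lemmas]


def retrieve(query, corpus, shortlist_size=2):
    """Phase 2 (simplified): coarse filter first, then a finer comparison."""

    # Coarse level: rank corpus sentences by bag-of-lemmas overlap.
    def overlap(sentence):
        return len(set(query) & set(sentence))

    shortlist = sorted(corpus, key=overlap, reverse=True)[:shortlist_size]

    # Finer level: compare the ordered lemma sequences.
    def order_similarity(sentence):
        return SequenceMatcher(None, query, sentence).ratio()

    return max(shortlist, key=order_similarity)


query = to_target_lemmas(["o", "skylos", "trogo", "psari"])
best = retrieve(query, CORPUS)
```

In this toy setting the retrieved sentence would then be handed to a synthesising step that substitutes the mismatched lemmas and restores agreement; that step depends on resources the abstract does not detail, so it is not sketched here.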
METIS-II was an EU-FET MT project running from October 2004 to September 2007, which aimed at translating free text input without resorting to parallel corpora. The idea was to use 'basic' linguistic tools and representations and to link them with patterns and statistics from the monolingual target-language corpus. The METIS-II project had four partners, translating from their 'home' languages Greek, Dutch, German, and Spanish into English.
- by Marina Vassiliou and +2
- Cognitive Science, Machine Translation
In this paper, we explain why we have adopted pattern matching for MT purposes and why we have embedded it into a hybrid approach. "Patterns" here are understood as independent meaningful sub-sentential segments received in a systematic way. We describe the nature and size of the patterns used as well as the comparison algorithm developed. We discuss results obtained by matching patterns of different types and complexity in four different language pairs. Our experiments indicate that better results are obtained when matching the longest possible patterns.
- by Marina Vassiliou and +1
- Machine Translation
METIS-II, the MT system presented in this paper, does not view translation as a transfer process between a source language (SL) and a target one (TL), but rather as a matching procedure of patterns within a language pair. More specifically, translation is considered to be an assignment problem, i.e. a problem of discovering each time the best matching patterns between SL and TL, which the system is called to solve by employing pattern-matching techniques.
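To make the notion of an assignment problem concrete, the following Python sketch picks the one-to-one mapping between SL and TL patterns that maximises the total similarity. The similarity scores, the function name `best_assignment` and the exhaustive search are assumptions made for illustration; the abstract does not say how METIS-II scores or solves the assignment.

```python
from itertools import permutations


def best_assignment(similarity):
    """Return the one-to-one SL->TL mapping with the highest total score.

    similarity[i][j] is a hypothetical similarity between SL pattern i and
    TL pattern j; the matrix is assumed to be square. Exhaustive search is
    only practical for a handful of patterns.
    """
    n = len(similarity)
    best_perm, best_score = None, float("-inf")
    for perm in permutations(range(n)):
        score = sum(similarity[i][perm[i]] for i in range(n))
        if score > best_score:
            best_perm, best_score = perm, score
    return list(best_perm), best_score


# Hypothetical scores for three SL patterns (rows) and three TL patterns (columns).
scores = [
    [0.9, 0.2, 0.1],
    [0.3, 0.4, 0.8],
    [0.2, 0.7, 0.3],
]
mapping, total = best_assignment(scores)
```

Here `mapping[i]` gives the TL pattern assigned to SL pattern `i`. For larger inputs a polynomial-time solver such as the Hungarian algorithm (available, for example, as `scipy.optimize.linear_sum_assignment`) would be the usual choice instead of enumerating permutations.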
- by Marina Vassiliou and +2
In this paper we report on the set of controlled language specifications defined for Modern Greek and the development of the respective style checker. We will focus on the effectiveness and suitability of these specifications by assessing the performance of a commercial machine translation system over controlled texts and will comment on the evaluation results. For our experiments we have used the SYSTRAN MT system (English-into-Greek language pair). We will show that an improvement in translation is feasible when a text compliant with controlled language specifications enters an MT system. Finally, we will propose a third parameter for setting CL specifications.
The present article introduces a phrase-alignment approach that involves the processing of a small bilingual corpus in order to extract suitable structural information. This is used in the PRESEMT project, whose aim is the quick development of phrase-based Machine Translation (MT) systems for new language pairs. A main bottleneck of such systems is the need to create compatible parsing schemes in the source and target languages. This bottleneck is overcome by combining two modules, the Phrase aligner module and the Phrasing model generator, both of them being based on pattern recognition principles.
- by Marina Vassiliou and +2
METIS-II was an EU-FET MT project running from October 2004 to September 2007, which aimed at translating free text input without resorting to parallel corpora, but relying only on monolingual target language corpora and employing a palette of statistical, pattern-matching and rule-based methods. The METIS-II project had four partners, translating from their 'home' languages Greek, Dutch, German, and Spanish into English. The idea was to use 'basic' linguistic tools and representations and to link them with patterns and statistics from the monolingual target-language corpus. The paper reports on the background of the ideas, their implementation, the resources used, and the results obtained. It also gives a few examples of how METIS-II has continued beyond its lifetime and the original scope of the project. On the basis of the results and experiences obtained, we conclude that the approach is promising and offers the potential for development in various directions.
- by Marina Vassiliou
In this paper we describe the METIS-II system and its evaluation on each of the language pairs: Dutch, German, Greek, and Spanish to English. The METIS-II system envisaged developing a data-driven approach in which no parallel corpus is required and in which no full parser or extensive rule sets are needed. We describe the evaluation on a development test set and on a test set taken from Europarl, and compare our results with SYSTRAN. We also provide some further analysis, namely researching the impact of the number and source of the reference translations and analysing the results according to test text type. The results are, as expected, lower for the METIS system, but not at an unattainable distance from a mature system like SYSTRAN.
- by Marina Vassiliou and +1
- Machine Translation, IT Evaluation
The present article describes AMP, a system for automated morphological processing of Ancient Greek word forms. It is considered a hybrid approach, combining pattern recognition techniques with limited linguistic knowledge to achieve accurate segmentation into stem and ending, and is expected to substantially contribute to the creation and/or enrichment of Greek morphological lexica. Though the current implementation concerns Attic dialect word forms, its modularity ensures its extensibility to other dialects and/or synchronies of the Greek language with minor modifications.
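As a rough illustration of what segmentation into stem and ending means, the Python sketch below strips the longest matching ending from a word form. It is not AMP's method, which combines pattern recognition techniques with limited linguistic knowledge; the ending list, the Latin transliteration, the function name `segment` and the minimum stem length are all hypothetical.

```python
# Hypothetical, tiny list of endings in Latin transliteration.
ENDINGS = ["omen", "ete", "os", "ou", "o"]


def segment(word_form, endings=ENDINGS, min_stem_length=2):
    """Split a word form into (stem, ending) using the longest matching ending."""
    for ending in sorted(endings, key=len, reverse=True):
        stem = word_form[: len(word_form) - len(ending)]
        if word_form.endswith(ending) and len(stem) >= min_stem_length:
            return stem, ending
    # No known ending applies: treat the whole form as the stem.
    return word_form, ""


segmented = [segment(w) for w in ["lyomen", "logos", "logou", "kai"]]
```

A fixed ending list like this cannot resolve ambiguous forms, which is one reason a real system would need additional knowledge or learned patterns.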
- by Marina Vassiliou and +1
Monolingual Corpus-based MT using Chunks. Stella Markantonatou, Sokratis Sofianopoulos, Vassiliki Spilioti, Yiorgos Tambouratzis, Marina Vassiliou, Olga Yannoutsou, Nikos Ioannou. Machine Translation Department, Institute for Language & ...
- by Olga Yannoutsou and +2
- Machine Translation
In the present article, a hybrid approach is proposed for implementing a machine translation system using a large monolingual corpus coupled with a bilingual lexicon and basic NLP tools. In the first phase of the METIS system, a source language (SL) sentence, after being tagged, lemmatised and translated by a flat lemma-to-lemma lexicon, was matched against a tagged
- by Sokratis Sofianopoulos and +2
- Machine Translation
The current work consists in a series of statistical analyses of Greek infinitival structures, aiming at illustrating the syntactic behaviour of the Greek infinitive through time. Spanning the period 5 BC – AD 16, the specific structures are drawn from texts of various authors and divergent topics, being representative of the four synchronies of the Greek language. The text corpus employed exceeds 5 million words in size, within which the infinitival occurrences approximate 102,000. To the best of our knowledge, measurements of such scale are presented for the first time, allowing a diachronic study of the infinitive use supported by statistical tests.
- by Marina Vassiliou
The current paper presents a language-independent methodology, which facilitates the creation of machine translation (MT) systems for various language pairs. This methodology is implemented in the PRESEMT hybrid MT system. PRESEMT has the lowest possible requirements on specialised resources and tools, given that for many languages (especially less widely used ones) only limited linguistic resources are available. In PRESEMT, the main translation process comprises two phases. The first one, Structure selection, determines the overall structure of a target language (TL) sentence, drawing on syntactic information from a small bilingual corpus. The second phase, Translation equivalent selection, relies on models extracted solely from monolingual corpora to implement translation disambiguation, determine intra-phrase word order and handle functional words. This paper proposes extracting information for disambiguation from the monolingual corpus. Experimental results indicate that such information contributes substantially to improving translation quality.
The present article provides a comprehensive review of the work carried out on developing PRESEMT, a hybrid language-independent machine translation (MT) methodology. This methodology has been designed to facilitate rapid creation of MT systems for unconstrained language pairs, setting the lowest possible requirements on specialised resources and tools. Given the limited availability of resources for many languages, only a very small bilingual corpus is required, while language modelling is performed by sampling a large target language (TL) monolingual corpus. The article summarises implementation decisions, using the Greek-English language pair as a test case. Evaluation results are reported, for both objective and subjective metrics. Finally, main error sources are identified and directions are described to improve this hybrid MT methodology.
This document contains a brief presentation of the PRESEMT project, which aims at the development of a novel language-independent methodology for the creation of a flexible and adaptable MT system.