The purpose of the study is to describe the process shift caused by applying certain translation techniques. The research used a descriptive qualitative method with purposive sampling. The source of the data was text of...
We present the ACL 60/60 evaluation sets for multilingual translation of ACL 2022 technical presentations into 10 target languages. This dataset enables further research into multilingual speech translation under realistic recording...
This paper presents a thorough examination of two prominent speech-to-text translation (STT) models: the end-to-end (E2E) model and the cascade model. STT is a critical technology in today's multilingual society, facilitating...
This paper evaluates different approaches to speech to sign language machine translation. The framework of the application focuses on assisting deaf people in applying for a passport or obtaining related information. In this context, the main aim is...
This paper presents a methodology for adapting an advanced communication system for deaf people in a new domain. This methodology is a user-centered design approach consisting of four main steps: requirement analysis, parallel corpus...
This paper describes the development of LSESpeak, a spoken Spanish generator for Deaf people. This system integrates two main tools: a sign language into speech translation system and an SMS (Short Message Service) into speech translation...
This paper introduces several important features of the Chinese large vocabulary continuous speech recognition system in the NICT/ATR multi-lingual speech-to-speech translation system. The features include: (1) a flexible way to derive an...
We describe our recent effort implementing SRI's UMPC-based Pashto speech-to-speech (S2S) translation system on a smartphone running the Android operating system. In order to maintain very low latencies of system response on...
The paper presents the design concept of the VoiceTRAN Communicator that integrates speech recognition, machine translation and text-to-speech synthesis using the DARPA Galaxy architecture. The aim of the project is to build a robust...
To develop a comprehensive health assessment of workers at two plants engaged in aluminum and plastic production in Villa Hidalgo, San Luis Potosí, in order to guide the design of public and private interventions...
We present a scalable corpus-based concatenation text-to-speech (TTS) system, which can be used in a variety of applications, ranging from server-based systems to embedded applications. For embedded applications, limited memory and...
The paper describes prosodic annotation procedures for the GOPOLIS Slovenian speech database and methods for automatic classification of different prosodic events. Several statistical parameters concerning duration and loudness of...
A small self-voicing Web browser designed for blind users is presented. The Web browser was built from the GTK Web browser Dillo, which is a free software project released under the terms of the GNU General Public License. Additional functionality has...
Previous studies demonstrated that a dynamic phone-informed compression of the input audio is beneficial for speech translation (ST). However, they required a dedicated model for phone recognition and did not test this solution for direct...
Statistical machine translation, as well as other areas of human language processing, has recently pushed toward the use of large scale n-gram language models. This paper presents efficient algorithmic and architectural solutions which...
The audio segmentation mismatch between training data and those seen at run-time is a major problem in direct speech translation. Indeed, while systems are usually trained on manually segmented corpora, in real use cases they are often...
A Comparison of Transformer and Recurrent Neural Networks on Multilingual Neural Machine Translation
Recently, neural machine translation (NMT) has been extended to multilinguality, that is, to handling more than one translation direction with a single system. Multilingual NMT showed competitive performance against pure bilingual systems....
The IWSLT 2016 Evaluation Campaign featured two tasks: the translation of talks and the translation of video conference conversations. While the first task extends previously offered tasks with talks from a different source, the second...
The IWSLT 2017 evaluation campaign organised three tasks. The Multilingual task is about training machine translation systems handling many-to-many language directions, including so-called zero-shot directions. The Dialogue...
The IWSLT 2015 Evaluation Campaign featured three tracks: automatic speech recognition (ASR), spoken language translation (SLT), and machine translation (MT). For ASR we offered two tasks, on English and German, while for SLT and MT a...
Evaluation campaigns are the most successful modality for promoting the assessment of the state of the art of a field on a specific task. Within the field of Machine Translation (MT), the International Workshop on Spoken Language...
This paper reports on the shared tasks organized by the 20th IWSLT Conference. The shared tasks address 9 scientific challenges in spoken language translation: simultaneous and offline translation, automatic subtitling and dubbing,...
Automatic subtitling is the task of automatically translating the speech of an audiovisual product into short pieces of timed text, in other words, subtitles and their corresponding timestamps. The generated subtitles need to conform to...
Many existing speech translation benchmarks focus on native-English speech in high-quality recording conditions, which often do not match the conditions in real-life use-cases. In this paper, we describe our speech translation system for...
We discuss a set of methods for the creation of IESTAC: an English-Italian speech and text parallel corpus designed for the training of end-to-end speech-to-text machine translation models and publicly released as part of this work. We...
KANT is an interlingual MT system for multi-lingual translation of technical documents, written using a controlled vocabulary and grammar. KANT is comprised of a set of software modules (parser, interpreter, mapper, generator) which work...
This paper proposes the use of Factored Translation Models (FTMs) for improving a Speech into Sign Language Translation System. These FTMs allow incorporating syntactic-semantic information during the translation process. This new...
This paper describes the first experiments of a speech to sign language translation system in a real domain. The developed system is focused on the sentences spoken by an officer when assisting people in applying for, or renewing the...
This paper describes the development of and the first experiments in a Spanish to sign language translation system in a real domain. The developed system focuses on the sentences spoken by an official when assisting people applying for,...
Self-Supervised Learning (SSL) models have been successfully applied in various deep learning-based speech tasks, particularly those with a limited amount of data. However, the quality of SSL representations depends highly on the...
"Transcription bottlenecks", created by a shortage of effective human transcribers, are one of the main challenges to endangered language (EL) documentation. Automatic speech recognition (ASR) has been suggested as a tool to overcome such...
All speeches delivered in the Icelandic parliament, Althingi, are transcribed and published. An automatic speech recognition (ASR) system has been developed to reduce the manual work involved. To our knowledge, this is the first open...
In presenting this dissertation in partial fulfillment of the requirements for the doctoral degree at the University of Washington, I agree that the Library shall make its copies freely available for inspection. I further agree that... more
Automatic Speech Recognition (ASR) has experienced remarkable progress, transitioning from rule-based systems to deep learning methodologies that enhance interactions between humans and machines. Earlier ASR systems depended on Hidden...
In handwritten word recognition, the ability to reject words that do not belong to the lexicon or that are ambiguous is essential for making a recognition system reliable under real-world conditions. In this...
Cross-language communication in healthcare is urgently needed. Daily and nightly throughout the world, thousands of conversations are required between caregivers (doctors, nurses, administrators, volunteers, and others) and patients or family...
The paper describes the creation of domain-specific speech corpora containing air traffic control (ATC) communication prompts. Since the ATC domain is highly specific both from the acoustic point of view (significant level of...
Language preservation has become increasingly urgent with globalization and rapid technological advancement. Minority languages, such as Hakka, are particularly vulnerable. This study aims to expedite the research and development of Hakka...
This paper describes the Natural Language Engineering and Pattern Recognition group (ELiRF) approaches and results towards the Similar Segments of Social Speech Task of MediaEval 2013. The task involves finding segments similar to a...
This paper presents an automated speech recognition (ASR) system that transcribes audio from YouTube videos into accurate text using OpenAI's Whisper model. Leveraging tools such as yt_dlp, FFmpeg, and PyTorch, the system creates a robust...
The speech-to-speech translation system Verbmobil requires a multilingual setting. This consists of recognition engines in the three languages German, English and Japanese that run in one common framework together with a language...
Performance and usability of real-world speech-to-speech translation systems, like the one developed within the Nespole! project, are affected by several aspects that go beyond the pure translation quality provided by the Human Language...
State-of-the-art voice cloning methodologies have employed conventional concatenative and parametric synthesis techniques which, though effective and efficient, produce mechanical and somewhat constrained speech. Advancements in...