Natural Language Processing is a programmatic approach to analyzing text that is based on both a set of theories and a set of technologies. This forum aims to bring together researchers who have designed and built software that will analyze,...
On-line keyword searching in Chinese documents tends to use inverted indexing as the main technique, which has its difficulties. The suffix array is widely used for processing text in Western languages. However, it fails to get widely...
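As a rough illustration of the suffix-array idea mentioned in the abstract above, the following sketch builds a suffix array over an unsegmented string and looks up a keyword by binary search. It is a minimal sketch, not the paper's method: the function names `build_suffix_array` and `find_occurrences` are hypothetical, and the naive construction is chosen for readability rather than speed.

```python
def build_suffix_array(text: str) -> list[int]:
    """Return suffix start offsets sorted by the suffix they begin.

    Naive construction (sorts whole suffixes); fine for short texts only.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: list[int], pattern: str) -> list[int]:
    """Return all start offsets of `pattern` in `text`, in ascending order."""
    m = len(pattern)

    # Leftmost suffix whose first m characters are >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Leftmost suffix whose first m characters are > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(sa[start:lo])


# No word segmentation is needed: any substring can be searched directly.
text = "信息检索与跨语言信息检索"
sa = build_suffix_array(text)
print(find_occurrences(text, sa, "信息"))  # [0, 8]
```

Because every suffix is indexed, the lookup works on character sequences without a prior word-segmentation step, which is the property that makes the structure interesting for unsegmented text.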
Linguistics and the science of Anthropology have much in common. In fact, to a large extent the two fields overlap. Field workers utilize research models of the ethnographic type as well as approaches that are experimental, methods that...
This paper reports on the first Oromo-English CLIR system that is based on dictionary-based query translation techniques. The basic objective of the study is to design and develop an Oromo-English CLIR system with a view to enabling Afaan...
TWLT is an acronym of Twente Workshop(s) on Language Technology. These workshops on natural language theory and technology are organised by the Parlevink Project, a language theory and technology project of the . For each workshop...
Sentence alignment consists in estimating which sentence or sentences in the source language correspond with which sentence or sentences in a target language. We present in this paper a new approach to aligning sentences from a parallel...
Integrating Different Strategies for Cross-Language Information Retrieval in the MIETTA Project. Paul Buitelaar, Klaus Netter, Feiyu Xu. DFKI Language Technology Lab, Stuhlsatzenhausweg 3, 66123 Saarbrücken, Germany. {paulb, netter, feiyu}@...
Today we live in the modern Internet era. We can get information from the Internet anytime and anywhere using a desktop PC or a smartphone. However, the underlying technology for relevant information retrieval from the...
Cross-language information retrieval (CLIR) systems cater for the requirements of users who need to access a pool of information published in a language that they do not speak. A CLIR system uses an information retrieval (IR) architecture...
We have aligned Japanese and English news articles and sentences to make a large parallel corpus. We first used a method based on cross-language information retrieval (CLIR) to align the Japanese and English articles and then used a...
Abstract: This article seeks to identify difficulties encountered by medical students when searching for technical medical information. The main objective is to show the importance of information retrieval in the medical field by means of...
This paper reviews literature on dictionary-based cross-language information retrieval (CLIR) and presents CLIR research done at the University of Tampere (UTA). The main problems associated with dictionary-based CLIR, as well as...
In this paper, we describe a web-based multilingual tool for Arabic information retrieval based on ontology in the legal domain. We illustrate the manual construction of the ontology and the way it is edited using Protégé2000. Using...
Data available on the web is growing at an exponential rate, so creating knowledge or extracting information is of paramount importance. Information Retrieval (IR) plays a crucial role in knowledge management as it helps us to find the...
Despite all existing methods to identify a person, such as fingerprint, iris, and facial recognition, the personal name is still one of the most common ways to identify an individual. In this paper, we propose two novel algorithms: The...
We employ Automorphology, an MDL-based algorithm that determines the suffixes present in a language-sample with no prior knowledge of the language in question, and describe our experiments on the usefulness of this approach for...
In this paper, we present our Hindi-to-English and Marathi-to-English CLIR systems developed as part of our participation in the CLEF 2007 Ad-Hoc Bilingual task. We take a query-translation-based approach using bilingual dictionaries....
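To make the dictionary-based query translation mentioned in the abstract above concrete, here is a minimal sketch under stated assumptions. The dictionary `TOY_DICTIONARY` and the function `translate_query` are hypothetical illustrations, not the resources or code of the CLEF system; the entries are a tiny romanized toy lexicon.

```python
# Hypothetical toy bilingual dictionary: source term -> candidate translations.
TOY_DICTIONARY = {
    "ghar": ["house", "home"],
    "pani": ["water"],
}


def translate_query(query: str, dictionary: dict[str, list[str]]) -> list[str]:
    """Translate a query term by term.

    Every dictionary translation of a term is kept (no disambiguation).
    Terms missing from the dictionary, such as many proper names, are
    passed through unchanged.
    """
    translated: list[str] = []
    for term in query.lower().split():
        translated.extend(dictionary.get(term, [term]))
    return translated


print(translate_query("ghar pani Delhi", TOY_DICTIONARY))
# ['house', 'home', 'water', 'delhi']
```

Keeping all candidate translations and passing unknown terms through are the simplest possible choices; a real system would typically add translation disambiguation and transliteration for out-of-vocabulary names.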
It is well known that the use of a good Machine Transliteration system improves the retrieval performance of Cross-Language Information Retrieval (CLIR) systems when the query and document languages have different orthography and phonetic...
How to conduct legal research in the English legal system using only official resources from the courts, without a subscription to a legal information provider.
Information Retrieval, the results are not encouraging. Proper names are problematic for cross-language information retrieval (CLIR); detecting and extracting proper nouns in Arabic is key to improving the effectiveness...
This paper describes the participation of MIRACLE in NTCIR 2005 CLIR task. Although our group has a strong background and long expertise in Computational Linguistics and Information Retrieval applied to European languages and using Latin...
The main objective of our project is to extract clinical information from thoracic radiology reports in Portuguese using Machine Translation (MT) and cross language information retrieval techniques. To accomplish this task we need to...
Aim of the work: to study the features of the scientific and technical style from the perspective of machine translation. Objectives: to outline the stylistics of scientific and technical texts; to describe the operating principles and main types of machine translation systems;...
In this article we present an advanced version of Dual-PECCS, a cognitively-inspired knowledge representation and reasoning system aimed at extending the capabilities of artificial systems in conceptual categorization tasks. It combines...
Searching is inherently a user-centered process; people pose the questions for which machines seek answers, and ultimately people judge the degree to which retrieved documents meet their needs. Rapid development of interactive systems...
This paper introduces a general framework for the use of translation probabilities in cross-language information retrieval based on the notion that information retrieval fundamentally requires matching what the searcher means with what...
The problem of finding documents written in a language that the searcher cannot read is perhaps the most challenging application of cross-language information retrieval technology. In interactive applications, that task involves at least...
In our participation in the 2010 LogCLEF track, we focused on the analysis of the European Library (TEL) logs and, in particular, experimented with identifying the natural language used in the queries. Language identification is...
European languages. Monolingual Chinese retrieval experiments, by contrast, often find that character bigrams perform as well as (and sometimes better than) automatically segmented words. During the Mandarin-English Information (MEI)...
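The character-bigram indexing units mentioned in the abstract above can be sketched as follows. This is only an illustration of overlapping character bigrams, assuming whitespace is ignored; the function name `char_bigrams` is hypothetical and not taken from the paper.

```python
def char_bigrams(text: str) -> list[str]:
    """Split text into overlapping character bigrams, ignoring whitespace.

    A single remaining character is returned as a one-character term;
    empty input yields an empty list.
    """
    chars = [c for c in text if not c.isspace()]
    if len(chars) < 2:
        return ["".join(chars)] if chars else []
    return [chars[i] + chars[i + 1] for i in range(len(chars) - 1)]


print(char_bigrams("信息检索"))  # ['信息', '息检', '检索']
```

The same function can be applied to both documents and queries, so matching needs no word segmenter; the cost is that some bigrams (here '息检') straddle word boundaries and carry no meaning of their own.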
Starting in 1997, the National Institute of Standards and Technology conducted three years of evaluation of cross-language information retrieval systems in the Text REtrieval Conference (TREC). Twenty-two participating systems used topics...
This paper reports the work of Middlesex University for the CLEF bilingual task. We have carried out experiments using Portuguese queries to retrieve documents in English. The approach used was Latent Semantic Indexing, which is an...
The paper presents a study of a challenging task in machine translation and cross-language information retrieval: the translation of toponyms. Due to their linguistic and extra-linguistic nature, toponyms deserve special treatment. The...
This is a special Internet edition of the article "Language-specific encoding in multilingual corpora: Requirements and solutions" by Jost Gippert (1999). It should not be cited. Citations should refer to the original edition in Multilinguale...
We present a report on our participation in the Indonesian-English ad hoc bilingual task of the 2006 Cross-Language Evaluation Forum (CLEF). This year we compare the use of several language resources to translate Indonesian queries into...
This article presents an ongoing project that aims to design and develop a robust and agile web-based application capable of semi-automatically compiling monolingual and multilingual comparable corpora, which we named iCompileCorpora. The...
The goal of Cross Language Information Retrieval (CLIR) is to provide users with access to information that is in a different language from their queries. It allows a user to issue a query in one language and retrieve documents in...
In this, our first joint participation as the CoLesIR group, our team has participated in the Portuguese monolingual ad-hoc task and in all robust ad-hoc tasks: all monolingual tasks, the English-to-German bilingual task, and the...
Large-scale parallel corpora have become a reliable resource for crossing the language barrier between the user and the web. These parallel texts provide the primary training material for statistical translation models and testing machine...
The paper reports on experiments carried out in transitive translation, a branch of cross-language information retrieval (CLIR). By transitive translation we mean translation of search queries into the language of the document collection...
For TREC 10 we participated in the Named Page Finding Task and the Cross-Lingual Task. In the web track, we explored the use of linear combinations of term collections based on document structure. Our goal was to examine the effects of...
While a lot of research has focused on the effectiveness of system functionality, few studies have examined information needs and social aspects related to cross-language information retrieval. This chapter aims to speculate on the human and...
In this paper, we present some approaches to improving translation accuracy in web-based translation extraction. In previous work, the term extraction techniques that researchers used were designed for large static corpora. We proposed some...
Named entity (NE) translation plays an important role in many applications, such as information retrieval and machine translation. In this paper, we focus on translating NEs from Korean to Chinese in order to improve Korean-Chinese...