CLIR Evaluation at TREC

Harman, Donna; Braschler, Martin; Hess, Michael; Kluck, Michael; Peters, Carol; Schäuble, Peter; Sheridan, Páraic

doi:10.1007/3-540-44645-1_2

Michael Kluck

2000

Abstract

Starting in 1997, the National Institute of Standards and Technology conducted three years of evaluation of cross-language information retrieval systems in the Text REtrieval Conference (TREC). Twenty-two participating systems used topics (test questions) in one language to retrieve documents written in English, French, German, and Italian. A large-scale multilingual test collection has been built, and a new technique for building such a collection in a distributed manner was devised.
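
The abstract does not describe how the judged document pools were formed. The sketch below assumes standard TREC-style depth-k pooling (the union of the top `depth` documents from every submitted run, per topic) and additionally splits each topic's pool by document language, on the assumption that per-language pools are what allow different sites to assess different languages. The function name, input shapes and example documents are hypothetical.

```python
from collections import defaultdict


def build_pools(runs, depth, doc_language):
    """Depth-k pooling, split by topic and by document language.

    runs: list of dicts mapping topic id -> ranked list of doc ids
    depth: how many top-ranked documents each run contributes per topic
    doc_language: dict mapping doc id -> language code (hypothetical input)
    Returns {topic: {language: set of doc ids}}.
    """
    pools = defaultdict(lambda: defaultdict(set))
    for run in runs:
        for topic, ranking in run.items():
            # Only the top `depth` documents of each run enter the pool.
            for doc in ranking[:depth]:
                pools[topic][doc_language[doc]].add(doc)
    # Convert the nested defaultdicts to plain dicts for easier inspection.
    return {topic: dict(by_lang) for topic, by_lang in pools.items()}


# Two toy runs for a single topic (illustrative data only).
run_a = {"T1": ["en1", "fr1", "de1"]}
run_b = {"T1": ["fr2", "en1", "it1"]}
langs = {"en1": "en", "fr1": "fr", "de1": "de", "fr2": "fr", "it1": "it"}
print(build_pools([run_a, run_b], depth=2, doc_language=langs))
```

With `depth=2`, `de1` and `it1` are ranked too low in both runs to enter any pool, so only English and French documents would be sent for assessment for topic T1.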

Related papers

Min-Yen Kan

2000

This year the Eurospider team, with help from Columbia, focused on trying different combinations of translation approaches. We investigated the use and integration of pseudo-relevance feedback, multilingual similarity thesauri and machine translation. We also looked at different ways of merging individual cross-language retrieval runs to produce multilingual result lists. We participated in both the CLIR main task and the GIRT subtask.
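
The abstract does not say which merging strategies were compared. As an illustration only, the sketch below shows one common baseline, assumed here rather than taken from the paper: min-max normalize the scores of each per-language run so they share a [0, 1] scale, then sort the union. Function names and example scores are hypothetical.

```python
def minmax_normalize(run):
    """Rescale one run's (doc_id, score) pairs to the range [0, 1]."""
    scores = [s for _, s in run]
    lo, hi = min(scores), max(scores)
    span = hi - lo
    # If every score is equal, give all documents the same normalized score.
    return [(doc, (s - lo) / span if span else 1.0) for doc, s in run]


def merge_runs(runs, k=None):
    """Merge per-language runs into one multilingual result list.

    runs: dict mapping language -> list of (doc_id, score) pairs
    k: optional cutoff; None returns the full merged list
    Returns (doc_id, normalized_score) pairs, best first. Ties keep the
    order in which runs were supplied, because Python's sort is stable.
    """
    merged = []
    for run in runs.values():
        if run:  # skip empty runs, which have no min/max
            merged.extend(minmax_normalize(run))
    merged.sort(key=lambda pair: pair[1], reverse=True)
    return merged if k is None else merged[:k]


runs = {
    "de": [("de1", 12.0), ("de2", 8.0), ("de3", 4.0)],
    "fr": [("fr1", 0.9), ("fr2", 0.3)],
}
print(merge_runs(runs, k=3))
```

Normalization matters because raw scores from different collections are not directly comparable; here the German run's scores (up to 12.0) would otherwise swamp the French run's (up to 0.9). A known weakness is that every run's top document receives 1.0 regardless of how relevant it actually is.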

Combining Evidence for Cross-Language Information Retrieval

Christof Monz

Lecture Notes in Computer Science, 2003

This paper describes the official runs of our team for CLEF 2002. We took part in the monolingual tasks for each of the seven non-English languages for which CLEF provides document collections (Dutch, Finnish, French, German, Italian, Spanish, and Swedish). We also conducted our first experiments for the bilingual task (English to Dutch, and English to German), and took part in the GIRT and Amaryllis tasks. Finally, we experimented with the combination of runs.

Integrating different strategies for cross-language information retrieval in the MIETTA project

Feiyu Xu

Proceedings of TWLT14, …, 1998

TWLT is an acronym of Twente Workshop(s) on Language Technology. These workshops on natural language theory and technology are organised by the Parlevink Project, a language theory and technology project of the . For each workshop, proceedings are published containing the papers that were presented. TWLT 14 has been organised together with the German Research Center for Artificial Intelligence, DFKI Saarbrücken, Germany. The idea for this workshop grew out of a longstanding cooperation between the University of Twente, TNO-TPD in Delft and DFKI. This cooperation manifested itself for the first time in the Twenty-One project, which inspired a whole series of other projects, such as Pop-Eye and Olive, but which also led to close contact and exchange with independently established projects such as Mulinex and MIETTA, for which DFKI was responsible. All of these projects were funded by the Telematics Application Programme of the European Commission, and all except Twenty-One were funded through its Language Engineering Sector.

Literature review of cross-language information retrieval

Mustafa Abusalah

Transactions on Engineering, Computing …, 2005

Classical Information Retrieval (IR) is the sifting out of the documents most relevant to a user's information requirement (expressed as a query) from a large electronic store of documents. A search engine performs IR by retrieving relevant web pages from the ...

Cross-language Information Retrieval

Petra Galuščáková

2021

Two key assumptions shape the usual view of ranked retrieval: (1) that the searcher can choose words for their query that might appear in the documents that they wish to see, and (2) that ranking retrieved documents will suffice because the searcher will be able to recognize those which they wished to find. When the documents to be searched are in a language not known by the searcher, neither assumption is true. In such cases, Cross-Language Information Retrieval (CLIR) is needed. This chapter reviews the state of the art for cross-language information retrieval and outlines some open research questions.

Overview of CLIR Task at the Sixth NTCIR Workshop

Sung-Hyon Myaeng

Proceedings of the sixth …, 2007

The remarkable search topic-finding task to share success stories of cross-language information retrieval

Masashi Inoue

Proceedings of the Fifth Workshop on Important Unresolved Matters, 2005

The performance of cross-language information retrieval (CLIR) systems has been improved to the level of practical use. The next step is to inform potential users that CLIR technologies are ready to be used. A good way of doing this is to present attractive scenarios of using multilingual information sources. For this purpose, we need to obtain more knowledge on the occasions when CLIR is more beneficial as compared with monolingual information retrieval from the utility perspective. The difficulty lies in the ...

Implementing Cross-Language Text Retrieval Systems for Large-scale Text Collections and the World Wide Web

William Ogden

QUILT (Query User Interface with Light Translations) is a prototype implementation of a complete cross-language text retrieval system that takes English queries and produces English gloss translations of Spanish documents. The system indexes the Spanish documents in Spanish, but converts the English query into an equivalent set of Spanish terms through a novel combination of lexical methods and parallel-corpus disambiguation. Similar methods are applied to the returned documents to produce a simple translation that non-Spanish speakers can examine to gauge each document's relevance to the original English query. The system integrates traditional, glossary-based machine translation technology with information retrieval approaches and demonstrates that relatively simple term substitution and disambiguation approaches can be viable for cross-language text retrieval. Components of QUILT have been used to build a CLTR interface to WWW-based search services.
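
QUILT's lexical methods and parallel-corpus disambiguation are not reproduced here. The sketch below shows only the simpler baseline they refine: term-by-term substitution from a bilingual glossary, keeping every listed translation. The function name and the toy English-to-Spanish dictionary are hypothetical.

```python
def translate_query(query, bilingual_dict):
    """Replace each source-language term with all of its dictionary translations.

    Terms missing from the dictionary are passed through unchanged, which
    often helps with proper names and cognates. No disambiguation is done.
    """
    translated = []
    for term in query.lower().split():
        translated.extend(bilingual_dict.get(term, [term]))
    return translated


# Toy English-to-Spanish glossary (illustrative entries only).
toy_dict = {
    "bank": ["banco", "orilla"],
    "loans": ["préstamos"],
}
print(translate_query("Bank loans Mexico", toy_dict))
# ['banco', 'orilla', 'préstamos', 'mexico']
```

Because "bank" maps to both "banco" (financial) and "orilla" (river bank), the translated query picks up an unwanted sense; a disambiguation step such as the parallel-corpus method mentioned in the abstract would aim to drop "orilla" here.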

A system for supporting cross-lingual information retrieval

Gregor Erbach

Information processing & …, 2000

In this paper, we present the system MULINEX, a fully implemented system which supports cross-lingual search of the WWW. Users can formulate, expand and disambiguate queries, filter the search results and read the retrieved documents by using only their native language. This multilingual functionality is achieved by the use of dictionary-based query translation, multilingual document categorisation and automatic translation of summaries and documents. The system supports French, German and English and has been installed and tested in the online services of two European internet content and service provider companies. This paper focuses on the techniques and algorithms used in the MULINEX system, explaining how each component works and how it contributes to the overall functionality of the integrated system. The primary system functionalities are outlined from the user perspective, followed by a description of the document database used in the system. The technologies and linguistic resources used in the various system components are then described in detail.

A comprehensive survey on cross-language information retrieval system

Gouranga Jena

Indonesian Journal of Electrical Engineering and Computer Science

Cross-language information retrieval (CLIR) is a retrieval process in which the user submits queries in one language to retrieve information in another (different) language. The diversity of information and language barriers are serious issues for communication and cultural exchange across the world. To overcome such barriers, cross-language information retrieval systems are nowadays in strong demand. CLIR is a subset of Information Retrieval (IR) systems. Information Retrieval deals with finding useful information in a large collection of unstructured, structured and semi-structured data in response to a user query, where the query is a set of keywords. Information Retrieval can be classified into different classes, such as monolingual information retrieval, bilingual information retrieval, multilingual information retrieval and cross-language information retrieval. This paper focuses on the various IR variants and techniques used in CLIR systems. Further, based on available literature, a numb...

Michael Kluck

The objective of the Cross-Language Evaluation Forum (CLEF) is to promote research in the multilingual information access domain. In this short paper, we list the achievements of CLEF during its first four years of activity and describe how the range of tasks has been considerably expanded during this period. The aim of the paper is to demonstrate the importance of evaluation initiatives with respect to system research and development and to show how essential it is for such initiatives to keep abreast of and even anticipate the emerging needs of both system developers and application communities if they are to have a future.

Xerox TREC-6 Site Report: Cross Language Text Retrieval

Gregory Grefenstette

In Proceedings of the Sixth …, 1998

CRL's TREC-8 Systems Cross-Lingual IR, and Q&A

Sergei Nirenburg

NIST SPECIAL …, 2000

This paper describes the systems used by CRL in the Cross-lingual IR and Q&A tracks. The cross-language experiment was unique in that it was run interactively with a monolingual user simulating how a true cross-language system might be used. The methods used in the Q&A system are based on language processing technology developed at CRL for machine translation and information extraction.

Cross Language Information Retrieval System

Zahid Hussain

The number of web users accessing information over the net is growing rapidly with each passing day. A massive quantity of information in many different languages is available on the internet and can be accessed by anyone. Information retrieval (IR) is the technology that deals with finding useful information in a large collection of data, specifically unstructured, structured and semi-structured data. Information retrieval can be classified into different classes, including monolingual information retrieval, cross-language information retrieval (CLIR) and multilingual information retrieval (MLIR). The world has now become a global village, and the diversity of information and language barriers are the main issues for communication and cultural exchange internationally. To solve such issues and remove these barriers, cross-language information retrieval (CLIR) systems are in strong demand nowadays. CLIR is an IR system in which the query or the documents may appear in different languages. In this paper we offer an overview of the new application areas of CLIR. We also review the techniques used in CLIR research for query and document translation. Furthermore, we try to identify some of the challenges and issues in CLIR systems.

Corpus‐based cross‐language information retrieval in retrieval of highly relevant documents

Kalervo Järvelin

2007

IR systems' ability to retrieve highly relevant documents has become more and more important in the age of extremely large collections, such as the WWW. Our aim was to find out how corpus-based CLIR manages in retrieving highly relevant documents. We created a Finnish-Swedish comparable corpus and used it as a source of knowledge for query translation. Finnish test queries were translated into Swedish and run against a Swedish test collection. Graded relevance assessments were used in evaluating the results, and three relevance criterion levels (liberal, regular, and stringent) were applied. The runs were also evaluated with generalized recall and precision, which weight the retrieved documents according to their relevance level. The performance of our Comparable Corpus Translation system (Cocot) was compared to that of a dictionary-based query translation program; the two translation methods were also combined. The results indicate that corpus-based CLIR performs particularly well with highly relevant documents. In average precision, Cocot even matched the monolingual baseline on the highest relevance level. The performance of the different query translation methods was further analyzed by finding out the reasons for poor rankings of highly relevant documents.
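
The abstract uses generalized recall and precision without defining them. Assuming the usual formulation, in which each document d has a relevance score r(d) in [0, 1] derived from its relevance grade, R is the set of retrieved documents and D is the whole collection, the two measures are:

```latex
\[
  gP = \frac{\sum_{d \in R} r(d)}{|R|},
  \qquad
  gR = \frac{\sum_{d \in R} r(d)}{\sum_{d \in D} r(d)}
\]
```

With binary scores (r(d) in {0, 1}) these reduce to ordinary precision and recall. For example, if four documents are retrieved with scores 1, 0.5, 0 and 0, and the scores over the whole collection sum to 3, then gP = 1.5/4 = 0.375 and gR = 1.5/3 = 0.5. The mapping from relevance grades to scores is a parameter of the evaluation and is not given in the abstract.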

Language-Dependent and Language-Independent Approaches to Cross-Lingual Text Retrieval

Christof Monz

Lecture Notes in Computer Science, 2004

We investigate the effectiveness of language-dependent approaches to document retrieval, such as stemming and decompounding, and contrast them with language-independent approaches, such as character n-gramming. In order to reap the benefits of more than one type of approach, we also consider the effectiveness of combining both types of approaches. We focus on document retrieval in nine European languages: Dutch, English, Finnish, French, German, Italian, Russian, Spanish, and Swedish. We look at four different cross-lingual information retrieval tasks: monolingual, bilingual, multilingual, and domain-specific retrieval. The experimental evidence is obtained using the 2003 test suite of the Cross-Language Evaluation Forum (CLEF).
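
As an illustration of the language-independent side, the sketch below splits text into overlapping character n-grams. The choice of n = 4 and the space padding at word boundaries are assumptions for the example, not settings reported in the abstract; the function name is hypothetical.

```python
def char_ngrams(text, n=4):
    """Split text into overlapping character n-grams, word by word.

    Each word is padded with a leading and trailing space so that n-grams
    at word boundaries differ from word-internal ones. A padded word that
    is shorter than n is kept whole as a single token.
    """
    grams = []
    for word in text.lower().split():
        padded = f" {word} "
        if len(padded) < n:
            grams.append(padded)
        else:
            # Slide a window of width n across the padded word.
            grams.extend(padded[i:i + n] for i in range(len(padded) - n + 1))
    return grams


compound = set(char_ngrams("Bundesbank"))
simple = set(char_ngrams("Bank"))
print(sorted(compound & simple))
```

The compound "Bundesbank" and the simple word "Bank" share the n-grams "bank" and "ank " without any decompounding step, illustrating how n-gram matching can partly compensate for the absence of a language-specific decompounder.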

ITC-irst at CLEF 2003: Monolingual, Bilingual, and Multilingual Information Retrieval

Nicola Bertoldi

Lecture Notes in Computer Science, 2004

This paper reports on the participation of ITC-irst in the Cross Language Evaluation Forum 2003, in particular in the monolingual, bilingual, small multilingual, and spoken document retrieval tracks. The languages considered were English, French, German, Italian, and Spanish. With respect to our CLEF 2002 system, the statistical models for bilingual document retrieval have been improved, more languages have been considered, and a novel multilingual information retrieval system has been developed, which combines several bilingual retrieval models into a statistical framework. As in the last CLEF, bilingual models integrate retrieval and translation scores over the set of N-best translations of the source query.

Manual Queries and Machine Translation in Cross-language Retrieval and Interactive Retrieval with Cheshire II at TREC-7

Ray Larson

For TREC-7, the Berkeley ad-hoc experiments explored phrase discovery in topics and documents further. We utilized Boolean retrieval combined with probabilistic ranking for 17 topics in ad-hoc manual entry. Our cross-language experiments tested three different widely available machine translation software packages. For language pairs (e.g., German to French) for which no direct machine translation was available, we used English as a universal intermediate language. For CLIR we also manually reformulated the English topics before doing machine translation, and this elicited a significant performance increase both for quad-language retrieval and for English against English and French documents. In our Interactive Track entry, eight searchers conducted eight searches each, half on the Cheshire II system and the other half on the ZPrise system, for a total of 64 searches. Questionnaires were administered to gather information about basic demographics and searching experience, about each search, about each of the systems, and finally, about the users' perceptions of the systems.
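
The Berkeley runs chained machine translation software through English. The same pivot idea is sketched below with word-level dictionaries instead of MT systems, an assumption made purely to keep the example self-contained; the function name and the toy German-English and English-French dictionaries are hypothetical.

```python
def pivot_translate(terms, src_to_pivot, pivot_to_tgt):
    """Translate terms into a target language via a pivot language.

    Each source term is mapped to its pivot-language translations, and each
    of those is mapped to the target language. Duplicates are dropped while
    keeping first-seen order. Terms with no path through the pivot are kept.
    """
    result = []
    for term in terms:
        targets = []
        for pivot in src_to_pivot.get(term, []):
            targets.extend(pivot_to_tgt.get(pivot, []))
        for t in targets or [term]:
            if t not in result:
                result.append(t)
    return result


# Toy dictionaries (illustrative entries only).
de_en = {"haus": ["house", "home"], "kauf": ["purchase"]}
en_fr = {"house": ["maison"], "home": ["maison", "foyer"], "purchase": ["achat"]}
print(pivot_translate(["haus", "kauf"], de_en, en_fr))
# ['maison', 'foyer', 'achat']
```

The pivot step can multiply ambiguity: German "haus" reaches French "foyer" only through the English sense "home", which is one of the costs of translating through an intermediate language.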

Textual Information Retrieval Systems Test: The Point of View of an Organizer and Corpuses Provider

Patrick Kremer

2000

Amaryllis is an evaluation programme for text retrieval systems which has been carried out as two test campaigns. The second Amaryllis campaign took place in 1998/1999. Corpora of documents, topics, and the corresponding responses were first sent to each of the participating teams for system learning purposes. Corpora of new documents and a set of new topics were then supplied for evaluation purposes. Two optional tracks were added: an Internet track and an interlingual track. The first of these involved a test via the Internet: INIST sent topics to the systems and collected responses directly, thus reducing the need for conceptor manipulations. The second involved tests on different European Community language pairs. The document corpora consisted of records of questions and answers from the European Commission, in parallel official language versions. Participants could use any language pair for their tests. The aim of this paper is to give the point of view of an organizer an...

Document Translation for Cross-Language Text Retrieval at the University of Maryland

Paul G Hackett

NIST SPECIAL PUBLICATION SP, 1998

The University of Maryland participated in three TREC-6 tasks: ad hoc retrieval, cross-language retrieval, and spoken document retrieval. The principal focus of the work was the evaluation of a cross-language text retrieval technique based on fully automatic machine translation. The results show that approaches based on document translation can be approximately as effective as approaches based on query translation, but that additional work will be needed to develop a solid basis for choosing between the two in specific applications. Ad hoc and spoken document retrieval results are also presented.
