Cross-language Information Retrieval

2021, Cornell University - arXiv

https://doi.org/10.48550/ARXIV.2111.05988

Abstract

Lectures on Human Language Technologies publishes monographs on topics relating to natural language processing, computational linguistics, information retrieval, and spoken language understanding. Emphasis is placed on important new techniques, on new applications, and on topics that combine two or more HLT subfields.

Contents

  - 2 Using Manually Constructed Translation Systems and Resources for CLIR
    - 2.1 Machine Translation
      - 2.1.1 Rule-Based MT
      - 2.1.2 Statistical MT
    - 2.2 Basic Utilization of MT in CLIR
      - 2.2.1 Rule-Based MT
      - 2.2.2 Statistical MT
      - 2.2.3 Unknown Word
    - 2.3 Open the Box of MT
    - 2.4 Dictionary-Based Translation for CLIR
      - 2.4.1 Basic Approaches
      - 2.4.2 The Term Weighting Problem
      - 2.4.3 Coverage of the Dictionary
      - 2.4.4 Translation Ambiguity
      - 2.4.5 Selection of Translation Words
      - 2.4.6 Other Related Approaches
  - 3 Translation Based on Parallel and Comparable Corpora
    - 3.1 Parallel Corpora
    - 3.2 Paragraph/Sentence Alignment
    - 3.3 Utilization of Translation Models in CLIR
    - 3.4 Embedding Translation Models into CLIR Models
    - 3.5 Alternative Approaches using Parallel Corpora
      - 3.5.1 Exploiting a Parallel Corpus by Pseudo-Relevance Feedback
      - 3.5.2 Using Latent Semantic Indexing (LSI)
      - 3.5.3 Using Comparable Corpora
    - 3.6 Discussions on CLIR Methods and Resources
    - 3.7 Mining for Translation Resources and Relations
      - 3.7.1 Mining for Parallel Texts
      - 3.7.2 Transliteration
      - 3.7.3 Mining Translations using Hyperlinks
      - 3.7.4 Mining Translations from Monolingual Web Pages
  - 4 Other Methods to Improve CLIR
    - 4.1 Pre- and Post-Translation Expansion
    - 4.2 Fuzzy Matching
    - 4.3 Combining Translations
    - 4.4 Transitive Translation