Compiling and using a shareable parallel corpus for MT evaluation
2004
Abstract
Parallel texts are an important resource for applications in multilingual natural language processing and human language technology. This paper presents a method for exploiting available parallel texts, both human-translated texts and revised machine-translated texts, to populate machine translation and translation memory databases.