Semantic Processing of Compounds in Indian Languages

Amba Kulkarni

Abstract

Compounds occur very frequently in Indian languages, yet modern Indian languages have no strict orthographic conventions for writing them. In this paper, the Sanskrit compounding system is examined thoroughly, and the insights gained from Sanskrit grammar are applied to the analysis of compounds in Hindi and Marathi. Compounding in Hindi deviates from that in Sanskrit in two respects. First, the Hindi data analysed do not contain any instance of a Bahuvrīhi (exocentric) compound. Second, many Hindi compounds require a verb as well as a vibhakti (case marker) for their paraphrasing. Compounds that require a verb for paraphrasing are termed madhyama-pada-lopī in Sanskrit, where they are found to be rare.

Figures (2)

Languages at IIT Bombay have developed a tool for automatic extraction of Multi Word Expressions from a corpus, using minimal linguistic tools such as morphological analysers and POS taggers. The candidates were ranked using the Point-wise Mutual Information (PMI) method. A Marathi corpus from the tourism domain, consisting of 15,925 sentences with 0.325M words, was chosen for the experiment. The Multi Word Expression extraction tool gave an initial set of Multi Word Expressions for Marathi. From this set, noun compounds were extracted manually, and a study was undertaken to identify the relations between their components. Table 2 lists the identified relations with examples from Marathi.
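
The ranking step mentioned above can be illustrated with a minimal sketch of Point-wise Mutual Information over adjacent word pairs, PMI(x, y) = log2(p(x, y) / (p(x) p(y))). This is not the tool described in the caption: the function name `rank_bigrams_by_pmi`, the `min_count` cutoff, and the restriction to adjacent token pairs are assumptions made for illustration, and the morphological analysis and POS filtering steps are omitted.

```python
import math
from collections import Counter


def rank_bigrams_by_pmi(sentences, min_count=2):
    """Rank adjacent word pairs by PMI, highest first.

    `sentences` is a list of token lists. `min_count` is a hypothetical
    frequency cutoff (not taken from the source) that drops rare pairs,
    whose PMI estimates tend to be unreliable.
    """
    unigrams = Counter()
    bigrams = Counter()
    for tokens in sentences:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))

    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    scored = []
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue
        p_xy = count / n_bi          # joint probability of the pair
        p_x = unigrams[w1] / n_uni   # marginal probability of the first word
        p_y = unigrams[w2] / n_uni   # marginal probability of the second word
        scored.append(((w1, w2), math.log2(p_xy / (p_x * p_y))))

    scored.sort(key=lambda item: item[1], reverse=True)
    return scored


if __name__ == "__main__":
    # Placeholder tokens only; these are not data from the Marathi corpus.
    toy_corpus = [["w1", "w2", "w3"], ["w1", "w2", "w4"], ["w3", "w4", "w1"]]
    for pair, score in rank_bigrams_by_pmi(toy_corpus, min_count=1):
        print(pair, round(score, 3))
```

Pairs with high PMI co-occur more often than their individual frequencies would predict, which is why such a score can surface candidate Multi Word Expressions for later manual inspection.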

Table 3: Paraphrasing Hindi compounds with vibhakti alone

