Papers by Teruko Mitamura

Code-switched (CS) data is ubiquitous in today's globalized world, but the dearth of annotated datasets in code-switching poses a significant challenge for learning diverse tasks across different language pairs. Parameter-efficient prompt-tuning approaches conditioned on frozen language models have shown promise for transfer learning in limited-resource setups. In this paper, we propose a novel instance-based prompt composition technique, PRO-CS, for CS tasks that combine language and task knowledge. We compare our approach with prompt-tuning and fine-tuning for code-switched tasks on 10 datasets across 4 language pairs. Our model outperforms the prompt-tuning approach by significant margins across all datasets and outperforms or remains at par with fine-tuning by using just 0.18% of total parameters. We also achieve competitive results when compared with the fine-tuned model in the low-resource cross-lingual and cross-task setting, indicating the effectiveness of our approach to incorporate new code-switched tasks. Our code and models will be available at
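
The instance-based composition can be pictured as building a soft prompt per input from separate language-pair and task prompt components while the backbone model stays frozen. The sketch below is a minimal, hypothetical illustration of that idea in PyTorch; the module names, shapes, and gating rule are assumptions for illustration, not the PRO-CS implementation.

import torch
import torch.nn as nn

class ComposedPrompt(nn.Module):
    """Hypothetical sketch: blend a language-pair prompt and a task prompt
    into one soft prompt per instance, leaving the backbone LM frozen."""
    def __init__(self, prompt_len, hidden_dim):
        super().__init__()
        self.lang_prompt = nn.Parameter(torch.randn(prompt_len, hidden_dim))
        self.task_prompt = nn.Parameter(torch.randn(prompt_len, hidden_dim))
        self.gate = nn.Linear(hidden_dim, 1)

    def forward(self, instance_emb):
        # instance_emb: (hidden_dim,) pooled embedding of the input instance.
        w = torch.sigmoid(self.gate(instance_emb))  # scalar weight in (0, 1)
        # Per-instance mixture of the two prompt components.
        return w * self.lang_prompt + (1 - w) * self.task_prompt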

Identifying the salience (i.e. importance) of discourse units is an important task in language understanding. While events play important roles in text documents, little research exists on analyzing their saliency status. This paper empirically studies the Event Salience task and proposes two salience detection models based on content similarities and discourse relations. The first is a feature-based salience model that incorporates similarities among discourse units. The second is a neural model that captures more complex relations between discourse units. Tested on our new large-scale event salience corpus, both methods significantly outperform the strong frequency baseline, while our neural model further improves the feature-based one by a large margin. Our analyses demonstrate that our neural model captures interesting connections between salience and discourse unit relations (e.g., scripts and frame structures).
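
As a rough illustration of the kind of content-similarity features such a feature-based model can build on, the toy sketch below computes cosine similarities between an event's vector and the vectors of other discourse units and aggregates them. The vectors and feature names are placeholders, not the paper's feature set.

import numpy as np

def cosine(a, b):
    # Cosine similarity with a small epsilon to avoid division by zero.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def salience_features(event_vec, unit_vecs):
    """Aggregate similarity features for one event mention."""
    sims = [cosine(event_vec, u) for u in unit_vecs]
    return {"mean_sim": float(np.mean(sims)), "max_sim": float(np.max(sims))}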

In this paper, we describe new consumer services based on speech processing technologies to support a new digital/mobile era of ubiquitous communication. First, we propose a compact and noise robust embedded speech recognition middleware implemented on microprocessors focused on sophisticated HMIs (Human Machine Interfaces) for car information systems (i.e. Car Telematics). Second, we report on a novel and sophisticated Dialog Management/Manager (DM) system based on VoiceXML (Voice eXtensible Markup Language), called CAMMIA (Conversational Agent for Multimedia Mobile Information Access). The proposed DM will handle two important issues: an automatic generation scheme for lexicons and grammars, and an effective combination/merger between Automatic Speech Recognition (ASR) and Natural Language Processing (NLP). The new DM scheme has been evaluated for an application of the Car Telematics service task after integration with ASR and a VoiceXML interpreter (VXI).

arXiv, Jan 14, 2021
Visual Question Answering (VQA) is of tremendous interest to the research community with important applications such as aiding visually impaired users and image-based search. In this work, we explore the use of scene graphs for solving the VQA task. We conduct experiments on the GQA dataset (Hudson and Manning, 2019b), which presents a challenging set of questions requiring counting, compositionality and advanced reasoning capability, and provides scene graphs for a large number of images. We adopt image + question architectures for use with scene graphs, evaluate various scene graph generation techniques for unseen images, propose a training curriculum to leverage human-annotated and auto-generated scene graphs, and build late fusion architectures to learn from multiple image representations. We present a multi-faceted study into the use of scene graphs for VQA, making this work the first of its kind.
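
One way to read the late-fusion idea is as two answer heads, one over conventional image features and one over a scene-graph encoding, whose predictions are combined at the end. The PyTorch sketch below is an assumed, simplified version; the dimensions, heads, and averaging rule are illustrative rather than the architecture used in the paper.

import torch
import torch.nn as nn

class LateFusionVQA(nn.Module):
    def __init__(self, img_dim, graph_dim, q_dim, n_answers):
        super().__init__()
        self.img_head = nn.Linear(img_dim + q_dim, n_answers)
        self.graph_head = nn.Linear(graph_dim + q_dim, n_answers)

    def forward(self, img_feat, graph_feat, q_feat):
        # Each branch sees the question plus one image representation.
        img_logits = self.img_head(torch.cat([img_feat, q_feat], dim=-1))
        graph_logits = self.graph_head(torch.cat([graph_feat, q_feat], dim=-1))
        # Late fusion: average the two branches' answer logits.
        return (img_logits + graph_logits) / 2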

arXiv, Oct 5, 2020
In this paper, we investigate data augmentation for text generation, which we call GenAug. Text generation and language modeling are important tasks within natural language processing, and are especially challenging for low-data regimes. We propose and evaluate various augmentation methods, including some that incorporate external knowledge, for fine-tuning GPT-2 on a subset of Yelp Reviews. We also examine the relationship between the amount of augmentation and the quality of the generated text. We utilize several metrics that evaluate important aspects of the generated text including its diversity and fluency. Our experiments demonstrate that insertion of character-level synthetic noise and keyword replacement with hypernyms are effective augmentation methods, and that the quality of generations improves to a peak at approximately three times the amount of original data.
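
To make the two augmentations singled out above concrete, here is a small, hypothetical sketch of character-level noise insertion and hypernym replacement using NLTK's WordNet; the function names and rates are assumptions, not the GenAug code.

import random
from nltk.corpus import wordnet  # assumes the WordNet corpus has been downloaded

def add_char_noise(text, rate=0.05):
    """Insert random lowercase letters into the text at the given rate."""
    out = []
    for ch in text:
        out.append(ch)
        if random.random() < rate:
            out.append(random.choice("abcdefghijklmnopqrstuvwxyz"))
    return "".join(out)

def replace_with_hypernyms(text, rate=0.1):
    """Replace some words with a WordNet hypernym, when one exists."""
    words = text.split()
    for i, w in enumerate(words):
        if random.random() >= rate:
            continue
        synsets = wordnet.synsets(w)
        if not synsets:
            continue
        hypernyms = synsets[0].hypernyms()
        if hypernyms:
            # Use the first lemma of the first hypernym as a coarse substitute.
            words[i] = hypernyms[0].lemmas()[0].name().replace("_", " ")
    return " ".join(words)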

arXiv, Aug 28, 2020
Answering questions related to art pieces (paintings) is a difficult task, as it implies the understanding of not only the visual information that is shown in the picture, but also the contextual knowledge that is acquired through the study of the history of art. In this work, we introduce our first attempt towards building a new dataset, coined AQUA (Art QUestion Answering). The question-answer (QA) pairs are automatically generated using state-of-the-art question generation methods based on paintings and comments provided in an existing art understanding dataset. The QA pairs are cleansed by crowdsourcing workers with respect to their grammatical correctness, answerability, and answers' correctness. Our dataset inherently consists of visual (painting-based) and knowledge (comment-based) questions. We also present a two-branch model as a baseline, where the visual and knowledge questions are handled independently. We extensively compare our baseline model against the state-of-the-art models for question answering, and we provide a comprehensive study about the challenges and potential future directions for visual question answering on art.

Computer Vision – ECCV 2020 Workshops, 2020
Answering questions related to art pieces (paintings) is a difficult task, as it implies the understanding of not only the visual information that is shown in the picture, but also the contextual knowledge that is acquired through the study of the history of art. In this work, we introduce our first attempt towards building a new dataset, coined AQUA (Art QUestion Answering). The question-answer (QA) pairs are automatically generated using state-of-the-art question generation methods based on paintings and comments provided in an existing art understanding dataset. The QA pairs are cleansed by crowdsourcing workers with respect to their grammatical correctness, answerability, and answers' correctness. Our dataset inherently consists of visual (painting-based) and knowledge (comment-based) questions. We also present a two-branch model as a baseline, where the visual and knowledge questions are handled independently. We extensively compare our baseline model against the state-of-the-art models for question answering, and we provide a comprehensive study about the challenges and potential future directions for visual question answering on art.

Lecture Notes in Computer Science, 2010
Self-explanation is an effective instructional strategy for improving problem solving in math and science domains. However, our previous studies, within the domain of second language grammar learning, show self-explanation to be no more effective than simple practice; perhaps the metalinguistic challenges involved in explaining using one's non-native language are hampering the potential benefits. An alternative strategy is tutoring using analogical comparisons, which reduces language difficulties while continuing to encourage feature focusing and deep processing. In this paper, we investigate adult English language learners learning the English article system (e.g. the difference between "a dog" and "the dog"). We present the results of a classroom-based study (N=99) that compares practice-only to two conditions that facilitate deep processing: self-explanation with practice and analogy with practice. Results show that students in all conditions benefit from the instruction. However, students in the practice-only condition complete the instruction in significantly less time, leading to greater learning efficiency. Possible explanations regarding the differences between language and science learning are discussed.

KANT is an interlingual MT system for multi-lingual translation of technical documents, written using a controlled vocabulary and grammar. KANT comprises a set of software modules (parser, interpreter, mapper, generator) which work together to produce target language translations from controlled source text. These modules are the result of long-term research and development in practical machine translation at the Center for Machine Translation (CMT) at Carnegie Mellon University, located in Pittsburgh, PA. The KANT software grew out of extensions and refinements to earlier systems developed at the CMT, which include the CMT-SEMSYN system, a collaborative effort with the University of Stuttgart in the domain of doctor-patient communications (Japanese and English source languages to Japanese, English and German target languages), and the KBMT-89 system, a funded project with IBM's Tokyo Research Laboratory in the domain of PC installation manuals (Japanese and English to Japanese and English; cf. (Goodman and Nirenburg, 1991)).
We present an approach to pronominal anaphora resolution using KANT Controlled Language and the KANTOO multilingual MT system. Our algorithm is based on a robust, syntax-based approach that applies a set of restrictions and preferences to select the correct antecedent. We report a success rate of 93.3% on a training corpus with 286 anaphors, and 88.8% on held-out data with 144 anaphors. Our approach translates anaphors to Spanish with 97.9% accuracy and to German with 94.4% accuracy on held-out data.
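
The restriction-and-preference scheme can be illustrated with a toy resolver: hard agreement restrictions first prune the candidate antecedents, then simple preferences rank what remains. The features and weights below are hypothetical and are not the KANTOO rules.

def resolve_anaphor(anaphor, candidates):
    # Restrictions: candidates must agree in number and gender.
    viable = [c for c in candidates
              if c["number"] == anaphor["number"]
              and c["gender"] in (anaphor["gender"], "unknown")]
    if not viable:
        return None
    # Preferences: favor recent mentions and grammatical subjects.
    def score(c):
        return -c["distance"] + (2 if c["is_subject"] else 0)
    return max(viable, key=score)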

Proceedings of the AAAI Conference on Artificial Intelligence, Jun 26, 2023
Event grounding aims at linking mention references in text corpora to events from a knowledge base (KB). Previous work on this task focused primarily on linking to a single KB event, thereby overlooking the hierarchical aspects of events. Events in documents are typically described at various levels of spatio-temporal granularity. These hierarchical relations are utilized in downstream tasks of narrative understanding and schema construction. In this work, we present an extension to the event grounding task that requires tackling hierarchical event structures from the KB. Our proposed task involves linking a mention reference to a set of event labels from a subevent hierarchy in the KB. We propose a retrieval methodology that leverages event hierarchy through an auxiliary hierarchical loss. On an automatically created multilingual dataset from Wikipedia and Wikidata, our experiments demonstrate the effectiveness of the hierarchical loss against retrieve and re-rank baselines. Furthermore, we demonstrate the systems' ability to aid hierarchical discovery among unseen events.
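
A rough way to picture the auxiliary hierarchical loss is as a second retrieval objective that also pulls a mention toward the gold event's parent in the subevent hierarchy. The PyTorch sketch below is an illustration under stated assumptions; the weight alpha and the exact formulation are not taken from the paper.

import torch
import torch.nn.functional as F

def grounding_loss(mention_emb, event_embs, gold_idx, parent_idx, alpha=0.5):
    """mention_emb: (d,); event_embs: (n, d); gold_idx/parent_idx: ints."""
    scores = event_embs @ mention_emb                     # (n,) similarity scores
    main = F.cross_entropy(scores.unsqueeze(0), torch.tensor([gold_idx]))
    # Auxiliary term: also score the gold event's parent highly, so the
    # encoder respects the subevent hierarchy.
    aux = F.cross_entropy(scores.unsqueeze(0), torch.tensor([parent_idx]))
    return main + alpha * aux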
Workshop on Events: Definition, Detection, Coreference, and Representation, Jun 1, 2013
Text Analysis Conference (TAC), 2016
In this paper, we describe the second Event Nugget evaluation track for Knowledge Base Population (KBP) at TAC 2016. This year we extend the Event Nugget task to a trilingual setting: English, Chinese and Spanish. All the Event Nugget sub-tasks now require end-to-end processing from raw text. The task has attracted a lot of participation and raised interesting research problems. In this paper we provide an overview of the task definition, data annotation, evaluation and trending research methods. We further discuss issues related to the annotation process and the current restricted evaluation scope. With the lessons learned, we hope the next KBP Event Nugget task can incorporate more complex event relations on a larger scale.
Text Analysis Conference (TAC), 2015
This paper describes three TAC KBP Event Nugget tasks: (1) Event Nugget Detection, (2) Event Nugget Detection and Coreference, and (3) Event Nugget Coreference. The evaluation corpus, prepared by LDC, consists of 202 documents from newswire and discussion forum sources. Participating systems detect event nuggets, event types and subtypes, and Realis values. For task 1, 38 runs were submitted by 14 teams; for task 2, 19 runs were submitted by 8 teams; for task 3, 16 runs were submitted by 6 teams. After presenting the scoring algorithms and their results, we provide some analyses of these tasks.
Text Analysis Conference (TAC), 2015
We describe CMU LTI's participation in the KBP 2015 Event Track. We officially participated in Task 1: Event Nugget Detection track and Task 3: Event Coreference track. Our system ranked high in both tracks. We found that our combined system is competitive but has room to improve. In addition, we conducted follow-up experiments by creating a simple pipelined system, which we also found competitive with the official submissions.
Text Analysis Conference (TAC), 2017
After two successful years of Event Nugget evaluation in the TAC KBP workshop, the third Event Nugget evaluation track for Knowledge Base Population (KBP) still attracts a lot of attention from the field. In addition to the traditional event nugget and coreference tasks, we introduce a new event sequencing task in English. The new task has brought more complex event relation reasoning to the current evaluations. In this paper we provide an overview of the task definition, data annotation, evaluation and trending research methods. We further discuss our efforts in creating the new event sequencing task and interesting research problems related to it.

arXiv, Sep 13, 2021
In this paper, we study the identity of textual events from different documents. While the complex nature of event identity has been studied previously, the case of events across documents remains unclear. Prior work on cross-document event coreference has two main drawbacks. First, it restricts the annotations to a limited set of event types. Second, it insufficiently tackles the concept of event identity. Such an annotation setup reduces the pool of event mentions and prevents one from considering the possibility of quasi-identity relations. We propose a dense annotation approach for cross-document event coreference, comprising a rich source of event mentions and a dense annotation effort between related document pairs. To this end, we design a new annotation workflow with careful quality control and an easy-to-use annotation interface. In addition to the links, we further collect overlapping event contexts, including time, location, and participants, to shed some light on the relation between identity decisions and context. We present an open-access dataset for cross-document event coreference, CDEC-WN, collected from English Wikinews, and open-source our annotation toolkit to encourage further research on cross-document tasks.