Papers by Chee Wee (Ben) Leong
We investigate the effectiveness of semantic generalizations/classifications for capturing the regularities of the behavior of verbs in terms of their metaphoricity. Starting from orthographic word unigrams, we experiment with various ways of defining semantic classes for verbs (grammatical, resource-based, distributional) and measure the effectiveness of these classes for classifying all verbs in a running text as metaphor or non-metaphor.
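As a concrete illustration of one resource-based option for such classes, the sketch below maps a verb to its WordNet supersense labels (lexicographer files such as verb.motion) and combines them with the orthographic unigram. This is a minimal sketch, not the paper's exact feature set; it assumes NLTK with the WordNet data installed.

```python
# Minimal sketch: resource-based semantic classes for verbs via WordNet
# lexicographer files, one of several possible class definitions.
# Assumes the WordNet corpus is available (nltk.download('wordnet')).
from nltk.corpus import wordnet as wn

def verb_semantic_classes(verb):
    """Return the set of WordNet supersense labels for a verb lemma."""
    return {s.lexname() for s in wn.synsets(verb, pos=wn.VERB)}

def features(verb):
    """Combine an orthographic unigram feature with class features."""
    feats = {f"unigram={verb}": 1.0}
    for cls in verb_semantic_classes(verb):
        feats[f"class={cls}"] = 1.0
    return feats

print(features("devour"))  # includes, e.g., class=verb.consumption
```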

Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM '12), Oct 29, 2012
Fact verification has become an important task due to the increased popularity of blogs, discussion groups, and social sites, as well as of encyclopedic collections that aggregate content from many contributors. We investigate the task of automatically retrieving supporting evidence from the Web for factual statements. Using Wikipedia as a starting point, we derive a large corpus of statements paired with supporting Web documents, which we employ further as training and test data under the assumption that the contributed references to Wikipedia represent some of the most relevant Web documents for supporting the corresponding statements. Given a factual statement, the proposed system first transforms it into a set of semantic terms by using machine learning techniques. It then employs a quasi-random strategy for selecting subsets of the semantic terms according to topical likelihood. These semantic terms are used to construct queries for retrieving Web documents via a Web search API. Finally, the retrieved documents are aggregated and re-ranked by employing additional measures of their suitability to support the factual statement. To gauge the quality of the retrieved evidence, we conduct a user study through Amazon Mechanical Turk, which shows that our system is capable of retrieving supporting Web documents comparable to those chosen by Wikipedia contributors.
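The retrieve-and-re-rank pipeline the abstract describes can be sketched as below. Everything here is illustrative: `web_search` stands in for any Web search API, the paper's learned semantic-term extraction is approximated by a caller-supplied term-weighting function, and `score` is a placeholder for the suitability measures used in re-ranking.

```python
# Illustrative sketch of the statement -> queries -> retrieve -> re-rank
# pipeline. Documents are assumed to be URLs (hashable strings).
import itertools
import random

def semantic_terms(statement, weight, k=8):
    """Rank the statement's words by a term-weighting function."""
    words = [w.strip(".,") for w in statement.lower().split()]
    return sorted(set(words), key=weight, reverse=True)[:k]

def candidate_queries(terms, n_queries=5, subset_size=4):
    """Quasi-random subsets of the semantic terms, one query per subset."""
    subsets = list(itertools.combinations(terms, subset_size))
    picked = random.sample(subsets, min(n_queries, len(subsets)))
    return [" ".join(s) for s in picked]

def retrieve_evidence(statement, weight, web_search, score):
    docs = []
    for q in candidate_queries(semantic_terms(statement, weight)):
        docs.extend(web_search(q))  # placeholder search API call
    # Aggregate, then re-rank by suitability to support the statement.
    return sorted(set(docs), key=lambda d: score(statement, d), reverse=True)
```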
UNT at ImageCLEF 2011: Relevance Models and Salient Semantic Analysis for Image Retrieval. Miguel E. Ruiz, Chee Wee Leong, and Samer Hassan, University of North Texas, Department of Library and Information ...
The described implementations relate to processing of electronic data. One implementation is manifested as a technique that can include receiving an input statement that includes a plurality of terms. The technique can also include providing, in response to the input statement, ranked supporting documents that support the input statement or ranked contradicting results that contradict the input statement.
Leong, Chee Wee. Modeling Synergistic Relationships between Words and Images.

2015 International Conference on Affective Computing and Intelligent Interaction (ACII), 2015
Public speaking, an important type of oral communication, is critical to success in both learning and career development. However, there is a lack of tools to efficiently and economically evaluate presenters' verbal and nonverbal behaviors. The recent advancements in automated scoring and multimodal sensing technologies may address this issue. We report a study on the development of an automated scoring model for public speaking performance using multimodal cues. A multimodal presentation corpus containing 14 subjects' 56 presentations has been recorded using a Microsoft Kinect depth camera. Task design, rubric development, and human rating were conducted according to standards in educational assessment. A rich set of multimodal features has been extracted from head poses, eye gazes, facial expressions, motion traces, speech signal, and transcripts. The model-building experiment shows that jointly using both lexical/speech and visual features achieves more accurate scoring, which suggests the feasibility of using multimodal technologies in the assessment of public speaking skills.
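One plausible shape for such a scoring model is sketched below: per-frame multimodal measurements are aggregated into statistical functionals and fed to a regression model trained against human scores. The synthetic data, the choice of functionals, and the Ridge model are assumptions for exposition, not the paper's exact configuration.

```python
# Minimal sketch: aggregate multimodal time series into statistical
# functionals, then fit a regression scoring model on human ratings.
import numpy as np
from sklearn.linear_model import Ridge

def functionals(track):
    """Summarize a 1-D time series with basic statistics."""
    t = np.asarray(track, dtype=float)
    return [t.mean(), t.std(), t.min(), t.max(), np.median(t)]

def presentation_features(tracks):
    """Concatenate functionals over all modality tracks of one talk."""
    return np.concatenate([functionals(t) for t in tracks])

rng = np.random.default_rng(0)
# 56 presentations, each with 3 synthetic modality tracks of 200 frames.
X = np.array([presentation_features([rng.normal(size=200) for _ in range(3)])
              for _ in range(56)])
y = rng.uniform(1, 5, size=56)  # stand-in human holistic scores
model = Ridge().fit(X, y)
```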

Proceedings of the 2015 ACM International Conference on Multimodal Interaction (ICMI '15), 2015
We analyze how fusing features obtained from different multimodal data streams such as speech, face, body movement and emotion tracks can be applied to the scoring of multimodal presentations. We compute both time-aggregated and time-series based features from these data streams: the former being statistical functionals and other cumulative features computed over the entire time series, while the latter, dubbed histograms of co-occurrences, capture how different prototypical body postures or facial configurations co-occur within different time lags of each other over the evolution of the multimodal, multivariate time series. We examine the relative utility of these features, along with curated speech stream features, in predicting human-rated scores of multiple aspects of presentation proficiency. We find that different modalities are useful in predicting different aspects, even outperforming a naive human inter-rater agreement baseline for a subset of the aspects analyzed.
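A histogram-of-co-occurrences feature can be sketched as follows: given a frame-level sequence of prototypical states (e.g., cluster IDs of body postures), count how often each ordered pair of states occurs within each time lag. The state inventory and lag range here are assumptions for exposition.

```python
# Illustrative sketch of a histogram-of-co-occurrences feature over a
# discretized multivariate time series.
import numpy as np

def cooccurrence_histogram(states, n_states, max_lag):
    """Count lagged state pairs; returns a flattened feature vector."""
    hist = np.zeros((max_lag, n_states, n_states))
    for lag in range(1, max_lag + 1):
        for a, b in zip(states[:-lag], states[lag:]):
            hist[lag - 1, a, b] += 1
    return hist.ravel()

# Example: a toy posture sequence with 3 prototypical states.
seq = [0, 0, 1, 2, 1, 0, 2, 2, 1]
features = cooccurrence_histogram(seq, n_states=3, max_lag=2)
```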
ETS Research Report Series, 2015
Proceedings of the Third Workshop on Metaphor in NLP (at NAACL 2015), Jun 5, 2015
We present a supervised machine learning system for word-level classification of all content words in a running text as being metaphorical or non-metaphorical. The system provides a substantial improvement upon a previously published baseline, using re-weighting of the training examples and using features derived from a concreteness database. We observe that while the first manipulation was very effective, the second was only slightly so. Possible reasons for these observations are discussed.
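The two manipulations the abstract names can be sketched as below. The weighting scheme and the concreteness-lexicon interface are assumptions for illustration (the paper's database could be, e.g., a norms resource with 1-5 ratings); the classifier choice is likewise illustrative.

```python
# Minimal sketch: (1) re-weight training examples so the rarer
# metaphorical class counts more, (2) add a concreteness feature.
import numpy as np
from sklearn.linear_model import LogisticRegression

def concreteness_feature(word, norms, default=2.5):
    """Look up a 1-5 concreteness rating, falling back to a midpoint."""
    return norms.get(word, default)

def train(X, y):
    """Fit a classifier with metaphor examples (y == 1) up-weighted."""
    w_pos = (y == 0).sum() / max((y == 1).sum(), 1)
    weights = np.where(y == 1, w_pos, 1.0)
    return LogisticRegression(max_iter=1000).fit(X, y, sample_weight=weights)
```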
Proceedings of the Second Workshop on Metaphor in NLP, at the ACL 2014 conference, Jun 2014
"Current approaches to supervised learning of metaphor tend to use sophisticated features and res... more "Current approaches to supervised learning of metaphor tend to use sophisticated features and restrict their attention to constructions and contexts where these features apply. In this paper, we describe the development of a supervised learning system to classify all content words in a running
text as either being used metaphorically or not. We start by examining the performance of a simple unigram baseline that achieves surprisingly good results for some of the datasets. We then show how the recall of the system can be improved over this strong baseline."

Proceedings of the 2014 ACM workshop on Multimodal Learning Analytics Workshop and Grand Challenge - MLA '14, 2014
The ability to make presentation slides and deliver them effectively to convey information to the audience is a skill of increasing importance, particularly in the pursuit of both academic and professional career success. We envision that multimodal sensing and machine learning techniques can be employed to evaluate, and potentially help improve, the quality of the content and delivery of public presentations. To this end, we report a study using the Oral Presentation Quality Corpus provided by the 2014 Multimodal Learning Analytics (MLA) Grand Challenge. A set of multimodal features was extracted from slides, speech, posture and hand gestures, as well as head poses. We also examined the dimensionality of the human scores, which could be concisely represented by two Principal Component (PC) scores: comp1 for delivery skills and comp2 for slide quality. Several machine learning experiments were performed to predict the two PC scores using multimodal features. Our experiments suggest that multimodal cues can predict human scores on presentation tasks, and that a scoring model comprising both verbal and visual features can outperform one using just a single modality.
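The dimensionality analysis can be sketched as a PCA over the matrix of human rubric scores, keeping two components (delivery skills and slide quality in the paper). The data below is synthetic and purely illustrative.

```python
# Minimal sketch: reduce per-presentation rubric scores to two PCs.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
scores = rng.uniform(1, 5, size=(40, 9))  # 40 talks x 9 rubric items

pca = PCA(n_components=2)
pc_scores = pca.fit_transform(scores)      # comp1, comp2 per talk
print(pca.explained_variance_ratio_)       # variance captured by each PC
```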

Proceedings of the 2014 workshop on Emotion Recognition in the Wild Challenge and Workshop - ERM4HCI '14, 2014
Recently, online video interviews have been increasingly used in the employment process. Though several automatic techniques have emerged to analyze interview videos, so far only simple emotion analyses have been attempted, e.g., counting the number of smiles on the face of an interviewee. In this paper, we report our initial study of applying advanced multimodal emotion detection approaches for the purpose of measuring performance on an interview task that elicits emotion. On an acted interview corpus we created, we performed our evaluations using a Speech-based Emotion Recognition (SER) system, as well as an off-the-shelf facial expression analysis toolkit (FACET). While the results obtained suggest the promise of using FACET for emotion detection, the benefits of employing the SER system are somewhat limited.
Proceedings of the 16th International Conference on Multimodal Interaction - ICMI '14, 2014
Traditional assessments of public speaking skills rely on human scoring. We report an initial study on the development of an automated scoring model for public speaking performances using multimodal technologies. Task design, rubric development, and human rating were conducted according to standards in educational assessment. An initial corpus of 17 speakers with 4 speaking tasks was collected using audio, video, and 3D motion capturing devices. A scoring model based on basic features in the speech content, speech delivery, and hand, body, and head movements significantly predicts human rating, suggesting the feasibility of using multimodal technologies in the assessment of public speaking skills.

Proceedings of IJCNLP, 2011
Traditional approaches to semantic relatedness are often restricted to text-based methods, which typically disregard other multimodal knowledge sources. In this paper, we propose a novel image-based metric to estimate the relatedness of words, and demonstrate the promise of this method through comparative evaluations on three standard datasets. We also show that a hybrid image-text approach can lead to improvements in word relatedness, confirming the applicability of visual cues as a possible orthogonal information source.
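A hybrid image-text relatedness score can be sketched as a convex combination of a textual similarity and a similarity over visual feature vectors of images associated with each word. Both component functions and the mixing weight below are assumptions, not the paper's specific formulation.

```python
# Illustrative sketch of a hybrid image-text word relatedness metric.
import numpy as np

def cosine(u, v):
    u, v = np.asarray(u, float), np.asarray(v, float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def hybrid_relatedness(w1, w2, text_sim, image_vec, alpha=0.5):
    """Convex combination of textual and image-based relatedness."""
    visual = cosine(image_vec(w1), image_vec(w2))
    return alpha * text_sim(w1, w2) + (1 - alpha) * visual
```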
Machine Translation, Jan 1, 2008
This paper evaluates the hypothesis that pictorial representations can be used to effectively convey simple sentences across language barriers. Comparative evaluations show that a considerable amount of understanding can be achieved using visual descriptions of information, with evaluation figures within a comparable range of those obtained with linguistic representations produced by an automatic machine translation system.

Fifth International Conference on Information …, Jan 1, 2008
In natural languages, variability of semantic expression refers to the situation where the same meaning can be inferred from different words or texts. Given that many natural language processing tasks nowadays (e.g., question answering, information retrieval, document summarization) often model this variability by requiring a specific target meaning to be inferred from different text variants, it is helpful to capture text similarity in a directional manner to serve such inference needs. In this paper, we show how Wikipedia can be used as a semantic resource to build a directional inferential similarity metric between words, and subsequently, texts. Through experiments, we show that our Wikipedia-based metric performs significantly better when applied to a standard evaluation dataset, with a reduction in error rate of 16.1% over the random metric baseline.
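A directional text similarity in the spirit described above can be sketched as follows: each word of the source text is matched to its most related word in the target text, weighted by specificity. The `word_relatedness` and `idf` callables stand in for the paper's Wikipedia-based resources and are assumptions here; the asymmetry of the formula is what makes the metric directional.

```python
# Minimal sketch of a directional (asymmetric) inferential similarity.
def directional_similarity(source, target, word_relatedness, idf):
    """sim(source -> target); not symmetric by construction."""
    num = sum(idf(w) * max((word_relatedness(w, t) for t in target),
                           default=0.0)
              for w in source)
    den = sum(idf(w) for w in source)
    return num / den if den else 0.0
```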
Proceedings of the 23rd …, Jan 1, 2010
This paper introduces several extractive approaches for automatic image tagging, relying exclusively on information mined from texts. Through evaluations on two datasets, we show that our methods exceed competitive baselines by a large margin, and compare favorably with the state-of-the-art that uses both textual and image features.
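One simple extractive baseline in this family ranks candidate words from the image's accompanying text by tf-idf and keeps the top k as tags. This is a stand-in for the paper's richer extractive methods, included only to illustrate the text-only setting.

```python
# Illustrative sketch: extractive image tagging from surrounding text.
from sklearn.feature_extraction.text import TfidfVectorizer

def extract_tags(doc, corpus, k=5):
    """Return the k highest tf-idf words of doc, scored against corpus."""
    vec = TfidfVectorizer(stop_words="english")
    tfidf = vec.fit_transform(corpus + [doc])
    row = tfidf[len(corpus)].toarray().ravel()
    vocab = vec.get_feature_names_out()
    return [vocab[i] for i in row.argsort()[::-1][:k]]

corpus = ["a photo of a sunset over the ocean", "city skyline at night"]
print(extract_tags("golden sunset above the calm ocean horizon", corpus))
```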
Proceedings of the Third Linguistic …, Jan 1, 2009
In this paper, we report our work on automatic image annotation by combining several textual features drawn from the text surrounding the image. Evaluation of our system is performed on a dataset of images and texts collected from the web. We report our findings through comparative evaluation with two gold standard collections of manual annotations on the same dataset.