Papers by Chee Wee (Ben) Leong
We investigate the effectiveness of semantic generalizations/classifications for capturing the regularities of the behavior of verbs in terms of their metaphoricity. Starting from orthographic word unigrams, we experiment with various ways of defining semantic classes for verbs (grammatical, resource-based, distributional) and measure the effectiveness of these classes for classifying all verbs in a running text as metaphor or non-metaphor.
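As a concrete illustration of one resource-based option for such classes, the sketch below maps a verb to its WordNet supersense labels (lexicographer files such as verb.motion) and combines them with the orthographic unigram. This is a minimal sketch, not the paper's exact feature set; it assumes NLTK with the WordNet data installed.

```python
# Minimal sketch: resource-based semantic classes for verbs via WordNet
# lexicographer files, one of several possible class definitions.
# Assumes the WordNet corpus is available (nltk.download('wordnet')).
from nltk.corpus import wordnet as wn

def verb_semantic_classes(verb):
    """Return the set of WordNet supersense labels for a verb lemma."""
    return {s.lexname() for s in wn.synsets(verb, pos=wn.VERB)}

def features(verb):
    """Combine an orthographic unigram feature with class features."""
    feats = {f"unigram={verb}": 1.0}
    for cls in verb_semantic_classes(verb):
        feats[f"class={cls}"] = 1.0
    return feats

print(features("devour"))  # includes, e.g., class=verb.consumption
```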

Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM '12), Oct 29, 2012
Fact verification has become an important task due to the increased popularity of blogs, discussion groups, and social sites, as well as of encyclopedic collections that aggregate content from many contributors. We investigate the task of automatically retrieving supporting evidence from the Web for factual statements. Using Wikipedia as a starting point, we derive a large corpus of statements paired with supporting Web documents, which we employ further as training and test data under the assumption that the contributed references to Wikipedia represent some of the most relevant Web documents for supporting the corresponding statements. Given a factual statement, the proposed system first transforms it into a set of semantic terms by using machine learning techniques. It then employs a quasi-random strategy for selecting subsets of the semantic terms according to topical likelihood. These semantic terms are used to construct queries for retrieving Web documents via a Web search API. Finally, the retrieved documents are aggregated and re-ranked by employing additional measures of their suitability to support the factual statement. To gauge the quality of the retrieved evidence, we conduct a user study through Amazon Mechanical Turk, which shows that our system is capable of retrieving supporting Web documents comparable to those chosen by Wikipedia contributors.
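The retrieve-and-re-rank pipeline the abstract describes can be sketched as below. Everything here is illustrative: `web_search` stands in for any Web search API, the paper's learned semantic-term extraction is approximated by a caller-supplied term-weighting function, and `score` is a placeholder for the suitability measures used in re-ranking.

```python
# Illustrative sketch of the statement -> queries -> retrieve -> re-rank
# pipeline. Documents are assumed to be URLs (hashable strings).
import itertools
import random

def semantic_terms(statement, weight, k=8):
    """Rank the statement's words by a term-weighting function."""
    words = [w.strip(".,") for w in statement.lower().split()]
    return sorted(set(words), key=weight, reverse=True)[:k]

def candidate_queries(terms, n_queries=5, subset_size=4):
    """Quasi-random subsets of the semantic terms, one query per subset."""
    subsets = list(itertools.combinations(terms, subset_size))
    picked = random.sample(subsets, min(n_queries, len(subsets)))
    return [" ".join(s) for s in picked]

def retrieve_evidence(statement, weight, web_search, score):
    docs = []
    for q in candidate_queries(semantic_terms(statement, weight)):
        docs.extend(web_search(q))  # placeholder search API call
    # Aggregate, then re-rank by suitability to support the statement.
    return sorted(set(docs), key=lambda d: score(statement, d), reverse=True)
```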
UNT at ImageCLEF 2011: Relevance Models and Salient Semantic Analysis for Image Retrieval. Miguel E. Ruiz, Chee Wee Leong, and Samer Hassan, University of North Texas, Department of Library and Information ...
The described implementations relate to processing of electronic data. One implementation is manifested as a technique that can include receiving an input statement that includes a plurality of terms. The technique can also include providing, in response to the input statement, ranked supporting documents that support the input statement or ranked contradicting results that contradict the input statement.
Leong, Chee Wee. Modeling Synergistic Relationships between Words and Images.

2015 International Conference on Affective Computing and Intelligent Interaction (ACII), 2015
Public speaking, an important type of oral communication, is critical to success in both learning and career development. However, there is a lack of tools to efficiently and economically evaluate presenters' verbal and nonverbal behaviors. The recent advancements in automated scoring and multimodal sensing technologies may address this issue. We report a study on the development of an automated scoring model for public speaking performance using multimodal cues. A multimodal presentation corpus containing 14 subjects' 56 presentations has been recorded using a Microsoft Kinect depth camera. Task design, rubric development, and human rating were conducted according to standards in educational assessment. A rich set of multimodal features has been extracted from head poses, eye gazes, facial expressions, motion traces, speech signal, and transcripts. The model-building experiment shows that jointly using both lexical/speech and visual features achieves more accurate scoring, which suggests the feasibility of using multimodal technologies in the assessment of public speaking skills.
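One plausible shape for such a scoring model is sketched below: per-frame multimodal measurements are aggregated into statistical functionals and fed to a regression model trained against human scores. The synthetic data, the choice of functionals, and the Ridge model are assumptions for exposition, not the paper's exact configuration.

```python
# Minimal sketch: aggregate multimodal time series into statistical
# functionals, then fit a regression scoring model on human ratings.
import numpy as np
from sklearn.linear_model import Ridge

def functionals(track):
    """Summarize a 1-D time series with basic statistics."""
    t = np.asarray(track, dtype=float)
    return [t.mean(), t.std(), t.min(), t.max(), np.median(t)]

def presentation_features(tracks):
    """Concatenate functionals over all modality tracks of one talk."""
    return np.concatenate([functionals(t) for t in tracks])

rng = np.random.default_rng(0)
# 56 presentations, each with 3 synthetic modality tracks of 200 frames.
X = np.array([presentation_features([rng.normal(size=200) for _ in range(3)])
              for _ in range(56)])
y = rng.uniform(1, 5, size=56)  # stand-in human holistic scores
model = Ridge().fit(X, y)
```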

Proceedings of the 2015 ACM International Conference on Multimodal Interaction (ICMI '15), 2015
We analyze how fusing features obtained from different multimodal data streams such as speech, face, body movement and emotion tracks can be applied to the scoring of multimodal presentations. We compute both time-aggregated and time-series based features from these data streams: the former being statistical functionals and other cumulative features computed over the entire time series, while the latter, dubbed histograms of co-occurrences, capture how different prototypical body postures or facial configurations co-occur within different time lags of each other over the evolution of the multimodal, multivariate time series. We examine the relative utility of these features, along with curated speech stream features, in predicting human-rated scores of multiple aspects of presentation proficiency. We find that different modalities are useful in predicting different aspects, even outperforming a naive human inter-rater agreement baseline for a subset of the aspects analyzed.
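A histogram-of-co-occurrences feature can be sketched as follows: given a frame-level sequence of prototypical states (e.g., cluster IDs of body postures), count how often each ordered pair of states occurs within each time lag. The state inventory and lag range here are assumptions for exposition.

```python
# Illustrative sketch of a histogram-of-co-occurrences feature over a
# discretized multivariate time series.
import numpy as np

def cooccurrence_histogram(states, n_states, max_lag):
    """Count lagged state pairs; returns a flattened feature vector."""
    hist = np.zeros((max_lag, n_states, n_states))
    for lag in range(1, max_lag + 1):
        for a, b in zip(states[:-lag], states[lag:]):
            hist[lag - 1, a, b] += 1
    return hist.ravel()

# Example: a toy posture sequence with 3 prototypical states.
seq = [0, 0, 1, 2, 1, 0, 2, 2, 1]
features = cooccurrence_histogram(seq, n_states=3, max_lag=2)
```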
ETS Research Report Series, 2015
Proceedings of the Third Workshop on Metaphor in NLP (at NAACL 2015), Jun 5, 2015
We present a supervised machine learning system for word-level classification of all content words in a running text as being metaphorical or non-metaphorical. The system provides a substantial improvement upon a previously published baseline, using re-weighting of the training examples and using features derived from a concreteness database. We observe that while the first manipulation was very effective, the second was only slightly so. Possible reasons for these observations are discussed.
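The two manipulations the abstract names can be sketched as below. The weighting scheme and the concreteness-lexicon interface are assumptions for illustration (the paper's database could be, e.g., a norms resource with 1-5 ratings); the classifier choice is likewise illustrative.

```python
# Minimal sketch: (1) re-weight training examples so the rarer
# metaphorical class counts more, (2) add a concreteness feature.
import numpy as np
from sklearn.linear_model import LogisticRegression

def concreteness_feature(word, norms, default=2.5):
    """Look up a 1-5 concreteness rating, falling back to a midpoint."""
    return norms.get(word, default)

def train(X, y):
    """Fit a classifier with metaphor examples (y == 1) up-weighted."""
    w_pos = (y == 0).sum() / max((y == 1).sum(), 1)
    weights = np.where(y == 1, w_pos, 1.0)
    return LogisticRegression(max_iter=1000).fit(X, y, sample_weight=weights)
```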
Proceedings of the Second Workshop on Metaphor in NLP, at the ACL 2014 conference, Jun 2014
"Current approaches to supervised learning of metaphor tend to use sophisticated features and res... more "Current approaches to supervised learning of metaphor tend to use sophisticated features and restrict their attention to constructions and contexts where these features apply. In this paper, we describe the development of a supervised learning system to classify all content words in a running
text as either being used metaphorically or not. We start by examining the performance of a simple unigram baseline that achieves surprisingly good results for some of the datasets. We then show how the recall of the system can be improved over this strong baseline."

Proceedings of the 2014 ACM workshop on Multimodal Learning Analytics Workshop and Grand Challenge - MLA '14, 2014
The ability to make presentation slides and deliver them effectively to convey information to the audience is a skill of increasing importance, particularly in the pursuit of both academic and professional career success. We envision that multimodal sensing and machine learning techniques can be employed to evaluate, and potentially help improve, the quality of the content and delivery of public presentations. To this end, we report a study using the Oral Presentation Quality Corpus provided by the 2014 Multimodal Learning Analytics (MLA) Grand Challenge. A set of multimodal features was extracted from slides, speech, posture and hand gestures, as well as head poses. We also examined the dimensionality of the human scores, which could be concisely represented by two Principal Component (PC) scores: comp1 for delivery skills and comp2 for slide quality. Several machine learning experiments were performed to predict the two PC scores using multimodal features. Our experiments suggest that multimodal cues can predict human scores on presentation tasks, and that a scoring model comprising both verbal and visual features can outperform one using just a single modality.
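The dimensionality analysis can be sketched as a PCA over the matrix of human rubric scores, keeping two components (delivery skills and slide quality in the paper). The data below is synthetic and purely illustrative.

```python
# Minimal sketch: reduce per-presentation rubric scores to two PCs.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
scores = rng.uniform(1, 5, size=(40, 9))  # 40 talks x 9 rubric items

pca = PCA(n_components=2)
pc_scores = pca.fit_transform(scores)      # comp1, comp2 per talk
print(pca.explained_variance_ratio_)       # variance captured by each PC
```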

Proceedings of the 2014 workshop on Emotion Recognition in the Wild Challenge and Workshop - ERM4HCI '14, 2014
Recently, online video interviews have been increasingly used in the employment process. Though several automatic techniques have emerged to analyze interview videos, so far only simple emotion analyses have been attempted, e.g., counting the number of smiles on the face of an interviewee. In this paper, we report our initial study of applying advanced multimodal emotion detection approaches for the purpose of measuring performance on an interview task that elicits emotion. On an acted interview corpus we created, we performed our evaluations using a Speech-based Emotion Recognition (SER) system, as well as an off-the-shelf facial expression analysis toolkit (FACET). While the results obtained suggest the promise of using FACET for emotion detection, the benefits of employing the SER system are somewhat limited.
Proceedings of the 16th International Conference on Multimodal Interaction - ICMI '14, 2014
Traditional assessments of public speaking skills rely on human scoring. We report an initial study on the development of an automated scoring model for public speaking performances using multimodal technologies. Task design, rubric development, and human rating were conducted according to standards in educational assessment. An initial corpus of 17 speakers with 4 speaking tasks was collected using audio, video, and 3D motion capturing devices. A scoring model based on basic features in the speech content, speech delivery, and hand, body, and head movements significantly predicts human rating, suggesting the feasibility of using multimodal technologies in the assessment of public speaking skills.

Proceedings of IJCNLP, 2011
Traditional approaches to semantic relatedness are often restricted to text-based methods, which typically disregard other multimodal knowledge sources. In this paper, we propose a novel image-based metric to estimate the relatedness of words, and demonstrate the promise of this method through comparative evaluations on three standard datasets. We also show that a hybrid image-text approach can lead to improvements in word relatedness, confirming the applicability of visual cues as a possible orthogonal information source.
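A hybrid image-text relatedness score can be sketched as a convex combination of a textual similarity and a similarity over visual feature vectors of images associated with each word. Both component functions and the mixing weight below are assumptions, not the paper's specific formulation.

```python
# Illustrative sketch of a hybrid image-text word relatedness metric.
import numpy as np

def cosine(u, v):
    u, v = np.asarray(u, float), np.asarray(v, float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def hybrid_relatedness(w1, w2, text_sim, image_vec, alpha=0.5):
    """Convex combination of textual and image-based relatedness."""
    visual = cosine(image_vec(w1), image_vec(w2))
    return alpha * text_sim(w1, w2) + (1 - alpha) * visual
```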
Machine Translation, Jan 1, 2008
This paper evaluates the hypothesis that pictorial representations can be used to effectively convey simple sentences across language barriers. Comparative evaluations show that a considerable amount of understanding can be achieved using visual descriptions of information, with evaluation figures within a comparable range of those obtained with linguistic representations produced by an automatic machine translation system.

Fifth International Conference on Information …, Jan 1, 2008
In natural languages, variability of semantic expression refers to the situation where the same meaning can be inferred from different words or texts. Given that many natural language processing tasks nowadays (e.g., question answering, information retrieval, document summarization) often model this variability by requiring a specific target meaning to be inferred from different text variants, it is helpful to capture text similarity in a directional manner to serve such inference needs. In this paper, we show how Wikipedia can be used as a semantic resource to build a directional inferential similarity metric between words, and subsequently, texts. Through experiments, we show that our Wikipedia-based metric performs significantly better when applied to a standard evaluation dataset, with a reduction in error rate of 16.1% over the random metric baseline.
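A directional text similarity in the spirit described above can be sketched as follows: each word of the source text is matched to its most related word in the target text, weighted by specificity. The `word_relatedness` and `idf` callables stand in for the paper's Wikipedia-based resources and are assumptions here; the asymmetry of the formula is what makes the metric directional.

```python
# Minimal sketch of a directional (asymmetric) inferential similarity.
def directional_similarity(source, target, word_relatedness, idf):
    """sim(source -> target); not symmetric by construction."""
    num = sum(idf(w) * max((word_relatedness(w, t) for t in target),
                           default=0.0)
              for w in source)
    den = sum(idf(w) for w in source)
    return num / den if den else 0.0
```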
Proceedings of the 23rd …, Jan 1, 2010
This paper introduces several extractive approaches for automatic image tagging, relying exclusively on information mined from texts. Through evaluations on two datasets, we show that our methods exceed competitive baselines by a large margin, and compare favorably with the state-of-the-art that uses both textual and image features.
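One simple extractive baseline in this family ranks candidate words from the image's accompanying text by tf-idf and keeps the top k as tags. This is a stand-in for the paper's richer extractive methods, included only to illustrate the text-only setting.

```python
# Illustrative sketch: extractive image tagging from surrounding text.
from sklearn.feature_extraction.text import TfidfVectorizer

def extract_tags(doc, corpus, k=5):
    """Return the k highest tf-idf words of doc, scored against corpus."""
    vec = TfidfVectorizer(stop_words="english")
    tfidf = vec.fit_transform(corpus + [doc])
    row = tfidf[len(corpus)].toarray().ravel()
    vocab = vec.get_feature_names_out()
    return [vocab[i] for i in row.argsort()[::-1][:k]]

corpus = ["a photo of a sunset over the ocean", "city skyline at night"]
print(extract_tags("golden sunset above the calm ocean horizon", corpus))
```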
Proceedings of the Third Linguistic …, Jan 1, 2009
In this paper, we report our work on automatic image annotation by combining several textual features drawn from the text surrounding the image. Evaluation of our system is performed on a dataset of images and texts collected from the web. We report our findings through comparative evaluation with two gold standard collections of manual annotations on the same dataset.