Papers by Michael Witbrock

Proceedings of the 31st ACM International Conference on Multimedia
Many software packages and toolkits have been developed for machine learning, in particular for natural language processing and automatic speech recognition. However, there are few software packages designed for emotion recognition. Emotion datasets have diverse structures and annotations, and feature extractors often have different interfaces, which requires writing code specific to each interface. To improve the standardisation and reproducibility of emotion recognition research, we present the Emotion Recognition ToolKit (ERTK), a Python library for emotion recognition. ERTK comprises processing scripts for emotion datasets, standard interfaces to feature extractors, and a framework for defining experiments with declarative configuration files. ERTK is modular and extensible, which allows additional models and processors to be incorporated easily. The current version of ERTK focuses on emotional speech; however, the library can be easily extended to other modalities, which we plan for future releases. ERTK is open-source and available from GitHub: https://github.com/Strong-AI-Lab/emotion.

The Florida AI Research Society, 2007
Symbolic reasoning is a well-understood and effective approach to handling reasoning over formally represented knowledge; however, simple symbolic inference systems necessarily slow as complexity and the number of ground facts grow. As automated approaches to ontology-building become more prevalent and sophisticated, knowledge base systems become larger and more complex, necessitating techniques for faster inference. This work uses reinforcement learning, a statistical machine learning technique, to learn control laws which guide inference. We implement our learning method in ResearchCyc, a very large knowledge base with millions of assertions. A large set of test queries, some of which require tens of thousands of inference steps to answer, can be answered faster after training over an independent set of training queries. Furthermore, this learned inference module outperforms ResearchCyc's integrated inference module, a module that has been hand-tuned with considerable effort.
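As a rough illustration of the control-learning idea described above, the sketch below applies tabular Q-learning to a toy inference-control problem. The state encoding, action names, and reward signal are invented for illustration and do not reproduce the paper's actual ResearchCyc formulation.

```python
# Illustrative sketch only: tabular Q-learning over a toy "inference control"
# problem. The state features, action set, and reward are hypothetical and do
# not reproduce the paper's ResearchCyc setup.
import random
from collections import defaultdict

ACTIONS = ["expand_cheap_rule", "expand_specific_rule", "backtrack"]

def choose(q, state, epsilon=0.1):
    """Epsilon-greedy action selection over the learned Q-values."""
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[(state, a)])

def train(episodes=1000, alpha=0.2, gamma=0.95):
    q = defaultdict(float)
    for _ in range(episodes):
        state = 0                        # e.g. current depth of the proof search
        for _ in range(20):              # bounded episode length
            action = choose(q, state)
            # Toy environment: specific rules pay off deeper in the search,
            # cheap rules pay off near the root.
            success = (action == "expand_specific_rule" and state > 2) or \
                      (action == "expand_cheap_rule" and state <= 2)
            reward = 1.0 if success else -0.1
            next_state = min(state + 1, 10)
            best_next = max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q

if __name__ == "__main__":
    q_table = train()
    print(choose(q_table, state=5, epsilon=0.0))   # greedy choice after training
```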
Improving the suitability of imperfect transcriptions for information retrieval from spoken documents
1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258)

arXiv (Cornell University), Mar 14, 2016
Very large commonsense knowledge bases (KBs) often have thousands to millions of axioms, of which relatively few are relevant for answering any given query. A large number of irrelevant axioms can easily overwhelm resolution-based theorem provers. Therefore, methods that help the reasoner identify useful inference paths form an essential part of large-scale reasoning systems. In this paper, we describe two ordering heuristics for optimization of reasoning in such systems. First, we discuss how decision trees can be used to select inference steps that are more likely to succeed. Second, we identify a small set of problem instance features that suffice to guide searches away from intractable regions of the search space. We show the efficacy of these techniques via experiments on thousands of queries from the Cyc KB. Results show that these methods lead to an order of magnitude reduction in inference time.
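The following sketch shows how a decision tree could, in principle, rank candidate inference steps by predicted probability of success. The features, training data, and ranking policy are hypothetical; the paper's actual feature set and its integration with the Cyc inference engine are not reproduced here.

```python
# Illustrative sketch only: a decision tree that predicts whether a candidate
# inference step will succeed, trained on hand-invented features and labels.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Each row describes one candidate step (hypothetical features):
# [rule_specificity, num_unbound_vars, axiom_arity, depth_in_proof]
X = np.array([
    [0.9, 1, 2, 3],
    [0.2, 4, 3, 1],
    [0.7, 2, 2, 5],
    [0.1, 5, 4, 2],
    [0.8, 1, 3, 4],
    [0.3, 3, 2, 1],
])
y = np.array([1, 0, 1, 0, 1, 0])   # 1 = step led to a proof, 0 = dead end

clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Rank new candidate steps by predicted success probability and expand the
# most promising one first.
candidates = np.array([[0.6, 2, 2, 2], [0.2, 5, 3, 2]])
scores = clf.predict_proba(candidates)[:, 1]
best = int(np.argmax(scores))
print(f"expand candidate {best} (p_success={scores[best]:.2f})")
```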

Ai Magazine, Dec 15, 2004
Vulcan Inc.'s Project Halo is a multi-staged effort to create a Digital Aristotle, an application that will encompass much of the world's scientific knowledge and be capable of applying sophisticated problem-solving to answer novel questions. Vulcan envisions two primary roles for the Digital Aristotle: as a tutor to instruct students in the sciences, and as an interdisciplinary research assistant to help scientists in their work. As a first step towards this goal, we have just completed a six-month pilot phase, designed to assess the state of the art in applied Knowledge Representation and Reasoning (KR&R). Vulcan selected three teams, each of which was to formally represent 70 pages from the Advanced Placement (AP) chemistry syllabus and deliver knowledge-based systems capable of answering questions on that syllabus. The evaluation quantified each system's coverage of the syllabus in terms of its ability to answer previously unseen questions and to provide human-readable answer justifications. These justifications will play a critical role in building user trust in the question-answering capabilities of the Digital Aristotle. Prior to the final evaluation, a "failure taxonomy" was collaboratively developed in an attempt to standardize failure analysis and to facilitate cross-platform comparisons. Despite differences in approach, all three systems did very well on the challenge, achieving performance comparable to the human median. The analysis also provided key insights into how the approaches might be scaled, while at the same time suggesting how the cost of producing such systems might be reduced. This outcome leaves us highly optimistic that the technical challenges facing this effort in the years to come can be identified and overcome. This paper presents the motivation and long-term goals of Project Halo, describes in detail the six-month pilot phase of the project, its KR&R challenge, empirical evaluation, results and failure analysis. The pilot's outcome is used to define challenges for the next phase of the project and beyond. Full support for this research was provided by Vulcan Inc. as part of Project Halo; for more information, visit www.projecthalo.com.
Converting Semantic Meta-knowledge into Inductive Bias
Lecture Notes in Computer Science, 2005

Traditionally, indexing and searching of speech content in multimedia databases have been achieved through a combination of separately constructed speech recognition and information retrieval engines. Although each technology has a legacy of research, only recently have efforts been made to study the potential suboptimality of this strategy, and none of these efforts specifically addresses the presence of uncertainty in automatically generated transcriptions. This research develops a refinement of the most common information retrieval relevance formula, TFIDF, to incorporate uncertainty as a retrieval feature, along with a set of techniques to acquire this uncertainty from multiple hypotheses produced by existing speech recognition data structures. In the process a greater amount of evidence is extracted than is available in the most likely transcription hypothesis, and overall retrieval precision and recall are improved. The term weighting scheme known as the inverse document frequency is shown to be a special case of the mutual information between the document set and the term, the former requiring a Boolean characterization of term occurrence information and the latter permitting fractional probabilities. The relevance between a query and document from speech recognition is then modelled as a random variable arising from the statistical nature of the speech recognition system. The statistics of this model are then derived from the word lattices and the N-Best lists from the output of the recognizer. In analyzing the word lattices, the path probabilities for each node are summed. The relative rankings of competing terms of these summed probabilities are shown to be indicative of the probability of term occurrence. A model of this relationship is used to predict term presence and term count, reducing the degradation in retrieval quality due to speech recognition by 24%. In a separate model, the Top-N distinct text-processed hypotheses from the word lattices are used to estimate the term probability and term count. This strategy reduces the degradation in retrieval quality due to speech recognition by 63%. Experiments were performed on a standardized test of broadcast news stories that had been transcribed manually and judged against a set of natural language queries.
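As a hedged illustration of how fractional term statistics can enter a TFIDF-style score, the sketch below derives expected term counts from an N-best list and uses them in a simple relevance formula. The probability normalisation and the particular IDF form are simplifications chosen for brevity, not the thesis's exact model.

```python
# Illustrative sketch only: estimating "soft" term counts from an N-best list
# and using them in a TF-IDF-style relevance score.
import math
from collections import Counter, defaultdict

def expected_term_counts(nbest):
    """nbest: list of (hypothesis_words, probability) pairs for one document.
    Returns the expected count of each term under the hypothesis distribution."""
    total_p = sum(p for _, p in nbest)
    counts = defaultdict(float)
    for words, p in nbest:
        for term, c in Counter(words).items():
            counts[term] += (p / total_p) * c
    return counts

def tfidf_score(query_terms, doc_counts, doc_freq, num_docs):
    """Sum of TF * IDF over query terms, with fractional (expected) TFs.
    The smoothed IDF form here is an arbitrary illustrative choice."""
    score = 0.0
    for t in query_terms:
        tf = doc_counts.get(t, 0.0)
        idf = math.log((num_docs + 1) / (doc_freq.get(t, 0) + 1))
        score += tf * idf
    return score

# Two competing recognizer hypotheses for one spoken document.
nbest = [(["president", "visits", "paris"], 0.6),
         (["president", "visits", "parish"], 0.4)]
counts = expected_term_counts(nbest)
print(counts["paris"])   # 0.6 expected occurrences rather than a hard 0/1
print(tfidf_score(["paris"], counts, doc_freq={"paris": 3}, num_docs=100))
```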

This paper describes the TextLearner prototype, a knowledge-acquisition program that represents the culmination of the DARPA-IPTO-sponsored Reading Learning Comprehension seedling program, an effort to determine the feasibility of autonomous knowledge acquisition through the analysis of text. Built atop the Cyc Knowledge Base and implemented almost entirely in the formal representation language of CycL, TextLearner is an anomaly among Natural Language Understanding programs. The system operates by generating an information-rich model of its target document, and uses that model to explore learning opportunities. In particular, TextLearner generates and evaluates hypotheses, not only about the content of the target document, but about how to interpret unfamiliar natural language constructions. This paper focuses on this second capability and describes four algorithms TextLearner uses to acquire rules for interpreting text.
An interactive dialogue system for knowledge acquisition in Cyc

In theory, speech recognition technology can make any spoken words in video or audio media usable for text indexing, search and retrieval. This article describes the News-on-Demand application created within the Informedia™ Digital Video Library project and discusses how speech recognition is used in transcript creation from video, alignment with closed-captioned transcripts, audio paragraph segmentation and a spoken query interface. Speech recognition accuracy varies dramatically depending on the quality and type of data used. Informal information retrieval tests show that reasonable recall and precision can be obtained with only moderate speech recognition accuracy. This material is based upon work supported by the National Science Foundation under Cooperative Agreement No. IRI-9411299. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
Proceedings of the 20th international conference on Computational Linguistics - COLING '04, 2004
We present an automatic approach to learning criteria for classifying the parts-of-speech used in lexical mappings. This will further automate our knowledge acquisition system for non-technical users. The criteria for the speech parts are based on the types of the denoted terms along with morphological and corpus-based clues. Associations among these and the parts-of-speech are learned using the lexical mappings contained in the Cyc knowledge base as training data. With over 30 speech parts to choose from, the classifier achieves good results (77.8% correct). Accurate results (93.0%) are achieved in the special case of the mass-count distinction for nouns. Comparable results are also obtained using OpenCyc (73.1% general and 88.4% mass-count).
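A minimal sketch of the mass-count special case is shown below, assuming a handful of invented morphological and type-based clues; the real system is trained on Cyc's existing lexical mappings and distinguishes over 30 speech parts.

```python
# Illustrative sketch only: classifying whether a noun in a lexical mapping is
# a mass or count noun from simple morphological and type-based clues. The
# features and training examples are invented for this example.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def features(word, denotes_substance):
    return {
        "ends_in_s": word.endswith("s"),
        "ends_in_ness": word.endswith("ness"),
        "denotes_substance": denotes_substance,   # clue from the denoted term's type
    }

train = [
    (features("water", True), "mass"),
    (features("sand", True), "mass"),
    (features("happiness", False), "mass"),
    (features("chairs", False), "count"),
    (features("dog", False), "count"),
    (features("ideas", False), "count"),
]
X, y = zip(*train)

clf = make_pipeline(DictVectorizer(), LogisticRegression()).fit(list(X), list(y))
print(clf.predict([features("gravel", True)])[0])   # likely predicts "mass"
```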

The Florida AI Research Society, 2007
Ontologies are an increasingly important tool in knowledge representation, as they allow large amounts of data to be related in a logical fashion. Current research is concentrated on automatically constructing ontologies, merging ontologies with different structures, and optimal mechanisms for ontology building; in this work we consider the related, but distinct, problem of how to automatically determine where to place new knowledge into an existing ontology. Rather than relying on human knowledge engineers to carefully classify knowledge, it is becoming increasingly important for machine learning techniques to automate such a task. Automation is particularly important as the rate of ontology building via automatic knowledge acquisition techniques increases. This paper compares three well-established machine learning techniques and shows that they can be applied successfully to this knowledge placement task. Our methods are fully implemented and tested in the Cyc knowledge base system.
The Cyc project is predicated on the idea that, in order to be effective and flexible, computer software must have an understanding of the context in which its tasks are performed. We believe this context is what is known informally as “common sense.” Over the last twenty years, sufficient common sense knowledge has been entered into Cyc to allow it to more effectively and flexibly support an important task: increasing its own store of world knowledge. In this paper, we describe the Cyc knowledge base and inference system, enumerate the means that it provides for knowledge elicitation, including some means suitable for use by untrained or lightly trained volunteers, review some ways in which we expect to have Cyc assist in verifying and validating collected knowledge, and describe how we expect the knowledge acquisition process to accelerate in the future.

Journal of Artificial Intelligence and Consciousness, 2021
Processes occurring in brains, a.k.a. biological neural networks, can and have been modeled within artificial neural network architectures. Due to this, we have conducted a review of research on the phenomenon of blindsight in an attempt to generate ideas for artificial intelligence models. Blindsight can be considered as a diminished form of visual experience. If we assume that artificial networks have no form of visual experience, then deficits caused by blindsight give us insights into the processes occurring within visual experience that we can incorporate into artificial neural networks. This paper has been structured into three parts. Section 2 is a review of blindsight research, looking specifically at the errors occurring during this condition compared to normal vision. Section 3 identifies overall patterns from Sec. 2 to generate insights for computational models of vision. Section 4 demonstrates the utility of examining biological research to inform artificial intelligence...

2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018
Advances in image super-resolution (SR) have recently benefited significantly from rapid developments in deep neural networks. Inspired by these recent discoveries, we note that many state-of-the-art deep SR architectures can be reformulated as a single-state recurrent neural network (RNN) with finite unfoldings. In this paper, we explore new structures for SR based on this compact RNN view, leading us to a dual-state design, the Dual-State Recurrent Network (DSRN). Compared to its single-state counterparts that operate at a fixed spatial resolution, DSRN exploits both low-resolution (LR) and high-resolution (HR) signals jointly. Recurrent signals are exchanged between these states in both directions (both LR to HR and HR to LR) via delayed feedback. Extensive quantitative and qualitative evaluations on benchmark datasets and on a recent challenge demonstrate that the proposed DSRN performs favorably against state-of-the-art algorithms in terms of both memory consumption and predictive accuracy.
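The following is a much-simplified sketch of the dual-state idea: two recurrent states at different resolutions exchanging signals in both directions across unfoldings. Layer widths, the number of unfoldings, and the up/down operators are arbitrary choices for illustration and do not reproduce the published DSRN architecture.

```python
# Illustrative sketch only: a toy dual-state recurrent cell in the spirit of
# DSRN, not the paper's actual architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualStateCell(nn.Module):
    def __init__(self, channels=32, scale=2):
        super().__init__()
        self.lr_conv = nn.Conv2d(channels, channels, 3, padding=1)
        self.hr_conv = nn.Conv2d(channels, channels, 3, padding=1)
        # LR -> HR: learned upsampling; HR -> LR: strided "delayed feedback".
        self.up = nn.ConvTranspose2d(channels, channels, scale, stride=scale)
        self.down = nn.Conv2d(channels, channels, scale, stride=scale)

    def forward(self, lr_state, hr_state):
        new_lr = F.relu(self.lr_conv(lr_state) + self.down(hr_state))
        new_hr = F.relu(self.hr_conv(hr_state) + self.up(lr_state))
        return new_lr, new_hr

class TinyDSRN(nn.Module):
    def __init__(self, steps=4, channels=32, scale=2):
        super().__init__()
        self.embed = nn.Conv2d(3, channels, 3, padding=1)
        self.cell = DualStateCell(channels, scale)   # shared across unfoldings
        self.out = nn.Conv2d(channels, 3, 3, padding=1)
        self.steps, self.scale = steps, scale

    def forward(self, lr_image):
        lr = F.relu(self.embed(lr_image))
        hr = F.interpolate(lr, scale_factor=self.scale, mode="bilinear",
                           align_corners=False)
        for _ in range(self.steps):
            lr, hr = self.cell(lr, hr)
        return self.out(hr)

if __name__ == "__main__":
    x = torch.randn(1, 3, 24, 24)      # a small low-resolution input
    print(TinyDSRN()(x).shape)          # torch.Size([1, 3, 48, 48])
```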

Proceedings IEEE International Forum on Research and Technology Advances in Digital Libraries -ADL'98-
The Informedia Digital Library Project [Wactlar96] allows full content indexing and retrieval of text, audio and video material. Segmentation is an integral process in the Informedia digital video library. The success of the Informedia project hinges on two critical assumptions: that we can extract sufficiently accurate speech recognition transcripts from the broadcast audio and that we can segment the broadcast into video paragraphs, or stories, that are useful for information retrieval. In previous papers [Hauptmann97, Witbrock97, Witbrock98], we have shown that speech recognition is sufficient for information retrieval of pre-segmented video news stories. In this paper we address the issue of segmentation and demonstrate that a fully automatic system can extract story boundaries using available audio, video and closed-captioning cues. The story segmentation step for the Informedia Digital Video Library splits full-length news broadcasts into individual news stories. During this phase the system also labels commercials as separate "stories". We explain how the Informedia system takes advantage of the closed captioning frequently broadcast with the news, how it extracts timing information by aligning the closed-captions with the result of the speech recognition, and how the system integrates closed-caption cues with the results of image and audio processing.
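As a rough sketch of the caption-to-transcript timing idea, the code below aligns closed-caption words with time-stamped recognizer output so that agreed-upon caption words inherit timestamps. It uses a generic word-level alignment and is not the Informedia system's actual implementation.

```python
# Illustrative sketch only: aligning closed-caption words with time-stamped
# recognizer output so the captions inherit timing information.
from difflib import SequenceMatcher

def align_captions(caption_words, asr_words, asr_times):
    """Return (caption_word, time) pairs for words the two streams agree on."""
    matcher = SequenceMatcher(a=caption_words, b=asr_words, autojunk=False)
    aligned = []
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            aligned.append((caption_words[block.a + k], asr_times[block.b + k]))
    return aligned

captions = "good evening the president visited paris today".split()
asr      = "good evening the present visited paris today".split()   # one ASR error
times    = [0.0, 0.4, 0.9, 1.1, 1.6, 2.2, 2.7]

for word, t in align_captions(captions, asr, times):
    print(f"{t:4.1f}s  {word}")
```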
[Proceedings] ICASSP-92: 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing, 1992
We present SVCnet, a system for modelling speaker variability. Encoder Neural Networks specialized for each speech sound produce low-dimensionality models of acoustical variation, and these models are further combined into an overall model of voice variability. A training procedure is described which minimizes the dependence of this model on which sounds have been uttered. Using the trained model (SVCnet) and a brief, unconstrained sample of a new speaker's voice, the system produces a Speaker Voice Code that can be used to adapt a recognition system to the new speaker without retraining. A system which combines SVCnet with an MS-TDNN recognizer is described.
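A loose, hedged sketch of the speaker-code idea is given below: per-frame acoustic embeddings are pooled into a single fixed-length vector so the code depends as little as possible on which sounds were uttered. The per-sound encoder ensemble and training procedure of SVCnet are not reproduced; feature dimensions and network sizes are arbitrary.

```python
# Illustrative sketch only: pooling per-frame acoustic embeddings into a single
# fixed-length "speaker code", loosely in the spirit of SVCnet.
import torch
import torch.nn as nn

class SpeakerCodeEncoder(nn.Module):
    def __init__(self, n_mels=40, code_dim=16):
        super().__init__()
        self.frame_net = nn.Sequential(
            nn.Linear(n_mels, 64), nn.ReLU(), nn.Linear(64, code_dim))

    def forward(self, frames):
        # frames: (num_frames, n_mels) from a brief, unconstrained utterance.
        codes = self.frame_net(frames)
        # Averaging over frames reduces dependence on *which* sounds were said.
        return codes.mean(dim=0)

utterance = torch.randn(200, 40)            # ~2 seconds of mel-spectrogram frames
speaker_code = SpeakerCodeEncoder()(utterance)
print(speaker_code.shape)                    # torch.Size([16])
```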
Cycorp Project Halo Final Report