As the amount of streaming audio and video available to Web users grows, tools for analyzing and indexing this content will become increasingly important. Web pages can link to streaming media in several ways. To analyze the Web-page content, we focus on detecting both direct links to streaming media and ...
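A minimal sketch of detecting direct links to streaming media in a page, using Python's standard HTML parser. The file extensions and tag names below are illustrative assumptions, not the paper's actual detection rules.

```python
from html.parser import HTMLParser

# Extensions and tags are illustrative assumptions, not the paper's rules.
MEDIA_EXTENSIONS = (".rm", ".ram", ".asf", ".asx", ".mp3", ".mov", ".avi")

class MediaLinkDetector(HTMLParser):
    """Collect href/src attributes that appear to point at streaming media."""
    def __init__(self):
        super().__init__()
        self.media_links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                if value.lower().endswith(MEDIA_EXTENSIONS) or tag == "embed":
                    self.media_links.append((tag, value))

detector = MediaLinkDetector()
detector.feed('<a href="talk.ram">talk</a> <embed src="clip.mov">')
print(detector.media_links)  # [('a', 'talk.ram'), ('embed', 'clip.mov')]
```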
In the field of pattern recognition there are many problems that are now familiar to us, in a sense part of a standard problem set. This set includes both supervised and unsupervised learning (e.g. clustering); in fact, these are the two major problems addressed by the field. These problems, and the general pattern recognition problem, have appeared in many domains. They have even appeared in the context of something that is reshaping society and commerce: the World Wide Web. It is our intent to discuss these two learning problems in that context, in the hope of drawing the attention of the pattern recognition community and of making that community aware of some of the more significant work that has been performed to date attacking these problems in the Web context.
For real-time pattern classification applications (e.g. real-time image segmentation), the number of usable pattern classification algorithms is limited by the feasibility of high-speed hardware implementation. This paper describes a pattern classifier and the associated hardware architecture and training algorithms. The classifier has both a feasible hardware implementation and other desirable properties not normally found in statistical classifiers. In addition to the classification/training algorithms and hardware architecture, the paper discusses the application of the technique to the problem of image segmentation. Results from segmenting images are included. The scheme described has two major aspects: (1) the classifier itself, which is a look-up table (LUT) implemented as a 2^n-tree, a hierarchical data structure that corresponds to a recursive decomposition of feature space; and (2) training schemes, specific to the 2^n-tree structure, by which the classification tree is constructed. These training schemes may be used as techniques for machine learning. Two of the training algorithms have the following important properties: they are non-parametric and therefore independent of any particular probability model (e.g. Gaussian); they can handle decision regions of any shape in feature space; and they are consistent, in the sense that for large training sets they produce a classifier that approaches the ideal Bayes classifier. These attributes make this architecture/algorithm combination an excellent alternative to artificial neural networks, a class of classifiers in which there has been much recent interest. The training algorithms also include an interesting application of the Minimum Description Length (MDL) principle: it is used in a tree-pruning algorithm that produces trees that are both significantly smaller and, at the same time, have better classification performance (i.e. lower error rates) than unpruned trees.
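A minimal sketch of a 2^n-tree classifier for n = 2 features (a quadtree over feature space). The splitting rule, stopping criteria, and majority-vote leaf labels here are illustrative assumptions, not the paper's actual training algorithms; the point is the recursive decomposition and the LUT-style descent at classification time.

```python
import numpy as np

class Node:
    def __init__(self):
        self.label = None      # set at leaves
        self.children = None   # 2^n children when split

def majority(labels):
    values, counts = np.unique(labels, return_counts=True)
    return values[np.argmax(counts)]

def build(X, y, lo, hi, depth, max_depth=6, min_samples=4):
    """Recursively split the cell [lo, hi) into 2^n equal subcells."""
    node = Node()
    if depth == max_depth or len(y) <= min_samples or len(set(y)) == 1:
        node.label = majority(y) if len(y) else 0
        return node
    mid = (lo + hi) / 2.0
    node.children = []
    n = X.shape[1]
    for code in range(2 ** n):  # one child per orthant of the cell
        bits = np.array([(code >> j) & 1 for j in range(n)])
        clo = np.where(bits, mid, lo)
        chi = np.where(bits, hi, mid)
        mask = np.all((X >= clo) & (X < chi), axis=1)
        node.children.append(build(X[mask], y[mask], clo, chi, depth + 1,
                                   max_depth, min_samples))
    return node

def classify(node, x, lo, hi):
    """Descend the tree like a LUT: pick the subcell containing x."""
    while node.children is not None:
        mid = (lo + hi) / 2.0
        bits = (x >= mid).astype(int)
        code = int(np.dot(bits, 2 ** np.arange(len(x))))
        lo = np.where(bits, mid, lo)
        hi = np.where(bits, hi, mid)
        node = node.children[code]
    return node.label

# Usage (features assumed scaled into [0, 1)):
# root = build(X, y, np.zeros(2), np.ones(2), 0)
# label = classify(root, x, np.zeros(2), np.ones(2))
```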
Pattern classification algorithms for real-time image segmentation
We develop algorithms for use with a recently developed VLSI architecture for pattern classification [1]. This architecture is based on the evaluation of class discriminant functions without cross-terms. We refer to classifiers that use such discriminant functions as C classifiers. ...
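A minimal sketch of a discriminant function with no cross-terms: each class score is a sum of per-feature terms only, so hardware can evaluate it with one small LUT per (class, feature) pair. Fitting a diagonal-covariance Gaussian per class is one standard way to obtain such discriminants; whether this matches the paper's C classifiers exactly is an assumption.

```python
import numpy as np

def fit(X, y):
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        mu, var = Xc.mean(axis=0), Xc.var(axis=0) + 1e-9
        prior = len(Xc) / len(X)
        params[c] = (mu, var, prior)
    return params

def discriminant(params, c, x):
    mu, var, prior = params[c]
    # log p(x|c) + log p(c) with diagonal covariance: no x_i * x_j terms.
    return -0.5 * np.sum((x - mu) ** 2 / var + np.log(var)) + np.log(prior)

def classify(params, x):
    return max(params, key=lambda c: discriminant(params, c, x))
```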
Segmentation engine: a real-time image segmentation subsystem
Proceedings of SPIE, Mar 11, 1994
This paper describes a system developed for segmenting multiband grayscale images into n-class labeled images at high-throughput rates. This system, which we refer to as the segmentation engine, performs supervised image segmentation using algorithms based on the statistical pattern recognition paradigm. So-called 'features' are computed for each pixel and the feature vector thus formed is presented to a statistical classifier, which uses feature information to determine the most probable class of the pixel. Algorithms are described for the following: features, automatic feature selection, classification and classifier training. While this paper describes the entire system, the algorithmic approach will be emphasized.
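A minimal sketch of the per-pixel paradigm the abstract describes: compute a feature vector for each pixel, then classify it. The features here (local mean and standard deviation in a small window) and the nearest-mean classifier are illustrative assumptions, not the system's actual algorithms.

```python
import numpy as np

def pixel_features(image, half=2):
    """Return an (H, W, 2) array of per-pixel features."""
    H, W = image.shape
    feats = np.zeros((H, W, 2))
    padded = np.pad(image.astype(float), half, mode="reflect")
    for i in range(H):
        for j in range(W):
            win = padded[i:i + 2 * half + 1, j:j + 2 * half + 1]
            feats[i, j] = (win.mean(), win.std())
    return feats

def segment(image, class_means):
    """Label each pixel with the class whose mean feature vector is nearest."""
    feats = pixel_features(image)
    # Distances from every pixel's feature vector to each class mean.
    d = np.linalg.norm(feats[:, :, None, :] - class_means[None, None], axis=-1)
    return np.argmin(d, axis=2)

# Supervised training here would simply estimate class_means from
# labeled example pixels.
```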
MDL estimation for small sample sizes and its application to segmenting binary strings
Minimum Description Length (MDL) estimation has proven itself of major importance in a large number of applications, many of which are in the fields of computer vision and pattern recognition. A problem is encountered in applying the associated formulas, however, especially those associated with model cost: most of them are asymptotic forms appropriate only for large sample sizes. J. Rissanen has recently derived sharper code-length formulas valid for much smaller sample sizes. Because of the importance of these results, it is our intent here to present a tutorial description of them. In keeping with this goal we have chosen a simple application whose relative tractability allows it to be explored more deeply than most problems: the segmentation of binary strings based on a piecewise Bernoulli assumption. By that we mean that the strings are assumed to be divided into substrings, the bits of which are assumed to have been generated by a single (within a substring) Bernoulli source.
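A minimal sketch of two-part MDL scoring for a piecewise-Bernoulli segmentation of a binary string. The asymptotic (1/2) log n model cost per parameter is used here for simplicity; the paper's point is precisely that Rissanen's sharper small-sample code lengths should replace it.

```python
import math

def data_code_length(bits):
    """Code length (in bits) of a substring under its ML Bernoulli parameter."""
    n, k = len(bits), sum(bits)
    if k == 0 or k == n:
        return 0.0
    p = k / n
    return -(k * math.log2(p) + (n - k) * math.log2(1 - p))

def description_length(bits, boundaries):
    """Total two-part code length for a given segmentation."""
    cuts = [0] + list(boundaries) + [len(bits)]
    total = 0.0
    for a, b in zip(cuts, cuts[1:]):
        seg = bits[a:b]
        total += data_code_length(seg)
        total += 0.5 * math.log2(len(seg))  # asymptotic parameter cost
    # Cost of encoding the boundary positions themselves.
    total += len(boundaries) * math.log2(len(bits))
    return total

bits = [0] * 20 + [1] * 20
# The segmented model wins: roughly 9.6 bits versus 42.7 unsegmented.
print(description_length(bits, [20]), description_length(bits, []))
```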
Uncertainty in Artificial Intelligence, Jun 30, 2000
We present an approach to model-based hierarchical clustering by formulating an objective function based on a Bayesian analysis. This model organizes the data into a cluster hierarchy while specifying a complex feature-set partitioning that is a key component of our model. Features can have either a unique distribution in every cluster or a common distribution over some (or even all) of the clusters. The cluster subsets over which these features have such a common distribution correspond to the nodes (clusters) of the tree representing the hierarchy. We apply this general model to the problem of document clustering, for which we use a multinomial likelihood function and Dirichlet priors. Our algorithm consists of a two-stage process wherein we first perform a flat clustering, followed by a modified hierarchical agglomerative merging process that includes determining the features that will have common distributions over the merged clusters. The regularization induced by using the marginal likelihood automatically determines the optimal model structure, including the number of clusters, the depth of the tree, and the subset of features to be modeled as having a common distribution at each node. We present experimental results on both synthetic data and a real document collection.
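A minimal sketch of the closed-form quantity such a multinomial/Dirichlet analysis yields: the Dirichlet-multinomial marginal likelihood of a cluster's word counts. The symmetric prior below is an assumption; the paper's exact prior and merge score may differ.

```python
from math import lgamma

def log_marginal_likelihood(counts, alpha=1.0):
    """log p(counts) with the multinomial parameter integrated out."""
    V = len(counts)            # vocabulary size
    N = sum(counts)            # total token count
    result = lgamma(V * alpha) - lgamma(V * alpha + N)
    for n_w in counts:
        result += lgamma(alpha + n_w) - lgamma(alpha)
    return result

# Merging two clusters under a common distribution is favored when
#   log p(c1 + c2) > log p(c1) + log p(c2).
c1, c2 = [10, 0, 2], [9, 1, 3]
merged = [a + b for a, b in zip(c1, c2)]
gain = log_marginal_likelihood(merged) - (
    log_marginal_likelihood(c1) + log_marginal_likelihood(c2))
print(gain)  # positive: these two count vectors look like one source
```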
Neural Information Processing Systems, Nov 29, 1999
We describe a Bayesian approach to model selection in unsupervised learning that determines both the feature set and the number of clusters. We then evaluate this scheme (based on marginal likelihood) and one based on cross-validated likelihood. For the Bayesian scheme we derive a closed-form solution of the marginal likelihood by assuming appropriate forms of the likelihood function and prior. Extensive experiments compare these approaches and all results are verified by comparison against ground truth. In these experiments the Bayesian scheme using our objective function gave better results than cross-validation.
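A minimal sketch of the cross-validated-likelihood baseline the abstract compares against: choose the number of clusters that maximizes held-out log-likelihood. A Gaussian mixture stands in for the paper's actual likelihood model, which is an assumption made for brevity.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(5, 1, (100, 2))])

def cv_log_likelihood(X, k, folds=5):
    scores = []
    for train, test in KFold(folds, shuffle=True, random_state=0).split(X):
        gm = GaussianMixture(n_components=k, random_state=0).fit(X[train])
        scores.append(gm.score(X[test]))  # mean held-out log-likelihood
    return np.mean(scores)

best_k = max(range(1, 6), key=lambda k: cv_log_likelihood(X, k))
print("selected number of clusters:", best_k)
```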
Quantitative methods of evaluating image segmentation
Two sets of measures are proposed in this paper for quantitatively evaluating segmentation results. The first set is designed for the situation where ground truth is available, while the second is for the situation where ground truth is not available. Based on a test bank of more than 50 ...
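A minimal sketch of one common ground-truth-based evaluation: per-class intersection-over-union computed from a confusion matrix. This measure is illustrative and is not necessarily one of the paper's proposed ones.

```python
import numpy as np

def confusion_matrix(gt, pred, n_classes):
    m = np.zeros((n_classes, n_classes), dtype=int)
    for g, p in zip(gt.ravel(), pred.ravel()):
        m[g, p] += 1
    return m

def per_class_iou(gt, pred, n_classes):
    m = confusion_matrix(gt, pred, n_classes)
    ious = []
    for c in range(n_classes):
        inter = m[c, c]
        union = m[c, :].sum() + m[:, c].sum() - inter
        ious.append(inter / union if union else float("nan"))
    return ious

gt = np.array([[0, 0, 1], [0, 1, 1]])
pred = np.array([[0, 1, 1], [0, 1, 1]])
print(per_class_iou(gt, pred, 2))  # [0.666..., 0.75]
```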
Thermographic Detection of Polymer/Metal Adhesion Failures
Springer eBooks, 1983
Thermography is based on the remote mapping of surface temperature distributions. When heat is appropriately applied to a sample, subsurface flaws can become projected onto the surface temperature profile due to differences between their thermal transfer properties and those of the bulk. Although not widely exploited in this area in the past, thermography can be an effective nondestructive means of monitoring polymer/metal bond continuity. This work examines the nature of thermographic detection techniques as they relate specifically to polymer/metal adhesion studies. The physical phenomena involved are reviewed, and a basic heat transfer model is presented as a prototype for quantitative analysis of thermographic data. Fundamentals of instrumentation and experimental techniques are outlined, and one specific experimental system is detailed. The application of thermography to polymer/metal adhesion studies is demonstrated by specific examples, and much of the previous pertinent work reported in the literature is referenced.
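A minimal sketch of the kind of one-dimensional heat-transfer model such an analysis uses as a prototype: flash heating of a slab's front surface, followed by diffusion, simulated with explicit finite differences. A low-diffusivity layer stands in for a polymer/metal disbond; all material values and geometry are illustrative assumptions.

```python
import numpy as np

def surface_history(flawed, nx=100, dx=1e-4, dt=1e-3, steps=3000):
    alpha = np.full(nx, 1e-7)          # thermal diffusivity (m^2/s)
    if flawed:
        alpha[30:35] = 1e-8            # subsurface flaw: poor conductor
    T = np.zeros(nx)
    T[0] = 1.0                         # instantaneous flash at the surface
    history = []
    for _ in range(steps):
        lap = np.zeros(nx)
        lap[1:-1] = (T[2:] - 2 * T[1:-1] + T[:-2]) / dx**2
        lap[0] = 2 * (T[1] - T[0]) / dx**2   # insulated front (mirror node)
        T = T + dt * alpha * lap             # back surface stays at 0
        history.append(T[0])
    return np.array(history)

# The flaw retards heat flow into the bulk, so the flawed surface stays
# warmer; thermography maps this contrast across the sample surface.
contrast = surface_history(True) - surface_history(False)
print(contrast.max())
```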
Verifying The Accuracy Of Machine Vision Algorithms And Systems
The purpose of this paper is threefold: (a) to summarize important parameters and procedures for verifying the measurement and recognition (classification) accuracy of machine vision algorithms/systems; (b) to alert the machine vision research community to the current, very inadequate practice in this important area; and (c) to propose some measures to improve this situation. Two example applications from our practice are given in order to illustrate the experimental verification procedures we present. The motivation for the paper is based on the fact that machine vision systems are very hard to model or simulate accurately, so realistic large-scale experiments seem to be the only reliable means of assessing their accuracy. However, in the machine vision (research) community this verification is seldom done adequately. We feel that until this situation is improved, the transfer of research ideas into practice will be difficult.
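A minimal sketch of the statistical core of such verification: an observed error rate on a finite test set is only an estimate, so it should be reported with a confidence interval. The normal-approximation interval below is one standard choice; the paper's recommended procedures may differ.

```python
import math

def error_rate_ci(errors, n, z=1.96):
    """Approximate 95% confidence interval for the true error rate."""
    p = errors / n
    half = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half), min(1.0, p + half)

# 12 errors on 400 test images: the interval is wide enough to matter.
print(error_rate_ci(12, 400))   # roughly (0.013, 0.047)

# Sample size needed so the half-width is at most `eps` in the worst case
# (p = 0.5): n >= (z / (2 * eps))^2.
eps = 0.01
print(math.ceil((1.96 / (2 * eps)) ** 2))  # 9604 samples
```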
A major challenge in indexing unstructured hypertext databases is to automatically extract meta-data that enables structured search using topic taxonomies, circumvents keyword ambiguity, and improves the quality of search and profile-based routing and filtering. Therefore, an accurate classifier is an essential component of a hypertext database. Hyperlinks pose new problems not addressed in the extensive text classification literature. Links clearly contain high-quality semantic clues that are lost upon a purely term-based classifier, but exploiting link information is non-trivial because it is noisy. Naive use of terms in the link neighborhood of a document can even degrade accuracy. Our contribution is to propose robust statistical models and a relaxation labeling technique for better classification by exploiting link information in a small neighborhood around documents. Our technique also adapts gracefully to the fraction of neighboring documents having known topics. We experimented with pre-classified samples from Yahoo! and the US Patent Database. In previous work, we developed a text classifier that misclassified only 13% of the documents in the well-known Reuters benchmark; this was comparable to the best results ever obtained. This classifier misclassified 36% of the patents, indicating that classifying hypertext can be more difficult than classifying text. Naively using terms in neighboring documents increased error to 38%; our hypertext classifier reduced it to 21%. Results with the Yahoo! sample were more dramatic: the text classifier showed 68% error, whereas our hypertext classifier reduced this to only 21%.
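A minimal sketch of relaxation labeling over a hyperlink graph: each document's class distribution is iteratively updated by combining its own text-based class probabilities with the current estimates of its link neighbors. The simple geometric combination used here is an assumption, not the paper's exact statistical model.

```python
import numpy as np

def relaxation_labeling(text_probs, neighbors, iters=10, w=0.5):
    """text_probs: (n_docs, n_classes) classifier outputs;
    neighbors: list of index lists (linked documents)."""
    probs = text_probs.copy()
    for _ in range(iters):
        new = np.empty_like(probs)
        for i, nbrs in enumerate(neighbors):
            if nbrs:
                neighbor_term = probs[nbrs].mean(axis=0)
                new[i] = text_probs[i] ** (1 - w) * neighbor_term ** w
            else:
                new[i] = text_probs[i]
            new[i] /= new[i].sum()
        probs = new
    return probs

# Doc 2's ambiguous text evidence gets disambiguated by its neighbors.
text_probs = np.array([[0.9, 0.1], [0.8, 0.2], [0.5, 0.5]])
neighbors = [[1], [0, 2], [0, 1]]
print(relaxation_labeling(text_probs, neighbors))
```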
Design and implementation of a low-level image segmentation architecture — LISA
Journal of Machine Vision and Applications, Sep 1, 1993
The main focus of this paper is on the architectural and implementation issues of a prototype of a low-level image segmentation architecture (LISA). LISA performs real-time (20 Mpixels/sec) gray-level image segmentation, i.e., assignment of image pixels to a few user-selected classes. A decision-theoretic pattern-recognition approach is used, which is divided into a feature extraction part and a decision analysis part. The feature extraction part is based on extracting local and global descriptions for all of the image pixels. In the decision analysis part we designed a novel no-cross-term classifier, which significantly reduced the hardware complexity. The LISA prototype has been built with custom and off-the-shelf VLSI chips. Some measured results will also be reported.
The P300: An Approach to Automated Inspection of Patterned Wafers
Proceedings of SPIE, Jul 19, 1989
Authors: Virginia H. Brecher, Raymond Bonner, and Byron E. Dom. Affiliations: IBM Thomas J. Watson Research Ctr.; IBM Almaden Research Ctr. Publication: Proc. SPIE ...
Calibration, setup, and performance evaluation in an IC inspection system
Proceedings of SPIE, Aug 1, 1992
Many papers on automatic inspection systems ignore the issues of calibration, setup and performance evaluation, assuming (apparently) that they merely involve 'straightforward engineering.' In reality, developing effective and robust procedures and algorithms to implement these features can be a demanding process. In fact, unbeknownst to the developers or users, the performance of many inspection systems could be significantly improved through better setup and calibration routines. In this tutorial paper we discuss both theoretical and practical issues. We start by reviewing the statistical framework underlying performance evaluation. Next we examine possible sources of inspection performance degradation. Last we describe calibration, setup and performance evaluation procedures and associated image analysis algorithms for an automated IC inspection system. While these procedures are specific to a particular system, we attempt to generalize them wherever possible.
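A minimal sketch of one routine calibration step of the kind such procedures involve: estimating a sensor's gain and offset by least squares from measurements of reference targets with known values, then inverting the model to correct raw readings. The linear sensor model and all numbers are illustrative assumptions.

```python
import numpy as np

known = np.array([0.0, 0.25, 0.5, 0.75, 1.0])           # reference reflectances
measured = np.array([12.0, 61.5, 110.8, 161.2, 210.3])  # raw sensor output

# Fit measured = gain * known + offset by least squares.
A = np.vstack([known, np.ones_like(known)]).T
(gain, offset), *_ = np.linalg.lstsq(A, measured, rcond=None)

def correct(raw):
    """Map raw sensor readings back to calibrated physical units."""
    return (raw - offset) / gain

print(gain, offset)       # roughly 198.5 and 11.9
print(correct(measured))  # approximately the known reference values
```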