Abstract. This paper presents a new class of moves, called α-expansion-contraction, which generalizes α-expansion graph cuts for multi-label energy minimization problems. The new moves are particularly useful for optimizing the assignments in model-fitting frameworks whose energies include Label Cost (LC) as well as Markov Random Field (MRF) terms. These problems benefit from the contraction moves' greater scope for removing instances from the model, which reduces label costs. We demonstrate this effect on the problem of fitting sets of geometric primitives to point cloud data, including real-world point clouds containing millions of points obtained by multi-view reconstruction.
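To make the shape of such an energy concrete, here is a minimal sketch that only evaluates an energy made of a data term, a Potts-style MRF smoothness term and per-instance label costs. It does not implement α-expansion or the α-expansion-contraction move; the Potts penalty, the function name `energy` and the toy costs are assumptions for illustration.

```python
"""Minimal sketch: data + Potts MRF + label-cost energy (hypothetical toy data)."""


def energy(labels, data_cost, edges, smooth_weight, label_cost):
    """Evaluate E(f) = sum_p D_p(f_p) + w * #{(p, q) in edges: f_p != f_q} + sum_{l used} h_l.

    labels        -- labels[p] is the model instance assigned to point p
    data_cost     -- data_cost[p][l] is the cost of explaining point p by instance l
    edges         -- neighbouring point pairs (p, q)
    smooth_weight -- Potts penalty w paid when neighbours take different labels
    label_cost    -- label_cost[l] is h_l, paid once if any point uses instance l
    """
    unary = sum(data_cost[p][l] for p, l in enumerate(labels))
    pairwise = sum(smooth_weight for p, q in edges if labels[p] != labels[q])
    used = set(labels)  # each instance in use pays its label cost exactly once
    return unary + pairwise + sum(label_cost[l] for l in used)


if __name__ == "__main__":
    data_cost = [[0, 5], [0, 5], [1, 0], [1, 0]]  # 4 points, 2 candidate instances
    edges = [(0, 1), (1, 2), (2, 3)]              # a simple chain neighbourhood
    label_cost = [3, 3]
    two_instances = energy([0, 0, 1, 1], data_cost, edges, 1, label_cost)
    one_instance = energy([0, 0, 0, 0], data_cost, edges, 1, label_cost)
    print(two_instances, one_instance)  # 7 5
```

In the toy example, dropping instance 1 raises the data cost by 2 but saves its label cost of 3 and one boundary penalty, so the energy falls from 7 to 5. This is the kind of instance removal that, according to the abstract, contraction moves give more scope for.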
In this paper we introduce a new distance for robustly matching vectors of 3D rotations. A special representation of 3D rotations, which we coin the full-angle quaternion (FAQ), allows us to express this distance as Euclidean. We apply the distance to the problems of 3D shape recognition from point clouds and 2D object tracking in color video. For the former, we introduce a hashing scheme for scale and translation that outperforms the previous state-of-the-art approach on a public dataset. For the latter, we incorporate online subspace learning with the proposed FAQ representation to highlight the benefits of the new representation.
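The abstract does not spell out the FAQ mapping. The sketch below assumes it is the unit quaternion built with the full rotation angle θ where the usual quaternion uses θ/2, i.e. (cos θ, u sin θ) for a rotation by θ about the unit axis u, and it measures the distance between vectors of rotations as the plain Euclidean distance between the stacked FAQs. The function names are hypothetical.

```python
import math


def faq(axis, angle):
    """Full-angle quaternion (assumed form): (cos angle, sin angle * u), u = unit axis."""
    norm = math.sqrt(sum(a * a for a in axis))
    ux, uy, uz = (a / norm for a in axis)
    s = math.sin(angle)
    return (math.cos(angle), s * ux, s * uy, s * uz)


def faq_distance(q1, q2):
    """Plain Euclidean distance between two FAQ 4-vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(q1, q2)))


def rotation_vector_distance(rots1, rots2):
    """Distance between two equal-length vectors of (axis, angle) rotations:
    the Euclidean norm of the difference of the stacked FAQs."""
    return math.sqrt(sum(faq_distance(faq(*r1), faq(*r2)) ** 2
                         for r1, r2 in zip(rots1, rots2)))


if __name__ == "__main__":
    theta = 0.8
    d = faq_distance(faq((0, 0, 1), theta), faq((0, 0, 1), 0.0))
    print(d ** 2, 2 - 2 * math.cos(theta))  # equal up to rounding
```

Under this assumed mapping, the squared distance of a rotation by θ from the identity is 2 - 2 cos θ, and θ and θ + 2π give the same 4-vector, so there is no q/-q sign choice to make before comparing. Note also that every rotation by π lands on (-1, 0, 0, 0).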
Abstract. This chapter presents a method for vote-based 3D shape recognition and registration, in particular using mean shift on 3D pose votes in the space of direct similarity transformations for the first time. We introduce a new distance between poses in this space: the SRT distance. It is left-invariant, unlike Euclidean distance, and has a unique, closed-form mean, in contrast to Riemannian distance, so it is fast to compute. We demonstrate improved performance over the state of the art in both recognition and registration on a real and challenging dataset, by comparing our distance with others in a mean shift framework, as well as with the commonly used Hough voting approach.
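Left-invariance means d(ZX, ZY) = d(X, Y) for any direct similarity Z. The pure-Python sketch below assumes the SRT distance combines three parts with bandwidths σ_s, σ_r, σ_t: the log ratio of the scales, the Frobenius distance between the rotation matrices, and the translation difference divided by the geometric mean of the two scales. These component definitions, the helper names and the unit bandwidths are assumptions; the closed-form mean is not shown.

```python
import math


def rot_x(a):
    c, s = math.cos(a), math.sin(a)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def rot_z(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def matvec(A, v):
    return [sum(A[i][k] * v[k] for k in range(3)) for i in range(3)]


def compose(A, B):
    """Composition x -> A(B(x)) of direct similarities stored as (s, R, t),
    each acting as x -> s * R x + t."""
    sa, Ra, ta = A
    sb, Rb, tb = B
    Rtb = matvec(Ra, tb)
    return (sa * sb, matmul(Ra, Rb), [sa * r + t for r, t in zip(Rtb, ta)])


def srt_distance(X, Y, sigma_s=1.0, sigma_r=1.0, sigma_t=1.0):
    """Assumed SRT-style distance between two direct similarity transforms."""
    sx, Rx, tx = X
    sy, Ry, ty = Y
    d_s = math.log(sx / sy)                                  # scale: log ratio
    d_r = math.sqrt(sum((Rx[i][j] - Ry[i][j]) ** 2           # rotation: Frobenius
                        for i in range(3) for j in range(3)))
    d_t = math.dist(tx, ty) / math.sqrt(sx * sy)             # translation, scale-normalised
    return math.sqrt((d_s / sigma_s) ** 2 + (d_r / sigma_r) ** 2 + (d_t / sigma_t) ** 2)


if __name__ == "__main__":
    X = (1.5, rot_z(0.4), [1.0, 0.0, 2.0])
    Y = (0.8, matmul(rot_x(1.1), rot_z(-0.3)), [0.0, 3.0, -1.0])
    Z = (2.0, matmul(rot_z(0.7), rot_x(0.3)), [1.0, -2.0, 0.5])
    # Both values agree up to rounding: the distance is left-invariant.
    print(srt_distance(X, Y), srt_distance(compose(Z, X), compose(Z, Y)))
```

Replacing `srt_distance` with a raw Euclidean distance on (s, R, t) breaks this check, because left composition multiplies both the scale difference and the translation difference by s_Z.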
A framework is presented for estimating the pose of a camera based on images extracted from a single omnidirectional image of an urban scene, given a 2D map with building outlines but no 3D geometric information or appearance data. The framework attempts to identify vertical corner edges of buildings in the query image, which we term VCLH, as well as the neighboring plane normals, through vanishing point analysis. A bottom-up process further groups VCLH into elemental planes and subsequently into 3D structural fragments modulo a similarity transformation. A geometric hashing lookup allows us to rapidly establish multiple candidate correspondences between the structural fragments and the 2D map building contours. A voting-based camera pose estimation method is then employed to recover the correspondences admitting a camera pose solution with high consensus. In a dataset that is challenging even for humans, the system returned a top-30 ranking for correct matches out of 3600 camera ...
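Geometric hashing itself can be illustrated with the textbook 2D version: every ordered basis pair of model points defines a similarity-invariant frame, the remaining points are hashed by their quantised coordinates in that frame, and a query basis then votes for (model, basis) entries. This is a generic stand-in, not the paper's scheme, which matches 3D structural fragments against 2D map contours; the table layout, the 0.25 bin size and the function names are assumptions.

```python
from collections import defaultdict


def invariant(p, b0, b1):
    """Coordinates of p in the frame of basis (b0, b1); points are complex numbers.
    The ratio is unchanged by any 2D rotation, scaling and translation."""
    return (p - b0) / (b1 - b0)


def quantise(z, step):
    return (round(z.real / step), round(z.imag / step))


def build_table(models, step=0.25):
    """Offline stage: hash every model point under every ordered basis pair."""
    table = defaultdict(list)
    for name, pts in models.items():
        for i, b0 in enumerate(pts):
            for j, b1 in enumerate(pts):
                if i == j:
                    continue
                for k, p in enumerate(pts):
                    if k not in (i, j):
                        table[quantise(invariant(p, b0, b1), step)].append((name, i, j))
    return table


def best_match(table, scene, b0, b1, step=0.25):
    """Online stage: fix a scene basis, let the other scene points vote,
    and return the (entry, votes) pair with the most votes."""
    votes = defaultdict(int)
    for p in scene:
        if p in (b0, b1):
            continue
        for entry in table.get(quantise(invariant(p, b0, b1), step), []):
            votes[entry] += 1
    return max(votes.items(), key=lambda kv: kv[1]) if votes else None


if __name__ == "__main__":
    model = [0j, 1 + 0j, 1 + 1j, 2j]
    table = build_table({"L": model})
    scene = [2j * p + (3 + 1j) for p in model]  # rotate 90 deg, scale by 2, translate
    print(best_match(table, scene, scene[0], scene[1]))  # (('L', 0, 1), 2)
```

The winning entry says the chosen scene basis corresponds to points 0 and 1 of model "L". A full system would try several scene bases and verify each hypothesis, which is where a consensus step such as the paper's voting-based pose estimation comes in.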
We present a video-based system that interactively captures the geometry of a 3D object in the form of a point cloud, then recognizes and registers known objects in this point cloud in a matter of seconds (fig. 1). To achieve interactive speed, we exploit both efficient inference algorithms and parallel computation, often on a GPU. The system can be broken down into two distinct phases: geometry capture and object inference. We now discuss these in further detail.
Geometry capture. The reconstruction phase consists of two key steps: pose estimation of each video frame, and dense geometry estimation using the input frames and their computed poses. We have two interchangeable methods for real-time camera pose estimation. Both assume known internal camera
www.toshiba-europe.com/research/crl/cvg/
In applying the Hough transform to the problem of 3D shape recognition and registration, we develop two new and powerful improvements to this popular inference method. The first, intrinsic Hough, solves the problem of exponential memory requirements of the standard Hough transform by exploiting the sparsity of the Hough space. The second, minimum-entropy Hough, explains away incorrect votes, substantially reducing the number of modes in the posterior distribution of class and pose, and improving precision. Our experiments demonstrate that these contributions make the Hough transform not only tractable but also highly accurate for our example application. Both contributions can be applied to other applications that already use the standard Hough transform, and they also make it feasible and competitive for potentially many more.
[Figure: object classes bracket, block, bearing, cog, flange, piston1, knob, pipe, piston2, car; panels (a) Standard Hough, (b) Max. Hough, (c) Min.-entropy Hou...]
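The memory argument can be made concrete with a generic sparse accumulator. A dense Hough array over a d-dimensional pose space needs (extent / bin size)^d cells, whereas a hash map keyed by the quantised pose stores only the bins that actually receive votes. This sketch illustrates exploiting sparsity in general; it is not claimed to be the paper's intrinsic Hough formulation, and the names and 3D toy votes are assumptions.

```python
from collections import Counter
from math import floor


def sparse_hough(votes, bin_size):
    """Accumulate pose votes in a dict keyed by bin index; memory grows with the
    number of occupied bins, not with the size of the full pose space."""
    acc = Counter()
    for vote in votes:
        acc[tuple(floor(x / bin_size) for x in vote)] += 1
    return acc


def dense_cell_count(extent, bin_size, dims):
    """Cells a dense accumulator would allocate for a cube of side `extent`."""
    return round(extent / bin_size) ** dims


if __name__ == "__main__":
    votes = [(0.1, 0.2, 0.3), (0.15, 0.22, 0.31), (0.12, 0.25, 0.35), (5.0, 5.0, 5.0)]
    acc = sparse_hough(votes, bin_size=1.0)
    print(acc.most_common(1), len(acc))         # [((0, 0, 0), 3)] 2
    print(dense_cell_count(10.0, 0.5, dims=6))  # 64000000 cells for a 6D dense grid
```

Four votes occupy two bins here, while even a coarse dense 6D grid would allocate 64 million cells, and the dense count grows exponentially with the dimension of the pose space.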
This paper proposes a method for estimating the 3D body shape of a person that is robust to clothing. We formulate the problem as optimization over the manifold of valid depth maps of body shapes learned from synthetic training data. The manifold itself is represented using a novel data structure, a Multi-Resolution Manifold Forest (MRMF), which contains vertical edges between tree nodes as well as horizontal edges between nodes that correspond to overlapping partitions. We show that this data structure allows both efficient localization and navigation on the manifold for on-the-fly building of local linear models (manifold charting). We demonstrate shape estimation of clothed users, showing significant improvement in accuracy over global shape models and models using pre-computed clusters. We further compare the MRMF with alternative manifold charting methods on a public dataset for estimating 3D motion from noisy 2D marker observations, obtaining state-of-the-art results.
Demisting the Hough Transform for 3D Shape Recognition and Registration
In applying the Hough transform to the problem of 3D shape recognition and registration, we develop two new and powerful improvements to this popular inference method. The first, intrinsic Hough, solves the problem of exponential memory requirements of the standard Hough transform by exploiting the sparsity of the Hough space. The second, minimum-entropy Hough, explains away incorrect votes, substantially reducing the number of modes in the posterior distribution of class and pose, and improving precision. Our experiments demonstrate that these contributions make the Hough transform not only tractable but also highly accurate for our example application. Both contributions can be applied to other tasks that already use the standard Hough transform.
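Explaining away can be illustrated with a simplified greedy scheme: each feature starts by voting for every hypothesis it is compatible with (as in standard Hough), then repeatedly keeps only its vote for the currently best-supported hypothesis. This is a hypothetical coordinate-descent sketch of the idea, not the optimisation used in the paper; the hypothesis labels and the tie-break rule are assumptions.

```python
from collections import Counter
from math import log


def entropy(counts):
    """Shannon entropy (nats) of a vote histogram."""
    total = sum(counts.values())
    return -sum(c / total * log(c / total) for c in counts.values() if c)


def min_entropy_assign(candidates, max_iter=20):
    """candidates[f] lists the hypotheses feature f could have come from.
    Returns one hypothesis per feature after greedy explaining away."""
    support = Counter(h for cands in candidates for h in cands)  # standard Hough votes
    assign = None
    for _ in range(max_iter):
        # Each feature moves its single vote to its best-supported hypothesis;
        # ties are broken by the hypothesis label so the result is deterministic.
        new = [max(cands, key=lambda h: (support[h], h)) for cands in candidates]
        if new == assign:
            break
        assign = new
        support = Counter(assign)
    return assign


if __name__ == "__main__":
    candidates = [["A", "B"], ["A", "C"], ["A", "B"], ["A"], ["B", "C"]]
    before = Counter(h for cands in candidates for h in cands)
    after = Counter(min_entropy_assign(candidates))
    print(before, after)                     # A:4 B:3 C:2 becomes A:4 B:1
    print(entropy(before) > entropy(after))  # True
```

The spurious hypothesis C loses all its support and B keeps only the one feature that cannot vote for A, so the vote distribution has fewer modes and lower entropy.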
We introduce a generalized representation for a boosted classifier with multiple exit nodes, and propose a training method that combines the idea of propagating scores across boosted classifiers [14, 17] with the use of asymmetric goals [13]. A means for determining the ideal constant asymmetric goal is provided, which is theoretically justified under a conservative bound on the ROC operating point target and empirically near-optimal under the exact bound. Moreover, our method automatically minimizes the number of weak classifiers, avoiding the need to retrain a boosted classifier multiple times for best empirical performance as in conventional methods. Experimental results show significant reductions in training time and number of weak classifiers, as well as better accuracy, compared to conventional cascades and multi-exit boosted classifiers.
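Evaluation of such a classifier can be sketched as one chain of weighted weak classifiers whose running score is never reset; exit nodes at chosen positions reject an input as soon as the propagated score falls below that exit's threshold. The stumps, exit positions and thresholds below are hypothetical, and the training side (asymmetric goals, choosing thresholds) is not shown.

```python
def multi_exit_classify(x, weak_learners, exits):
    """weak_learners: list of (alpha, h) with h(x) in {-1, +1}.
    exits: {index: threshold}; after weak learner `index`, reject if score < threshold.
    Returns (accepted, number_of_weak_learners_evaluated)."""
    score = 0.0
    for i, (alpha, h) in enumerate(weak_learners):
        score += alpha * h(x)  # score propagates across exits instead of restarting
        if i in exits and score < exits[i]:
            return False, i + 1
    return True, len(weak_learners)


def stump(threshold):
    """Hypothetical weak classifier on a scalar feature."""
    return lambda x: 1 if x > threshold else -1


if __name__ == "__main__":
    learners = [(1.0, stump(0.0)), (0.5, stump(1.0)), (0.5, stump(2.0))]
    exits = {0: 0.0, 2: 1.0}
    for x in (-1.0, 0.5, 3.0):
        print(x, multi_exit_classify(x, learners, exits))
```

In the example, x = -1.0 is rejected after a single weak classifier, x = 0.5 is rejected at the second exit and x = 3.0 is accepted, which shows how early exits save computation on easy negatives.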
A new distance for scale-invariant 3D shape recognition and registration
This paper presents a method for vote-based 3D shape recognition and registration, in particular using mean shift on 3D pose votes in the space of direct similarity transforms for the first time. We introduce a new distance between poses in this space: the SRT distance. It is left-invariant, unlike Euclidean distance, and has a unique, closed-form mean, in contrast to Riemannian distance, so it is fast to compute. We demonstrate improved performance over the state of the art in both recognition and registration on a real and challenging dataset, by comparing our distance with others in a mean shift framework, as well as with the commonly used Hough voting approach.
The integral image is typically used to quickly integrate a function over a rectangular region of an image. We propose a method that extends the integral image to fast integration over the interior of any polygon, not necessarily a rectilinear one. The integration time of the method is fast, independent of the image resolution, and only linear in the polygon's number of vertices. We apply the method to Viola and Jones' object detection framework, in which we propose to improve classical Haar-like features with polygonal Haar-like features. We show that the extended feature set improves object detection performance. The experiments are conducted in three domains: frontal face detection, fixed-pose hand detection, and rock detection for Mars' surface terrain assessment.
Figure 1. Some common targets for object detection (panels: face, hand, head (HC), car, traffic sign, license plate).
If the region of interest is known, it may be better to use a polygon than a rectangle to approximate the d...
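For reference, here is the rectangular case that the paper generalises: an integral image (summed-area table) lets any axis-aligned rectangle be summed with four lookups, and a classical Haar-like feature is a difference of such sums. The polygonal extension is not reproduced; the function names and the two-rectangle feature layout are illustrative assumptions.

```python
def integral_image(img):
    """ii[y][x] = sum of img over rows < y and columns < x (one row/column of zero padding)."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x0, y0, x1, y1):
    """Sum over columns x0..x1-1 and rows y0..y1-1 using four lookups."""
    return ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]


def haar_two_rect(ii, x0, y0, w, h):
    """Left-minus-right two-rectangle Haar-like feature covering (2w) x h pixels."""
    return rect_sum(ii, x0, y0, x0 + w, y0 + h) - rect_sum(ii, x0 + w, y0, x0 + 2 * w, y0 + h)


if __name__ == "__main__":
    img = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    ii = integral_image(img)
    print(rect_sum(ii, 0, 0, 3, 3), rect_sum(ii, 1, 1, 3, 3))  # 45 28
    print(haar_two_rect(ii, 0, 0, 1, 3))                       # 12 - 15 = -3
```

The cost of `rect_sum` does not depend on the rectangle's size, which is the property the abstract carries over to polygons, with a cost that grows only with the number of vertices.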
This paper investigates how the speed of an object detector can be rapidly increased through a caching framework when input sequences are quasi-repetitive. In the proposed framework, observed output states are discretized into a large number of classes. Each class induces its own discriminant subspace in the feature space, and is associated with a feature exemplar and a local metric. The feature exemplars and the local metrics are learned online using a novel piecewise linear discriminant analysis. Execution of the original object detector is skipped when the current image feature vector is similar to previously observed feature exemplars, and previously detected object states may simply be recalled. Exemplar recognition is carried out via a one-pass approximate nearest-neighbor search in an index tree based on k-means clustering. Preliminary results show up to a 5-fold speed-up when applied to the Viola & Jones [23] face detector, improving speeds from 10 fps to 50 fps. Experime...
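The skip-or-run logic can be sketched as a thin wrapper around any detector. Assumptions: a plain linear nearest-exemplar scan with Euclidean distance and a fixed radius stands in for the paper's k-means index tree and learned local metrics, and exemplars are simply the raw feature vectors; the class and parameter names are hypothetical.

```python
import math


class DetectionCache:
    """Hypothetical cache: reuse a previous detection when the new feature vector
    lies within `radius` of a stored exemplar, otherwise run the detector."""

    def __init__(self, detector, radius):
        self.detector = detector
        self.radius = radius
        self.exemplars = []  # (feature vector, detected state) pairs
        self.detector_calls = 0

    def detect(self, feature):
        feature = tuple(feature)
        nearest = min(self.exemplars, key=lambda e: math.dist(e[0], feature), default=None)
        if nearest is not None and math.dist(nearest[0], feature) <= self.radius:
            return nearest[1]  # cache hit: recall the stored state, skip the detector
        self.detector_calls += 1
        state = self.detector(feature)
        self.exemplars.append((feature, state))
        return state


if __name__ == "__main__":
    cache = DetectionCache(lambda f: "face" if f[0] > 0 else "none", radius=0.1)
    frames = [(1.0, 0.0), (1.05, 0.0), (1.02, 0.01), (-1.0, 0.0)]
    print([cache.detect(f) for f in frames], cache.detector_calls)
    # ['face', 'face', 'face', 'none'] 2
```

Four frames cost only two detector runs here; the real speed-up depends on how repetitive the input is and on how cheap the nearest-exemplar search is compared with the detector.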
Face Detection with Asymmetric Boosting: Principled Methods to Rapid Learning and Classification
Asymmetric boosting, while acknowledged to be important to state-of-the-art face detection, is typically based on trial and error rather than on principled methods. This work solves a number of issues related to asymmetric boosting and its use in face detection. It shows how a proper understanding and use of asymmetric boosting leads to significant improvements in the learning time, the learning capacity, the detection speed and the detection accuracy of a face detector. There are four main contributions in this book: 1) a new method to learn an asymmetric boosted classifier online, pioneering a new direction in learning a face detector online; 2) a new weak classifier learning method, significantly reducing the learning time of a face detector from weeks to just a few hours; 3) a new and principled method to learn a face detector cascade, further improving the learning time and the detection speed of a face detector; and 4) a theoretical analysi...
A method for dividing an image into plural superpixels, each made up of plural pixels of the image. The method calculates an initial set of weights from a measure of similarity between pairs of pixels, from which a resultant set of weights is calculated for pairs of pixels that are less than a threshold distance apart on the image. The weight for a pair of pixels is calculated as the sum, over a set of third pixels, of the product of the initial weight between the first pixel of the pair and the third pixel and the weight between the third pixel and the second pixel of the pair. Each weight is then subjected to a power coefficient operation. The resultant set of weights and the initial set of weights are then compared to check for convergence. If the weights converge, the converged set of weights is used to divide the image into superpixels.
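The iteration reads like an expansion-then-inflation scheme, so here is a minimal sketch on a single row of pixels. Assumptions not stated in the text: a Gaussian intensity similarity for the initial weights, each pixel's weights renormalised to sum to 1 after every step (as in Markov clustering), a power coefficient of 2, and a split wherever two neighbouring pixels end up with negligible weight in both directions. All names and parameters are hypothetical.

```python
import math


def superpixels_1d(intensity, radius=2, sigma=1.0, power=2.0,
                   tol=1e-9, max_iter=100, link_eps=1e-6):
    """Label a row of pixels by iterating w'(a, b) = sum_c w(a, c) * w(c, b),
    restricted to pairs with |a - b| < radius (radius must be >= 2)."""
    n = len(intensity)
    pairs = [(a, b) for a in range(n) for b in range(n) if abs(a - b) < radius]

    def normalise(w):
        # Assumption: each pixel's outgoing weights are rescaled to sum to 1.
        totals = [0.0] * n
        for (a, _), v in w.items():
            totals[a] += v
        return {(a, b): v / totals[a] for (a, b), v in w.items()}

    # Initial weights from an (assumed) Gaussian similarity of intensities.
    w = normalise({(a, b): math.exp(-((intensity[a] - intensity[b]) / sigma) ** 2)
                   for a, b in pairs})
    for _ in range(max_iter):
        # Sum over third pixels c of w(a, c) * w(c, b), kept only for nearby pairs.
        new = {(a, b): sum(w.get((a, c), 0.0) * w.get((c, b), 0.0) for c in range(n))
               for a, b in pairs}
        # Power coefficient, then renormalise.
        new = normalise({k: v ** power for k, v in new.items()})
        converged = max(abs(new[k] - w[k]) for k in pairs) < tol
        w = new
        if converged:
            break

    # Start a new superpixel wherever two neighbours no longer carry weight either way.
    labels = [0] * n
    for a in range(1, n):
        linked = max(w[(a - 1, a)], w[(a, a - 1)]) > link_eps
        labels[a] = labels[a - 1] if linked else labels[a - 1] + 1
    return labels


if __name__ == "__main__":
    print(superpixels_1d([0, 0, 0, 10, 10, 10]))  # [0, 0, 0, 1, 1, 1]
```

Cross-boundary weights start near exp(-100) and shrink further with every power step, while within each flat run the weights concentrate on the middle pixel, so the row splits exactly at the intensity edge.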
Joint object recognition and pose estimation solely from range images is an important task, e.g., in robotics applications and automated manufacturing environments. The lack of color information and the limitations of current commodity depth sensors make this task a challenging computer vision problem, and a standard random-sampling-based approach is prohibitively time-consuming. We propose to address this difficult problem by generating promising inlier sets for pose estimation through early rejection of clear outliers with the help of local belief propagation (or dynamic programming). By exploiting data parallelism our method is fast, and it does not rely on a computationally expensive training phase. We demonstrate state-of-the-art performance on a standard dataset and illustrate our approach on challenging real sequences.
Design and Development of EMON: An Embedded Control System over Ethernet and TCP/IP
EMON, an embedded control system over Ethernet and TCP/IP developed by the KC.03-13 project, is presented in this paper. The system can be the core of a wide variety of embedded applications in laboratory and industrial automation. The Ethernet with 10/100Mbps ...
Papers by Minh-tri Pham