Papers by Francesca Odone

2020 IEEE Winter Conference on Applications of Computer Vision (WACV), 2020
Effective assisted living environments must be able to perform inferences on how their occupants interact with one another as well as with surrounding objects. To accomplish this goal using a vision-based automated approach, multiple tasks such as pose estimation, object segmentation and gaze estimation must be addressed. Gaze direction in particular provides some of the strongest indications of how a person interacts with the environment. In this paper, we propose a simple neural network regressor that estimates the gaze direction of individuals in a multi-camera assisted living scenario, relying only on the relative positions of facial keypoints collected from a single pose estimation model. To handle cases of keypoint occlusion, our model exploits a novel confidence gated unit in its input layer. In addition to the gaze direction, our model also outputs an estimation of its own prediction uncertainty. Experimental results on a public benchmark demonstrate that our approach performs on par with a complex, dataset-specific baseline, while its uncertainty predictions are highly correlated to the actual angular error of corresponding estimations. Finally, experiments on images from a real assisted living environment demonstrate the higher suitability of our model for its final application.
Journal of Machine Learning Research, 2005
Many works have related learning from examples to regularization techniques for inverse problems. Nevertheless, until now there was no formal evidence either that learning from examples could be seen as an inverse problem or that theoretical results in learning theory could be independently derived using tools from regularization theory. In this paper we provide a positive answer to both questions. Indeed, considering the square loss, we translate the learning problem into the language of regularization theory and we show that consistency results and optimal regularization parameter choice can be derived by the discretization of the corresponding inverse problem.
We consider applications of clustering techniques, Mean Shift and Self-Organizing Maps, to surface reconstruction (meshing) from scattered point data and review a novel kernel-based clustering method.
This paper presents a motion segmentation method useful for efficiently representing a video shot as a static mosaic of the background plus sequences of moving foreground objects. This generates an MPEG-4 compliant, content-based representation useful for video coding, editing and indexing. Segmentation of moving objects is carried out by comparing each frame with a mosaic of the static background, in which the ego-motion of the camera is compensated for with a robust technique. The automatic computation of the mosaic and the segmentation procedure are compared with the current literature and illustrated with experiments on real sequences. An example of content-based manipulation is also shown.
Dynamic Video Mosaics and Augmented Reality for Subsea Inspection and Monitoring
This paper reports a powerful technique for building panoramic mosaics from video sequences automatically. No information about the camera motion or its optical parameters is necessary. Mosaics can be built even in the presence of objects moving in front of the target scene (dynamic mosaicing), which are deleted by motion analysis. The technique also makes augmented reality possible, that is, inserting new elements in
We investigate the automatic estimation of fish weight from sets of morphometric measurements. Our solution combines a vision system with a robust regression method, the Support Vector Machine (SVM). Measurements are taken automatically from two binarised views of each fish in a training sample, then fed to a quadratic SVM along with approximate weight estimates. The SVM learns the law linking weight to shape directly (without computing volume) and compensates for several inaccuracies in the training measurements. We suggest a methodology for identifying optimal shape measurements for the task, and report results obtained with a sample of 99 trout between 300 and 600 g, showing good accuracy and reliability, and better performance than the length-weight relations commonly adopted in fisheries science.
Finding Objects with Hypothesis Testing
... Our feature selection can be compared with the one proposed by Viola and Jones [19] in the sense that we both start from a large set of features and we aim at obtaining a relatively small number of highly descriptive ones. ...
We describe a trainable system for face detection and tracking. The structure of the system is based on multiple cues that discard non-face areas as soon as possible: we combine motion, skin, and face detection. The latter is the core of our system and consists of a hierarchy of small SVM classifiers built on the output of an automatic feature selection procedure. Our feature selection is entirely data-driven and allows us to obtain powerful descriptions from a relatively small set of data. Finally, Kalman tracking on the face region optimizes detection results over time. We present an experimental analysis of the face detection module and results obtained with the whole system on the specific task of counting people entering the scene.

This paper describes work in progress on a multisensor system for 3D data acquisition. The core structure of the system is a 3D range scanner based on the well-known active triangulation procedure and made of a camera, a laser light emitter and a software-driven motor. The core system allows us to acquire dense point clouds of objects of about 50 cm. The system now hosts a second camera and is thus able to perform 3D reconstruction from two slightly different viewpoints and produce denser point clouds. Also, since the motor can be driven back to the original position, multiple scans can take place to obtain smooth surfaces and additional information, such as texture and reliability measures. An alternative way of obtaining texture information is by means of a linear camera, also included in the system. We present results obtained with the current system, and describe extensions of the system for estimating noise and producing a more complex geometry description.
Multi-modality is a fundamental feature that characterizes biological systems and lets them achieve high robustness in understanding skills while coping with uncertainty. Relatively recent studies showed that multi-modal learning is a potentially effective add-on to artificial systems, allowing the transfer of information from one modality to another. In this paper we propose a general architecture for jointly learning visual and motion patterns: by means of regression theory we model a mapping between the two sensorial modalities improving the performance of artificial perceptive systems. We present promising results on a case study of grasp classification in a controlled setting and discuss future developments.

Pattern Analysis and Applications, 2002
This paper presents a motion segmentation method useful for efficiently representing a video shot as a static mosaic of the background plus sequences of the objects moving in the foreground. This generates an MPEG-4 compliant, layered representation useful for video coding, editing and indexing. First, a mosaic of the static background is computed by estimating the dominant motion of the scene. This is achieved by tracking features over the video sequence and using a robust technique that discards features attached to the moving objects. The moving objects are removed from the final mosaic by computing the median of the grey levels. Then, segmentation is obtained by taking the pixelwise difference between each frame of the original sequence and the mosaic of the background. To discriminate between the moving object and noise, temporal coherence is exploited by tracking the object in the binarised difference image sequence. The automatic computation of the mosaic and the segmentation procedure are illustrated with experiments on real sequences. Examples of coding and content-based manipulation are also shown.
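The median-based background and the pixelwise difference step described in this abstract can be illustrated with a minimal sketch. It assumes the frames are already registered to a common reference (the paper obtains this from the estimated dominant motion, which is not reproduced here), it omits the temporal-coherence tracking, and the function names and the threshold value are hypothetical.

```python
from statistics import median


def median_background(frames):
    """Per-pixel median of the grey levels over a list of registered frames."""
    rows, cols = len(frames[0]), len(frames[0][0])
    return [[median(f[r][c] for f in frames) for c in range(cols)]
            for r in range(rows)]


def foreground_mask(frame, background, threshold):
    """Binarised pixelwise difference between one frame and the background."""
    return [[1 if abs(p - b) > threshold else 0
             for p, b in zip(frame_row, background_row)]
            for frame_row, background_row in zip(frame, background)]


# Toy 2x3 sequence: a bright object (200) moves across a dark background (10).
frames = [
    [[200, 10, 10], [10, 10, 10]],
    [[10, 200, 10], [10, 10, 10]],
    [[10, 10, 200], [10, 10, 10]],
]
background = median_background(frames)
mask = foreground_mask(frames[1], background, threshold=30)  # threshold is an assumed value
```

Because the object covers each pixel in only a minority of frames, the median recovers the background grey level there, and the thresholded difference isolates the object in the chosen frame.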
Neural Computation, 2008
We discuss how a large class of regularization methods, collectively known as spectral regularization and originally designed for solving ill-posed inverse problems, gives rise to regularized learning algorithms.

IEEE Transactions on Image Processing, 2005
In the statistical learning framework the use of appropriate kernels may be the key for substantial improvement in solving a given problem. In essence, a kernel is a similarity measure between input points satisfying some mathematical requirements and possibly capturing the domain knowledge. In this paper we focus on kernels for images: we represent the image information content with binary strings and discuss various bitwise manipulations obtained using logical operators and convolution with non-binary stencils. In the theoretical contribution of our work we show that histogram intersection is a Mercer's kernel and we determine the modifications under which a similarity measure based on the notion of Hausdorff distance is also a Mercer's kernel. In both cases we determine explicitly the mapping from input to feature space. The presented experimental results support the relevance of our analysis for developing effective trainable systems.
This thesis considers the representation and the identification of objects in image sequences: objects are represented by information extracted from image sequences, and this representation is exploited to solve problems such as object detection, recognition, and scene location.
In this paper we discuss the mathematical properties of a few kernels specifically constructed for dealing with image data in binary classification and novelty detection problems. First, we show that histogram intersection is a Mercer’s kernel. Then, we show that a similarity measure based on the notion of Hausdorff distance and directly applicable to raw images, though not a Mercer’s kernel, is a kernel for novelty detection. Both kernels appear to be well suited for building effective vision-based learning systems.

In this paper we present a trainable method for selecting features from an overcomplete dictionary of measurements. The starting point is a thresholded version of the Landweber algorithm for providing a sparse solution to a linear system of equations. We consider the problem of face detection and adopt rectangular features as an initial representation for allowing straightforward comparisons with existing techniques. For computational efficiency and memory requirements, instead of implementing the full optimization scheme on tens of thousands of features, we propose to first solve a number of smaller optimization problems obtained by randomly sub-sampling the feature vector, and then recombine the selected features. The obtained set is still highly redundant, so we further apply feature selection. The final feature selection system is an efficient two-stage architecture. Experimental results of an optimized version of the method on face images and image sequences indicate that this method is a serious competitor of other feature selection schemes recently popularized in computer vision for dealing with problems of real-time object detection.
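As an illustration of the starting point mentioned in this abstract, the sketch below iterates a Landweber (gradient) step followed by a threshold, which drives small coefficients to exactly zero. It is not the authors' implementation: the use of soft thresholding, the step size, and all names are assumptions, and convergence generally requires the step to be small relative to the largest singular value of `A`.

```python
def soft_threshold(value, t):
    """Shrink a coefficient towards zero by t, setting it to 0 inside [-t, t]."""
    if value > t:
        return value - t
    if value < -t:
        return value + t
    return 0.0


def thresholded_landweber(A, y, tau, step=1.0, iterations=100):
    """Iterate x <- S(x + step * A^T (y - A x)) with threshold step * tau."""
    n_rows, n_cols = len(A), len(A[0])
    x = [0.0] * n_cols
    for _ in range(iterations):
        # Residual of the linear system at the current estimate.
        residual = [y[i] - sum(A[i][j] * x[j] for j in range(n_cols))
                    for i in range(n_rows)]
        # Plain Landweber update (gradient step on the squared residual).
        updated = [x[j] + step * sum(A[i][j] * residual[i] for i in range(n_rows))
                   for j in range(n_cols)]
        # Thresholding makes the solution sparse.
        x = [soft_threshold(u, step * tau) for u in updated]
    return x


# Toy system with an identity matrix: the small middle coefficient is discarded.
A = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
y = [3.0, 0.2, -1.5]
x = thresholded_landweber(A, y, tau=0.5)
selected = [j for j, coefficient in enumerate(x) if coefficient != 0.0]
```

In a feature selection setting, the columns of `A` would hold feature responses and the indices in `selected` would be the retained features.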

Graphical Models / Graphical Models and Image Processing / Computer Vision, Graphics, and Image Processing, 2006
The paper tackles the problem of feature point matching between pairs of images of the same scene. This is a key problem in computer vision. The method we discuss here is a version of the SVD-matching proposed by Scott and Longuet-Higgins and later modified by Pilu, which we elaborate in order to cope with large scale variations. To this end we add to the feature detection phase a keypoint descriptor that is robust to large scale and viewpoint changes. Furthermore, we include this descriptor in the equations of the proximity matrix that is central to the SVD-matching. At the same time we remove from the proximity matrix all the information about the point locations in the image, which is the source of mismatches when the amount of scene variation increases. The main contribution of this work is in showing that this compact and easy algorithm can be used for severe scene variations. We present experimental evidence of the improved performance with respect to the previous versions of the algorithm.
In this paper we address the problem of classifying images by exploiting global features that describe color and illumination properties, and by using the statistical learning paradigm. The contribution of this paper is twofold. First, we show that histogram intersection has the required mathematical properties to be used as a kernel function for Support Vector Machines (SVMs). Second, we give two examples of how an SVM, equipped with such a kernel, can achieve very promising results on image classification based on color information.
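For reference, histogram intersection between two histograms with the same bins is the sum of the bin-wise minima. The sketch below computes it and the resulting Gram matrix; the function names and the toy 4-bin color histograms (pixel counts) are hypothetical and not taken from the paper.

```python
def histogram_intersection(a, b):
    """K(a, b) = sum_i min(a_i, b_i) for two histograms with the same bins."""
    if len(a) != len(b):
        raise ValueError("histograms must have the same number of bins")
    return sum(min(x, y) for x, y in zip(a, b))


def gram_matrix(histograms):
    """Kernel values between every pair of histograms."""
    return [[histogram_intersection(h, g) for g in histograms]
            for h in histograms]


# Hypothetical color histograms of three 10-pixel images.
h_sky = [7, 2, 1, 0]
h_sea = [6, 3, 1, 0]
h_forest = [1, 1, 2, 6]
gram = gram_matrix([h_sky, h_sea, h_forest])
```

A Gram matrix of this kind is what an SVM implementation that accepts precomputed kernels would consume, for example scikit-learn's `SVC(kernel="precomputed")`.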
Learning one class at a time can be seen as an effective solution to classification problems in which only the positive examples are easily identifiable. A kernel method to accomplish this goal consists of a representation stage - which computes the smallest sphere in feature space enclosing the positive examples - and a classification stage - which uses the obtained sphere as a decision surface to determine the positivity of new examples. In this paper we describe a kernel well suited to represent, identify, and recognize 3D objects from unconstrained images. The kernel we introduce, based on Hausdorff distance, is tailored to deal with grey-level image matching. The effectiveness of the proposed method is demonstrated on several data sets of faces and objects of artistic relevance, like statues.
A trainable system for face detection in unconstrained environments
This paper describes a monitoring system that implements real-time face detection. The structure of the system is based on multiple cues that discard non-face areas as soon as possible: we combine motion, skin, and face detection. The latter is the core of our system and consists of a hierarchy of small SVM classifiers built on the output of a feature selection procedure. Following face detection, Kalman tracking on the face region allows us to optimize results over time. We present an experimental analysis of the face detection module and results obtained with the whole system on the specific task of counting people entering the scene.