Papers by Jose Santos-Victor
MONOCULAR VS BINOCULAR 3D REAL-TIME BALL TRACKING FROM 2D ELLIPSES
Nicola Greggio, José Gaspar, Alexandre Bernardino, José Santos-Victor. ARTS Lab - Scuola Superiore S.Anna, Polo S.Anna Valdera, Viale R. Piaggio, 34 - 56025 Pontedera, Italy; Instituto de Sistemas e Robótica, Instituto Superior Técnico, 1049-001 Lisboa ...

We present a novel boosting algorithm where temporal consistency is addressed in a short-term way. Although temporal correlation of observed data may be an important cue for classification (e.g. of human activities), it is seldom used in boosting techniques. The recently proposed Temporal AdaBoost addresses the same problem but in a heuristic manner, first optimizing the weak learners without temporal integration. The classifier responses for past frames are then averaged together, as long as the total classification error decreases. We extend the GentleBoost algorithm by modeling time in an explicit form, as a new parameter during the weak learner training and in each optimization round. The time-consistency model induces a fuzzy decision function, dependent on the temporal support of a feature or data point, with added robustness to noise. Our temporal boosting algorithm is further extended to cope with multi-class problems, following the JointBoost approach introduced by Torralba et al. We can thus (i) learn the parameters for all classes at once, and (ii) share features among classes and groups of classes, both in a temporal and fully consistent manner. Finally, the superiority of our proposed framework is demonstrated by comparing it to state-of-the-art temporal and non-temporal boosting algorithms. Tests are performed on both synthetic and two challenging real datasets, used to recognize a total of 12 different human activities.
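The short-term temporal weighting described in the abstract can be sketched roughly as follows. This is a minimal illustration only, not the authors' exact formulation: the stump parameters `a`, `b`, the threshold, and the window length are all placeholders, and the real algorithm optimizes them jointly inside each boosting round.

```python
import numpy as np

def temporal_stump_response(feature_series, threshold, a, b, window):
    """Fuzzy decision-stump response averaged over a temporal window.

    A hedged sketch: the hard stump output a*[f > thr] + b*[f <= thr]
    is averaged over the last `window` frames, turning the crisp
    per-frame decision into a soft, temporally smoothed one.
    """
    f = np.asarray(feature_series, dtype=float)
    # crisp per-frame stump responses
    resp = np.where(f > threshold, a, b)
    # average the responses over the temporal support
    kernel = np.ones(window) / window
    return np.convolve(resp, kernel, mode="valid")
```

On a feature that flips from one side of the threshold to the other, the averaged response passes through intermediate values instead of jumping, which is the soft-decision behaviour the abstract attributes to temporal support.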
Pattern Recognition Letters, 2009
Several approaches to object recognition make extensive use of local image information extracted at interest points, known as local image descriptors. State-of-the-art methods perform a statistical analysis of the gradient information around the interest point, which often relies on the computation of image derivatives with pixel differencing methods. In this paper we show the advantages of using smooth derivative filters instead of pixel differences in the performance of a well-known local image descriptor. The method is based on the use of odd Gabor functions, whose parameters are selectively tuned as a function of the local image properties under analysis. We perform an extensive experimental evaluation to show that our method increases the distinctiveness of local image descriptors for image region matching and object recognition.
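The contrast between pixel differencing and smooth odd-Gabor derivative filters can be sketched in one dimension. This is an illustrative sketch under assumed parameter values (`sigma`, `freq`, `size` are placeholders, and the paper tunes them from local image properties, which is not modeled here):

```python
import numpy as np

def odd_gabor_kernel(sigma, freq, size):
    """1-D odd (sine-phase) Gabor kernel: a smooth, zero-sum,
    derivative-like filter, in contrast to the two-tap pixel
    difference s[i+1] - s[i-1]."""
    x = np.arange(size) - size // 2
    return np.exp(-x**2 / (2 * sigma**2)) * np.sin(2 * np.pi * freq * x)

def smooth_derivative(signal, sigma=1.0, freq=0.15, size=9):
    """Estimate a derivative by correlating the signal with an odd
    Gabor kernel; the Gaussian envelope suppresses pixel noise."""
    k = odd_gabor_kernel(sigma, freq, size)
    # reversing the kernel turns np.convolve into correlation
    return np.convolve(signal, k[::-1], mode="same")
```

On a linear ramp the interior response is constant, as a derivative estimator should be; on noisy data the Gaussian envelope averages the noise that a two-tap difference would amplify.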
We address empirical feature selection for tracker-less recognition of human actions. We rely on an appearance-plus-motion model over several video frames to model human movements. We use the L2Boost algorithm, a versatile boosting algorithm which simplifies the gradient search. We study the following options in the feature computation and learning, amongst others: (i) full model vs. component-wise model, (ii) sampling strategy of the histogram cells, and (iii) number of previous frames to include. We select the feature parameters that provide the best compromise between performance and computational efficiency, and apply the features to a challenging problem: tracker-less and detection-less human activity recognition.

This paper presents a novel approach to weak classifier selection based on the GentleBoost framework, built on sharing a set of features at each round. We explore the use of linear dimensionality reduction methods to guide the search for features that share some properties, such as correlations and discriminative properties. We add this feature set as a new parameter of the decision stump, which turns the single-branch selection of the classic stump into a fuzzy decision that weights the contribution of both branches. The weights of each branch act as a confidence measure based on the feature set characteristics, which increases the accuracy and robustness to data perturbations. We propose an algorithm that considers the similarities between the weights provided by three linear mapping algorithms: PCA, LDA and MMLMNN [14]. We analyze the row vectors of the linear mapping, grouping vector components with very similar values. The resulting groups are then the inputs of the FuzzyBoost algorithm. This search procedure generalizes the previous temporal FuzzyBoost [10] to any type of features. We present results on features with spatial support (images) and spatio-temporal support (videos), showing the generalization properties of the FuzzyBoost algorithm in other scenarios.
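The grouping step (components of a linear-mapping row vector with very similar values form one feature set) can be sketched as below. The similarity criterion here, a fixed tolerance `tol` between sorted neighbours, is an assumption for illustration; the paper does not fix the criterion in this abstract.

```python
import numpy as np

def group_similar_components(row, tol):
    """Group indices of a linear-mapping row vector (e.g. one PCA
    component) whose values lie within `tol` of their sorted
    neighbour. A simple stand-in for the paper's grouping step."""
    order = np.argsort(row)
    groups, current = [], [order[0]]
    for prev, idx in zip(order, order[1:]):
        if row[idx] - row[prev] <= tol:
            current.append(idx)      # close in value: same group
        else:
            groups.append(current)   # gap: start a new group
            current = [idx]
    groups.append(current)
    return groups
```

Each returned group would then index one shared feature set handed to the fuzzy decision stump.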

Recently, log-polar images have been successfully used in active-vision tasks such as vergence control or target tracking. However, while the role of foveal data has been exploited and is well known, that of the periphery seems underestimated and not well understood. Nevertheless, peripheral information becomes crucial in detecting non-foveated objects or events. In this paper, a multiple-model approach (MMA) for top-down, model-based attention processes is proposed. The advantages offered by this proposal for space-variant image representations are discussed. A simple but representative frontal-face detection task is given as an example of application of the MMA. The combination of appearance-based features and a linear regression-based classifier proved very effective. Results show the ability of the system to detect faces at very low resolutions, which has implications in fields such as visual surveillance.

Gabor Parameter Selection for Local Feature Detection
Some recent works have addressed the object recognition problem by representing objects as the composition of independent image parts, where each part is modeled with “low-level” features. One of the problems to address is the choice of the low-level features to appropriately describe the individual image parts. Several feature types have been proposed, like edges, corners, ridges, Gaussian derivatives, Gabor features, etc. Often, features are selected independently of the object to represent and have fixed parameters. In this work we use Gabor features and describe a method to select feature parameters suited to the particular object considered. We propose a method based on the Information Diagram concept, where “good” parameters are the ones that optimize the filter’s response in the filter parameter space. We propose and compare some concrete methodologies to choose the Gabor feature parameters, and illustrate the performance of the method in the detection of facial parts like eyes, noses and mouths. We also show the rotation invariance and robustness to small scale changes of the proposed Gabor features.

The application of learning-based vision techniques to real scenarios usually requires a tuning procedure, which involves the acquisition and labeling of new data and in situ experiments in order to adapt the learning algorithm to each scenario. We address an automatic update procedure for the L2boost algorithm that is able to adapt the initial models learned off-line. Our method, named UAL2Boost, presents three new contributions: (i) an on-line and continuous procedure that updates the current classifier recursively, reducing storage constraints, (ii) a probabilistic unsupervised update that eliminates the need for labeled data in order to adapt the classifier, and (iii) a multi-class adaptation method. We show the applicability of the on-line unsupervised adaptation to human action recognition and demonstrate that the system is able to automatically update the parameters of L2boost with linear temporal models, thus improving the output of the models learned off-line on new video sequences, in a recursive and continuous way. The automatic adaptation of UAL2Boost follows the idea of adapting the classifier incrementally: from simple to complex.
APPEARANCE BASED SALIENT POINT DETECTION WITH INTRINSIC SCALE-FREQUENCY DESCRIPTOR
Recent object recognition methods propose to represent objects by collections of local appearance descriptors at several interest points. For recognition, this representation is matched to image data. Interest points (candidates for matching) are usually selected from images in a purely bottom-up manner. However, in many situations, there is a limited number of objects to search for, ...

This paper introduces a testbed for sensor and robot network systems, currently composed of 10 cameras and 5 mobile wheeled robots equipped with several sensors for self-localization and obstacle avoidance, vision cameras, and wireless communications. The testbed includes a service-oriented middleware to enable fast prototyping and implementation of algorithms previously tested in simulation, as well as to simplify integration of subsystems developed by different partners. We survey an integrated approach to human-robot interaction that has been developed on the testbed under a European research project. The application integrates innovative methods and algorithms for people tracking and waving detection, cooperative perception among static and mobile cameras to improve people-tracking accuracy, and decision-theoretic approaches to sensor selection and task allocation within the sensor network.
We present a method to detect people waving using video streams from a fixed camera system. Waving is a natural means of calling for attention and can be used by citizens to signal emergency events or abnormal situations in future automated surveillance systems. Our method is based on training a supervised classifier using a temporal boosting method based on optical-flow-derived features. The base algorithm shows a low false positive rate, and it further improves through the definition of a minimum duration for the waving event. The classifier generalizes well to scenarios very different from where it was trained. We show that a system trained indoors with high resolution and frontal postures can operate successfully, in real time, in an outdoor scenario with large scale differences and arbitrary postures.
In this paper we compare several optical-flow-based features in order to distinguish between humans and robots in a mixed human-robot environment. In addition, we propose two modifications to the optical flow computation: (i) a way to standardize the optical flow vectors, which relates the real-world motions to the image motions, and (ii) a way to improve flow robustness to noise by selecting the sampling times as a function of the spatial displacement of the target in the world. We add temporal consistency to the flow-based features by using a temporal-Boost algorithm. We compare combinations of: (i) several temporal supports, (ii) flow-based features, (iii) flow standardization, and (iv) flow sub-sampling. We implement the best-performing approach and validate it in a real outdoor setup, attaining real-time performance.
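One plausible reading of flow standardization, relating image motion to real-world motion, is a pinhole-camera depth scaling. The sketch below is an assumption for illustration only (the paper's exact procedure is not given in the abstract, and `depth_m`, `focal_px`, `dt` are hypothetical inputs):

```python
def standardize_flow(flow_px, depth_m, focal_px, dt):
    """Convert image-plane flow magnitude (pixels per frame) to an
    approximate metric speed (m/s), assuming a pinhole camera with
    known focal length and a known target depth: for small motions,
    image displacement ~ focal * world displacement / depth."""
    return flow_px * depth_m / (focal_px * dt)
```

Under this model the same world speed yields smaller pixel flow at larger depth, so scaling by depth makes flow features comparable across targets at different distances.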
Statistical Relational Learning of Object Affordances for Robotic Manipulation
Latest Advances in Inductive Logic Programming, 2014
Using vision for underwater robotics: video mosaics and station keeping
José Santos-Victor, Nuno Gracias, Sjoerd van der Zwaan. Instituto Superior Técnico & Instituto de Sistemas e Robótica, ISR - Torre Norte, Av. Rovisco Pais, 1, 1049-001 Lisboa, Portugal. {jasv,ngracias,sjoerd}@isr.ist.utl.pt
IEEE Transactions on Systems, Man and Cybernetics, Part B (Cybernetics), 2005
We propose a general architecture for action-level (mimicking) and program-level (gesture) visual imitation. Action-level imitation involves two modules. The Viewpoint Transformation (VPT) performs a "rotation" to align the demonstrator's body with that of the learner. The Visuo-Motor Map (VMM) maps this visual information to motor data.

IEEE Transactions on Systems, Man and Cybernetics, Part B (Cybernetics), 2000
In this paper, we present a strategy whereby a robot acquires the capability to learn by imitation, following a developmental pathway consisting of three levels: 1) sensory-motor coordination; 2) world interaction; and 3) imitation. With these stages, the system is able to learn tasks by imitating human demonstrators. We describe results of the different developmental stages, involving perceptual and motor skills, implemented in our humanoid robot, Baltazar. At each stage, the system's attention is drawn toward different entities: its own body and, later on, objects and people. Our main contributions are the general architecture and the implementation of all the necessary modules until imitation capabilities are eventually acquired by the robot. Several other contributions are made at each level: learning of sensory-motor maps for redundant robots, a novel method for learning how to grasp objects, and a framework for learning task descriptions from observation for program-level imitation. Finally, vision is used extensively as the sole sensing modality (sometimes in a simplified setting), avoiding the need for special data-acquisition hardware.
We present an approach for detecting eyes in face images. This system is meant to provide input to a face detection system based on the combination of simpler facial features. The eyes are modeled as a feature vector collecting the responses of Gabor filters at various orientations and scales. After learning the model from eye images, we search for new instances of the model by evaluating the Gabor filter responses at image positions distributed according to log-Cartesian or log-polar sampling grids. The space-variant resolution of these grids provides a way of locating the "focus of attention" at the center of each grid.
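The log-polar sampling grid mentioned above can be sketched as below: ring radii grow geometrically, so sampling is dense near the grid centre (the "focus of attention") and coarse in the periphery. The parameterization (`r0`, `growth`, ring/wedge counts) is illustrative, not the paper's.

```python
import numpy as np

def log_polar_grid(center, n_rings, n_wedges, r0, growth):
    """Sample positions on a log-polar grid around `center`.

    Ring radii follow a geometric progression r0 * growth**k, giving
    the space-variant resolution described in the abstract."""
    cx, cy = center
    radii = r0 * growth ** np.arange(n_rings)
    angles = 2 * np.pi * np.arange(n_wedges) / n_wedges
    pts = [(cx + r * np.cos(a), cy + r * np.sin(a))
           for r in radii for a in angles]
    return np.array(pts)
```

Filter responses would then be evaluated only at these positions, rather than at every pixel of a uniform grid.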
Robust visual tracking by an active observer
Proceedings of the 1997 IEEE/RSJ International Conference on Intelligent Robot and Systems. Innovative Robotics for Real-World Applications. IROS '97, 1997
In this paper we address the problem of tracking a moving target by a monocular observer. The ability to track a moving object has many applications in robotics, teleoperation, surveillance systems, human-machine interfaces, etc. Our goal was the development of a robust tracking system for practical (industrial) applications, and therefore based on inexpensive hardware. The strategy we present is based ...
A purposive strategy for visual-based navigation of a mobile robot
1998 Midwest Symposium on Circuits and Systems, Proceedings, 1999
We address the problem of visual-based navigation of a mobile robot along corridors in indoor environments. The corridor lines are detected and their vanishing point calculated. Its deviation from the central column of the image, the corridor lines' slopes, and the standard devia...
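The vanishing-point step above has a standard closed form: two image lines in homogeneous coordinates intersect at their cross product. A minimal sketch (the line-detection stage that produces the coefficients is assumed, not shown):

```python
import numpy as np

def vanishing_point(line1, line2):
    """Intersection of two image lines given as (a, b, c) with
    a*x + b*y + c = 0, e.g. the two detected corridor edges; their
    intersection approximates the corridor's vanishing point."""
    p = np.cross(line1, line2)   # homogeneous intersection point
    return p[:2] / p[2]          # back to pixel coordinates
```

The horizontal deviation of this point from the image's central column then serves directly as the heading-error signal the abstract describes.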