
Camera-based gesture recognition for robot control

Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks. IJCNN 2000. Neural Computing: New Challenges and Perspectives for the New Millennium

https://doi.org/10.1109/IJCNN.2000.860762

Abstract

Several systems for automatic gesture recognition have been developed using different strategies and approaches. In these systems the recognition engine is mainly based on three algorithms: dynamic pattern matching, statistical classification, and neural networks (NNs). In this paper we present four architectures for gesture-based interaction between a human being and an autonomous mobile robot, using the above-mentioned techniques or hybrid combinations of them. Each of our gesture recognition architectures consists of a preprocessor and a decoder. The preprocessor, which is common to all systems, receives an image as input and produces a continuous feature vector. The task of the decoder is to decode a sequence of these vectors into an estimate of the underlying movement. In the first three systems we formally treat the recognition problem as a statistical classification task and consider three different hybrid stochastic/connectionist architectures. In the first approach, NNs are used to classify single feature vectors, while Hidden Markov Models (HMMs) model sequences of them. In the second, a Radial Basis Function (RBF) network is used directly to compute the HMM state observation probabilities. In the third, those probabilities are calculated by recurrent neural networks (RNNs) in order to take into account context information from previously presented feature vectors. In the last system we treat recognition as a template matching problem using dynamic programming techniques: the strategy is to find the minimal distance between a continuous input feature sequence and the class templates. Preliminary experiments with our baseline systems achieved a recognition accuracy of up to 92%. All systems use input from a monocular color video camera and are user-independent, but they do not yet run in real time.
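The fourth system's dynamic-programming strategy can be sketched with dynamic time warping (DTW), a standard minimal-distance alignment between an input feature sequence and per-class templates. This is only an illustrative sketch of the general technique, not the paper's implementation; the feature dimensionality, the class labels, and the `classify` helper are assumptions made here for the example.

```python
import numpy as np

def dtw_distance(seq_a, seq_b):
    """Dynamic time warping distance between two feature sequences,
    each an array of shape (time, feature_dim)."""
    n, m = len(seq_a), len(seq_b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(seq_a[i - 1] - seq_b[j - 1])  # local frame distance
            cost[i, j] = d + min(cost[i - 1, j],        # skip a frame in seq_a
                                 cost[i, j - 1],        # skip a frame in seq_b
                                 cost[i - 1, j - 1])    # match both frames
    return cost[n, m]

def classify(sequence, templates):
    """Assign the class whose template has minimal DTW distance
    to the input sequence (templates: dict label -> sequence)."""
    return min(templates, key=lambda label: dtw_distance(sequence, templates[label]))
```

For instance, with one-dimensional "features", an input close to a rising ramp is matched to the rising template even when the sequences differ slightly in values and timing, which is the property that makes this kind of matching usable on continuous feature streams.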
