Toward true 3D object recognition

2004

Abstract

This paper addresses the problem of recognizing three-dimensional (3D) objects in photographs and image sequences. It revisits viewpoint invariants as a local representation of shape and appearance, and proposes a unified framework for object recognition where object models consist of a collection of small (planar) patches, their invariants, and a description of their 3D spatial relationship. This approach is applied to two fundamental instances of the 3D object recognition problem: (1) modeling rigid 3D objects from a small set of unregistered pictures and recognizing them in cluttered photographs taken from unconstrained viewpoints, and (2) recognizing non-uniform texture patterns despite appearance variations due to non-rigid transformations and changes in viewpoint. It is validated through several experiments, and extensions to the analysis of video sequences and the recognition of object categories are briefly discussed.
