Deep Convolutional Networks are Hierarchical Kernel Machines

Abstract

In i-theory a typical layer of a hierarchical architecture consists of HW modules pooling the dot products of the inputs to the layer with the transformations of a few templates under a group. Such layers include as special cases the convolutional layers of Deep Convolutional Networks (DCNs) as well as the non-convolutional layers (when the group contains only the identity). Rectifying nonlinearities, which are used by present-day DCNs, are one of several nonlinearities admitted by i-theory for the HW module. We discuss here the equivalence between group averages of linear combinations of rectifying nonlinearities and an associated kernel. This property implies that present-day DCNs can be exactly equivalent to a hierarchy of kernel machines with pooling and non-pooling layers. Finally, we describe a conjecture for theoretically understanding hierarchies of such modules. A main consequence of the conjecture is that hierarchies of trained HW modules minimize memory requirements.
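To make the HW-module computation described above concrete, the following NumPy sketch implements one reading of it: dot products of the input with group-transformed copies of a template (the "simple cell" stage), followed by rectification and average pooling over the group (the "complex cell" stage). This is an illustration under our own naming, not code from the paper; `hw_module`, the rotation group used in the example, and the threshold values are all assumptions chosen for clarity. With a trivial group containing only the identity, the pooling disappears and the module reduces to a rectified dot product, matching the non-convolutional case mentioned in the abstract.

```python
import numpy as np

def hw_module(x, template, transforms, thresholds):
    """Pooled output of one HW (Hubel-Wiesel) module (illustrative sketch).

    x          : input vector, shape (d,)
    template   : a single template vector t, shape (d,)
    transforms : iterable of group elements g, each a (d, d) matrix acting
                 on the template; [identity] gives a non-pooling layer
    thresholds : rectifier offsets b; one pooled output per threshold
    """
    # "Simple cell" stage: dot products <x, g t> for every g in the group.
    s = np.array([x @ (g @ template) for g in transforms])
    # "Complex cell" stage: rectify and average-pool over the group,
    # producing one pooled value per rectifier threshold b.
    return np.array([np.mean(np.maximum(s - b, 0.0)) for b in thresholds])

# Example: pooling over a small group of 2D rotations. The group choice is
# an assumption for illustration; in DCNs the group is typically translations.
def rot(a):
    return np.array([[np.cos(a), -np.sin(a)],
                     [np.sin(a),  np.cos(a)]])

group = [rot(2 * np.pi * k / 8) for k in range(8)]
x = np.array([1.0, 0.3])
t = np.array([0.5, -0.2])
print(hw_module(x, t, group, thresholds=[0.0, 0.1, 0.2]))
```

Because the pooled value averages over all group-transformed templates, applying any g in the group to the input permutes the terms of the average without changing it, which is the invariance property the abstract attributes to such layers.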
