Deep Convolutional Networks are Hierarchical Kernel Machines
Abstract
In i-theory, a typical layer of a hierarchical architecture consists of HW modules pooling the dot products of the inputs to the layer with the transformations of a few templates under a group. Such layers include as special cases the convolutional layers of Deep Convolutional Networks (DCNs) as well as the non-convolutional layers (when the group contains only the identity). Rectifying nonlinearities -- which are used by present-day DCNs -- are one of several nonlinearities admitted by i-theory for the HW module. We discuss here the equivalence between group averages of linear combinations of rectifying nonlinearities and an associated kernel. This property implies that present-day DCNs can be exactly equivalent to a hierarchy of kernel machines with pooling and non-pooling layers. Finally, we describe a conjecture for theoretically understanding hierarchies of such modules. A main consequence of the conjecture is that hierarchies of trained HW modules minimize memory requirements.
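As a concrete illustration (not code from the paper), the sketch below implements one HW module under simple assumptions: the group is the cyclic shift group acting on 1-D signals, a single ReLU stands in for the linear combination of rectified units, pooling is a group average, and the templates are random. The names `shift_orbit`, `hw_module`, and `induced_kernel` are illustrative. The point of the sketch is the equivalence claimed in the abstract: the pooled outputs form a feature map, and the dot product of two such feature maps defines a kernel that is invariant under the group.

```python
import numpy as np

def shift_orbit(t):
    """All circular shifts of template t: the orbit of t under the
    cyclic translation group (standing in for a generic group G)."""
    return np.stack([np.roll(t, s) for s in range(len(t))])

def hw_module(x, templates):
    """One HW module: dot products of the input with every group
    transformation of each template, a rectifying nonlinearity (ReLU,
    as in present-day DCNs), then group-average pooling.
    Returns one pooled value per template."""
    feats = []
    for t in templates:
        orbit = shift_orbit(t)                      # |G| x d matrix of g.t
        s = orbit @ x                               # dot products <x, g.t>
        feats.append(np.mean(np.maximum(s, 0.0)))   # pool rectified responses
    return np.array(feats)

def induced_kernel(x, y, templates):
    """The kernel associated with the module: the dot product of the
    pooled feature maps. By construction it is invariant to circular
    shifts of either input."""
    return hw_module(x, templates) @ hw_module(y, templates)

rng = np.random.default_rng(0)
d, n_templates = 16, 8
templates = rng.standard_normal((n_templates, d))
x = rng.standard_normal(d)

# Invariance check: shifting the input leaves the kernel value unchanged,
# because the shift only permutes the set of dot products being pooled.
print(np.isclose(induced_kernel(x, x, templates),
                 induced_kernel(np.roll(x, 3), x, templates)))
```

Stacking such modules, with the pooled outputs of one layer serving as inputs to the next, gives the hierarchy of kernel machines the abstract refers to; a non-pooling (non-convolutional) layer is recovered by restricting the group to the identity, in which case `shift_orbit` would return the template alone.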