Notes on Hierarchical Splines, DCLNs and i-theory

2015

Abstract

We define an extension of classical additive splines for multivariate function approximation that we call hierarchical splines. We show that the case of hierarchical, additive, piecewise linear splines includes present-day Deep Convolutional Learning Networks (DCLNs) with linear rectifiers and pooling (sum or max). We discuss how these observations, together with i-theory, may provide a framework for a general theory of deep networks. This work was supported by the Center for Brains, Minds and Machines (CBMM), funded by NSF STC award CCF-1231216.
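To make the spline-to-rectifier correspondence concrete, a standard identity (our illustration; the paper's own notation may differ) writes a univariate piecewise linear spline with knots $b_1 < \dots < b_K$ in the truncated power basis:

$$ f(x) = c_0 + c_1 x + \sum_{i=1}^{K} a_i \,(x - b_i)_+ \,, \qquad (z)_+ = \max(0, z). $$

Each basis term $(x - b_i)_+$ is exactly the output of a linear rectifier, so a one-hidden-layer ReLU network computes an additive piecewise linear spline; composing such units in layers, interleaved with sum or max pooling, gives the hierarchical case the abstract refers to.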
