Spectral complexity of deep neural networks
2024, arXiv (Cornell University)
https://doi.org/10.48550/ARXIV.2405.09541

Abstract
It is well known that randomly initialized, push-forward, fully-connected neural networks converge weakly to isotropic Gaussian processes in the limit where the width of all layers goes to infinity. In this paper, we propose to use the angular power spectrum of the limiting fields to characterize the complexity of the network architecture. In particular, we define sequences of random variables associated with the angular power spectrum, and provide a full characterization of the network complexity in terms of the asymptotic distribution of these sequences as the depth diverges. On this basis, we classify neural networks as low-disorder, sparse, or high-disorder; we show how this classification highlights a number of distinct features for standard activation functions, and in particular the sparsity properties of ReLU networks. Our theoretical results are also validated by numerical simulations.
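To make the quantities in the abstract concrete, the following minimal sketch numerically estimates the angular power spectrum of the limiting field for a deep ReLU network with inputs on the sphere S². It assumes bias-free layers with He-type initialization, so that the infinite-width covariance is the iterated arc-cosine kernel of degree one (Cho and Saul); the depth parameter, truncation level `lmax`, and helper names are illustrative choices, not taken from the paper.

```python
import numpy as np
from numpy.polynomial.legendre import leggauss
from scipy.special import eval_legendre


def relu_dual(rho):
    """Normalized dual of ReLU (arc-cosine kernel of degree one):
    maps an input correlation rho to the post-activation correlation,
    assuming He-type scaling so that unit variance is preserved."""
    rho = np.clip(rho, -1.0, 1.0)
    return (np.sqrt(1.0 - rho**2) + rho * (np.pi - np.arccos(rho))) / np.pi


def limiting_covariance(t, depth):
    """Covariance K_L(t) of the infinite-width limit at the given depth,
    as a function of the inner product t = <x, y> of two points on S^2
    (inputs on the unit sphere, bias-free layers)."""
    rho = np.asarray(t, dtype=float)
    for _ in range(depth):
        rho = relu_dual(rho)
    return rho


def angular_power_spectrum(depth, lmax, n_quad=512):
    """Project K_L onto Legendre polynomials via Gauss-Legendre quadrature.
    By the addition formula on S^2,  C_ell = 2*pi * int_{-1}^{1} K_L(t) P_ell(t) dt."""
    nodes, weights = leggauss(n_quad)
    K = limiting_covariance(nodes, depth)
    return np.array([
        2.0 * np.pi * np.sum(weights * K * eval_legendre(ell, nodes))
        for ell in range(lmax + 1)
    ])


if __name__ == "__main__":
    for depth in (1, 5, 20):
        cl = angular_power_spectrum(depth, lmax=10)
        print(f"depth {depth:2d}:", np.round(cl, 4))
```

Running the script prints the first few coefficients C_ℓ for increasing depth; how these coefficients behave as the depth diverges is precisely what the paper's classification into low-disorder, sparse, and high-disorder networks formalizes.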