On Measuring Excess Capacity in Neural Networks
2022
Abstract
We study the excess capacity of deep networks in the context of supervised classification. That is, given a capacity measure of the underlying hypothesis class – in our case, Rademacher complexity – how much can we (a priori) constrain this class while maintaining an empirical error comparable to the unconstrained setting? To assess excess capacity in modern architectures, we first extend an existing generalization bound to accommodate function composition and addition, as well as the specific structure of convolutions. This, in turn, facilitates studying residual networks through the lens of the accompanying capacity measure. The key quantities driving this measure are the Lipschitz constants of the layers and the (2,1) group norm distance of the convolution weights to their initializations. We show (1) that these quantities can be kept surprisingly small and (2) that, since excess capacity unexpectedly increases with task difficulty, this points towards an unnecessarily large capacity of unco...
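To make the two capacity-driving quantities concrete, the sketch below shows one way they could be computed for a single convolution layer in PyTorch. It is a minimal sketch under explicit assumptions: the (2,1) group norm is taken per output filter, and the layer's Lipschitz constant is approximated by power iteration on the reshaped kernel matrix, a common proxy rather than the exact operator norm of the convolution (which also depends on input size and stride). Function names and the usage snippet are hypothetical, not the paper's implementation.

```python
import torch
import torch.nn as nn

def group_norm_21_distance(weight: torch.Tensor, weight_init: torch.Tensor) -> torch.Tensor:
    """(2,1) group norm of (weight - weight_init), grouped per output filter.

    Assumption: sum over output channels of the L2 norm of each flattened
    filter difference; the exact grouping used in the paper's bound may differ.
    """
    diff = (weight - weight_init).flatten(start_dim=1)  # (out_channels, in_channels*k*k)
    return diff.norm(dim=1).sum()

def lipschitz_proxy(conv: nn.Conv2d, n_iter: int = 50) -> float:
    """Power iteration on the reshaped kernel matrix.

    Estimates the spectral norm of the (out_channels x in_channels*k*k) matrix,
    a widely used proxy (cf. spectral normalization) for the convolution's
    Lipschitz constant; the true operator norm depends on input size and stride.
    """
    w = conv.weight.detach().flatten(start_dim=1)
    v = torch.randn(w.shape[1])
    for _ in range(n_iter):
        u = w @ v
        u = u / (u.norm() + 1e-12)
        v = w.t() @ u
        v = v / (v.norm() + 1e-12)
    return float(u @ (w @ v))

# Hypothetical usage: snapshot the weights at initialization, train, then measure.
conv = nn.Conv2d(3, 16, kernel_size=3)
w0 = conv.weight.detach().clone()
# ... training would update conv.weight here ...
print(group_norm_21_distance(conv.weight.detach(), w0).item(), lipschitz_proxy(conv))
```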