Data Symmetries and Learning in Fully Connected Neural Networks
IEEE Access
https://doi.org/10.1109/ACCESS.2023.3274938

Abstract
How symmetries in the data constrain the learned weights of modern deep networks is still an open problem. In this work, we study the simple case of fully connected, shallow, non-linear neural networks and consider two types of symmetries: full dataset symmetries, where the dataset X is mapped into itself by any transformation g, i.e. gX = X, and single data point symmetries, where gx = x for x ∈ X. We prove, and experimentally confirm, that symmetries in the data are directly inherited at the level of the network's learned weights, and we relate these findings to the common practice of data augmentation in modern machine learning. Finally, we show that symmetry constraints have a profound impact on the spectrum of the learned weights, an aspect of the network's so-called implicit bias.

INDEX TERMS Artificial neural networks, symmetry, invariance, equivariance.
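To make the two symmetry notions in the abstract concrete, the following minimal numerical sketch (an illustrative assumption, not the paper's actual experiment) builds a dataset closed under cyclic shifts, so gX = X for every shift g, trains a shallow fully connected ReLU network by plain gradient descent, and then probes two things: whether the learned function is approximately invariant along group orbits, and whether a weight-level quantity, the input-space Gram matrix W^T W, is approximately circulant, i.e. inherits the symmetry. All architecture choices and hyperparameters below are hypothetical.

```python
# Minimal sketch, assuming a cyclic-shift symmetry group and a shallow
# ReLU network f(x) = a^T relu(W x) without biases. Not the paper's setup;
# all names and hyperparameters are illustrative.
import numpy as np

rng = np.random.default_rng(0)
d, n_hidden, n_base = 8, 64, 200

def shift(x, s):
    """The group action g: a cyclic shift of the input coordinates."""
    return np.roll(x, s, axis=-1)

# Base points plus their full shift orbits, so the dataset satisfies gX = X.
X_base = rng.normal(size=(n_base, d))
X = np.concatenate([shift(X_base, s) for s in range(d)])
# A shift-invariant regression target, so the labels respect the symmetry too.
y = np.sin(np.linalg.norm(X, axis=1))

# Shallow net trained with full-batch gradient descent on the squared loss.
W = rng.normal(size=(n_hidden, d)) / np.sqrt(d)
a = rng.normal(size=n_hidden) / np.sqrt(n_hidden)
lr = 1e-2
for _ in range(2000):
    H = np.maximum(X @ W.T, 0.0)                      # hidden activations
    r = H @ a - y                                     # residuals
    ga = H.T @ r / len(X)                             # gradient w.r.t. a
    gW = ((r[:, None] * a) * (H > 0)).T @ X / len(X)  # gradient w.r.t. W
    a -= lr * ga
    W -= lr * gW

f = lambda Z: np.maximum(Z @ W.T, 0.0) @ a

# Diagnostic 1: output invariance, max |f(gx) - f(x)| over the shift orbit.
orbit_dev = max(np.abs(f(shift(X_base, s)) - f(X_base)).max() for s in range(d))
print("max |f(gx) - f(x)| over shifts:", orbit_dev)

# Diagnostic 2: a weight-level symmetry check. For shift-symmetric data one
# expects W^T W to be close to its average over conjugation by all shifts,
# i.e. close to a circulant matrix.
G = W.T @ W
G_circ = np.mean([np.roll(np.roll(G, s, 0), s, 1) for s in range(d)], axis=0)
print("relative distance of W^T W from its circulant average:",
      np.linalg.norm(G - G_circ) / np.linalg.norm(G))
```

Training the same network on X_base alone (i.e. without the shift augmentation) and rerunning both diagnostics gives a simple baseline for how much of the observed symmetry is enforced by the data rather than by the architecture.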