How Important is Weight Symmetry in Backpropagation?
2015, arXiv preprint
https://doi.org/10.48550/ARXIV.1510.05067

Abstract
Gradient backpropagation (BP) requires symmetric feedforward and feedback connections: the same weights must be used for the forward and backward passes. This "weight transport problem" (Grossberg 1987) is thought to be one of the main reasons to doubt BP's biological plausibility. Using 15 different classification datasets, we systematically investigate to what extent BP really depends on weight symmetry. In a study that turned out to be surprisingly similar in spirit to Lillicrap et al.'s demonstration (Lillicrap et al. 2014) but orthogonal in its results, our experiments indicate that: (1) the magnitudes of the feedback weights do not matter to performance; (2) the signs of the feedback weights do matter: the more sign-concordant the feedforward connections and their corresponding feedback connections, the better; (3) with feedback weights having random magnitudes and 100% concordant signs, we were able to achieve the same or even better performance than SGD; and (4) some normalizations/stabilizations are indispensable for such asymmetric BP to work, namely Batch Normalization (BN) (Ioffe and Szegedy 2015) and/or a "Batch Manhattan" (BM) update rule.
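To make findings (1)-(4) concrete, below is a minimal Python/NumPy sketch of sign-concordant asymmetric BP combined with the sign-only "Batch Manhattan" update. It is written under our own assumptions (a toy two-layer ReLU network, a squared loss, illustrative sizes) and is not the paper's implementation, which used MatConvNet; it also omits Batch Normalization. All names here (`forward`, `asymmetric_backward`, `batch_manhattan`, `V2`) are ours.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden, n_out, batch = 20, 32, 5, 64

W1 = rng.normal(scale=0.1, size=(n_hidden, n_in))
W2 = rng.normal(scale=0.1, size=(n_out, n_hidden))

# Feedback weights for the hidden layer: random magnitudes,
# but signs copied from the forward weights (100% concordant).
V2 = np.sign(W2) * np.abs(rng.normal(scale=0.1, size=W2.shape))

def forward(x):
    h = np.maximum(0.0, W1 @ x)  # ReLU hidden activations
    return h, W2 @ h             # hidden, output

def asymmetric_backward(x, h, y, t):
    e = y - t                    # output error (squared loss)
    g_W2 = e @ h.T
    # The asymmetric step: the error flows back through V2
    # instead of W2.T, so exact weight transport is avoided.
    delta_h = (V2.T @ e) * (h > 0.0)
    g_W1 = delta_h @ x.T
    return g_W1, g_W2

def batch_manhattan(W, g, lr=1e-3):
    # "Batch Manhattan": keep only the sign of the batch gradient.
    return W - lr * np.sign(g)

# One toy update on a random batch.
x = rng.normal(size=(n_in, batch))
t = rng.normal(size=(n_out, batch))
h, y = forward(x)
g_W1, g_W2 = asymmetric_backward(x, h, y, t)
W1 = batch_manhattan(W1, g_W1)
W2 = batch_manhattan(W2, g_W2)
# Re-copying signs after each update keeps the feedback
# sign-concordant as the forward weights change.
V2 = np.sign(W2) * np.abs(V2)
```

Dropping the final sign-copying line would leave fixed random feedback in the spirit of Lillicrap et al.; finding (2) is that keeping the feedback signs concordant with the forward weights, as above, is what matters for performance.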
References
- [Abdel-Hamid et al. 2012] Abdel-Hamid, O.; Mohamed, A.; Jiang, H.; and Penn, G. 2012. Applying convolutional neural networks concepts to hybrid NN-HMM model for speech recognition. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 4277-4280.
- [Bengio et al. 2015] Bengio, Y.; Lee, D.-H.; Bornschein, J.; and Lin, Z. 2015. Towards biologically plausible deep learning. arXiv preprint arXiv:1502.04156.
- [Bengio 2014] Bengio, Y. 2014. How auto-encoders could provide credit assignment in deep networks via target propagation. arXiv preprint arXiv:1407.7906.
- [Chinta and Tweed 2012] Chinta, L. V., and Tweed, D. B. 2012. Adaptive optimal control without weight transport. Neural Computation 24(6):1487-1518.
- [Coates, Ng, and Lee 2011] Coates, A.; Ng, A. Y.; and Lee, H. 2011. An analysis of single-layer networks in unsupervised feature learning. In International Conference on Artificial Intelligence and Statistics, 215-223.
- [Crick 1989] Crick, F. 1989. The recent excitement about neural networks. Nature 337(6203):129-132.
- [Fanello et al. 2013] Fanello, S. R.; Ciliberto, C.; Santoro, M.; Natale, L.; Metta, G.; Rosasco, L.; and Odone, F. 2013. iCub World: Friendly robots help building good vision data-sets. In Computer Vision and Pattern Recognition Workshops (CVPRW), 2013 IEEE Conference on, 700-705. IEEE.
- [Fei-Fei, Fergus, and Perona 2007] Fei-Fei, L.; Fergus, R.; and Perona, P. 2007. Learning generative visual models from few training examples: An incremental Bayesian approach tested on 101 object categories. Computer Vision and Image Understanding 106(1):59-70.
- [Garofolo et al.] Garofolo, J.; Lamel, L.; Fisher, W.; Fiscus, J.; Pallett, D.; Dahlgren, N.; and Zue, V. TIMIT acoustic-phonetic continuous speech corpus.
- [Graves, Wayne, and Danihelka 2014] Graves, A.; Wayne, G.; and Danihelka, I. 2014. Neural Turing machines. arXiv preprint arXiv:1410.5401.
- [Griffin, Holub, and Perona 2007] Griffin, G.; Holub, A.; and Perona, P. 2007. Caltech-256 object category dataset.
- [Grossberg 1987] Grossberg, S. 1987. Competitive learning: From interactive activation to adaptive resonance. Cognitive Science 11(1):23-63.
- [Hinton and McClelland 1988] Hinton, G. E., and McClelland, J. L. 1988. Learning representations by recirculation. In Neural Information Processing Systems, 358-366. New York: American Institute of Physics.
- [Hinton and Salakhutdinov 2006] Hinton, G. E., and Salakhutdinov, R. R. 2006. Reducing the dimensionality of data with neural networks. Science 313(5786):504-507.
- [Hinton et al. 2012] Hinton, G.; Deng, L.; Yu, D.; Dahl, G. E.; Mohamed, A.-r.; Jaitly, N.; Senior, A.; Vanhoucke, V.; Nguyen, P.; Sainath, T. N.; et al. 2012. Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Processing Magazine 29(6):82-97.
- [Hornik, Stinchcombe, and White 1989] Hornik, K.; Stinchcombe, M.; and White, H. 1989. Multilayer feedforward networks are universal approximators. Neural Networks 2(5):359-366.
- [Huang et al. 2008] Huang, G. B.; Mattar, M.; Berg, T.; and Learned-Miller, E. 2008. Labeled faces in the wild: A database for studying face recognition in unconstrained environments. In Workshop on Faces in Real-Life Images: Detection, Alignment and Recognition (ECCV).
- [Ioffe and Szegedy 2015] Ioffe, S., and Szegedy, C. 2015. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167.
- [Krizhevsky, Sutskever, and Hinton 2012] Krizhevsky, A.; Sutskever, I.; and Hinton, G. 2012. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems.
- [Krizhevsky 2009] Krizhevsky, A. 2009. Learning multiple layers of features from tiny images.
- [Le Cun 1986] Le Cun, Y. 1986. Learning process in an asymmetric threshold network. In Disordered Systems and Biological Organization. Springer. 233-240.
- [LeCun, Cortes, and Burges] LeCun, Y.; Cortes, C.; and Burges, C. J. The MNIST database.
- [Leibo, Liao, and Poggio 2014] Leibo, J. Z.; Liao, Q.; and Poggio, T. 2014. Subtasks of unconstrained face recognition. In International Joint Conference on Computer Vision, Imaging and Computer Graphics, VISIGRAPP.
- [Lillicrap et al. 2014] Lillicrap, T. P.; Cownden, D.; Tweed, D. B.; and Akerman, C. J. 2014. Random feedback weights support learning in deep neural networks. arXiv preprint arXiv:1411.0247.
- [Mazzoni, Andersen, and Jordan 1991] Mazzoni, P.; Andersen, R. A.; and Jordan, M. I. 1991. A more biologically plausible learning rule for neural networks. Proceedings of the National Academy of Sciences 88(10):4433-4437.
- [Mikolov et al. 2013] Mikolov, T.; Sutskever, I.; Chen, K.; Corrado, G. S.; and Dean, J. 2013. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems (NIPS), 3111-3119.
- [Netzer et al. 2011] Netzer, Y.; Wang, T.; Coates, A.; Bissacco, A.; Wu, B.; and Ng, A. Y. 2011. Reading digits in natural images with unsupervised feature learning. In NIPS Workshop on Deep Learning and Unsupervised Feature Learning, volume 2011, 5.
- [Nilsback and Zisserman 2006] Nilsback, M.-E., and Zisserman, A. 2006. A visual vocabulary for flower classification. In Computer Vision and Pattern Recognition, 2006 IEEE Computer Society Conference on. IEEE.
- [Nilsback and Zisserman 2008] Nilsback, M.-E., and Zisserman, A. 2008. Automated flower classification over a large number of classes. In Computer Vision, Graphics & Image Processing, 2008. ICVGIP'08. Sixth Indian Conference on. IEEE.
- [O'Reilly 1996] O'Reilly, R. C. 1996. Biologically plausible error-driven learning using local activation differences: The generalized recirculation algorithm. Neural Computation 8(5):895-938.
- [Pinto et al. 2011] Pinto, N.; Stone, Z.; Zickler, T.; and Cox, D. 2011. Scaling up biologically-inspired computer vision: A case study in unconstrained face recognition on Facebook. In Computer Vision and Pattern Recognition Workshops (CVPRW), 2011 IEEE Computer Society Conference on, 35-42. IEEE.
- [Quattoni and Torralba 2009] Quattoni, A., and Torralba, A. 2009. Recognizing indoor scenes. In Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on, 413-420. IEEE.
- [Riedmiller and Braun 1993] Riedmiller, M., and Braun, H. 1993. A direct adaptive method for faster backpropagation learning: The RPROP algorithm. In Neural Networks, 1993., IEEE International Conference on, 586-591. IEEE.
- [Rumelhart, Hinton, and Williams 1988] Rumelhart, D. E.; Hinton, G. E.; and Williams, R. J. 1988. Learning representations by back-propagating errors. Cognitive Modeling.
- [Smolensky 1986] Smolensky, P. 1986. Information processing in dynamical systems: Foundations of harmony theory.
- [Stellwagen and Malenka 2006] Stellwagen, D., and Malenka, R. C. 2006. Synaptic scaling mediated by glial TNF-α. Nature 440(7087):1054-1059.
- [Taigman et al. 2014] Taigman, Y.; Yang, M.; Ranzato, M.; and Wolf, L. 2014. DeepFace: Closing the gap to human-level performance in face verification. In Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on, 1701-1708. IEEE.
- [Turrigiano and Nelson 2004] Turrigiano, G. G., and Nelson, S. B. 2004. Homeostatic plasticity in the developing nervous system. Nature Reviews Neuroscience.
- [Turrigiano 2008] Turrigiano, G. G. 2008. The self-tuning neuron: Synaptic scaling of excitatory synapses. Cell 135(3):422-435.
- [Vedaldi and Lenc 2015] Vedaldi, A., and Lenc, K. 2015. MatConvNet: Convolutional neural networks for MATLAB.
- [Zamanidoost et al. 2015] Zamanidoost, E.; Bayat, F. M.; Strukov, D.; and Kataeva, I. 2015. Manhattan rule training for memristive crossbar circuit pattern classifiers.