Symmetry of backpropagation and chain rule

2002, Proceedings of the 2002 International Joint Conference on Neural Networks. IJCNN'02 (Cat. No.02CH37290)

https://doi.org/10.1109/IJCNN.2002.1005528

Abstract

Gradient backpropagation, viewed as a method of computing derivatives of composite functions, is commonly regarded as a version of the chain rule. We show that this is not quite true: the two methods are, in a sense, opposites. While the chain rule requires derivatives with respect to all variables that influence a given intermediate variable, backpropagation calls for the derivatives, with respect to the present variable, of all variables that it influences. With this in mind, the derivation of the gradient even for complicated neural networks becomes almost trivial. In matrix form, the two methods differ only in the order of matrix multiplication. Use of the chain rule is almost automatic because we all know it from courses in mathematical analysis; use of backpropagation could be just as automatic if it were introduced in university mathematics education as an equivalent, alternative way of computing derivatives of composite functions.
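
The following is a minimal sketch, not taken from the paper, that illustrates the point about the order of matrix multiplication. It uses a hypothetical two-layer map x -> h = tanh(W1 x) -> y = W2 h with loss L = ||y||^2 / 2; the chain-rule (forward) order associates the Jacobian product from the input side, while the backpropagation (reverse) order associates it from the output side, and both give the same gradient.

```python
# Sketch only: toy example contrasting forward (chain-rule) and
# reverse (backpropagation) accumulation of the same Jacobian product.
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((4, 3))   # hypothetical first-layer weights
W2 = rng.standard_normal((2, 4))   # hypothetical second-layer weights
x = rng.standard_normal(3)

# Forward pass, keeping the intermediate values.
a = W1 @ x
h = np.tanh(a)
y = W2 @ h                          # L = 0.5 * ||y||^2

# Local Jacobians of each stage, evaluated at this point.
J1 = np.diag(1.0 - np.tanh(a) ** 2) @ W1   # dh/dx, shape (4, 3)
J2 = W2                                    # dy/dh, shape (2, 4)
JL = y.reshape(1, -1)                      # dL/dy, shape (1, 2)

# Chain-rule order: accumulate from the input toward the output.
grad_forward = JL @ (J2 @ J1)

# Backpropagation order: accumulate from the output toward the input.
grad_backward = (JL @ J2) @ J1

# Same gradient dL/dx either way; only the multiplication order differs.
assert np.allclose(grad_forward, grad_backward)
print(grad_forward.ravel())
```

For a scalar loss, the reverse order keeps every intermediate product a row vector, which is why backpropagation is the cheaper of the two orderings for network training.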
