Backpropagation through time: what it does and how to do it

1990, Proceedings of the IEEE

https://doi.org/10.1109/5.58337

Abstract

Backpropagation is now the most widely used tool in the field of artificial neural networks. At the core of backpropagation is a method for calculating derivatives exactly and efficiently in any large system made up of elementary subsystems or calculations which are represented by known, differentiable functions; thus, backpropagation has many applications which do not involve neural networks as such. This paper first reviews basic backpropagation, a simple method which is now being widely used in areas like pattern recognition and fault diagnosis. Next, it presents the basic equations for backpropagation through time, and discusses applications to areas like pattern recognition involving dynamic systems, systems identification, and control. Finally, it describes further extensions of this method, to deal with systems other than neural networks, systems involving simultaneous equations or true recurrent networks, and other practical issues which arise with this method. Pseudocode is provided to clarify the algorithms. The chain rule for ordered derivatives, the theorem which underlies backpropagation, is briefly discussed.
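The paper's own pseudocode is not reproduced on this page, but the idea can be illustrated compactly. The sketch below is a minimal NumPy illustration of backpropagation through time for a single-layer recurrent network, not the paper's pseudocode: the tanh activation, squared-error loss, and all variable names are assumptions made here for the example. The backward loop accumulates ordered derivatives, of roughly the form ∂⁺F/∂x_i = ∂F/∂x_i + Σ_{j>i} (∂⁺F/∂x_j)(∂x_j/∂x_i), by differentiating the forward calculations in reverse order.

```python
import numpy as np

def bptt_gradients(Wx, Wh, Wy, xs, ys):
    """Gradients of total squared error over a sequence, by BPTT.

    Wx, Wh, Wy: input-to-hidden, hidden-to-hidden, hidden-to-output weights.
    xs, ys: lists of input and target vectors, one pair per time step.
    (Illustrative sketch; names and model choices are assumptions.)
    """
    T = len(xs)
    hs = {-1: np.zeros(Wh.shape[0])}   # hidden states; hs[-1] is the initial state
    outs = {}
    # Forward pass: unroll the recurrent network over all T time steps.
    for t in range(T):
        hs[t] = np.tanh(Wx @ xs[t] + Wh @ hs[t - 1])
        outs[t] = Wy @ hs[t]
    dWx, dWh, dWy = np.zeros_like(Wx), np.zeros_like(Wh), np.zeros_like(Wy)
    dh_next = np.zeros_like(hs[0])     # ordered derivative flowing back from step t+1
    # Backward pass: the same calculations, differentiated in reverse time order.
    for t in reversed(range(T)):
        dout = outs[t] - ys[t]          # d(0.5*||out - y||^2) / d(out)
        dWy += np.outer(dout, hs[t])
        dh = Wy.T @ dout + dh_next      # direct effect at t plus effect via later steps
        dpre = (1.0 - hs[t] ** 2) * dh  # back through the tanh nonlinearity
        dWx += np.outer(dpre, xs[t])
        dWh += np.outer(dpre, hs[t - 1])
        dh_next = Wh.T @ dpre           # carry the ordered derivative to step t-1
    return dWx, dWh, dWy
```

A training step would then subtract a small learning rate times each returned gradient from the corresponding weight matrix; checking the gradients against finite differences is an easy way to validate a sketch like this.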

About the Author

Paul J. Werbos received degrees from Harvard University, Cambridge, MA, and the London School of Economics, London, England, that emphasized mathematical physics, international political economy, and economics. He developed backpropagation for the Ph.D. degree in applied mathematics at Harvard. He is currently Program Director for Neuroengineering and Emerging Technology Initiation at the National Science Foundation (NSF) and Secretary of the International Neural Network Society. While an Assistant Professor at the University of Maryland, he developed advanced adaptive critic designs for neurocontrol. Before joining the NSF in 1989, he worked nine years at the Energy Information Administration (EIA) of DOE, where he variously held lead responsibility for evaluating long-range forecasts (under Charles Smith) and for building models of industrial, transportation, and commercial demand, and of natural gas supply, using backpropagation as one among several methods. In previous years, he was Regional Director and Washington representative of the L-5 Society, a predecessor to the National Space Society, and an organizer of the Global Futures Roundtable. He has worked on occasion with the National Space Society, the Global Tomorrow Coalition, the Stanford Energy Modeling Forum, and Adelphi Friends Meeting. He also retains an active interest in fuel cells for transportation and in the foundations of physics.