An Optimized Deep Spiking Neural Network Architecture Without Gradients
We present an end-to-end trainable modular event-driven neural architecture that uses local synaptic and threshold adaptation rules to perform transformations between arbitrary spatio-temporal spike patterns. The architecture represents a highly abstracted model of existing Spiking Neural Network (SNN) architectures. The proposed Optimized Deep Event-driven Spiking neural network Architecture (ODESA) can simultaneously learn hierarchical spatio-temporal features at multiple arbitrary time scales. ODESA performs online learning without the use of error back-propagation or the calculation of gradients. Through the use of simple local adaptive selection thresholds at each node, the network rapidly learns to appropriately allocate its neuronal resources at each layer for any given problem without using an error measure. These adaptive selection thresholds are the central feature of ODESA, ensuring network stability and remarkable robustness to noise as well as to the selection of initial system parameters. Network activations are inherently sparse due to a hard Winner-Take-All (WTA) constraint at each layer. We evaluate the architecture on existing spatio-temporal datasets, including the spike-encoded IRIS, latency-coded MNIST, Oxford spike pattern, and TIDIGITS datasets, as well as a novel set of tasks based on International Morse Code that we created. These tests demonstrate the hierarchical spatio-temporal learning capabilities of ODESA. Through these tests, we demonstrate that ODESA can optimally solve practical and highly challenging hierarchical spatio-temporal learning tasks with the minimum possible number of computing nodes.

…ered in the brain and, given our mature understanding of the behaviour of biological neural networks, such evidence is unlikely to emerge. Second, successful error back-propagation requires global, precise, and repeated propagation of error measures through all the involved computational nodes in a network, beginning from the inputs to the highest layers and back. This has been dubbed the 'weight transport problem', as the weights of higher layers have to be made available to the lower layers for successful backpropagation of error values [4]. Again, no evidence for such processes has been, or is likely to be, found. The third and most crucial pillar of error back-propagation is the differentiability requirement for all the constituent components of a given network. Indeed, it is this very aspect of the error back-propagation algorithm that makes it difficult to use on non-differentiable data domains like the spatio-temporal spike patterns which the brain uses as its primary mode of computation and communication. This leads to the biggest problem in the use of error back-propagation in computational neuroscience, namely the credit-assignment problem.

There have been multiple attempts at approximating error back-propagation and applying gradient descent to SNN architectures. SpikeProp [5] was among the first works to derive a supervised learning rule for SNNs from the error back-propagation algorithm. The Tempotron [6] applied error back-propagation by defining loss functions based on the maximum voltage and the threshold voltage of the output neurons. The Chronotron [7] was introduced as an improvement over the Tempotron by using a new distance metric between the predicted and target spike trains.
More recent works have applied error back-propagation to SNN architectures by using different surrogate gradients for the hard-thresholding activation functions of the spiking neurons [8], [9], [10], [11]. However, these works do not address how biology could realize the computation of gradients, or how the neurons involved in the computation could access them. Furthermore, there is no evidence of how batching of data could happen in biology, yet most gradient descent approaches rely on batched data. Despite the lack of bio-plausibility, error back-propagation based approaches have been adopted in computational neuroscience as useful alternative tools to discover the required connectivity in SNNs for a given specific task [12], [13], [14].

Feedback alignment has been used as an alternative to error back-propagation for SNNs in [15] to solve the 'weight transport' problem. Feedback alignment shows that multiplying errors by random synaptic weights is enough for effective error back-propagation, without requiring a precise symmetric backward connectivity pattern. There have been parallel investigations of more bio-plausible local learning rules for SNNs which do not require access to the weights of other neurons in the network. This set of learning rules for SNNs can be characterized as synaptic plasticity rules which use Spike-Timing-Dependent Plasticity (STDP) in some form. STDP rules have more commonly been used for extracting …

… paper, we propose to use these adaptive selection thresholds for multi-layer supervised learning and propose our method as a simple solution to the credit-assignment problem. Our proposed method is the first that requires neither the transport of weights across neurons in the network nor random connections for error propagation. We achieve all feedback to earlier layers using precisely timed binary attention signals which signal "reward" and "punishment" of the recently active neurons.

II. BACKGROUND AND RELATED WORK

A. TIME SURFACES

Tapson et al. [47] proposed the use of exponentially decaying kernels for processing event data produced by neuromorphic vision sensors. We shall use the terminology introduced by Lagorce et al. [45] and refer to these kernels as time surfaces. The time surface is the trace of events for each input channel, and it is updated only when an event arrives at that channel. An event from an event-based sensor can be described as:

e_i = (x_i, y_i, p_i, t_i)    (1)

Equation (1) describes an event from a pixel location (x_i, y_i), the coordinates on the sensor, with polarity p_i, arriving at time t_i. A similar notation can be used to represent a spike as an event:

s_i = (c_i, t_i)    (2)

Equation (2) represents a spike s_i from channel c_i at time t_i. We can use time surfaces to keep the trace of spikes from a spiking source. The exponential time surface S_t[i] of a channel i at time t, with a time constant of τ, can be calculated as follows:

S_t[i] = e^(-(t - t_i)/τ)    (3)

where t_i is the time of the most recent spike on channel i.
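To make the per-channel bookkeeping behind Equation (3) concrete, the following sketch maintains an exponentially decaying trace for every input channel and updates a channel only when a spike arrives on it. It is a minimal illustration rather than a reference implementation; the class and variable names (TimeSurface, t_last, tau) are illustrative only.

# A minimal sketch of the exponential time surface in Equation (3),
# assuming each channel stores only its most recent spike time.
# Names (TimeSurface, t_last, tau) are illustrative, not from the paper.
import numpy as np

class TimeSurface:
    def __init__(self, n_channels, tau):
        self.tau = tau
        self.t_last = np.full(n_channels, -np.inf)  # no spike seen yet

    def update(self, channel, t):
        # Record a spike (c_i, t_i): only the touched channel changes.
        self.t_last[channel] = t

    def read(self, t):
        # S_t[i] = exp(-(t - t_i) / tau) for every channel i.
        return np.exp(-(t - self.t_last) / self.tau)

# Example: two spikes on different channels, then read the decayed trace.
ts = TimeSurface(n_channels=4, tau=10.0)
ts.update(channel=0, t=0.0)
ts.update(channel=2, t=5.0)
print(ts.read(t=6.0))  # channel 2 has decayed less than channel 0

Reading the surface at the time of each new event yields the spatio-temporal context vector that downstream layers such as FEAST operate on.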
… 3) attempting to demonstrate biological plausibility through detailed phenomenological modelling, from voltage-gated ion channels to the delays at the neuronal synapses, tend to be limited in their performance and utility in the context of challenging machine learning tasks. The computational cost of these models increases with the bio-plausibility of the model.

Different neuronal models have been proposed which approximate and abstract the details of these complexities with easy-to-handle mathematical and probabilistic models [50], [51], [52], [53]. The Leaky Integrate-and-Fire (LIF) neuron model [49], and specifically the Spike Response Model (SRM) [54], are among the most popular choices of neuron model in SNNs, even though the degree to which they explain neuronal dynamics is limited compared to other models like the Hodgkin-Huxley [55] or Izhikevich [52] models. Their wide adoption can be attributed to their analytical tractability and computational simplicity compared to other neuronal models. But even SNN models which use simpler neuronal models like LIF or Adaptive Leaky Integrate-and-Fire (ALIF) neurons [56] require additional complexities, such as excitatory-inhibitory (EI) balance and the right amount of lateral excitation and inhibition, to instil behaviours like WTA. These complex processes make it difficult to scale up simulations of multi-layered SNNs and limit the exploration of broader system-level learning mechanisms in SNNs, as there are many variables in the system. In the same way that time surfaces represent simplified, hardware-friendly abstractions of the EPSP, the FEAST network can best be understood as a highly abstracted, functionally equivalent, modular implementation of a well-balanced excitatory SNN with inhibitory feedback leading to a winner-take-all operation at a single layer. In this way, a FEAST layer represents a neuron group. Picking only one winner in each FEAST layer for any input event is a proxy for the hard WTA motif in a neuron group, without requiring any form of inhibition. Simpler and computationally easier abstract SNN models like FEAST can help us explore more system-level learning rules in SNNs without having to worry about problems like achieving EI balance and promoting or removing oscillations in the networks. Just as Address-Event Representations (AER) are used in neuromorphic hardware to facilitate communication in SNNs, we can use novel abstractions like FEAST to explore the space of local learning rules in spiking neural architectures. Continuing and extending this approach, the Optimized Deep Event-driven Spiking neural network Architecture (ODESA) introduced in this paper represents a method to locally train hierarchies of …

… "1", the network also has to learn the position of the same symbol in the sequence. We used a two-layered ODESA network: one layer to learn the symbols (0 and 1), and a second layer, which is the output layer, to learn the sequence of the symbols. The output layer has two neurons for the two classes. The ODESA network can learn an intermediate representation of the symbol "1" without relying on an explicit supervisory signal for symbol "1" due to the local …
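As a rough sketch of the hard-WTA behaviour described above, in which a FEAST-style layer picks a single winner per input event using adaptive selection thresholds and no explicit inhibition, the following code selects the best-matching neuron for an event context and adapts its threshold and weights. The cosine-similarity matching, the update constants (eta, thr_open, thr_close), and the class name FeastLayer are assumptions for illustration only, not the exact ODESA update rules.

# Illustrative FEAST-style hard-WTA layer: one winner per event, chosen
# via adaptive selection thresholds rather than lateral inhibition.
# All constants and the matching rule below are assumptions, not the
# published ODESA parameters.
import numpy as np

class FeastLayer:
    def __init__(self, n_inputs, n_neurons, seed=0):
        rng = np.random.default_rng(seed)
        w = rng.random((n_neurons, n_inputs))
        self.w = w / np.linalg.norm(w, axis=1, keepdims=True)  # unit-norm feature weights
        self.thresh = np.zeros(n_neurons)                      # adaptive selection thresholds

    def __call__(self, context, eta=0.01, thr_open=0.002, thr_close=0.02):
        # 'context' is the time-surface vector read out at the event time.
        context = context / (np.linalg.norm(context) + 1e-12)
        similarity = self.w @ context
        eligible = similarity >= self.thresh
        if not eligible.any():
            # No neuron matched this event: relax every threshold slightly.
            self.thresh -= thr_open
            return None
        winner = int(np.argmax(np.where(eligible, similarity, -np.inf)))
        # Pull the winner's weights toward the input and tighten its threshold.
        self.w[winner] = (1.0 - eta) * self.w[winner] + eta * context
        self.w[winner] /= np.linalg.norm(self.w[winner])
        self.thresh[winner] += thr_close
        return winner

A two-layered network such as the one used for the Morse-code task could, in principle, be assembled by treating the winner index of the first layer as a spike on a new channel, maintaining a second time surface over those channels at a longer time constant, and feeding it to an output layer of two neurons; the supervised "reward" and "punishment" attention signals that ODESA adds on top of this are not shown here.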