CLASSIC: A cortex-inspired hardware accelerator
2019, Journal of Parallel and Distributed Computing
https://doi.org/10.1016/J.JPDC.2019.08.009

Abstract
This work explores the feasibility of specialized hardware implementing the Cortical Learning Algorithm (CLA) in order to fully exploit its inherent advantages. The CLA, which is inspired by the current understanding of the mammalian neocortex, is the basis of Hierarchical Temporal Memory (HTM). In contrast to other machine learning (ML) approaches, its structure is not application dependent, and it relies on fully unsupervised, continuous learning. We hypothesize that a hardware implementation will not only extend the already practical uses of these ideas to broader scenarios, but also exploit the hardware-friendly characteristics of the CLA. The proposed architecture enables a degree of scalability that is unfeasible for software solutions and fully capitalizes on one of the CLA's many advantages: very low computational requirements and optimal storage utilization. Compared to a state-of-the-art CLA software implementation, it could improve performance by four orders of magnitude and energy efficiency by up to eight orders of magnitude. Embracing the problem's complex nature, we found that the most demanding issue, from a scalability standpoint, is the massive degree of connectivity required, and we propose a packet-switched network to tackle it. The paper addresses the fundamental issues of such an approach and proposes solutions that keep the system scalable. We analyze cost and performance using well-known architectural techniques and tools. The results suggest that, even with CMOS technology and under constrained cost, it might be possible to implement a large-scale system. The proposed solutions save roughly 90% of the original communication costs when running either synthetic or realistic workloads.
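The communication saving hinges on the sparsity of CLA activity: at any time step only a small fraction of columns is active, so only that fraction needs to inject traffic into the packet-switched fabric. The Python sketch below is a toy illustration of that scaling argument only; the column count, sparsity level, and per-column fan-out are illustrative assumptions, not parameters or results from the paper.

```python
# Toy model (assumption, not from the paper): why sparse CLA activity keeps the
# traffic injected into a packet-switched fabric low. All constants below are
# illustrative placeholders, not CLASSIC design parameters.

import random

NUM_COLUMNS = 2048   # columns in one region (assumed)
SPARSITY = 0.02      # fraction of columns active per time step (typical HTM-style value)
FANOUT = 32          # destinations each active column must reach (assumed)


def active_columns(num_columns, sparsity):
    """Pick the winning columns for one time step.

    A random draw stands in for the spatial pooler's k-winners-take-all
    inhibition; only the size of the winner set matters for this argument.
    """
    k = max(1, int(num_columns * sparsity))
    return random.sample(range(num_columns), k)


def packets_per_step(winners, fanout):
    """Only active columns inject packets, so traffic scales with the winner
    set rather than with the total number of columns."""
    return len(winners) * fanout


winners = active_columns(NUM_COLUMNS, SPARSITY)
sparse_traffic = packets_per_step(winners, FANOUT)
dense_traffic = NUM_COLUMNS * FANOUT
print(f"{len(winners)} active columns inject {sparse_traffic} packets "
      f"({sparse_traffic / dense_traffic:.1%} of a dense broadcast)")
```

With these placeholder numbers the injected load is about 2% of what a dense broadcast over the same fan-out would require; that inherent sparsity is the property the proposed interconnect is designed to exploit.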
References (57)
- P. Abad, P. Prieto, L. G. Menezo, A. Colaso, V. Puente, and J.-Á. Gregorio, "TOPAZ: An Open-Source Interconnection Network Simulator for Chip Multiprocessors and Supercomputers," in 2012 IEEE/ACM Sixth International Symposium on Networks-on-Chip, 2012, pp. 99-106.
- P. Abad, V. Puente, and J.-A. Gregorio, "LIGERO: A light but efficient router conceived for cache-coherent chip multiprocessors," ACM Transactions on Architecture and Code Optimization, vol. 9, no. 4, pp. 1-21, Jan. 2013.
- S. Ahmad and J. Hawkins, "Properties of Sparse Distributed Representations and their Application to Hierarchical Temporal Memory," Mar. 2015.
- A. Agarwal et al., "An Introduction to Computational Networks and the Computational Network Toolkit," 2016.
- C. T. Anderson, P. L. Sheets, T. Kiritani, and G. M. G. Shepherd, "Sublayer-specific microcircuits of corticospinal and corticostriatal neurons in motor cortex.," Nature neuroscience, vol. 13, no. 6, pp. 739-44, Jun. 2010.
- S. Bahrampour, N. Ramakrishnan, L. Schott, and M. Shah, "Comparative Study of Caffe, Neon, Theano, and Torch for Deep Learning," Nov. 2015.
- Y. Bengio, P. Simard, and P. Frasconi, "Learning long-term dependencies with gradient descent is difficult," IEEE Transactions on Neural Networks, vol. 5, no. 2, pp. 157-166, Mar. 1994.
- S. Billaudelle and S. Ahmad, "Porting HTM Models to the Heidelberg Neuromorphic Computing Platform," pp. 1-10, 2015.
- B. H. Bloom, "Space/time trade-offs in hash coding with allowable errors," Communications of the ACM, vol. 13, no. 7, pp. 422-426, Jul. 1970.
- F. Briggs, "Organizing principles of cortical layer 6.," Frontiers in neural circuits, vol. 4, p. 3, Jan. 2010.
- R. M. Bruno and D. J. Simons, "Feedforward mechanisms of excitatory and inhibitory cortical receptive fields.," The Journal of neuroscience, vol. 22, no. 24, pp. 10966-10975, 2002.
- D. P. Buxhoeveden, "The minicolumn hypothesis in neuroscience," Brain, vol. 125, no. 5, pp. 935-951, May 2002.
- A. S. Cassidy et al., "Real-Time Scalable Cortical Computing at 46 Giga-Synaptic OPS/Watt with ~100× Speedup in Time-to-Solution and ~100,000× Reduction in Energy-to-Solution," in SC14: International Conference for High Performance Computing, Networking, Storage and Analysis, 2014, pp. 27-38.
- T. Chen, Z. Du, N. Sun, J. Wang, C. Wu, Y. Chen, and O. Temam, "DianNao: A small-footprint high-throughput accelerator for ubiquitous machine-learning," in Proceedings of the 19th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '14), 2014, pp. 269-284.
- S. Chetlur, C. Woolley, P. Vandermersch, J. Cohen, J. Tran, B. Catanzaro, and E. Shelhamer, "cuDNN: Efficient Primitives for Deep Learning," Oct. 2014.
- F. Clascá, P. Rubio-Garrido, and D. Jabaudon, "Unveiling the diversity of thalamocortical neuron subtypes," European Journal of Neuroscience, vol. 35, no. 10, pp. 1524-1532, 2012.
- Y. Cui, C. Surpur, S. Ahmad, and J. Hawkins, "Continuous online sequence learning with an unsupervised neural network model," Dec. 2015.
- D. DeSieno, "Adding a conscience to competitive learning," in IEEE International Conference on Neural Networks, 1988, vol. 1, pp. 117-124.
- D. Dewey, "Artificial General Intelligence," vol. 6830, pp. 309-314, 2011.
- J. Duato, S. Yalamanchili, and L. Ni, "Interconnection Networks: An Engineering Approach," Oct. 1997.
- D. George and J. Hawkins, "Towards a mathematical theory of cortical micro-circuits.," PLoS computational biology, vol. 5, no. 10, p. e1000532, Oct. 2009.
- C. Grienberger, X. Chen, and A. Konnerth, "Dendritic function in vivo," Trends in Neurosciences, vol. 38, no. 1, pp. 45-54, Nov. 2014.
- S. Gupta, A. Agrawal, K. Gopalakrishnan, and P. Narayanan, "Deep Learning with Limited Numerical Precision," Feb. 2015.
- S. Han, X. Liu, H. Mao, J. Pu, A. Pedram, M. A. Horowitz, and W. J. Dally, "EIE: Efficient Inference Engine on Compressed Deep Neural Network," Feb. 2016.
- J. Hawkins and S. Ahmad, "Why Neurons Have Thousands of Synapses, a Theory of Sequence Memory in Neocortex," Frontiers in Neural Circuits, vol. 10, p. arXiv:1511.00083 [q-bio.NC], Mar. 2016.
- M. Henaff, A. Szlam, and Y. LeCun, "Orthogonal RNNs and Long-Memory Tasks," Feb. 2016.
- M. Horowitz, "1.1 Computing's energy problem (and what we can do about it)," in 2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2014, pp. 10-14.
- E. M. Izhikevich and F. C. Hoppensteadt, "Bursts as a unit of neural information: selective communication via resonance (uncut version)," Trends in Neurosciences, pp. 1-13, 2003.
- N. E. Jerger, L. S. Peh, and M. Lipasti, "Virtual circuit tree multicasting: A case for on-chip hardware multicast support," in 35th International Symposium on Computer Architecture (ISCA '08), 2008, pp. 229-240.
- A. Lavin and S. Ahmad, "Evaluating Real-Time Anomaly Detection Algorithms - The Numenta Anomaly Benchmark," in 2015 IEEE 14th International Conference on Machine Learning and Applications (ICMLA), 2015, pp. 38-44.
- Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278-2324, 1998.
- R. Lorente De Nó, "Studies on the structure of the cerebral cortex. II. Continuation of the study of the ammonic system.," Journal für Psychologie und Neurologie, vol. 46, pp. 113-117, 1934.
- M. Abadi et al., "TensorFlow: Large-scale machine learning on heterogeneous systems," 2015.
- J. Mnatzaganian, E. Fokoué, and D. Kudithipudi, "A Mathematical Formalization of Hierarchical Temporal Memory Cortical Learning Algorithm's Spatial Pooler," pp. 1-11, Jan. 2016.
- V. Mountcastle, "An organizing principle for cerebral function: the unit model and the distributed system," in The Mindful Brain, 1978, pp. 7-50.
- V. B. Mountcastle, "The columnar organization of the neocortex," Brain, vol. 120, pp. 701-722, 1997.
- N. Muralimanohar, R. Balasubramonian, and N. Jouppi, "Optimizing NUCA Organizations and Wiring Alternatives for Large Caches with CACTI 6.0," in 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007), 2007, pp. 3-14.
- T. Nguyen, "Total Number of Synapses in the Adult Human Neocortex," Undergraduate Journal of Mathematical Modeling: One + Two, vol. 3, no. 1, pp. 1-13, 2013.
- B. A. Olshausen and D. J. Field, "Sparse coding of sensory inputs," Current Opinion in Neurobiology, vol. 14. pp. 481-487, 2004.
- B. A. Olshausen and D. J. Field, "Emergence of simple-cell receptive field properties by learning a sparse code for natural images," Nature, vol. 381, no. 6583, pp. 607-609, Jun. 1996.
- V. Puente, "Cortexim." [Online]. Available: https://github.com/cortexsim.
- V. Puente, J. A. Gregorio, F. Vallejo, and R. Beivide, "Immunet: a cheap and robust fault-tolerant packet routing mechanism," in Proceedings. 31st Annual International Symposium on Computer Architecture, 2004., 2004, pp. 198-209.
- V. Puente, C. Izu, R. Beivide, J. A. Gregorio, F. Vallejo, and J. M. Prellezo, "The Adaptive Bubble Router," Journal of Parallel and Distributed Computing, vol. 61, no. 9, pp. 1180-1208, 2001.
- G. J. Rinkus, "Sparsey: event recognition via deep hierarchical sparse distributed codes," Frontiers in Computational Neuroscience, vol. 8, Dec. 2014.
- A. Schüz and G. Palm, "Density of neurons and synapses in the cerebral cortex of the mouse.," The Journal of comparative neurology, vol. 286, no. 4, pp. 442-455, 1989.
- S. M. Sherman, "The function of metabotropic glutamate receptors in thalamus and cortex," Neuroscientist, vol. 20, no. 2, pp. 136-149, 2014.
- W. Sin, K. Haas, E. Ruthazer, and H. Cline, "Dendrite growth increased by visual activity requires NMDA receptor and Rho GTPases," Nature, vol. 419, pp. 475-480, 2002.
- C. Sun, C. H. O. Chen, G. Kurian, L. Wei, J. Miller, A. Agarwal, L. S. Peh, and V. Stojanovic, "DSENT - A tool connecting emerging photonics with electronics for opto-electronic networks-on-chip modeling," in Proceedings of the 2012 6th IEEE/ACM International Symposium on Networks-on-Chip (NoCS 2012), 2012, pp. 201-210.
- I. Theophilou, N. N. Lathiotakis, M. A. L. Marques, and N. Helbig, "Generalized Pauli constraints in reduced density matrix functional theory," Mar. 2015.
- A. M. Thomson, "Neocortical layer 6, a review," Frontiers in Neuroanatomy, vol. 4, p. 13, Jan. 2010.
- D. C. Van Essen and H. A. Drury, "Structural and functional analyses of human cerebral cortex using a surface-based atlas," Journal of Neuroscience, vol. 17, no. 18, pp. 7079-7102, 1997.
- M. Vélez-Fort, C. V. Rousseau, C. J. Niedworok, I. R. Wickersham, E. A. Rancz, A. P. Y. Brown, M. Strom, and T. W. Margrie, "The Stimulus Selectivity and Connectivity of Layer Six Principal Cells Reveals Cortical Microcircuits Underlying Visual Processing," Neuron, vol. 83, no. 6, pp. 1431-43, Aug. 2014.
- F. D. S. Webber, "Semantic Folding Theory And its Application in Semantic Fingerprinting," Nov. 2015.
- W. W. Wilcke, "IBM Cortical Learning Center (CLC)," in NICE III Workshop, 2015, vol. 32.
- J. T. Wixted, L. R. Squire, Y. Jang, M. H. Papesh, S. D. Goldinger, J. R. Kuhn, K. A. Smith, D. M. Treiman, and P. N. Steinmetz, "Sparse and distributed coding of episodic memory in neurons of the human hippocampus.," Proceedings of the National Academy of Sciences of the United States of America, vol. 111, no. 26, pp. 9621-6, Jul. 2014.
- "NuPIC project: Numenta Platform for Intelligent Computing." [Online]. Available: numenta.org.
- "AlphaGo Wins Final Game In Match Against Champion Go Playe," IEEE Spectrum, 2016. [Online]. Available: http://spectrum.ieee.org/tech-talk/computing/networks/alphago-wins-match-against-top- go-player.