A Review of the Optimal Design of Neural Networks Based on FPGA
Applied Sciences
https://doi.org/10.3390/APP122110771

Abstract
Deep learning based on neural networks has been widely applied in image recognition, speech recognition, natural language processing, autonomous driving, and other fields, where it has achieved breakthrough progress. The FPGA stands out as a platform for accelerating deep learning thanks to its flexible architecture and logic units, high energy-efficiency ratio, strong compatibility, and low latency. To track the latest results in FPGA-based neural network optimization and to keep abreast of current research hotspots and application fields, this paper reviews the related technologies and research. It introduces the development history and application fields of several representative neural networks, points out the importance of studying deep learning technology, and explains the reasons for, and advantages of, using FPGAs to accelerate deep learning. Several common neural network models are introduced. Moreover, the paper reviews the current mainstream ...