As shown in Fig. 19, the 4 CONV/FC layers are run one after the other on the test-chip. Data is transferred back and forth between MATLAB (running on a host PC) and the test-chip via an FPGA board. Table II shows the detailed mapping of the 4 CONV/FC layers onto the CSRAM array for computing the convolutions.

Let us first consider layer C3. It has a filter size of 5 x 5, with 6 input channels and 16 output channels (i.e., 16 3-D filters). Each of the 16 3-D filters is mapped to one of the 16 local arrays in the CSRAM. Since each row in a local array has 64 bit-cells, at most 2 ($= \lfloor 64/(5 \times 5) \rfloor$) input channels fit per row. Therefore, 3 ($= \lceil 6/2 \rceil$) rows are required in each local array to hold the entire 3-D filter. In every clock cycle, 50 ($= 5 \times 5 \times 2$) $X_{in}$'s are sent through a buffer (shift registers) to the CSRAM array to compute 16 partial convolution outputs. Thus, the CSRAM array processes $50 \times 16 \times 2$ operations per clock cycle (1 MAV = 2 OPs: 1 multiply + 1 add/average).

For layer F5, the entire filter cannot fit in the CSRAM array at once, due to its limited 16 Kb size in the test-chip. Hence, the process explained above is repeated multiple times to finish all the computations. However, this limitation can easily be alleviated by operating multiple CSRAM arrays in parallel, so that all the filter weights fit on-chip together.

Fig. 19. Test setup for automatically running the 4 CONV/FC layers of LeNet-5 CNN on Conv-SRAM, for a given input image (28 x 28).
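The mapping arithmetic for layer C3 can be reproduced with a short script. This is a minimal sketch, not the authors' tooling: the helper `map_layer`, the `LayerMapping` record, and the `min_passes` estimate are hypothetical. The script assumes one bit-cell per weight (as implied by fitting 5 x 5 kernels into 64-bit rows) and one 3-D filter per local array at a time. The F5 shape used below (5 x 5 filters over 16 input channels, 120 outputs) is the standard LeNet-5 configuration and is an assumption here, since this section does not state it.

```python
import math
from dataclasses import dataclass

# Parameters taken from the text (test-chip CSRAM).
ROW_BITS = 64            # bit-cells per local-array row
NUM_LOCAL_ARRAYS = 16    # local arrays; each computes one partial output
ARRAY_BITS = 16 * 1024   # 16 Kb CSRAM array
OPS_PER_MAV = 2          # 1 multiply + 1 add/average


@dataclass
class LayerMapping:
    channels_per_row: int   # input channels packed into one 64-bit row
    rows_per_filter: int    # rows needed to hold one 3-D filter
    inputs_per_cycle: int   # X_in values sent to the array each clock cycle
    ops_per_cycle: int      # OPs performed across all local arrays per cycle
    min_passes: int         # storage-only lower bound on weight (re)loads


def map_layer(k: int, c_in: int, c_out: int) -> LayerMapping:
    """Hypothetical helper: map a k x k x c_in x c_out layer,
    assuming one bit-cell per weight."""
    channels_per_row = ROW_BITS // (k * k)
    if channels_per_row == 0:
        raise ValueError("a single k x k channel does not fit in one row")
    rows_per_filter = math.ceil(c_in / channels_per_row)
    inputs_per_cycle = k * k * channels_per_row
    # One 3-D filter per local array, so at most 16 filters run in parallel.
    filters_in_parallel = min(c_out, NUM_LOCAL_ARRAYS)
    ops_per_cycle = inputs_per_cycle * filters_in_parallel * OPS_PER_MAV
    # Row storage occupied by all filters, compared against the array size.
    # This ignores how rows are packed into local arrays, so it is only a bound.
    weight_bits = c_out * rows_per_filter * ROW_BITS
    min_passes = math.ceil(weight_bits / ARRAY_BITS)
    return LayerMapping(channels_per_row, rows_per_filter,
                        inputs_per_cycle, ops_per_cycle, min_passes)


if __name__ == "__main__":
    print("C3:", map_layer(k=5, c_in=6, c_out=16))
    # F5 shape assumed from standard LeNet-5 (not given in this section).
    print("F5:", map_layer(k=5, c_in=16, c_out=120))
```

For C3, this yields 2 channels per row, 3 rows per filter, 50 inputs per cycle, and 1600 OPs per cycle, which matches the figures in the text. Under the assumed F5 shape, the storage estimate exceeds one pass, which is consistent with the text's statement that the process must be repeated for F5.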