A massively parallel RNS architecture
[1991] Conference Record of the Twenty-Fifth Asilomar Conference on Signals, Systems & Computers
Abstract
In this paper, parallelism at the algorithmic, architectural, and arithmetic levels is exploited in the design of a Residue Number System (RNS) based architecture. The architecture is based on modulo processors. Each modulo processor is implemented by a two-dimensional systolic array composed of very simple cells. The decoding stage is also implemented using a 2-D array, which eliminates the decoding bottleneck. The whole architecture is pipelined, which leads to a high throughput rate.
Related papers
IAEME PUBLICATION, 2020
Arithmetic operations such as addition and multiplication are the most important part of the computation in signal processing applications. These operations can be computed faster in a Residue Number System (RNS) than in conventional binary number systems because every operation is carry-free and is performed in parallel across the residue channels. Selecting a proper moduli set and its values offers maximum speed and minimum hardware when designing a signal processing system. In an RNS many intermediate results are constant, so processing time can be reduced further by using look-up tables. Although RNS has many limitations, it can still reduce the total execution time of a system in comparison with a conventional binary system.
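The carry-free, channel-parallel behaviour described above can be illustrated with a minimal software sketch. The moduli (7, 8, 9) and the function names are assumptions chosen for illustration and do not come from any of the papers listed here; each residue channel is computed independently, which is what a hardware implementation would run in parallel.

```python
from math import prod

# Hypothetical pairwise-coprime moduli, chosen only for illustration.
MODULI = (7, 8, 9)
DYNAMIC_RANGE = prod(MODULI)  # values 0 .. 503 are uniquely representable


def to_rns(x, moduli=MODULI):
    """Forward conversion: one residue per modulus."""
    return tuple(x % m for m in moduli)


def rns_add(a, b, moduli=MODULI):
    """Channel-wise addition; no carry crosses from one channel to another."""
    return tuple((x + y) % m for x, y, m in zip(a, b, moduli))


def rns_mul(a, b, moduli=MODULI):
    """Channel-wise multiplication; each channel depends only on its own modulus."""
    return tuple((x * y) % m for x, y, m in zip(a, b, moduli))
```

As long as the true result stays below the dynamic range, operating on residues and converting afterwards gives the same residues as converting the ordinary integer result, for example `rns_add(to_rns(13), to_rns(17)) == to_rns(30)`.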
American Journal of Applied Sciences
Residue Number System (RNS) is an unweighted number system that represents large integers with smaller numbers. It can perform operations, in particular addition and multiplication, in parallel. Because of this property, RNS is extensively used in communication, Finite Impulse Response (FIR) filters, cryptography, and signal processing devices. Reliable transfer of data over digital channels matters for critical applications where accuracy is essential. In this study, we propose a novel algorithm that combines the Hamming Distance (HD) with one of the reverse conversion methods, the Chinese Remainder Theorem (CRT), as a joint technique for the detection and correction of multiple bit errors in RNS. The proposed algorithm provides a more efficient technique that reduces hardware size and increases processing speed with fewer iterations when compared with other state-of-the-art schemes. The work analyses the area and delay of the hardware architecture and compares them with other similar schemes. The results indicate the effectiveness of the proposed scheme in terms of the area and delay specifications.
Residue Number System (RNS) allows computation to be performed more efficiently. The natural parallelism in how numbers are represented and processed makes this number system suitable for high-performance computing. We address the main features of applying RNS to high-performance parallel computing. We consider and analyze the different stages of data processing in RNS. Based on this analysis, we describe the process of decomposing algorithms using RNS.
Proceedings of 9th Symposium on Computer Arithmetic
Decoding in Residue Number System (RNS) based architectures can be a bottleneck. A high-speed and flexible modulo decoder is an essential computational element for maintaining the advantages of RNS. In this paper, a fast and flexible modulo decoder, based on the Chinese Remainder Theorem (CRT), is presented. It decodes a set of residues into its equivalent representation in either the unsigned magnitude or the 2's complement binary number system. Two different architectures are analyzed: the first is based on Carry Save Adders (CSA), while the other is based on a modified structure of Carry Save Adders (MCSA). Both architectures are modular and are based on simple cells, which leads to an efficient VLSI implementation. The decoder has a time complexity of Θ(log N).
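For readers unfamiliar with CRT decoding, the following is a minimal software sketch of the arithmetic only. It does not model the CSA or MCSA architectures of the paper; the function name `crt_decode`, the `signed` flag, and the example moduli are assumptions made for illustration. The modular inverse uses Python's built-in three-argument `pow` with exponent -1, which requires Python 3.8 or later.

```python
from math import prod


def crt_decode(residues, moduli, signed=False):
    """Decode residues to an integer via the Chinese Remainder Theorem.

    Assumes the moduli are pairwise coprime. With signed=True the upper
    half of the dynamic range is mapped to negative values, mimicking a
    2's complement style interpretation.
    """
    M = prod(moduli)                 # dynamic range
    total = 0
    for r, m in zip(residues, moduli):
        Mi = M // m                  # product of all the other moduli
        inv = pow(Mi, -1, m)         # multiplicative inverse of Mi modulo m
        total += r * inv * Mi
    x = total % M                    # unsigned result in [0, M)
    if signed and x >= (M + 1) // 2:
        x -= M                       # fold the upper half onto negative values
    return x
```

With the illustrative moduli (7, 8, 9), `crt_decode((6, 5, 4), (7, 8, 9))` recovers 13, and decoding the residues of -5 with `signed=True` recovers -5.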
The Computer Journal, 1999
In this paper we introduce a new algorithm for division in the residue number system, which can be applied to any moduli set. Simulation results indicated that the algorithm is faster than the most competitive published work. To further improve this speed, we customize this algorithm to serve two specific moduli sets: (2^k, 2^k − 1, 2^(k−1) − 1) and (2^k + 1, 2^k, 2^k − 1). The customization results in eliminating memory devices (ROMs), thus increasing the speed of operation. A semi-custom VLSI design for this algorithm for the moduli (2^k + 1, 2^k, 2^k − 1) has been implemented, fabricated and tested.
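The two moduli sets named above can be checked for the property any RNS needs, pairwise coprimality, and their dynamic range computed. This is a small sketch and not the division algorithm itself; the helper names and the choice k = 4 are assumptions for illustration.

```python
from itertools import combinations
from math import gcd, prod


def moduli_set_a(k):
    """The set (2^k, 2^k - 1, 2^(k-1) - 1); meaningful for k >= 3."""
    return (2**k, 2**k - 1, 2**(k - 1) - 1)


def moduli_set_b(k):
    """The set (2^k + 1, 2^k, 2^k - 1)."""
    return (2**k + 1, 2**k, 2**k - 1)


def pairwise_coprime(moduli):
    """True if every pair of moduli has greatest common divisor 1."""
    return all(gcd(a, b) == 1 for a, b in combinations(moduli, 2))


def dynamic_range(moduli):
    """Number of distinct integers the moduli set can represent."""
    return prod(moduli)
```

For k = 4 the sets are (16, 15, 7) and (17, 16, 15). Both are pairwise coprime, with dynamic ranges of 1680 and 4080 respectively.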
IEEJ Transactions on Electronics, Information and Systems, 2005
The multiplier and divider used in dedicated hardware for public key cryptosystem arithmetic are constructed from many adders and subtractors to improve the accuracy of the key. However, as accuracy increases, the propagation delay problem becomes unavoidable. Although some papers have proposed that a divider using redundant binary representation is effective in coping with this problem, no consideration was given to the problems of rounding error and the accuracy of the remainder. This paper proposes a method, based on an inherently bit-sliced architecture, that can cope with these problems and that is expandable to any level of accuracy. It is expected to be applicable to hardware for public key cryptosystems that can flexibly cope with expansion of the key string and with variable-length keys.
IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, 1995
nanoseconds. We selected a representative example of a FIR filter with binary weights, and verified using simulation results that the neural network yields weights that enable the filter to perform very close to the theoretical peak performance that one can obtain from the given filter. We showed that the conventional LMS approach is unable to match the performance of the neural network since it cannot select the correct minimum from all the possible minima of the error function.
Radix- modulo multipliers and adders are introduced in this paper. The proposed architectures are shown to require several times less area than previously reported architectures, for particular moduli of operation. The proposed architectures are preferable in an area-time sense for several cases. The complexity reduction is achieved by extending the carry-ignore property of modulo operations to radices higher than 2, but not powers of 2. Detailed hardware complexity models are offered. RNS systems are particularly efficient for executing algorithms which contain a significant amount of multiply-accumulate operations (such as DSP algorithms) even when the unavoidable forward and inverse conversion overhead is considered. Bases of the form
This paper presents circuits for conversion from radix-2 signed-digit residue numbers to binary form. Four reverse converters for combined RNS/SD number systems based on different moduli sets are presented. Implementations are compared with respect to timing, area and area-delay products. Finite impulse response (FIR) filters are used as reference designs in order to evaluate the performance of RNS/SD processing in a typical DSP block using the suggested moduli sets
Euromicro Symposium on Digital System Design, Proceedings, 2003
This paper is focused on low power programmable fast Digital Signal Processors (DSP) design based on a configurable -stage RISC core architecture and on Residue Number Systems (RNS). Several innovative aspects are introduced at the control and datapath architecture levels, which support both the binary system and the RNS. A new moduli set {2^n − 1, 2^(2n), 2^n + 1} is also proposed for balancing the processing time in the different RNS channels. Experimental results, obtained through RDSP implementation on FPGA and ASIC, show that not only a significant reduction in circuit area and power consumption but also a speedup may be achieved with RNS when compared with a binary DSP.
