A massively parallel RNS architecture

Khaled Elleithy

[1991] Conference Record of the Twenty-Fifth Asilomar Conference on Signals, Systems & Computers

Abstract

In this paper, parallelism at the algorithmic, architectural, and arithmetic levels is exploited in the design of a Residue Number System (RNS) based architecture. The architecture is based on modulo processors. Each modulo processor is implemented by a two-dimensional systolic array composed of very simple cells. The decoding stage is also implemented using a 2-D array. The decoding bottleneck is eliminated. The whole architecture is pipelined, which leads to a high throughput rate.
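A minimal Python sketch of the idea behind the modulo processors described above: each modulus forms an independent channel, so addition and multiplication proceed per channel with no carries between channels, and a Chinese Remainder Theorem step decodes the result. The moduli (7, 8, 9) and all function names are illustrative assumptions, not taken from the paper, and the sketch models only the arithmetic, not the systolic arrays or the pipelining.

```python
from math import prod

# Hypothetical pairwise-coprime moduli; dynamic range is 7 * 8 * 9 = 504.
MODULI = (7, 8, 9)

def to_rns(x, moduli=MODULI):
    """Encode an integer as one residue per modulus."""
    return tuple(x % m for m in moduli)

def rns_add(a, b, moduli=MODULI):
    """Channel-wise addition; no carry crosses between channels."""
    return tuple((x + y) % m for x, y, m in zip(a, b, moduli))

def rns_mul(a, b, moduli=MODULI):
    """Channel-wise multiplication; each channel is independent."""
    return tuple((x * y) % m for x, y, m in zip(a, b, moduli))

def from_rns(residues, moduli=MODULI):
    """Decode residues with the Chinese Remainder Theorem."""
    M = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        # pow(Mi, -1, m) is the modular inverse of Mi (Python 3.8+).
        total += r * Mi * pow(Mi, -1, m)
    return total % M

a, b = to_rns(20), to_rns(17)
print(from_rns(rns_add(a, b)))  # 37
print(from_rns(rns_mul(a, b)))  # 340
```

Results are only meaningful while they stay below the product of the moduli (504 here); larger values wrap around.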

IAEME Publication

IAEME PUBLICATION, 2020

Arithmetic operations such as addition and multiplication are the most important part of the computation in signal processing applications. Mathematical operations can be computed faster in a Residue Number System (RNS) than in the conventional binary number system because all operations are carry-free and performed in parallel. Selecting a proper moduli set and moduli values offers maximum speed and minimum hardware when designing a signal processing system. In an RNS, many intermediate results are constant, so processing time can also be reduced using look-up tables. Although the RNS has many limitations, it can still be used to reduce the total execution time of a system compared with a conventional binary system.

A Novel Exploitation of Errors in Redundant Residue Number System Architecture

Yaw Afriyie

American Journal of Applied Sciences

Residue Number System (RNS) is an unweighted number system that represents big integers with smaller numbers. It can perform operations, in particular addition and multiplication, in parallel. Because of this property, RNS is extensively used in communication, Finite Impulse Response (FIR) filters, cryptography and signal processing devices. The transfer of data in digital channels is very important for some critical applications where accuracy is essential. In this study, we propose a novel algorithm that combines the Hamming Distance (HD) with one of the reverse conversion methods, the Chinese Remainder Theorem (CRT), as a joint technique for the detection and correction of multiple bit errors in RNS. The proposed algorithm provides a more efficient technique that reduces the hardware size and increases the processing speed with fewer iterations when compared with other state-of-the-art schemes. The work analyses the area and delay of the hardware architecture and compares them with other similar schemes. The results indicate the effectiveness of the proposed scheme in terms of the area and delay specifications.
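For orientation, a redundant RNS can detect a corrupted residue with a plain range check. The Python sketch below shows only that textbook check, not the Hamming-distance and CRT algorithm of the abstract above. The information moduli (3, 5, 7), the redundant moduli (11, 13), and all function names are assumptions chosen for illustration.

```python
from math import prod

INFO = (3, 5, 7)        # hypothetical information moduli
REDUNDANT = (11, 13)    # hypothetical redundant moduli, larger than INFO
ALL = INFO + REDUNDANT
LEGIT = prod(INFO)      # legitimate values satisfy 0 <= x < 105

def encode(x):
    """Encode a legitimate value over information and redundant moduli."""
    if not 0 <= x < LEGIT:
        raise ValueError("value outside the legitimate range")
    return tuple(x % m for m in ALL)

def crt(residues, moduli):
    """Chinese Remainder Theorem reconstruction."""
    M = prod(moduli)
    return sum(r * (M // m) * pow(M // m, -1, m)
               for r, m in zip(residues, moduli)) % M

def has_error(residues):
    """Flag a codeword whose decoded value leaves the legitimate range."""
    return crt(residues, ALL) >= LEGIT

word = encode(52)
corrupted = list(word)
corrupted[1] = (corrupted[1] + 2) % ALL[1]   # damage one residue
print(has_error(word), has_error(tuple(corrupted)))
```

With these moduli, an error in any single residue shifts the decoded value by a nonzero multiple of at least 15015 / 13 = 1155, so it always lands outside the legitimate range and is detected.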

High Performance Parallel Computing in Residue Number System

Andrei Tchernykh

Residue Number System (RNS) allows performing computation more efficiently. Natural parallelism of representation and processing of numbers makes this number system suitable for applying to high performance computing. We address the main features of application of RNS to high-performance parallel computing. We consider and analyze different stages of data processing in RNS. Based on this analysis, we describe the process of decomposition of algorithms using RNS.

Θ(log N) architectures for RNS arithmetic decoding

Khaled Elleithy

Proceedings of 9th Symposium on Computer Arithmetic

Decoding in Residue Number System (RNS) based architectures can be a bottleneck. A high-speed and flexible modulo decoder is an essential computational element to maintain the advantages of RNS. In this paper, a fast and flexible modulo decoder, based on the Chinese Remainder Theorem (CRT), is presented. It decodes a set of residues into its equivalent representation in either the unsigned magnitude or the 2's complement binary number system. Two different architectures are analyzed: the first is based on Carry Save Adders (CSA), while the other is based on a modified structure of Carry Save Adders (MCSA). Both architectures are modular and are based on simple cells, which leads to efficient VLSI implementation. The decoder has a time complexity of Θ(log N).
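The logarithmic decoding time can be illustrated in software: CRT decoding sums N weighted terms, and pairing the terms in a tree needs about log2(N) addition levels. The Python sketch below is an assumption-laden model of that idea only. It uses ordinary modular additions rather than the carry-save adder structures of the paper, and the moduli (3, 5, 7, 11), the signed-range convention, and the function names are all illustrative.

```python
from math import prod

def crt_terms(residues, moduli):
    """Return the N weighted CRT terms and the dynamic range M."""
    M = prod(moduli)
    terms = [(r * (M // m) * pow(M // m, -1, m)) % M
             for r, m in zip(residues, moduli)]
    return terms, M

def tree_sum_mod(terms, M):
    """Add terms pairwise modulo M; also count the tree levels used."""
    levels = 0
    while len(terms) > 1:
        paired = [(terms[i] + terms[i + 1]) % M
                  for i in range(0, len(terms) - 1, 2)]
        if len(terms) % 2:          # odd term passes through unchanged
            paired.append(terms[-1])
        terms = paired
        levels += 1
    return terms[0], levels

def decode(residues, moduli, signed=False):
    """Decode to an unsigned value, or to a signed value if requested."""
    terms, M = crt_terms(residues, moduli)
    x, levels = tree_sum_mod(terms, M)
    # Assumed convention: the upper half of [0, M) encodes negatives.
    if signed and x >= (M + 1) // 2:
        x -= M
    return x, levels

moduli = (3, 5, 7, 11)
residues = tuple(1000 % m for m in moduli)
print(decode(residues, moduli))               # (1000, 2)
print(decode(residues, moduli, signed=True))  # (-155, 2)
```

Four moduli need two addition levels; eight would need three, which is the logarithmic growth the abstract refers to.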

Semi-Custom VLSI Design and Implementation of a New Efficient RNS Division Algorithm

Hoda S Abdel-Aty

The Computer Journal, 1999

In this paper we introduce a new algorithm for division in residue number system, which can be applied to any moduli set. Simulation results indicated that the algorithm is faster than the most competitive published work. To further improve this speed, we customize this algorithm to serve two specific moduli sets: $(2^k, 2^k - 1, 2^{k-1} - 1)$ and $(2^k + 1, 2^k, 2^k - 1)$. The customization results in eliminating memory devices (ROMs), thus increasing the speed of operation. A semi-custom VLSI design for this algorithm for the moduli $(2^k + 1, 2^k, 2^k - 1)$ has been implemented, fabricated and tested.
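A short Python check, offered as a sketch, that the two moduli sets named above are valid RNS bases: the moduli in each set must be pairwise coprime, and their product gives the dynamic range. The value k = 4, the labels "A" and "B", and the function names are assumptions for illustration; the division algorithm itself is not modelled.

```python
from itertools import combinations
from math import gcd, prod

def moduli_sets(k):
    """Build the two moduli sets from the abstract for a given k (k >= 3)."""
    return {
        "A": (2**k, 2**k - 1, 2**(k - 1) - 1),
        "B": (2**k + 1, 2**k, 2**k - 1),
    }

def pairwise_coprime(moduli):
    """True when every pair of moduli has greatest common divisor 1."""
    return all(gcd(a, b) == 1 for a, b in combinations(moduli, 2))

for name, moduli in moduli_sets(4).items():
    print(name, moduli, pairwise_coprime(moduli), prod(moduli))
# A (16, 15, 7) True 1680
# B (17, 16, 15) True 4080
```

For set B the dynamic range equals $2^k(2^{2k} - 1)$, which is 16 × 255 = 4080 when k = 4.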

Super-high Speed, Accuracy, and Modularized Residue Number System based on Redundant Binary Representation

Narito Fuyutsume

IEEJ Transactions on Electronics, Information and Systems, 2005

The multiplier and divider used for specific hardware of public key cryptosystem arithmetic are constructed from many adders and subtractors to improve the accuracy of the key. However, with the increase of accuracy, the propagation delay problem becomes unavoidable. Although some papers have proposed that a divider using redundant binary representation is effective in coping with this problem, no consideration was given to the problems of rounding error and accuracy of the remainder. This paper proposes a method, based on an inherent bit-sliced architecture, that can cope with these problems and that is expandable to any level of accuracy. It is expected to be applicable to hardware for public key cryptosystems that can flexibly cope with the expansion of the key string and with variable-length keys.

A systolic architecture for modulo multiplication

Khaled Elleithy

IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, 1995

nanoseconds. We selected a representative example of a FIR filter with binary weights, and verified using simulation results that the neural network yields weights that enable the filter to perform very close to the theoretical peak performance that one can obtain from the given filter. We showed that the conventional LMS approach is unable to match the performance of the neural network since it cannot select the correct minimum from all the possible minima of the error function.

Novel high-radix Residue Number System multipliers and adders

Vassilis Paliouras

Radix- modulo multipliers and adders are introduced in this paper. The proposed architectures are shown to require several times less area than previously reported architectures, for particular moduli of operation. The proposed architectures are preferable in an area-time sense for several cases. The complexity reduction is achieved by extending the carry-ignore property of modulo operations to radices higher than 2, but not powers of 2. Detailed hardware complexity models are offered. RNS systems are particularly efficient for executing algorithms which contain a significant amount of multiply-accumulate operations (such as DSP algorithms) even when the unavoidable forward and inverse conversion overhead is considered. Bases of the form

Reverse conversion architectures for signed-digit residue number systems

Lars Bengtsson

This paper presents circuits for conversion from radix-2 signed-digit residue numbers to binary form. Four reverse converters for combined RNS/SD number systems based on different moduli sets are presented. Implementations are compared with respect to timing, area and area-delay products. Finite impulse response (FIR) filters are used as reference designs in order to evaluate the performance of RNS/SD processing in a typical DSP block using the suggested moduli sets.

RDSP: A RISC DSP based on residue number system

Leonel Sousa

Euromicro Symposium on Digital System Design, Proceedings, 2003

This paper is focused on low power programmable fast Digital Signal Processors (DSP) design based on a configurable -stage RISC core architecture and on Residue Number Systems (RNS). Several innovative aspects are introduced at the control and datapath architecture levels, which support both the binary system and the RNS. A new moduli set $\{2^n - 1, 2^{2n}, 2^n + 1\}$ is also proposed for balancing the processing time in the different RNS channels. Experimental results, obtained through RDSP implementation on FPGA and ASIC, show that not only a significant reduction in circuit area and power consumption but also a speedup may be achieved with RNS when compared with a binary DSP.

Magdy Bayoumi, Khaled Elleithy

IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, 1992

An implementation of a fast and flexible residue decoder for residue number system (RNS)-based architectures is proposed. The decoder is based on the Chinese Remainder Theorem (CRT). It decodes a set of residues to its equivalent representation in weighted binary number system. This decoder is flexible since the decoded data can be selected to be either unsigned magnitude or 2's complement binary number. Two different architectures are analyzed; the first one is based on using carry-save adders (CSA's), while the other is based on utilizing modulo adders (MA). The implementation of both architectures is modular and is based on simple cells, which leads to efficient VLSI realization. The proposed decoder is fast; it has a time complexity of O(log N) (N is the number of moduli).

Novel high-radix residue number system architectures

T. Stouraitis

IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, 2000

Novel radix-modulo-arithmetic units for residue number system (RNS)-based architectures are introduced in this paper. The proposed circuits are shown to require several times less area than previously reported architectures for particular moduli of operation, while also being preferable in the area time complexity sense. The complexity reduction is achieved by extending the carry-ignore property of modulo-2 operations to radices higher than two, which are not powers of two. The carry-ignore property is efficiently exploited by introducing simplified digit adders, instead of general radixadders. The proposed simplification of digit adders is possible, since the maximum values of certain intermediate digits produced in the architecture are found to be less than 1. Detailed area and time complexity models are derived for the arithmetic units. The proposed radixarchitectures include multipliers, adders, and merged multipliers-adders. In addition, efficient radixbinary-to-residue and residue-to-binary conversion techniques and architectures are introduced.

Multifunction architectures for RNS processors

T. Stouraitis

IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, 1999

Novel very large-scale integration architectures and a design methodology for adder-based residue number system (RNS) processors are presented in this paper. The new architectures compute residues for more than one modulus either serially or in parallel, while their use can increase the resource utilization in a processor. Complexity is reduced by sharing common intermediate results among the various RNS moduli channels and/or operations that compose an RNS processor. The presented architectures are distinguished into two subtypes, depending on whether the inter-channel parallelism is preserved or not. The multifunction architecture paradigm is demonstrated by its application in residue multiplication, binary-to-residue conversion, quadratic RNS (QRNS) mapping, and base extension. The derived architectures are compared to previously reported equivalent ones and are found to be efficient in the area × time product sense. Finally, the proposed design methodology reveals a new tradeoff in residue processor design, leading to more efficient RNS processors.

Design and implementation of an RNS division algorithm

Ahmad Hiasat, Hoda S Abdel-Aty

Proceedings 13th IEEE Symposium on Computer Arithmetic, 1997

In a recent publication [1], we introduced the main outlines of a new algorithm for division in Residue Number System, which can be applied to any moduli set. Simulation results proved that the algorithm was many times faster than most competitive published work [2]. Determining the position of the most significant nonzero bit of any residue number in that algorithm is the major speed limiting factor. In this paper, we customize the same algorithm to serve two specific moduli sets: $(2^k, 2^k - 1, 2^{k-1} - 1)$ and $(2^k + 1, 2^k, 2^k - 1)$, and thus eliminate that speed limiting factor. Based on this work, the hardware needed to determine the most significant bit position has been reduced to a single adder. Therefore, computation time and hardware requirements are substantially improved. This would enable RNS to be a stronger force in building general purpose computers.

Formal design of RNS processors

Khaled Elleithy

1991 International Conference on Circuits and Systems, 1991

In this paper, a formal design methodology is used to design a Residue Number System (RNS) processor. An optimal architecture for the residue decoding process is obtained through this design approach. The architecture is modular, consists of simple cells, and is general for any set of moduli.

Residue to binary conversion for RNS arithmetic using only modular look-up tables

Ramdas Kumaresan

Circuits and Systems, IEEE …, 1988

A novel technique for converting from the residue digits in a residue number system (RNS) to weighted binary digits is proposed. This technique is an alternative to existing methods based on the Chinese remainder theorem (CRT) and the mixed-radix conversion (MRC) algorithm. The proposed technique obtains the binary digits in a slice-by-slice fashion, directly from the residues. The primary advantage of the method is that this conversion technique can be implemented using only modular look-up tables.
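For readers unfamiliar with the mixed-radix conversion (MRC) baseline mentioned above, the Python sketch below shows the standard sequential MRC procedure. It is not the look-up-table technique proposed in the abstract. The moduli (3, 5, 7) and the function names are assumptions for illustration.

```python
def mixed_radix_digits(residues, moduli):
    """Compute mixed-radix digits d_i with 0 <= d_i < m_i, one at a time."""
    r = list(residues)
    digits = []
    for i, mi in enumerate(moduli):
        d = r[i] % mi
        digits.append(d)
        # Subtract the digit and divide by m_i in every remaining channel.
        for j in range(i + 1, len(moduli)):
            mj = moduli[j]
            r[j] = ((r[j] - d) * pow(mi, -1, mj)) % mj
    return digits

def mixed_radix_value(digits, moduli):
    """Evaluate x = d_0 + d_1*m_0 + d_2*m_0*m_1 + ..."""
    x, weight = 0, 1
    for d, m in zip(digits, moduli):
        x += d * weight
        weight *= m
    return x

moduli = (3, 5, 7)
residues = tuple(52 % m for m in moduli)      # (1, 2, 3)
digits = mixed_radix_digits(residues, moduli)
print(digits, mixed_radix_value(digits, moduli))  # [1, 2, 3] 52
```

Each digit depends on the previous one, so MRC is inherently sequential across the moduli, which is one reason alternatives to it are of interest.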

Design of a Reconfigurable DSP Processor with Bit Efficient Residue Number System

AMITABHA SINHA

International Journal of VLSI Design & Communication Systems, 2012

Residue Number System (RNS), which originates from the Chinese Remainder Theorem, offers a promising future in VLSI because of its carry-free operations in addition, subtraction and multiplication. This property of RNS is very helpful in reducing the complexity of calculation in many applications. A residue number system represents a large integer using a set of smaller integers, called residues. However, the area overhead, cost and speed depend not only on the word length but also on the selection of moduli, which is a crucial step for a residue system. This choice determines bit efficiency, area, frequency, etc. In this paper, a new moduli set selection technique is proposed to improve bit efficiency, which can be used to construct a residue system for a digital signal processing environment. Subsequently, it is theoretically proved, and illustrated using examples, that the proposed solution gives better results than the schemes reported in the literature. The novelty of the architecture is shown by comparing the different schemes reported in the literature. Using the novel moduli set, a guideline for a Reconfigurable Processor is presented that can process some predefined functions. As RNS minimizes carry propagation, the scheme can be implemented in real-time signal processing and other fields where high-speed computations are required.

An Algorithmic and Architectural Study on Montgomery Exponentiation in RNS

Gianluca Paravati

IEEE Transactions on Computers, 2012

The modular exponentiation on large numbers is computationally intensive. An effective way for performing this operation consists in using Montgomery exponentiation in the Residue Number System (RNS). This paper presents an algorithmic and architectural study of such exponentiation approach. From the algorithmic point of view, new and state-of-the-art opportunities that come from the reorganization of operations and precomputations are considered. From the architectural perspective, the design opportunities offered by well-known computer arithmetic techniques are studied, with the aim of developing an efficient arithmetic cell architecture. Furthermore, since the use of efficient RNS bases with a low Hamming weight are being considered with ever more interest, four additional cell architectures specifically tailored to these bases are developed and the trade-off between benefits and drawbacks is carefully explored. An overall comparison among all the considered algorithmic approaches and cell architectures is presented, with the aim of providing the reader with an extensive overview of the Montgomery exponentiation opportunities in RNS.
