A radix-4 scalable design

Tawalbeh, L.A.; Tenca, A.F.; Koc, C.K.

doi:10.1109/MP.2005.1462460

Outline

Title

Abstract

Introduction

Experimental Results and Analysis

Conclusion

References

A. Tenca

Loai Tawalbeh

2005, IEEE Potentials

https://doi.org/10.1109/MP.2005.1462460

5 pages

Abstract

This paper presents an algorithm and architecture for a scalable radix-4 multiplier that makes use of two types of digit recoding in order to generate an efficient solution. Experimental results are shown to demonstrate that the proposed radix-4 Montgomery Multiplier design has better area/time tradeoff than previous radix-2 and radix-8 scalable designs.
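
As a rough illustration of what a radix-4 Montgomery multiplier computes, the sketch below scans one operand two bits at a time and divides by the radix after every step. It is a minimal software model under stated assumptions: the function name `montgomery_radix4` and its parameters are hypothetical, plain digits 0..3 are used, and neither the paper's digit recoding nor its scalable word-serial architecture is reproduced.

```python
def montgomery_radix4(x, y, m, n_digits):
    """Return x * y * 4**(-n_digits) mod m for odd m.

    Assumes 0 <= x < 4**n_digits and 0 <= y < m (illustrative model only).
    """
    if m % 2 == 0:
        raise ValueError("Montgomery multiplication needs an odd modulus")
    # m_prime = -m^(-1) mod 4: selects the multiple of m that clears two bits
    m_prime = (-pow(m, -1, 4)) % 4
    s = 0
    for i in range(n_digits):
        x_digit = (x >> (2 * i)) & 3      # next radix-4 digit of x
        s += x_digit * y                  # add the partial product
        q = ((s & 3) * m_prime) & 3       # quotient digit: s + q*m is divisible by 4
        s = (s + q * m) >> 2              # exact division by the radix
    # The loop keeps s < y + m < 2m, so one conditional subtraction suffices
    return s - m if s >= m else s
```

With a 256-bit modulus, `n_digits` would be 128, half the iteration count of a bit-serial radix-2 loop; the price is the wider choice of multiples (0..3 times `y` and `m`) in each step, which is the kind of cost digit recoding is meant to reduce.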

Arjuna Madanayake

2011 IEEE 54th International Midwest Symposium on Circuits and Systems (MWSCAS), 2011

Novel multiple-radix architectures for 7N × 7N integer multiplications, where N = 2^k and k is a non-negative integer, based on recent developments involving multiple-radix representation of numbers, are presented. Hardware implementations for a 7 × 7-bit multiple-radix multiplier are provided, followed by larger multiplier architectures that employ the 7 × 7 architecture as a building block based on the Karatsuba algorithm. The methodology employed for prototyping the multiplier circuits using Xilinx FPGA devices is described. Measured results in terms of speed and area complexity from on-chip physical FPGA realizations are provided. This recursive architecture provides a new method for building multiple-radix parallel hardware multipliers operating on large integer multiplicands, with potential future applications in areas such as computational number theory, digital arithmetic and computer security.
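
The recursive construction described here can be sketched in software. The following is a minimal model of Karatsuba recursion that bottoms out at 7-bit operands; the function name `karatsuba` and the `base_bits` parameter are assumptions made for illustration, and the base case simply uses the built-in product as a stand-in for the 7 × 7 hardware block.

```python
def karatsuba(a, b, base_bits=7):
    """Multiply non-negative integers with Karatsuba recursion.

    Operands narrower than base_bits are multiplied directly, standing in
    for a small fixed-size hardware multiplier (an assumption of this sketch).
    """
    if a < (1 << base_bits) and b < (1 << base_bits):
        return a * b
    n = max(a.bit_length(), b.bit_length())
    half = (n + 1) // 2
    mask = (1 << half) - 1
    a_hi, a_lo = a >> half, a & mask
    b_hi, b_lo = b >> half, b & mask
    z2 = karatsuba(a_hi, b_hi, base_bits)            # high * high
    z0 = karatsuba(a_lo, b_lo, base_bits)            # low * low
    # One extra product replaces the two cross products
    z1 = karatsuba(a_hi + a_lo, b_hi + b_lo, base_bits) - z2 - z0
    return (z2 << (2 * half)) + (z1 << half) + z0
```

Each level uses three half-width products instead of four. Note that the sums `a_hi + a_lo` and `b_hi + b_lo` can be one bit wider than the halves, which a hardware design has to accommodate.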

54x54-bit Radix-4 Multiplier

Antyakula Ramesh

In this paper, we describe a low-power, high-speed multiplier suitable for standard-cell-based ASIC design methodologies. For this purpose, an optimized Booth encoder, compact 28-2, 27-2, …, and 10-2 compressors, and an XOR-based adder are proposed. While the whole design is coded in Verilog HDL and implemented through a commercially available EDA tool chain, the implementation gives results comparable to full-custom designs [1]. Realistic simulations using timing parameters extracted from the layout show that the propagation time of the critical path is 3.25 ns at 2.5 V on a 0.18 µm process technology, which is almost 21% faster than the conventional multiplier.

Fast Energy Efficient Radix-16 Sequential Multiplier

Vijaya Lakshmi

We propose a new sequential multiplier design that generates each radix-16 partial product P as a high component H and a low component L, such that P = 4H + L, with H, L ∈ {0, 1, 2, 3} × X, where X denotes the multiplicand. The required hard multiple 3X is generated in a preliminary cycle, which reduces the cycle time of the main iteration. Two radix-16 carry-save adders are used to generate the radix-16 accumulated partial product. The synthesis results show improved latency, power dissipation, and energy consumption over the previous relevant designs at the cost of additional silicon area; the energy-area product is nevertheless also lowered.
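
The high/low decomposition of a radix-16 partial product can be checked with a few lines of arithmetic. This is a minimal sketch with unsigned digits; the function names and the symbols `high` and `low` are assumptions chosen for illustration, and the carry-save accumulation of the actual design is replaced by ordinary integer addition.

```python
def radix16_partial_product(digit, x):
    """Split digit * x into (high, low) with digit * x == 4 * high + low.

    Both parts are one of the multiples 0, x, 2x, 3x of the multiplicand x.
    """
    if not 0 <= digit <= 15:
        raise ValueError("a radix-16 digit must be in 0..15")
    three_x = x + (x << 1)        # the only "hard" multiple; computed up front in hardware
    multiples = (0, x, x << 1, three_x)
    return multiples[digit >> 2], multiples[digit & 3]


def multiply_radix16(y, x):
    """Multiply non-negative y by x, consuming four bits of y per iteration."""
    acc = 0
    shift = 0
    while y:
        high, low = radix16_partial_product(y & 15, x)
        acc += ((high << 2) + low) << shift
        y >>= 4
        shift += 4
    return acc
```

Since 3x is the only multiple that needs a real addition, computing it once before the loop keeps every iteration down to selections and shifts.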

Design Of High Performance Configurable Radix-4 Booth Multiplier Using Cadence Tools

Esther Rani

CVR Journal of Science & Technology, 2014

Fast multipliers are crucial in digital signal processing systems. The speed of the multiply operation is of great importance in digital signal processors and general-purpose processors, especially since media processing took off. As the need for efficient design increases without compromising performance, industry has to concentrate on the tradeoffs. Here, a modified Booth multiplier is implemented using an algorithm that reduces the number of partial products to be generated. In this work, 8×8 multipliers are implemented, with inputs ranging from -128 to +127 and negative numbers represented in 2's complement form. A Booth encoder (i.e., the partial product generator) and a hybrid adder are used in the design of the modified Booth multiplier to achieve minimum delay and less area.
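
To make the partial-product reduction concrete, here is a minimal sketch of modified Booth (radix-4) recoding for 8-bit two's-complement operands. The function names are assumptions for illustration; the sketch models only the arithmetic, not the encoder or adder circuits of the design described above.

```python
def booth_radix4_digits(y, bits=8):
    """Recode a two's-complement integer into radix-4 digits in {-2, ..., 2}.

    Digits are returned least significant first; bits must be even.
    """
    if bits % 2:
        raise ValueError("bits must be even")
    if not -(1 << (bits - 1)) <= y < (1 << (bits - 1)):
        raise ValueError("y does not fit in the given width")
    pattern = y & ((1 << bits) - 1)   # two's-complement bit pattern of y
    digits = []
    prev = 0                          # implicit bit to the right of bit 0
    for i in range(0, bits, 2):
        b0 = (pattern >> i) & 1
        b1 = (pattern >> (i + 1)) & 1
        digits.append(b0 + prev - 2 * b1)
        prev = b1
    return digits


def booth_multiply(x, y, bits=8):
    """Multiply x by y using one partial product per radix-4 Booth digit."""
    return sum(d * x * 4 ** i for i, d in enumerate(booth_radix4_digits(y, bits)))
```

An 8-bit multiplier operand yields four digits, so four partial products are summed instead of eight, and every digit needs only the multiples 0, ±x, and ±2x.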

Improved Design of High-Performance Parallel Decimal Multipliers

Álvaro Vázquez

IEEE Transactions on Computers, 2010

The new generation of high-performance decimal floating-point units (DFUs) is demanding efficient implementations of parallel decimal multipliers. In this paper, we describe the architectures of two parallel decimal multipliers. The parallel generation of partial products is performed using signed-digit radix-10 or radix-5 recodings of the multiplier and a simplified set of multiplicand multiples. The reduction of partial products is implemented in a tree structure based on a decimal multioperand carry-save addition algorithm that uses unconventional (non-BCD) decimal-coded number systems. We further detail these techniques and present the new improvements to reduce the latency of the previous designs, which include: optimized digit recoders for the generation of 2^n-tuples (and 5-tuples), decimal carry-save adders (CSAs) combining different decimal-coded operands, and carry-free adders implemented by specially designed bit counters. Moreover, we detail a design methodology that combines all these techniques to obtain efficient reduction trees with different area and delay trade-offs for any number of partial products generated. Evaluation results for 16-digit operands show that the proposed architectures have interesting area-delay figures compared to conventional Booth radix-4 and radix-8 parallel binary multipliers and outperform the figures of previous alternatives for decimal multiplication.

A New Family of High-Performance Parallel Decimal Multipliers

Álvaro Vázquez

18th IEEE Symposium on Computer Arithmetic (ARITH '07), 2007

This paper introduces two novel architectures for parallel decimal multipliers. Our multipliers are based on a new algorithm for decimal carry-save multioperand addition that uses a novel BCD-4221 recoding for decimal digits. It significantly improves the area and latency of the partial product reduction tree with respect to previous proposals. We also present three schemes for fast and efficient generation of partial products in parallel. The recoding of the BCD-8421 multiplier operand into minimally redundant signed-digit radix-10, radix-4 and radix-5 representations using new recoders reduces the complexity of partial product generation. In addition, SD radix-4 and radix-5 recodings allow the reuse of a conventional parallel binary radix-4 multiplier to perform combined binary/decimal multiplications. Evaluation results show that the proposed architectures have interesting area-delay figures compared to conventional Booth radix-4 and radix-8 parallel binary multipliers and other representative alternatives for decimal multiplication.

High Performance RADIX-8 Multiplier Using 8:2 Compressors

Koteswara Rao Vaddempudi

2013

A high-performance and area-efficient implementation of a Radix-8 Booth multiplier is presented in this paper. This is implemented using 4:2 and 8:2 compressors, and the design structure may reach up to 126 bits. The speed of operation has been increased four times by appending it with a carry look-ahead adder. The performance of the proposed algorithm is analyzed using a higher-order FIR filter. The results are evaluated and synthesized using Xilinx ISE 9.2i and targeted towards a Spartan 3 FPGA.

RADIX-10 PARALLEL DECIMAL MULTIPLIER

mathew george

This paper introduces a novel architecture for a Radix-10 decimal multiplier. The new generation of high-performance decimal floating-point units (DFUs) is demanding efficient implementations of parallel decimal multipliers. The parallel generation of partial products is performed using signed-digit radix-10 recoding of the multiplier and a simplified set of multiplicand multiples. The reduction of partial products is implemented in a tree structure based on a new decimal multioperand carry-save addition algorithm that uses unconventional decimal-coded number systems. We further detail these techniques, which significantly improve the area and latency of the previous design and include: optimized digit recoders, decimal carry-save adders (CSAs) combining different decimal-coded operands, and carry-free adders implemented by specially designed bit counters.

Fast Radix-10 Multiplication Using Redundant BCD Codes

ALVARO FLORES VAZQUEZ

IEEE Transactions on Computers, 2014

We present the algorithm and architecture of a BCD parallel multiplier that exploits some properties of two different redundant BCD codes to speed up its computation: the redundant BCD excess-3 code (XS-3), and the overloaded BCD representation (ODDS). In addition, new techniques are developed to reduce significantly the latency and area of previous representative high-performance implementations. Partial products are generated in parallel using a signed-digit radix-10 recoding of the BCD multiplier with the digit set [-5, 5], and a set of positive multiplicand multiples (0X, 1X, 2X, 3X, 4X, 5X) coded in XS-3. This encoding has several advantages. First, it is a self-complementing code, so that a negative multiplicand multiple can be obtained by just inverting the bits of the corresponding positive one. Also, the available redundancy allows a fast and simple generation of multiplicand multiples in a carry-free way. Finally, the partial products can be recoded to the ODDS representation by just adding a constant factor into the partial product reduction tree. Since the ODDS uses a similar 4-bit binary encoding as non-redundant BCD, conventional binary VLSI circuit techniques, such as binary carry-save adders and compressor trees, can be adapted efficiently to perform decimal operations. To show the advantages of our architecture, we have synthesized an RTL model for 16 × 16-digit and 34 × 34-digit multiplications and performed a comparative survey of the previous most representative designs. We show that the proposed decimal multiplier has an area improvement roughly in the range 20-35 percent for similar target delays with respect to the fastest implementation.
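
A signed-digit radix-10 recoding into the digit set [-5, 5] can be sketched as follows. This is a minimal model of one common scheme, assumed here for illustration rather than taken from the paper: each decimal digit of 5 or more sends a transfer of 1 to the next position and keeps the digit minus 10. The function names are hypothetical.

```python
def sd_radix10_recode(digits):
    """Recode decimal digits (least significant first) into the set [-5, 5].

    The transfer out of a position depends only on that position's own digit,
    so all positions can be recoded in parallel in hardware.
    """
    recoded = []
    transfer_in = 0
    for d in digits:
        if not 0 <= d <= 9:
            raise ValueError("each input digit must be in 0..9")
        transfer_out = 1 if d >= 5 else 0
        recoded.append(d - 10 * transfer_out + transfer_in)
        transfer_in = transfer_out
    recoded.append(transfer_in)       # extra leading digit, 0 or 1
    return recoded


def sd_value(recoded):
    """Evaluate a least-significant-first signed-digit list as an integer."""
    return sum(r * 10 ** i for i, r in enumerate(recoded))
```

With digits restricted to [-5, 5], only the positive multiples 0X to 5X are needed; negative multiples come from complementing a positive one, which is where a self-complementing code such as XS-3 is useful.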

A low-power high-radix serial-parallel multiplier

Danny Crookes

2007 18th European Conference on Circuit Theory and Design, 2007

In this paper, we introduce a novel high-radix binary signed digit (BSD) serial-parallel multiplier suitable for low-power high-speed multiplication. The proposed N-bit×N-bit radix-16 serial-parallel multiplier can reduce the number of accumulation cycles of partial products to as few as N/4, and eliminate most of the inversion operations which consume power in a conventional multiplier in generating the partial products. Unlike other high-radix methods, the pre-multiplication in the new algorithm employs a BSD method which requires no extra adder, and thus removes the extra delay for additions which hinders other high-radix algorithms.

References (6)

P. L. Montgomery, "Modular multiplication without trial division," Mathematics of Computation, vol. 44, no. 170, pp. 519-521, April 1985.
A. F. Tenca and Ç. K. Koç, "A word-based algorithm and scalable architecture for Montgomery multiplication," in Cryptographic Hardware and Embedded Systems - CHES 1999, Ç. K. Koç and C. Paar, Eds. 1999, Lecture Notes in Computer Science, No. 1717, pp. 94-108, Springer, Berlin, Germany.
A. F. Tenca, G. Todorov, and Ç. K. Koç, "High-radix design of a scalable modular multiplier," in Cryptographic Hardware and Embedded Systems - CHES 2001, Ç. K. Koç and C. Paar, Eds. 2001, Lecture Notes in Computer Science, No. 1717, pp. 189-206, Springer, Berlin, Germany.
A. D. Booth, "A signed binary multiplication technique," Q. J. Mech. Appl. Math., vol. 4, no. 2, pp. 236-240, 1951.
G. Todorov, "ASIC design, implementation and analysis of a scalable high-radix Montgomery multiplier," Master thesis, Oregon State University, USA, December 2000.
L. A. Tawalbeh, "Radix-4 ASIC Design of a Scalable Montgomery Modular Multiplier using Encoding Techniques," M.S. thesis, Oregon State University, USA, October 2002.

Loai Tawalbeh

Asilomar Conference on Signals, Systems & Computers, 2003

This paper presents the algorithm and architecture of a scalable radix-4 Montgomery multiplier. The straightforward implementation of a radix-4 design based on the techniques already published results in a poor solution. In this paper we present an algorithm and architecture for the scalable radix-4 multiplier that makes use of two types of digit recoding in order to generate an efficient

An Improved Unified Scalable Radix-2 Montgomery Multiplier

Steven Hsu

2005

Tenca-Koç unified scalable radix-2 Montgomery multiplier with half the latency for small and moderate precision operands and half the queue memory requirement. Like the Tenca-Koç multiplier, this design is reconfigurable to accept any input precision in either GF(p) or GF(2^n) up to the size of the on-chip memory. An FPGA implementation can perform 1024-bit modular exponentiation in 16 ms using 5598 4-input lookup tables, making it the fastest unified scalable design yet reported.

A hybrid radix-4/radix-8 low power, high speed multiplier architecture for wide bit widths

Eby Friedman

1996 IEEE International Symposium on Circuits and Systems. Circuits and Systems Connecting the World. ISCAS 96, 1996

A hybrid radix-4/radix-8 architecture targeted for high bit multipliers is presented as a compromise between the high speed of a radix-4 multiplier architecture and the low power dissipation of a radix-8 multiplier architecture. In this hybrid radix-4/radix-8 multiplier architecture, the performance bottleneck of a radix-8 multiplier, the generation of three times the multiplicand for use in generating the radix-8 partial product, is performed in parallel with the reduction of the radix-4 partial products rather than serially, as in a radix-8 multiplier. This hybrid radix-4/radix-8 multiplier architecture requires 13% less power for a 64 × 64 bit multiplier, and results in only a 9% increase in delay, as compared with a radix-4 implementation. When supply voltage is scaled such that all multipliers exhibit the same delay, the 64 × 64 bit hybrid radix-4/radix-8 multiplier dissipates less power than either the radix-4 or radix-8 multipliers. The hybrid radix-4/radix-8 architecture is therefore appropriate for those applications that must dissipate minimal power and operate at high speeds.

Design and Comparison of High Speed Radix-8 and Radix-16 Booth's Multipliers

Romika Choudhary

The multiplier is a hardware block that generally occupies a significant chip area, and minimizing it benefits the many applications in which multiplier blocks are an important unit, such as digital signal processing (DSP) systems and computational techniques. Battery-operated systems require low-power devices, and power can be reduced if the hardware required for the device is reduced logically. This paper focuses on DSP applications in which the multiplier is heavily used and proposes a technique that reduces both hardware and delay, raising system performance and allowing a significantly higher operating frequency. A 16-bit multiplier has been designed using radix-8 and radix-16 Booth multiplication, which reduces the number of partial products.

Parallelized radix-4 scalable Montgomery multipliers

David Haris

Proceedings of the 20th annual conference on Integrated circuits and systems design - SBCCI '07, 2007

This paper describes a parallelized radix-4 scalable Montgomery multiplier implementation. The design does not require hardware multipliers, and uses parallelized multiplication to shorten the critical path. By left-shifting the sources rather than right-shifting the result, the latency between processing elements is shortened from two cycles to nearly one. The new design can perform 1024-bit modular exponentiation in 8.7 ms and 256-bit exponentiation in 0.36 ms using 5916 Virtex2 4-input lookup tables. This is comparable to radix-2 for long multiplies and nearly twice as fast for short ones.

A nonredundant-radix-4 serial multiplier

Jack Meador

Solid-State Circuits, IEEE Journal of, 1989

Of the serial multipliers documented in the literature, those based upon the modified Booth algorithm dominate in applications where reduced latency and area are required. The modified Booth method computes redundant-radix-4 partial products by multiplying the multiplicand by 0, ±1, or ±2 in each stage. Interpreting multiplier data in terms of radices other than binary or redundant-radix-4 has been rejected in the past using the rationale that each stage would become unduly complex and counteract gains otherwise obtained from the reduced number of stages. This paper describes a serial multiplier based upon a new mapping of a nonredundant-radix-4 multiplication algorithm. This multiplier forms binary partial products by adding multiples of 0, 1, 2, or 3 times the multiplicand in all internal modules. The circuit described uses simpler recoding circuitry while retaining the same order area and time of the modified Booth multiplier.

HIGH PERFORMANCE RADIX-8 MULTIPLIER USING 8:2 COMPRESSORS

Koteswara Rao Vaddempudi

A high-performance and area-efficient implementation of a Radix-8 Booth multiplier is presented in this paper. This is implemented using 4:2 and 8:2 compressors, and the design structure may reach up to 126 bits. The speed of operation has been increased four times by appending it with a carry look-ahead adder. The performance of the proposed algorithm is analyzed using a higher-order FIR filter. The results are evaluated and synthesized using Xilinx ISE 9.2i and targeted towards a Spartan 3 FPGA.

Architectures for multiple constant decimal multiplication

Sara sadat Hoseininasab

Computers & Electrical Engineering, 2019

Due to the increasing demand for decimal calculations in the business, financial and economic world, decimal arithmetic circuits have received considerable attention from system designers. This is mainly because these applications depend heavily on decimal arithmetic, since the results must match exactly those obtained by human calculations. While decimal multiplication is one of the most frequent and complex-to-implement decimal operations, the special case of constant decimal multiplication is widely used in economic and financial applications. In this paper, we propose two ideas, named "Constant Decimal TCSD" (CDT) and "Constant Decimal DDDS" (CDD), and their hardware implementations for realizing multiple constant decimal multiplication. In the CDT and CDD architectures, the partial products are generated using a set of positive multiplicand multiples coded in 2's complement signed-digit format (TCSD) and binary coded decimal (BCD), respectively. We also present two new (3:1) compressors to reduce the number of partial products in both designs, one based on a new Double Decimal Digit Set (DDDS), which is not only self-complementing but whose redundancy also allows carry-free addition. Finally, a redundant to non-redundant converter recodes the TCSD and DDDS product to BCD in the first and second schemes, respectively. Hardware synthesis evaluation shows that, compared to the most recent 16 × 16 decimal multipliers, the delay, area, power consumption and PDP of the proposed multiple constant multipliers improve by up to 57%, 89%, 93% and 97%, respectively.

Fast Combined Decimal/Binary Multiplier Based on Redundant BCD 4221-8421 Digit Recoding

Mohammed Nabil

Basrah journal for engineering science, 2017

Many applications consider floating-point arithmetic a key component of their computations. Combined decimal/binary arithmetic has become an important topic that supports high-speed decimal/binary applications. A new 64-bit (16×16 digit) combined decimal/binary multiplier is proposed and implemented in this work that can be used for both fused multiply-add (FMA) and multiplier units. A new partial products reduction tree is shared between the decimal and binary multiplier units. The evaluation and comparison between the proposed multiplier and the most recent previous works shows 4.66% less delay than a combined decimal/binary multiplier and 19.33% less delay than the fastest standalone decimal multiplier.

A hybrid radix-4/radix-8 low power signed multiplier architecture

Eby Friedman

IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, 1997

A hybrid radix-4/radix-8 architecture targeted for high bit, general purpose, digital multipliers is presented as a compromise between the high speed of a radix-4 multiplier architecture and the low power dissipation of a radix-8 multiplier architecture. In this hybrid radix-4/radix-8 multiplier architecture, the performance bottleneck of a radix-8 multiplier, the generation of three times the multiplicand for use in generating the radix-8 partial product, is performed in parallel with the reduction of the radix-4 partial products rather than serially, as in a radix-8 multiplier. This hybrid radix-4/radix-8 multiplier architecture requires 13% less power for a 64 × 64-b multiplier, and results in only a 9% increase in delay, as compared with a radix-4 implementation.

