A radix-4 scalable design
2005, IEEE Potentials
https://doi.org/10.1109/MP.2005.1462460…
5 pages
Abstract
This paper presents an algorithm and architecture for a scalable radix-4 multiplier that makes use of two types of digit recoding in order to generate an efficient solution. Experimental results are shown to demonstrate that the proposed radix-4 Montgomery Multiplier design has better area/time tradeoff than previous radix-2 and radix-8 scalable designs.
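To make the idea of a recoded radix-4 Montgomery multiplication concrete, the following is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's design: it works on full-width integers rather than the word-based scalable datapath, it applies only a textbook radix-4 Booth recoding to the multiplier, and it does not reproduce the paper's second recoding. The function names `booth_radix4_digits` and `montgomery_radix4` are hypothetical.

```python
def booth_radix4_digits(x, n_digits):
    """Recode x into n_digits signed radix-4 digits, each in {-2, ..., 2}.

    Assumes 0 <= x < 2**(2*n_digits - 1), so the top bit of the last
    2-bit group is zero and the recoded value equals x.
    """
    digits = []
    prev = 0  # bit immediately to the right of the current 2-bit group
    for i in range(n_digits):
        b0 = (x >> (2 * i)) & 1
        b1 = (x >> (2 * i + 1)) & 1
        digits.append(b0 + prev - 2 * b1)  # standard Booth radix-4 digit
        prev = b1
    return digits


def montgomery_radix4(x, y, m, n_digits):
    """Return x * y * 4**(-n_digits) mod m for odd m and 0 <= x, y < m."""
    m_neg_inv = (-pow(m, -1, 4)) % 4  # -m^(-1) mod 4 (Python 3.8+)
    s = 0
    for d in booth_radix4_digits(x, n_digits):
        s += d * y                  # add the recoded multiple of y
        q = (s * m_neg_inv) % 4     # quotient digit making s + q*m divisible by 4
        s = (s + q * m) >> 2        # exact division by the radix
    return s % m                    # final reduction into [0, m)


# Example with small, arbitrarily chosen operands (illustrative only).
m, x, y = 239, 123, 200
n = m.bit_length() // 2 + 1         # enough digits for the Booth precondition
result = montgomery_radix4(x, y, m, n)
reference = (x * y * pow(4, -n, m)) % m
```

Each iteration consumes two multiplier bits, so the loop runs about half as many times as a radix-2 version, while the signed digit set avoids the 3y multiple that unrecoded radix-4 digits would require.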
Related papers
2011 IEEE 54th International Midwest Symposium on Circuits and Systems (MWSCAS), 2011
Novel multiple-radix architectures for 7 N x 7 N integer multiplications, where N = 2k and k is a non-negative integer, based on recent developments involving multiple-radix representation of numbers, are presented. A hardware implementation of a 7 x 7 bit multiple-radix multiplier is provided, followed by larger multiplier architectures that employ the 7 x 7 architecture as a building block based on the Karatsuba algorithm. The methodology employed for prototyping the multiplier circuits using Xilinx FPGA devices is described. Measured results in terms of speed and area complexity from on-chip physical FPGA realizations are provided. This recursive architecture provides a new method for building multiple-radix parallel hardware multipliers operating on large integer multiplicands, with potential future applications in areas such as computational number theory, digital arithmetic and computer security.
In this paper, we describe a low-power, high-speed multiplier suitable for standard cell-based ASIC design methodologies. For this purpose, an optimized Booth encoder, compact 28-2, 27-2, …, and 10-2 compressors, and an XOR-based adder are proposed. Although the whole design is coded in Verilog HDL and implemented through a commercially available EDA tool chain, the implementation gives results comparable to full-custom designs [1]. Realistic simulations using timing parameters extracted from the layout show that the propagation time of the critical path is 3.25 ns at 2.5 V on a 0.18 µm process technology, which is almost 21% faster than the conventional multiplier.
We propose a new sequential multiplier design that generates each radix-16 partial product as a high and a low component, such that the partial product equals four times the high component plus the low component, and each component is a multiple in {0, 1, 2, 3} of the multiplicand. The required hard 3× multiple is generated in a preliminary cycle, which reduces the cycle time of the main iteration. Two radix-16 carry-save adders are used to generate the radix-16 accumulated partial product. The synthesis results show improved latency, power dissipation, and energy consumption over the previous relevant designs at the cost of additional silicon area; the energy-area product is nevertheless also lowered.
CVR Journal of Science & Technology, 2014
Fast multipliers are crucial in digital signal processing systems. The speed of the multiply operation has been of great importance in digital signal processors and general-purpose processors, especially since media processing took off. As the need for efficient design increases without compromising performance, industry has to concentrate on the tradeoffs. Here, a modified Booth multiplier is implemented using an algorithm that reduces the number of partial products to be generated. In this work, 8×8 multipliers accept inputs in the range -128 to +127, with negative numbers represented in 2's complement form. A Booth encoder (i.e., partial product generator) and a hybrid adder are used in the design of the modified Booth multiplier to achieve minimum delay and less area.
IEEE Transactions on Computers, 2010
The new generation of high-performance decimal floating-point units (DFUs) is demanding efficient implementations of parallel decimal multipliers. In this paper, we describe the architectures of two parallel decimal multipliers. The parallel generation of partial products is performed using signed-digit radix-10 or radix-5 recodings of the multiplier and a simplified set of multiplicand multiples. The reduction of partial products is implemented in a tree structure based on a decimal multioperand carry-save addition algorithm that uses unconventional (non BCD) decimal-coded number systems. We further detail these techniques and present the new improvements to reduce the latency of the previous designs, which include: optimized digit recoders for the generation of 2 n-tuples (and 5-tuples), decimal carry-save adders (CSAs) combining different decimal-coded operands, and carry-free adders implemented by special designed bit counters. Moreover, we detail a design methodology that combines all these techniques to obtain efficient reduction trees with different area and delay trade-offs for any number of partial products generated. Evaluation results for 16-digit operands show that the proposed architectures have interesting area-delay figures compared to conventional Booth radix-4 and radix-8 parallel binary multipliers and outperform the figures of previous alternatives for decimal multiplication.
18th IEEE Symposium on Computer Arithmetic (ARITH '07), 2007
This paper introduces two novel architectures for parallel decimal multipliers. Our multipliers are based on a new algorithm for decimal carry-save multioperand addition that uses a novel BCD-4221 recoding for decimal digits. It significantly improves the area and latency of the partial product reduction tree with respect to previous proposals. We also present three schemes for fast and efficient generation of partial products in parallel. The recoding of the BCD-8421 multiplier operand into minimally redundant signed-digit radix-10, radix-4 and radix-5 representations using new recoders reduces the complexity of partial product generation. In addition, SD radix-4 and radix-5 recodings allow the reuse of a conventional parallel binary radix-4 multiplier to perform combined binary/decimal multiplications. Evaluation results show that the proposed architectures have interesting area-delay figures compared to conventional Booth radix-4 and radix-8 parallel binary multipliers and other representative alternatives for decimal multiplication.
2013
A high-performance and area-efficient implementation of a radix-8 Booth multiplier is presented in this paper. It is implemented using 4:2 and 8:2 compressors, and the design structure may reach up to 126 bits. The speed of operation is increased four times by appending a carry look-ahead adder. The performance of the proposed algorithm is analyzed using a higher-order FIR filter. The results are evaluated and synthesized using Xilinx ISE 9.2i and targeted towards a Spartan 3 FPGA.
This paper introduces a novel architecture for a radix-10 decimal multiplier. The new generation of high-performance decimal floating-point units (DFUs) is demanding efficient implementations of parallel decimal multipliers. The parallel generation of partial products is performed using a signed-digit radix-10 recoding of the multiplier and a simplified set of multiplicand multiples. The reduction of partial products is implemented in a tree structure based on a new decimal multioperand carry-save addition algorithm that uses unconventional decimal-coded number systems. We further detail these techniques, which significantly improve the area and latency of the previous design and include: optimized digit recoders, decimal carry-save adders (CSAs) combining different decimal-coded operands, and carry-free adders implemented by specially designed bit counters.
IEEE Transactions on Computers, 2014
We present the algorithm and architecture of a BCD parallel multiplier that exploits some properties of two different redundant BCD codes to speed up its computation: the redundant BCD excess-3 code (XS-3), and the overloaded BCD representation (ODDS). In addition, new techniques are developed to reduce significantly the latency and area of previous representative high-performance implementations. Partial products are generated in parallel using a signed-digit radix-10 recoding of the BCD multiplier with the digit set [-5, 5], and a set of positive multiplicand multiples (0X, 1X, 2X, 3X, 4X, 5X) coded in XS-3. This encoding has several advantages. First, it is a self-complementing code, so that a negative multiplicand multiple can be obtained by just inverting the bits of the corresponding positive one. Also, the available redundancy allows a fast and simple generation of multiplicand multiples in a carry-free way. Finally, the partial products can be recoded to the ODDS representation by just adding a constant factor into the partial product reduction tree. Since the ODDS uses a similar 4-bit binary encoding as non-redundant BCD, conventional binary VLSI circuit techniques, such as binary carry-save adders and compressor trees, can be adapted efficiently to perform decimal operations. To show the advantages of our architecture, we have synthesized an RTL model for 16 × 16-digit and 34 × 34-digit multiplications and performed a comparative survey of the previous most representative designs. We show that the proposed decimal multiplier has an area improvement roughly in the range 20-35 percent for similar target delays with respect to the fastest implementation.
2007 18th European Conference on Circuit Theory and Design, 2007
In this paper, we introduce a novel high-radix binary signed digit (BSD) serial-parallel multiplier suitable for low-power high-speed multiplication. The proposed N-bit×N-bit radix-16 serial-parallel multiplier can reduce the number of accumulation cycles of partial products to as few as N/4, and eliminate most of the inversion operations that consume power in a conventional multiplier when generating the partial products. Unlike other high-radix methods, the pre-multiplication in the new algorithm employs a BSD method which requires no extra adder, and thus removes the extra delay for additions which hinders other high-radix algorithms.

References (6)
- P. L. Montgomery, "Modular multiplication without trial division," Mathematics of Computation, vol. 44, no. 170, pp. 519-521, April 1985.
- A. F. Tenca and Ç. K. Koç, "A word-based algorithm and scalable architecture for Montgomery multiplication," in Cryptographic Hardware and Embedded Systems - CHES 1999, Ç. K. Koç and C. Paar, Eds. 1999, Lecture Notes in Computer Science, No. 1717, pp. 94-108, Springer, Berlin, Germany.
- A. F. Tenca, G. Todorov, and Ç. K. Koç, "High-radix design of a scalable modular multiplier," in Cryptographic Hardware and Embedded Systems - CHES 2001, Ç. K. Koç and C. Paar, Eds. 2001, Lecture Notes in Computer Science, No. 1717, pp. 189-206, Springer, Berlin, Germany.
- A. D. Booth, "A signed binary multiplication technique," Q. J. Mech. Appl. Math., vol. 4, no. 2, pp. 236-240, 1951.
- G. Todorov, "ASIC design, implementation and analysis of a scalable high-radix Montgomery multiplier," Master thesis, Oregon State University, USA, December 2000.
- L. A. Tawalbeh, "Radix-4 ASIC Design of a Scalable Montgomery Modular Multiplier using Encoding Techniques," M.S. thesis, Oregon State University, USA, October 2002.