Background of the Invention
[0001] The present invention relates in general to digital multipliers and more specifically
to improving the speed at which partial products are summed to form the final product
of the multiplication.
[0002] Binary multiplication is an important function in many digital signal processing
applications. Some applications further require accumulation of a product with the
results of previous operations (e.g. forming a sum of products). A versatile multiplier
circuit must have the capability to perform these functions in either two's complement
or unsigned magnitude notation.
[0003] Many schemes are known in the art for reducing the time required to perform a binary
multiplication. For example, many different encoding methods have been devised which
reduce the number of partial products which must be added up to form the final product.
The modified Booth algorithm is one of these which is often used in integrated circuit
digital multipliers.
[0004] Attempts have also been made to speed up the summation of the partial products. In
U.S. patent 4,545,028, issued to Ware, the adder array is divided into blocks so that
different blocks can perform different parts of the addition in parallel, even though
all of the addition within each block is done in ripple fashion. Furthermore, the
first block can only contain four partial products and the remaining blocks must match
an arithmetic progression so that carries from one block appear when needed by the
next block.
[0005] Summation can also be sped up through use of a carry look-ahead adder. When carries
propagate through a sequential series of adder stages in ripple fashion, the delay grows
with the number of bits in the addends. In a carry look-ahead adder, logic circuitry
provides concurrent rather than sequential carry propagation. However, the bit size of
a carry look-ahead adder is limited because the circuit complexity, gate count and chip
area increase rapidly as bit size increases.
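By way of illustration only, the carry look-ahead principle may be modelled in software as follows. This is a minimal Python sketch and not a circuit of the invention; the function names carry_into and carry_lookahead_add are hypothetical. Each carry is written as a flat sum of products of generate and propagate signals, so that no carry depends on another carry. The number of product terms grows with bit position, which reflects the complexity limit noted above.

```python
def carry_into(i, g, p, carry_in):
    """Carry entering bit i as a flat sum of products over g, p and carry_in."""
    term = carry_in
    for k in range(i):
        term &= p[k]              # carry_in survives only if bits 0..i-1 all propagate
    carry = term
    for j in range(i):
        term = g[j]               # a carry generated at bit j ...
        for k in range(j + 1, i):
            term &= p[k]          # ... must propagate through bits j+1..i-1
        carry |= term
    return carry


def carry_lookahead_add(a, b, width, carry_in=0):
    """Return (sum, carry_out) of two width-bit unsigned operands."""
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]    # generate signals
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]  # propagate signals
    total = 0
    for i in range(width):
        total |= (p[i] ^ carry_into(i, g, p, carry_in)) << i
    return total, carry_into(width, g, p, carry_in)
```

In hardware all carries are evaluated concurrently; the loops here merely enumerate the product terms of each carry expression.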
[0006] In THE RADIO AND ELECTRONIC ENGINEER, Vol. 49, No. 5, May 1979, pages 250-254, F.
Demmelmeir: "A fast multiplier module", there is disclosed a digital multiplier employing
a pipelined architecture having in cascade two carry save adders and one carry propagate
adder. Whilst this does achieve speeding up of the multiplication operation, pipelining
as a technique is not suitable in all circumstances.
Summary of the Invention
[0007] In the present invention, an alternative approach is provided for fast, parallel
summation of partial products with minimum added complexity and space.
[0008] The present invention provides a multiplier circuit for producing the sum of a multiplier
signal and a multiplicand signal comprising: partial product generating means for
generating N addends, of different rank, each addend comprising a set of partial products
comprised of an ordered set of bits, in response to a multiplier signal and a multiplicand
signal applied to the inputs of said partial product generating means; a carry save
array coupled to said partial product generating means for adding the bits of a first
group of R addends taken from said N addends and for generating the least significant
bits of said sum and generating intermediate sum and carry signals; characterized
by a first carry look-ahead array operating in parallel and simultaneously with said
carry save array coupled to said partial product generating means for adding substantially
all of the bits of said N addends not included in said first group to generate an
intermediate product; and a second carry look-ahead array coupled to said carry save
array and to said first carry look-ahead array, said second carry look-ahead array
generating the most significant bits of said sum from said intermediate product and
from said intermediate sum and carry signals; where N and R are integers with R being
greater than one half of N. A second aspect of the invention is a method as defined
in claim 8.
Brief Description of the Drawings
[0009] The novel features of the invention are set forth with particularity in the appended
claims. The invention itself, as to organization and method of operation, together
with further objects and advantages thereof, may best be understood by reference to
the following detailed description in conjunction with the accompanying drawings in
which:
Figure 1 is a block diagram of one embodiment of a digital multiplier including the
triple array summation of the present invention.
Figure 2 shows the grouping of addend bits for an exemplary 16x16 multiplier/accumulator
with nine partial products generated using encoding.
Figures 3-8 taken together provide a schematic diagram of one embodiment of the triple
array architecture corresponding to the multiplier/accumulator of Figure 2.
Detailed Description of the Invention
[0010] Turning now to Figure 1, a triple array multiplier architecture will be described.
The circuit includes a multiplier input 10 and a multiplicand input 11. A multiplier
signal at multiplier input 10 is applied to an encoding circuit 12 and is converted
to an encoded input 13. Encoding may be performed according to several well known
algorithms (e.g. a modified Booth's algorithm) so that the number of partial products
generated from encoded input 13 and multiplicand input 11 is less than would otherwise
be generated. For example, using a modified Booth algorithm with a multiplier and
multiplicand length of 16 bits each, encoded input 13 is encoded into nine terms.
As a result, partial product generator 14 would generate 9 partial products to be
added to form the final product.
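The reduction in the number of partial products may be illustrated by the following Python sketch of a radix-4 modified Booth recoding. The sketch assumes a radix-4 recoding and a multiplier extended to 17 bits (the one-bit extension described below for input 10), which yields nine digits. The encoding actually used by encoding circuit 12 is not detailed here, and the function names are hypothetical.

```python
def booth_radix4_digits(multiplier, width):
    """Return radix-4 Booth digits (each in -2..2), least significant first.

    `multiplier` is a Python int holding a two's complement value that fits
    in `width` bits; Python's arithmetic right shift supplies sign extension.
    """
    n_digits = (width + 1) // 2
    digits = []
    prev = 0  # implicit 0 to the right of bit 0
    for k in range(n_digits):
        b0 = (multiplier >> (2 * k)) & 1
        b1 = (multiplier >> (2 * k + 1)) & 1
        digits.append(b0 + prev - 2 * b1)
        prev = b1
    return digits


def booth_multiply(multiplicand, multiplier, width):
    """Sum one partial product per Booth digit, each shifted to its rank."""
    digits = booth_radix4_digits(multiplier, width)
    return sum((d * multiplicand) << (2 * k) for k, d in enumerate(digits))
```

For example, booth_radix4_digits(0xFFFF, 17) returns nine digits, so that an unsigned 16-bit multiplier zero-extended by one bit gives rise to nine partial products rather than sixteen.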
[0011] Partial product generator 14 has as its output a plurality of partial products of
increasing rank. The bits of the partial products are divided into groups which are
summed in a triple adder array. SETA, one group of partial product bits, is input
to the ADDA adder array. SETB, another group of partial product bits, is input to
the ADDB adder array. A small number of remaining partial product bits SETCʹ can optionally
be provided to the ADDC adder array.
[0012] In a preferred embodiment of the invention, SETA includes the bits of the r partial
products of lowest order. The value of r is preferably greater than one-half n, where
n is the total number of partial products. The results of the addition in the ADDA
array include a low order portion of the final product, an intermediate sum SUMA and
an intermediate carry CARA. The low order result is provided to a register 15 and
SUMA and CARA signals are input to the ADDC adder array.
[0013] Included in SETB are all or substantially all of the remaining partial product bits.
The result of the addition in the ADDB array is SUMB, which is input to the ADDC adder
array. There may be a small number of partial product bits which correspond to available
adder locations in the ADDC array not being used to add SUMA, CARA and SUMB. These
bits may be included in SETCʹ which bypasses the ADDA and ADDB arrays to be included
in the ADDC array. The result of the addition in the ADDC adder array is the high
order portion of the final result which is provided to register 15.
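The division of labour among the three arrays may be illustrated by the following behavioural Python sketch. It models only the arithmetic, not the gate-level structure or timing, and makes simplifying assumptions: each addend is an integer already shifted to its rank, every addend routed to ADDB has no bits below the position split, and the optional SETCʹ bits are omitted. The name triple_array_product and the parameter split are hypothetical.

```python
def triple_array_product(addends, r, split):
    """Sum ranked addends using the ADDA / ADDB / ADDC partition.

    addends: integers already shifted to their rank, lowest rank first.
    r:       number of low-order addends routed to ADDA (SETA).
    split:   assumed bit position below which only ADDA contributes.
    """
    mask = (1 << split) - 1
    a_total = sum(addends[:r])   # ADDA; in hardware, concurrent with ADDB
    b_total = sum(addends[r:])   # ADDB produces SUMB
    if b_total & mask:
        raise ValueError("SETB addends must have no bits below `split`")
    low = a_total & mask                             # low order portion of the product
    high = (a_total >> split) + (b_total >> split)   # ADDC combines SUMA, CARA and SUMB
    return (high << split) + low
```

As a usage example, the eight addends of an unencoded 8 by 8 bit unsigned multiplication, [(((y >> i) & 1) * x) << i for i in range(8)], may be summed with r = 5 and split = 5, so that r is greater than one-half of the number of addends.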
[0014] According to the triple array architecture of the present invention, high speed and
low complexity are achieved by implementing ADDA as a carry save array and ADDB and
ADDC as carry look-ahead arrays. Thus, the ADDA array sums the low rank partial products
while taking advantage of the low complexity and small chip area of a carry save array.
The high speed of the carry look-ahead adder is advantageously employed in the ADDB
and ADDC arrays in performing the higher rank summation. More specifically, the ADDA
array preferably comprises r-1 rows of carry save adders for adding the partial products
in SETA. ADDA may also comprise a row (i.e. a total of r rows) for handling accumulation
and two's complement notation. ADDB and ADDC preferably each comprise one or more
rows of carry save adders followed by a carry look-ahead adder.
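The operation of rows of carry save adders may be illustrated by the following Python sketch, which treats each row as a bitwise 3:2 compression of whole words. It is a simplified model for non-negative integers only; the function names are hypothetical, and the accumulation and two's complement row is not modelled. Starting from the first addend, each further addend consumes one row, giving r-1 rows for r addends. In the first row the carry word is zero, so each position reduces to a half adder.

```python
def carry_save_row(x, y, z):
    """One row of full adders: compress three words into a sum word and a carry word."""
    s = x ^ y ^ z                              # per-bit sum, no carry propagation
    c = ((x & y) | (x & z) | (y & z)) << 1     # per-bit carry, weighted one position up
    return s, c


def carry_save_array(addends):
    """Reduce a list of addends to (sum_word, carry_word, rows_used)."""
    s, c = addends[0], 0
    rows = 0
    for a in addends[1:]:
        s, c = carry_save_row(s, c, a)
        rows += 1
    return s, c, rows
```

The sum word and carry word together represent the total; a single carry-propagating addition, such as that of a carry look-ahead adder, resolves them into one result.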
[0015] In another aspect of the preferred embodiment, the value of r is selected such that
the time delay of carry save addition in ADDA is most closely equal to the time delay
of carry look-ahead addition in ADDB. In this way, the architecture is optimized and
SUMA, CARA and SUMB will be presented simultaneously to the ADDC array.
[0016] The contents of register 15 provide the output of the multiplier. In order to perform
accumulation, the contents of register 15 may be fed back into the ADDA array for
inclusion in subsequent processing.
[0017] In order to handle either signed or unsigned magnitude notation, the multiplicand
(at input 11) is extended by two bits and the multiplier (at input 10) is extended
by one bit. The value of the extended high order bits depends on the notation used
and the value of the most significant bit of the original number. Thus, when signed
notation is being used, the value of the most significant bit is repeated in each
of the extra bits, e.g. a logical "1" in the MSB position of the multiplicand would
be repeated in the two extension bits of input 11. When unsigned notation is being
used, each of the extra bits is forced to a logical "0".
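The extension rule may be illustrated by the following Python sketch. The function names extend_operand and to_signed are hypothetical, and the sketch models only the bit patterns, not the circuitry at inputs 10 and 11. It shows that, once extended, both notations can be interpreted by one signed datapath.

```python
def extend_operand(value, width, extra_bits, signed):
    """Return the (width + extra_bits)-bit pattern of a width-bit operand."""
    value &= (1 << width) - 1
    msb = (value >> (width - 1)) & 1
    fill = msb if signed else 0                # repeat the MSB, or force zeros
    ext = ((1 << extra_bits) - 1) if fill else 0
    return value | (ext << width)


def to_signed(pattern, width):
    """Interpret a width-bit pattern as a two's complement integer."""
    sign = 1 << (width - 1)
    return (pattern & (sign - 1)) - (pattern & sign)
```

For a 16-bit multiplicand 0x8000 extended by two bits, the signed pattern 0x38000 is read as -32768 and the unsigned pattern 0x08000 is read as 32768, both as 18-bit two's complement values.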
[0018] In order to also handle two's complement signed notation, partial product generator
14 may also include a two's complement register 16. Register 16 has a plurality of
bits, each bit corresponding to a partial product. Each bit can be set in order to
indicate that the corresponding partial product in two's complement notation is negative.
When performing the addition of the partial products, the contents of register 16
must be added in. Preferably, each two's complement bit is added in the array which
includes its respective partial product.
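One way in which such two's complement bits can enter the summation is sketched below in Python. The sketch assumes that a negative partial product is supplied as the bitwise inversion of its magnitude and that the corresponding two's complement bit, added at the least significant position of that partial product, completes the negation. This arrangement is an assumption made for illustration, and the function name sum_with_tc_bits is hypothetical.

```python
def sum_with_tc_bits(magnitudes, negative, ranks, width):
    """Sum signed partial products modulo 2**width using inversion plus a TC bit.

    magnitudes: non-negative partial product magnitudes.
    negative:   flags playing the role of the two's complement bits.
    ranks:      bit position of each partial product's least significant bit.
    """
    mask = (1 << width) - 1
    total = 0
    for mag, neg, rank in zip(magnitudes, negative, ranks):
        bits = (~mag if neg else mag) & mask   # inverted pattern when negative
        tc = 1 if neg else 0                   # two's complement bit
        total += (bits << rank) + (tc << rank) # TC bit shares the partial product's rank
    return total & mask
```

For example, with magnitudes [5, 3, 7], flags [False, True, False] and ranks [0, 2, 4], the 16-bit result is 5 - 12 + 112 = 105.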
[0019] An example of a specific grouping of bits for a triple array multiplier adapted to
perform 16 bit by 16 bit multiplication/accumulation using modified Booth algorithm
encoding to generate 9 partial products is shown in Figure 2. The 9 partial products
are designated P and P0 to P7. Each designation is followed by a bit designation 0
to H. P is the lowest order or least significant partial product. P7 is the highest
order partial product. The least significant bits are designated 0 and the most significant
bits by H. Figure 2 also shows the two's complement bits designated TC and TC0 to
TC7. The contents of the accumulator register are designated OR0 to ORX2 and the two's
complement bit for the accumulator register is shown as TCA.
[0020] The bits in Figure 2 are arranged such that bits of equal significance are in the
same vertical column. Thus, bit significance increases moving towards the right. Each
two's complement bit has the same significance as the least significant bit of its
respective partial product (or the accumulator register in the case of TCA).
[0021] As shown in Figure 2, SETA includes the bits of 6 partial products, namely P and
P0 to P4, the accumulator contents OR0 to ORX2 and the corresponding two's complement
bits TC, TC0 to TC4 and TCA. SETB includes partial products P6 and P7 and corresponding
two's complement bits TC6 and TC7, and includes bits P52 to P5H of partial product
P5. Low order bits P50 and P51 and two's complement bit TC5 are grouped into SETCʹ.
[0022] A triple array summation circuit for performing addition according to the groupings
of Figure 2 is shown schematically in Figures 4 to 8 with Figure 3 showing how to
arrange Figures 4 to 8.
[0023] ADDA is a carry save array shown in Figures 4 to 6. The first row consists of half
adders except in the sixteenth and nineteenth bit positions. A full adder is used
in the sixteenth bit position to introduce a round bit RND to cause rounding off of
the result when desired. A diamond adder is in the nineteenth bit position of the
first row and has bit PG extended to its b input from the previous half adder. The
input designations of the full adder, diamond adder and first half adder in the first
row are repeated (but not shown) for the other adders of the same types in the figures.
[0024] As is known in the art, a full adder (such as full adder 40 in Figure 5) has addend
inputs a and b, a carry input c_in, a sum output s and a carry output c_out. A half adder,
such as half adder 41, has a and b inputs and s and c_out outputs. A diamond adder, such
as diamond adder 42, has a and b inputs, an s output and an "s or c" output which is the
logical OR of the s and c outputs of a half adder. The bits input to each adder are as
shown. Further, some inputs are hard wired to a logical "1" or a logical "0" as indicated.
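The logical behaviour of the three adder cells may be summarised by the following Python sketch, given for illustration only. The function names are hypothetical and each argument is a single bit (0 or 1).

```python
def full_adder(a, b, c_in):
    """Return (s, c_out) for addend bits a, b and carry input c_in."""
    s = a ^ b ^ c_in
    c_out = (a & b) | (c_in & (a ^ b))
    return s, c_out


def half_adder(a, b):
    """Return (s, c_out) for addend bits a and b."""
    return a ^ b, a & b


def diamond_adder(a, b):
    """Return (s, s_or_c), where s_or_c is the OR of the half adder's s and c outputs."""
    s, c = half_adder(a, b)
    return s, s | c
```

Since s is a XOR b and c is a AND b, the "s or c" output of the diamond adder reduces to a OR b.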
[0025] The ADDA array also includes conditional sum adders MX0 to MX8. These are high speed
adders which use a multiplexer to select one of two full adders according to its c_in
input, as known in the art. The inverted low order bits of the final product are shown
as S0 to S10 in Figure 4.
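The selection performed by a conditional sum adder may be illustrated by the following Python sketch of a single cell. It is a behavioural model under the assumption that both candidate results are formed before the carry input arrives; the function names are hypothetical and the inversion of the output bits is not modelled.

```python
def full_adder(a, b, c_in):
    """Return (s, c_out) for addend bits a, b and carry input c_in."""
    return a ^ b ^ c_in, (a & b) | (c_in & (a ^ b))


def conditional_sum_cell(a, b, c_in):
    """Precompute both outcomes, then let c_in act only as a multiplexer select."""
    if_zero = full_adder(a, b, 0)   # result assuming c_in = 0
    if_one = full_adder(a, b, 1)    # result assuming c_in = 1
    return if_one if c_in else if_zero
```

Because both candidate results are available in advance, the late-arriving carry passes through only the multiplexer rather than through a full adder.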
[0026] ADDB comprises a row of full adders followed by a 20-bit carry look-ahead adder (CLA)
100, shown in Figures 7 and 8. The ADDC array comprises one row of full adders followed
by a 24-bit CLA 101. The inverted high order bits of the final product are output
from CLA 101 and are shown as S11 to S34. In performing the final addition, the ADDC
array receives an intermediate sum signal from the ADDB array and receives both an
intermediate sum signal and an intermediate carry signal from the ADDA array.
[0027] CLA 100 and CLA 101 are shown with their inputs at the top and outputs at the bottom.
The CLAs can be of typical configuration as known in the art and shown, for example,
in N. Scott, Computer Number Systems and Arithmetic (1985) and in US Patent No. 4,153,938,
both of which are hereby incorporated by reference.
[0028] The foregoing triple array multiplier architecture with parallel summation by a carry
save array and a first carry look-ahead array followed by final summation by a second
carry look-ahead array achieves high speed, high performance and small chip area in
a circuit which can be implemented with standard IC technology. The circuit is versatile,
can perform accumulation and can operate with either signed or unsigned magnitude
representation.
1. A multiplier circuit for producing the sum of a multiplier signal and a multiplicand
signal comprising:
partial product generating means (13,14) for generating N addends, of different
rank, each addend comprising a set of partial products comprised of an ordered set
of bits, in response to a multiplier signal and a multiplicand signal applied to the
inputs (10,11) of said partial product generating means;
a carry save array (ADDA) coupled to said partial product generating means for
adding the bits of a first group of R addends taken from said N addends and for generating
the least significant bits of said sum and generating intermediate sum and carry signals;
characterized by
a first carry look-ahead array (ADDB) operating in parallel and simultaneously
with said carry save array coupled to said partial product generating means for adding
substantially all of the bits of said N addends not included in said first group to
generate an intermediate product; and
a second carry look-ahead array (ADDC) coupled to said carry save array and to
said first carry look-ahead array, said second carry look-ahead array generating the
most significant bits of said sum from said intermediate product and from said intermediate
sum and carry signals; where N and R are integers with R being greater than one half
of N.
2. The multiplier circuit of claim 1 wherein said partial product generating means includes
an encoding circuit (12) for encoding said multiplier signal to reduce the number
of partial products which are generated.
3. The multiplier circuit of claim 2 wherein said encoding circuit performs a modified
Booth algorithm.
4. The multiplier circuit of claim 1 further comprising an accumulator register (15)
coupled to said carry save array and said second carry look-ahead array for receiving
said sum, said carry save array adapted to add the contents of said accumulator register
to said first group of bits in order to perform accumulation.
5. The multiplier circuit of claim 1 further comprising a two's complement register (16)
coupled to said carry save array and having a plurality of bits, each bit respectively
indicating the sign of a respective partial product, the contents of said two's complement
register being added into said sum.
6. The multiplier circuit of claim 1 wherein said first group of R addends is further
selected such that said intermediate sum and carry signals and said intermediate product
are generated substantially simultaneously.
7. The multiplier circuit of claim 1 wherein said partial product generating means generates
nine addends, wherein said first group of addends added by said carry save array includes
the bits of the six least significant addends, wherein said first carry look-ahead
array comprises one row of full adders (P60...P69, P6A...P6H) and a carry look-ahead
adder (CLA100), and wherein said second carry look-ahead array comprises one row of
full adders (P50, P51...) and a carry look-ahead adder (CLA101).
8. A method for adding n partial products in a digital multiplier comprising the steps
of:
adding the r least significant partial products of said n partial products in a
carry save array (ADDA) to generate the least significant bits of the final sum and
to generate intermediate sum and carry signals, said carry save array performing said
addition in a time delay A; characterized by
simultaneously adding the bits of all the remaining partial products in a first
carry look-ahead array (ADDB) to generate an intermediate product in a time delay
B; and
adding said intermediate sum and carry signals and said intermediate products in
a second carry look-ahead array to generate the most significant bits of said final
sum, the value of r being an integer greater than one half n.
9. The method of claim 8 wherein the value of r is selected to minimize the difference
between time delay A and time delay B.