[0001] The invention relates to multistage electrical signal processing apparatus and particularly
to such apparatus having processing elements distributed across a plurality of devices
interconnected to form a cascaded array.
[0002] Electrical signal processing apparatus, including apparatus for analysing multi bit
binary coded digital signals, is known, and includes devices in which a number of processing
elements are provided on a single chip device such as an integrated circuit silicon
chip. Such apparatus may require high speed sampling of data and difficulties arise
when it is necessary to interconnect a number of such devices or chips in order to
form a cascaded array. Electrical signals can be transferred more quickly in on-chip
communication than off-chip communication which is necessary when data is communicated
through a number of interconnected chips. Due to the small physical size of integrated
circuit chips limited space is available for output and input pins to effect interchip
connections and this can raise further difficulties in attempting to overcome slower
off-chip communication in synchronous systolic arrays of processing elements partitioned
across multiple devices when they are arranged to effect high frequency data sampling.
[0003] It is an object of the present invention to provide multistage electrical signal
processing apparatus with improved inter device connection for use in synchronous
systolic arrays having processing elements distributed across a plurality of devices.
[0004] The present invention provides multistage electrical signal processing apparatus
comprising a plurality of signal processing elements distributed across a plurality
of devices interconnected to form a cascaded array of devices, data supply means for
inputting time varying input data to each of the processing elements, the same input
data being supplied simultaneously to each of the devices, each device comprising:
(1) at least one signal processing element,
(2) means for generating an intermediate result representing the result of processing
the input data received by that device after a time interval from input of the data,
(3) interconnecting means arranged to supply to a second device an output from a first
device for combination with said intermediate result of said second device,
(4) combining means for combining any output received through said interconnecting
means with said intermediate result to form a combined result, and
(5) output means for outputting said combined result from the device through said
interconnecting means,
said interconnecting means including signal delay means whereby said intermediate
result in a second device is combined with an output which was derived from a first
device at a time related to that at which input data was input to said second device
for use in forming said intermediate result, and
(6) time control means controlling the time at which data is input to each of the
elements and controlling a time interval between input of data to a said device and
the formation of an intermediate result using that data.
[0005] Preferably the apparatus includes means for updating the input data supplied to each
of the elements in a succession of time controlled cycles so as to form in each device
a new intermediate result in each cycle, said time delay means being arranged to introduce
a time delay such that the intermediate result of a second device which is combined
with the output of a first device is the intermediate result obtained from input data
input to the second device during the cycle immediately following that of the formation
of the intermediate result incorporated in the output of the first device.
[0006] Preferably each device includes means effecting further time delay connected to said
combining means and arranged to introduce a controlled time delay between generation
of a said intermediate result by the element or elements of the device and the output
of a combined result from the device thereby forming a time controlled pipeline in
which combined outputs are output from the device at a frequency equal to that of
said intermediate result formation but delayed by a controlled time delay.
[0007] Conveniently the combining means comprise adding devices for adding the output of
one device to a said intermediate result of another device. Alternatively other devices
such as shifters, multipliers, or logical bit operators may be used as the combining
means.
[0008] The invention is particularly applicable to processing apparatus in which said elements
each comprise adding devices connected in a chain and arranged to effect addition
using input data and accumulation with an output from a preceding element. Each device
may have a plurality of elements each comprising adding devices connected in a chain
and each arranged to effect multiplication of input data with a coefficient and to
accumulate the result of the multiplication with data output by a preceding element
in the chain.
[0009] In such apparatus arranged to handle multi bit binary coded digital signals, the
number of bits needed to represent the accumulated output of a succession of devices
may increase as the number of devices included in the array increases. It may therefore
be desirable to output from each device to the next fewer bits than are used to represent
the accumulated output of the said device.
[0010] In such an arrangement, the elements are arranged to process multi bit binary coded
digital signals, each device including selector means for selecting from said intermediate
results a signal formed by less bits than the multi bit signal processed by each element.
[0011] Conveniently adjacent devices in the cascaded array are interconnected by a multi
bit parallel connection, said connection having a bit width with less bits than the
multi bit signals processed by each element.
[0012] The invention is particularly applicable to a transversal filter for effecting electrical
signal analysis, said filter comprising a cascaded array of separate filter devices.
Each such device may comprise a single silicon chip.
[0013] The invention will now be described by way of example and with reference to the accompanying
drawings in which:
Figure 1 is a block diagram of one device for use in a transversal filter for analysis
of multi bit binary coded digital signals,
Figure 2 illustrates timing diagrams for use in the apparatus of Figure 1,
Figure 3 illustrates a cascade connection not in accordance with the present invention,
Figure 4 illustrates a cascade connection between devices of the type shown in Figure
1 and in accordance with the present invention, and
Figure 5 illustrates in more detail the interconnection of two devices as shown in
Figure 4.
[0014] This particular example relates to digital transversal filters in which a synchronous
systolic array is formed by interconnecting a plurality of single integrated circuit
chip devices, each chip having N stages of filtering. Each stage of the filter is
arranged to multiply input data, which in this example is represented by a 16 bit
number, by a stage coefficient which in this example is represented by a further 16
bit number. Each stage effects this multiplication in a time controlled major cycle
consisting of a plurality of minor cycles, each minor cycle involving the formation
of a partial product and addition with any previous partial products of that major
cycle. The input data is fed simultaneously to all devices and to all stages on each
device. The input data is updated each major cycle. The major cycle has a time T and
after each major cycle the output of each stage is fed to the next stage on the same
chip and a new product calculation is commenced using new updated input data. The
output of the filter chain at time t = kT is given by:
y[kT] = w(1)*x[kT] + w(2)*x[(k-1)T] + ... + w(N)*x[(k-N+1)T]
where x[kT] represents the kth input data sample and w(1) to w(N) are the weight coefficients
for the N stages.
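The relationship above may be illustrated by the following minimal sketch, which is not taken from the described device. It evaluates the sum directly and also models a chain of stages in which every stage receives the same sample in each major cycle and forwards its accumulated sum to the next stage. The function names, the treatment of samples before the start as zero, and the assignment of w(N) to the first stage and w(1) to the last stage are assumptions made for illustration only.

```python
def transversal_output(weights, samples, k):
    """y[kT] = w(1)*x[kT] + w(2)*x[(k-1)T] + ... + w(N)*x[(k-N+1)T]."""
    total = 0
    for i, w in enumerate(weights, start=1):    # i runs over w(1) .. w(N)
        index = k - i + 1
        if index >= 0:                          # assumption: earlier samples are zero
            total += w * samples[index]
    return total


def systolic_chain(weights, samples):
    """Chain of stages fed with the same sample in every major cycle.

    Assumption: the first stage holds w(N) and the last stage holds w(1).
    """
    n = len(weights)
    stage_coeff = list(reversed(weights))
    held = [0] * n                              # sums forwarded after the previous cycle
    outputs = []
    for x in samples:
        # Each stage multiplies the common sample by its coefficient and
        # accumulates the sum forwarded by the preceding stage.
        held = [stage_coeff[j] * x + (held[j - 1] if j else 0) for j in range(n)]
        outputs.append(held[-1])                # output of the last stage
    return outputs
```

In this sketch the last stage output for the kth sample equals the direct sum for y[kT]; the latency of the real stages is not modelled.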
[0015] The arrangement shown in Figure 1 is provided on a single chip and in this example
consists of thirty two successive stages of which only the first two stages 12 and
13 and the final stage 14 have been marked. The operation of each stage is carried
out under the control of a control unit 15 using input data derived from an input
shift register 16. Each stage includes thirty six bit locations each having an adder
for use in forming and accumulating partial products for each minor cycle. The adders
in each stage need not complete resolution of any carry signals during each major
cycle. Each stage forwards a sum and carry signal to the next stage after completion
of each major cycle and the last stage 14 provides an output consisting of sum and
carry signals for each bit position to a carry propagate adder 17. This adder completes
resolution of any carry signals and provides a thirty six bit output signal 18 herein
referred to as an intermediate result for that chip. The device illustrated in Figure
1 is described and claimed in our copending patent application of even date entitled
"Improvements in or relating to multistage digital signal multiplication and addition"
(UK Patent Application No 8612455). It will not be further described in this specification
as the contents of that copending application are hereby incorporated by cross reference.
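By way of illustration only, the separation between unresolved sum and carry signals and their final resolution may be sketched as follows. This is a generic carry-save accumulation and is not asserted to be the circuit of the copending application; the names WIDTH, carry_save_add and carry_propagate are hypothetical, and only the thirty six bit width is taken from the example above.

```python
WIDTH = 36                      # bit locations per stage in the example
MASK = (1 << WIDTH) - 1


def carry_save_add(sum_bits, carry_bits, addend):
    """One accumulation step using a full adder at each bit location.

    No carry ripples between bit locations; the carries are simply kept
    as a second word, shifted up by one position.
    """
    new_sum = sum_bits ^ carry_bits ^ addend
    new_carry = ((sum_bits & carry_bits)
                 | (sum_bits & addend)
                 | (carry_bits & addend)) << 1
    return new_sum & MASK, new_carry & MASK


def carry_propagate(sum_bits, carry_bits):
    """Final resolution of the carries, the role described for the adder 17."""
    return (sum_bits + carry_bits) & MASK
```

At every step the sum word and carry word together represent the running total modulo 2 to the power 36, so that resolution can be deferred until after the last stage.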
[0016] The timing operation can be seen from reference to Figure 2 which illustrates a clock
pulse train 21 derived from a clock 20 and in the examples shown one clock pulse is
required for each minor cycle used in partial product formation. In this example each
major cycle consists of four successive clock pulses which would be appropriate for
a four or eight bit coefficient. The input data supplied into the shift register 16
is illustrated at 22 and this may incorporate a signal to indicate periods when the
data is valid and may properly be sampled. The device of Figure 1 is arranged to sample
input data on a rising edge of a clock pulse so that data for two successive major
cycles is sampled at the instants marked 23 and 24 in Figure 2.
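The division of a major cycle into minor cycles may be illustrated by the following sketch of a shift and add multiplication in which one partial product is formed and accumulated per minor cycle. It is an unsigned simplification given for illustration only. The description does not state how an eight bit coefficient is handled in four clock pulses; consuming two coefficient bits per minor cycle is an assumption, as are the function and parameter names.

```python
def multiply_in_minor_cycles(data, coefficient, coefficient_bits, bits_per_cycle=1):
    """Unsigned shift and add multiply, one partial product per minor cycle.

    Returns the product and the number of minor cycles used.
    """
    if not 0 <= coefficient < (1 << coefficient_bits):
        raise ValueError("coefficient does not fit in coefficient_bits")
    digit_mask = (1 << bits_per_cycle) - 1
    accumulator = 0
    minor_cycles = 0
    for shift in range(0, coefficient_bits, bits_per_cycle):
        digit = (coefficient >> shift) & digit_mask   # coefficient bits used this cycle
        accumulator += (data * digit) << shift        # partial product, accumulated
        minor_cycles += 1
    return accumulator, minor_cycles
```

Under these assumptions a four bit coefficient taken one bit at a time and an eight bit coefficient taken two bits at a time both occupy four minor cycles.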
[0017] If the number of filter stages provided on a single chip of the type illustrated
in Figure 1 is not sufficient for some signal analysis purposes, it may be desirable
to interconnect a succession of chips to form a cascaded array. Such a cascade is
illustrated in Figure 3 although this example is not in accordance with the invention.
This illustrates two successive chips 30 and 31 each arranged to receive the same
input data from a bus 32 together with coefficients from a bus 33. Each device incorporates
a plurality of stages 34 and the operation of each is controlled by a common clock
giving clock pulse inputs 35. Such devices may be arranged to operate at particularly
high speed. For example the input data may be cycled at frequencies up to 10 MHz.
Signal transfer is effected much more quickly in on-chip communication than in off-chip
communication. In the example illustrated in Figure 3 each stage on any one chip is
connected to the next stage by high speed on-chip connections but the output of the
last stage on device 30 is connected directly to an input to the first stage 34 on
chip 31 by an external off-chip connection 36. This off-chip connection 36 will be
a multi bit parallel connection depending on the bit width of the output signal to
be conveyed from each chip to the next. It will inevitably be slower than the on-chip
communication and in high speed operation it will not be possible for the output from
the device 30 to be received for processing by the first stage 34 of the second device
31 in synchronism with a major cycle of operation on device 31 following immediately
after the previous major cycle in which the last stage on device 30 formed an output.
This unavoidable delay in line 36 would mean introducing some form of delay into the
input data line 32 which would therefore require additional input pins on each device
to receive the delayed output from the delay device in the input data bus 32. Furthermore,
the necessary number of bits in the interchip connection 36 will require an extra
bit for each doubling in the number of filter stages 34 which are involved in the
cascaded array. To avoid the problem of providing an excessively large number of input
and output pins on the limited space available on an integrated circuit chip, it is
preferable to select a rounded output from each chip using a selected number of most
significant bits so that the output conveyed from each chip to the next is independent
of the cascaded array length. Producing a rounded output from the device
30 means using an adder similar to the adder 17 in Figure 1 in order to resolve
the carry signals and then using a bit selector to select a limited number of bits
and effecting rounding before transmitting the output through the interconnection
36. These operations introduce further delay which must have some delay compensation
in the data line 32 if the interconnection 36 is to lead directly into the input of
the first filter stage on the second device 31. Furthermore, if only a selected number
of bits are transmitted along the interconnection 36 it will be necessary to provide
a further selector at the input to each successive stage so that the selected number
of bits are fed to the correct bit locations of the first filter stage on the next
chip. It also has the disadvantage of less accuracy in that each filter stage of the
second chip operates with rounded data before producing its own output.
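The growth in the width of an unrounded interchip connection, one extra bit for each doubling in the number of filter stages, may be illustrated by the short sketch below. The function name and its rounding up of partial doublings are assumptions made for illustration.

```python
def extra_bits_for_cascade(total_stages, stages_per_chip):
    """Extra bits needed beyond the width that suffices for one chip.

    One bit is added for each doubling of the number of stages; a partial
    doubling is counted as a whole one (assumption).
    """
    doublings = 0
    stages = stages_per_chip
    while stages < total_stages:
        stages *= 2
        doublings += 1
    return doublings
```

For chips of thirty two stages, a cascade of two chips would on this basis need one extra bit and a cascade of eight chips three extra bits if no rounding were applied.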
[0018] Figure 4 shows a modification of the cascaded arrangement which is in accordance
with the present invention. Each chip in this example comprises a CMOS chip having
N stages of filter each stage being marked 34. The filter stages are controlled by
a control unit 15 using a timing clock 20 as previously described with reference to
Figure 1. An intermediate result 18 is formed for each device by the adder 17. The
output 40 from the adder 17 is fed to a selector and rounder 41 before reaching a
combination device in the form of a carry propagate adder 42. The adder 42 combines
the output 43 from the selector 41 with the output 44 from a delay shift register
45. The adders 17 and 42 as well as the selector 41 and shift register 45 are controlled
by the clock 20. The adder 42 provides a combined output 46 which is fed through a
multi bit parallel data path 48 to the input 49 of the next device in the cascade.
The input 49 is connected to the input of the delay shift register 45 in the next
device. The connection between the two devices is illustrated more fully in Figure
5 which shows the output part of a device 30 and the input part of device 31. In this
example Figure 5 uses a notation on the data buses showing the bit width of the data
paths. The DATAIN line 32 provides a 16 bit signal to the filter stages and the output
of the last filter stage 34 on each device is a thirty six bit signal fed to the adder
17. The intermediate result from the adder 17 is formed on line 40 which in this example
is still a thirty six bit signal. The selector 41 selects the most significant twenty
one bits and has a carry in from the most significant discarded bit so that rounding
is achieved. It provides an output on line 43 which is a twenty four bit signal and
the most significant three bit locations have sign extension so as to contain the
same bits as the most significant bit of the twenty one selected bits. This allows
the apparatus to operate in two's complement in order to handle negative numbers.
The twenty four bit signal on line 43 is fed to the adder 42 where the signal is combined
with any output on line 44 from the delay register 45. The signal on line 44 is also
a twenty four bit signal and a twenty four bit output is provided on line 48 which
interconnects the output of the first device 30 to the input of the delay register
45 on the second device 31. It will therefore be seen that the signal which forms
the combined output from any one device is fed along line 48 through the delay register
45 on the next device so that it is combined with the intermediate output of the next
device in the cascaded array.
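The operation described for the selector and rounder may be sketched as follows, purely by way of illustration. The bit widths are those of the example; the helper names are hypothetical, and the behaviour when rounding overflows the twenty one selected bits is not stated in the description, so the wrap-around used here is an assumption.

```python
IN_BITS, KEPT_BITS, OUT_BITS = 36, 21, 24
DROPPED = IN_BITS - KEPT_BITS          # fifteen least significant bits are discarded


def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def select_and_round(intermediate):
    """Thirty six bit pattern in, twenty four bit two's complement pattern out."""
    signed = to_signed(intermediate, IN_BITS)
    kept = signed >> DROPPED                     # most significant twenty one bits
    carry_in = (signed >> (DROPPED - 1)) & 1     # most significant discarded bit
    rounded = to_signed(kept + carry_in, KEPT_BITS)   # assumption: wraps on overflow
    return rounded & ((1 << OUT_BITS) - 1)       # top three bits repeat the sign bit
```

A negative result therefore appears on the twenty four bit path with its three most significant bits set, which is the sign extension referred to above.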
[0019] The adders 17 and 42 as well as the selector 41 will each take time to carry out
their respective operations and as they are controlled by the clock 20 they are allocated
an integral number of major cycles in which to operate. For this example it is assumed
that the adder 17 and selector 41 introduce a delay of P major cycles. The adder 42
may incorporate additional time delay illustrated at 49 so that the combined delay
of operation of the adder 42 together with inherent delay in the off-chip communication
48 represents a whole number of major cycles between the adder 42 and the input to
the delay register 45 on the second device. It is assumed that this collective delay
between the adder 42 and the input to the shift register 45 on the next device is
represented by X major cycles. If the filter provided by each chip has N stages each
using one major cycle the time of processing by the N stages on each chip is N major
cycles. The number of major cycles delay which is introduced by each shift register
45 is Y where X + Y = N. In this way, the combined output 46 from any one chip is
joined at the adder 42 of the next chip with the intermediate result of the next chip
in such a way that the two are time synchronised. The intermediate result at the second
chip will have been derived using input data which was input to the second chip immediately
following the last major cycle in which data was input to the first chip in forming
the combined output from the first chip.
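The synchronising effect of the condition X + Y = N may be checked with the following behavioural sketch, which is offered as an illustration under stated assumptions and not as a model of the actual circuits. Each chip is represented only by its intermediate result, rounding is omitted, samples before the start are taken as zero, and the function and parameter names are hypothetical.

```python
def fir(weights, samples):
    """Intermediate result of one chip: y[k] = sum over i of weights[i] * x[k - i]."""
    return [
        sum(w * samples[k - i] for i, w in enumerate(weights) if k - i >= 0)
        for k in range(len(samples))
    ]


def delay(sequence, cycles):
    """Delay by a whole number of major cycles, filling with zeros."""
    return ([0] * cycles + list(sequence))[:len(sequence)]


def cascade(chip_weights, samples, x_cycles, y_cycles, p_cycles=0):
    """Combined output of the last chip in a cascade of N stage chips.

    chip_weights[0] belongs to the first chip in the cascade.
    """
    n = len(chip_weights[0])
    if x_cycles + y_cycles != n:
        raise ValueError("shift register delay must satisfy X + Y = N")
    combined = [0] * len(samples)                # nothing is fed into the first chip
    for weights in chip_weights:
        intermediate = delay(fir(weights, samples), p_cycles)       # adder 17, selector 41
        arriving = delay(combined, x_cycles + y_cycles)             # line 48, register 45
        combined = [a + b for a, b in zip(intermediate, arriving)]  # adder 42
    return combined
```

With all chips receiving the same input data, the output of the last chip in this sketch equals that of a single long filter whose weights are those of the last chip followed by those of each earlier chip in turn, delayed only by the common P major cycles.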
[0020] It will therefore be seen that the major cycle delays introduced by
the adder 42, 49 and the interconnecting line 48 provide a pipeline effect so that
a new combined output is supplied to further chips in the array at the same frequency
as the production of intermediate results on each chip although it has been phase
shifted by introduction of the pipeline delay.
[0021] This enables synchronism between the chips to be achieved without causing any delay
in the DATAIN supply to each of the chips. It permits reduction in the number of bits
transmitted through the interconnection 48 between successive chips so that fewer input
and output pins are necessary on the restricted space available on each chip. The
use of twenty four bits, only twenty one of which are used prior to sign extension
by the selector 41 on the first chip, allows for additional bits to be included after
each doubling of the number of filter stages in the cascade. By arranging for the
rounded output of each stage to be combined only with the rounded output of the next
stage, less inaccuracy results as the second device does not itself carry out rounding
on a number which has already been rounded by a previous stage.
[0022] In the above example, the delay of P major cycles introduced by the adder 17 and
selector 41 is common to each chip and need not be taken into account in determining
the number of delay units to be introduced by the shift register 45.
[0023] The invention is not limited to the details of the foregoing example. For instance,
if the delay of P units introduced after the last stage of each device prior to combination
at the adder 42 is not common to each unit then variation in the delay achieved by
the shift register 45 will be necessary to achieve synchronism of the output of one
device with the intermediate result of the next device.
[0024] Although the above examples have related to single chip devices, the invention may
be applied to other cascaded arrays including those formed by use of board devices.
Although the particular example given relates to transversal filters, other signal
processing devices or other arrays may be used.
1. Multistage electrical signal processing apparatus comprising a plurality of signal
processing elements distributed across a plurality of devices interconnected to form
a cascaded array of devices, data supply means for inputting time varying input data
to each of the processing elements, the same input data being supplied simultaneously
to each of the devices, each device comprising:
(1) at least one signal processing element,
(2) means for generating an intermediate result representing the result of processing
the input data received by that device after a time interval from input of the data,
(3) interconnecting means arranged to supply to a second device an output from a first
device for combination with said intermediate result of said second device,
(4) combining means for combining any output received through said interconnecting
means with said intermediate result to form a combined result, and
(5) output means for outputting said combined result from the device through said
interconnecting means,
said interconnecting means including signal delay means whereby said intermediate
result in a second device is combined with an output which was derived from a first
device at a time related to that at which input data was input to said second device
for use in forming said intermediate result, and
(6) time control means controlling the time at which data is input to each of the
elements and controlling a time interval between input of data to a said device and
the formation of an intermediate result using that data.
2. Multistage electrical signal processing apparatus according to claim 1 including
means for updating the input data supplied to each of the elements in a succession
of time controlled cycles so as to form in each device a new intermediate result in
each cycle, said time delay means being arranged to introduce a time delay such that
the intermediate result of a second device which is combined with the output of a
first device is the intermediate result obtained from input data input to the second
device during the cycle immediately following that of the formation of the intermediate
result incorporated in the output of the first device.
3. Multistage electrical signal processing apparatus according to claim 1 or claim
2 in which each device has a plurality of signal processing elements connected in a
sequential chain, each arranged to process input data in a succession of cycles each
having a controlled time duration, whereby the said time interval to produce an intermediate
result for the device is dependent on the time of each cycle and on the number of
elements in the device.
4. Multistage electrical signal processing apparatus according to claim 3 including
means for supplying an output from one element after a cycle to a subsequent element
in the chain for use with input data by the subsequent element in a subsequent cycle,
and means for updating the input data to each element in each cycle.
5. Multistage electrical signal processing apparatus according to any one of the preceding
claims in which each device includes means effecting further time delay connected
to said combining means and arranged to introduce a controlled time delay between
generation of a said intermediate result by the element or elements of the device
and the output of a combined result from the device thereby forming a time controlled
pipeline in which combined outputs are output from the device at a frequency equal
to that of said intermediate result formation but delayed by a controlled time delay.
6. Multistage electrical signal processing apparatus according to any one of the preceding
claims in which said signal delay means forming part of the interconnecting means
is connected between an input to each device and said combining means.
7. Multistage electrical signal processing apparatus according to claim 5 in which
said means effecting further time delay is connected between said combining means
and the input of a subsequent device.
8. Multistage electrical signal processing apparatus according to claim 5 or claim
7 in which a second means effecting further time delay is connected between a last
processing element on each device and the combining means for the device.
9. Multistage electrical signal processing apparatus according to any one of the preceding
claims in which said signal delay means comprises shift register means through which
data may be sequentially shifted to introduce a time delay, or memory with means for
sequential addressing.
10. Multistage electrical signal processing apparatus according to any one of the
preceding claims in which the combining means comprise adding devices for adding the
output of one device to a said intermediate result of another device.
11. Multistage electrical signal processing apparatus according to any one of the
preceding claims in which said elements each comprise adding devices connected in
a chain and arranged to effect addition using input data and accumulation with an
output from a preceding element.
12. Multistage electrical signal processing apparatus according to claim 11 in which
each device has a plurality of elements each comprising adding devices connected in
a chain and each arranged to effect multiplication of input data with a coefficient
and to accumulate the result of the multiplication with data output by a preceding
element in the chain.
13. Multistage electrical signal processing apparatus according to claim 12 in which
each element is time controlled to effect multiplication without complete resolution
of carry signals, each said device including a carry propagate adder at an output
end of the chain of elements so as to provide a resolved total for the device.
14. Multistage electrical signal processing apparatus according to any one of the
preceding claims in which the elements are arranged to process multi bit binary coded
digital signals, each device including selector means for selecting from said intermediate
results a signal formed by less bits than the multi bit signal processed by each element.
15. Multistage electrical signal processing apparatus according to claim 14 in which
adjacent devices in the cascaded array are interconnected by a multi bit parallel
connection, said connection having a bit width with less bits than the multi bit signals
processed by each element.
16. A transversal filter for effecting electrical signal analysis, said filter comprising
multistage electrical signal processing apparatus according to any one of the preceding
claims in which means are provided for updating input data to each element of an interconnected
cascade array after a cycle of operation by each element.
17. A multistage electrical signal processing apparatus according to any one of the
preceding claims in which each device comprises a single silicon chip device.