[0001] This invention relates to an image signal processor, and more particularly to an
architecture of an image signal processor in which local image processing such as
spatial convolution and non-linear neighborhood arithmetic operations can be conducted
at high speed, and in which expansion of the local image area and parallel processing
using multiple processors can be readily conducted.
[0002] Generally speaking, image processing includes the following five steps, conducted
in sequence: (i) observation, (ii) sampling/quantizing/coding, (iii) preprocessing,
(iv) feature extraction, and (v) object recognition. An object is generally observed
by a video camera, whose image output is then digitized. The digitized image includes
random noise due to camera characteristics and light reflection, so preprocessing is
used to remove the unwanted noise component. Features are then extracted from the
preprocessed image signal and used to identify the object observed by the video camera.
[0003] In such image processing, preprocessing consumes the most time because it handles
the huge amount of digital data that represents an image. Conventional von Neumann-type
computers are not well suited to such processing.
[0004] Several attempts have therefore been made to realize high-speed image signal
processing by processing image data in parallel, but it is extremely difficult to process
the whole data of a picture frame in parallel. Local parallel image processing, which
handles local image data of m-by-n picture elements (pixels) of a picture frame, is
applicable to a wide range of processing such as averaging, differentiation, and data
transformation, and requires relatively small circuitry. LSIs for such local parallel
image processing are therefore being actively developed.
[0005] A conventional local parallel processing type image signal processor has a specific
structure dedicated to each image processing function and therefore, in general, has
neither a general-purpose structure nor an expansion function.
[0006] Generally speaking, a local image signal processor picks up local image area data
of a suitable size from input image data and performs calculations on that local image
area data. That is, a local image signal processor handles the image processing of a
whole picture frame by scanning a window, which covers a local image area, over the
whole area of the picture frame.
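The scanning-window principle can be illustrated in software. The following is a minimal sketch, assuming a frame stored as a list of rows and a caller-supplied local operation; the names `scan_local_windows` and `mean3x3` are hypothetical, and border positions where the window would leave the frame are simply skipped.

```python
from typing import Callable, List

Image = List[List[int]]

def scan_local_windows(image: Image, m: int, n: int,
                       op: Callable[[Image], int]) -> Image:
    """Slide an m-by-n window over the whole frame and apply `op` to each
    local image area. Only positions where the window lies entirely inside
    the frame are produced, so the result is (rows-m+1) x (cols-n+1)."""
    rows, cols = len(image), len(image[0])
    result: Image = []
    for y in range(rows - m + 1):          # vertical scan
        out_row = []
        for x in range(cols - n + 1):      # horizontal scan
            window = [row[x:x + n] for row in image[y:y + m]]
            out_row.append(op(window))
        result.append(out_row)
    return result

def mean3x3(window: Image) -> int:
    """Averaging, one of the local operations named in the text."""
    return sum(sum(r) for r in window) // 9

frame = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12]]
print(scan_local_windows(frame, 3, 3, mean3x3))  # [[6, 7]]
```

Any other local operation (differentiation, a convolution with fixed coefficients, and so on) can be passed in place of `mean3x3`.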
[0007] Many image signal processing operations, such as averaging, differentiation, and
feature extraction, are conducted by local processing. Their complexity differs according
to the configuration and size of the local image area. Generally, such image processing
is conducted on local areas of approximately 3-by-3 through 16-by-16 pixels.
[0008] One example of a conventional local image signal processor is disclosed in "Image
signal processor computes fast enough for gray-scale video", Tadashi Fukushima, Electronic
Design, October 4, 1984, pages 209 through 215. The image signal processor disclosed
in Electronic Design has a specific, exclusive-use structure rather than a general-purpose
structure, and further requires many peripheral/additional circuits to realize expansion
processing.
Summary of the Invention
[0009] The principal object of the present invention is therefore to provide an improved
image signal processor which realizes expansion of the local image area with high-speed
operation and has a simple architecture suitable for LSI.
[0010] This and other objects of the invention are accomplished by an image processor
according to the present invention, which includes a local image register for picking
up local image area data of (m) rows × (n) columns, an expansion use register coupled
to the local image register for delaying its output of (m) rows × 1 column, and a
calculation unit coupled to the local image register for conducting calculations on
the local image area data. The calculation unit has an input terminal which receives
external data.
[0011] In a specific embodiment, the local image register comprises (m × n) one-pixel shift
registers, where m may be 3 and n may be 3. The expansion use register comprises m one-bit
shift registers; in this case, m may be 3.
[0012] According to the present invention as described herein, the following benefits, among
others, are obtained.
(1) It is possible to obtain an image signal processor which can realize high-speed
local image processing.
(2) It is possible to obtain an image signal processor which can readily realize expansion
processing of the local image area with an extremely simple structure.
(3) It is possible to obtain an image signal processor which has a general-purpose
structure under program control.
[0013] While the novel features of the invention are set forth with particularity in the
appended claims, the invention, both as to organization and content, will be better
understood and appreciated, along with other objects and features thereof, from the
following detailed description taken in conjunction with the drawings.
Brief Description of the Drawings
[0014]
Fig. 1 shows the principle of local image processing on which the present invention
is based;
Fig. 2 is a block diagram of one embodiment of a local image processor according to
the invention;
Fig. 3 is a block diagram of a local image processor having a local image area expansion
function, as an application of the one embodiment processor;
Fig. 4 shows the operation of the local image processor shown in Fig. 3;
Fig. 5 is a detailed block diagram of a local image processor of one embodiment;
Fig. 6 is a block diagram of a local image processor as another application of the
one embodiment processor;
Fig. 7 is a timing chart for explaining the operation of the local image processor
shown in Fig. 6;
Fig. 8 is a block diagram of a local image processor as a still further application
of the one embodiment processor;
Fig. 9 shows the operation of the local image processor shown in Fig. 8; and
Fig. 10 is a timing chart for explaining the operation of the local image processor
shown in Fig. 8.
Detailed Description of the Invention
[0015] The invention is explained with reference to the figures. Fig. 1 illustrates the
principle of local image processing.
[0016] The local image processing method is widely applied in digital image processing such
as edge detection and smoothing. The data of each pixel and its neighboring pixels in
local window 10 are gathered and processed by processor 12 so that output pixel 14 of
output image 16 is obtained, as shown in Fig. 1. Scanning controller 20 moves local
window 10 sequentially over the frame of input image 18, one position after another.
[0017] For real-time image processing, a very high-speed processor is indispensable. In the
case of the NTSC system with 512 × 512 pixels in one frame, the sampling period of the
image signal is about 90 ns per pixel, so the processor must complete the calculation
on all the data of a local window within 90 ns. The local window generally covers
3 × 3 through 16 × 16 picture elements; the larger the local window, the higher the
processing speed required.
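The 90 ns budget can be made concrete with a back-of-the-envelope calculation. This sketch assumes that every pixel of the window needs one sequential multiply-accumulate within one pixel period; the helper name `ns_per_pixel_operation` is hypothetical and not part of the disclosure.

```python
PIXEL_PERIOD_NS = 90.0  # per-pixel sampling period quoted in [0017]

def ns_per_pixel_operation(m: int, n: int,
                           period_ns: float = PIXEL_PERIOD_NS) -> float:
    """Time left for each pixel of an m x n window if every window pixel
    needs one sequential operation within one pixel period."""
    return period_ns / (m * n)

for size in (3, 5, 16):
    budget = ns_per_pixel_operation(size, size)
    print(f"{size}x{size}: {budget:.2f} ns per operation")
# 3x3: 10.00 ns per operation
# 5x5: 3.60 ns per operation
# 16x16: 0.35 ns per operation
```

The budget shrinks with the square of the window size, which is why a single sequential unit cannot handle the larger windows in real time.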
[0018] The present invention is a simple but powerful real-time image signal processor.
[0019] Because of the novel local image register and pipeline register, the image signal
processor of this invention can be operated in multi-chip mode in image processing
applications that require higher speed or an enlarged local window. The speed can
easily be doubled or more by using two or more image signal processors, and the local
window can likewise be enlarged without any time loss.
[0020] Fig. 2 shows one embodiment of a local image signal processor according to the
invention for handling a local image area of 3-by-3 pixels, or 3 rows × 3 columns.
Input image signals 18 are applied, directly and indirectly through one-line shift
registers 22, 24, to local image signal processor 12, which has input terminals 26,
28, 30, output terminals 32, 34, 36, external data input terminal 38, and calculation
result output terminal 40. One-pixel shift registers 42 ∼ 58 receive image data from
terminals 26 ∼ 30 and store local image data of (m) rows × (n) columns, or (m) × (n)
pixels. One pixel of image data comprises 8 bits. Further one-pixel shift registers
60 ∼ 64 receive the outputs of series-connected registers 42 ∼ 46, 48 ∼ 52, 54 ∼ 58,
respectively, and deliver their outputs to terminals 32 ∼ 36. The outputs of shift
registers 42 ∼ 58 and external data from terminal 38 are applied to calculation unit
66, whose output is applied to terminal 40.
[0021] Image data 18 are read out sequentially, pixel by pixel, by scanning the input image
and are applied to shift register 42. Shift register 48 receives the data applied to
shift register 42 delayed by one line through one-line shift register 22. Shift register
54 receives the same data delayed by two lines through one-line shift registers 22, 24.
Three data streams, each delayed by one line relative to the previous one, are thus
applied to shift registers 42, 48, 54 and then transferred to shift registers 44, 50,
56 and 46, 52, 58, respectively, so that image data are transferred pixel by pixel.
By this operation, image data from the input image are reconstructed in shift registers
42 ∼ 58 as local image area data of 3-by-3 pixels, or 3 rows × 3 columns. Shift registers
42 ∼ 58 are referred to as local image register 59. The local area data are processed
by calculation unit 66, so that the whole image area can be processed.
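The way the two one-line shift registers and the 3 × 3 array of one-pixel registers rebuild local areas from a raster-scanned stream can be modelled in a few lines. This is a behavioural sketch only, assuming the line registers start cleared to zero; `stream_3x3_windows` is a hypothetical name, and the row order of the returned window (oldest line first) is chosen for readability rather than taken from the register numbering.

```python
from collections import deque
from typing import Iterable, Iterator, List, Tuple

def stream_3x3_windows(pixels: Iterable[int], width: int
                       ) -> Iterator[Tuple[int, int, List[List[int]]]]:
    """Rebuild 3x3 local image areas from a raster-scanned pixel stream.

    Two FIFOs of length `width` stand in for one-line shift registers 22
    and 24; `regs` stands in for one-pixel registers 42-58. Yields
    (row, col, window), where (row, col) is the newest pixel in the window.
    """
    line1 = deque([0] * width)            # one-line delay (register 22)
    line2 = deque([0] * width)            # further one-line delay (register 24)
    regs = [[0, 0, 0] for _ in range(3)]  # regs[k][0] holds the newest pixel
    for i, p in enumerate(pixels):
        d1 = line1.popleft()              # same column, one line earlier
        line1.append(p)
        d2 = line2.popleft()              # same column, two lines earlier
        line2.append(d1)
        # One read-in clock: every row of the local register shifts one pixel.
        for k, new in enumerate((d2, d1, p)):
            regs[k] = [new] + regs[k][:2]
        row, col = divmod(i, width)
        if row >= 2 and col >= 2:         # window fully inside the frame
            yield row, col, [r[::-1] for r in regs]  # left-to-right order

img = [[r * 10 + c for c in range(5)] for r in range(4)]
flat = [v for line in img for v in line]
for row, col, window in stream_3x3_windows(flat, 5):
    print(row, col, window)
```

Only one new pixel enters per clock, yet a complete 3 × 3 area is available on every clock once the first two lines have been buffered.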
[0022] Shift registers 60, 62, 64 are referred to as expansion use register 65.
[0023] Related U.S. patent application Serial No. 682,321, filed December 17, 1984, includes
a circuit structure similar to that of Fig. 2 but does not include expansion use
register 65.
[0024] Fig. 3 shows a case in which image processing of an expanded local area is conducted
using a plurality of local image signal processors 12. In the Fig. 3 embodiment, expanded
local image processing of 6-by-6 pixels, or 6 rows × 6 columns, is possible using
four processors 12A ∼ 12D of 3 rows × 3 columns. The structure of each processor is
the same as shown in Fig. 2.
[0025] Input image signal 18 is applied through image signal input terminal 68 to image
signal processor 12A. The structure of processor 12A and its peripherals is the same
as shown in Fig. 2. Output terminals 32, 34, 36, 40 of processor 12A are connected
to input terminals 26, 28, 30, 38 of processor 12B. Output terminal 40 of processor
12B is connected to external data input terminal 38 of processor 12C. The output of
one-line shift register 24 is applied through two-pixel shift register 70 and one-line
shift register 72 to input terminal 26 of processor 12C. The output of one-line shift
register 72 is applied through one-line shift register 74 to input terminal 28 of
processor 12C. The output of one-line shift register 74 is applied through one-line
shift register 76 to input terminal 30 of processor 12C. Output terminals 32, 34, 36,
40 of processor 12C are connected to input terminals 26, 28, 30, 38 of processor 12D,
respectively. Final output 78 is produced from output terminal 40 of processor 12D.
[0026] In Fig. 4, an image to be processed is designated by numeral 80. Local areas 82D
∼ 82A are supplied to and stored in local image signal processors 12A ∼ 12D simultaneously
at an arbitrary timing. Local areas 82A, 82B are separated by one pixel in the horizontal
direction, as are local areas 82C, 82D. As shown in Fig. 3, the calculation result
output signals of processors 12A ∼ 12C are applied through output terminals 40 to
external data input terminals 38 of the next-stage processors 12B ∼ 12D. Therefore,
in the case of spatial convolution calculation and the like, the convolution result
of a given stage processor is added, at the next timing, to the convolution result
of the next-stage processor. By transferring the calculation result of each processor
to the next-stage processor in the scanning direction, output 78 of final-stage processor
12D becomes the calculation result of the expanded local area (in this case, 6 rows
× 6 columns). The number of stages of shift register 70 differs according to the number
of local image signal processors used.
[0027] In the Fig. 3 structure, the four local image processors 12A ∼ 12D do not produce
their outputs simultaneously. The outputs from terminals 32, 34, 36 of processor 12A
are delayed by one pixel (one bit) by expansion use shift registers 60, 62, 64 (see
Fig. 2). The output from terminal 40 of processor 12A is likewise delayed by the
calculation operation of calculation unit 66. Therefore, the output from terminal 40
of processor 12B is delayed relative to the output from terminal 40 of processor 12A.
Similarly, the output from terminal 40 of processor 12C is delayed relative to that
of processor 12B, and the output from terminal 40 of processor 12D relative to that
of processor 12C. The four outputs of processors 12A ∼ 12D are thus produced at different
timings. Therefore, for example, the output of processor 12A is added, in calculation
unit 66 of processor 12B, to the result calculated from the outputs of local image
register 59 (see Fig. 2) of processor 12B. This means that no peripheral circuit needs
to be provided for expanded local image processing.
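The chaining of results through terminals 38 and 40 can be checked numerically. The sketch below ignores the one-pixel timing skew and only verifies the arithmetic, assuming that each stage sums its 3 × 3 partial convolution with the value arriving on its external data input, and that the input of 12A is zero. The assignment of 6 × 6 quadrants to processors 12A ∼ 12D is an assumption (Fig. 4 gives the actual placement), and `processor_3x3`, `sub`, and `expanded_6x6` are hypothetical names.

```python
from typing import List

Matrix = List[List[int]]

def processor_3x3(window: Matrix, coeffs: Matrix, external_in: int) -> int:
    """One 3x3 processor: its partial convolution plus the value on the
    external data input (terminal 38), delivered at terminal 40."""
    partial = sum(window[i][j] * coeffs[i][j]
                  for i in range(3) for j in range(3))
    return external_in + partial

def sub(mat: Matrix, r: int, c: int) -> Matrix:
    """3x3 block of a 6x6 matrix starting at (r, c)."""
    return [row[c:c + 3] for row in mat[r:r + 3]]

def expanded_6x6(window: Matrix, coeffs: Matrix) -> int:
    """Chain four 3x3 processors (12A -> 12B -> 12C -> 12D)."""
    acc = 0  # assumed: nothing drives the external input of 12A
    for r, c in ((0, 0), (0, 3), (3, 0), (3, 3)):   # assumed quadrant order
        acc = processor_3x3(sub(window, r, c), sub(coeffs, r, c), acc)
    return acc

win = [[i * 6 + j for j in range(6)] for i in range(6)]
cof = [[(i + j) % 3 - 1 for j in range(6)] for i in range(6)]
direct = sum(win[i][j] * cof[i][j] for i in range(6) for j in range(6))
print(expanded_6x6(win, cof) == direct)  # True
```

Because addition is associative, any quadrant order gives the same final sum; in hardware the order is fixed by how the line and pixel delays are wired.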
[0028] As explained above, expansion processing of a local area can be realized without
additional external circuits by using a plurality of image signal processors, each
of which includes a local area register of (m) rows × (n) columns and an expansion
use register of (m) rows × 1 column.
[0029] Fig. 5 shows an example of the architecture of an image signal processor according
to the present invention. Local image register 84, to which image data 18 are applied
in parallel, stores local image area data of (m) rows × (n) columns. Expansion use
image register 86 receives the output signal of local image register 84, shifts the
image data in order, and produces image data output 88. Local image register 84 and
expansion use image register 86 are driven by image read-in clock 90 from clock control
circuit 92. The shift operation of expansion use image register 86 is controlled by
expansion control signal 94 from input terminal 96. When the expansion control signal
is supplied, expansion use image register 86 is set in shift mode, in which expansion
processing of the local image area is conducted in a pipelined manner. When the expansion
control signal is not supplied, expansion use image register 86 is set in pass (through)
mode, in which no shift operation is conducted. Arithmetic unit (adder and subtracter)
98 and multiplier 100 receive signals selected by selectors 102, 104, 106 and conduct
their calculation operations. Data registers 108, 110 store the calculation results
of arithmetic unit 98. Data register 112 stores input data 114 from the data input
terminal. Data register 116 stores the calculation result of multiplier 100. Data
register 118 stores output data from data register 110. Output control circuit 120
passes data signal output 122 only during a specified period according to control
signal 124 from clock control circuit 92. Output clock 126, generated by clock control
circuit 92, is used to read data signal output 122 into an external register (not
shown). Program memory 128 stores the image processing program. When image processing
is conducted, the program is read out by program control circuit 130, and each block
is controlled by the read-out program. Each block is operated by a clock from clock
control circuit 92. Clock control circuit 92 receives system clock 132, program start
signal 134, and parallel control signal 136, and produces the above-stated output
clock 126 and the control clock for each block.
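The two modes of expansion use image register 86 can be modelled behaviourally. This is a sketch under the assumption that the register holds one pixel per row (m stages) fed by the column leaving local image register 84; the class name `ExpansionRegister` and its `clock` method are hypothetical.

```python
from typing import List

class ExpansionRegister:
    """Behavioural model of expansion use register 86: one one-pixel stage
    per row, fed by the last column of the local image register."""

    def __init__(self, m: int = 3) -> None:
        self.stages: List[int] = [0] * m

    def clock(self, column_in: List[int], expand: bool) -> List[int]:
        """One read-in clock; returns the column driven onto output 88."""
        if not expand:
            # Pass (through) mode: the column goes straight to the output.
            return list(column_in)
        # Shift mode: emit the column latched on the previous clock, then
        # latch the new one, giving the one-pixel delay that skews the
        # next processor in the chain.
        out = self.stages
        self.stages = list(column_in)
        return out

reg = ExpansionRegister(3)
print(reg.clock([1, 2, 3], expand=True))   # [0, 0, 0]
print(reg.clock([4, 5, 6], expand=True))   # [1, 2, 3]
print(reg.clock([7, 8, 9], expand=False))  # [7, 8, 9]
```

In shift mode the extra clock of delay is what lets a downstream processor receive its image data and the upstream partial result at the same timing.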
[0030] Program memory 128 can hold address signals for reading out the data of any one
pixel in local image register 84, which stores local area data of m rows × n columns,
as well as calculation control signals for arithmetic unit (adder and subtracter) 98,
control signals for selectors 102, 104, 106, write-in control signals for registers
112, 110, 116, the multiplier coefficient of multiplier 100, and so on. By combining
these to prepare an image processing program, arbitrary calculations on the local
image area data read into local image register 84 can be carried out at high speed.
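A greatly simplified model of such a program is a list of steps, each naming one pixel address and one multiplier coefficient, executed as multiply-and-accumulate. The `(row, col, coefficient)` instruction format, `run_program`, and the `LAPLACIAN` example are hypothetical; the actual program memory holds lower-level control words (selector controls, write enables, and so on), which are not modelled here.

```python
from typing import List, Tuple

Instruction = Tuple[int, int, int]  # hypothetical: (row, col, coefficient)

def run_program(program: List[Instruction],
                local_register: List[List[int]],
                external_in: int = 0) -> int:
    """Execute a stored program on one local image area.

    Each step reads one addressed pixel of the local image register,
    multiplies it by the step's coefficient (multiplier 100), and adds the
    product to an accumulator (arithmetic unit 98 / data register 110)."""
    acc = external_in
    for row, col, coeff in program:
        acc += local_register[row][col] * coeff
    return acc

# A 4-neighbour Laplacian expressed as a program for a 3x3 register.
LAPLACIAN = [(1, 1, -4), (0, 1, 1), (1, 0, 1), (1, 2, 1), (2, 1, 1)]

print(run_program(LAPLACIAN, [[0, 1, 0], [1, 5, 1], [0, 1, 0]]))  # -16
```

Changing the stored list changes the operation without changing the hardware, which is the sense in which the processor is general-purpose.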
[0031] As explained above, the image processing program written in program memory 128 is
executed on one local image area. The calculation result stored in register 110 is
transferred to output register 118 and then output through output control circuit
120 as data output signal 122. Then, new local image area data are read into local
image register 84. By repeating such operations in order, local parallel image processing
of the whole image is conducted.
[0032] When the Fig. 5 architecture is implemented as an LSI, the size of local image register
84 is limited to approximately 3 × 3 ∼ 5 × 5 from the viewpoint of integration density.
On the other hand, local parallel image processing generally uses local image registers
of approximately 3 × 3 ∼ 16 × 16 pixels. If an image signal processor having a local
image register of 3 × 3 pixels is applied to local parallel processing of a local
image area of 12 × 12 pixels, complex external circuitry is conventionally required
in addition to 16 image signal processors. In this invention, such external circuitry
is not needed for the above-stated processing.
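The processor count follows from tiling the expanded window with register-sized blocks. This small sketch assumes that a window which is not a multiple of the register size is padded with zero coefficients; `processors_needed` is a hypothetical helper.

```python
import math

def processors_needed(win_rows: int, win_cols: int,
                      reg_rows: int = 3, reg_cols: int = 3) -> int:
    """Number of reg_rows x reg_cols processors needed to tile an expanded
    window (assumes zero-padding when the sizes do not divide evenly)."""
    return math.ceil(win_rows / reg_rows) * math.ceil(win_cols / reg_cols)

print(processors_needed(6, 6))    # 4, as in Fig. 3
print(processors_needed(12, 12))  # 16, as in paragraph [0032]
```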
[0033] Further, an image signal processor according to the invention has a structure which
readily enables parallel processing using multiple processors. When local parallel
processing must be completed within a given period and the required processing speed
exceeds the performance of one image signal processor, parallel processing must be
conducted using a plurality of processors. This generally requires complex external
circuitry, but in this invention, parallel processing can be realized without external
circuitry.
[0034] Fig. 6 shows an example in which parallel processing is conducted using two image
signal processors of the present invention. Image data from input terminal 68 are applied
directly, through one-line register 22, and through one-line registers 22, 24 to image
signal processor 12X as input data 18a ∼ 18c. Program start signal 134 from input
terminal 138 and parallel control signal 136 from input terminal 140 are applied to
image signal processor 12X. Image data 18a ∼ 18c and program start signal 134 are
also applied to image signal processor 12Y, which receives its own parallel control
signal 136 from input terminal 142. Data outputs 122 of image signal processors 12X,
12Y are applied to OR gate 144, and output clocks 126 of image signal processors 12X,
12Y are applied to OR gate 146. Outputs 148, 150 of OR gates 144, 146 are applied
to external register 152, which produces output 154.
[0035] Each of the image signal processors 12X, 12Y includes local image register 84 of
3 × 3 pixels (see Fig. 5).
[0036] The image data input terminals of image signal processors 12X, 12Y receive image
data of three lines simultaneously by use of one-line registers 22, 24. Although the
same image data are applied to both image signal processors 12X, 12Y, local image
area data are read into local image registers 84 of processors 12X and 12Y alternately,
one area at a time.
[0037] Fig. 7 shows the waveforms of principal portions of the Fig. 6 structure. In Fig.
7, (a) shows program start signal 134. The processing operation of image signal processors
12X, 12Y, and the read-in of image input data 18 [see (b) of Fig. 7] to local image
register 84, are initiated in synchronism with program start signal 134. (c), (d)
show parallel control signals 136 applied to image signal processors 12X, 12Y; these
two signals are opposite in phase to each other. Parallel control signals 136 are
applied to clock control circuits 92 of image signal processors 12X, 12Y, respectively,
and control the read-in clock of local image register 84 and the clocks for program
start timing, the output control circuit, and the external register. That is, image
data are read into local image registers 84 alternately, as shown in (e), (f). Accordingly,
the program of each processor 12X (12Y) is initiated in turn and executed, so that
calculation is carried out on the local image area data read in. The calculation result
of each image signal processor is output in turn, in synchronism with program completion.
Output control circuit 120 holds each data output for a given period, so that waveforms
(g), (h) are obtained. Data output signals 122 are combined by OR gate 144. (i), (j)
show clocks 126, which are output from the image signal processors and used for read-in
to external register 152; clocks 126 are likewise combined by OR gate 146. The combined
calculation result 148 and combined clock 150 are applied to external register 152,
thereby producing consecutive processed data 154. Fig. 7(k) shows signal 154 (see
Fig. 6), which is input to external register 152. When the image signal processor
is formed of ECL gates, the OR function can be formed by a wired OR constructed simply
by wiring, so that OR gates 144, 146 are not needed.
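The merging of the two processors' outputs by a plain OR gate works because each output line is active only in its owner's time slot. The sketch below models output slots rather than analog waveforms, assuming round-robin assignment of local areas and that an inactive output line sits at 0; `processor_waveform` and `or_combine` are hypothetical names, and the result values are arbitrary test data.

```python
from functools import reduce
from operator import or_
from typing import List

def processor_waveform(results: List[int], proc_id: int,
                       n_proc: int) -> List[int]:
    """Data output line of one processor, one entry per output slot.

    Local areas are assigned round-robin, so processor `proc_id` owns every
    n_proc-th slot. Output control circuit 120 drives the result only in the
    processor's own slot; the line is assumed to sit at 0 otherwise."""
    return [r if slot % n_proc == proc_id else 0
            for slot, r in enumerate(results)]

def or_combine(waveforms: List[List[int]]) -> List[int]:
    """Bitwise OR of all output lines, slot by slot (OR gate 144)."""
    return [reduce(or_, column, 0) for column in zip(*waveforms)]

results = [5, 12, 7, 3, 0, 9]               # per-area results in scan order
wave_x = processor_waveform(results, 0, 2)  # 12X: [5, 0, 7, 0, 0, 0]
wave_y = processor_waveform(results, 1, 2)  # 12Y: [0, 12, 0, 3, 0, 9]
print(or_combine([wave_x, wave_y]))         # [5, 12, 7, 3, 0, 9]
```

Each processor thereby gets two pixel periods per local area while the merged stream still delivers one result per pixel period.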
[0038] As explained above, parallel processing is possible by simple structure.
[0039] Fig. 8 shows a local image processor as a still further application. Some applications
need to handle the data of an enlarged local window, for example an array of 12 ×
12 pixels as shown in Fig. 9. The processor can be used in this case because of its
novel pipeline architecture.
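[0040] For instance, the spatial filter with the expanded local window of 12 × 12 pixels
is

$$G(X,Y)=\sum_{i=0}^{11}\sum_{j=0}^{11} w(i,j)\,f(X+i,\,Y+j)$$

where f denotes the input image data and w the filter coefficients. This convolution
equation is rewritten as the sum of sixteen partial convolutions:

$$G(X,Y)=\sum_{k=1}^{16} G_k(X,Y),\qquad G_k(X,Y)=\sum_{i=0}^{2}\sum_{j=0}^{2} w(p_k+i,\,q_k+j)\,f(X+p_k+i,\,Y+q_k+j)$$

where (p_k, q_k), with p_k, q_k ∈ {0, 3, 6, 9}, is the offset of the k-th 3 × 3 sub-window.
The enlarged local window is thus divided into sub-windows a, b, c, ..., p, each with
a data size of 3 × 3, the same as the local image register of the processor. Sixteen
units A, B, C, ..., P each process one sub-window. As shown in Fig. 8, the sixteen
units are connected in series.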
[0041] Unit A handles the image data of sub-window a and calculates partial convolution
G₁, which is taken out at the clock cycle following the input of the image data. The
image data passing through the local image register are internally delayed by one
clock cycle by the pipeline register and transferred to the next unit, so the image
data of successive sub-windows are taken into successive units shifted by one clock
cycle from each other. Partial convolution G₁ and the image data of sub-window b are
therefore supplied to unit B at the same timing, and unit B can simultaneously compute
convolution G₂ and the sum G₁ + G₂. In the same way, unit C computes convolution G₃
and the sum (G₁ + G₂) + G₃, and unit P computes convolution G₁₆ and the sum (G₁ +
G₂ + ... + G₁₅) + G₁₆. The data output from unit P is exactly G(X, Y).
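The one-clock skew between units can be simulated cycle by cycle. This is a minimal sketch, assuming a one-clock latency per unit, a horizontal scan along a 12-row image strip, and row-major assignment of sub-windows a ∼ p to units A ∼ P; `pipelined_sum` and `make_partials` are hypothetical names. The final check compares the pipelined result with a direct 12 × 12 convolution.

```python
from typing import Callable, List, Optional

Partial = Callable[[int], int]

def pipelined_sum(partials: List[Partial], n_positions: int) -> List[int]:
    """Cycle-level model of the serially connected units A..P of Fig. 8.

    partials[k](pos) returns unit k's partial convolution for window
    position pos. Because each pipeline register delays the image data by
    one clock, unit k works on position (cycle - k) and adds the running
    sum that unit k-1 produced for that same position one cycle earlier."""
    n_units = len(partials)
    regs: List[Optional[int]] = [None] * n_units  # output register per unit
    outputs: List[int] = []
    for cycle in range(n_positions + n_units - 1):
        new_regs: List[Optional[int]] = [None] * n_units
        for k in range(n_units):
            pos = cycle - k
            if 0 <= pos < n_positions:
                upstream = regs[k - 1] if k > 0 else 0
                new_regs[k] = upstream + partials[k](pos)
        regs = new_regs
        if regs[-1] is not None:                  # unit P delivers G(X, Y)
            outputs.append(regs[-1])
    return outputs

def make_partials(strip: List[List[int]], w: List[List[int]]) -> List[Partial]:
    """Sixteen 3x3 partial convolutions over a 12-row image strip."""
    def unit(k: int) -> Partial:
        p, q = 3 * (k // 4), 3 * (k % 4)          # sub-window offset
        return lambda pos: sum(w[p + i][q + j] * strip[p + i][pos + q + j]
                               for i in range(3) for j in range(3))
    return [unit(k) for k in range(16)]

n_pos = 4
strip = [[(r * 7 + c * 3) % 11 for c in range(12 + n_pos - 1)] for r in range(12)]
w = [[(i - j) % 5 - 2 for j in range(12)] for i in range(12)]
direct = [sum(w[i][j] * strip[i][x + j] for i in range(12) for j in range(12))
          for x in range(n_pos)]
print(pipelined_sum(make_partials(strip, w), n_pos) == direct)  # True
```

After an initial latency of fifteen clocks, unit P delivers one complete 12 × 12 result per clock, the same throughput as a single 3 × 3 unit.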
[0042] This pipelining technique makes it possible to expand the local window with no time
loss and without any other processing unit. Pipelining and the parallelism of the
processor can be utilized simultaneously.
[0043] While specific embodiments of the invention have been illustrated and described herein,
it is recognized that other modifications and changes will occur to those skilled in
the art. It is therefore to be understood that the appended claims are intended to
cover all such modifications and changes as fall within the true spirit and scope
of the invention.