Cross-Reference To Related Applications
[0001] The present application is related to:
[0002] EP-A-651,355, entitled "Method And Apparatus For Image Compression, Storage and Retrieval
On Magnetic Transaction Cards".
[0003] EP-A-651,354 entitled "Compression Method For A Standardized Image Library".
[0004] US-A-5,473,327 entitled "Method And Apparatus For Data Encoding With Reserved Values".
[0005] US 08/361,368 filed December 21, 1994, by Ray, Ellson, and Elbaz, entitled "Method
For Compressing And Decompressing Standardized Portrait Images".
[0006] So far as permitted, the teachings of the above referenced applications are incorporated
by reference as if set forth in full herein.
Field Of The Invention
[0007] The present invention relates to the field of digital image processing and more particularly
to a method and associated apparatus for forming digital standardized image feature
templates for facilitating a reduction in the number of bits needed to adequately
represent an image.
Background Of The Invention
[0008] Consider a library of pictures with similar image content, such as a collection of
portraits of missing children. In this collection of images, there exists a large
degree of image-to-image correlation based upon pixel location as faces share certain
common features. This correlation across different images, just like the spatial correlation
within a given image, can be exploited to improve compression.
[0009] Analysis of some image libraries yields knowledge of the relative importance of image
fidelity based on location in the images. If the images were to be used for identification
of missing children, then the image fidelity on the facial region would be more important
than fidelity in the hair or shoulders which in turn would be more important than
the background. Image compression can be more aggressive in regions where visual image
fidelity is less important to the application.
[0010] In many applications, preserving the orientation and quantization of the original
image is less important than maintaining the visual information contained within the
image. In particular, for images in the missing children library, if the identity
of the child in the portrait can be ascertained with equal ease from either the original
image or an image processed to aid in compression, then there is no loss in putting
the processed image into the library. This principle can be applied to build the library
of processed images by putting the original images into a standardized format. For
missing children portraits this might include orienting the head of each child to
make the eyes horizontal and centering the head relative to the image boundaries. Once
constructed, these standardized images will be well compressed as the knowledge of
their standardization adds image-to-image correlation.
[0011] Techniques from a compression method known as vector quantization (VQ) are useful
in finding correlation between portions of an image. Compression by VQ is well suited
for fixed-rate, lossy, high-ratio compression applications (see R. M. Gray, "Vector
Quantization", IEEE ASSP Magazine, Vol. 1, April 1984, pp. 4-29). This method breaks
the image into small patches, or "image blocks." These blocks are
then matched against other image blocks in a predetermined set of image blocks, commonly
known as the codebook. The matching algorithm is commonly the minimum-squared-error
(MSE). Since the set of image blocks is predetermined, one of the entries of the set
can be referenced by a simple index. As a result a multi-pixel block can be referenced
by a single number. Using such a method the number of bits for an image can be budgeted.
When a greater number of bits is allocated per image block, the size of the codebook
can be increased. Similarly, if a greater number of bits is allocated to the image,
then the number of image blocks can be increased (and hence the size of each block
reduced).
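By way of illustration, the block-matching step may be sketched as follows. This is a minimal sketch only, not code from the referenced applications; the block size, data layout, and function names are assumptions of the sketch.

#include <limits.h>

#define BLOCK_PIXELS 64   /* an 8x8 image block stored row by row (assumed layout) */

/* Return the index of the codevector with the minimum squared error against
 * the given image block; the block is then represented by this single index
 * rather than by its 64 pixel values.                                        */
int best_codevector(const unsigned char block[BLOCK_PIXELS],
                    const unsigned char codebook[][BLOCK_PIXELS],
                    int codebook_size)
{
    long best_err = LONG_MAX;
    int  best_idx = 0;

    for (int i = 0; i < codebook_size; i++) {
        long err = 0;
        for (int p = 0; p < BLOCK_PIXELS; p++) {
            long d = (long)block[p] - (long)codebook[i][p];
            err += d * d;
        }
        if (err < best_err) {
            best_err = err;
            best_idx = i;
        }
    }
    return best_idx;
}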
[0012] Codebooks are determined by first forming a collection of representative images,
known as the training image set. Next, images are partitioned into image blocks, and
the image blocks are then considered as vectors in a high-dimensional vector space,
i.e., for an 8 x 8 image block, the space has 64 dimensions. Image blocks are selected
from predetermined regions within each image of the training set of images. Once all
the vectors are determined from the training set, clusters are found and representative
elements assigned to each cluster. The clusters are selected to minimize the overall
combined distance between each member of the training set and the representative of
the cluster to which the member is assigned. One selection technique is the Linde-Buzo-Gray
(LBG) algorithm (see Y. Linde, et al., "An Algorithm For Vector Quantizer Design",
IEEE Transactions On Communications, Vol. COM-28, No. 1, January 1980, pp. 84-95).
The number of clusters is determined by the number of bits budgeted for describing
the image block. Given n bits, the codebook can contain up to 2^n cluster representatives
or code vectors.
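The clustering pass at the heart of such codebook design may be sketched as follows. This is a simplified, k-means-style illustration of a single LBG iteration, with dimensions, bounds, and data layout assumed for the sketch; the full LBG algorithm also addresses codebook initialization and splitting.

#include <float.h>
#include <string.h>

#define DIM       64      /* 8x8 image blocks viewed as 64-dimensional vectors */
#define MAX_CODES 4096    /* upper bound on codebook size for this sketch      */

/* One assignment/update pass of k-means-style clustering, the core of LBG
 * codebook design.  Returns how many training vectors changed cluster so
 * the caller can iterate until the assignment stabilizes.  Assumes
 * n_codes <= MAX_CODES.                                                   */
int lbg_pass(const double train[][DIM], int n_train,
             double codebook[][DIM], int n_codes, int assign[])
{
    static double sum[MAX_CODES][DIM];
    static int    count[MAX_CODES];
    int changed = 0;

    memset(sum, 0, sizeof sum);
    memset(count, 0, sizeof count);

    for (int t = 0; t < n_train; t++) {
        double best = DBL_MAX;
        int    idx  = 0;
        for (int c = 0; c < n_codes; c++) {    /* find the nearest codevector */
            double err = 0.0;
            for (int d = 0; d < DIM; d++) {
                double diff = train[t][d] - codebook[c][d];
                err += diff * diff;
            }
            if (err < best) { best = err; idx = c; }
        }
        if (assign[t] != idx) { assign[t] = idx; changed++; }
        count[idx]++;
        for (int d = 0; d < DIM; d++)
            sum[idx][d] += train[t][d];
    }

    for (int c = 0; c < n_codes; c++)          /* move each codevector to the */
        if (count[c] > 0)                      /* centroid of its cluster     */
            for (int d = 0; d < DIM; d++)
                codebook[c][d] = sum[c][d] / count[c];

    return changed;
}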
[0013] The above referenced patent applications, U.S. Application Serial Number 08/145,051
by Ray, Ellson and Gandhi, and U.S. Application Serial Number 08/145,284 by Ray and
Ellson, both describe a system for enabling very high-compression ratios with minimal
loss in image quality by taking advantage of standardized features in a library of
images. This method of compression takes advantage of the built-in image-to-image
correlation produced by the standardization to improve predictability and, therefore,
improve compressibility by training on the standardized images and forming multiple
codebooks consisting of 8x8 pixel codevectors.
[0014] These applications describe a process for extracting the common features of images
in the library and using this as a basis for image standardization. Once standardized
into a standardized library image, the image can be compressed and subsequently decompressed
into lossy representations of the original library image.
[0015] An overview of the prior art as described by the above referenced patent applications
consists of:
Standardization:
[0016] Select the image features of greatest importance.
[0017] Process a set of representative images from the library to enhance the selected features.
[0018] Locate selected features in the representative images.
[0019] Determine constraints for locations of image features.
[0020] Process the image to meet the image feature location constraints.
[0021] Assign regions of the image based on presence of features or a desired level of image
quality.
[0022] Determine the image-to-image correlation of the images for each subregion.
[0023] Allocate capacity for storage of image information for each subregion based on a
partitioning of subregions into image blocks and on codebook size.
[0024] Construct codebooks to take advantage of the correlation.
Compression:
[0025] Process the image to enhance features.
[0026] Locate selected features in the image.
[0027] Standardize the image by processing the image to meet the image feature location
constraints.
[0028] Partition the image based on the subregions and their image blocks.
[0029] For each region, determine the entry in the codebook which best approximates the
image content.
[0030] Store the series of codebook values for each image block as this is the compressed
image.
Decompression:
[0031] Extract the codebook values from the series of codebook values.
[0032] Determine the codebook based on the corresponding subregion position in the codebook
value series.
[0033] Extract an image block based on the codebook value from the above determined codebook.
[0034] Copy the image block into the appropriate image block location in the subregion.
[0035] Continue inserting image blocks until all image block locations are filled in the
entire image.
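As an illustration of the extraction step above, codebook values may be unpacked from a compact bit stream as sketched below. The fixed subregion ordering and the per-region bit widths are assumptions of this sketch rather than details taken from the referenced applications.

/* Read nbits bits, most significant first, starting at *bitpos within the
 * packed stream, and advance *bitpos.  With a small bit budget, each
 * subregion's codebook value can be stored in just enough bits for its
 * codebook (a codebook of 2^n entries needs n bits).                      */
unsigned get_bits(const unsigned char *stream, int *bitpos, int nbits)
{
    unsigned value = 0;
    for (int i = 0; i < nbits; i++, (*bitpos)++)
        value = (value << 1) |
                ((stream[*bitpos >> 3] >> (7 - (*bitpos & 7))) & 1u);
    return value;
}

/* Usage, for each image block in a fixed subregion order:
 *     int index = (int)get_bits(card_track, &pos, bits_for[region]);
 * The index then selects the image block from that subregion's codebook.  */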
[0036] In order to store a compressed facial image in a manner consistent with international
standards for a single track of a magnetic transaction card, the available data capacity
is below 500 bits (see ISO 7811/2).
[0037] When the target number of bits is extremely small, as in the case of facial image
storage in under 500 bits, the above described compression/decompression process fails
to provide facial images of consistent quality for use in certain verification and
identification applications. Additional techniques are necessary to further improve
the quality of the compressed images for more demanding verification systems. Opportunities
for improvement exist in image standardization, specialized codebook formation, and
image block symmetry.
[0038] Even with standardization of the location and orientation of the face within an image,
the lighting conditions for portraits may be highly asymmetric. This results in a
luminance imbalance between the left and right sides of a human facial portrait. What
is needed is a method for balancing the lightness of a human facial portrait in order
to achieve a higher degree of facial image portrait standardization and enhance the
natural symmetry of the human facial image.
[0039] With the standardization of both the luminance and the location of image features,
codebooks can be developed to better represent the expected image content at a specified
location in an image. A method of codebook specialization is described by Sexton, U.S.
Patent No. 5,086,480, entitled "Video Image Processing", which uses two codebooks.
This compression method finds the best codevector from amongst two codebooks by exhaustively
searching both codebooks, and then it flags the codebook in which the best match was
found. The net result is a "super-codebook" containing two codebooks of possibly different
numbers of codevectors where the flag indicates the codebook selected. Codebook selection
does not arise from a priori knowledge of the contents of a region of the image; Sexton
calculates which codebook to use for every codevector in every image. An opportunity
for greater compression is to eliminate the need to store the codebook flag.
[0040] It should be noted that the method of Sexton requires all codevectors in both codebooks
to have the same dimensions. Also, the prior art of Ray referred to above partitions
images into equal-sized image blocks.
[0041] Another opportunity to improve the quality of compressed portraits is to exploit the
correlation within facial images arising from the approximate mirror-image symmetry
between the left and right sides of the face. Frequently, in near-frontal perspective portraits,
a large degree of correlation exists between the portions of the face close to the
centerline. In particular, the image blocks used to render the part of the face above
and below the eyes exhibit a high degree of symmetric correlation. However, along
the centerline of the face the level of symmetry drops off due to the variability
of the appearance of the nose when viewed from slightly different angles. What is
needed is a method to further reduce the number of bits necessary to store a compressed
portrait image by exploiting the natural symmetry of the human face in the regions
around the facial centerline without imposing deleterious symmetry constraints on
the nose.
[0042] Some areas of the image do not contribute any significant value to identifying an
individual. For instance, shoulder regions are of minimal value to the identification
process, and moreover, this region is usually covered by clothing which is highly
variable even for the same individual. Since little value is placed on such regions,
the allocation of bits to encode them should also be reduced. In the present
invention some of these areas are allocated few, if any, bits, and the image data
is synthesized from image data of neighboring blocks. This permits a greater allocation
of bits to encode more important regions.
Summary Of The Invention
[0043] The present technique facilitates the formation of an image feature template that
finds particular utility in the compression and decompression of like-featured images.
More specifically, the feature template enables the compression and decompression
of large collections of images which have consistent sets of like image features that
can be aligned and scaled to position these features into well correlated regions.
[0044] The feature template of the present invention comprises:
a plurality of template elements each representing a feature of an object; and
data representing the attributes of each template element.
[0045] The preferred methodology for forming a feature template comprises the steps of:
establishing the dimensions of a feature template to accommodate a standardized image;
partitioning said feature template into a plurality of feature types to accommodate
like features in the standardized image;
assigning at least one template element to each feature type; and
recording the position of all assigned template elements within the dimensions of
said feature template to facilitate the reconstruction of the so formed feature template.
[0046] From the foregoing it can be seen that it is a primary object of the present invention
to provide a feature template that is usable in a system for reducing the data storage
requirements for associated sets of images.
[0047] The above and other objects of the present invention will become more apparent when
taken in conjunction with the following description and drawings wherein like characters
indicate like parts and which drawings form a part of the present invention.
Brief Description Of The Drawings
[0048] Figures 1A, 1B, and 1C, illustrate a frontal facial portrait that is tilted, rotated
and translated to a standardized position, and sized to a standardized size, respectively.
[0049] Figure 2 illustrates, in flow chart form, the method for standardizing an image.
[0050] Figure 3A shows the positions and sizes of the template elements that form a template.
[0051] Figure 3B illustrates, by darker shaded areas, the positions and sizes of the template
elements of the template which have a left-to-right flip property.
[0052] Figure 3C illustrates, by darker shaded areas, the positions and sizes of the template
elements of the template which have a top-to-bottom flip property.
[0053] Figure 3D illustrates, by darker shaded areas, the positions and sizes of the template
elements of the template which are linked.
[0054] Figure 4 illustrates, in table form, the portrait features, their associated labels,
and their characteristics.
[0055] Figures 5A and 5B illustrate the template element data record for the elements in
the template illustrated in Figures 3A-3D.
[0056] Figure 6 illustrates a collection of tiles associated with each of the feature types
A-M used in the specific embodiment of the present invention.
[0057] Figure 7 illustrates the tile numbering and labeling for a compressed image.
[0058] Figure 8 illustrates the tiles as extracted from the feature type tile collections
with the lighter shaded tiles having at least one flip property.
[0059] Figure 9 illustrates the tiles after execution of all flip properties.
[0060] Figure 10 illustrates the final image.
[0061] Figure 11 illustrates a preferred apparatus arrangement on which the method of the
present invention may be executed.
Detailed Description Of The Invention
[0062] Figure 1A represents an image that is a front facial portrait. In this example of
an image the face is tilted and translated with respect to the center of the image.
Depending on the source of images, other variations in the positioning and sizing
of the face within the borders of the image may be encountered. To achieve maximum results
with the present invention, the size, position, and orientation of the face are to be
standardized. In order to operate upon the image, it is placed into a digital
format, generally as a matrix of pixel values. The digital format (pixel values) of
the image is derived by scanning the original image to convert the original image
into electrical signal values that are digitized. The digital image format is then
used to replicate the image on a display to facilitate the application of a standardization
process to the displayed image and to the pixel values forming the displayed image
to form a standardized geometric image. The images are standardized to provide a quality
match with the template elements associated with the template (to be described in
detail later in the description of the invention). The process starts at Figure 1A
by locating the center of the left and right eyes of the face in the image. In Figure
1B a new digital image of the face image, representing a partially standardized geometric
image, is formed by rotating and translating the face image of Figure 1A, as necessary,
by well known image processing operations, so as to position the left and right eye
centers along a predetermined horizontal axis and equally spaced about a central vertical
axis. Figure 1C illustrates the face image of Figure 1B sized to form a standardized
geometric face image by scaling the image to a standard size.
[0063] Referring now to Figure 2, the method of forming the standardized geometrical image
is set forth in the left column of flow blocks commencing with the block labeled "select
an image". The selection process is based upon the availability of a front facial
image of the person that is to have their image processed with the template of the
invention. Included in the selection process is the creation of a digital matrix representation
of the available image. The digital matrix is next loaded into a system (shown in
Figure 11) for display to an operator. As previously discussed, the operator locates
the left and right eyes and performs any needed rotation, translation, and rescaling
of the image to form the standardized geometrical image.
[0064] More specifically, with regards to the standard image of Figure 1C and the flow chart
of Figure 2, in this embodiment of the invention, the image standard was set to an
image size of width of 56 pixels and a height of 64 pixels with the eye centers located
28 pixels from the top border of the image and 8 pixels on either side of a vertical
center line. Identifying the centers of the left and right eyes is done by displaying
the initial image to a human operator who points to the centers with a locating device
such as a mouse, tablet, light pen, or touch-sensitive screen. An alternate approach
would be to automate the process using a feature search program. In a hybrid approach,
the human operator localizes the eye positions, and a processor fine-tunes the location
through an eye-finding search method restricted to a small neighborhood around the
operator-specified location.
The next step in standardization is to alter the image to standardize the position
of the eyes to a predetermined location. In general, this consists of the standard
image processing operations of image translation, scaling and rotation.
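The geometric standardization may be sketched as a similarity transform derived from the two eye correspondences. The sketch below assumes the 56 x 64 standard described above, which places the eye centers at pixel coordinates (20, 28) and (36, 28); the nearest-neighbor resampling is a simplification chosen for brevity, and the function name is illustrative.

#include <math.h>

/* Map the detected eye centers to the standard positions (20, 28) and
 * (36, 28) of a 56 x 64 image (the vertical centerline of a 56-pixel-wide
 * image is x = 28, so the eyes sit 8 pixels to either side, 28 pixels from
 * the top).  A similarity transform (rotation, uniform scale, translation)
 * is derived from the two point pairs and applied by inverse mapping.     */
void standardize_geometry(const unsigned char *src, int sw, int sh,
                          double lx, double ly,    /* detected left eye   */
                          double rx, double ry,    /* detected right eye  */
                          unsigned char dst[64 * 56])
{
    double dx = rx - lx, dy = ry - ly;
    double scale = 16.0 / sqrt(dx * dx + dy * dy);  /* eyes end 16 apart  */
    double angle = -atan2(dy, dx);                  /* make eyes horizontal */
    double a = scale * cos(angle), b = -scale * sin(angle);
    /* forward map: X = a*x + b*y + tx,  Y = -b*x + a*y + ty */
    double tx = 20.0 - (a * lx + b * ly);
    double ty = 28.0 - (-b * lx + a * ly);
    double det = a * a + b * b;

    for (int Y = 0; Y < 64; Y++)
        for (int X = 0; X < 56; X++) {
            /* invert the map to find the source pixel for (X, Y) */
            double x = (a * (X - tx) - b * (Y - ty)) / det;
            double y = (b * (X - tx) + a * (Y - ty)) / det;
            int xi = (int)(x + 0.5), yi = (int)(y + 0.5);
            dst[Y * 56 + X] = (xi >= 0 && xi < sw && yi >= 0 && yi < sh)
                                  ? src[yi * sw + xi] : 0;
        }
}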
[0065] With the image size and eye position adjustments made, the standardized geometric
image is stored, and the luminance standardization procedure takes place. The procedure
is represented by the flow blocks bearing the even labels 40 through 52. There are three spatial scales
used to standardize for variations in the luminance of the digitized images; large
for light level/direction, medium for correcting asymmetric shadows from side lights,
and small for reduction of specular highlights from glasses, jewelry and skin. These
procedures change the mean luminance level in the digitized image. Certain features
which are useful in identification of an individual tend to get subdued in gray-scale
portraits. Hence, variations in luminance level, referred to as contrast, are also
adjusted in order to enhance these features.
[0066] The functional operation represented by block 50 shifts the facial mean luminance,
i.e., the average lightness found in the general vicinity of the nose, to a preset
value. In the preferred embodiment the preset value is 165 for a light skin-toned person,
155 for medium skin, and 135 for dark skin. The formed standardized
digital image from block 50 is now represented by a storable matrix of pixel values
that is stored in response to function block 52.
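A minimal sketch of this luminance shift follows, assuming an 8-bit gray-scale image and a rectangular sampling region in the vicinity of the nose; the region bounds, parameters, and function name are illustrative and not taken from the attached appendices.

/* Shift the mean luminance of the sampled facial region to the preset
 * target (e.g. 165, 155, or 135) by applying the same offset to the whole
 * image, clamping to 8 bits.                                             */
void shift_facial_mean(unsigned char *image, int width, int height,
                       int x0, int y0, int x1, int y1, int target)
{
    long sum = 0, n = 0;
    for (int y = y0; y < y1; y++)           /* mean over the facial region */
        for (int x = x0; x < x1; x++) {
            sum += image[y * width + x];
            n++;
        }
    int offset = target - (int)(sum / n);

    for (long i = 0; i < (long)width * height; i++) {
        int v = image[i] + offset;
        image[i] = (unsigned char)(v < 0 ? 0 : v > 255 ? 255 : v);
    }
}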
[0067] Figure 3A illustrates the layout of a template 30 that is to be used with the standardized
image of Figure 2. The template 30 is partitioned into 64 template elements labeled
A through M. The elements are arranged in accordance with 13 corresponding features
of a human face, for example, the template elements labeled A correspond to the hair
feature at the top of the head and the template elements labeled G correspond to the
eyes. Template elements with like labels share in the representation of a feature.
The tables of Figures 4, 5A, and 5B provide a further description of the remaining
template elements. Although the preferred embodiment of the invention is implemented
with 64 template elements and 13 features it is to be understood that these numbers
may be varied to suit the situation and are not to be construed as limiting the method
of this invention. Also to be noted is that some of the regions of the template are
not assigned to any element. These unassigned regions will not have their image content
based on retrieval of information from codebooks. The method for assigning image content
to these regions will be based on the assignment of adjoining regions which will be
described below. The template size matches that of the standardized image with 56
pixels in width and 64 in height. The sizes of the template elements are based upon
the sizes of the facial features they are intended to represent. For example, G is the
relative size of an eye in a standardized image and both instances of elements assigned
to G are positioned in the locations of the eyes in a standardized image.
[0068] In Figure 3B, the darker shaded template elements are assigned a left-to-right flip
property that will be described in detail later.
[0069] In Figure 3C, the darker shaded template elements are assigned a top-to-bottom flip
property that will also be described later.
[0070] Another property of template elements is linkage. Figure 3D represents, with the
darker shaded region, the location of template elements which are part of a link.
In this specific embodiment, there exist 7 linked pairs of elements. The linkage is
horizontal between each pair of darkened template elements, for example, G at the
left of center is linked to G at the right of center. Although 7 linked pairs are
shown as the preferred embodiment, linkages can occur in groups larger than two and
between any set of like labeled elements.
[0071] The template 30 is in fact a sequence of data records where each record, in the preferred
embodiment, describes the location, size, label, left-to-right property, top-to-bottom
property, and linkage of each template element. Data records with other and/or additional
factors may be created as the need arises.
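For illustration, such a data record may be represented as follows; the field and type names are assumptions of this sketch and are not drawn from the attached appendices.

/* One template element data record, mirroring the fields described above. */
typedef struct {
    int  x, y;            /* upper left hand pixel coordinates               */
    int  width, height;   /* element size in pixels                          */
    int  flip_lr;         /* TRUE if the tile is flipped left-to-right       */
    int  flip_tb;         /* TRUE if the tile is flipped top-to-bottom       */
    int  link_group;      /* -1 when unlinked, else the linkage group number */
    char feature;         /* feature type (codebook label), 'A' through 'M'  */
} TemplateElement;

/* The template itself is a sequence of such records. */
typedef struct {
    int width, height;              /* 56 x 64 in the preferred embodiment   */
    int n_elements;                 /* 64 in the preferred embodiment        */
    TemplateElement element[64];
} FeatureTemplate;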
[0072] The template 30 records the distribution and size of template elements. Each template
element has assigned to it a codebook and a spatial location in the image. (Note that
some portions of the template have no template element; these regions will be described
in detail later.) The template shown in Figure 3A consists of 64 template elements
composed of rectangular pixel regions. These template elements are assigned to one
of 13 different codebooks (labeled A - M). The codebooks are collections of uniformly-sized
codevectors of either 4x16, 8x8, 8x5, 4x10, 4x6, or 8x4 pixels. The codevectors which
populate the codebooks are derived from a library of image features.
[0073] Referring to Figure 4, the labels A through M represent feature types for human faces.
The human feature associated with each of the labels A-M in the label row is set forth
in the row directly below the label row. The remainder of Figure 4 provides information
as to the width and height of template elements for each of the associated labels
along with the number of occurrences and the number of unique occurrences for each
feature. A unique occurrence indicates the number of independent template elements
that are linked (linked elements count as only a single unique occurrence).
[0074] Figures 5A and 5B illustrate the template element data records. These data records
represent the attributes of each template element and include data record fields for
the upper left hand pixel coordinates, the width, height, left-to-right flip property,
the top-to-bottom flip property, the record of the linkage group, and the feature
type. If the record of the linkage group is -1 then no linkage occurs. Other values
of the linkage group identify the template elements of that group. For example, the
top two template elements D of Figure 3D are linked and are given the same linkage
group number 0 in the linkage group column of the table of Figures 5A and 5B.
[0075] For the following discussion reference will be made to Figures 4, 5A, 5B, and Figure
6. The feature types referenced in Figure 4 are shown in Figure 6 as collections of
tiles. For example, the tile 1 within the collection for the feature type G, the eye
feature, is a picture of an eye represented as an array of pixels. The other tiles,
2 through 2^n, in the collection for feature type G are other pictures of eyes. In
the preferred embodiment the number of tiles in each collection for each feature type
is 2^n for some positive integer n. It should be noted that tiles within a collection
share visually similar properties, as they represent the same image feature. A comparison of
tiles from different feature types will, in general, be visually dissimilar.
[0076] Figure 7 represents an image as an assignment of template elements to tiles. Each
of the template elements of Figure 7 has a number associated with it, and this number
corresponds to a tile for the feature type of the template element. For example, the
template element 60 is for feature type A and has the associated tile number 46
in the collection of hair feature type tiles A in Figure 6. Similarly, the template
element 62 for the eye feature type is numbered 123, and it corresponds to the tile
with number 123 in the eye feature type collection labeled G in Figure 6. Note that
template elements within the same linked group (such as the eye feature type template
elements 62 and 64) have identical tile numbers. For ease of identification of linked
elements, they appear in bold number printing in Figure 7.
[0077] The tile numbers assigned to each template element in Figure 7 are used to retrieve
the like numbered tile from the like labeled feature type collection of tiles. The
retrieved tile is positioned in the same location as the template element containing
the tile number. The resulting assembly of tiles produces the mosaic of Figure 8.
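The assembly step may be sketched as follows, assuming each feature type collection stores its tiles contiguously and each template element carries its assigned tile number; the names and data layout are illustrative.

/* Assemble the mosaic of Figure 8: for each template element, fetch the
 * tile selected by its tile number from the element's feature type
 * collection and copy it to the element's location.  Linked elements
 * simply carry the same tile number, so they need no special handling.  */
typedef struct {
    int x, y, w, h;                    /* element position and size       */
    const unsigned char *collection;   /* tiles of this feature type,     */
} Element;                             /* stored contiguously, w*h each   */

void assemble_mosaic(const Element elems[], const int tile_number[],
                     int n_elems, unsigned char *mosaic, int mosaic_width)
{
    for (int e = 0; e < n_elems; e++) {
        const Element *el = &elems[e];
        const unsigned char *tile =
            el->collection + (long)tile_number[e] * el->w * el->h;
        for (int r = 0; r < el->h; r++)
            for (int c = 0; c < el->w; c++)
                mosaic[(el->y + r) * mosaic_width + (el->x + c)] =
                    tile[r * el->w + c];
    }
}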
[0078] Next, selected tiles are flipped. Figures 3B and 3C indicate which template elements
possess the left-to-right and top-to-bottom flipping properties, respectively. The
template elements with these flip properties are also indicated with the TRUE/FALSE
flags in the table of Figures 5A and 5B. The tiles in Figure 8 that are to be flipped
are identified by diagonal lines through the boxes representing pixels. Figure 9 represents
the application of the flipping properties to the tiles in Figure 8, where all tiles
in Figure 8 which correspond to the darkened template elements in Figure 3B are flipped
left-to-right and all tiles in Figure 8 which correspond to the darkened template
elements in Figure 3C are flipped top-to-bottom. It should be noted that some template
elements undergo both flips in the transformation of the tiles from Figure 8 into
the tile orientation of Figure 9 and that the flips take place within the associated
element.
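The flip operations themselves are simple in-place pixel exchanges within the tile, as sketched below for an 8-bit gray-scale tile of w x h pixels; a tile with both properties set undergoes both flips.

static void swap_px(unsigned char *a, unsigned char *b)
{
    unsigned char t = *a; *a = *b; *b = t;
}

/* Mirror a tile about its vertical centerline, within the element. */
void flip_left_right(unsigned char *tile, int w, int h)
{
    for (int r = 0; r < h; r++)
        for (int c = 0; c < w / 2; c++)
            swap_px(&tile[r * w + c], &tile[r * w + (w - 1 - c)]);
}

/* Mirror a tile about its horizontal centerline, within the element. */
void flip_top_bottom(unsigned char *tile, int w, int h)
{
    for (int r = 0; r < h / 2; r++)
        for (int c = 0; c < w; c++)
            swap_px(&tile[r * w + c], &tile[(h - 1 - r) * w + c]);
}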
[0079] The next step is the formation, by image processing operations, of a final image
based on the oriented tile mosaic of Figure 9. The mosaic of Figure 9 may have certain
visually objectionable artifacts as a result of its construction from tiles. These
artifacts can be diminished with some combination of image processing algorithms.
In the preferred embodiment, a combination of well known image processing operations
are applied including smoothing across the tile boundaries, contrast enhancement,
linear interpolation to fill missing image regions and addition of spatially dependent
random noise. The smoothing operation is described by considering the situation where
three successive pixels, P1, P2, and P3, are such that P1 and P2 are in one tile and
P3 is in an adjoining tile. The pixel P2 is replaced by the result of:

[smoothing formula]
The contrast enhancement is achieved by determining the minimum pixel value, min,
and the maximum pixel value, max, for the mosaic. Each pixel value, Pcur, of the
mosaic is replaced by Pnew according to the formula:

Pnew = 255 x (Pcur - min) / (max - min)
[0080] The regions of the feature template not corresponding to any template element are
filled using linear interpolation. For each region, the known values of the boundary
pixels are used to calculate an average pixel value. The unknown corner opposite the
known boundary is set to this average value. The remaining unassigned interior
pixels are calculated by linear interpolation. In the preferred embodiment of the
present invention, there are four such unassigned regions, each located in a corner
of the feature template.
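One plausible reading of this fill procedure, for a region in the upper-left corner of the template whose known boundary lies along its right column and bottom row, is sketched below; the blending of the two one-dimensional interpolations is an assumption of the sketch, not a formula from the text.

/* Fill an unassigned upper-left corner region: pixels with x in [0, w)
 * and y in [0, h), where the column at x = w and the row at y = h are
 * known.  The corner value is the average of the known boundary pixels;
 * interior pixels blend the interpolations running from the corner
 * toward each boundary.                                                 */
void fill_corner(unsigned char *img, int stride, int w, int h)
{
    long sum = 0;
    int  n   = 0;
    for (int y = 0; y < h; y++)  { sum += img[y * stride + w]; n++; }
    for (int x = 0; x <= w; x++) { sum += img[h * stride + x]; n++; }
    double corner = (double)sum / n;

    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++) {
            double u = (double)x / w;   /* 0 at the corner, 1 at the column */
            double v = (double)y / h;   /* 0 at the corner, 1 at the row    */
            double right  = img[y * stride + w];
            double bottom = img[h * stride + x];
            double hval = (1.0 - u) * corner + u * right;
            double vval = (1.0 - v) * corner + v * bottom;
            img[y * stride + x] = (unsigned char)(0.5 * (hval + vval) + 0.5);
        }
}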
The spatially dependent random noise to be added is determined by a function n(i,j),
where v = the noise magnitude, i = the column of the affected pixel, j = the row of
the affected pixel, and rand is a pseudo-random, floating-point number in the range
(-1, 1). The value
n(i,j) is added to the pixel at location (i,j). If the resultant pixel value is greater than
255 it is set to 255, and if it is less than zero it is set to 0. Figure 10 represents
an image after processing by these operations. It should be understood that other
image processing operations may be used in other situations, and the preferred embodiment
should not be considered limiting.
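The contrast stretch and noise addition may be sketched together as follows. The stretch uses the formula reconstructed above; the noise term is simplified here to v x rand, since the exact spatial dependence of n(i,j) is not reproduced in this text, and the function name is illustrative.

#include <stdlib.h>

/* Full-range contrast stretch followed by addition of clamped random
 * noise of magnitude v.                                               */
void post_process(unsigned char *img, int width, int height, double v)
{
    long npix = (long)width * height;
    int  min = 255, max = 0;

    for (long i = 0; i < npix; i++) {                /* find min and max  */
        if (img[i] < min) min = img[i];
        if (img[i] > max) max = img[i];
    }
    double span = (max > min) ? (double)(max - min) : 1.0;

    for (long i = 0; i < npix; i++) {
        double p = 255.0 * (img[i] - min) / span;    /* contrast stretch  */
        double r = 2.0 * rand() / RAND_MAX - 1.0;    /* in [-1, 1]        */
        int out = (int)(p + v * r + 0.5);            /* add noise, round  */
        img[i] = (unsigned char)(out < 0 ? 0 : out > 255 ? 255 : out);
    }
}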
[0081] Figure 11 illustrates an apparatus 100 on which the present method may be implemented.
The apparatus 100 is comprised of a means 102 for converting a non-digital image,
such as a photo print 80, or a negative image 82, into a digital representation of
an image. Generally the conversion is performed in a scanner 104 which outputs signals
representing pixel values in analog form. An analog-to-digital converter 106 is then
used to convert the analog pixel values to digital values representative of the scanned
image. Other sources of digital images may be input directly into a workstation
200. In the preferred apparatus embodiment of the invention the workstation 200 is
a SUN SPARC 10, running UNIX as the operating system, with the programs encoded in
the standard C programming language. The program portion of the present invention is set forth in
full in the attached Appendices A and B. Display of the digital images is by way of
the display 202 operating under software control via the keyboard 204 and mouse 206. Digital
images may also be introduced into the system by means of a CD reader 208 or other
like device. The templates created by the present method and apparatus may be downloaded
to a CD writer 210 for storage on a CD, hard-copy printed by printer 212, written onto
a storage card (such as a transaction card), or transmitted for further processing
or storage at remote locations by means of a modem 214 and transmission lines.
[0082] Other uses for the present invention include compression of images other than portraits.
Other feature types can be represented, for example, the features associated with
banking checks, such as the bank and account numbers, along with signatures, dollar
amounts, addresses, and the like. Like the features of the human face, these features
tend to be positioned at the same locations on each check.