REFERENCE TO RELATED APPLICATIONS:
[0001] This application claims priority to United States Provisional Patent Application
Serial Number 60/146,151, filed on July 30, 1999.
BACKGROUND OF THE INVENTION:
Field of the Invention:
[0002] The present invention relates to a method and apparatus for transmitting, processing,
and/or storing digital images using minimal bandwidth, processor overhead, and/or
storage capacity. In particular, the present invention relates to a new data compression
apparatus and method that can be utilized to effectively compress multimedia images
and the accompanying audio for efficient transmission across a limited bandwidth transmission
line, thereby providing real-time multimedia compression, transmission, decompression,
and viewing at a second location.
Description of the Related Art:
[0003] The present invention substantially advances known techniques for transmitting, storing,
and processing digital multimedia images through the use of novel compression and
decompression techniques. The compression and decompression techniques of the present
invention are applicable for use in nearly all image related data transmission, processing,
and storage technologies, as the techniques of the present invention provide for efficient
and accurate digital image transmission, storage, and processing through currently
available transmission, storage, and processing means.
[0004] Saupe et al: "Optimal hierarchical partition for fractal image compression" XP000957872
describes the construction of rate distortion optimal partitions. A fine scale partition
which gives a fractal encoding with a high bit rate and a low distortion is described.
The partition is hierarchical. A pruning strategy is employed on the generalised BFOS
algorithm. Sub-trees corresponding to partitions and fractal encodings are extracted.
A comparison with greedy partitions based on the traditional collage area criteria is
also described.
[0005] Hafner et al: "Weighted finite automata for video compression" XP000734814 describes
an extended version of a weighted finite automata codec that compresses video streams
at very low bit rates and shows that MWFA video compression provides a good alternative
for low bit rate applications.
SUMMARY OF THE INVENTION:
[0006] The present invention is set out in the appended claims.
BRIEF DESCRIPTION OF THE DRAWINGS:
[0007] The objects and features of the invention will be more readily understood with reference
to the following description and the attached drawings, wherein:
Figure 1 is a first general illustration of HC tree construction;
Figure 2 is a second general illustration of HC tree construction;
Figure 3 is an exemplary illustration of the main structure of the present invention;
Figure 4 is an exemplary illustration of the image compression portion; and
Figure 5 is an exemplary illustration of the audio compression portion.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS:
[0008] Digital imaging processing, compression and transmission is an important area of
advancement in the computer field, as an ever increasing number of computer hardware
and software applications utilize digital images. The field of digital image processing
is known to include storage, compression, transmission, and receipt of images that
are represented by a plurality of bits or bytes, wherein each bit or byte corresponds
to a characteristic of a particular point within the image. Each individual point
within the image, which is often referred to as a pixel, has a number of binary values
associated therewith, which cooperatively indicate and/or determine whether the pixel
should be illuminated or not, and furthermore, the intensity and/or color of illumination
of the pixel. Therefore, for example, in a general 520 x 480 pixel color image on
a computer display, approximately 2 megabits of data (roughly 250 kilobytes) would be required to represent
this image, assuming each pixel had 8 color bits associated therewith (520 x 480 x
8 = 1,996,800 bits). In situations with larger images having more colors, which is standard
within current computer applications, larger blocks of data are required to represent
the increased number of pixels and colors associated with the image.
[0009] In view of the substantial size of digital images in the current computing environment,
along with the associated storage, processing, and/or transmission of such images,
some form of compression of the image is clearly necessary in order to effectively
manage the storage, processing, or transmission operations. In particular, in order
to effectively transmit image and sound data combined, which is generally termed video
and/or multimedia, across a commonly available bandwidth, the multimedia/video data
absolutely must be compressed in order to maintain reasonable image and sound quality
at the receiving end. Further, as a result of the inherent size of video and/or multimedia
images and their associated audio, in conjunction with the generally available bandwidth
in communications media, numerous concerns surround the aforementioned compression
operations. For example, image quality, bandwidth adaptation, playback performance,
memory consumption, coding symmetry, size and frame rate scalability, interaction
delay, and platform portability are common considerations that must be addressed by
a compression operation.
[0010] Upon review of these considerations, one of the most promising approaches to image
compression currently available is the fractal compression approach, which is also
known as iterated function systems (IFS). The basis for fractal compression or encoding
takes advantage of the fact that many sub-parts of an image are repeated within the
image itself, and therefore, a representative image can be created by a mapping of
the portions of the image to fractions of representative sub-parts of the original
image. These sub-parts are generally termed blocks. Fractal encoding subdivides an
image to be encoded into blocks, which taken as a whole, make up a close approximation
of the entire original image. Although some of the blocks may overlap and be of different
sizes, an approximation of the entire original image is represented by the compilation
of generated blocks. In conventional fractal encoding, the original image is divided
into two sets of blocks: the first set is termed the domain blocks, while the second
set is termed the range blocks. The domain blocks may be rotated and mirrored in order
to create more candidate domain blocks, which
can be compared against the range blocks. Each domain block is compared to each range
block to determine the closest match, and the mapping of the domain blocks to the
range blocks is stored. Only information regarding matching blocks is used, and the
remaining blocks may be discarded, thus inherently compressing the data representative
of the original image, as trivial blocks have been eliminated. Therefore, generally
speaking, early versions of fractal encoding illustrated how typically self-similar
fractal sets and measures could be generated by parallel action of systems of contraction
mappings. The IFS maps plus a set of associated probabilities (IFSP) defined operators,
which acted on probability measures. However, since it is more convenient and efficient
to represent images and signals by functions, the focus of IFS-type methods shifted
to the approximation of images and signals by functions that are generated by the
iteration of an IFS-type operator. Therefore, currently, the basic idea supporting
fractal image compression is to represent an image originally given as a square of
pixels by a family of transformation operators with a much more economical representation.
Through the use of information regarding coarse-grained structures, suitable operators
allow the generation of fine grained details within the images. The iterated application
of these operators to an arbitrary image will therefore continuously approximate the
original image.
[0011] Although current fractal image compression techniques yield relatively high compression
ratios and are well approximating for purposes of human perception, current techniques
are inherently lossy with regard to image quality. In particular, current mapping
techniques are known to produce images with blurred edges and distorted detail, as
the level of compression necessary to obtain an efficiently transmittable image requires
discarding of blocks that actually contribute to the image perception. Furthermore,
current fractal image compression techniques are highly asymmetrical in nature, and
therefore, the compression operation is extremely slow. More particularly, current
fractal image compression techniques inherently suffer from the excessive time needed
to search through potential pairings of image blocks in order to capture the self
similarity of the image. This additional time required to search for image block pairings
renders fractal compression techniques undesirable for many current applications,
as the ability to process and transmit real-time multimedia data is considered a necessity
in the current market.
[0012] Another technique for compressing digital image information is wavelet edge detection
or the Wavelet Packet transform (WP). Wavelet compression based techniques utilize the
fact that images have spatial and spectral redundancies therein, which can be eliminated
from the image to be stored or transmitted in order to reduce the size of the data
structure. Put simply, wavelets transform an image into a set of basis functions,
in similar fashion to the application of a Fourier transform, which uses mathematical
operations such as sines and cosines as a basis set. When the set of basis functions
is applied, the original image is transformed into a set of coefficients. These coefficients
can then be further transformed with a derivative or gradient operation being applied
to the basis set. The coefficients then take the form of edges in different frequency
bands or scales, which allows for an efficient means of image and video compression.
However, in similar fashion to fractal encoding, wavelet compression techniques also
suffer from the drawback of long coding delays and degraded image quality. Therefore,
in view of the apparent disadvantages associated with current fractal encoding and
WP related compression techniques, there clearly exists a need for a compression technique
capable of quickly, efficiently, and accurately compressing a digital multimedia image
and its accompanying audio, thereby allowing the compressed image and audio to be
efficiently stored, processed, and/or transmitted across generally available communications
media.
[0013] In response to the above noted deficiencies of previous compression methods and algorithms,
the present invention provides a real-time video/multimedia compressor capable of
compressing color multimedia (which will be used herein to reflect a string of video
images having an accompanying synchronized audio signal associated therewith) suitable
for transmission through communications media having bandwidths of, for example, 20Kbps
to 1.2 Mbps, without a noticeable loss of detail from the image. This is accomplished
by addressing a known problem with regard to IFS compression techniques, which is
the determination of the best range-domain block pairings. Therefore, as a result of the
present invention, exceptional motion tracking of detail within images is provided
at generally low bit rates. Furthermore, although hardware driven embodiments are
contemplated within the scope of the present invention, the present invention, for
example, is capable of being executed solely through a software driven implementation.
[0014] The image compressor of the present invention is centered around a new and improved
image compression algorithm that is based upon IFS, wherein a new technique of using
hierarchical categorizations (HC) of pixel blocks is incorporated into an IFS based
compression algorithm. This new compression algorithm is utilized not only for the
coding of single frames (Intraframes), but also, and more applicably, for coding of
sequences of frames (Interframes), wherein the coding is order preserving and based
upon conditional block replenishment of sequential frames. Integrated use of data structures
and algorithms is one of the main reasons for the resulting high quality of images
compressed and transmitted using the present invention. Furthermore, the ability to
efficiently compress sequences of images is complemented by an extremely efficient
audio codec that is based upon wavelet packet transforms with adaptive quantization.
This configuration, for example, allows for effective color image sequence compression
up to 1024 x 1024, with the accompanying audio at reasonable quality, and further,
real time image frame processing of approximately 30 frames per second over low bandwidth
requirements of approximately 20Kbps to 1.2Mbps. This compressed or coded data can
be implemented in software and decoded or decompressed with a small binary decoder,
which is generally less than 200KB, through fractal interpolation. Further, since
the present compression technique is based upon known fractal encoding frameworks,
the present compression technique is easily compatible with most known compression
frameworks.
[0015] The foundation of the present approach to multimedia compression is based upon the
principle that categorizations of pixel blocks are organized in a hierarchical manner
using multidimensional trees, thus avoiding existing multidimensional tree strategies
that use a full pixel-to-pixel comparison of a pair of blocks, which operates to substantially
reduce the overall number of range - domain block comparisons necessary to compress
multimedia files. The present HC approach to multimedia compression combines the constructive
method of searching provided through multidimensional trees with the efficiency of
an abstract and quantitative categorization. More particularly, a multidimensional
tree in which pixel blocks are hierarchically categorized is generated, and then optimized
to eliminate unnecessary redundancies. Therefore, the search phase of the compression
operation is essentially reduced to a mere consulting of the multidimensional tree.
Further, generating an abstract and qualitative classification of an image block refines
the idea of computing an abstract representation of an image, and therefore, the present
invention also utilizes the basic underlying principle of wavelet partitioning. This
underlying principle, generally speaking, is to partition an image into one part representing
the average, and a second part representing a difference of all constituting values.
Therefore, the HC tree compression is configured to examine blocks that are structurally
similar to a reference block, as opposed to previous techniques, that focused upon
examining only those domain blocks that were proximate to a particular range block.
This structural search based compression technique allows for image quality to remain
high, while reducing the search time required.
[0016] The conditional block replenishment operation used for coding sequential frames noted
above is based upon approximating a new image through reference to an old image. Through
consulting the precomputed data structure for the original image, the search speed
for the updated image can be improved, as noted above. However, an additional gain
in search efficiency is realized in using conditional block replenishment as a result
of the fact that the original data structure can be reused for multiple images, thus
eliminating the need to continually regenerate and/or precompute an original data
structure for comparison. Further, using this approach, the construction of the HC
tree can then be incrementally obtained for a new image.
[0017] The corresponding audio codec of the present invention is configured to compress
and transmit the audio associated with the multimedia video image in real-time synchronization,
and at a quality that is essentially indistinguishable from the original audio through
non-electronic means. This is accomplished in the present invention through the use
of a WP transform as the filter bank decomposition tool, which results in a non-uniform
decomposition that closely matches the critical frequency bands of a psychoacoustic
model. More particularly, a series of Daubechies wavelet filters are used to compromise
between the low delays associated with shorter filters and the inherent increase in
audio quality associated with longer filters. The quantization factors are determined
adaptively for each audio frame from iterating the analysis of the quantization noise
and masking threshold obtained from a perceptual masking model. In order to decode
the compressed audio signal, a simple dequantization step is followed by the inverse
WP transform to obtain the reconstructed audio signal.
[0018] In order to fully comprehend the compression technique of the present invention,
it is beneficial to discuss the foundational principles of the basic unmodified fractal
image compression approach. Without loss of generality, let (X, d) denote a compact metric space, which will be termed the "pixel space." A function c: X → X is a contraction if there is a constant k ∈ [0, 1) such that d(c(x), c(y)) ≤ k d(x, y) for all x, y ∈ X. Further, let F(X) denote the space of functions f: X → R that satisfy equation (1) shown below.

Iterated function systems with Grey Level Maps (IFSM) are an example of IFS or fractal
transform methods over functions in F(X) representing images. Furthermore, an N-level map IFSM M = (ω, Φ) is a tuple where:
1. ω = {ω1, ω2, ..., ωN}, called the IFS component, is a family of contractions

and
2. Φ = {φ1, φ2, ..., φN}, called the grey level component, is a family of functions φi : R → R that are Lipschitz, i.e., there exists a Ki ≥ 0 such that |φi(t1) - φi(t2)| ≤ Ki|t1 - t2| for all t1, t2 ∈ R. It is assumed that affine grey-level maps of the form φi(t) = αit + βi, with αi, βi ∈ R, are used.
A place dependent N-level map IFSM (PDIFSM) M = (ω, Φ) is an IFSM with grey level component Φ = {φ1, φ2, ..., φN}, where each φi : R x X → R is Lipschitz; it is then assumed that φi is an affine grey-level map of the form φi(t, x) = αi(x)t + βi(x) + γi, for αi, βi : Xi → R bounded on Xi and γi ∈ R. A partitioned N-level map IFSM (PIFSM) M = (ω, Φ) is an IFSM where ω = (ωi)i∈N is an N-indexed family of contractions ωi : Xi → Xj(i), together with an index map j : N → N, such that Xj(i) ⊂ X and Xi ⊂ X.
[0019] Given an IFSM or PIFSM as noted above, M = (ω, Φ), a fractal transform operator T: F(X) → F(X) is defined as follows. For each u ∈ F(X) and x ∈ X, let its fractal components fi, i = 1, ..., N, be defined as shown in equation (2)

where the first case applies for x ∈ ωi(X) and the second for x ∉ ωi(X).
Then, in view of the above noted mathematical foundations, the transformed image
Tu will be defined as shown in equation (3).

Under suitable conditions on the ωi and the φi involving their Lipschitz factors, the IFSM operator T is contractive in F(X). Therefore, from Banach's Fixed Point Theorem, there exists a unique fixed point ū = Tū.
[0020] With the mathematical foundations established, it is apparent that fractal image
compression is based upon the following notion: given a target image v ∈ F(X), which shall be approximated/compressed to a desirable accuracy ε > 0, find an IFSM (ω, Φ) with an associated contractive fractal transform operator T such that equation (4) is satisfied, where ū = Tū is the fixed point of T.

The action of the operator T may be geometrically viewed in terms of its action on the graph of an image u. Each term u(wi(x)) represents a reduced copy of the graph of u, which is supported on the set wi(X). The grey level map φi then distorts this reduced copy in the grey level direction to produce the fractal component fi. The fractal transform adds up these fractal components to produce (Tu)(x). In the resulting fixed point, the graph of ū is the union of reduced and modified copies of itself.
[0021] The place-dependent IFSM (PDIFSM) generalizes the IFSM method with grey-level maps,
since the grey level maps φi depend on both the grey level value at a pre-image as well as the location of the pre-image itself. Partitioned IFSM (PIFSM), the basis of fractal block encoding, is based on IFS maps wi : Xi → Xj(i) that map a subset Xi ⊂ X to a smaller subset Xj(i) ⊂ X. The associated partitioned IFSM operator maps modified copies of the image on the Xi onto the Xj(i). Thus images are approximated as unions of local copies of themselves, as opposed
to copies of the entire image. This is accomplished by partitioning the original image
at different scales. Since images usually take the form of a rectangular array of
pixels, partitioning the original image into blocks is a natural choice. The blocks
are divided into large and small partitions, wherein the large partitions are called
domain blocks and the small partitions are called range blocks. The more domain blocks
there are, the greater the domain pool, and the better the decoded image quality. However,
a greater number of domain blocks also results in longer compression times, and therefore requires
a more intelligent and efficient algorithm to effectively compress the image.
[0022] For the remainder of this disclosure, unless stated otherwise, the following representation
is assumed: The image discussed and utilized in the exemplary embodiments is a digital
grey level picture. It is generally covered by non-overlapping square range blocks,
wherein each range block has a size of n x n, where n represents a number of pixels. The first block is aligned with the bottom left corner of the picture. If the blocks do not reach up to the right or the top edge, then the margin can be covered by rectangular blocks. Next, for every range block, a similar domain block is located. The domain blocks are generally twice the size of the range block, and are located anywhere on the image. The x, y coordinates of the lower left corner of the domain block determine its location; these coordinates may fall on any pixel of the image. Blocks are related by an appropriately chosen similarity metric. The primary similarity metrics considered herein are the root-mean-square (rms) and the mean absolute error (mae) metric. Therefore, for blocks A = (ai) and B = (bi) with 0 ≤ i < m, the representative equations are shown in (5) and (6).


Note that the ai are obtained as solutions of the equation ∂Δ/∂ai = 0 in the case where rms is chosen as the similarity metric, which generally amounts to minimizing the positive definite quadratic form shown in equation (7).

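Equations (5) and (6) are not reproduced here, but they define the rms and mae block metrics named above. The following Python/NumPy sketch, which is not part of the original disclosure, shows the standard forms these metrics are generally assumed to take for equally sized pixel blocks; the block contents are hypothetical.

```python
import numpy as np

def rms_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Root-mean-square distance between two equally sized pixel blocks."""
    a = a.astype(np.float64).ravel()
    b = b.astype(np.float64).ravel()
    return float(np.sqrt(np.mean((a - b) ** 2)))

def mae_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Mean absolute error between two equally sized pixel blocks."""
    a = a.astype(np.float64).ravel()
    b = b.astype(np.float64).ravel()
    return float(np.mean(np.abs(a - b)))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    R = rng.integers(0, 256, size=(8, 8))                 # hypothetical range block
    D = np.clip(R + rng.normal(0, 4, (8, 8)), 0, 255)     # a similar candidate block
    print(rms_distance(R, D), mae_distance(R, D))
```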
[0023] With the mathematical foundations and preliminary definitions set forth, the compression
process is generally described as locating a similar and larger domain block for every
range block, wherein similarity is judged by a corresponding similarity metric. The
transformation parameters corresponding to the located domain block are recorded as
the IFS code, and this process is repeated for all of the defined range blocks. Thereafter,
the combination of the transformation mappings yield the transformation
T, which represents a close approximation of the original image.
[0024] The transformation T = Tv ∘ TG ∘ Tc is composed from a contraction map Tc, followed by a geometric map TG and a grey value map Tv. The contraction map shrinks the domain block to half the original size, replacing each 2 x 2 pixel area by its average. The geometric map is one of the eight flips or symmetries of the square; blocks are therefore considered independently of the image. The grey value map changes the contrast and the brightness by a scale factor a0 and an offset a1, respectively. The domain block (dij) is mapped to (a0dij + a1). By restricting the values of a0 to be less than 1, the map T is ensured to be contracting. Although the offset is constant in this case, it is not required to be. Building further upon the aforementioned mathematical principles, define Δ to be a chosen similarity metric and ΔD,R to be the corresponding distance from a contracted and/or flipped domain block D to a range block R. Then, the minimum of all differences obtained for the given range block R is represented by equation (8).

Therefore, given a range block R, its minimum difference according to equation (8) with every potential domain block must be calculated. The basic compression algorithm operates as shown in equation (9).

Further, generally speaking, in order to find the most suitable domain block for a given range block, the range block has to be converted to its corresponding degenerated HC-tree, that is, to a hierarchical node. Therefore, for each range block R in the present exemplary embodiment, the l-level hierarchy of abstract subblock vectors al(R) = (aj)j=1...hl is constructed. Although this process has to be performed only once for each range block, it may replay the previous conversion of a potential domain block in order to efficiently utilize computation time.
[0025] In order to decode the compressed image, the calculated transformations are applied
to an arbitrary initial image, and the process is repeated. Since the transform operator
is contractive, the images converge to a stable image, which is the resulting decoded
approximation of the original image, now uncompressed.
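As a concrete illustration of this decoding loop, the following Python/NumPy sketch (not from the original disclosure) repeatedly applies a set of stored, hypothetical range-to-domain mappings to an arbitrary starting image; because the combined operator is contractive, the iterates converge toward the decoded approximation. The mapping format (domain position, scale a0, offset a1) is an assumption for illustration only.

```python
import numpy as np

def downsample2(block: np.ndarray) -> np.ndarray:
    """Average each 2x2 pixel area, shrinking the block to half size."""
    h, w = block.shape
    return block.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def decode(mappings, shape, n=4, iterations=10):
    """Iteratively apply the stored block transforms to an arbitrary image."""
    img = np.full(shape, 128.0)            # arbitrary initial image
    for _ in range(iterations):
        out = np.empty_like(img)
        for (ry, rx, dy, dx, a0, a1) in mappings:
            dom = img[dy:dy + 2 * n, dx:dx + 2 * n]
            out[ry:ry + n, rx:rx + n] = a0 * downsample2(dom) + a1
        img = out
    return img

# Hypothetical usage: one mapping per 4x4 range block of a 16x16 image.
if __name__ == "__main__":
    rng = np.random.default_rng(1)
    maps = []
    for ry in range(0, 16, 4):
        for rx in range(0, 16, 4):
            dy, dx = rng.integers(0, 9, size=2)   # any 8x8 domain block fits
            maps.append((ry, rx, dy, dx, 0.6, 20.0))
    print(decode(maps, (16, 16)).round(1))
```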
[0026] Therefore, in its most general sense, the compression algorithm of the present exemplary
embodiment combines the constructive searching method provided through the use of
trees with the efficiency of an abstract and quantitative categorization. As such,
the present compression method utilizes the block search strategy of organizing categories
of pixel blocks in a hierarchical manner using multidimensional trees. Through the
use of multidimensional trees, elements can be arranged in complete order and accessed
through a binary search. The binary trees used in previous embodiments are balanced
so that each branch partitions the list into two equal parts, which is
effective for indexing and/or searching one-dimensional data sets. However, the characteristics
representing images given as grey value maps are multidimensional, and therefore,
require more than a binary tree for effective searching and/or indexing. Therefore,
as noted above, the classic binary trees of previous embodiments have been replaced
with the multidimensional trees of the present exemplary embodiment. In the multidimensional
trees of the present exemplary embodiment, elements are considered n-dimensional vectors.
Therefore, suitable data structures for searching and/or indexing the multidimensional
space are the r-tree, k-tree, and the D-tree. In the r-tree structure, the vectors
are organized into a nested set of bounding rectangles, wherein each rectangle can
hold a maximum number of nodes. When the maximum number of nodes is exceeded, the
node is caused to split in two, which minimizes the total area represented by the
bounding rectangles. Therefore, through the use of multidimensional ordered trees,
the number of overall range - domain block comparisons necessary to identify an optimally
suited domain block for a given range block is minimized. As a result thereof, through
the use of multidimensional trees, the time needed to search through potential pairings
of image blocks in order to capture the self similarity of the image is dramatically
reduced over known search techniques.
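The text names r-trees, k-trees, and D-trees as suitable structures. Purely as an illustration of the general principle (indexing block feature vectors in a multidimensional tree so that candidate domain blocks are found by a tree query rather than exhaustive pixel-level comparison), the following sketch uses SciPy's k-d tree; the quadrant-mean feature vector and the random blocks are assumptions, not part of the original disclosure.

```python
import numpy as np
from scipy.spatial import cKDTree

def quadrant_means(block: np.ndarray) -> np.ndarray:
    """4-dimensional feature vector: the mean of each quadrant of the block."""
    h, w = block.shape
    return np.array([block[:h//2, :w//2].mean(), block[:h//2, w//2:].mean(),
                     block[h//2:, :w//2].mean(), block[h//2:, w//2:].mean()])

# Hypothetical domain pool: 500 random 8x8 blocks indexed by their feature vectors.
rng = np.random.default_rng(2)
domain_blocks = rng.integers(0, 256, size=(500, 8, 8)).astype(float)
features = np.array([quadrant_means(b) for b in domain_blocks])
tree = cKDTree(features)

# For a given range block, only the few structurally nearest domain blocks
# returned by the tree query need a detailed comparison.
range_block = rng.integers(0, 256, size=(8, 8)).astype(float)
dists, idx = tree.query(quadrant_means(range_block), k=5)
print("candidate domain blocks:", idx)
```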
[0027] In addition to the use of multidimensional trees, the present exemplary embodiment
further improves upon known search and compression strategies through the use of a
quantitative characterization scheme for pixel blocks that avoids a substantial amount
of the unnecessary pixel-to-pixel comparisons that are inherent in the aforementioned
existing search strategies. Through the use of categories, an abstract classification
of blocks is provided, wherein testing of the blocks is conducted in a more efficient
and less time consuming manner. In particular, for example, the HC approach categorizes
a given block according to only the mean values and the orientation of the contained
quad-tree sub-blocks. Alternatively, in the present exemplary embodiment, in addition
to the mean values and orientation information, orthogonal information relative to
the differences between pixels is included. Therefore, through the use of multidimensional
HC trees, the abstraction of pixels through a qualitative categorization can be iterated,
which results in efficiently parsable multidimensional trees. Additionally, unlike
previous search and compression algorithms that examine only those domain blocks physically
located proximate a particular range block, the present exemplary embodiment uses
the multidimensional HC tree search and compression algorithm to examine blocks that
are structurally similar to a particular range block. This allows the present exemplary
embodiment to maintain high image quality while allowing for dramatically reduced
search times.
[0028] The actual construction of an HC-tree in the present exemplary embodiment is based upon a number of factors. First, given a vector s, wherein s = (si)i=1...k, let the sequence 1...k be called a scanline ordering. Each domain block D is partitioned into a vector of equally-sized subblocks s(D) = (si)i=1...k, which is termed the subblock vector. Thus, given a domain block D, its subblock vector s(D) = (si)i=1...k, and the contraction factor X such that k = h * X, the abstract subblock vector a(D) is the vector (aj)j=1...h, where each component aj = (mj, vj) consists of the average mj and the difference vj of (si)(j-1)*X<i≤j*X, respectively. Similarly, given an abstract subblock vector s = (si)i=1...k with si = (mi, vi) and the contraction factor X such that k = h * X, its abstract subblock vector a(s) is the vector (aj)j=1...h, where each component aj = (mj, vj) consists of the average mj and the difference vj of (mi)(j-1)*X<i≤j*X, respectively. The orientation of an abstract subblock vector a is then its scanline ordering reordered with increasing differences.
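The following Python/NumPy sketch (not part of the original text) illustrates one plausible reading of this construction: the block is scanned into a vector, grouped by the contraction factor X, and each group is abstracted to an (average, difference) pair, with the process iterated on the averages to produce increasingly abstract levels. Using the variance as the "difference" component is an assumption consistent with the later discussion of variances.

```python
import numpy as np

def abstract_level(values: np.ndarray, X: int) -> np.ndarray:
    """Group a vector of values by the contraction factor X and reduce each
    group to an (average, difference) pair, where difference is the variance."""
    k = len(values)
    h = k // X
    groups = values[:h * X].reshape(h, X)
    return np.stack([groups.mean(axis=1), groups.var(axis=1)], axis=1)

def abstract_hierarchy(block: np.ndarray, X: int = 2):
    """Build the hierarchy of abstract subblock vectors for one pixel block."""
    levels = []
    means = block.astype(float).ravel()          # scanline ordering of the block
    while len(means) >= X:
        level = abstract_level(means, X)         # columns: (m_j, v_j)
        levels.append(level)
        means = level[:, 0]                      # next level abstracts the averages
    return levels

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    D = rng.integers(0, 256, size=(4, 4))
    for i, lvl in enumerate(abstract_hierarchy(D, X=2), start=1):
        print(f"level {i}: {len(lvl)} components")
```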
[0029] With the establishment of the notion of a multidimensional tree (r-tree, D-tree,
etc.), the establishment of the initial block partitioning, and the establishment
of the contraction factor(s), the actual HC-tree is the result of a standard tree
construction, which is generalized in Figure 1. Figure 1 illustrates an initial part
of a HC-tree induced from the quadrant of pixel values 14 (s111, s112, s113, and s114)
in the first level 11 of the figure. This quadrant of pixels 14 is then represented
in the second level 12 of the figure by the quadrant of numerical values 15 illustrated
in the exemplary embodiment as (m11, d11), (m12, d12), (m31, d31), and (m41, d41),
wherein m represents the mean value of the corresponding pixel level and d represents
the variance value of the corresponding pixel level. Finally, the quadrant of numerical
values 15 is represented as a quadrant of subblock vectors 16 (a1, a2, a3, and a4)
in the third level 13 of the figure, wherein a1 = (m1, d1), a2 = (m2, d2), a3 = (m3,
d3) and a4 = (m4, d4). Therefore, upon review of the progression from the first level
11 to the third level 13 illustrated within Figure 1, it is apparent that, given a set
of domain blocks along with their corresponding subblock vectors, the representative
HC-tree is the multidimensional tree iteratively constructed from the correspondingly
induced abstract subblock vectors. These trees are termed multidimensional, as the
branching is based on abstract subblock comparison. Additionally, in a fractal based
codec, domain blocks and range blocks are related under affine transformations. However,
through the use of the exemplary configuration, the use of normal precomputation algorithms
is complicated, as a normalization may not rely on information on specific range blocks,
as is usually the case in fractal based compression operations. In order to counter
this complication, the orientation has been added as a precomputed value. Turning
back to the HC-tree structure, the root node of an HC-tree represents an abstract
view on the original blocks from which the tree was constructed. Going from some leaf
to the root node yields an increasingly abstract and decreasingly concrete view of
the block represented by that leaf. Conversely, going from the root node to
a particular leaf node narrows an abstract description of a partial image down
to a concrete representation.
[0030] While discussing the construction of the HC-trees, it is also beneficial to briefly
discuss domain and range blocks. Since the analysis of a gray level component of the
original image is based on a fixed block structure, fractal image-based compression
algorithms seek self-similarity between larger parts and smaller parts of the given
image. The smaller parts form an initial partition of blocks called range blocks,
which are generally square or rectangular. By definition, every pixel within the given
image is included in exactly one range block. The first range block is aligned with
the bottom left corner of the image, and if the range blocks do not extend to the
right or the top edge, then the margin can be covered by rectangular blocks. The larger
and carefully selected blocks are termed domain blocks. It is desirable to generate
larger domain blocks, as the larger the domain blocks are allowed to be, the greater
the compression ratio produced. Aside from the size of the domain blocks, the goal
of the compression process is to most quickly find a closely matching domain block
for every range block, and upon finding a similar block, storing the corresponding
fractal transform. As an example of the creation and use of domain and range blocks,
the present exemplary embodiment using HC-trees starts from range blocks initially
being of size 32x32 pixels. These range blocks are then refined down to 24x24, 16x16,
8x8, 6x6, and finally to 4x4 pixels, as shown in Figure 1, assuming that sufficient
quality matches can be found between the respective blocks. After refining, the HC-tree
approach implements a partial tree partitioning operation, wherein given a range block,
the search of a matching domain block may fail, and therefore, in this case, the range
block is recursively replaced by a covering set of smaller range blocks. Domain blocks
may overlap each other, and need not cover every pixel within the respective image.
The domain blocks in the exemplary embodiment are generally twice the size of the
range block, and are located anywhere on the image. However, various sizes of domain
blocks are contemplated within the scope of the present invention. Nonetheless, domain
blocks are inherently restricted to a number of reasonable sizes. Although larger
domain blocks yield a more efficient matching of range blocks, a larger domain block
is also more specific, and thus, more difficult to be matched. Therefore, the potential
contribution from a larger domain block is less probable. Experimental data on the
present exemplary embodiment have revealed that block sizes selected from the following
set of {4, 6, 8, 12, 32, 64} pixels in each dimension have been effective in the present
exemplary embodiment. The x, y coordinates of the lower left corner of the domain block define the location of the domain block on the respective image, and may fall on any pixel. The set of all domain blocks, when considered as a whole, is termed the domain pool.
[0031] The determination of the best matching domain and range blocks is generally governed
by the particular distance metric chosen. Both the rms and mae distance metrics have been
found viable, and further, wavelet coefficients provide an alternative to rms
and mae. More particularly, for HC-tree operations such as insertion and comparison,
each node of the HC-tree is considered as a position vector in a corresponding r-tree,
and its orthogonal basis is the Cartesian product of all abstract subblock vectors
describing the original pixels in an increasingly concrete manner. The distance between
two of these vectors is measured by an appropriately chosen distance or similarity
metric. For purposes of example, consider the subblock vectors A = (ai) and B = (bi) with 0 ≤ i ≤ m. When vector bases are equal in type, the primary similarity metric is the rms metric, which is defined by equation (10).

The mae metric is defined by equation (11), and generally requires the least computational
effort.

The wavelet (wvl) metric, which uses the wavelet coefficients of a corresponding, fixed
wavelet filter, provides a further option. The crucial point of each of the aforementioned equations is the
fact that comparing two blocks requires a computational process involving all pixels,
which is similar to the traditional r-tree approach, wherein the results of the computation
are used in a series of average and variance comparisons, termed the naive extension.
However, the introduction of the HC-tree utilizes the advantages of these methods, while
also following the principal idea of pruning the computation as soon as possible.
In practice, this means that node insertion usually uses the complete set of information
in order to guarantee that neighboring nodes in the tree are structurally close, as
intuitively expected. However, as discussed in the domain-range block pairing process,
computation of distances proceeds from abstract to concrete components, stopping as
soon as a reasonable decision is found.
[0032] Therefore, in general terms, for the comparison each node of the HC-tree will be considered
as a position vector, and its orthogonal basis represents the list of averages and
differences reordered according to the orientation of the subblock vector. Matching
each range block tree onto the precomputed domain block tree begins with the most
abstract level. The HC-range block node is compared with a corresponding node in the
HC-domain tree. This results in the selection of certain domain blocks, and the search
continues recursively on the next level until finally a single domain block
is located together with the required transformation parameters. Embedding the HC-tree
into the principal domain-range search algorithm noted above leads to equation (12).

[0033] The process of matching TreeR onto TreeD includes locating the optimal domain block for a given range block, wherein a hierarchical
approach is used. First, both normalized domain and range blocks are compared on a
very abstract level. Domain blocks corresponding poorly to the range block are dropped
at this early stage, and positive comparisons are reevaluated in subsequent, more
detailed comparisons. It should be noted that the HC-approach does not only reduce
the overall number of block comparisons, but in particular the number of operations
involving pixels, which is required if every single pixel belonging to a block must
be considered for determining the result of a comparison. Matching each range block
tree onto the precomputed domain block tree begins with the most abstract level. The
HC-range block node is compared with a corresponding node in the HC-domain tree. This
results in the selection of certain domain blocks, and the search continues
recursively on the next level until finally a single domain block is located together
with the required transformation parameters.
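A minimal sketch of this coarse-to-fine pruning idea follows (Python/NumPy, not part of the original disclosure). It compares blocks first on an abstract (mean, variance) description and only performs the full pixel-level comparison for the candidates that survive the abstract test; the tolerance value, the feature choice, and the assumption that candidate domain blocks are already contracted to range-block size are all illustrative.

```python
import numpy as np

def abstract_descriptor(block: np.ndarray) -> np.ndarray:
    """Coarse description of a block: overall mean and variance."""
    b = block.astype(float)
    return np.array([b.mean(), b.var()])

def match_range_block(range_block, domain_pool, coarse_tol=30.0):
    """Two-stage search: prune on the abstract descriptor, then compare
    the surviving candidates pixel by pixel (rms)."""
    r_desc = abstract_descriptor(range_block)
    best_idx, best_err = None, np.inf
    for idx, dom in enumerate(domain_pool):
        # Abstract comparison: cheap, rejects structurally dissimilar blocks early.
        if np.abs(abstract_descriptor(dom) - r_desc).max() > coarse_tol:
            continue
        # Detailed comparison only for the remaining candidates.
        err = np.sqrt(np.mean((dom.astype(float) - range_block.astype(float)) ** 2))
        if err < best_err:
            best_idx, best_err = idx, err
    return best_idx, best_err

if __name__ == "__main__":
    rng = np.random.default_rng(4)
    pool = [rng.integers(0, 256, size=(8, 8)) for _ in range(200)]
    R = pool[17] + rng.normal(0, 2, (8, 8))       # a block similar to pool entry 17
    print(match_range_block(R, pool))
```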
[0034] Therefore, the domain and range comparison phase is preceded with a precomputation
phase in which all domain blocks are organized into a single domain tree
TreeD. The quality of the comparison algorithm stems from the fact that each comparison on an individual
level involves a much smaller number of details, and therefore fewer operations
are necessary, while the result of each comparison on a particular level contributes
to further searches. Further, through the use of variances rather than average mean
values, the emphasis is placed upon structural similarities, which inherently introduces
a qualitative aspect to the present invention. Additionally, precomputed average values
allow for an easy determination of the offset value for the transformation, and precomputed
orientations ensure that range blocks of different orientations may be compared in
a flexible manner.
[0035] Furthermore, since there are almost certainly going to be redundant entries within
each HC-tree, each branch of the HC-domain tree can be restricted to a certain number
of nodes without seriously degrading the available variety. Therefore, in the interest
of efficient processing, new blocks are generally inserted into the HC-domain tree
only if the differences of the corresponding HC-nodes are within the limits of a predetermined
scheme. Remembering that the HC-domain tree is constructed bottom-up, which is illustrated
by the progression from finer details to abstract representations moving up through
the tree, the tree can be pruned in a top-down manner by identifying abstract nodes
which are of sufficient similarity as to eliminate one node from the tree. The pruning
process of the HC-trees may also be conducted depending on the pixel distributions
of abstract subblock vectors. More particularly, for capturing edges, blocks with the
highest differences are selected from 4x4, 8x8, and 12x12 sizes, for example. For moderate
texture, blocks with mid-range differences are selected from 12x12, 16x16, and 32x32 sizes,
for example. For smooth areas, blocks with the lowest differences are selected
from 16x16, 32x32, and 64x64 sizes, for example. This heuristic is based upon the nearly intuitive
observation that while large blocks can successfully cover smooth areas of an image,
smaller blocks are required to illustrate more defined and sharp edges.
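The block-size heuristic just described can be sketched as a simple lookup keyed on a block's difference (variance) class; the Python sketch below is an assumption for illustration, and the numeric thresholds are invented rather than taken from the original text.

```python
import numpy as np

# Hypothetical variance thresholds separating edge, texture and smooth blocks.
EDGE_THRESHOLD = 1500.0
TEXTURE_THRESHOLD = 300.0

def candidate_sizes(block: np.ndarray):
    """Suggest block sizes to keep in the domain tree based on block variance:
    high-variance (edge) blocks favor small sizes, smooth blocks favor large ones."""
    v = block.astype(float).var()
    if v > EDGE_THRESHOLD:
        return (4, 8, 12)        # edges: small blocks with highest differences
    if v > TEXTURE_THRESHOLD:
        return (12, 16, 32)      # moderate texture: mid-range differences
    return (16, 32, 64)          # smooth areas: large blocks with lowest differences

if __name__ == "__main__":
    flat = np.full((16, 16), 100.0)
    noisy = np.random.default_rng(5).integers(0, 256, size=(16, 16))
    print(candidate_sizes(flat), candidate_sizes(noisy))
```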
[0036] In order to more formally present the notion of an HC tree, the introduction of some
auxiliary definitions based on r-trees is required. First, let T[S] be an r-tree (r*-tree) constructed from a given set of vectors S; the level of a tree is, as expected, the level of its element vectors. Therefore, for each node n ∈ Tj, let C(n) ⊆ Aj be the set of all abstract subblock vectors that are direct or indirect branch nodes of n. Further, let C-(n) = {aj-1(D) ∈ Aj-1 | aj(D) ∈ C(n)} be the set of corresponding subblock vectors which are one level less abstract. With these foundations set, let T-(n) = T[C-(n)] be the correspondingly generated tree; each leaf node in Ti is then considered to be of level 1, and each node having a branch node of level n has level n + 1.
[0037] Therefore, in view of the domain pool and its abstract views, the HC-domain tree
is the tree T constructed according to the following scheme, as shown in Figure 2.
First, construct the r-tree Th = T[Ah] generated from the set of most abstract subblock vectors. As a consequence, the generators become the leaf nodes of Th. Given a subset of nodes N ⊆ Tl, an HC-tree is the original level l tree where each n ∈ N is associated with a level l - 1 tree T-(n). Thus, an HC-tree is a tree which on the top consists of very abstract nodes, and
moving downwards through the tree will eventually lead to a selected node that is
associated with a new, less abstract tree. This new tree can be considered as an ordered
replacement of more detailed versions of the original subtree of that node. On the
abstract level, the set N may intuitively be identified with the set of leaf nodes, and on the less abstract level, it may include the leaf nodes and their immediate branches.
[0038] Upon determination of a range block R, the HC-tree fractal compression algorithm
of the present exemplary embodiment seeks the domain block D such that their difference ΔD,R (modulo a simple transformation) becomes minimal. More precisely, this means to look for a transformation T which, when applied to R, yields an optimal approximation of D. The HC-approach of the present exemplary embodiment utilizes the following transformations. Given a pixel p, which is within R, its associated value q, which is within D, is computed by applying the transformation ti, where q = ti(p). Beginning with a simple linear transformation means that each of the transformations ti is composed from a scaling factor si (contraction map), a rotation ri (geometric map), and an offset oi (gray value map), wherein equation (13) holds true.

Equation (13) must be solved over a three-dimensional space, where the x and y components refer to the 2-dimensional localization and the z portion refers to the gray value intensity. In order to solve equation (13), a matrix operation is conducted, which is represented by equation (14).

[0039] The scaling encoded by sx and sy actually comprises scaling along the x and y axes, thus mapping a domain block to a somewhat smaller range block, while the contrast is scaled as determined by sz. Rotation factors rz and rs are used to describe rotation and mirroring. Therefore, in view of the supporting
mathematical background presented above, the code for the partial IFS representation
of an image basically consists of a sequence of tuples, one per range block, wherein
equation (15) holds true.

Scaling factors sx and sy are limited to the possible relations between the standardized sizes of domain blocks
and matching range blocks. Rotation factors are limited to one of the 8 possible symmetry
operations, encoding clock-wise rotations around the center by 0, 90, 180, and 270 degrees
as well as mirroring about the horizontal and vertical median and diagonal axes. Rather
than initializing the image with the typical mean gray value, and in order to speed up
the convergence of the decoding process, oi is initially set in the present exemplary embodiment to the mean value of the range block, rather than to the relative
offset from the corresponding domain block.
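As an illustration of what such a per-range-block code tuple might contain and how it would be applied during decoding, the following Python/NumPy sketch (an assumption, not the patent's actual bitstream format) stores the domain position, one of the eight square symmetries, a contrast scale, and a grey-value offset for each range block.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class RangeBlockCode:
    dom_y: int      # domain block position (corner coordinates)
    dom_x: int
    symmetry: int   # 0..7: rotations by 0/90/180/270 degrees, optionally mirrored
    scale: float    # contrast scaling (|scale| < 1 keeps the operator contractive)
    offset: float   # grey value offset

def apply_symmetry(block: np.ndarray, symmetry: int) -> np.ndarray:
    """Apply one of the eight flips/rotations of the square."""
    out = np.rot90(block, k=symmetry % 4)
    return np.fliplr(out) if symmetry >= 4 else out

def apply_code(code: RangeBlockCode, image: np.ndarray, n: int) -> np.ndarray:
    """Decode one n x n range block from a 2n x 2n domain block of `image`."""
    dom = image[code.dom_y:code.dom_y + 2 * n, code.dom_x:code.dom_x + 2 * n]
    contracted = dom.reshape(n, 2, n, 2).mean(axis=(1, 3))   # 2x2 averaging
    return code.scale * apply_symmetry(contracted, code.symmetry) + code.offset

if __name__ == "__main__":
    img = np.random.default_rng(6).integers(0, 256, (32, 32)).astype(float)
    code = RangeBlockCode(dom_y=8, dom_x=4, symmetry=3, scale=0.7, offset=120.0)
    print(apply_code(code, img, n=8).shape)   # (8, 8)
```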
[0040] Returning to the discussion relative to domain and range blocks in greater detail,
with regard to the formation of a domain pool, as noted above, the notion of abstract
pixels forming abstract subblock vectors is based upon averages and differences. The
average value of a block A where
A = (
ai) with
0 ≤
i < m is taken to be
av(A), wherein


Therefore, as a principal notion of difference, the variance

is used. Mathematical foundation aside, preparation of the domain pool initially
includes selecting domain blocks at positions that are integral multiples of four,
in the present exemplary embodiment, which are 4, 6, 8, 16, 24, 32, and 64 pixels
wide in each dimension, in addition to any supplemental rectangular blocks possibly
needed to cover the margins. Domain blocks are normalized with respect to their orientation,
which is measured by the average values of their subblocks, and the corresponding
rotation factor is stored. Further, domain blocks are normalized with respect to their mean
gray level, and the corresponding offset is stored.
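A sketch of this domain pool preparation step follows (Python/NumPy; everything beyond the listed block sizes and the multiple-of-four positions is an assumption): each block's mean is retained as the stored offset, and the orientation normalization is reduced here to remembering which rotation places the brightest quadrant first.

```python
import numpy as np

DOMAIN_SIZES = (4, 6, 8, 16, 24, 32, 64)   # block widths named in the text
STEP = 4                                    # positions at integral multiples of four

def orientation_factor(block: np.ndarray) -> int:
    """Heuristic orientation: index (0-3) of the brightest quadrant, used as a rotation factor."""
    h, w = block.shape
    quads = [block[:h//2, :w//2].mean(), block[:h//2, w//2:].mean(),
             block[h//2:, w//2:].mean(), block[h//2:, :w//2].mean()]
    return int(np.argmax(quads))

def build_domain_pool(image: np.ndarray):
    """Collect normalized domain blocks with their stored offset and rotation factor."""
    H, W = image.shape
    pool = []
    for size in DOMAIN_SIZES:
        for y in range(0, H - size + 1, STEP):
            for x in range(0, W - size + 1, STEP):
                block = image[y:y + size, x:x + size].astype(float)
                offset = block.mean()                  # stored grey-level offset
                rot = orientation_factor(block)        # stored rotation factor
                normalized = np.rot90(block - offset, k=rot)
                pool.append((y, x, size, rot, offset, normalized))
    return pool

if __name__ == "__main__":
    img = np.random.default_rng(7).integers(0, 256, (64, 64))
    print(len(build_domain_pool(img)), "domain blocks")
```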
[0041] In constructing the domain blocks, abstract views on a domain block are generated
through the notion of abstract subblock vectors. In particular, each domain block D is partitioned into a vector of equally-sized subblocks s(D) = (si)i=1...k, where k = 4 for blocks that are at least 32 pixels wide and k = 2 otherwise in the present exemplary embodiment. The progression of abstraction is governed by the contraction factor X, for which X = 2 or 3 are generally the most efficient choices in the present exemplary embodiment. Therefore, given a domain block D, its subblock vector s(D) = (si)i=1...k, and the contraction factor X such that k = h * X, the abstract subblock vector a(D) of level 1 is the vector (aj)j=1...h, where each component aj = (mj, vj) consists of the average mj and the difference vj of (si)(j-1)*X<i≤j*X, respectively. Further, given an abstract subblock vector s = (si)i=1...k of level l with si = (mi, vi) and the contraction factor X such that k = h * X, its abstract subblock vector a(s) of level l + 1 is the vector (aj)j=1...h, where each component aj = (mj, vj) consists of the average mj and the difference vj of (mi)(j-1)*X<i≤j*X, respectively. Then, for each domain block D, let a(D) = (al(D))l=1...h be the hierarchy of its abstract subblock vectors, and let A = (Al)l=1...h be the hierarchy of subblock vector sets Al = {al(D) | D is a domain block}.
[0042] The foundation of the audio portion of the compression in the present exemplary embodiment
utilizes the following primary components. A filter bank decomposition is used, and
in particular a WP transform is used as the filter bank decomposition tool. A 42-band
decomposition filter bank tree structure is used in the present exemplary embodiment
to give a non-uniform decomposition that closely matches the critical bands given
by a psychoacoustic model. The psychoacoustic masking model, which is a simplified
perceptual, masking model, is employed here to reduce computational expense. The masking
thresholds applied to filter bank signals depend on the frequency bandwidths of the
critical bands. As such, the present exemplary embodiment uses a series of different
length Daubechies wavelet filters in different locations within the WP decomposition
tree. A quantization step is used, wherein an individual scaling factor for each of
the 42 frequency bands is used. These are determined adaptively for each frame of
the audio signal from an iterative analysis of the quantization noise and masking
threshold obtained from the perceptual masking model. Therefore, generally speaking,
the audio coder uses the concept of wavelet-packets with psychoacoustic adaptive quantization.
There are generally three components in this exemplary psychoacoustic perceptual audio
coding algorithm: filter bank decomposition, psychoacoustic masking, and quantization.
[0043] With greater particularity, the WP transform used in the audio portion can be implemented
with an iterated 2-channel filter bank which is repeatedly applied not only in the
lower half band but also in the higher half band yielding any desired dyadic tree
structure of subbands. Assume the low and high pass filters used in the standard wavelet
transform are H0 and H1, with coefficients
ln and
hn respectively. The even and odd numbered WPs
W2m(t) and
W2m+1(t) are defined by equations (16) and (17)


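The iterated 2-channel split that underlies the WP transform can be sketched as follows (Python with NumPy and PyWavelets, not part of the original disclosure). The uniform 3-level dyadic tree and the db8 filter are placeholders; the actual 42-band non-uniform tree and the mixed-length Daubechies filters of the embodiment are not reproduced here.

```python
import numpy as np
import pywt

def wavelet_packet_tree(signal, wavelet="db8", levels=3):
    """Iterate a 2-channel analysis filter bank on BOTH the low and high half
    bands, producing a full dyadic tree of 2**levels subband signals."""
    bands = [np.asarray(signal, dtype=float)]
    for _ in range(levels):
        next_bands = []
        for band in bands:
            low, high = pywt.dwt(band, wavelet)   # one 2-channel split
            next_bands.extend([low, high])
        bands = next_bands
    return bands

if __name__ == "__main__":
    t = np.linspace(0, 1, 2048, endpoint=False)
    frame = np.sin(2 * np.pi * 440 * t) + 0.3 * np.sin(2 * np.pi * 3000 * t)
    subbands = wavelet_packet_tree(frame, levels=3)
    print(len(subbands), [len(b) for b in subbands[:3]])
```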
[0044] The 42 band tree of the present exemplary embodiment is intended for 44.1 kHz sampled
audio. The tree structure can be deduced from the column showing the WP Decomposition
Tree Passbands. An important aspect of applying a WP decomposition filter bank in
audio coding is that the quality of the system depends heavily on the choice of the
analyzing wavelet used. As the length of the wavelet function increases the corresponding
filter transition bandwidth decreases. Consequently, a better separation between the
subband signals is gained and higher compression of the wideband signal results. However,
a longer impulse response increases computational effort and coding delays. Since
lower order filters can achieve satisfactory separation at lower frequencies and help
reduce coding delay, the present invention compromises by using different length,
maximally flat Daubechies wavelets in different parts of the WP decomposition tree.
The present invention further uses shorter Daubechies filters in lower frequency subbands
and longer filters for higher frequency subbands.
[0045] Frames provide the reference units for a certain number of samples. Only the difference
between the neighboring frames is transmitted in the present exemplary embodiment.
In selecting a suitable frame size the present invention compromises between two conflicting
requirements. A large frame size is desirable for maintaining lower bit rates. Unfortunately,
larger frame sizes also lead to poorer quality because of the non-stationarity of
audio signals. In the coder of the present exemplary embodiment, a standard frame
size of 2048 samples (about 46 ms at a 44.1 kHz sampling rate) is employed. The two ends
of each frame are weighted by the square root of a Hanning window of size 256 (i.e.,
every two neighboring frames overlap by half of this amount). Use of the square root
of the Hanning window provides lower frequency leakage and higher resolution than the
Hanning window used in MPEG.
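A sketch of this framing scheme follows (Python/NumPy; the frame and window lengths are taken from the text, while the exact interpretation of the half-window overlap is an assumption): the first and last 128 samples of each 2048-sample frame are weighted by the square root of a 256-point Hanning window so that neighboring frames cross-fade smoothly.

```python
import numpy as np

FRAME = 2048
WIN = 256                   # length of the (square-rooted) Hanning window
HOP = FRAME - WIN // 2      # neighboring frames overlap by half the window

def frame_signal(x: np.ndarray):
    """Split a signal into 2048-sample frames whose ends are weighted by the
    square root of a 256-point Hanning window."""
    w = np.sqrt(np.hanning(WIN))
    taper = np.ones(FRAME)
    taper[:WIN // 2] = w[:WIN // 2]          # fade-in over 128 samples
    taper[-WIN // 2:] = w[WIN // 2:]         # fade-out over 128 samples
    frames = []
    for start in range(0, len(x) - FRAME + 1, HOP):
        frames.append(x[start:start + FRAME] * taper)
    return np.array(frames)

if __name__ == "__main__":
    rng = np.random.default_rng(8)
    audio = rng.standard_normal(44100)       # one second of noise at 44.1 kHz
    print(frame_signal(audio).shape)
```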
[0046] The quantization portion of the audio codec is done using a scale factor table of
42 values, one for each of the 42 frequency bands. The 42 quantization factors are
determined adaptively for each frame of the audio signal from an iterative analysis
of the quantization noise and masking threshold obtained from the perceptual masking
model. After the WP transform has been applied, the next step is quantization of the
WP transform coefficients. Quantization is of critical importance in reducing the
resultant bit-rate. An adaptive scalar quantization table method is applied in the present
audio coder, with a different quantization number applied to each of the 42 subbands.
The scale factor for each subband must be chosen with great care to ensure that the
reconstruction error due to quantization remains below the masking curve, which is
indicated by equation (18),

where δk(ω) denotes the frequency response of the kth subband of the WP transform, Ek is the quantization noise power, and M = 38 is the number of WP subbands. It is worth noting that ψ(ω) denotes the power density spectrum of the masking curve, which is actually a piecewise constant function equal
in each critical band to the corresponding masking threshold determined below.
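The adaptive quantization loop described here might look like the following sketch (Python/NumPy; all numeric details are assumptions): for each subband, the scale factor is increased until the resulting quantization noise power drops below that subband's masking threshold, and the decoder recovers the coefficients by a simple dequantization.

```python
import numpy as np

def quantize_subband(coeffs, masking_threshold, start_scale=1.0, step=1.5, max_iter=32):
    """Find a scale factor for one subband such that the quantization noise
    power stays below the masking threshold, then return the quantized values."""
    scale = start_scale
    for _ in range(max_iter):
        q = np.round(coeffs * scale)
        noise_power = np.mean((q / scale - coeffs) ** 2)
        if noise_power <= masking_threshold:
            break
        scale *= step                    # finer quantization lowers the noise
    return q.astype(int), scale

if __name__ == "__main__":
    rng = np.random.default_rng(9)
    subband = rng.standard_normal(64)
    q, scale = quantize_subband(subband, masking_threshold=1e-4)
    reconstructed = q / scale            # decoder side: simple dequantization
    print(scale, np.mean((reconstructed - subband) ** 2))
```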
[0047] After filter bank decomposition, the audio signals are passed into a psychoacoustic
masking model which identifies redundant audio information, and quantization is then
applied. Masking is a frequency domain phenomenon whereby a weak signal, the maskee,
is made inaudible by the presence of a simultaneously occurring stronger pure tone
signal (the masker), if they are close enough to each other in frequency. The masking
threshold depends on the sound pressure level (SPL), the frequency of the masker,
and the characteristics of the masker and maskee. The frequency ranges used in the current
filter bank tree are given in the first two columns of the table above. The shape
of the masking threshold of a pure tone at frequency f can be approximated by the formula in Table 3, where fmasker and fmaskee are the frequencies of the masker and the maskee signals, respectively, and Tmax(fmasker) is the relative masking threshold at this frequency. The masking threshold for a
maskee is obtained by multiplying the power of the maskee by T(fmasker, fmaskee). Therefore, for an input signal frame, a masking threshold is calculated for each critical
band as follows. First, the power S(f) of each frequency f in the Fourier domain is estimated by applying a Fast Fourier Transformation (FFT) to the input signal frame. Then the masking threshold Xthresh(F) for a critical band F is determined. To determine the masking threshold for each frequency component f ∈ F, equations (19), (20), and (21) must be satisfied,



where Tabs is the absolute listening threshold in F taken from the ISO/IEC MPEG standard's absolute threshold table. Masking effects
do not only occur in the frequency domain, but also in the time before and after a
masking sound. It has empirically been shown that the premasking effect, also known
as pre-echo, may last 10 ms and can only be ignored if the masking sound is shorter
than two milliseconds. Pre-echos are eliminated by low-pass filtering frames containing
them before they are transmitted. The masking threshold frequency is represented by
equation (22)

[0048] In application, the multimedia compression algorithm discussed above is generally
shown in the exemplary embodiment illustrated in Figure 3. Image and sound data are
received on separate input channels 18 and 19, and are then forwarded to corresponding
compression modules. The corresponding output is then merged and entropy encoded with
variable run length encoding by the merger and entropy module 25. Processing in each of
the modules is supervised by the global bitrate manager 27, which is configured to
dictate the pace of compression and to merge the resulting compressed video and audio
signals according to the available bitrate. This typically involves load balancing
of audio and video signals and/or quantization of the compressed signals.
[0049] Since the present exemplary embodiment targets low bitrate applications, it is necessary
to code frames sequentially with conditional block replenishment. Generally speaking,
frames are therefore transmitted in an order-preserving manner where the coding of the
current frame relies on that of the previous one. Range blocks for the current frame
are mapped to domain blocks of the previous frame. Depending on the available bitrate,
an adjustable filter will then select only those blocks with the highest interframe
error for transmission. Because the image is not periodically refreshed
as a whole, there is no refresh rate associated with the output frame. Color is represented
in the Y-U-V format, and both the U and V images are subsampled by a factor of 2 in
each direction in the present exemplary embodiment. The color blocks are generally
updated at the same time as the corresponding luminance blocks. After the data to
be transmitted has been quantized, a final entropy coding is added. A time record
is kept for each of the transmitted blocks and is updated every time a block is selected
for transmission.
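A minimal sketch of the adjustable filter described in this paragraph is given below; it assumes a fixed per-block coding cost (an invented parameter), ranks range blocks by interframe error against the previous frame, and keeps selecting blocks until the bit budget is spent.

```python
import numpy as np

def select_blocks_for_update(current, previous, block_size, bit_budget, bits_per_block):
    """Pick the blocks with the highest interframe error for retransmission.

    current, previous : 2-D luminance arrays of identical shape
    block_size        : edge length of the square range blocks
    bit_budget        : bits available for block updates in this frame
    bits_per_block    : assumed, fixed cost of coding one block (illustrative)
    """
    h, w = current.shape
    errors = []
    for r in range(0, h - block_size + 1, block_size):
        for c in range(0, w - block_size + 1, block_size):
            cur = current[r:r + block_size, c:c + block_size].astype(float)
            prev = previous[r:r + block_size, c:c + block_size].astype(float)
            errors.append(((r, c), float(np.sum((cur - prev) ** 2))))

    # Highest interframe error first; all unselected blocks are simply
    # carried over from the previous frame (conditional block replenishment).
    errors.sort(key=lambda item: item[1], reverse=True)
    selected, spent = [], 0
    for coords, _err in errors:
        if spent + bits_per_block > bit_budget:
            break
        selected.append(coords)
        spent += bits_per_block
    return selected
```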
[0050] A general illustration of the image portion of the encoding scheme of the present
exemplary embodiment is shown in Figure 4. In operation, the image to be compressed
is received in the new image input module 19. Upon entry of the color image into the
compression module of the present exemplary embodiment, the color image is transmitted
to the RGB-YUV module 20. In RGB-YUV module 20, color images are represented in the Y-U-V
format, and both the U and V images are subsampled by a factor of 2, for example,
in each direction. As such, the color portion of the image is essentially separated
from the black and white portion of the image. The color blocks are updated at the
same time as the corresponding luminance blocks. RGB-YUV 20 transmits the UV portion
of the image to storage unit 22, while the remaining RGB portion of the image is sent
to the new tree construction module 21, where the HC-tree for the image is constructed
from an abstract view of a number of pixels, which is recursively repeated, as discussed
above. Once the HC-tree is constructed, the tree is transmitted to stored tree module
23, wherein the constructed tree is stored. Additionally, the constructed HC-tree is
transmitted to the block differencer and prioritizer module 26, where similar branches
of the HC-tree are combined in order to increase the efficiency of the compression
process, as discussed above. Further still, the constructed HC-tree is transmitted
to the fractal encoding module 24, where the fractal mapping for the image and/or
the corresponding tree is computed. The fractal encoding module 24 is additionally in connection
with the storage unit 22, stored tree module 23, and RGB-YUV converter 20, and therefore,
is capable of receiving input from each of these modules for processing in the fractal
encoding operation. Bit rate manager 27, which is in connection with each module of
the compression device, operates to prioritize the operations and transmission of
information within the compression device. In particular, if bit rates are low, then
bit rate manager 27 operates to tailor audio and video in order to guarantee the maximum
quality of multimedia transmission given the particular bandwidth. Merger and entropy
coder 25 accepts the audio and image data from the corresponding modules shown in
Figure 4, and in accordance with tactical information provided by bit rate manager
27, the audio and image data are joined as a single stream and quantized. Then the
entropy portion of merger and entropy module 25 performs a final non-content specific
compression algorithm thereto before the stream is sent to a transmission module.
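The coarse-to-fine domain/range comparison that the HC tree makes possible (recited in claim 23 as a series of average and variance comparisons) can be sketched as follows; the 2x2 split per level, the mean/variance summaries, and the tolerance are simplifying assumptions rather than the exact tree layout used by the encoder.

```python
import numpy as np

def hc_levels(block, depth=3):
    """Summaries of a block from most abstract (level 0) to most concrete.

    A simplified stand-in for the HC tree: level 0 summarises the whole block,
    each further level splits the block into 2x2 as many sub-blocks.  The block
    edge is assumed divisible by 2**(depth - 1).
    """
    levels = []
    for d in range(depth):
        n = 2 ** d
        h, w = block.shape[0] // n, block.shape[1] // n
        stats = [(float(np.mean(sub)), float(np.var(sub)))
                 for sub in (block[i * h:(i + 1) * h, j * w:(j + 1) * w]
                             for i in range(n) for j in range(n))]
        levels.append(stats)
    return levels

def blocks_may_match(domain_levels, range_levels, tol=10.0):
    """Compare a domain block and a range block coarse-to-fine.

    The comparison starts at the most abstract level and only proceeds to the
    finer levels while the means and variances agree within `tol` (an assumed
    tolerance); a mismatch rejects the pair without examining the full blocks.
    """
    for d_stats, r_stats in zip(domain_levels, range_levels):
        for (dm, dv), (rm, rv) in zip(d_stats, r_stats):
            if abs(dm - rm) > tol or abs(dv - rv) > tol:
                return False
    return True
```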
[0051] Figure 5 illustrates an exemplary configuration of the sound compression portion
of the present exemplary embodiment. As shown in Figure 3, the audio portion of the
multimedia data is received by sound input module 18, and then forwarded to the WP
transform module 28. WP transform module is in connection with the stored WPT module
32, the sensitive correction and prioritizer module 30, and the adaptive quantizer
module 33. The primary function of WP transform module 28 is to implement a filter
bank decomposition tool. In particular, in the present exemplary embodiment, WP transform
module 28 implements a 42 band decomposition filter bank tree structure designed to
result in non-uniform decomposition that closely matches critical bands given by a
psychoacoustic model. After the new sound frame has been analyzed by the WP transform
module 28, it is quantized according to the corresponding noise level in quantization
module 29. This quantization process involves using an individual scaling factor for
each of 42 frequency bands in the present exemplary embodiment. These are determined
adaptively for each frame of audio from an iterative analysis of the quantization
noise and masking threshold obtained from a perceptual masking model. The quantization
coefficients are then stored in table module 31, and only the differences between
the quantization tables for neighboring frames are actually transmitted. The resulting
information is then sent to the merger and entropy module 25 for merging with the
image portion of the multimedia data.
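A much-simplified sketch of this audio path is given below: a plain Haar two-channel filter bank stands in for the 42-band WP tree, the per-band scale factors are taken directly from supplied thresholds instead of the iterative masking analysis, and only the differences between successive quantization tables are produced for transmission. All of these simplifications are assumptions made for the example.

```python
import numpy as np

def haar_split(x):
    """One level of a two-channel analysis filter bank (Haar, for brevity)."""
    x = np.asarray(x, dtype=float)
    x = x[:len(x) - len(x) % 2]                        # even length
    low = (x[0::2] + x[1::2]) / np.sqrt(2.0)
    high = (x[0::2] - x[1::2]) / np.sqrt(2.0)
    return low, high

def wp_decompose(x, depth):
    """Wavelet-packet style decomposition: split every band at every level.

    Returns 2**depth uniform sub-bands; the patent's tree is instead pruned to
    42 non-uniform bands matched to the critical bands.
    """
    bands = [np.asarray(x, dtype=float)]
    for _ in range(depth):
        bands = [part for band in bands for part in haar_split(band)]
    return bands

def quantize_bands(bands, thresholds):
    """Quantize each band with its own scale factor.

    Here the masking thresholds directly set the step sizes, which is an
    assumption; the described encoder derives them iteratively per frame.
    """
    table = [max(float(t), 1e-6) for t in thresholds]  # per-band scale factors
    coeffs = [np.round(b / q).astype(int) for b, q in zip(bands, table)]
    return coeffs, table

def table_difference(current_table, previous_table):
    """Only the change in the quantization table is sent between frames."""
    return [c - p for c, p in zip(current_table, previous_table)]
```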
[0052] Decoding of the compressed multimedia data at the receiver end is accomplished by
a simple dequantization step followed by the inverse WP transform to obtain the reconstructed
signal. However, the expansion of the image requires joining the image data on which
the actually displayed frame is based with the differences currently received, before
the fractal decompression process may be performed on those regions that need to be
updated. Similarly, sound data is combined with the audio still present from the preceding
frame, and then the dequantization step and inverse WP transform are applied.
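On the image side, the receiver's conditional replenishment step can be sketched as follows; the update payload format and the decode_block callable (standing in for the entropy decoding, dequantization, and fractal decompression of one region) are placeholders.

```python
def apply_block_updates(previous_frame, updates, block_size, decode_block):
    """Rebuild the displayed frame from the previous one plus received updates.

    previous_frame : the frame currently displayed (2-D array)
    updates        : dict mapping (row, col) -> encoded block payload
    decode_block   : callable that decodes one block payload to pixel values
    """
    frame = previous_frame.copy()
    for (r, c), payload in updates.items():
        # Only the regions that actually changed are decompressed and patched in.
        frame[r:r + block_size, c:c + block_size] = decode_block(payload)
    return frame
```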
[0053] In view of the configuration of the present exemplary embodiment, there is an inherent
priority between the image and audio portions of the multimedia transmission. In particular,
the priority between the image and sound portions is decided in accordance with a
few simple principles. These principles illustrate that basic audio flow receives
a top priority, since audio delays are known to be more perceptible than image delays.
However, as stated above, the present invention is provided in order to propagate
the entire multimedia stream with as little delay and as few errors as possible,
thereby providing a smooth image and accompanying audio over generally available
bandwidths.
[0054] In view of the fact that CPU speeds generally exceed the network processing speeds,
a part-wise comparison of actual and preceding image and audio data is costly, and
therefore avoided, as long as dropping the logical frame rate remains a viable option
in view of image quality concerns. With particular regard to image data, this is generally
the case down to 30% of the targeted frame rate, and if the frame rate drops below
this threshold, only those parts of an image where changes occur are actually sent.
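The policy of this paragraph reduces to a small decision rule, sketched below using the 30% figure stated above; the function name and return values are illustrative only.

```python
def choose_update_strategy(achievable_fps, target_fps):
    """Choose between dropping frames and sending only changed image parts.

    Down to 30% of the targeted frame rate the encoder simply lowers the
    logical frame rate; below that threshold it sends only those parts of an
    image where changes occur.
    """
    if achievable_fps >= 0.3 * target_fps:
        return "drop_frames"
    return "send_changed_blocks_only"
```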
[0055] Although the invention described in the present exemplary embodiment has been illustrated
in a hardware configuration with a plurality of modules, it is expressly contemplated
within the scope of the present invention to implement the present invention through
software alone, or through a combination of hardware and software. Furthermore, although
the invention has been described based upon these preferred embodiments, it would
be apparent to those skilled in the art that certain modifications, variations,
and alternative constructions may be made while remaining within the scope
of the invention. In order to determine the metes and bounds of the invention, therefore,
reference should be made to the appended claims.
1. An apparatus for compressing images comprising:
an image compression unit (24) for compressing an image portion of a data frame in
a fractal coding environment using domain blocks and range blocks,
each domain block and range block being provided with a hierarchical categorization
(HC) tree having a plurality of nodes wherein a higher level node (16) represents
an abstract view of the next lower level node (15) and the lowest level node (14)
represents a concrete view of the image block;
the image compression unit (24) is configured to compare a domain block and a range
block by starting with a comparison of the most abstract level of the HC tree associated
with said domain block with the most abstract level of the HC tree associated with
said range block.
2. An apparatus for transmitting multimedia comprising an apparatus for compressing images
as claimed in claim 1, wherein said image compression unit further comprises:
an image converter (20) for separating a UV portion from an RGB portion;
a tree construction unit (21) in connection with the image converter for constructing
a hierarchical categorization tree;
a block differencer and prioritizer unit (26) in connection with the tree construction
unit for optimizing the hierarchical categorization tree;
at least one storage unit (22) in connection with the image converter and the tree
construction unit for storing at least one of a UV portion of the multimedia image
and the hierarchical categorization tree;
a fractal encoding unit (26) in connection with the at least one storage unit for
executing a fractal encoding operation based upon the hierarchical categorization
tree;
and a control unit (27) in connection with the at least one storage unit, the image
converter, the tree construction unit, the fractal encoding unit, and the merger unit
for regulating the operation of the apparatus.
3. An apparatus for transmitting multimedia comprising:
an apparatus for compressing images as claimed in claim 1 for compressing an image
portion of a multimedia data frame;
an audio compression unit for compressing an audio portion of the multimedia frame;
a bit rate manager (27) in communication with said image compression unit,
and said audio compression unit for controlling frame flow; and
a merger unit (25) in communication with each of said image compression unit and said
audio compression unit for merging a compressed image portion and a compressed audio
portion into a single compressed multimedia frame.
4. An apparatus for transmitting multimedia as recited in claim 3, wherein said audio
compression unit is configured to execute a filter bank decomposition operation to
compress the audio portion of the multimedia frame.
5. An apparatus for transmitting multimedia as recited in claim 4, wherein said filter
bank decomposition operation further comprises a wavelet packet transform.
6. An apparatus for transmitting multimedia as recited in claim 3, wherein said image
compression unit further comprises:
an image converter (20) for separating a UV portion from an RGB portion;
a tree construction unit (21) in connection with the image converter for constructing
a hierarchical categorization tree;
a block differencer and prioritizer unit (26) in connection with the tree construction
unit for optimizing the hierarchical categorization tree;
at least one storage unit (22) in connection with the image converter and the tree
construction unit for storing at least one of a UV portion of the multimedia image
and the hierarchical categorization tree;
a fractal encoding unit (24) in connection with the at least one storage unit for
executing a fractal encoding operation based upon the multidimensional hierarchical
categorization domain tree and range tree; and
a control unit in connection with the at least one storage unit, the image converter,
the tree construction unit, the fractal encoding unit, and the merger unit for regulating
the operation of the apparatus.
7. An apparatus for transmitting multimedia as recited in claim 3, wherein said bit rate
manager (27) is further configured to prioritize information generated by the apparatus
and to minimize perceivable deficiencies.
8. An apparatus for transmitting multimedia as recited in claim 3, wherein said image
compression unit further comprises a hierarchical categorization tree construction
unit (21),
wherein said hierarchical categorization tree construction unit is configured to
construct at least one hierarchical categorization tree that is representative of
the image portion of the multimedia frame.
9. An apparatus for transmitting multimedia as recited in claim 8, wherein said at least
one hierarchical categorization tree further comprises a precomputed domain block
tree.
10. An apparatus for transmitting multimedia as recited in claim 8, said apparatus further
comprising a fractal encoding unit (24) in communication with said hierarchical categorization
tree construction unit (21),
wherein said fractal encoding unit is configured to conduct a fractal encoding
operation on the at least one hierarchical categorization tree.
11. An apparatus for transmitting multimedia as recited in claim 3, wherein said audio
compression unit further comprises:
a transform unit (28) for receiving the audio portion of the multimedia frame and
conducting a WP transform thereon; and
a quantization unit (33) in communication with the transform unit for conducting a
quantization operation on at least one WP coefficient representing the audio portion
of the multimedia frame.
12. An apparatus for transmitting multimedia as recited in claim 11, wherein said audio
compression unit further comprises:
at least one storage unit (32) in communication with said transform unit and said
quantization unit for storing at least one of a WP transform and a result of a quantization
operation.
13. An apparatus for compressing multimedia data, said apparatus comprising:
an apparatus for compressing images as claimed in claim 1;
an audio compression unit; and
a control interface (27) in connection with said image compression unit and said audio
compression unit,
wherein said audio compression unit uses a filter bank decomposition operation
to compress a corresponding audio portion of the multimedia data.
14. An apparatus for compressing multimedia data as recited in claim 13, wherein said
image compression unit further comprises:
an image converter (20), said image converter receiving a multimedia image and separating
a UV component of the multimedia image from an RGB component of the multimedia image;
a tree construction unit (21) in communication with the image converter unit, said
tree construction unit constructing said hierarchical categorization tree;
a rate manager (27) in communication with the tree construction unit, said rate manager
operating to control and prioritize information generated by the apparatus and to
minimize perceivable deficiencies; and
a fractal encoding unit (24) in communication with the rate manager, said fractal
encoding unit being configured to conduct a fractal encoding operation on the hierarchical
categorization tree.
15. An apparatus for compressing multimedia data as recited in claim 13, wherein said
audio compression unit further comprises:
a wavelet packet (WP) transform unit (28), said WP transform unit being configured
to apply a WP transform to the audio portion of the multimedia data to generate representative
WP coefficients; and
a quantization unit (33) in communication with the transform unit, said quantization
unit being configured to quantize the representative WP coefficients.
16. An apparatus for compressing multimedia data as recited in claim 15, wherein said
quantization unit is further configured to quantize the WP coefficients using a scalar
quantization table method.
17. An apparatus for compressing multimedia data as recited in claim 15, wherein said
control interface further comprises:
a bit rate manager (27) in communication with the image compression unit and the audio
compression unit; and
a merger/entropy unit (25) in communication with the bit rate manager;
wherein said bit rate manager is configured to cooperatively control the operation
of the image compression unit and the audio compression unit, and said merger/entropy
unit is configured to merge and entropy encode both the compressed image and audio
portions of the multimedia data.
18. A method for compressing an image portion of a data frame in a fractal coding environment
using domain blocks and range blocks,
each domain block and range block being provided with a hierarchical categorization
(HC) tree having a plurality of nodes wherein a higher level node (16) represents
an abstract view of the next lower level node (15) and the lowest level node (14)
represents a concrete view of the image block; the method comprising
comparing a domain block and a range block by starting with a comparison of the most
abstract level of the HC tree associated with said domain block with the most abstract
level of the HC tree associated with said range block.
19. A method for compressing multimedia data, said method comprising the steps of:
receiving multimedia;
separating an audio portion of the multimedia and an image portion of the multimedia;
compressing the image portion in accordance with claim 18;
compressing the audio portion with a filter bank decomposition; and
merging and synchronizing the compressed audio and image portions for transmission.
20. The method for compressing multimedia data as recited in claim 19, wherein the compressing
the image portion step further comprises the steps of:
transforming the image portion into a YUV format;
constructing a hierarchical categorization tree of the image;
computing a fractal mapping of the hierarchical categorization tree; and
reducing the computed fractal mapping.
21. The method for compressing multimedia data as recited in claim 19, wherein the step
of compressing the audio portion further comprises the steps of:
analyzing the audio portion with a WP transform to determine WP coefficients;
quantizing the WP coefficients using an individual scaling factor to determine quantization
coefficients;
storing the quantization coefficients;
determining a difference between the stored quantization coefficients and determined
coefficients of a neighboring frame; and
transmitting the determined difference.
22. The method for compressing multimedia data as recited in claim 19, wherein the merging
and synchronizing step further comprises:
merging the compressed audio and image portions into a common compressed multimedia
data form; and
conducting a final non-content specific compression on the compressed multimedia data
form.
23. A method for compressing multimedia data as claimed in claim 19, wherein said step of
compressing the image portion comprises the steps of:
determining a plurality of domain blocks and a plurality of range blocks corresponding
to the image;
generating at least one hierarchical categorization tree from the plurality of domain
and range blocks determined;
locating a matching domain block for each of the plurality of range blocks through
comparison of a node in the at least one hierarchical categorization
tree with a corresponding node in at least one further hierarchical categorization
tree through a series of average and variance comparisons;
determining each range block not having a matching domain block, and replacing each
determined range block with a set of reduced size range blocks; and
computing a transform for the located matching domain block.