A. Background of the invention.
A(1) Field of the invention.
[0001] The invention generally relates to a method of transmitting a series of images of
a full motion video scene in a digital format via some transmission medium. More particularly,
said transmission medium is constituted by a compact disc-like record carrier.
[0002] The invention also relates to a display apparatus in which the transmitted images
are processed and made suitable for display on a display screen; and to an optically
readable record carrier on which said images are stored.
A(2) Description of the prior art.
[0003] More than fifteen years ago the firm of Philips marketed an optically readable record
carrier on which audio signals as well as analog video signals were recorded. This
record carrier was referred to as video long play (VLP) and supplemented the well-known
audio long play (ALP). As compared with video tapes, such optically readable record
carriers have the advantage that their quality does not deteriorate due to repeated
use. However, as compared with video tapes they have the drawback that they cannot
be re-recorded.
[0004] In the last ten years a completely new trend has developed, namely that of the optically
readable audio record carriers generally known by the name of CD audio (Compact Disc
audio). Due to its general acceptance and the ever increasing demand for integration
of audio and video apparatus, a compact disc video has been created on which digitised
audio signals as well as an analog video signal are present, which video signal corresponds
to a full motion video scene having a duration of several minutes.
[0005] To increase this duration, the original analog video signal has been digitised. A
full motion video scene is then considered as a finite series of images, for example,
fifty or sixty occurring each second. Such an image comprises, for example, 288 image
lines with 352 pixels per line. By means of some sensibly chosen encoding algorithm
each image is converted into an image data block comprising so much digital information
that each pixel of the image can be reconstructed, with the possible inclusion of
the information from other image data blocks. The encoding algorithm is chosen to
be such that consecutive image data blocks comprise a minimum amount of redundant
information. Since the length of each image data block (number of bits in this image
data block) is thus very limited, a very large number of such image data blocks can
be recorded on such a record carrier.
B. Object and summary of the invention.
[0006] The invention has for its object to contribute to the above-mentioned novel development
in order to render said display apparatus financially accessible to a very wide public
on the consumer market.
[0007] According to the invention the images of the series are subjected to a hierarchic
encoding process in which the original series of images is considered as a number
of interleaved sub-series having an increasing ranking order and in which images from
sub-series having a lower ranking order are used for encoding an image of a given
sub-series. In this way each image is converted into an image data block, and a packet
header indicating the ranking order of the sub-series with which the corresponding
image is associated is added to each image data block.
[0008] The display apparatus is now adapted to receive all these image data blocks but to
select only those blocks which have predetermined packet headers. Only image data
blocks which are thus selected are subjected to a hierarchic decoding process in a
video processing circuit so as to generate signals which are suitable for displaying
the image on a display screen (for example, a display tube).
[0009] The invention will certainly be appreciated if the following aspect is considered.
The cost price of a video processing circuit increases exponentially with the number
of operations (additions, subtractions, etc.) which it can perform each second. If
the rate of the images in the original series is equal to 50 Hz, this means that the
video processing circuit must be capable of determining, each second, the three chrominance
signals R, G and B from the transmitted information for approximately 5·10⁶ pixels.
The number of operations which must thus be performed is so high that this can only
be realised by means of a very "powerful" video processing circuit which is, however,
so costly that the display apparatus is financially accessible to only a select group
of consumers.
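By way of illustration, the pixel rate mentioned above follows directly from the image format and the image rate. The short sketch below (Python; an illustration only, with freely chosen names that do not occur in the original text) reproduces this arithmetic.

```python
# Pixel rate for a 50 Hz series of images of 288 lines with 352 pixels per line,
# as referred to in paragraph [0009].
LINES_PER_IMAGE = 288
PIXELS_PER_LINE = 352
IMAGE_RATE_HZ = 50

pixels_per_image = LINES_PER_IMAGE * PIXELS_PER_LINE   # 101 376 pixels
pixels_per_second = pixels_per_image * IMAGE_RATE_HZ   # 5 068 800 pixels

print(f"{pixels_per_second:,} pixels per second, i.e. approximately 5*10^6")
```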
[0010] According to the invention the display apparatus can make a selection from the presented
image data packets so that only image data packets having predetermined packet headers
are applied to the video processing circuit for further processing. This means that
the video processing circuit only has to process a part of all available image data
packets, for example, no more than half of them. It is true that this is at the expense
of the image quality, but practice has proved that this quality is maintained at a
sufficiently high level. It also means that the video processing circuit may be considerably
less powerful, which renders its cost price, and hence that of the display apparatus,
very favorable.
C. Brief description of the Figures.
[0011]
Fig. 1 shows diagrammatically a compact disc-like record carrier having a track and
its division into packets;
Figs. 2 to 7 show some diagrams to explain the hierarchic encoding process;
Fig. 8 shows a sequence in which the image data blocks with different packet headers
can be transmitted;
Fig. 9 shows diagrammatically the structure of a display apparatus according to the
invention;
Fig. 10 shows diagrammatically another implementation of the hierarchic encoding process.
D. Explanation of the invention.
[0012] In Fig. 1 a part of the track on a compact disc-like record carrier is shown diagrammatically
at A. A packet is present each time between two consecutive points a, b, c, d, e,
etc. The structure of such a packet is shown diagrammatically at B in Fig. 1. It comprises,
for example, 2352 bytes and is divided into a packet header H comprising 24 bytes and
a data field D comprising 2328 bytes.
[0013] The packet header H is further divided into a synchronisation field SNC of 12 bytes,
an ordinal number field RF of four bytes and a service field SF of eight bytes. The
synchronisation field SNC marks the start of a packet. It comprises one byte consisting
exclusively of "0" bits, followed by 10 bytes consisting exclusively of "1" bits and
finally again one byte consisting exclusively of "0" bits. The bytes in the ordinal
number field RF indicate the ordinal number of the packet in the track. The service
field SF indicates whether the packet is a video packet, an audio packet or a computer
data packet.
[0014] The data field D is divided into data slots DS. These data slots of an audio packet
are chosen to be such that a 16-bit audio word of a digital audio signal can be transmitted
in each slot. The data slots of a video packet are chosen to be such that an 8-bit
video word of a digitised video signal can be incorporated in each slot. These data
slots also have a length of one byte for computer data packets.
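The packet layout described in the preceding paragraphs can be illustrated by the following sketch (Python). It is a minimal illustration only: the division into a 12-byte SNC, a 4-byte RF, an 8-byte SF and a 2328-byte data field is taken from the text above, but the byte order assumed for the ordinal number field and the names used in the code are assumptions made for the example.

```python
# Minimal sketch of the packet layout of Fig. 1: 2352 bytes per packet, a 24-byte
# header H (12-byte SNC + 4-byte RF + 8-byte SF) and a 2328-byte data field D.
# The byte order of RF and the interpretation of SF are assumptions for
# illustration only.
PACKET_SIZE = 2352
SYNC_FIELD = bytes([0x00]) + bytes([0xFF] * 10) + bytes([0x00])   # "0"-byte, ten "1"-bytes, "0"-byte

def parse_packet(packet: bytes):
    assert len(packet) == PACKET_SIZE
    snc = packet[0:12]        # synchronisation field SNC
    rf = packet[12:16]        # ordinal number field RF
    sf = packet[16:24]        # service field SF (video, audio or computer data)
    data = packet[24:]        # data field D, divided into data slots DS
    if snc != SYNC_FIELD:
        raise ValueError("packet does not start with the synchronisation pattern")
    ordinal = int.from_bytes(rf, "big")   # assumed byte order
    return ordinal, sf, data
```

For an audio packet the data field would then be interpreted as a sequence of 16-bit audio words; for a video packet or a computer data packet, as a sequence of one-byte data slots.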
[0015] As already stated in the foregoing, each image is considered as a matrix of 288*352
pixels P(i,k). In this case i (= 1, 2, 3, ..., 288) is the ordinal number of the row
and k (= 1, 2, ..., 352) is the ordinal number of the pixel on this row (column). The
color of such a pixel is completely determined by an associated luminance value Y(i,k)
and two color difference values U(i,k) and V(i,k). If these three values of each pixel
were encoded with an eight-bit accuracy, approximately 130 video packets would be
required for one image. However, this number can be reduced to 54 video packets without
any deterioration of the image quality, namely by transmitting only one out of four
color difference signals in one out of two image lines. In this case an image is thus
completely defined by a 288*352 luminance matrix Y(i,k), a 144*88 color difference
matrix U(r,s) and a 144*88 color difference matrix V(r,s), with r = 1, 2, ..., 144 and
s = 1, 2, ..., 88.
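The reduction from approximately 130 to 54 video packets per image follows from the sub-sampling of the two color difference matrices. The sketch below (Python; an illustration only) reproduces this count, assuming one byte per transmitted value and 2328 data bytes per video packet as stated above.

```python
import math

DATA_BYTES_PER_PACKET = 2328   # data field D of a video packet

# Full resolution: Y, U and V each as a 288*352 matrix, eight bits per value.
full_bytes = 3 * 288 * 352
print(math.ceil(full_bytes / DATA_BYTES_PER_PACKET))
# -> 131 packets by this simple count (the text quotes approximately 130)

# Sub-sampled: the full 288*352 luminance matrix plus two 144*88 color
# difference matrices (one value out of four, in one out of two image lines).
subsampled_bytes = 288 * 352 + 2 * 144 * 88
print(math.ceil(subsampled_bytes / DATA_BYTES_PER_PACKET))
# -> 55 packets by this simple count (the text quotes 54)
```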
[0016] There are many encoding methods of further reducing the number of bits required to
represent an image and hence the number of video packets required for each image.
By way of example one such method will now be described in greater detail with reference
to Fig. 2. In this Figure 2 the reference S₀ denotes a series of consecutive images
B₁, B₂, ... B₁₂ of a full motion scene. The luminance matrix associated with the image
Bₙ (n = 1, 2, ...) will be denoted by Yₙ(i,k) and the color difference matrices will
be denoted by Uₙ(r,s) and Vₙ(r,s), respectively. For each image Bₙ a prediction image
Bₙ' is determined, comprising the prediction matrices Yₙ'(i,k), Uₙ'(r,s) and Vₙ'(r,s),
and, starting from these matrices, a difference image DBₙ comprising the difference
matrices DYₙ(i,k), DUₙ(r,s) and DVₙ(r,s) is obtained by difference formation of the
image Bₙ and the prediction image Bₙ', or expressed mathematically:
   DBₙ = Bₙ - Bₙ'
i.e.:
   DYₙ(i,k) = Yₙ(i,k) - Yₙ'(i,k)
   DUₙ(r,s) = Uₙ(r,s) - Uₙ'(r,s)
   DVₙ(r,s) = Vₙ(r,s) - Vₙ'(r,s)
The prediction image Bₙ' is obtained by determining a system of motion vectors Qₙ₋₁,ₙ
for the previous image Bₙ₋₁ and by shifting the individual pixels of this image Bₙ₋₁
in accordance with the associated motion vectors.
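The two steps just described, forming a prediction image by shifting groups of pixels over their motion vectors and forming the difference matrices, can be sketched as follows for a luminance matrix (Python with NumPy; an illustration only, not the encoder of the invention: the block size, the dictionary representation of a system of motion vectors and the restriction to vectors that keep a block inside the image are assumptions made for the example).

```python
import numpy as np

def predict(previous: np.ndarray, vectors: dict, block: int = 16) -> np.ndarray:
    """Form the prediction image Bn' by shifting each block of the previous image
    Bn-1 over its motion vector (dy, dx).  The block size is an assumption, and
    the vectors are assumed to keep every block inside the image."""
    prediction = np.zeros_like(previous)
    for (by, bx), (dy, dx) in vectors.items():
        src = previous[by:by + block, bx:bx + block]
        prediction[by + dy:by + dy + block, bx + dx:bx + dx + block] = src
    return prediction

def difference_image(current: np.ndarray, prediction: np.ndarray) -> np.ndarray:
    """DYn(i,k) = Yn(i,k) - Yn'(i,k): the difference values have a much smaller
    dynamic range than the original eight-bit values."""
    return current.astype(np.int16) - prediction.astype(np.int16)

# Example with one 16*16 block of a 288*352 luminance matrix moved two pixels
# to the right; a motion vector system is a dictionary {block position: (dy, dx)}.
Y_prev = np.zeros((288, 352), dtype=np.uint8)
Y_pred = predict(Y_prev, {(0, 0): (0, 2)})
DY = difference_image(Y_prev, Y_pred)
```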
[0017] Since the dynamic range of the luminance and color difference values of the difference
matrices is considerably smaller than that of the original matrices, these values
can be represented with considerably fewer bits, for example with only four bits instead
of the original eight bits. Although the calculated systems of motion vectors Qₙ
must be transmitted in addition to the difference images DBₙ for an accurate
reconstruction of the original images in the display apparatus, this
method results in a considerable saving of bits. On the one hand a larger number of
images can thus be recorded on the record carrier and on the other hand the time required
to read all information for an image from the record carrier is considerably shorter.
[0018] In this known encoding method each difference image is dependent on the previous
image. In the display apparatus each image of the series will therefore have to be
reconstructed. This means that the temporal resolution of the scenes to be displayed
by the display apparatus is equal to the temporal resolution of the scenes which have
been picked up. As already noted, this means that the display apparatus should comprise
a very powerful video processing circuit.
[0019] The temporal resolution, and hence the requirements which must be imposed on
the video processing circuit, can be influenced by subjecting the images of the series
to a hierarchic encoding process as extensively described, for example, in European
Patent Application no. 0,340,843. For the sake of completeness this method
will be described in greater detail by way of example with reference to Fig. 3. In
this Figure 3 the series of consecutive images B₁, B₂, ... B₁₂ of a full motion scene
is again shown at S₀. This series is divided into a number of sub-series, four in
this case, denoted by S₁, S₂, S₃ and S₄, respectively.
Sub-series S₁ comprises the images B₁, B₅, B₉,...,
sub-series S₂ comprises the images B₃, B₇, B₁₁, ...,
sub-series S₃ comprises the images B₂, B₆, B₁₀, ..., and
sub-series S₄ comprises the images B₄, B₈, B₁₂, ....
The images of sub-series S₁ are converted into difference images DB₁, DB₅, DB₉, ...
in the way as described above with reference to Fig. 2. As is shown in Fig. 4 for
the sake of completeness, a system of motion vectors is more particularly determined
for each image of this sub-series S₁: the system Q₁,₅ for the image B₁, the system
Q₅,₉ for the image B₅, the system Q₉,₁₃ for the image B₉ and so forth. With the aid
of these vectors prediction images B₁', B₅', B₉', ... are calculated and the difference
image DBₘ of a series DS₁ is obtained by difference formation of the original image
Bₘ (m = 1, 5, 9, 13, ...) and the associated prediction image Bₘ'.
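The division of the original series into the four interleaved sub-series of Fig. 3 can be sketched as follows (Python; an illustration only, in which an image Bₙ is represented simply by its ordinal number n).

```python
# Division of the original series S0 = B1, B2, B3, ... into the four interleaved
# sub-series of Fig. 3; an image Bn is represented by its ordinal number n.
def split_into_subseries(image_numbers):
    s1 = [n for n in image_numbers if n % 4 == 1]   # B1, B5, B9, ...
    s2 = [n for n in image_numbers if n % 4 == 3]   # B3, B7, B11, ...
    s3 = [n for n in image_numbers if n % 4 == 2]   # B2, B6, B10, ...
    s4 = [n for n in image_numbers if n % 4 == 0]   # B4, B8, B12, ...
    return s1, s2, s3, s4

print(split_into_subseries(range(1, 13)))
# ([1, 5, 9], [3, 7, 11], [2, 6, 10], [4, 8, 12])
```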
[0020] As already noted, a vector of, for example, the system Q₁,₅ denotes the direction
and the distance over which a pixel or a group of pixels of the image B₁ must be
displaced so as to reach the position of this pixel or group of pixels in the image
B₅. For encoding the images in the sub-series S₂, S₃ and S₄ it is assumed for the
sake of simplicity that such a displacement is linear. This means that said pixel
of B₁ has undergone a quarter of the total displacement for the image B₂, half the
total displacement for the image B₃ and three quarters of the total displacement for
the image B₄. For encoding the images of the sub-series S₂ one proceeds in the manner
as shown in Fig. 5. Starting from the image B₁ and a system of motion vectors ½Q₁,₅,
each vector having the same direction as the corresponding motion vector in the system
Q₁,₅ but being only half as long, a prediction image B₁,₃ is determined. Starting
from the image B₅ and a system of motion vectors -½Q₁,₅, each vector having a direction
which is opposite to the direction of the corresponding motion vector in the system
Q₁,₅ and being only half as long, a prediction image B₅,₃ is determined. The average
value of the two prediction images B₁,₃ and B₅,₃ is taken by adding the two prediction
images together and dividing the sum by two. The result is the desired prediction
image B₃'. By difference formation with the original image B₃, the difference image
DB₃ of a series DS₂ is obtained.
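The averaging step of Fig. 5 can be sketched as follows (Python with NumPy; an illustration only — the eight-bit representation and the integer division are assumptions made for the example, and the shifting of B₁ and B₅ over ½Q₁,₅ and -½Q₁,₅ is performed as in the earlier sketch).

```python
import numpy as np

def average_prediction(b_1_3: np.ndarray, b_5_3: np.ndarray) -> np.ndarray:
    """B3' = (B(1,3) + B(5,3)) / 2.

    B(1,3) is the image B1 shifted over  1/2 * Q(1,5),
    B(5,3) is the image B5 shifted over -1/2 * Q(1,5);
    their average is the desired prediction image B3'."""
    total = b_1_3.astype(np.uint16) + b_5_3.astype(np.uint16)
    return (total // 2).astype(np.uint8)

# The difference image DB3 of series DS2 then follows, as before, by difference
# formation with the original image B3:  DB3 = B3 - B3'.
```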
[0021] As is shown in Fig. 6, a prediction image B₂' is determined in a corresponding manner,
starting from the images B₁ and B₃, which prediction image leads to a difference image
DB₂ of a series DS₃ by difference formation with B₂. Finally Fig. 7 shows how a difference
image DB₄ of a series DS₄ is obtained by starting from the images B₃, B₄ and B₅.
[0022] For transmitting the series of sub-images thus obtained, the information for each
sub-image is serialised so that an image data block for each sub-image is obtained.
The image data blocks thus obtained are subsequently transmitted (i.e. recorded
on the disc) in the sequence as shown, for example, in Fig. 8. More particularly, an
image data block associated with a difference image from series DS₁ is transmitted
first, then the image data block of the immediately preceding difference image associated
with series DS₂, subsequently the image data block of the immediately preceding difference
image associated with series DS₃ and finally the image data block of the immediately
preceding difference image associated with series DS₄. It is
to be noted that B₁ in Fig. 8 is assumed to be the first image of the scene.
[0023] To be able to distinguish the image data blocks of the difference images of series
DSᵢ (i = 1, 2, 3, 4) from those of the difference images of series DSⱼ (j = 1, 2,
3, 4 and j ≠ i), a packet header indicating the series with which the corresponding
difference image is associated is added to each image data block. In Fig. 8 these
packet headers are denoted by DS₁, DS₂, DS₃ and DS₄.
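One possible realisation of the transmission sequence and the labelling described in the two preceding paragraphs is sketched below (Python; an illustration only — the pair representation of a labelled image data block is an assumption of the example, and whether the listed order matches Fig. 8 in every detail cannot be verified from the text alone).

```python
# One possible realisation of the transmission sequence described above: an image
# data block of series DS1 is followed by the blocks of the immediately preceding
# difference images of the series DS2, DS3 and DS4.  B1 is taken to be the first
# image of the scene; blocks are represented as (packet header, image number).
def transmission_order(last_image: int):
    order = [("DS1", 1)]
    for n in range(5, last_image + 1, 4):                 # DS1 images B5, B9, ...
        order += [("DS1", n), ("DS2", n - 2), ("DS3", n - 3), ("DS4", n - 1)]
    return order

print(transmission_order(12))
# [('DS1', 1), ('DS1', 5), ('DS2', 3), ('DS3', 2), ('DS4', 4),
#  ('DS1', 9), ('DS2', 7), ('DS3', 6), ('DS4', 8)]
```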
[0024] Fig. 9 diagrammatically shows an embodiment of a display apparatus adapted to receive
digitised images which are transmitted by means of a compact disc-like transmission
medium in the format shown by way of example in Fig. 8. This display apparatus is
provided with a read device 1 by means of which information recorded on a compact
disc-like record carrier 2 can be read and converted into an electric signal which
is applied to a demultiplexer 3. Starting from the information in the service field
SF of a packet on the disc, this demultiplexer supplies the computer data packets
at its output 3(1), the audio packets at its output 3(2) and the video packets at
its output 3(3).
[0025] Since only the processing of the video packets plays a role within the scope of the
present invention, the processing of the audio and computer data packets will not
be further dealt with. The video packets are applied to a selection circuit 4 removing
the packet headers from the video packets and selecting those blocks from the remaining
image data blocks which are provided with predetermined packet headers, for example,
only those image data blocks which are provided with the packet header DS₁, or both
those image data blocks which are provided with the packet header DS₁ and those image
data blocks which are provided with the packet header DS₂, etc. The image data blocks
thus selected are applied to the video processing circuit 5 which supplies a luminance
matrix Y(i,k) and the associated color difference matrices U(r,s) and V(r,s) for each
image to be displayed. In the embodiment shown the luminance matrix Y(i,k) is stored
in a luminance memory 6(1), the color difference matrix U(r,s) is stored in a U memory
6(2) and the color difference matrix V(r,s) is stored in a V memory 6(3). These memories
6(.) are addressed in the conventional manner by addresses ADD of an address generator
7 and by a read-write enable signal R/W(.). As soon as this signal has the logic value
"1", information can be written in the relevant memory. If it has the logic value
"0", the contents of the memory can be read. The information read from a memory 6(.)
is converted in a D/A converter 8(.) into an analog signal. The analog luminance signal
Y(t) thus obtained, as well as the two analog color difference signals U(t) and V(t)
are converted into the elementary chrominance signals R, G and B in a dematrixing
circuit 9 and applied to a display tube 10.
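The selection performed by the selection circuit 4 can be sketched as follows (Python; an illustration only — the representation of a video packet as a pair of a packet header and an image data block is an assumption of the example).

```python
# Selection circuit 4: only image data blocks whose packet header belongs to a
# predetermined set are passed on to the video processing circuit 5.
def select_blocks(video_packets, wanted_headers=frozenset({"DS1"})):
    for header, block in video_packets:
        if header in wanted_headers:
            yield block

# A modest display apparatus could select only DS1; a more powerful one could
# select {"DS1", "DS2"} or all four series, at a correspondingly higher temporal
# resolution.
```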
[0026] It will be evident that the more powerful the video processing circuit 5 is (and
consequently the more costly), the more series of difference images can be selected
by the selection circuit 4 (number of different packet headers) and thus the higher
the temporal resolution will be.
[0027] It has been tacitly assumed in Fig. 3 that the rate at which the images occur in
the original series is equal to 50 Hz. However, the present invention also obviates the
ever recurrent problem related to the difference between the so-called 50 Hz and 60 Hz
field frequency countries. Let it be assumed that the images shown in Fig. 10 occur
at a frequency of 60 Hz. This series can then be divided into five sub-series S₁,
S₂, S₃, S₄, S₅. The images of the sub-series S₁ are converted in the manner as shown
in Fig. 4 into the series DS₁ of difference images (systems of motion vectors Q₁,₆,
Q₆,₁₁, Q₁₁,₁₆, ...). The images of the sub-series S₂ are converted in the same way as
is shown in Fig. 5 into the series DS₂ of difference images, the images of the sub-series
S₃ in the manner as shown in Fig. 6 into the series DS₃ of difference images, and the
images of the sub-series S₄ in the manner as shown in Fig. 7 into the series DS₄ of
difference images; the associated systems of motion vectors are constituted by the
appropriate proportional parts of the systems Q₁,₆, Q₆,₁₁, ... . Finally the images of
sub-series S₅ are converted into a series DS₅ of difference images in the manner as
shown in Fig. 7, starting from the images in the series S₁ and S₄. All this is shown
diagrammatically in Fig. 10. More particularly, each arrow starts at an image by means
of which a prediction image is calculated for the image at which the arrow head of the
relevant arrow ends, all this while taking the correct system of motion vectors into
account. By selecting only the difference images of, for example, the series DS₁, DS₂,
DS₃ and DS₄ from the series of difference images thus obtained and by displaying them
at mutually equal intervals, an image sequence of 50 Hz is obtained. By providing a
display apparatus according to Fig. 9 with a selection circuit 4 and by ordering the
video images on the disc and recording them in the manner as described above with
reference to Fig. 10, the discs can be used in the so-called 50 Hz countries as well
as in the so-called 60 Hz countries, and the display apparatus can simply be made
suitable for use in these different countries.
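The country-dependent selection described above can be sketched as follows (Python; an illustration only — it is assumed here that a display apparatus for the 60 Hz countries selects all five series, which the text does not state explicitly).

```python
# Country-dependent selection in the selection circuit 4 for a disc recorded
# with the five series DS1 ... DS5 of a 60 Hz original series.
FIFTY_HZ_HEADERS = frozenset({"DS1", "DS2", "DS3", "DS4"})   # 50 Hz countries
SIXTY_HZ_HEADERS = FIFTY_HZ_HEADERS | {"DS5"}                # 60 Hz countries (assumed)

def select_for_country(video_packets, sixty_hz: bool):
    wanted = SIXTY_HZ_HEADERS if sixty_hz else FIFTY_HZ_HEADERS
    return [block for header, block in video_packets if header in wanted]
```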
[0028] It is to be noted that it has been assumed in the foregoing that the motions in the
image are linear. Consequently it is sufficient to calculate systems of "main" motion
vectors for the images in the sub-series S₁. The motion vectors of the images in the
other sub-series can then be obtained by
taking a proportional part of these main motion vectors. However, it is alternatively
possible to calculate the actual motion vectors for each image instead of taking the
proportional part of the main motion vectors.
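The proportional derivation of the motion vectors mentioned above can be sketched as follows (Python; an illustration only — the dictionary representation of a system of motion vectors and the rounding to whole pixels are assumptions made for the example).

```python
# Derive the motion vectors of an intermediate image as a proportional part of
# the "main" motion vectors, assuming linear motion.  For Fig. 3 the fractions
# are 1/4 for B2, 1/2 for B3 and 3/4 for B4, all relative to Q(1,5).
def proportional_vectors(main_vectors: dict, fraction: float) -> dict:
    return {pos: (round(fraction * dy), round(fraction * dx))
            for pos, (dy, dx) in main_vectors.items()}

q_1_5 = {(0, 0): (4, -8)}                      # hypothetical system with one vector
half = proportional_vectors(q_1_5, 0.5)        # {(0, 0): (2, -4)}
quarter = proportional_vectors(q_1_5, 0.25)    # {(0, 0): (1, -2)}
```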