[0001] The present invention relates to the field of displaying 3D video content, and more
specifically to the adaptation of 3D video content for display under different viewing
conditions.
[0002] The production of 3D video is to a large extent determined by the targeted viewing
conditions, e.g. cinema projection in a theatre or display on a domestic 3D-TV display.
The main parameters that are taken into account during production are the width of
the targeted screen and the distance between the viewer and the screen.
[0003] When 3D video content designed for specific viewing conditions shall be displayed
under different viewing conditions, the 3D video content should be modified to fit
these new viewing conditions. Otherwise the quality of the 3D experience may be low
due to shallow 3D effects, or the viewer may suffer discomfort and visual fatigue. Despite this problem,
generally no adaptation at all is performed today. This sometimes leads to very poor
3D effects, e.g. when 3D movie excerpts or trailers are played on a 3D-TV display.
[0004] With the current growth of the 3D cinema market, the adaptation of 3D video content
will become an important issue for the replication and distribution of 3D-DVD (Digital
Versatile Disc) and 3D-BD (Blu-ray Disc). The goal is to avoid the need to handle several
masters for the same 3D video content.
[0005] Today the most widely investigated approach for the adaptation of 3D video content
consists in synthesizing new "virtual" views located at the ideal camera positions
for the targeted viewing conditions. This view synthesis enables pleasing 3D effects
without altering the structure of the scene that was shot. However, view synthesis is complex
and computationally expensive. It requires the delivery of high-quality disparity
maps along with the color video views, as the use of poor-quality disparity maps induces
unacceptable artifacts in the synthesized views. Whereas the required disparity maps are
rather easy to generate for computer-generated content, generating them for natural video
content is a challenging task. Up to now no reliable chain from disparity estimation
to view synthesis is available.
[0006] Even if improved solutions for disparity estimation become available, it still remains
desirable to provide a reasonable, low-complexity adaptation solution, e.g. for 3D set-top
boxes.
[0007] Accordingly, it is an object of the present invention to propose a solution for adaptation
of 3D video content to different viewing conditions, which can be implemented with
low complexity.
[0008] According to the invention, this object is achieved by a method for adapting 3D video
content to a display, which has the steps of:
- retrieving a stereoscopic image pair;
- obtaining a maximum disparity value for the stereoscopic image pair;
- determining a largest allowable shift for the stereoscopic image pair using the obtained
maximum disparity value;
- calculating an actual shift for a left image and a right image of the stereoscopic
image pair using the determined largest allowable shift; and
- shifting the left image and the right image in accordance with the calculated actual
shift.
[0009] Similarly, an apparatus for adapting 3D video content to a display has:
- an input for retrieving a stereoscopic image pair;
- a disparity determination unit for obtaining a maximum disparity value for the stereoscopic
image pair;
- a maximum shift determination unit for determining a largest allowable shift for the
stereoscopic image pair from the obtained maximum disparity value;
- an actual shift calculation unit for calculating an actual shift for a left image
and a right image of the stereoscopic image pair from the determined largest allowable
shift; and
- an image shifting unit for shifting the left image and the right image in accordance
with the calculated actual shift.
[0010] The invention proposes an adaptation of the 3D content by performing a view shifting
on a frame-by-frame basis. The 3D effect is increased by moving the scene back with
respect to the screen, i.e. by moving the views apart. To this end, in order to adapt
a 3D movie to a 3D-TV the left view is shifted to the left and the right view is shifted
to the right. Though this alters the scene structure with regard to what the director
of the movie originally chose, the 3D effect is optimized. A real-time control adapted
to the content, or more specifically adapted to the amount of disparity of each stereoscopic
image pair, is implemented to ensure that the resulting depth remains in the visual
comfort area. For this purpose advantageously the highest disparity value is transmitted
for each stereoscopic image pair. Alternatively, the highest disparity value is obtained
by a search for the maximum value within a complete disparity map that is transmitted
for the stereoscopic image pair. As a further alternative, the highest disparity value
is obtained by a disparity estimation feature. In this case a coarse, block-based
implementation of the disparity estimation is sufficient.
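Why moving the views apart pushes the scene back can be checked on a toy example. The sketch below assumes the sign convention of paragraph [0019] (disparity positive behind the screen), written here as d = x_R − x_L, and uses hypothetical one-row images and a hypothetical helper `shift_horizontal`: shifting the left view h/2 pixels to the left and the right view h/2 pixels to the right adds h to every disparity.

```python
import numpy as np

def shift_horizontal(img: np.ndarray, dx: int) -> np.ndarray:
    """Shift an image horizontally by dx pixels (positive = right),
    filling uncovered columns with zeros instead of wrapping around."""
    out = np.zeros_like(img)
    if dx > 0:
        out[:, dx:] = img[:, :-dx]
    elif dx < 0:
        out[:, :dx] = img[:, -dx:]
    else:
        out[:] = img
    return out

def feature_column(img: np.ndarray) -> int:
    """Column index of the single bright feature in a toy one-row image."""
    return int(np.argmax(img[0]))

# Toy stereo pair: one bright point at x_L = 40 (left view) and x_R = 35
# (right view), i.e. d = x_R - x_L = -5: the point lies in front of the screen.
width = 100
left = np.zeros((1, width))
left[0, 40] = 1.0
right = np.zeros((1, width))
right[0, 35] = 1.0

h = 12                                           # total shift h in pixels
left_shifted = shift_horizontal(left, -h // 2)   # left view moves left by h/2
right_shifted = shift_horizontal(right, h // 2)  # right view moves right by h/2

d_before = feature_column(right) - feature_column(left)
d_after = feature_column(right_shifted) - feature_column(left_shifted)
print(d_before, d_after)  # -5 7
```

The point moves from 5 pixels in front of the screen to 7 pixels behind it, i.e. every disparity grows by exactly h.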
[0011] The solution according to the present invention allows a reliable and fast adaptation
of 3D video content to a display device. The 3D effect is optimized while ensuring
viewer comfort, without implementing a depth-based view synthesis, which is computationally
expensive and prone to artifacts when poor-quality depth maps are used.
[0012] Advantageously, the successive shifting steps are complemented with a temporal filtering,
e.g. Kalman filtering, which is a second order filtering, to smooth the temporal
behavior of the display adaptation. Temporal filtering prevents annoying
jitter artifacts in the resulting 3D content. Such artifacts are especially
likely when depth estimation is required: for natural content, disparity maps may
present frame-by-frame estimation errors, which could harm the final depth perception.
Temporal filtering achieves a smooth variation of the pixel shift. For CGI (Computer-Generated
Imagery) content supplied with its own depth maps, temporal filtering
is not necessarily performed.
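A minimal sketch of such a temporal filter for the per-frame maximum disparity, assuming a two-state constant-velocity model (disparity and its rate of change per frame), which is one possible reading of the "second order" remark. The function name `kalman_smooth_dmax` and the noise parameters `q` and `r` are hypothetical and would have to be tuned to the actual disparity source.

```python
import numpy as np

def kalman_smooth_dmax(measurements, q=0.01, r=4.0):
    """Causally smooth a sequence of per-frame maximum disparities with a
    2-state (disparity, disparity velocity) constant-velocity Kalman filter.

    q: process-noise intensity (hypothetical value)
    r: measurement-noise variance in pixels^2 (hypothetical value)
    """
    F = np.array([[1.0, 1.0], [0.0, 1.0]])        # state transition, 1 frame step
    H = np.array([[1.0, 0.0]])                    # only the disparity is measured
    Q = q * np.array([[0.25, 0.5], [0.5, 1.0]])   # piecewise-constant acceleration noise
    R = np.array([[r]])

    x = np.array([measurements[0], 0.0])          # initialise on the first frame
    P = np.eye(2) * r
    smoothed = []
    for z in measurements:
        # Predict the state for the current frame
        x = F @ x
        P = F @ P @ F.T + Q
        # Correct the prediction with the measured d_max(t)
        y = z - (H @ x)[0]                        # innovation
        S = H @ P @ H.T + R                       # innovation covariance (1x1)
        K = (P @ H.T) / S[0, 0]                   # Kalman gain (2x1)
        x = x + K[:, 0] * y
        P = (np.eye(2) - K @ H) @ P
        smoothed.append(x[0])
    return np.array(smoothed)

# Usage: jittery estimates of a constant maximum disparity of 20 pixels.
rng = np.random.default_rng(0)
noisy = 20.0 + rng.normal(0.0, 2.0, size=200)
smooth = kalman_smooth_dmax(noisy)
```

Because the filter only uses past and current frames, it can run in real time in a set-top box; the smoothed value would then replace the raw one before the largest allowable shift is determined.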
[0013] Preferably, the viewer has the possibility to adjust the shift of the left view and
the right view with an interface, e.g. an interface similar to the volume or the contrast
bar.
[0014] Advantageously, the shifted left image and the shifted right image are sent directly
to the display. Alternatively, the shifted left image and the shifted right image
are stored on a storage medium for displaying them later.
[0015] For a better understanding the invention shall now be explained in more detail in
the following description with reference to the figures. It is understood that the
invention is not limited to this exemplary embodiment and that specified features
can also expediently be combined and/or modified without departing from the scope
of the present invention as defined in the appended claims. In the figures:
- Fig. 1
- shows a stereoscopic image pair;
- Fig. 2
- depicts depth maps of the stereoscopic image pair of Fig. 1;
- Fig. 3
- gives an explanation of the vergence-accommodation conflict;
- Fig. 4
- depicts the depth situation for a cinema movie scene;
- Fig. 5
- shows the depth situation when the movie scene is displayed on a domestic 3D-TV panel;
- Fig. 6
- shows a flow chart of an adaptation method according to the invention; and
- Fig. 7
- schematically illustrates an apparatus for performing the adaptation method according
to the invention.
[0016] In Fig. 1 a stereoscopic image pair is shown. The image pair consists of a left view
40 and a right view 50. Each image 40, 50 has a width of 1024 pixels and a height
of 768 pixels. The two images 40, 50 were taken with a camera pair having an inter-camera
distance of $t_c = 10\,\mathrm{cm}$ and a focal length of $f = 2240$ pixels. The distance
of the convergence plane from the camera basis was $Z_{conv} = +\infty$. The near clipping
plane was located at $Z_{near} = 4.48\,\mathrm{m}$, the far clipping plane at
$Z_{far} = 112.06\,\mathrm{m}$. The maximum distance of the objects in the images 40, 50 was
$Z_{max} \approx 12\,\mathrm{m}$, the minimum distance $Z_{min} \approx 5\,\mathrm{m}$.
[0017] The depth maps 41, 51 of the stereoscopic image pair of Fig. 1 are depicted in Fig.
2. An object located in the near clipping plane would correspond to pure white values.
Correspondingly, an object located in the far clipping plane would correspond to pure
black values. The disparity $d(Z)$ for a given depth $Z$ is given by

$$d(Z) = f \cdot t_c \cdot \left( \frac{1}{Z_{conv}} - \frac{1}{Z} \right)$$

[0018] With $Z_{conv} = +\infty$, the above equation simplifies to

$$d(Z) = -\frac{f \cdot t_c}{Z}$$
[0019] Therefore, for $Z_{conv} = +\infty$ the maximum disparity is negative, i.e.
$d_{max} < 0$. Using the above formula, the minimum depth $Z_{min}$ results in a minimum
disparity of $d_{min} \approx -44.8$ pixels, whereas the maximum depth $Z_{max}$ results in
a maximum disparity of $d_{max} \approx -18.7$ pixels. As a rule, parallax and disparity are
positive for objects located behind the screen ($Z > Z_{conv}$), whereas they are negative
for objects located in front of the screen ($Z < Z_{conv}$).
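As a quick numerical check of the relation $d(Z) = f \cdot t_c \cdot (1/Z_{conv} - 1/Z)$ against the camera parameters of Fig. 1, the following sketch (the function name `disparity_px` is illustrative) reproduces both quoted disparities:

```python
import math

def disparity_px(Z, f_px=2240.0, t_c=0.10, Z_conv=math.inf):
    """Disparity in pixels of a point at depth Z (metres) for a camera pair
    with baseline t_c (metres), focal length f_px (pixels) and convergence
    distance Z_conv (metres). In Python, 1.0 / math.inf evaluates to 0.0."""
    return f_px * t_c * (1.0 / Z_conv - 1.0 / Z)

d_min = disparity_px(5.0)   # Z_min ≈ 5 m
d_max = disparity_px(12.0)  # Z_max ≈ 12 m
print(round(d_min, 1), round(d_max, 1))  # -44.8 -18.7
```

Both values are negative, as expected for a convergence plane at infinity: every object of the scene appears in front of the screen.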
[0020] To look at a three-dimensional object in real life, the eyes of a viewer need to
do two things. First, they must verge, i.e. rotate slightly inward or outward
so that the image of the object falls on the center of both retinas. Second,
the eyes must accommodate, i.e. change the shape of each lens to focus the image on
the retinas. Artificial 3D, however, causes a vergence-accommodation conflict. The
viewer must focus at one distance, namely the screen from which the light is emitted, but
verge at another distance, namely the spatial position of the 3D object. This vergence-accommodation
conflict may lead to headaches and other discomfort.
[0021] The vergence-accommodation conflict is schematically illustrated in Fig. 3. The viewer,
whose eyes are separated by an inter-ocular distance $t_e$, focuses on a screen 1 with a
width $W_{screen}$. As long as the viewer verges on an object 6 located in the plane of the
screen 1, there is no vergence-accommodation conflict. In this case the vergence distance
$D_{conv}$ is equal to the accommodation distance of the eyes. If, however, the viewer verges
on an object 6' located in front of the screen or an object 6" located behind the screen,
the vergence distance $D_{conv}$ differs from the accommodation distance of the eyes.
[0022] Due to this vergence-accommodation conflict there are a lower parallax bound and
an upper parallax bound, which limit the depth range where objects may be located.
The lower parallax bound designates the largest distance to the front of the screen
where an object may be displayed, whereas the upper parallax bound designates the
corresponding distance to the back of the screen.
[0023] Fig. 4 illustrates the depth situation for a cinema movie scene. The depth perceived
by a viewer is plotted against the real-world depth of the objects. The figure is
based on a cinema movie scene without any particular effect, i.e. there is a linear
relationship between the depth that has been shot and the depth perceived by the viewers.
In the figure, the thick black line 1 corresponds to the position of the cinema screen.
Typically, the cinema screen is located at a distance of 10 m from the viewer. The
thick dark grey line 2 corresponds to the lower bound for negative parallax values.
In cinema there is no upper parallax bound because the screen is far enough away from
the viewer: the viewer can look at infinity without feeling any accommodation
discomfort. As the movie scene under consideration does not present any specific 3D effect,
i.e. there is no stereoscopic distortion, the depth perceived by the viewer, which
is illustrated by the dashed black line 3, corresponds to the depth of the objects that
have been shot.
[0024] Fig. 5 illustrates the corresponding depth situation when the movie scene is displayed
on a domestic 3D-TV panel. In this case the distance to the screen typically shrinks to
about 3 m. As a consequence, an upper parallax bound appears, which is illustrated by the
light grey line 4. Obtaining 3D effects comparable to those achieved in a cinema is
impossible because of the limited visual comfort area: at home the viewer is located too
close to the screen, so looking at infinity while still accommodating on the screen
causes visual fatigue. If no adaptation is performed, the movie scene presents only
poor 3D effects, as illustrated by the dashed black line 3. The solution according
to the present invention, which moves the scene further back behind the screen,
increases the depth perception without exceeding the visual comfort area.
This is shown by the dash-dotted black line 5.
[0025] In the following the basis for the adaptation process that is performed in order
to achieve the increased depth perception illustrated by the dash-dotted black line
5 in Fig. 5 shall be described.
[0026] A stereoscopic image pair of a frame $t$ with a disparity $d_{max}(t)$ is assumed.
The value $d_{max}(t)$ denotes the highest disparity value in pixels of the stereoscopic
image pair. A priori, $d_{max}(t) > 0$. The value $d_{max}(t)$ is either transmitted as
metadata for the stereoscopic image pair or obtained by a search for the maximum value
within a complete disparity map that is transmitted for the stereoscopic image pair.
Alternatively, a disparity estimation feature is implemented in the 3D-TV display or a
connected set-top box. In this case a coarse, block-based implementation is sufficient.
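Two of the ways of obtaining $d_{max}(t)$ named above can be sketched briefly: taking the maximum over a disparity map, and a coarse block-matching estimator that produces such a map. The sketch assumes grayscale views as NumPy arrays, 16×16 blocks, a sum-of-absolute-differences (SAD) cost and the sign convention d = x_R − x_L; the block size, search range and function names are illustrative choices, not part of the source.

```python
import numpy as np

def coarse_block_disparity(left, right, block=16, d_range=(-64, 64)):
    """Coarse block-based disparity estimation by SAD matching.

    For each block of the left view, search the right view along the same
    rows for the horizontal offset d (d = x_R - x_L) that minimises the sum
    of absolute differences. Returns one disparity value per block.
    """
    h, w = left.shape
    d_lo, d_hi = d_range
    rows, cols = h // block, w // block
    disp = np.zeros((rows, cols), dtype=int)
    for by in range(rows):
        for bx in range(cols):
            y0, x0 = by * block, bx * block
            ref = left[y0:y0 + block, x0:x0 + block].astype(np.float64)
            best_cost, best_d = np.inf, 0
            for d in range(d_lo, d_hi + 1):
                xs = x0 + d
                if xs < 0 or xs + block > w:
                    continue  # candidate block would leave the image
                cand = right[y0:y0 + block, xs:xs + block].astype(np.float64)
                cost = np.abs(ref - cand).sum()
                if cost < best_cost:
                    best_cost, best_d = cost, d
            disp[by, bx] = best_d
    return disp

def max_disparity(disparity_map):
    """Highest disparity value of a transmitted or estimated disparity map."""
    return int(np.max(disparity_map))

# Usage: synthetic pair in which every point has disparity +5.
rng = np.random.default_rng(1)
left = rng.integers(0, 256, size=(64, 128)).astype(np.uint8)
right = np.roll(left, 5, axis=1)
est = coarse_block_disparity(left, right)
print(max_disparity(est))  # 5
```

Since only the maximum is needed, one value per block is enough; a production estimator would also have to cope with occlusions and textureless areas, which this sketch ignores.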
[0027] Consider

where $N_{row}$ denotes the number of pixels per line, $W_{screen}$ is the width of the
domestic screen in meters, and $t_e$ denotes the viewer's inter-ocular distance. For an
adult $t_e = 0.065\,\mathrm{m}$, whereas for a child $t_e = 0.04\,\mathrm{m}$.
[0028] Let $D$ stand for the distance from the viewer to the 3D-TV screen. The highest disparity
amount

that is allowable for these viewing conditions is given by:

where the value $1/M$, in diopters, corresponds to the vergence-accommodation conflict
tolerance that is admitted by the manufacturer of the set-top box or the 3D-TV display.
Advantageously, an HDMI connection is used for this purpose.
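As a hedged illustration of how $D$, $1/M$, $t_e$, $W_{screen}$ and $N_{row}$ can combine into an allowable disparity in pixels, the sketch below uses one common geometric model, which is an assumption and may differ from the equation of the source: a point perceived at distance $Z$ behind a screen at distance $D$ has the on-screen parallax $p = t_e (1 - D/Z)$; requiring $1/D - 1/Z \le 1/M$ gives $p \le t_e D/M$, and $p$ may never exceed $t_e$. Multiplying by $N_{row}/W_{screen}$ converts meters to pixels. The function name and the example values ($W_{screen} = 1.0$ m, $N_{row} = 1920$, $1/M = 0.2$ diopters) are hypothetical.

```python
def allowable_disparity_px(D, M, W_screen, N_row, t_e=0.065):
    """Largest behind-screen disparity in pixels under an ASSUMED model
    (not the source's own equation):
      * parallax of a point perceived at distance Z: p = t_e * (1 - D / Z)
      * comfort condition: 1/D - 1/Z <= 1/M (diopters)  =>  p <= t_e * D / M
      * p never exceeds t_e (the eyes must not diverge)
    D, M, W_screen and t_e in metres, N_row in pixels per line.
    """
    p_max = t_e * min(D / M, 1.0)   # maximum on-screen parallax in metres
    return p_max * N_row / W_screen  # metres -> pixels

# Domestic 3D-TV: viewer at 3 m, tolerance 1/M = 0.2 diopters (M = 5 m).
print(round(allowable_disparity_px(3.0, 5.0, 1.0, 1920), 2))  # 74.88
```

Under this model a screen at a distance $D \ge M$ imposes no bound tighter than the eye separation itself, which matches the cinema case of Fig. 4, whereas at $D = 3$ m only 60% of the eye separation is usable.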
[0029] Consequently, the largest allowable shift $h_{MAX}(t)$ for a frame $t$ is given by:

[0030] The actual shift $h(t)$ may be any value between 0 and $h_{MAX}(t)$, according to the
viewer's preferences, with a shift of $h(t)/2$ pixels to the left for the left view and a
shift of $h(t)/2$ pixels to the right for the right view. Advantageously the viewer has the
possibility to adjust the shift with an interface similar to the volume or the contrast
bar. This adjustment is expressed by a factor $\alpha$, which may assume values between
0 and 1:

$$h(t) = \alpha \cdot h_{MAX}(t)$$
[0031] In practice, shift values $h(t)$ of up to about 60 pixels, i.e. 30 pixels per view,
are obtained. This corresponds to about 3% of the horizontal resolution, which is an
acceptable value.
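A minimal sketch of paragraph [0030], taking $h(t) = \alpha \cdot h_{MAX}(t)$ and splitting it into equal and opposite per-view shifts. Rounding $h(t)/2$ down to whole pixels and clamping a negative $h_{MAX}(t)$ to zero are assumptions of this sketch; `actual_shifts` is a hypothetical name.

```python
import math

def actual_shifts(h_max, alpha):
    """Split the actual shift h(t) = alpha * h_MAX(t) into per-view shifts.

    Returns (left_dx, right_dx) in whole pixels, negative meaning "to the left".
    Rounding h(t)/2 down and clamping negative h_MAX(t) to 0 are assumptions.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    half = math.floor(alpha * max(h_max, 0.0) / 2)  # h(t)/2 per view
    return -half, half
```

For example, `actual_shifts(60, 1.0)` yields `(-30, 30)`, the 30 pixels per view quoted above, and halving the user setting to `alpha = 0.5` yields `(-15, 15)`.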
[0032] Preferably, a temporal filtering feature is implemented to smooth temporal variations
of $d_{max}$. It has been found that such temporal filtering, e.g. Kalman filtering, is
feasible and remains unnoticeable to the viewer.
[0033] An adaptation method according to the invention is shown in Fig. 6. In a first step
10 a stereoscopic image pair is received. Then the maximum disparity value $d_{max}(t)$ is
obtained 11, either from metadata transmitted together with the stereoscopic image
pair or by a disparity estimation. In the next step 12 the largest allowable shift
$h_{MAX}(t)$ is determined, e.g. as described above. From the value $h_{MAX}(t)$, and
advantageously also from the user-settable shift adjustment parameter $\alpha$, the final
shift $h(t)$ for the frame, or rather the shift value $h(t)/2$ for the left image and the
right image, is calculated 13. The left image and the right image are then shifted 14
accordingly and sent 15 to a display or stored 16 on a storage medium.
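Steps 11 to 14 of Fig. 6 can be strung together in a short per-frame sketch. It assumes NumPy arrays for the views and takes $h_{MAX}(t) = \max(d_{allow} - d_{max}(t), 0)$, where `d_allow` is a hypothetical precomputed allowable disparity for the viewing conditions; this form is an assumption motivated by the fact that the shift adds $h(t)$ to every disparity. The helpers `shift_view` and `adapt_frame` are hypothetical, and receiving pairs (step 10) as well as sending or storing the result (steps 15, 16) are left to the caller.

```python
import numpy as np

def shift_view(img, dx):
    """Shift an image horizontally by dx pixels (negative = left),
    zero-filling the uncovered columns; extra axes (e.g. color) are kept."""
    w = img.shape[1]
    pad = ((0, 0), (abs(dx), abs(dx))) + ((0, 0),) * (img.ndim - 2)
    padded = np.pad(img, pad)          # constant (zero) padding on both sides
    start = abs(dx) - dx               # out[:, k] = img[:, k - dx]
    return padded[:, start:start + w]

def adapt_frame(left, right, d_max, d_allow, alpha=1.0):
    """Steps 12-14 for one stereoscopic pair whose d_max was obtained in step 11.

    d_max:   maximum disparity of the pair in pixels
    d_allow: highest allowable disparity for the viewing conditions (assumed given)
    alpha:   user shift adjustment parameter in [0, 1]
    """
    h_max = max(d_allow - d_max, 0)    # step 12 (assumed form of h_MAX(t))
    half = int(alpha * h_max) // 2     # step 13: h(t)/2 per view
    return shift_view(left, -half), shift_view(right, half)  # step 14

# Usage on a toy pair with disparity +2 everywhere.
rng = np.random.default_rng(0)
left = rng.integers(0, 256, size=(8, 32), dtype=np.uint8)
right = np.roll(left, 2, axis=1)
out_left, out_right = adapt_frame(left, right, d_max=2, d_allow=10, alpha=1.0)
```

Here $h_{MAX} = 8$, so each view is moved by 4 pixels and the new maximum disparity reaches, but does not exceed, the allowable value of 10. An optional temporal filter, as in the apparatus of Fig. 7, would sit between obtaining `d_max` and calling `adapt_frame`.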
[0034] An apparatus 20 for performing the adaptation method according to the invention is
schematically illustrated in Fig. 7. The apparatus 20 comprises an input 21 for receiving
10 a stereoscopic image pair. A disparity determination unit 22 obtains 11 the maximum
disparity value $d_{max}(t)$, either from metadata transmitted together with the stereoscopic
image pair using a metadata evaluation unit 32 or by a disparity estimation using a disparity
estimator 33. An optional temporal filter 31 downstream of the disparity determination unit
22 performs a temporal filtering on the maximum disparity value $d_{max}(t)$. A maximum shift
determination unit 23 determines 12 the largest allowable shift $h_{MAX}(t)$. Preferably the
apparatus 20 has a user interface 24, which enables the viewer to set a shift adjustment
parameter $\alpha$. An actual shift calculation unit 25 calculates 13 the final shift $h(t)$
for the frame, or rather the shift value $h(t)/2$ for the left image and the right image,
taking into account the shift adjustment parameter $\alpha$. An image shifting unit 26 shifts
14 the left image and the right image accordingly. Finally, outputs 27, 28 are provided for
sending 15 the shifted images to a display 29 or for storing 16 the shifted images on a
storage medium 30. Of course, the different units may likewise be incorporated into a
single processing unit. This is indicated by the dashed rectangle. Also, the user interface
24 does not necessarily need to be integrated in the apparatus 20. It is likewise possible
to connect the user interface 24 to the apparatus 20 via an input. For example, when the
adaptation method according to the invention is performed in a set-top box, the user
interface 24 may be provided by a connected display or a personal computer, which then
transmits the adjustment parameter $\alpha$ to the set-top box.
1. A method for adapting 3D video content to a display, the method
having the steps of:
- retrieving (10) a stereoscopic image pair;
- obtaining (11) a maximum disparity value (dmax(t)) for the stereoscopic image pair;
- determining (12) a largest allowable shift (hMAX(t)) for the stereoscopic image pair using the obtained maximum disparity value (dmax(t));
- calculating (13) an actual shift (h(t)/2) for a left image and a right image of the stereoscopic image pair using the determined
largest allowable shift (hMAX(t)); and
- shifting (14) the left image and the right image in accordance with the calculated
actual shift (h(t)/2).
2. The method according to claim 1, wherein a user settable shift adjustment parameter (α) is taken into account when calculating (13) the actual shift (h(t)/2) for the left image and the right image of the stereoscopic image pair.
3. The method according to claim 1 or 2, further having the step of performing a temporal filtering on the maximum disparity value
(dmax(t)) before determining (12) the largest allowable shift (hMAX(t)).
4. The method according to claim 3, wherein temporal filtering is performed by Kalman filtering.
5. The method according to one of claims 1 to 4, wherein the maximum disparity value (dmax(t)) is obtained (11) from metadata associated with the stereoscopic image pair or by a
disparity estimation.
6. The method according to one of claims 1 to 5, further having the step of sending (15) the shifted left image and the shifted right image
to the display (29) or storing (16) the shifted left image and the shifted right image
on a storage medium (30).
7. An apparatus (20) for adapting 3D video content to a display, the apparatus
having:
- an input (21) for retrieving (10) a stereoscopic image pair;
- a disparity determination unit (22) for obtaining (11) a maximum disparity value
(dmax(t)) for the stereoscopic image pair;
- a maximum shift determination unit (23) for determining (12) a largest allowable
shift (hMAX(t)) for the stereoscopic image pair from the obtained maximum disparity value (dmax(t));
- an actual shift calculation unit (25) for calculating (13) an actual shift (h(t)/2) for a left image and a right image of the stereoscopic image pair from the determined
largest allowable shift (hMAX(t)); and
- an image shifting unit (26) for shifting (14) the left image and the right image
in accordance with the calculated actual shift (h(t)/2).
8. The apparatus (20) according to claim 7, further having a user interface (24) connected to the actual shift calculation unit (25)
for setting a shift adjustment parameter (α), which is taken into account when calculating (13) the actual shift (h(t)/2) for the left image and the right image of the stereoscopic image pair.
9. The apparatus (20) according to claim 7 or 8, further having a temporal filter (31) downstream of the disparity determination unit (22)
for performing a temporal filtering on the maximum disparity value (dmax(t)) before determining (12) the largest allowable shift (hMAX(t)).
10. The apparatus (20) according to claim 9, wherein the temporal filter (31) is a Kalman filter.
11. The apparatus (20) according to one of claims 7 to 10, wherein the maximum disparity value (dmax(t)) is obtained (11) by a metadata evaluation unit (32) from metadata associated with
the stereoscopic image pair or by a disparity estimator (33).
12. The apparatus (20) according to one of claims 7 to 11, further having a first output (27) for sending (15) the shifted left image and the shifted
right image to the display (29) and/or a second output (28) for storing (16) the shifted
left image and the shifted right image on a storage medium (30).
13. A system including an apparatus (20) according to claim 7 for adapting 3D video content
to a display and a user interface (24) connected to said apparatus (20) for setting
a shift adjustment parameter (α), which is taken into account in the apparatus (20) when calculating (13) an actual
shift (h(t)/2) for a left image and a right image of a stereoscopic image pair.