(19) European Patent Office

(11) EP 3 216 006 B1

(12) EUROPEAN PATENT SPECIFICATION

(45) Mention of the grant of the patent:
10.06.2020 Bulletin 2020/24

(21) Application number: 15720681.4

(22) Date of filing: 28.04.2015

(51) International Patent Classification (IPC):
G06T 7/557 (2017.01)    G06T 7/593 (2017.01)

(86) International application number:
PCT/EP2015/059232

(87) International publication number:
WO 2016/173631 (03.11.2016 Gazette 2016/44)

(54) AN IMAGE PROCESSING APPARATUS AND METHOD

BILDVERARBEITUNGSVORRICHTUNG UND -VERFAHREN

APPAREIL ET PROCÉDÉ DE TRAITEMENT D'IMAGE


(84) Designated Contracting States:
AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

(43) Date of publication of application:
13.09.2017 Bulletin 2017/37

(73) Proprietor: HUAWEI TECHNOLOGIES CO., LTD.
Shenzhen, Guangdong 518129 (CN)

(72) Inventors:
  • PAPADHIMITRI, Thoma
    80992 Munich (DE)
  • URFALIOGLU, Onay
    80992 Munich (DE)
  • NAVARRO FRUCTUOSO, Hector
    80992 Munich (DE)
  • KONIECZNY, Jacek
    80992 Munich (DE)

(74) Representative: Kreuz, Georg Maria
Huawei Technologies Duesseldorf GmbH
Riesstraße 25
80992 München (DE)


(56) References cited:
WO-A1-2015/028040
DE-A1-10 2012 105 435

  • CHANDRAJIT CHOUDHURY ET AL: "Multi-epipolar plane image based 3D reconstruction using robust surface fitting", Proceedings of the 2014 Indian Conference on Computer Vision, Graphics and Image Processing (ICVGIP '14), 1 January 2014, pages 1-6, New York, USA, XP055239944, DOI: 10.1145/2683483.2683509, ISBN: 978-1-4503-3061-9
  • LAZAROS NALPANTIDIS ET AL: "Obtaining Reliable Depth Maps for Robotic Applications from a Quad-Camera System", Intelligent Robotics and Applications, Springer Berlin Heidelberg, 16 December 2009, pages 906-916, XP019135966, ISBN: 978-3-642-10816-7 (abstract; section 1.1, last 12 lines; section 2.1, figure 1; section 2.2, last three paragraphs; equation 1)
Note: Within nine months from the publication of the mention of the grant of the European patent, any person may give notice to the European Patent Office of opposition to the European patent granted. Notice of opposition shall be filed in a written reasoned statement. It shall not be deemed to have been filed until the opposition fee has been paid. (Art. 99(1) European Patent Convention).


Description

TECHNICAL FIELD



[0001] The present invention relates to an image processing apparatus and method. In particular, the present invention relates to an image processing apparatus and method for determining the depth of a pixel of a reference image of a plurality of images representing a visual scene relative to a plurality of locations.

BACKGROUND



[0002] A fundamental task in the field of computer vision and computational photography is the estimation of a depth map of a real world visual scene on the basis of a 4D light field thereof, i.e. a plurality of 2D images of the real world visual scene captured on a regular grid of camera positions. As plenoptic cameras become increasingly popular and are expected to replace conventional digital cameras in the near future, the need for computationally efficient depth map estimation algorithms will only grow.

[0003] However, the task of estimating a depth map from a 4D light field still faces various challenges, such as accurate depth map estimation of the visual scene at textureless, i.e. uniform color, areas and/or at depth discontinuities. At uniform color areas, identifying corresponding points of the visual scene across multiple views/images is extremely difficult, and current algorithmic solutions tend to over-smooth the estimated depth map, in particular at object boundaries and where depth discontinuities are strongest. This results in an inaccurate depth map estimation of the visual scene at those locations.

[0004] The article "Globally Consistent Depth Labeling of 4D Light Fields", S. Wanner and B. Goldluecke, Computer Vision and Pattern Recognition (CVPR), 2012 describes a method for estimating the depth map of a visual scene via an orientation analysis (based on the so-called structure tensor) of the epipolar images, each of which is a 2D cut of the 4D light field. The structure tensor analysis provides an initial depth map estimate, i.e. a fast local solution, which can then be further improved by applying a global optimization approach, i.e. a slow global solution, at the cost of added computational complexity. The fast local solution can be implemented in real time on standard GPUs. For estimating the depth map of the visual scene, a first depth map is obtained from the images whose centers are positioned regularly along the horizontal line passing through the center of the reference image, and a second depth map is obtained from the images positioned along the vertical direction. The first and second depth maps are merged to obtain a final depth map on the basis of their confidence maps, in that for each pixel the depth value with the highest confidence value among the two candidates is chosen.

[0005] The article "Scene Reconstruction from High Spatio-Angular Resolution Light Fields", SIGGRAPH, 2013 describes an alternative solution for visual scene reconstruction from 4D light fields which copes better with uniform color areas while still preserving depth map discontinuities. In this case, too, the computational complexity is high, so a real-time implementation is not possible. Moreover, the input 4D light field must be sampled densely enough, which in the case of plenoptic cameras is generally not possible. Here as well, a first depth map is obtained from the images whose centers are positioned regularly along the horizontal line passing through the center of the reference image, and a second depth map is obtained from the images positioned along the vertical direction.

[0006] Thus, there is a need for an improved image processing apparatus and method, in particular an image processing apparatus and method allowing for an improved depth estimation.

[0007] DE 102012105435 A1 discloses an image processing method.

[0008] Choudhury et al., "Multi-epipolar plane image based 3D reconstruction using robust surface fitting", Indian Conference on Computer Vision, Graphics and Image Processing, 1 January 2014, disclose a method for 3D reconstruction from the epipolar plane (EP) representation of images and surface fitting for multiview or light field images. The proposed method detects parallelograms in EP images using mean shift segmentation.

[0009] Nalpantidis et al., "Obtaining Reliable Depth Maps for Robotic Applications from a Quad-Camera System", Intelligent Robotics and Applications, 16 December 2009, disclose a quad-camera-based system able to calculate a single depth map of a scene. The four cameras are placed at the corners of a square. Thus, three differently oriented stereo pairs result when considering a single reference image (namely a horizontal, a vertical and a diagonal pair).

[0010] WO 2015/028040 discloses an image processing apparatus for 3D reconstruction. The image processing apparatus may comprise: an epipolar plane image generation unit configured to generate a first set of epipolar plane images from a first set of images of a scene, the first set of images being captured from a plurality of locations; an orientation determination unit configured to determine, for pixels in the first set of epipolar plane images, two or more orientations of lines passing through any one of the pixels; and a 3D reconstruction unit configured to determine disparity values or depth values for pixels in an image of the scene based on the orientations determined by the orientation determination unit.

SUMMARY



[0011] It is an objective of the invention to provide an image processing apparatus and method allowing for an improved depth estimation.

[0012] This objective is achieved by the subject matter of the independent claims. Further implementation forms are provided in the dependent claims, the description and the figures.

[0013] In order to describe the invention in detail, the following terms will be used with the following meanings:

2D image: A two-dimensional picture of a real world visual scene acquired, for instance, by a digital camera.

4D light field: A series of 2D images of a real world visual scene captured on a regular grid (e.g. rectangular or hexagonal) of camera positions.

Plenoptic camera: A camera that captures a 4D light field.

Depth map: Usually a grayscale 2D image of a visual scene in which bright pixels indicate points of the scene closer to the camera and darker pixels indicate points further away.

Reference image: The image of the 4D light field for which the depth map is to be calculated.

Confidence map: Usually a grayscale 2D image (generally with values between 0 and 1) in which bright pixels indicate points of the visual scene whose depth estimation is more reliable and darker pixels indicate points whose depth estimation is less reliable.

Baseline: The distance between the centers of two consecutive image, i.e. camera, locations.

Disparity: The displacement between the projection of a certain point/pixel of a visual scene in one image and the projection of the corresponding point/pixel in a consecutive neighboring image. The disparity is inversely proportional to the distance of that point/pixel from the camera, i.e. to the depth of the point/pixel.
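
The last two definitions are connected by the standard pinhole-stereo relation (a textbook identity given here for illustration only; it is not stated explicitly in this specification):

$$ d = \frac{f \, b}{Z} $$

where d denotes the disparity, f the focal length in pixels, b the baseline and Z the depth of the point, so that, for a point at fixed depth, a larger baseline produces a proportionally larger disparity.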


[0014] According to a first aspect the invention relates to an image processing apparatus for determining a depth of a pixel of a reference image of a plurality of images representing a visual scene relative to a plurality of locations, wherein the plurality of locations define a two-dimensional regular grid with rows and columns, for instance, a rectangular grid, and wherein the location of the reference image is associated with a reference row and a reference column of the grid.

[0015] The image processing apparatus comprises a depth determiner configured to determine a first depth estimate on the basis of the reference image and a first subset of the plurality of images for determining the depth of the pixel of the reference image, wherein the images of the first subset are associated with locations being associated with at least one row of the grid different than the reference row and with at least one column of the grid different than the reference column. The depth determiner is further configured to determine a second depth estimate on the basis of the reference image and a second subset of the plurality of images, wherein the images of the second subset of the plurality of images are associated with locations being associated with the reference row, and/or to determine a third depth estimate on the basis of the reference image and a third subset of the plurality of images, wherein the images of the third subset of the plurality of images are associated with locations being associated with the reference column, and

wherein the depth determiner is further configured to combine the first depth estimate, the second depth estimate and/or the third depth estimate for determining the depth of the pixel of the reference image;

wherein the image processing apparatus further comprises a confidence value determiner, wherein the confidence value determiner is configured to determine a respective confidence value associated with the first depth estimate, the second depth estimate and/or the third depth estimate;

wherein the confidence value determiner is configured to determine the confidence value for the first depth estimate, the second depth estimate and/or the third depth estimate on the basis of a structure tensor defined by the first subset, the second subset and the third subset of the plurality of images, respectively;

wherein the confidence value determiner is configured to exclude a depth estimate from processing on the basis of a filter K defined by the equation:

where d denotes the disparity of the pixel and b denotes the baseline defined by the reference image and the first subset of the plurality of images.



[0016] Thus, an image processing apparatus is provided allowing for an improved depth estimation. This implementation form combines, in a computationally efficient manner, the depth information available from the images lying in the same row as the reference image with the depth information from images lying in different rows and columns, and, by using only reliable depth estimates for estimating the depth, yields very reliable results.

[0017] In a first possible implementation form of the first aspect of the invention, the depth determiner is configured to determine the depth of the pixel of the reference image by computing a median value of the first depth estimate, the second depth estimate and/or the third depth estimate. This implementation form provides a computationally efficient way of determining the depth without having to compute confidence values.
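
As a minimal sketch of this implementation form, the per-pixel median of three depth estimate maps can be computed as follows (illustrative only; array names and shapes are assumptions, not taken from the specification):

```python
import numpy as np

def combine_by_median(depth1, depth2, depth3):
    # Stack the three H x W depth estimate maps and take the
    # per-pixel median, as in the first implementation form.
    return np.median(np.stack([depth1, depth2, depth3], axis=0), axis=0)
```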

[0018] In a second possible implementation form of the first aspect of the invention as such or the first implementation form thereof, the depth determiner is configured to determine the first depth estimate, the second depth estimate and/or the third depth estimate by determining the slope of the epipolar line defined by the position of the pixel in the reference image and the positions of the corresponding pixels in the first subset, the second subset or the third subset of the plurality of images, respectively. This implementation form yields exceptionally good depth estimates.

[0019] In a third possible implementation form of the first aspect of the invention, the depth determiner is configured to determine the depth of the pixel of the reference image by choosing as the depth of the pixel of the reference image the depth estimate from the group consisting of the first depth estimate, the second depth estimate and/or the third depth estimate having the largest confidence value. This implementation form yields very reliable depth estimates.
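
A minimal sketch of this confidence-based selection, under the same assumptions as the sketch above, could look as follows:

```python
import numpy as np

def combine_by_confidence(depths, confidences):
    # depths, confidences: (3, H, W) arrays holding the first, second
    # and third depth estimates and their associated confidence values.
    best = np.argmax(confidences, axis=0)          # (H, W) index of winner
    return np.take_along_axis(depths, best[None, ...], axis=0)[0]
```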

[0020] In a fourth possible implementation form of the first aspect of the invention as such or any one of the first to third implementation form thereof, the depth determiner is configured to determine the first depth estimate based on interpolating the intensity of the pixel between the intensity of the pixel in the reference image and the intensities of the corresponding pixels in the first subset of the plurality of images. This implementation form allows for a dense disparity processing.

[0021] In a fifth possible implementation form of the first aspect of the invention as such or any one of the first to fourth implementation form thereof, the image processing apparatus further comprises an image recorder configured to record the plurality of images representing the visual scene at the plurality of locations.

[0022] In a sixth possible implementation form of the fifth implementation form of the first aspect of the invention, the image recorder comprises a movable camera, an array of cameras or a plenoptic camera.

[0023] In a seventh possible implementation form of the first aspect of the invention as such or any one of the first to sixth implementation form thereof, the image processing apparatus is configured to determine a depth map for the reference image by determining a depth estimate for a plurality of pixels of the reference image.

[0024] In an eighth possible implementation form of the first aspect of the invention as such or any one of the first to seventh implementation form thereof, the depth determiner is configured to determine the first depth estimate on the basis of the reference image and the first subset of the plurality of images, wherein the reference image and the first subset of the plurality of images share a common center pixel.

[0025] According to a second aspect the invention relates to an image processing method for determining a depth of a pixel of a reference image of a plurality of images representing a visual scene relative to a plurality of locations, the plurality of locations defining a two-dimensional grid with rows and columns, the location of the reference image being associated with a reference row and a reference column of the grid. The image processing method comprises the steps of: determining a first depth estimate on the basis of the reference image and a first subset of the plurality of images for determining the depth of the pixel of the reference image, wherein the images of the first subset are associated with locations being associated with at least one row of the grid different than the reference row and with at least one column of the grid different than the reference column; determining a second depth estimate on the basis of the reference image and a second subset of the plurality of images, wherein the images of the second subset of the plurality of images are associated with locations being associated with the reference row, and/or a third depth estimate on the basis of the reference image and a third subset of the plurality of images, wherein the images of the third subset of the plurality of images are associated with locations being associated with the reference column, and
combining the first depth estimate, the second depth estimate and/or the third depth estimate for determining the depth of the pixel of the reference image;
wherein the image processing method further comprises:

determining a respective confidence value associated with the first depth estimate, the second depth estimate and/or the third depth estimate on the basis of a structure tensor defined by the first subset, the second subset and/or the third subset of the plurality of images, respectively;

excluding a depth estimate from processing on the basis of a filter K defined by the equation:

where d denotes the disparity of the pixel and b denotes the baseline defined by the reference image and the first subset of the plurality of images.



[0026] The image processing method according to the second aspect of the invention can be performed by the image processing apparatus according to the first aspect of the invention. Further features of the image processing method according to the second aspect of the invention result directly from the functionality of the image processing apparatus according to the first aspect of the invention and its different implementation forms.

[0027] According to a third aspect the invention relates to a computer program comprising program code for performing the method according to the second aspect of the invention when executed on a computer.

[0028] The invention can be implemented in hardware and/or software.

BRIEF DESCRIPTION OF THE DRAWINGS



[0029] Further embodiments of the invention will be described with respect to the following figures, in which:

Fig. 1 shows a schematic diagram of an image processing apparatus according to an embodiment;

Fig. 2 shows an illustrative example of a grid of a plurality of images that can be processed by an image processing apparatus and method according to an embodiment;

Fig. 3 shows an illustrative example of a grid of a plurality of images that can be processed by an image processing apparatus and method according to an embodiment;

Fig. 4 shows a schematic diagram illustrating the pixel selection in a grid of a plurality of images implemented in an image processing apparatus according to an embodiment;

Fig. 5 shows a schematic diagram illustrating the depth estimation implemented in an image processing apparatus according to an embodiment;

Fig. 6 shows a schematic diagram illustrating the depth determination implemented in an image processing apparatus according to an embodiment;

Fig. 7 shows a schematic diagram illustrating the confidence value determination implemented in an image processing apparatus according to an embodiment;

Fig. 8 shows a schematic diagram illustrating the confidence value determination implemented in an image processing apparatus according to an embodiment; and

Fig. 9 shows a schematic diagram of an image processing method according to an embodiment.


DETAILED DESCRIPTION OF EMBODIMENTS



[0030] In the following detailed description, reference is made to the accompanying drawings, which form a part of the disclosure, and in which are shown, by way of illustration, specific aspects in which the disclosure may be practiced. It is understood that other aspects may be utilized and structural or logical changes may be made without departing from the scope of the present disclosure. The following detailed description, therefore, is not to be taken in a limiting sense, and the scope of the present disclosure is defined by the appended claims.

[0031] It is understood that a disclosure in connection with a described method may also hold true for a corresponding device or system configured to perform the method and vice versa. For example, if a specific method step is described, a corresponding device or apparatus may include a unit to perform the described method step, even if such unit is not explicitly described or illustrated in the figures. Further, it is understood that the features of the various exemplary aspects described herein may be combined with each other, unless specifically noted otherwise.

[0032] Figure 1 shows a schematic diagram of an image processing apparatus 100 according to an embodiment. The image processing apparatus 100 is configured to determine a depth 107 of a pixel of a reference image of a plurality of images representing a visual scene relative to a plurality of locations representing, for instance, camera positions. The plurality of locations define a regular two-dimensional grid with rows and columns. Figures 2 and 3 show an exemplary square-shaped two-dimensional grid 200 of camera positions. In the exemplary square-shaped two-dimensional grid 200 of figures 2 and 3 the central image has been chosen as the reference image 201 for illustration purposes. As the person skilled in the art will appreciate, according to the present invention any one image of the plurality of images shown in figures 2 and 3 could be chosen as reference image as well. The reference image 201 is located within a reference row 203 and a reference column 205 of the two-dimensional grid 200.

[0033] The image processing apparatus 100 comprises a depth determiner 101 configured to determine a first depth estimate on the basis of the reference image 201 and a first subset of the plurality of images 105 for determining the depth 107 of the pixel of the reference image 201, wherein the images of the first subset are associated with locations being associated with at least one row of the grid 200 different than the reference row 203 and with at least one column of the grid 200 different than the reference column 205.

[0034] In an embodiment, the image processing apparatus 100 can comprise an image recorder 102 configured to record the plurality of images 105 representing the visual scene at the plurality of locations. In an embodiment, the image recorder 102 comprises, for instance, a movable camera, an array of cameras or a plenoptic camera. In an embodiment, the image processing apparatus 100 is configured to determine a depth map for the reference image 201 by determining a respective depth 107 for a plurality of pixels of the reference image 201.

[0035] In an embodiment, the depth determiner 101 is configured, for instance, to determine a first depth estimate on the basis of the reference image 201 and the subset of the plurality of images 105 of the grid 200 lying along the dashed line 207 shown in figure 2, corresponding to one diagonal of the grid 200.

[0036] In an embodiment, the depth determiner 101 is configured, for instance, to determine a first depth estimate on the basis of the reference image 201 and the subset of the plurality of images 105 of the grid 200 lying along the dashed line 209 shown in figure 2 making an angle of less than 45° with a line defined by the reference row 203 of the grid 200.

[0037] In an embodiment, the depth determiner 101 is further configured to determine a second depth estimate on the basis of the reference image 201 and a second subset of the plurality of images 105, wherein the images of the second subset of the plurality of images 105 are associated with locations being associated with the reference row 203, and wherein the depth determiner 101 is further configured to combine the first depth estimate and the second depth estimate for determining the depth 107 of the pixel of the reference image. In an embodiment, the depth determiner 101 is further configured, for instance, to determine the second depth estimate on the basis of the reference image 201 and the second subset of the plurality of images 105 lying along the dashed line 307 shown in figure 3.

[0038] In an embodiment, the depth determiner 101 is further configured to determine a third depth estimate on the basis of the reference image 201 and a third subset of the plurality of images 105, wherein the images of the third subset of the plurality of images 105 are associated with locations being associated with the reference column 205, and wherein the depth determiner 101 is further configured to combine the first depth estimate, the second depth estimate and/or the third depth estimate for determining the depth 107 of the pixel of the reference image 201. In an embodiment, the depth determiner 101 is further configured, for instance, to determine the third depth estimate on the basis of the reference image 201 and the third subset of the plurality of images 105 lying along the dashed line 309 shown in figure 3.

[0039] In an embodiment, the depth determiner 101 is configured to determine the depth 107 of the pixel of the reference image 201 by computing a median value of the first depth estimate, the second depth estimate and/or the third depth estimate. In an embodiment, the depth determiner 101 is configured, for instance, to determine the depth 107 of the pixel of the reference image 201 by computing a median value of the first depth estimate determined on the basis of the reference image 201 and the subset of the plurality of images 105 of the grid 200 lying along the dashed line 207 shown in figure 2, the second depth estimate determined on the basis of the reference image 201 and the second subset of the plurality of images 105 lying along the dashed line 309 shown in figure 3 and the third depth estimate determined on the basis of the reference image 201 and the third subset of the plurality of images 105 lying along the dashed line 307 shown in figure 3.

[0040] In an embodiment, the depth determiner 101 is configured to determine the first depth estimate on the basis of the reference image 201 and the first subset of the plurality of images 105 for determining the depth 107 of the pixel of the reference image 201, wherein the reference image 201 and the first subset of the plurality of images 105 share a common center pixel.

[0041] As the person skilled in the art will appreciate, the baseline 207a for the images lying along the dashed line 207 shown in figure 2 and the baseline 209a for the images lying along the dashed line 209 shown in figure 2 differ from the baseline 307a for the images lying along the dashed line 307 shown in figure 3 and the baseline 309a for the images lying along the dashed line 309 shown in figure 3.

[0042] For the images lying along the dashed line 207 or the dashed line 209 shown in figure 2, the pixels that are processed must lie on lines within a given image whose direction forms the same angle with, for instance, the lower edge of the image as the dashed line 207 or 209 forms with the horizontal line 307 shown in figure 3. This is illustrated in more detail in figure 4. The dashed line 405 shown in figure 4 defines an exemplary subset of the plurality of images 105 of the grid 200 that can be processed by the image processing apparatus 100 together with the reference image 201 to determine the depth of a pixel within the reference image 201. The dashed line 405 makes an angle α with, for instance, the horizontal line 307 shown in figure 3. As can be taken from the more detailed views on the right hand side of figure 4, in the image 401, for instance, the pixels potentially corresponding to the pixel of the reference image 201 whose depth is to be determined lie along lines, such as the line 403, making the same angle α with, for instance, the lower edge of the image 401. As the person skilled in the art will appreciate, this leads to a sparser sampling of potentially corresponding pixels along the line 405 compared to a sampling along the lines 307 or 309 described above in the context of figure 3. For this reason, in an embodiment, the depth determiner 101 is configured to interpolate the intensity of the pixel whose depth is to be determined between the intensity of the pixel in the reference image 201 and the intensities of the corresponding pixels in the subset of the plurality of images 105 being processed, for instance the corresponding pixel in the image 407 shown in figure 4, and to determine the depth of the pixel on the basis of this interpolation. In an embodiment, the depth determiner 101 can be configured to perform a bilinear interpolation.
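
A minimal sketch of such a bilinear interpolation at a fractional pixel position (the function name and interface are illustrative assumptions, not taken from the specification):

```python
import numpy as np

def bilinear_sample(img, x, y):
    # Sample a grayscale image (H x W) at the fractional position (x, y)
    # by blending the four surrounding integer-grid pixels.
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1 = min(x0 + 1, img.shape[1] - 1)
    y1 = min(y0 + 1, img.shape[0] - 1)
    wx, wy = x - x0, y - y0
    top = (1 - wx) * img[y0, x0] + wx * img[y0, x1]
    bottom = (1 - wx) * img[y1, x0] + wx * img[y1, x1]
    return (1 - wy) * top + wy * bottom
```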

[0043] In an embodiment, the depth determiner 101 is configured to determine the first depth estimate, the second depth estimate and/or the third depth estimate by determining the slope of the epipolar line defined by the position of the pixel in the reference image and the positions of the corresponding pixels in the first subset, the second subset or the third subset of the plurality of images, respectively. This embodiment of determining the depth estimate using the slope of the epipolar line will be described in more detail further below under further reference to figures 5 and 6.

[0044] Figure 5 illustrates determining the depth estimate using the slope of the epipolar line for the exemplary line 307 shown in figure 3. The subset 501 of the plurality of images 105 lying along the line 307 shown in figure 3 (which is equivalent to the subset of the plurality of images 105 belonging to the reference row 203) is stacked in a step 503 as a 3D image cube 505. The pixel in the reference image 201 whose depth is to be determined and the corresponding pixels in the other images of the subset 501 of the plurality of images 105 define a line 505a along a horizontal plane within the 3D image cube 505. In a step 507 this horizontal plane is extracted by slicing the 3D image cube 505 resulting in an epipolar image 509. Within the epipolar image 509 the line 505a defines an epipolar line 509a that forms a certain angle with the horizontal direction of the image, which, in turn, can be used to provide a depth estimate.
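
The stacking and slicing of steps 503 and 507 can be sketched as follows (illustrative only; a disparity estimate then follows from the slope of the epipolar line 509a):

```python
import numpy as np

def extract_epi(views, row):
    # views: list of grayscale H x W images along the reference row.
    cube = np.stack(views, axis=0)   # step 503: 3D image cube (N, H, W)
    return cube[:, row, :]           # step 507: epipolar image (N, W)
```

For an epipolar line that shifts by dx pixels from one view to the next, dx is the disparity per baseline step, from which the depth can be derived.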

[0045] Figure 6 illustrates determining the depth estimate using the slope of the epipolar line for the exemplary line 209 shown in figure 2. As the person skilled in the art will appreciate, in this case the baseline bj is larger than the baseline bi for the example shown in figure 6. The pixel in the reference image 201 whose depth is to be determined and the corresponding pixels in the other images of the subset 601 of the plurality of images 105 define a line 605a along a horizontal plane 605. In a step 607 those images of the subset 601 that are not to be processed are removed from the horizontal plane resulting in a modified horizontal plane 609. In a step 611 the modified horizontal plane is refocused resulting in a refocused modified horizontal plane 613 for determining the angle defined by the epipolar line 613a, which, in turn, can be used to provide a disparity and depth estimate.
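
A minimal sketch of the refocusing of step 611, under the assumption that refocusing amounts to shearing the epipolar image by a candidate disparity (integer shifts are used for simplicity; a real implementation would shift with sub-pixel accuracy):

```python
import numpy as np

def refocus_epi(epi, disparity):
    # Shear the epipolar image (N, W) by shifting the row of view i
    # by i * disparity pixels; a correct disparity candidate makes the
    # epipolar line vertical and thus easy to detect. np.roll wraps
    # around at the borders, which is acceptable for illustration.
    out = np.empty_like(epi)
    for i in range(epi.shape[0]):
        out[i] = np.roll(epi[i], -int(round(i * disparity)))
    return out
```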

[0046] In an embodiment, the image processing apparatus 100 further comprises a confidence value determiner 103, wherein the confidence value determiner 103 is configured to determine a respective confidence value associated with the first depth estimate, the second depth estimate and/or the third depth estimate. In an embodiment, the depth determiner 101 is configured to determine the depth 107 of the pixel of the reference image 201 by choosing as the depth 107 of the pixel of the reference image 201 the depth estimate from the group consisting of the first depth estimate, the second depth estimate and/or the third depth estimate having the largest confidence value.

[0047] In an embodiment, the confidence value determiner 103 is configured to determine the confidence value for the first depth estimate, the second depth estimate and/or the third depth estimate on the basis of a structure tensor defined by the first subset, the second subset and the third subset of the plurality of images 105, respectively. As is known to the person skilled in the art, the structure tensor is based on the calculation of the image derivatives. In an embodiment, the structure tensor can also be used for determining the depth estimate by determining the slope of the epipolar line defined by the position of a pixel in the reference image and the positions of the corresponding pixels in the subsets of the plurality of images. Using the structure tensor for determining a depth estimate and/or a confidence value in a way that can be implemented in the present invention is described in great detail, for instance, in the article "Globally Consistent Depth Labeling of 4D Light Fields", S. Wanner and B. Goldluecke, Computer Vision and Pattern Recognition (CVPR), 2012 or in the PhD thesis "Orientation Analysis in 4D Light Fields", Sven Wanner, Heidelberg Collaboratory for Image Processing (HCI), University of Heidelberg, 2014. In figure 7, every dot in the two plots shows the estimated confidence (y-axis) and estimated disparity (x-axis) of a certain 3D point/pixel in the visual scene. Both plots show the dots relative to the same points of the 3D scene; however, the left plot in figure 7 shows the case of smaller baselines. If the baselines are larger, the estimated disparities will be larger on average.
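
A minimal sketch of a structure tensor computed on an epipolar image, yielding an orientation (from which a disparity estimate follows) and a coherence measure usable as a confidence value; the smoothing scale and the coherence formula are conventional choices in the spirit of the cited references, not taken from this specification:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def structure_tensor_epi(epi, sigma=1.5):
    # Image derivatives of the epipolar image (views along axis 0).
    gy, gx = np.gradient(epi.astype(np.float64))
    # Smoothed second-moment entries of the 2x2 structure tensor.
    jxx = gaussian_filter(gx * gx, sigma)
    jxy = gaussian_filter(gx * gy, sigma)
    jyy = gaussian_filter(gy * gy, sigma)
    # Local orientation of the epipolar line and a 0..1 coherence value.
    orientation = 0.5 * np.arctan2(2.0 * jxy, jxx - jyy)
    coherence = np.sqrt((jxx - jyy) ** 2 + 4.0 * jxy ** 2) / (jxx + jyy + 1e-12)
    return orientation, coherence
```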

[0048] In an embodiment, the confidence value determiner 103 is configured to exclude depth estimates of less reliable pixels on the basis of a filter K defined by the equation:

where d denotes the disparity of the pixel and b denotes the baseline defined by the reference image and the first subset of the plurality of images.

[0049] Figure 8 illustrates the approach of post-processing the estimated confidence maps with the filter K on the basis of their estimated disparity values. Only pixels within the bandwidth of the filter are processed using the Gaussian kernel defined by the filter K, i.e. the solid dots shown in figure 8 lying below the line 801. In an embodiment, the post-processing is performed only for the images along the directions different from the vertical and horizontal ones.
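
The equation defining K is not reproduced in the text above. Purely as an illustration of the described behavior (a Gaussian kernel with a bandwidth over the estimated disparities), one could assume a form such as the following; the actual equation of the patent in terms of the disparity d and baseline b may differ:

```python
import numpy as np

def filter_k(d, b, sigma=1.0):
    # ASSUMED Gaussian form, chosen only to match the description of K
    # ("Gaussian kernel", bandwidth over disparity); not the patent's
    # actual equation, which is not reproduced in this text.
    return np.exp(-(d * b) ** 2 / (2.0 * sigma ** 2))

# Depth estimates whose weight falls below a chosen threshold (i.e.
# outside the filter bandwidth) would then be excluded from processing.
```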

[0050] Figure 9 shows a schematic diagram of an image processing method 900 according to an embodiment. The image processing method 900 serves to determine the depth 107 of a pixel of the reference image 201 of the plurality of images 105 representing a visual scene relative to a plurality of locations, wherein the plurality of locations define a two-dimensional grid 200 with rows and columns and wherein the location of the reference image 201 is associated with a reference row 203 and a reference column 205 of the grid 200. The image processing method 900 comprises the step of determining 901 a first depth estimate on the basis of the reference image 201 and a first subset of the plurality of images 105 for determining the depth 107 of the pixel of the reference image 201, wherein the images of the first subset are associated with locations being associated with at least one row of the grid different than the reference row 203 and with at least one column of the grid different than the reference column 205.

[0051] The image processing method 900 can be performed, for instance, by the image processing apparatus 100.

[0052] Embodiments of the invention may be implemented in a computer program for running on a computer system, at least including code portions for performing steps of a method according to the invention when run on a programmable apparatus, such as a computer system, or code portions enabling a programmable apparatus to perform functions of a device or system according to the invention.

[0053] A computer program is a list of instructions such as a particular application program and/or an operating system. The computer program may for instance include one or more of: a subroutine, a function, a procedure, an object method, an object implementation, an executable application, an applet, a servlet, a source code, an object code, a shared library/dynamic load library and/or other sequence of instructions designed for execution on a computer system.

[0054] The computer program may be stored internally on a computer readable storage medium or transmitted to the computer system via a computer readable transmission medium. All or some of the computer program may be provided on transitory or non-transitory computer readable media permanently, removably or remotely coupled to an information processing system. The computer readable media may include, for example and without limitation, any number of the following: magnetic storage media including disk and tape storage media; optical storage media such as compact disk media (e.g., CD-ROM, CD-R, etc.) and digital video disk storage media; nonvolatile memory storage media including semiconductor-based memory units such as FLASH memory, EEPROM, EPROM, ROM; ferromagnetic digital memories; MRAM; volatile storage media including registers, buffers or caches, main memory, RAM, etc.; and data transmission media including computer networks, point-to-point telecommunication equipment, and carrier wave transmission media, just to name a few.

[0055] A computer process typically includes an executing (running) program or portion of a program, current program values and state information, and the resources used by the operating system to manage the execution of the process. An operating system (OS) is the software that manages the sharing of the resources of a computer and provides programmers with an interface used to access those resources. An operating system processes system data and user input, and responds by allocating and managing tasks and internal system resources as a service to users and programs of the system.

[0056] The computer system may for instance include at least one processing unit, associated memory and a number of input/output (I/O) devices. When executing the computer program, the computer system processes information according to the computer program and produces resultant output information via I/O devices.

[0057] The connections as discussed herein may be any type of connection suitable to transfer signals from or to the respective nodes, units or devices, for example via intermediate devices. Accordingly, unless implied or stated otherwise, the connections may for example be direct connections or indirect connections. The connections may be illustrated or described in reference to being a single connection, a plurality of connections, unidirectional connections, or bidirectional connections. However, different embodiments may vary the implementation of the connections. For example, separate unidirectional connections may be used rather than bidirectional connections and vice versa. Also, plurality of connections may be replaced with a single connection that transfers multiple signals serially or in a time multiplexed manner. Likewise, single connections carrying multiple signals may be separated out into various different connections carrying subsets of these signals. Therefore, many options exist for transferring signals.

[0058] Those skilled in the art will recognize that the boundaries between logic blocks are merely illustrative and that alternative embodiments may merge logic blocks or circuit elements or impose an alternate decomposition of functionality upon various logic blocks or circuit elements. Thus, it is to be understood that the architectures depicted herein are merely exemplary, and that in fact many other architectures can be implemented which achieve the same functionality.

[0059] Thus, any arrangement of components to achieve the same functionality is effectively "associated" such that the desired functionality is achieved. Hence, any two components herein combined to achieve a particular functionality can be seen as "associated with" each other such that the desired functionality is achieved, irrespective of architectures or intermedial components. Likewise, any two components so associated can also be viewed as being "operably connected" or "operably coupled" to each other to achieve the desired functionality.

[0060] Furthermore, those skilled in the art will recognize that boundaries between the above described operations are merely illustrative. The multiple operations may be combined into a single operation, a single operation may be distributed in additional operations and operations may be executed at least partially overlapping in time. Moreover, alternative embodiments may include multiple instances of a particular operation, and the order of operations may be altered in various other embodiments.

[0061] Also, for example, the examples, or portions thereof, may be implemented as software or code representations of physical circuitry or of logical representations convertible into physical circuitry, such as in a hardware description language of any appropriate type.

[0062] Also, the invention is not limited to physical devices or units implemented in nonprogrammable hardware but can also be applied in programmable devices or units able to perform the desired device functions by operating in accordance with suitable program code, such as mainframes, minicomputers, servers, workstations, personal computers, notepads, personal digital assistants, electronic games, automotive and other embedded systems, cell phones and various other wireless devices, commonly denoted in this application as "computer systems".

[0063] However, other modifications, variations and alternatives are also possible. The specifications and drawings are, accordingly, to be regarded in an illustrative rather than in a restrictive sense.


Claims

1. An image processing apparatus (100) for determining a depth (107) of a pixel of a reference image (201) of a plurality of images (105) representing a visual scene relative to a plurality of locations, the plurality of locations defining a two-dimensional grid (200) with rows and columns, the location of the reference image (201) being associated with a reference row (203) and a reference column (205) of the grid (200), the image processing apparatus (100) comprising:

a depth determiner (101) configured to determine a first depth estimate on the basis of the reference image (201) and a first subset of the plurality of images (105) for determining the depth (107) of the pixel of the reference image (201), wherein the images of the first subset are associated with locations being associated with at least one row of the grid (200) different than the reference row (203) and with at least one column of the grid (200) different than the reference column (205),

wherein the depth determiner (101) is further configured to determine a second depth estimate on the basis of the reference image (201) and a second subset of the plurality of images (105), wherein the images of the second subset of the plurality of images (105) are associated with locations being associated with the reference row (203), and/or to determine a third depth estimate on the basis of the reference image (201) and a third subset of the plurality of images (105), wherein the images of the third subset of the plurality of images (105) are associated with locations being associated with the reference column (205),

wherein the image processing apparatus (100) further comprises a confidence value determiner (103), wherein the confidence value determiner (103) is configured to determine a respective confidence value associated with the first depth estimate, the second depth estimate and/or the third depth estimate;

wherein the confidence value determiner (103) is configured to determine the confidence value for the first depth estimate, the second depth estimate and/or the third depth estimate on the basis of a structure tensor defined by the first subset, the second subset and the third subset of the plurality of images (105), respectively;

wherein the confidence value determiner (103) is configured to exclude a depth estimate from processing on the basis of a filter K defined by the equation:

where d denotes the disparity of the pixel and b denotes the baseline defined by the reference image and the first subset of the plurality of images,

wherein the depth determiner (101) is further configured to combine the first depth estimate, the second depth estimate and/or the third depth estimate for determining the depth (107) of the pixel of the reference image (201).


 
2. The image processing apparatus (100) of claim 1, wherein the depth determiner (101) is configured to determine the depth (107) of the pixel of the reference image (201) by computing a median value of the first depth estimate, the second depth estimate and/or the third depth estimate.
 
3. The image processing apparatus (100) of claim 1 or 2, wherein the depth determiner (101) is configured to determine the first depth estimate, the second depth estimate and/or the third depth estimate by determining the slope of the epipolar line (509a; 613a) defined by the position of the pixel in the reference image (201) and the positions of the corresponding pixels in the first subset, the second subset or the third subset of the plurality of images (105), respectively.
 
4. The image processing apparatus (100) of claim 1, wherein the depth determiner (101) is configured to determine the depth (107) of the pixel of the reference image (201) by choosing as the depth (107) of the pixel of the reference image (201) the depth estimate from the group consisting of the first depth estimate, the second depth estimate and/or the third depth estimate having the largest confidence value.
 
5. The image processing apparatus (100) of any one of the preceding claims, wherein the depth determiner (101) is configured to determine the first depth estimate based on interpolating the intensity of the pixel between the intensity of the pixel in the reference image (201) and the intensities of the corresponding pixels in the first subset of the plurality of images (105).
 
6. The image processing apparatus (100) of any one of the preceding claims, wherein the image processing apparatus (100) further comprises an image recorder (102) configured to record the plurality of images (105) representing the visual scene at the plurality of locations.
 
7. The image processing apparatus (100) of claim 6, wherein the image recorder (102) comprises a movable camera, an array of cameras or a plenoptic camera.
 
8. The image processing apparatus (100) of any one of the preceding claims, wherein the image processing apparatus (100) is configured to determine a depth map for the reference image (201) by determining a respective depth (107) for a plurality of pixels of the reference image (201).
 
9. The image processing apparatus (100) of any one of the preceding claims, wherein the depth determiner (101) is configured to determine the first depth estimate on the basis of the reference image (201) and the first subset of the plurality of images (105), wherein the reference image (201) and the first subset of the plurality of images (105) have a common center pixel.
 
10. An image processing method (900) for determining a depth (107) of a pixel of a reference image (201) of a plurality of images (105) representing a visual scene relative to a plurality of locations, the plurality of locations defining a two-dimensional grid (200) with rows and columns, the location of the reference image (201) being associated with a reference row (203) and a reference column (205) of the grid (200), the image processing method (900) comprising the step of:

determining (901) a first depth estimate on the basis of the reference image (201) and a first subset of the plurality of images (105) for determining the depth (107) of the pixel of the reference image (201), wherein the images of the first subset are associated with locations being associated with at least one row of the grid different than the reference row (203) and with at least one column of the grid different than the reference column (205);

determining a second depth estimate on the basis of the reference image (201) and a second subset of the plurality of images (105), wherein the images of the second subset of the plurality of images (105) are associated with locations being associated with the reference row (203), and/or a third depth estimate on the basis of the reference image (201) and a third subset of the plurality of images (105), wherein the images of the third subset of the plurality of images (105) are associated with locations being associated with the reference column (205),

wherein the image processing method (900) further comprises:

determining a respective confidence value associated with the first depth estimate, the second depth estimate and/or the third depth estimate on the basis of a structure tensor defined by the first subset, the second subset and/or the third subset of the plurality of images (105), respectively;

excluding a depth estimate from processing on the basis of a filter K defined by the equation:

where d denotes the disparity of the pixel and b denotes the baseline defined by the reference image and the first subset of the plurality of images, and

combining the first depth estimate, the second depth estimate and/or the third depth estimate for determining the depth (107) of the pixel of the reference image (201).


 
11. A computer program comprising a program code for performing the image processing method (900) of claim 10 when executed on a computer.
 


Ansprüche

1. Bildverarbeitungseinrichtung (100) zum Bestimmen einer Tiefe (107) eines Pixels eines Referenzbilds (201) einer Mehrzahl von Bildern (105), die eine visuelle Szene in Relation zu einer Mehrzahl von Lagen darstellt, wobei die Mehrzahl von Lagen ein zweidimensionales Gitter (200) mit Reihen und Spalten definiert, wobei die Lage des Referenzbilds (201) mit einer Referenzreihe (203) und einer Referenzspalte (205) des Gitters (200) assoziiert ist, wobei die Bildverarbeitungseinrichtung (100) umfasst:

eine Tiefenbestimmungseinheit (101), die ausgestaltet ist, eine erste Tiefenschätzung auf der Basis des Referenzbilds (201) und eines ersten Teilsatzes der Mehrzahl von Bildern (105) zu bestimmten, um die Tiefe (107) des Pixels des Referenzbilds (201) zu bestimmen, wobei die Bilder des ersten Teilsatzes mit Lagen assoziiert sind, die mit mindestens einer Reihe des Gitters (200), die sich von der Referenzreihe (203) unterscheidet, und mit mindestens einer Spalte des Gitters (200), die sich von der Referenzspalte (205) unterscheidet, assoziiert sind,

wobei die Tiefenbestimmungseinheit (101) ferner ausgestaltet ist, eine zweite Tiefenschätzung auf der Basis des Referenzbilds (201) und eines zweiten Teilsatzes der Mehrzahl von Bildern (105) zu bestimmen, wobei die Bilder des zweiten Teilsatzes der Mehrzahl von Bildern (105) mit Lagen assoziiert sind, die mit der Referenzreihe (203) assoziiert sind, und/oder eine dritte Tiefenschätzung auf der Basis des Referenzbilds (201) und eines dritten Teilsatzes der Mehrzahl von Bildern (105) zu bestimmen, wobei die Bilder des dritten Teilsatzes der Mehrzahl von Bildern (105) mit Lagen assoziiert sind, die mit der Referenzspalte (205) assoziiert sind,

wobei die Bildverarbeitungseinrichtung (101) ferner eine Konfidenzwertbestimmungseinheit (103) umfasst, wobei die Konfidenzwertbestimmungseinheit (103) ausgestaltet ist, einen jeweiligen, mit der ersten Tiefenschätzung, der zweiten Tiefenschätzung und/oder der dritten Tiefenschätzung assoziierten Konfidenzwert zu bestimmen;

wobei die Konfidenzwertbestimmungseinheit (103) ausgestaltet ist, den Konfidenzwert für die erste Tiefenschätzung, die zweite Tiefenschätzung und/oder die dritte Tiefenschätzung auf der Basis eines durch den ersten Teilsatz, den zweiten Teilsatz bzw. den dritten Teilsatz der Mehrzahl von Bildern (105) definierten Strukturtensors zu bestimmen;

wobei die Konfidenzwertbestimmungseinheit (103) ausgestaltet ist, eine Tiefenschätzung aus einer Verarbeitung auf der Basis eines durch folgende Gleichung definierten Filters K auszuschließen:

wobei d die Disparität des Pixels bezeichnet und b die durch das Referenzbild und den ersten Teilsatz der Mehrzahl von Bildern definierte Grundlinie bezeichnet,

wobei die Tiefenbestimmungseinheit (101) ferner ausgestaltet ist, die erste Tiefenschätzung, die zweite Tiefenschätzung und/oder die dritte Tiefenschätzung zum Bestimmen der Tiefe (107) des Pixels des Referenzbilds (201) zu kombinieren.


 
2. Bildverarbeitungseinrichtung (100) nach Anspruch 1, wobei die Tiefenbestimmungseinheit (101) ausgestaltet ist, die Tiefe (107) des Pixels des Referenzbilds (201) durch Berechnen eines Medianwerts der ersten Tiefenschätzung, der zweiten Tiefenschätzung und/oder der dritten Tiefenschätzung zu bestimmen.
 
3. Bildverarbeitungseinrichtung (100) nach Anspruch 1 oder 2, wobei die Tiefenbestimmungseinheit (101) ausgestaltet ist, die erste Tiefenschätzung, die zweite Tiefenschätzung und/oder die dritte Tiefenschätzung durch Bestimmen der Kurve der durch die Position des Pixels in dem Referenzbild (201) und die Positionen der entsprechenden Pixel in dem ersten Teilsatz, dem zweiten Teilsatz bzw. dem dritten Teilsatz der Mehrzahl von Bildern (105) definierten Epipolarlinie (509a; 613a) zu bestimmen.
 
4. Bildverarbeitungseinrichtung (100) nach Anspruch 1, wobei die Tiefenbestimmungseinheit (101) ausgestaltet ist, die Tiefe (107) des Pixels des Referenzbilds (201) durch Auswählen als die Tiefe (107) des Pixels des Referenzbilds (201) der Tiefenschätzung aus der Gruppe bestehend aus der ersten Tiefenschätzung, der zweiten Tiefenschätzung und/oder der dritten Tiefenschätzung mit dem größten Konfidenzwert zu bestimmen.
 
5. Bildverarbeitungseinrichtung (100) nach einem der vorhergehenden Ansprüche, wobei die Tiefenbestimmungseinheit (101) ausgestaltet ist, die ersten Tiefenschätzung basierend auf Interpolieren der Intensität des Pixels zwischen der Intensität des Pixels in dem Referenzbild (201) und den Intensitäten der entsprechenden Pixel in dem ersten Teilsatz der Mehrzahl von Bildern (105) zu bestimmen.
 
6. Bildverarbeitungseinrichtung (100) nach einem der vorhergehenden Ansprüche, wobei die Bildverarbeitungseinrichtung (100) ferner eine Bildaufnahmeeinheit (102) umfasst, die ausgestaltet ist, die Mehrzahl von Bildern (105) aufzunehmen, die die visuelle Szene an der Mehrzahl von Lagen darstellen.
 
7. Bildverarbeitungseinrichtung (100) nach Anspruch 6, wobei die Bildaufnahmeeinheit (102) eine bewegbare Kamera, eine Anordnung von Kameras oder eine plenoptische Kamera umfasst.
 
8. Bildverarbeitungseinrichtung (100) nach einem der vorhergehenden Ansprüche, wobei die Bildverarbeitungseinrichtung (100) ausgestaltet ist, eine Tiefenkarte für das Referenzbild (201) durch Bestimmen einer jeweiligen Tiefe (107) für eine Mehrzahl von Pixeln des Referenzbilds (201) zu bestimmen.
 
9. Bildverarbeitungseinrichtung (100) nach einem der vorhergehenden Ansprüche, wobei die Tiefenbestimmungseinheit (103) ausgestaltet ist, die erste Tiefenschätzung auf der Basis des Referenzbilds (201) und des ersten Teilsatzes der Mehrzahl von Bildern (105) zu bestimmen, wobei das Referenzbild (201) und der erste Teilsatz der Mehrzahl von Bildern (105) ein gemeinsames mittleres Pixel aufweisen.
 
10. Bildverarbeitungsverfahren (900) zum Bestimmen einer Tiefe (107) eines Pixels eines Referenzbilds (201) einer Mehrzahl von Bildern (105), die eine visuelle Szene in Relation zu einer Mehrzahl von Lagen darstellt, wobei die Mehrzahl von Lagen ein zweidimensionales Gitter (200) mit Reihen und Spalten definiert, wobei die Lage des Referenzbilds (201) mit einer Referenzreihe (203) und einer Referenzspalte (205) des Gitters (200) assoziiert ist, wobei das Bildverarbeitungsverfahren (900) den folgenden Schritt umfasst:

determining (901) a first depth estimate on the basis of the reference image (201) and of a first subset of the plurality of images (105) for determining the depth (107) of the pixel of the reference image (201), wherein the images of the first subset are associated with locations that are associated with at least one row of the grid differing from the reference row (203) and with at least one column of the grid differing from the reference column (205);

determining a second depth estimate on the basis of the reference image (201) and of a second subset of the plurality of images (105), wherein the images of the second subset of the plurality of images (105) are associated with locations that are associated with the reference row (203), and/or a third depth estimate on the basis of the reference image (201) and of a third subset of the plurality of images (105), wherein the images of the third subset of the plurality of images (105) are associated with locations that are associated with the reference column (205),

wherein the image processing method (900) further comprises:

determining a respective confidence value associated with the first depth estimate, the second depth estimate and/or the third depth estimate on the basis of a structure tensor defined by the first subset, the second subset and the third subset, respectively, of the plurality of images (105);

excluding a depth estimate from processing on the basis of a filter K defined by the following equation:

wherein d denotes the disparity of the pixel and b denotes the baseline defined by the reference image and the first subset of the plurality of images, and

combining the first depth estimate, the second depth estimate and/or the third depth estimate for determining the depth (107) of the pixel of the reference image (201).


 
11. A computer program comprising program code for performing the image processing method (900) according to claim 10 when the computer program is executed on a computer.
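For the reader's convenience, the claimed processing steps are illustrated below with short sketches; they are illustrative readings of the claims, not the patented implementation. A first sketch shows the combination step of claims 1 and 2 as a per-pixel median, assuming NumPy arrays as the depth-map representation; all array names and values are hypothetical.

    import numpy as np

    # Hypothetical depth estimates for a 2x2 reference image, one map per
    # subset of views (the first, second and third subset of claim 1).
    d1 = np.array([[2.1, 2.0],
                   [1.9, 5.0]])
    d2 = np.array([[2.0, 2.2],
                   [2.0, 4.8]])
    d3 = np.array([[2.2, 2.1],
                   [2.1, 5.1]])

    # Claim 2: the depth of each pixel is the median of the available
    # estimates, which discards a single outlying estimate per pixel.
    depth = np.median(np.stack([d1, d2, d3]), axis=0)
    # depth == [[2.1, 2.1],
    #           [2.0, 5.0]]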
 
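Claim 4 selects rather than averages: per pixel, the estimate with the largest confidence value is taken as the depth. A sketch under the same assumptions, with the three estimates and their confidence values stacked along a leading axis of hypothetical shape (3, H, W):

    import numpy as np

    def select_by_confidence(estimates, confidences):
        # estimates, confidences: float arrays of shape (3, H, W) holding
        # the first, second and third depth estimates and the respective
        # confidence values of claim 1.
        winner = np.argmax(confidences, axis=0)            # (H, W) indices
        return np.take_along_axis(estimates, winner[None], axis=0)[0]

Determining a depth map in the sense of claim 8 then amounts to evaluating this selection, or the median of the previous sketch, for every pixel of the reference image (201).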

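Claim 3 derives each estimate from the slope of the epipolar line (509a; 613a): a scene point traces a line across the stacked views of a subset, the slope of that line equals the pixel's disparity, and depth follows by triangulation. In the sketch below, the focal length f_px (in pixels) and the baseline b between adjacent views are assumed, illustrative camera parameters that are not taken from the claims.

    def depth_from_epi_slope(du_ds, f_px=500.0, b=0.01):
        # du_ds: slope of the epipolar line, i.e. the shift of the pixel
        # (in pixels) per step between adjacent views; this shift is the
        # disparity d of the pixel.
        d = du_ds
        if d == 0.0:
            raise ValueError("zero slope corresponds to a point at infinity")
        return f_px * b / d          # triangulation: Z = f * b / d

    # Example: a slope of 2 pixels per view with f_px = 500 and b = 0.01 m
    # gives Z = 500 * 0.01 / 2 = 2.5 m.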

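The confidence values of claim 1, and of the corresponding step of claim 10, are based on a structure tensor defined by the respective subset of images. One plausible reading, sketched below, computes the coherence of the 2x2 structure tensor of an epipolar-plane image built from that subset; the gradient operators, the smoothing scale sigma and the coherence measure are assumptions of this illustration, and the filter K of the claims is not reproduced here. A refined estimate in the sense of claim 5 would additionally interpolate intensities along the candidate epipolar line and test their consistency.

    import numpy as np
    from scipy.ndimage import gaussian_filter, sobel

    def structure_tensor_confidence(epi, sigma=1.5):
        # epi: 2-D float array, an epipolar-plane image built from one
        # subset of views (image coordinate along axis 1, view index
        # along axis 0).
        ix = sobel(epi, axis=1)                 # gradient along the image axis
        iy = sobel(epi, axis=0)                 # gradient along the view axis
        jxx = gaussian_filter(ix * ix, sigma)   # smoothed tensor entries
        jxy = gaussian_filter(ix * iy, sigma)
        jyy = gaussian_filter(iy * iy, sigma)
        # Coherence in [0, 1]: near 1 where the EPI shows one dominant
        # orientation (a clearly visible epipolar line, hence a reliable
        # estimate), near 0 in textureless regions.
        num = (jyy - jxx) ** 2 + 4.0 * jxy ** 2
        den = (jxx + jyy) ** 2 + 1e-12
        return np.sqrt(num / den)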