[0001] Present day passenger vehicles are increasingly equipped with camera-based assistance
systems, which capture the surrounding environment of the vehicle, and provide a variety
of functions for improving driving safety and comfort. The functionality of these
systems is based on the analysis of the recorded image data. Therefore, the quality
of the system's predictions is directly related to the quality of the image data.
[0002] One factor that affects the image quality and which is difficult to control is the
degree of contamination of the optical system of the camera. The cameras may be positioned
at places with less risk of contamination, or the cameras may be cleaned by an electric
wiper. Despite these provisions, it is impossible to avoid a contamination of the
optical system completely. Therefore, it has been proposed to detect dirt particles
on a camera lens automatically in order to trigger an appropriate action. An example
of such an automatic detection of lens contaminations is disclosed in the European
patent application EP 2351351.
[0003] In a first aspect, the present specification discloses a computer implemented method
for detecting image artifacts.
[0004] Image data with image frames is received from a vehicle camera, for example over
an automotive data bus, and an intensity difference between neighbouring pixels in
a first direction of an image frame is compared with a pre-determined upper threshold
and with a pre-determined lower threshold.
[0005] The first direction may correspond to the rows of an image frame. Furthermore, the
pixel transition values can also be computed in a second direction, or y-direction,
with respect to the pixel locations of the image frame. Thereby, the overall detection
quality can be improved and stripe shaped artifacts can be avoided. The second direction
may correspond to the columns of an image frame.
[0006] A pixel transition value is set to a first value when the previously computed intensity
difference of neighbouring pixels is greater than the pre-determined upper threshold.
The pixel transition value is set to a second value when the intensity difference
is less than the pre-determined lower threshold and the pixel transition value is
set to zero when the intensity difference lies between the pre-determined upper threshold
and the pre-determined lower threshold.
[0007] In particular, the upper threshold can be set to a positive value and the lower threshold
to a negative value, and the positive and the negative value can have equal magnitude.
The upper threshold and the lower threshold may also be equal and, in particular,
they may both be equal to zero. The first value can be chosen as a positive value,
such as 1 or a positive constant a, and the second value can be chosen as a negative
value, such as -1 or as the negative of the first value.
[0008] If the intensity difference is exactly equal to one of the thresholds, the pixel
transition value may be set to the respective first or second value or it may be set
to zero. The pixel transition value is also referred to as "transition type".
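The thresholding described above can be sketched as follows. This is a minimal illustration, not the implementation of the specification: the function name `transition_values` and its parameter names are hypothetical, a single image row is processed, and a difference that is exactly equal to a threshold is mapped to zero, which is one of the options the text allows.

```python
def transition_values(row, upper=0, lower=0, first=1, second=-1):
    """Classify intensity differences between neighbouring pixels of one row."""
    values = []
    for left, right in zip(row, row[1:]):
        diff = right - left
        if diff > upper:
            values.append(first)      # difference above the upper threshold
        elif diff < lower:
            values.append(second)     # difference below the lower threshold
        else:
            values.append(0)          # between (or equal to) the thresholds
    return values


row = [10, 12, 12, 9, 9, 15]
print(transition_values(row))                      # thresholds both zero
print(transition_values(row, upper=2, lower=-2))   # symmetric thresholds
```

With both thresholds at zero the result is the sign of the difference; symmetric non-zero thresholds suppress small differences.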
[0009] Accumulated pixel transition values are computed from the pixel transition values
of corresponding pixel locations of the image frames by applying a low pass filter
with respect to time, wherein time is represented by the frame index. In one embodiment,
the low pass filter is computed as an accumulated value at a frame index f for the
respective first and second direction. The accumulated value is computed as a weighted
sum of the accumulated value at the earlier frame index f - 1 and the pixel transition
value at the current frame index f. In particular, the weight factor of the accumulated
value at the earlier frame index f - 1 may be set to at least 0.95. Thereby, a major
contribution comes from the previous estimation, which results in a low pass filter.
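As an illustration of this temporal low pass filter, the following sketch updates a per-pixel accumulator frame by frame. It assumes that the two weights are alpha and 1 - alpha, so that they add up to one, and that the accumulator starts at zero; the names `accumulate`, `acc` and `frames` are hypothetical.

```python
def accumulate(previous, transitions, alpha=0.99):
    """One low pass update per pixel: alpha * previous + (1 - alpha) * current."""
    return [alpha * p + (1.0 - alpha) * t for p, t in zip(previous, transitions)]


# Three pixels observed over 300 frames:
# pixel 0 always sees a positive transition, pixels 1 and 2 see mixed signs.
frames = [[1, -1, 1], [1, 1, -1], [1, -1, 0]] * 100
acc = [0.0, 0.0, 0.0]
for frame in frames:
    acc = accumulate(acc, frame)
print(acc)
```

A pixel with a persistent transition sign approaches 1 in magnitude, whereas a pixel whose transitions balance out stays close to zero.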
[0010] Furthermore, the accumulated pixel transition values are smoothed out with respect
to the pixel locations by applying a spatial filter to the pixel locations, in particular
by computing a convolution with the spatial filter. In particular, the spatial filter
can be provided as a filter with filter coefficients between 0 and 1 that fall off to
zero as a function of a distance from a central point, for example by a circular
filter.
[0011] In one embodiment, the low pass filtering with respect to time is performed before
the spatial filtering. In another embodiment, spatial filtering is performed before
the low pass filter with respect to time. In the first case, the low pass filter is
applied to the pixel transition values to obtain accumulated pixel transition values
and the spatial filter is applied to the accumulated pixel transition values. In the
second case, the spatial filter is applied to the pixel transition values to obtain
smoothed pixel transition values and the low pass filter with respect to time is applied
to the smoothed pixel transition values.
[0012] The pixel transition values that have been accumulated with respect to time and smoothed
with respect to the pixel locations (x, y) are referred to as "smoothed and accumulated
pixel transition values". This expression refers to both sequences of filtering.
[0013] In one embodiment, the spatial filter is realized as an averaging filter, for which
the filter coefficients add up to 1. This is equivalent to a total volume of 1 under
the filter function if the filter is defined step-wise and the coordinates (x, y)
have a distance of 1.
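A spatial averaging filter of this kind can be sketched as follows. The example assumes a flat disc as the circular filter (a Gaussian would serve equally well) and treats pixels outside the frame as zero; both choices, as well as the names `disc_kernel` and `smooth`, are assumptions made for illustration only.

```python
def disc_kernel(radius):
    """Circular averaging kernel: equal weights inside a disc, normalized to sum 1."""
    size = 2 * radius + 1
    kernel = [[1.0 if (i - radius) ** 2 + (j - radius) ** 2 <= radius ** 2 else 0.0
               for j in range(size)] for i in range(size)]
    total = sum(map(sum, kernel))
    return [[w / total for w in row] for row in kernel]


def smooth(values, kernel):
    """Convolve a 2D list with a symmetric kernel; outside pixels count as zero."""
    height, width = len(values), len(values[0])
    r = len(kernel) // 2
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            s = 0.0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < height and 0 <= xx < width:
                        # the kernel is symmetric, so no flipping is needed
                        s += kernel[dy + r][dx + r] * values[yy][xx]
            out[y][x] = s
    return out


spike = [[0.0] * 5 for _ in range(5)]
spike[2][2] = 1.0                      # an isolated transition
smoothed = smooth(spike, disc_kernel(1))
print(smoothed[2][2])
```

An isolated value is spread over its neighbourhood and thereby reduced, while a region of equal values keeps its level away from the image border.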
[0014] Magnitude values of the pixel locations are computed for the smoothed pixel transition
values of the pixel locations. If the smoothed pixel transition values are computed
with respect to one direction only, the magnitude values can be computed by taking
the modulus.
[0015] If the smoothed pixel transition values are computed with respect to the first direction
and with respect to the second direction, a magnitude value can be computed by adding
the squared values for the respective first and second directions, and in particular,
it can be computed as an L2-norm, which is also referred to as Euclidean norm. Then,
the pixels of potential artifact regions are identified by comparing the magnitude
value for given pixel locations (x, y) with a pre-determined detection threshold.
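The magnitude computation and the comparison with the detection threshold can be sketched as follows. The function name `detect` is hypothetical, the default threshold of 0.2 is the value quoted for the experimental analysis in the detailed description, and a pixel is marked when its magnitude is greater than or equal to the threshold.

```python
import math


def detect(smoothed_x, smoothed_y, threshold=0.2):
    """Mark pixels whose Euclidean transition magnitude reaches the threshold."""
    return [[math.hypot(sx, sy) >= threshold for sx, sy in zip(row_x, row_y)]
            for row_x, row_y in zip(smoothed_x, smoothed_y)]


sx = [[0.0, 0.3], [0.15, 0.1]]
sy = [[0.05, 0.0], [0.15, 0.1]]
print(detect(sx, sy))
```

The third pixel shows why both directions are combined: neither component reaches 0.2 on its own, but the Euclidean norm does.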
[0016] In a further aspect, the present specification discloses a computer implemented method
for correcting image artifacts. According to this method, image data with image frames
is received from a vehicle camera, for example via an automotive data bus.
[0017] Pixel quotient values for the respective pixel locations are computed in a first
direction, or x-direction. In particular, the first direction can be provided by the
rows of an image frame. In order to improve the image correction and to avoid stripe
shaped artifacts, pixel quotient values for the respective pixel locations can also
be computed in a second direction, or y-direction. In particular, the second direction
can be provided by the columns of an image frame.
[0018] A numerator of the pixel quotient value comprises an image intensity at a given pixel
location and a denominator of the pixel quotient value comprises an image intensity
at a neighbouring pixel in the respective first or second direction. By using neighbouring
pixel positions, the method is "localized", and does not combine pixels from
pixel locations which are far apart. This feature contributes to a sparse matrix
for a system of linear equations.
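A pixel quotient along one image row may be sketched as follows. The text only states that the numerator and the denominator "comprise" the two intensities, so the plain ratio I(x, y) / I(x + 1, y) used here is an assumption, as are the name `pixel_quotients` and the small `eps` guard against division by zero.

```python
def pixel_quotients(row, eps=1e-6):
    """Quotient of each pixel intensity and that of its right-hand neighbour."""
    return [current / max(neighbour, eps)
            for current, neighbour in zip(row, row[1:])]


print(pixel_quotients([100, 50, 50, 100]))
```

Each quotient involves only two adjacent pixels, which is why each resulting linear equation has just two non-zero coefficients.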
[0019] Median values of the pixel quotient values are computed for the respective pixel
locations with respect to time, wherein time is represented by frame index. In particular,
the median value can be computed as a streaming median value, which approximates a
true median.
[0020] The attenuation factors of the pixel locations of the image frames are computed as
an approximate solution to a system of linear equations in the attenuation factors
of the respective pixel locations (x, y), wherein the attenuation factors of the pixel
locations are represented as a vector.
[0021] The system of linear equations comprises a first set of linear equations, in which
the previously determined median values appear as pre-factor of the respective attenuation
factors. Furthermore, the system of linear equations comprises a second set of linear
equations, which determine values of the attenuation factors at corresponding pixel
locations. In particular, the second set of linear equations may be determined by
the abovementioned method for identifying image artifacts.
[0022] A corrected pixel intensity for a pixel of the image frame at a given time t is derived
by dividing the observed pixel intensity by the previously determined attenuation
factor B(x, y), where the attenuation factor lies between 0 and 1.
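The correction step can be sketched as a per-pixel division. The lower bound `floor` on the attenuation factor and the clipping to `max_value` are assumptions added here to keep the example numerically safe; they are not taken from the specification, and the function name `correct_frame` is hypothetical.

```python
def correct_frame(observed, attenuation, floor=0.05, max_value=255.0):
    """Divide each observed intensity by its attenuation factor B(x, y)."""
    corrected = []
    for obs_row, att_row in zip(observed, attenuation):
        row = []
        for intensity, b in zip(obs_row, att_row):
            b = max(b, floor)                        # assumed guard against B close to 0
            row.append(min(intensity / b, max_value))  # assumed clipping to the value range
        corrected.append(row)
    return corrected


print(correct_frame([[50.0, 100.0, 0.0]], [[0.5, 1.0, 0.0]]))
```

A pixel attenuated to half its intensity is restored to the level of its unattenuated neighbour, whereas a fully obstructed pixel carries no information and remains dark.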
[0023] In a particular embodiment, the median values of the pixel quotient values are obtained
as streaming median values of the pixel quotient values up to a frame index f. The
streaming median value is derived from a median value estimate for the previous frame
index f - 1 and the pixel quotient value at frame index f.
[0024] The streaming median value approximates the true value of a median. The streaming
median value of the current frame index and pixel is computed by adding a pre-determined
value "delta" to the previous estimate if the current pixel quotient value is greater
than the previous streaming median value. If the current pixel quotient value is less
than the previous streaming median value, the pre-determined value "delta" is subtracted
from the previous streaming median value. Otherwise, the current streaming median
value is set to the previous streaming median value.
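The update rule of the streaming median can be sketched as follows for a single pixel. The step size `delta`, the starting value `initial` and the function name `streaming_median` are illustrative assumptions; the specification only states that "delta" is pre-determined.

```python
from statistics import median


def streaming_median(values, delta=0.01, initial=0.0):
    """Approximate the median by moving the estimate one step towards each new value."""
    estimate = initial
    for value in values:
        if value > estimate:
            estimate += delta
        elif value < estimate:
            estimate -= delta
        # equal: the estimate is kept unchanged
    return estimate


# Quotient values of one pixel over 200 frames: mostly 0.8, every tenth an outlier.
quotients = [5.0 if i % 10 == 0 else 0.8 for i in range(200)]
approx = streaming_median(quotients)
print(approx, median(quotients))
```

Only one value per pixel has to be stored, and each outlier moves the estimate by at most one step `delta`, regardless of how large the outlier is.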
[0025] In particular, the abovementioned system of linear equations can be solved approximately
using an iterative method. A number of iteration steps may be determined in advance
or depending on a convergence rate.
[0026] The pre-factors of the attenuation factor in the linear equations can be defined
as elements of a constraint matrix. In one embodiment, the method comprises multiplying
the system of linear equations with the transposed constraint matrix. The resulting
system of linear equations is solved using an iterative method. In particular, the
iterative method can be provided by a conjugate gradient method, which is used for
finding the minimum of a quadratic form that is defined by the resulting equation.
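The following sketch illustrates the idea on a toy system with four attenuation factors along one row. The layout of the constraint matrix is an assumption: rows of the form B(x) - m(x) B(x + 1) = 0 with hypothetical median quotients m, plus two rows that fix known values at the ends. Dense lists are used for brevity, whereas a practical implementation would exploit the sparsity of the matrix.

```python
def matvec(a, x):
    """Product of a matrix (list of rows) and a vector."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def conjugate_gradient_normal(a, b, iterations=50, tol=1e-20):
    """Approximately solve A^T A x = A^T b with the conjugate gradient method."""
    at = [list(col) for col in zip(*a)]          # transposed constraint matrix
    rhs = matvec(at, b)
    x = [0.0] * len(rhs)
    r = rhs[:]                                   # residual for the start vector x = 0
    p = r[:]
    rs = sum(v * v for v in r)
    for _ in range(iterations):
        if rs < tol:
            break
        ap = matvec(at, matvec(a, p))
        step = rs / sum(pi * api for pi, api in zip(p, ap))
        x = [xi + step * pi for xi, pi in zip(x, p)]
        r = [ri - step * api for ri, api in zip(r, ap)]
        rs_new = sum(v * v for v in r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x


# First set: B0 - 2*B1 = 0, B1 - B2 = 0, B2 - 0.5*B3 = 0 (assumed medians 2, 1, 0.5).
# Second set: B0 = 1 and B3 = 1 are taken as known.
A = [[1, -2, 0, 0], [0, 1, -1, 0], [0, 0, 1, -0.5], [1, 0, 0, 0], [0, 0, 0, 1]]
b = [0, 0, 0, 1, 1]
B = conjugate_gradient_normal(A, b)
print([round(v, 6) for v in B])
```

Multiplying with the transposed matrix yields a symmetric positive semi-definite system, which is the setting in which the conjugate gradient method is applicable.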
[0027] According to a further aspect, the present specification discloses a computation
unit for carrying out the abovementioned method of detecting image artifacts, for
example by providing integrated circuits, ASICs, microprocessors, computer readable
memory with data and computer readable instructions and the like.
[0028] The computation unit comprises an input connection for receiving image data and an
output connection for outputting locations of detected pixels. For a bidirectional
data connection, the output and input connections may also coincide. The locations
of detected pixels can also be marked in a memory area, for example by providing pointers
to data structures etc.
[0029] The computation unit is operative to execute the abovementioned artifact detection
method, in particular, the computation unit is operative to compare intensity differences
between neighbouring pixels in a first direction with a pre-determined upper threshold
and with a pre-determined lower threshold and to set a pixel transition value according
to the intensity difference.
[0030] The computation unit sets the pixel transition value to a first value when the intensity
difference is greater than the pre-determined upper threshold, to a second value when
the intensity difference is less than the pre-determined lower threshold and sets
the pixel transition value to zero when the intensity difference lies between the
pre-determined upper threshold and the pre-determined lower threshold.
[0031] Furthermore, the computation unit computes accumulated pixel transition values of
the respective pixel transition values by applying a low pass filter with respect
to a frame index or with respect to time. The computation unit computes smoothed pixel
transition values by applying a spatial filter to the accumulated pixel transition
values and computes a magnitude value of the smoothed pixel transition values for
the pixel locations of the image frame.
[0032] The computation unit outputs the detected pixels via the output connection, for example
by storing a reference to pixel locations or the coordinates of the pixel locations
of the detected artifacts in a computer readable memory of the computation unit.
[0033] Then, the computation unit identifies pixels of potential artifact regions by comparing
the magnitude value with a predetermined detection threshold.
[0034] Moreover, the present specification discloses a vehicle camera with the aforementioned
computation unit, wherein the vehicle camera is connected to the input connection
of the computation unit.
[0035] In a further aspect, the present specification discloses a computation unit for correcting
image artifacts. The computation unit comprises an input connection for receiving
image data and an output connection for outputting corrected image frames, which may
also coincide for a bidirectional data connection.
[0036] The computation unit is operative to execute the abovementioned method for correcting
image artifacts. In particular, the computation unit is operative to compute pixel
quotient values in a first direction, wherein the pixel quotient values are derived
from a quotient, the numerator of the quotient comprising an image intensity at a
given pixel location and the denominator comprising an image intensity at a neighbouring
pixel in the first direction.
[0037] Furthermore, the computation unit computes median values of the pixel quotient values
with respect to time and computes attenuation factors of the respective pixel locations
of the image frame. The attenuation factors are computed as an approximate solution
to a system of linear equations in the attenuation factor, the system of linear equations
comprising a first set of linear equations and a second set of linear equations.
[0038] The equations of the first set of equations relate the value of an attenuation factor
at a first pixel location to the value of an attenuation factor at an adjacent or
neighbouring pixel location in the respective first or second direction. In the first
set of linear equations, the median values appear as pre-factor of the attenuation
factors.
[0039] The second set of linear equations determines values of the attenuation factors at
respective pixel locations, which are known by other means, for example by using the
abovementioned artifact detection method.
[0040] Then, the computation unit derives corrected pixel intensities by dividing the observed
pixel intensities, or, in other words, the pixel intensities in the received current
image frame, by the corresponding attenuation factors B(x, y) of the respective pixel
locations.
[0041] Furthermore, the present specification discloses a vehicle camera with the computation
unit for correcting the image artifacts, wherein the vehicle camera is connected to
the input connection of the computation unit.
[0042] The subject matter of the present specification is now explained in further detail
with respect to the following Figures in which
- Figure 1 shows an image of a vehicle camera that contains image contaminations,
- Figure 2 shows a pixel variation measure of the image of Fig. 1 in the x direction,
- Figure 3 shows a pixel variation measure of the image of Fig. 1 in the y direction,
- Figure 4 shows the result of smoothing out the image of Fig. 2 by convolution with a circular filter,
- Figure 5 shows the result of smoothing out the image of Fig. 3 by convolution with a circular filter,
- Figure 6 shows an overall pixel variation measure that is computed from the arrays of Figs. 4 and 5,
- Figure 7 shows the result of thresholding the overall pixel variation measure of Fig. 6,
- Figure 8 shows an image with an overlaid synthetic blur mask,
- Figure 9 shows a corrected image, which is derived from the image of Fig. 8,
- Figure 10 shows a pixel variation measure ξ_x in the x-direction of Fig. 8,
- Figure 11 shows a pixel variation measure ξ_y in the y-direction of Fig. 8,
- Figure 12 shows the synthetic blur mask,
- Figure 13 shows the estimated blur mask,
- Figure 14 shows an original image with artifacts,
- Figure 15 shows a corrected image,
- Figure 16 shows a pixel variation measure ξ_x in the x-direction of Fig. 14,
- Figure 17 shows a pixel variation measure ξ_y in the y-direction of Fig. 14,
- Figure 18 shows an estimated image attenuation or blur mask, and
- Figure 19 shows an image defect correction system according to the present specification.
DETAILED DESCRIPTION
[0043] In the following description, details are provided to describe the embodiments of
the present specification. It shall be apparent to one skilled in the art, however,
that the embodiments may be practised without such details.
[0044] A common assumption in imaging systems is that the radiance emitted from a scene
is observed directly at the sensor. However, there are often physical layers or media
lying between the scene and the imaging system. For example, the lenses of vehicle
cameras, consumer digital cameras, or the front windows of security cameras often
accumulate various types of contaminants over time such as fingerprints, dust and
dirt. Also, the exposure of cameras to aggressive environments can cause defects in
the optical path, like stone chips, rifts or scratches at the camera lens. Artifacts
from a dirty camera lens are shown in Fig. 1.
[0045] These artifacts can be disturbing for users and can seriously impair the analysis
of the scene by automatic methods. For example, in the automotive area, the wrong
analysis of a scene may lead to the turning off or malfunction of safety systems
when they are needed in case of an accident, to false alarms, or to unwanted action
from the vehicle like an erroneous automatic brake action. These artifacts can cause
potentially life-threatening situations.
[0046] One possible prevention measure against dirty lenses is to clean them at pre-determined
times. However, because many camera systems are automated and are not often inspected,
such as many automotive systems, an automatic way of detecting such artifacts is needed.
Similarly, an automatic way of detecting lens damage that causes image artifacts is
needed. A method that detects issues on the lens can notify a human that the lens
needs attention, or it can disable the methods that follow it or notify them that
a particular part of the image is not usable.
[0047] In cases where it is not practical or commercially viable to clean or change the
camera, such as for applications like outdoor security cameras, underwater cameras
or automotive cameras or for videos captured in the past, a computational algorithm
according to the present specification may provide advantages by artificially removing
the artifacts caused by dirt or by a lightly damaged lens, so that the methods analyzing
the images can operate properly.
[0048] Unlike image inpainting and hole-filling methods, an algorithm according to the present
specification makes use of a computational model for the process of image formation
to detect that the lens is dirty or directly recover the image information, in particular
those image points which are still partially visible in the captured images.
[0049] Artifacts caused by dirt and lens damage as well as artifacts caused by obstructions
can be described using an image formation model in which the scene radiance is reduced,
either by attenuation, in the case of lens dirt or light lens damage, or, in the case
of occluders, by obstruction. In general, attenuation tends to make the affected regions
darker. Because of camera defocus, this attenuation is smoothly varying and the high
frequencies in the original scene radiance are partially preserved in the degraded
images.
[0050] This can be seen in Fig. 1, where the edges of the background are still partially
visible on the degraded image.
[0051] The current specification discloses two types of image correction methods, which
make use of these observations. According to a first type of method, a location
where the lens contains attenuation or occluding-type artifacts is detected. According
to a second type of method, the amount by which the images are attenuated at each
pixel is detected and an estimate of the artifact-free image is obtained. The methods
use only the information measured from a sequence of images, which is obtained in
an automated way. They make use of temporal information but require only a small number
of frames to achieve a solution. The methods according to the present specification
do not require that the images are totally uncorrelated, but only that there is some
movement, such as that expected in, for example, a moving vehicle. The methods work
best when the statistics of the images being captured obey natural image statistics.
[0052] There are few methods in the literature that deal with this issue. The method SIGGRAPH
according to the reference "Removing Image Artifacts Due to Dirty Camera Lenses and
Thin Occluders", by J. Gu, R. Ramamoorthi, P.N. Belhumeur and S.K. Nayar, in ACM Transactions
on Graphics (Proceedings of SIGGRAPH Asia), Dec. 2009, attempts to detect and correct
the artifacts in image sequences but requires that the outputs of computing the mean
image and the mean image derivative magnitude over time are mostly constant valued
image-sized arrays except where artifacts are located (see Fig. 6 b) and c) of the
aforementioned paper).
[0053] This means that these quantities are only usable after a very large number of frames,
i.e., a long time, and that the scenes have to be very diverse and uncorrelated. While
the first condition imposes a long detection time, the latter one is typically not
applicable at all, since the scenes in most real-life scenarios always have about
the same type of content, e.g., a road below and sky above for automotive applications.
The authors themselves recognize that lighting is typically unevenly distributed and
propose a solution with inferior quality.
[0054] Image inpainting and hole-filling techniques assume that the locations of the artifacts
are known and then replace the affected areas with a synthesized estimate obtained
from the neighboring regions. By contrast, a correction method according to the present
specification makes use of information of the original scene that is still partially
accessible to recover the original scene. In many cases, the result is more faithful
to the actual structure of the original unobserved image. In areas where the image
is totally obstructed, inpainting methods can be used.
[0055] The reference "Removal of Image Artifacts Due to Sensor Dust" by C. Zhou and S. Lin,
Association for Computing Machinery, Inc., June 2007 describes reducing the appearance
of dust in an image by first formulating a model of artifact formation due to sensor
dust and then using contextual information in the image and a color consistency constraint.
This method has a very narrow application range, i.e., the detection of dust particles,
and minimizes a non-convex function, which may be computationally intensive and unstable.
[0056] Finally, some methods detect areas in the image that rarely contain high frequencies.
Although drops of water on the lens and obstructing dirt have this effect, attenuating
artifacts exhibit a transparency-type effect that lets many high frequencies from
the scene pass through. This means that such areas would not be detected.
[0057] Figs. 1 to 7 illustrate a method for detecting image attenuations according to a
first embodiment of the present specification. Figs. 8 to 18 illustrate a method for
correcting image contaminations according to a second embodiment of the present specification.
[0058] In the Figures 1 - 18, the pixel numbers in the x-direction are indicated on the
x-axis and the pixel numbers in y-direction are indicated on the y-axis. The image
format of the image in Figs. 1 - 18 is about 1280 x 800 pixels.
[0059] According to a first embodiment, a detection method is disclosed, which is suitable
for detecting if there is a disturbance in the optical path caused by attenuating
or obstructing elements. The model for describing attenuating or obstructing elements
is:

$$I_f(x,y) = I_{0f}(x,y) \cdot B(x,y),$$

where $I_f$ is the observed image with artifacts, the index "f", which is also referred
to as time index "t", is a frame index that numbers the image frames in the order of
their recording, $I_{0f}$ is the original unobserved image and $B \in [0,1]$ is the
attenuation mask, where 0 indicates total obstruction and 1 no obstruction.
The intensity "I" refers to luminance values, but similar processing can be done in
RGB or in other color spaces. Computing the horizontal derivative of the previous
equation leads to

$$I_f(x+1,y) - I_f(x,y) = I_{0f}(x+1,y)\,B(x+1,y) - I_{0f}(x,y)\,B(x,y),$$

wherein x and y are respective horizontal and vertical pixel indices and the pixels
are numbered consecutively in the vertical and the horizontal directions.
[0060] In cases where there is no change in the attenuation mask, i.e.,
$B(x+1,y) \cong B(x,y)$, this equation becomes

$$I_f(x+1,y) - I_f(x,y) \cong B(x,y)\,\bigl(I_{0f}(x+1,y) - I_{0f}(x,y)\bigr).$$
[0061] According to natural image statistics, pixel intensities vary very little between
consecutive pixels in most of the image, with very few exceptions. This is the principle
behind JPEG compression, which works by not transmitting the high-frequency components
of the image, i.e., the variations, for most of the image. Similarly, methods for many
ill-posed problems such as image restoration or other recovery methods impose that the lasso,
also known as "least absolute shrinkage and selection operator", or the
L1-norm of the image derivatives is minimized, which reflects the observation that
most derivative values have about zero magnitude and only a few exceptions occur.
[0062] Considering that each pixel intensity value $I_{0f}(x,y)$ is given by the addition
of an idealized value and noise following a normal distribution $\mathcal{N}(0,\sigma^2)$
with zero mean and some variance, a typical model, the smooth variation of natural
images can be represented as

$$I_{0f}(x+1,y) - I_{0f}(x,y) \sim \mathcal{N}(0,\sigma^2),$$

which implies that

$$I_f(x+1,y) - I_f(x,y) \sim \mathcal{N}\bigl(0, B(x,y)^2\sigma^2\bigr),$$

meaning, among other things, that the derivatives of $I_f(x,y)$ have about the same
amount of positive and negative values.
[0063] These considerations are used to detect that a variation exists in
B at a particular location and direction of the image, by counting the average amount
of positive minus negative transitions that occur. If there is a predominance of positive
or negative transitions by, say, 20%, there is a variation in the value of
B and, therefore, attenuation. This is done by first computing the transition type,

$$S_x^f(x,y) = \begin{cases} +1, & I_f(x+1,y) - I_f(x,y) > T, \\ \phantom{+}0, & |I_f(x+1,y) - I_f(x,y)| \le T, \\ -1, & I_f(x+1,y) - I_f(x,y) < -T, \end{cases}$$

where T is a threshold. A threshold T = 0 is used in the experimental analysis to
obtain the Figures 2 and 3. The corrected Figures 9 and 15, the time averaged transition
magnitudes of Figs. 6 and 7 and the estimated blur masks of Figs. 13 and 18 have been
obtained with a moving camera and after applying the method for a few frames.
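[0064] Then, an Infinite Impulse Response (IIR) filter in time is used to accumulate the
transitions,

$$\bar{S}_x^f(x,y) = \alpha\,\bar{S}_x^{f-1}(x,y) + (1-\alpha)\,S_x^f(x,y),$$

where $S_x^f(x,y)$ denotes the transition type in the x-direction, $\bar{S}_x^f(x,y)$
denotes its accumulated value, the superscript f and the subscript f indicate a frame
number and α is the feedback filter coefficient of the IIR filter.
$\bar{S}_x^0(x,y)$ may be initialized with 0.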
[0065] One way of determining α as a function of a number of frames F is given by determining
a value of α that makes a positive detection achieve a value of 0.95 after filtering
with F frames, which can be shown to be given by the expression

$$\alpha = \exp\!\left(\frac{\ln(1-0.95)}{F}\right) = 0.05^{1/F}.$$
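[0066] Using about F = 600 frames for update time, which is equivalent to about 20 seconds,
a value of α = 0.99 is reached, which is used in the experimental analysis.
The accumulated value $\bar{S}_y^f(x,y)$ for the y-direction is computed in an analogous way as

$$S_y^f(x,y) = \begin{cases} +1, & I_f(x,y+1) - I_f(x,y) > T, \\ \phantom{+}0, & |I_f(x,y+1) - I_f(x,y)| \le T, \\ -1, & I_f(x,y+1) - I_f(x,y) < -T, \end{cases}$$

and

$$\bar{S}_y^f(x,y) = \alpha\,\bar{S}_y^{f-1}(x,y) + (1-\alpha)\,S_y^f(x,y).$$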
[0067] The results of computing the arrays $\bar{S}_x^f(x,y)$ and $\bar{S}_y^f(x,y)$ of
accumulated transition values in the x- and y-directions for the image of Fig. 1 are
shown in Figs. 2 and 3, respectively. In Figs. 2 and 3, a black colour signifies
a negative transition, a white colour signifies a positive transition and a grey
colour signifies no transition.
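The choice of the feedback coefficient α described above can be checked numerically. The sketch assumes the weighted-sum update with weights α and 1 - α, an accumulator that starts at 0 and a constant positive transition of value 1, so that the accumulated value after F frames equals 1 - α^F. The function name `feedback_coefficient` is hypothetical.

```python
import math


def feedback_coefficient(num_frames, target=0.95):
    """Return alpha such that 1 - alpha**num_frames equals the target value."""
    return math.exp(math.log(1.0 - target) / num_frames)


alpha = feedback_coefficient(600)

# Simulate 600 frames with a persistent positive transition at one pixel.
acc = 0.0
for _ in range(600):
    acc = alpha * acc + (1.0 - alpha) * 1.0
print(alpha, acc)
```

Under these assumptions, F = 600 yields a coefficient just above 0.99, and the simulated accumulator reaches the target value of 0.95 after 600 frames.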
[0068] In the particular application of detecting attenuations caused by dirty lenses, the
attenuation is out of focus. This means that B varies smoothly and a transition at
a particular pixel should be accompanied by similarly signed transitions at pixels
nearby.
[0069] To constrain the method so that it only finds smoothly varying attenuation maps,
the accumulated arrays $\bar{S}_x^f(x,y)$ and $\bar{S}_y^f(x,y)$ are blurred with a circular
filter K whose coefficients add up to 1,

$$\bar{S}_x^f(x,y) \leftarrow \bar{S}_x^f(x,y) * K$$

and

$$\bar{S}_y^f(x,y) \leftarrow \bar{S}_y^f(x,y) * K,$$

wherein * denotes the convolution, and the expressions left of the arrows refer to
the results of the convolution. The intensity values of the resulting smoothed out
arrays $\bar{S}_x^f(x,y)$ and $\bar{S}_y^f(x,y)$ are illustrated in Figs. 4 and 5,
respectively, if the original image is given by Fig. 1. Isolated black and white
pixels and stripe shaped artifacts, which are still present in Figs. 2 and 3, are
suppressed or eliminated in Figs. 4 and 5, and the light and dark regions are more
contiguous and have smoother boundaries.
[0070] Herein, a "circular filter" refers to a filter that is circularly symmetric with
respect to the spatial dimensions x and y. A symmetric multivariate Gaussian filter
or a Mexican-hat shaped filter are examples of circular filters. Naturally, any filter
shape and type can be used, depending on image resolution and camera and filter properties.
[0071] Then the overall magnitude $S^f(x,y)$ of a transition at the pixel location (x,y)
is computed as the Euclidean norm of the individual magnitudes for the x- and y-directions:

$$S^f(x,y) = \sqrt{\bar{S}_x^f(x,y)^2 + \bar{S}_y^f(x,y)^2},$$

where $\bar{S}_x^f(x,y)$ and $\bar{S}_y^f(x,y)$ are the smoothed and accumulated transition
values, and a transition exists if $S^f(x,y) \ge T_2$. In the experimental analysis
of Fig. 7, a threshold $T_2$ = 0.2 is used. The computation of the sign, the addition
for many pixels (in this case) and a threshold is denoted in the robust statistics
literature as the sign test. Fig. 6 shows the intensities of the array $S^f(x,y)$,
and Fig. 7 shows the thresholded array $S^f(x,y)$, when the recorded image is provided
by Fig. 1.
[0072] Fig. 7 shows that the algorithm detects dirt regions but also other time independent
features with strongly varying intensities such as the lens border and the border
of the car from which the image is taken. Features like the car border and the lens
border are always present in the image and can be identified and masked out easily.
Conversely, the thresholding according to Fig. 7 can also be used to identify image
portions which are not affected by dirt, scratches and the like.
Second embodiment: correcting the attenuation
[0073] According to a second embodiment of the present specification, a method is disclosed
for determining an amount of attenuation and for obtaining an estimate of the artifact-free
image based on the determined amount of attenuation. This embodiment is illustrated
in the Figs. 8 - 18. Fig. 8 shows an image with an overlaid artificial contamination
with a blur mask that comprises the letters "t e s t". Fig. 9 shows a recovered image,
which is derived from the image of Fig. 8 according to the below mentioned image recovery
algorithm. Fig. 10 shows a pixel variation measure ξ_x in the x-direction of Fig. 8
and Fig. 11 shows a pixel variation measure ξ_y in the y-direction of Fig. 8. The
computation of the pixel variation measure is explained further below.
[0074] Fig. 12 shows the actual blur mask and Fig. 13 shows the estimated blur mask, which
is obtained by solving the below mentioned equation (19). The final result of Fig.
9 is obtained by solving the below mentioned equation (21).
[0075] Figs. 14 - 18 show the analogous results to Figs. 8 to 13 using the original image
and a real contamination instead of an artificial blur mask. Fig. 14 shows the original
image, Fig. 15 shows the corrected image using the below mentioned image correction
method. Fig. 16 shows a pixel variation measure ξ_x in the x-direction of Fig. 14
and Fig. 17 shows a pixel variation measure ξ_y in the y-direction of Fig. 14. Figure 18
shows an estimated blur mask or attenuation.
[0076] If natural image statistics hold and pixel intensities vary very little between
consecutive pixels in most of the image, with very few exceptions, the intensities
of neighboring pixels in the uncontaminated image are approximately equal,
$$I_{0f}(x + 1, y) \cong I_{0f}(x, y),$$
which means that a non-zero derivative at this pixel is caused by the influence of
the attenuation factor B. Thereby the derivative equation becomes, in the pixels where
this assumption holds,

[0077] The previous equation shows that, in locations where the image varies smoothly, the
quantity ξx(x,y) depends only on B, which is constant in time during the recording of
the video. Therefore, ξx(x,y) is also constant in time. If ξx(x,y) is not constant, it
is because the initial assumption that the image varies smoothly is failing at that
particular pixel and frame.
[0078] According to natural image statistics, this occurs rarely in natural images. The
method according to the current specification takes this into account by treating
these deviating values as outlier values of ξx(x,y) with respect to time. To estimate
the central value of ξx(x,y), many techniques that deal with outliers can be used, but
arguably the best one is the median, which is a robust statistic with a breakdown
point of 50%.
[0079] The estimation of the median value of ξx(x,y) according to the definition of the
median requires storing many frames and then, for each pixel, sorting the pixel intensity
values at position (x,y) and obtaining the central one. This is in general not a
practicable solution. According to the present specification an approximation to the
median is computed instead, according to a method which is described below.
Streaming median
[0080] Consider a one-dimensional sequence of numbers p(t), t = 0, 1, 2, ..., and suppose
that we want to estimate an approximation m(t) of the median of all points observed
up to the last observation t. Then an approximation of the median can be calculated
according to the following method. A starting value m(-1) of the median estimate m(t)
is initialized with some value (e.g., zero) and then, for each new observation p(t),
compute
$$m(t) = \begin{cases} m(t-1) + \Delta & \text{if } p(t) > m(t-1), \\ m(t-1) - \Delta & \text{if } p(t) < m(t-1), \\ m(t-1) & \text{otherwise,} \end{cases}$$
where Δ is a suitably chosen value and t is a time index, such as the frame index
f. This method does not require that all previous values of m are stored and performs
only one comparison and one addition per point and observation, which is very efficient
from a computational and storage point of view. Also, as t → ∞,
m(t) → median({p(0), ..., p(t)}), or, in other words, the median estimate approaches
the real value of the median for sufficiently large values of t.
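The update rule of the streaming median can be sketched in a few lines of Python. This is a minimal illustration, not the implementation of the embodiment: the function name `streaming_median`, the synthetic data and the chosen value of Δ (`delta`) are assumptions made for the example.

```python
import random
import statistics


def streaming_median(observations, delta, m_init=0.0):
    """Approximate the median of a stream without storing past observations."""
    m = m_init  # plays the role of the starting value m(-1)
    for p in observations:
        if p > m:
            m += delta  # observation above the estimate: step up
        elif p < m:
            m -= delta  # observation below the estimate: step down
        # p == m leaves the estimate unchanged
    return m


# Synthetic stream (illustrative data): values near 2.0, every tenth one an outlier.
random.seed(0)
samples = [random.gauss(2.0, 0.1) for _ in range(5000)]
for i in range(0, len(samples), 10):
    samples[i] = 50.0

estimate = streaming_median(samples, delta=0.01)
exact = statistics.median(samples)
print(f"streaming estimate: {estimate:.3f}, exact median: {exact:.3f}")
```

In this example the outliers pull the mean of the stream to roughly 6.8, whereas the streaming estimate stays close to the exact median, which illustrates why a median-type statistic is suited to rejecting the rare frames in which the smoothness assumption fails.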
[0081] Concerning the value of Δ, if Δ is too small, m(t) will tend towards the real value
of the median too slowly. If Δ is too large, it will tend towards the value of the
real median quickly but will then oscillate too much.
[0082] Although a constant value of Δ, which was obtained experimentally, is used in the
exemplary embodiment of Figs. 8 - 18, a possible approach could consist of starting
with a large Δ for fast convergence and then, once m(t) stabilizes, switching to a
small Δ for increased precision.
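One simple way to realize such a two-stage choice of Δ is sketched below. The switching criterion used here, a fixed number of observations `switch_after`, is an assumption made for brevity; the specification does not state how stabilization of m(t) would be detected, and all names and values are illustrative.

```python
import random


def two_stage_streaming_median(observations, delta_coarse, delta_fine,
                               switch_after, m_init=0.0):
    """Streaming median with a coarse step first and a fine step afterwards."""
    m = m_init
    for t, p in enumerate(observations):
        # Assumed schedule: coarse step for the first `switch_after` samples.
        delta = delta_coarse if t < switch_after else delta_fine
        if p > m:
            m += delta
        elif p < m:
            m -= delta
    return m


random.seed(1)
stream = [random.gauss(100.0, 1.0) for _ in range(4000)]
refined = two_stage_streaming_median(stream, delta_coarse=1.0, delta_fine=0.01,
                                     switch_after=500)
print(f"refined estimate: {refined:.2f}")
```

With the coarse step alone the estimate would reach the neighbourhood of the median quickly but keep jumping by a full unit; the fine step applied afterwards reduces that residual oscillation.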
[0083] Approximations of other values that are obtained by analyzing a ranked set of data
can be obtained in a similar way.
[0084] By making the upward and downward changes different, other approximations can be
achieved. For example, the first and third quartiles can be computed respectively
as:
[0085] First quartile:

and
[0086] Third quartile:

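The effect of unequal upward and downward changes can be illustrated with the following sketch. The particular step sizes, Δ·q upwards and Δ·(1 - q) downwards for a target quantile q, are an assumption chosen for the example and are not necessarily the step sizes of the equations above; they are one choice for which the estimate settles where a fraction q of the observations lies below it.

```python
import random


def streaming_quantile(observations, q, delta, m_init=0.0):
    """Streaming estimate of the q-quantile using asymmetric steps."""
    m = m_init
    step_up = delta * q          # applied when the observation is above the estimate
    step_down = delta * (1 - q)  # applied when the observation is below the estimate
    for p in observations:
        if p > m:
            m += step_up
        elif p < m:
            m -= step_down
    return m


random.seed(2)
stream = [random.random() for _ in range(20000)]  # uniform on [0, 1)
q1 = streaming_quantile(stream, q=0.25, delta=0.004)
q3 = streaming_quantile(stream, q=0.75, delta=0.004)
print(f"first quartile ~ {q1:.2f}, third quartile ~ {q3:.2f}")
```

The estimate is in balance when the expected upward movement equals the expected downward movement, that is, when P(p > m)·q = P(p < m)·(1 - q), which holds when a fraction q of the observations lies below m.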
Estimating the attenuation factor B
[0087] The attenuation factor B is estimated by using the previously described streaming
median method to obtain an approximation ξ̂x(x,y) of the median value of ξx(x,y) over
time. Using the relationship

obtained before, it follows that

which indicates how the values of B are related to each other. A similar derivation
yields, for the vertical derivatives,

where ξ̂y(x,y) is an estimate of the median of ξy(x,y).
[0088] By setting some values of B to 1 in locations where it is determined that there is
no attenuation or obstruction, a set of constraint equations for B(x, y) is obtained,
$$B(x, y) = 1. \qquad (18)$$
[0089] The pixel locations (x, y) may be obtained, for example, by using the detection method
according to the first embodiment.
[0090] The equations (15), (16) and (18) can be represented in matrix form through the equation
$$S\,b = r, \qquad (19)$$
where b represents the array B reshaped as a column vector with dimensions
(#X × #Y) × 1, wherein the vector b is formed by taking each row of B consecutively,
S is a sparse matrix of size (#constraints) × (#X × #Y) and r is a column vector with
dimensions (#constraints) × 1.
[0091] The number of constraints "#constraints" is equal to the number of constraint equations
(15), (16) and (18). The number of constraints is approximately (#X-1)*#Y horizontal
constraints plus (#Y-1)*#X vertical constraints plus N constraints for N points in
which B is known.
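As a worked illustration with hypothetical frame dimensions #X = 640 and #Y = 480 (these values are an assumption for the example and are not taken from the specification), the constraint count evaluates to:

```latex
\begin{aligned}
(\#X - 1)\cdot \#Y &= 639 \cdot 480 = 306{,}720 \quad \text{horizontal constraints},\\
(\#Y - 1)\cdot \#X &= 479 \cdot 640 = 306{,}560 \quad \text{vertical constraints},\\
\#\text{constraints} &\approx 613{,}280 + N .
\end{aligned}
```

The vector b then has 640 · 480 = 307,200 entries, so that S has about twice as many rows as columns.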
[0092] The matrix S is obtained by writing the constraints of equations (15), (16) and (18)
into S. Each constraint is written into a row l of the sparse matrix S, wherein entries
of S which are not assigned a value have a value of zero. In particular, the matrix
S can be stored efficiently in computer memory by only storing non-zero coefficients.
[0093] According to one embodiment, the matrix S is constructed as follows. The counter
variable l is initialized with the value 1 and is incremented with every new constraint
that is added, and the coordinate (x, y) traverses the pixel locations row by row starting
with (x, y) = (1, 1). If there is a constraint for (x, y) from equation (15), coordinate
(l, (y-1)*#X + x + 1) of S is set to 1, coordinate (l, (y-1)*#X + x) of S is set to

and coordinate (l, 1) of r is set to 0. After adding this constraint, l is incremented
by 1.
[0094] If there is a constraint for (x, y) from equation (16), coordinate (l, y*#X + x)
of S is set to 1, coordinate (l, (y-1)*#X + x) of S is set to

and coordinate (l, 1) of r is set to 0. After adding this constraint, l is incremented
by 1. If there is a constraint for (x, y) from equation (18), the coordinate (l, (y-1)*#X
+ x) of S is set to 1 and the coordinate (l, 1) of r is set to 1, and l is incremented
by 1. Then, (x, y) is set to the next value and the procedure is repeated.
[0095] The resulting equation (19) may in general be overdetermined and is not solved directly.
Instead, both sides of equation (19) are multiplied by $S^T$ from the left, thereby
obtaining the symmetric matrix $S^T S$:
$$S^T S\, b = S^T r.$$
This equation is also known as the normal equation in the context of a least squares
approximation. The normal equation is solved approximately with an iterative method,
thereby obtaining the vector b. For example, the iterative method may be provided
by a least squares solver, such as the conjugate gradient method, which approximates
the vector b that minimizes the quadratic form
$$\lVert S\,b - r \rVert^2.$$
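The construction of the sparse system and its solution through the normal equation can be sketched as follows. This is an illustration under stated assumptions rather than the implementation of the embodiment: the horizontal and vertical constraints are assumed here to take the form B(x+1, y) - ξ̂x(x, y)·B(x, y) = 0 and B(x, y+1) - ξ̂y(x, y)·B(x, y) = 0, which corresponds to treating ξ as a plain intensity ratio of neighbouring pixels; indices are 0-based instead of 1-based; and all function names are hypothetical.

```python
def build_constraints(xi_x, xi_y, clean_pixels, width, height):
    """Return the sparse rows of S and the right-hand side r (0-based, row-major)."""
    def index(x, y):
        return y * width + x

    rows, r = [], []
    for y in range(height):
        for x in range(width):
            if x + 1 < width:   # assumed horizontal constraint
                rows.append([(index(x + 1, y), 1.0), (index(x, y), -xi_x[y][x])])
                r.append(0.0)
            if y + 1 < height:  # assumed vertical constraint
                rows.append([(index(x, y + 1), 1.0), (index(x, y), -xi_y[y][x])])
                r.append(0.0)
            if (x, y) in clean_pixels:  # known value B(x, y) = 1
                rows.append([(index(x, y), 1.0)])
                r.append(1.0)
    return rows, r


def apply_S(rows, b):
    return [sum(c * b[j] for j, c in row) for row in rows]


def apply_St(rows, v, n):
    out = [0.0] * n
    for row, vi in zip(rows, v):
        for j, c in row:
            out[j] += c * vi
    return out


def solve_normal_equations(rows, r, n, iterations=200, tol=1e-20):
    """Conjugate gradient iteration on S^T S b = S^T r, starting from b = 0."""
    b = [0.0] * n
    res = apply_St(rows, r, n)  # residual of the normal equation for b = 0
    d = res[:]
    rs = sum(v * v for v in res)
    for _ in range(iterations):
        if rs < tol:
            break
        q = apply_St(rows, apply_S(rows, d), n)  # (S^T S) d without forming S^T S
        alpha = rs / sum(di * qi for di, qi in zip(d, q))
        b = [bi + alpha * di for bi, di in zip(b, d)]
        res = [ri - alpha * qi for ri, qi in zip(res, q)]
        rs_new = sum(v * v for v in res)
        d = [ri + (rs_new / rs) * di for ri, di in zip(res, d)]
        rs = rs_new
    return b


# Synthetic check: derive the ratios from a known attenuation and recover it.
B_true = [[1.0, 0.8, 0.5, 1.0],
          [1.0, 0.6, 0.4, 1.0],
          [1.0, 1.0, 1.0, 1.0]]
height, width = len(B_true), len(B_true[0])
xi_x = [[B_true[y][x + 1] / B_true[y][x] if x + 1 < width else 1.0
         for x in range(width)] for y in range(height)]
xi_y = [[B_true[y + 1][x] / B_true[y][x] if y + 1 < height else 1.0
         for x in range(width)] for y in range(height)]
clean = {(x, y) for y in range(height) for x in range(width) if B_true[y][x] == 1.0}
rows, r = build_constraints(xi_x, xi_y, clean, width, height)
b = solve_normal_equations(rows, r, width * height)
B_est = [b[y * width:(y + 1) * width] for y in range(height)]
```

Storing each row of S as a short list of (column, coefficient) pairs keeps only the non-zero coefficients, and the product with S^T S is evaluated as two sparse products, so the dense matrix S^T S is never formed.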
[0096] The array B is obtained from the column vector b by reshaping the vector b back into
array form. The unobserved image is estimated simply by dividing each pixel of the
observed image by the estimated B for that pixel,
$$\hat{I}_{0f}(x, y) = \frac{I_f(x, y)}{B(x, y)}, \qquad (21)$$
for pixels (x, y) with 0 < B(x, y) < 1. Thereby, an attenuation B(x, y) can be compensated.
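The final division step may be sketched as follows, assuming that the frame and the estimated attenuation are given as nested lists of equal shape. The function name `correct_frame` and the example values are illustrative; pixels whose attenuation lies outside the open interval (0, 1) are simply passed through unchanged in this sketch.

```python
def correct_frame(observed, attenuation):
    """Divide each observed pixel by its estimated attenuation where 0 < B < 1."""
    corrected = []
    for row_i, row_b in zip(observed, attenuation):
        corrected.append([
            i / b if 0.0 < b < 1.0 else i  # other pixels are left as observed
            for i, b in zip(row_i, row_b)
        ])
    return corrected


observed = [[50.0, 100.0], [30.0, 80.0]]
attenuation = [[0.5, 1.0], [0.25, 0.0]]
corrected = correct_frame(observed, attenuation)
```

In practice, very small values of B would amplify sensor noise when used as a divisor, so an implementation might additionally impose a lower bound on B; such a bound is not part of the method as described.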
[0097] According to a modified embodiment, constraint equations of equation (18) that are
not needed are identified and are not included in the matrix S. For example, an
algorithm may identify boundary regions of the artifacts and exclude points (x, y)
outside the boundary regions from the equations (18) and from the vector b. Or, conversely,
an algorithm may be used to identify interior regions with no attenuation, B = 1, and
exclude the points of the interior regions from the equations (18) and from the vector
b.
[0098] Preferably, at least one constraint equation (18) is provided for each row of the
image frames and, if present, for each column of the image frames. Thereby, the one
or more known attenuation values B(x, y) can be used to find the attenuation using
equations (15) and (16) in the pixel locations in which the attenuation is not known
beforehand.
[0099] Fig. 19 shows, by way of example, an image defect correction system 10 according
to the present application. A sensor surface 12 of a video camera is connected to
an image capture unit 13 which is connected to a video buffer 14. An artifact detection
unit 15 and an artifact correction unit 16 are connected to the video buffer 14. A
display 17 is connected to the artifact correction unit 16. The dashed arrow indicates
an optional use of an output of the artifact detection unit 15 as input for the artifact
correction unit 16.
[0100] Furthermore, an image evaluation unit 19 is connected to the artifact correction
unit 16. Various driver assistance units such as a brake assistant unit 20, a parking
assistant unit 21 and a traffic sign detection unit 22 are connected to the image
evaluation unit 19. The display 18 is connected to the units 20, 21, 22 for displaying
output data of the units 20, 21 and 22.
[0101] The artifact detection unit 15 is operative to execute an artifact detection according
to the first embodiment of the present specification and the artifact correction unit
16 is operative to execute an artifact correction method according to the second embodiment
of the present specification, for example by providing a computing means such as a
microprocessor, an integrated circuit, an ASIC, a computer readable memory for storing
data and computer executable code etc.
[0102] Although the above description contains much specificity, this should not be construed
as limiting the scope of the embodiments but merely as providing illustration of the
foreseeable embodiments. In particular, the above stated advantages of the embodiments
should not be construed as limiting the scope of the embodiments but merely as explaining
possible achievements if the described embodiments are put into practice. Thus, the
scope of the embodiments should be determined by the claims and their equivalents,
rather than by the examples given.
[0103] Among others, the pixel matrix may be traversed column-wise instead of row by row
and the direction of traversing the matrix may be reversed. The constraint equation
for the attenuation may be expressed in terms of the preceding pixel (x, y-1) or (x-1,
y) instead of being expressed in terms of the next pixel (x, y + 1) or (x + 1, y).
In this case, there is no constraint equation for the first column or for the first
row, respectively.
1. A method for detecting image artifacts, comprising
- receiving image data from a vehicle camera, the image data comprising image frames,
- comparing intensity differences between neighbouring pixels in a first direction
with a pre-determined upper threshold and with a pre-determined lower threshold,
- setting a pixel transition value to a first value when the intensity difference
is greater than the pre-determined upper threshold, setting the pixel transition value
to a second value when the intensity difference is less than the pre-determined lower
threshold and setting the pixel transition value to zero when the intensity difference
lies between the pre-determined upper threshold and the pre-determined lower threshold,
- computing accumulated pixel transition values of the pixel transition values by
applying a low pass filter with respect to a frame index,
- computing smoothed pixel transition values by applying a spatial filter with respect
to pixel locations,
- computing a magnitude value of the smoothed and accumulated pixel transition values
for the pixel locations of the image frame,
- identifying pixels of potential artifact regions by comparing the magnitude value
with a predetermined detection threshold.
2. The method of claim 1, comprising
- comparing intensity differences between neighbouring pixels in a second direction
with the pre-determined upper threshold and with the pre-determined lower threshold,
- setting the pixel transition value to the first value when the intensity difference
is greater than the pre-determined upper threshold, setting the pixel transition value
to the second value when the intensity difference is less than the pre-determined
lower threshold and setting the pixel transition value to zero when the intensity
difference lies between the pre-determined upper threshold and the pre-determined
lower threshold.
3. The method of claim 1 or claim 2, wherein the computation of output values of the low pass
filter comprises computing an accumulated value at a frame index f for the respective
first or second direction as a weighted sum of the accumulated value at the earlier
frame index f - 1 and the smoothed pixel transition value at the frame index f.
4. The method of claim 3, wherein a weight factor of the accumulated value at the earlier
frame index f - 1 is at least 0.95.
5. A method for correcting image artifacts, comprising
- receiving image data from a vehicle camera, the image data comprising image frames,
- computing pixel quotient values in a first direction, wherein the pixel quotient
values are derived from a quotient, the numerator of the quotient comprising an image
intensity at a given pixel location and the denominator comprising an image intensity
at a neighbouring pixel in the first direction,
- computing median values of the pixel quotient values with respect to time,
- computing attenuation factors of pixel locations as an approximate solution to a
system of linear equations in the attenuation factor, the system of linear equations
comprising a first set of linear equations, in which the median values appear as pre-factor
of the attenuation factors, and a second set of linear equations, which determine
values of the attenuation factors at respective pixel locations,
- deriving corrected pixel intensities by dividing the observed pixel intensities
by the corresponding attenuation factors B(x, y) of the respective pixel locations.
6. The method according to claim 5, comprising computing pixel quotient values in a second
direction, wherein the pixel quotient values are derived from a quotient, the numerator
of the quotient comprising an image intensity at a given pixel location and the denominator
comprising an image intensity at a neighbouring pixel in the second direction, wherein
the median values are computed for the pixel quotient values in the first direction
and for the pixel quotient values in the second direction.
7. The method according to claim 5, comprising using the method according to one of the
claims 1 to 4 for determining the second set of linear equations.
8. The method according to one of the claims 4 to 6,
wherein the median value is computed as a streaming median value, the streaming median
value being derived from a median value estimate for the previous frame index f -
1 and the pixel quotient value at frame index f.
9. The method according to claim 8, wherein the streaming median value of the current
frame index is computed by adding a pre-determined value delta to the previous estimate
if the pixel quotient value is greater than the previous streaming median value, by
subtracting the pre-determined value delta if the pixel quotient value is less than
the previous streaming median value, and by setting the current streaming median value
to the previous streaming median value otherwise.
10. The method according to claim 9, wherein the pre-factors of the attenuation factors
are defined by a constraint matrix, the method comprising multiplying the system of
linear equations with the transposed constraint matrix and solving the resulting system
of linear equations using an iterative method.
11. A computer program product for executing a method according to one of the claims 1
to 10.
12. A computation unit for detecting image artifacts,
the computation unit comprising
an input connection for receiving image data,
an output connection for outputting locations of detected pixels,
wherein the computation unit is operative to
- compare intensity differences between neighbouring pixels in a first direction with
a pre-determined upper threshold and with a pre-determined lower threshold,
- set a pixel transition value to a first value when the intensity difference is greater
than the pre-determined upper threshold, set the pixel transition value to a second
value when the intensity difference is less than the pre-determined lower threshold
and set the pixel transition value to zero when the intensity difference lies
between the pre-determined upper threshold and the pre-determined lower threshold,
- compute accumulated pixel transition values of the pixel transition values by applying
a low pass filter with respect to a frame index,
- compute smoothed pixel transition values by applying a spatial filter with respect to pixel
locations,
- compute a magnitude value of the smoothed and accumulated pixel transition values
for the pixel locations of the image frame,
- identify pixels of potential artifact regions by comparing the magnitude value with
a predetermined detection threshold.
13. A vehicle camera with the computation unit according to claim 12, wherein the vehicle
camera is connected to the input connection of the computation unit.
14. A computation unit for correcting image artifacts,
the computation unit comprising
an input connection for receiving image data,
an output connection for outputting corrected image frames, wherein the computation
unit is operative to
- compute pixel quotient values in a first direction,
wherein the pixel quotient values are derived from a quotient, the numerator of the
quotient comprising an image intensity at a given pixel location and the denominator
comprising an image intensity at a neighbouring pixel in the first direction,
- compute median values of the pixel quotient values with respect to time,
- compute attenuation factors of pixel locations as an approximate solution to a system
of linear equations in the attenuation factor, the system of linear equations comprising
a first set of linear equations, in which the median values appear as pre-factor of
the attenuation factors, and a second set of linear equations, which determine values
of the attenuation factors at respective pixel locations,
- derive corrected pixel intensities by dividing the observed pixel intensities by
the corresponding attenuation factors B(x, y) of the respective pixel locations.
15. A vehicle camera with the computation unit according to claim 14, wherein the vehicle
camera is connected to the input connection of the computation unit.