[0001] This invention relates to an image processing method, to image processing apparatus
configured to operate the method and to a computer readable medium comprising a computer
program for performing the method. It is particularly but not exclusively applicable
to the presentation to an observer of an image in which potential regions of interest
in the image are selected and prioritised.
[0002] An operator of remotely-controlled equipment often is presented with an image as
seen by the equipment or by some other remote image source. The operator is required
to interpret the image, and then to direct the remotely-controlled equipment to respond
appropriately. As an example, the remotely-controlled equipment may be an unmanned
aerial vehicle (UAV) which carries a video camera whose images are transmitted to
an operator on the surface or in another aircraft. The operator is required to interpret
the images and to determine which of the possible features of interest revealed in
the images (hereinafter for convenience "targets") should be engaged or further investigated
by the UAV. Because the images may show many possible targets, it is important that
the operator correctly identifies the most important ones.
[0003] Generally a potential target will present itself in the image as being of contrasting
intensity (brighter or darker) relative to its surroundings. Thus any contrasting
area is potentially of interest, and the present invention is directed, at least in
its preferred embodiments, to assessing and prioritising such areas of the image,
so that the most target-like are presented preferentially to the operator.
[0004] In one aspect, the invention provides an image processing method in which features
of interest are prioritised for presentation to an observer, comprising selecting
features in the image according to total contrast (as herein defined), sorting the
selected features according to size and ranking the sorted features according to their
approximation to a preferred size.
[0005] By "total contrast" we mean the total relative intensity of a feature of interest
(hereinafter the FOI) relative to the surrounding parts of the image. It will be appreciated
that the total relative intensity is a function of the local intensity from point
to point summed over the area of the FOI. Thus an intense but small FOI can have the
same total contrast as a larger but less intense one.
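By way of illustration only, the definition of total contrast can be checked with a few lines of Python. The function name `total_contrast`, the flat background level and the pixel values are assumptions made for this sketch and form no part of the method as described.

```python
def total_contrast(patch, background):
    """Sum of (pixel - background) over every pixel of the FOI patch."""
    return sum(pixel - background for row in patch for pixel in row)


background = 10                      # assumed uniform surround intensity
small_intense = [[100]]              # 1x1 FOI, 90 above the background
large_faint = [[20, 20, 20],         # 3x3 FOI, each pixel 10 above the background
               [20, 20, 20],
               [20, 20, 20]]

print(total_contrast(small_intense, background))
print(total_contrast(large_faint, background))
```

Both patches give the same total, which is why total contrast alone cannot separate a small intense feature from a larger, fainter one.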
[0006] The method may comprise filtering the image so as to emphasize local changes of contrast
therein, and selecting features of interest according to the local magnitude of the
filter response.
[0007] In another aspect, the invention provides an image-processing method comprising convolving
a selected FOI within the image with a mask of a first size, repeating the convolution
with a mask of a second size, and calculating the ratio of the convolution results,
as an indication of the size of the FOI.
[0008] By assessing the feature of interest according to its size, it is possible to recognise
those which are of size appropriate to the targets being sought. For example, it is
possible to distinguish a vehicle-sized target from a much smaller but more intense
feature such as a small fire or a decoy flare, which could provide a return in the
processed image comparable to the lower-intensity but larger return provided by a
vehicle.
[0009] The method may comprise comparing the ratio with a preferred range of ratio values
and assigning to the FOI a score which indicates the closeness of the ratio to a value
associated with a preferred size of FOI.
[0010] The convolution preferably is one which is linearly responsive to contrast changes.
[0011] The convolution may be Laplacian of Gaussian. Alternatively, it may be Difference
of Gaussian or Determinant of Hessian.
[0012] The masks may be squares having sides defined by odd numbers of pixels of the image.
Alternatively they could be other shapes, for example elongated rectangles if the
targets of interest are long and thin.
[0013] The method may be repeated for at least one further selected FOI, and the FOIs may
be ranked according to their ratios or their assigned scores.
[0014] The method may comprise filtering an input image and selecting as a FOI a region
for which the filter response is a local maximum.
[0015] Also it may comprise selecting as a single FOI two said local maxima which fall within
a template representative of the preferred size of FOI.
[0016] The input image may be a band-pass filtered image. The band-pass filtered image may
be produced by convolving an input image with a Gaussian blur and a high-pass filter.
[0017] The method may comprise identifying as a said local maximum a pixel of the filtered
image which has a filter response greater than that of any other pixel which is contiguous
with the pixel.
[0018] The identified local maxima may be ranked in order of filter response magnitude.
[0019] There may be considered for identification as a said local maximum only pixels having
a filter response exceeding a threshold value.
[0020] The threshold value may be adjusted so as to limit the number of pixels considered
for identification as said local maxima.
[0021] Those local maxima whose filter responses exceed a second higher threshold value
may be ranked as equal.
[0022] The second threshold value may be adjusted so that the number of local maxima whose
filter responses exceed that threshold value tends towards a predetermined number.
[0023] A said threshold value may be adjusted by means of an alpha filter.
[0024] A further aspect of the invention provides image processing apparatus configured
to operate a method as set forth above.
[0025] Preferably the prioritised FOIs are highlighted by superimposing them on an image
displayed to the operator.
[0026] The invention also provides a computer-readable medium comprising a computer program
which when installed and operated performs a method as set forth above.
[0027] The invention now will be described merely by way of example with reference to the
accompanying drawings, wherein:
Figure 1 illustrates a typical operational scenario of a UAV,
Figure 2 shows the key algorithmic blocks of an embodiment of the invention,
Figures 3, 4, 5 and 6 illustrate the operation of parts of the algorithm,
Figures 7 and 9 show the logic employed in parts of the algorithm, and
Figure 8 illustrates typical Laplacian of Gaussian components.
[0028] Referring to Figure 1, a UAV 10 is shown over-flying terrain containing a gun emplacement
12, a building 14, an armoured combat vehicle (tank) 16 and a small but intense fire
18.
[0029] As known per se, the UAV has an on-board camera with a field of view 22 which transmits
images back to a ground station, whereat the images are processed by a computer 24
and displayed on a display 26 for interpretation by an operator, who controls the
UAV via a keyboard 28 which typically also includes a joystick.
[0030] Depending on his interpretation of the received images, the operator may (for example)
instruct the UAV to examine one or more of the targets more closely, or to engage
it, or to take no action. The four illustrated targets in practice may be only a small
proportion of a much larger number, and the computer 24 contains image processing
software according to the invention to sort and prioritise the features of interest
seen by the UAV's camera before the images are presented to the operator.
[0031] Referring to Figure 2, the image processing routine in this preferred embodiment
comprises an initial selection 30 of potential points of interest (image features)
followed by a two-stage scoring procedure 32, 34 which results in the display to the
operator of target-like regions of interest, here the gun emplacement and tank 12,
16.
[0032] The initial selection of points of interest is achieved by convolving the received
image with a band-pass filter comprising a Gaussian blur (which provides a band limit)
and a high-pass filter. This will result in a filter response in which local changes
in contrast in the original image are identified and emphasised. Put alternatively,
the band-pass filter acts as a differentiator and provides an indication of the rate
of change of image intensity across the image field. Figure 3 illustrates the filter
response, in which there are a number of potential features of interest represented
by local maxima in the filter response, the three largest being 36, 37 and 38. In
practice there likely will be many such peaks at this stage of the image processing
routine, and the majority of them are discarded in order to focus only on the significant
ones, thereby reducing the data-processing load on the computer 24.
[0033] Each response therefore is compared to a threshold value 40 so that the majority
of responses are rejected immediately, as shown in Figure 3. The response threshold
40 is adjusted based on the number of responses exceeding the threshold in each image
frame, so as to select a manageable number. In Figure 3, only the three prominent
responses 36, 37, 38 would be selected by adjusting the threshold 40.
[0034] Each surviving response is then assessed by comparing the pixels forming the image
feature with their neighbours. Each pixel 42 (Figure 4) is compared with its eight
immediate neighbours 44, and only local maxima are accepted. A local maximum is a
pixel in which the filter response is greater than that in any of the immediately-neighbouring
pixels.
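The thresholding and eight-neighbour test may be sketched as follows, purely as an example. The function name `local_maxima`, the sample response values and the treatment of border pixels (compared only with the neighbours that exist) are assumptions of this sketch; the description does not say how image borders are handled.

```python
def local_maxima(response, threshold):
    """Return (row, col, value) for pixels above threshold that exceed all neighbours."""
    rows, cols = len(response), len(response[0])
    peaks = []
    for r in range(rows):
        for c in range(cols):
            value = response[r][c]
            if value <= threshold:
                continue  # rejected immediately by the threshold
            neighbours = [
                response[rr][cc]
                for rr in range(max(r - 1, 0), min(r + 2, rows))
                for cc in range(max(c - 1, 0), min(c + 2, cols))
                if (rr, cc) != (r, c)
            ]
            if all(value > n for n in neighbours):
                peaks.append((r, c, value))
    # Strongest response first, ready for the ordered list of FOIs.
    return sorted(peaks, key=lambda p: p[2], reverse=True)


response = [
    [0, 1, 0, 0, 0],
    [1, 9, 2, 0, 0],
    [0, 2, 1, 0, 3],
    [0, 0, 0, 5, 4],
    [0, 0, 0, 2, 1],
]
peaks = local_maxima(response, threshold=2)
print(peaks)
```

In this sample the pixels valued 3 and 4 pass the threshold but are rejected because a stronger neighbour (5) adjoins them.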
[0035] The remaining image features are deemed features of interest (FOI) and these are
sorted by response strength, to create an ordered list. Though this step is computationally
intensive, the overall processor load is reduced as a result of the preceding steps.
Local clusters of FOIs are grouped, since a fragmented image feature may generate
multiple FOIs. Each FOI is examined in turn, beginning with the strongest, and any
FOI within a predetermined pixel distance of another, calculated according to the
size of target being sought, is absorbed. For example if the preferred target size
is 3x3 pixels, all local maxima falling within a 3x3 template are deemed to be from
the same target. Put alternatively, in this case, any local maxima lying within two
pixels of each other are assumed to be from the same target. The magnitude of the
strongest response from each group of pixels is maintained. Two grouping examples
are illustrated in Figures 5a and 5b. Pixels 46, 48 stand out from their neighbours
and are one pixel apart diagonally. They fall within a 3x3 template and thus would
be rendered as a single FOI if 3x3 is the desired target size. The position of the
FOI is determined by the average x and y coordinates of the two pixels. Pixels 50,
52 are two pixels that are further apart diagonally and thus give rise to two distinct
FOIs if the desired target size is 3x3. If the preferred target size template is 5x5
pixels however, they would be assimilated as a single FOI.
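A minimal sketch of this grouping step is given below for illustration. It assumes that "within a template" can be approximated by requiring both coordinate offsets from the strongest member to be at most the template side minus one; the function name `group_fois` and the sample coordinates are hypothetical. Averaging the member positions follows the two-pixel example above.

```python
def group_fois(peaks, template_size):
    """Merge local maxima lying within one template of a stronger maximum.

    peaks: list of (row, col, response). Returns (mean_row, mean_col, strongest_response).
    """
    max_offset = template_size - 1          # 3x3 template -> within two pixels
    remaining = sorted(peaks, key=lambda p: p[2], reverse=True)
    groups = []
    while remaining:
        seed = remaining.pop(0)             # strongest unassigned maximum
        members, kept = [seed], []
        for p in remaining:
            close = (abs(p[0] - seed[0]) <= max_offset
                     and abs(p[1] - seed[1]) <= max_offset)
            (members if close else kept).append(p)
        remaining = kept
        mean_row = sum(m[0] for m in members) / len(members)
        mean_col = sum(m[1] for m in members) / len(members)
        groups.append((mean_row, mean_col, seed[2]))  # strongest response is maintained
    return groups


peaks = [(0, 0, 8), (2, 2, 6), (10, 10, 7), (13, 13, 5)]
print(group_fois(peaks, template_size=3))
print(group_fois(peaks, template_size=5))
```

With a 3x3 template the first pair merges and the second pair stays separate; with a 5x5 template both pairs merge, mirroring the two cases of Figures 5a and 5b.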
[0036] The FOIs thereby identified are allocated a score indicating their priority. Maximum
and minimum scores, S_max and S_min respectively, define the range of scores. The scoring
takes place in two steps:
1. Unrefined scores are produced, based on the band-pass filter convolution response.
2. Scores are refined using a ratio of Laplacian of Gaussian convolution responses.
[0037] In the first stage of the prioritization process, each response, R, is compared to
an upper and a further lower threshold, T_u and T_l respectively. The lower threshold T_l
is additional to the threshold 40. Scores, S, are assigned on a linear scale based
on the response of each FOI relative to the threshold values. FOIs with responses
that fall below the lower threshold T_l are discarded, while FOIs with responses above
the upper threshold T_u are capped at the maximum score, S_max. All responses that fall
between the upper and lower thresholds are scored as follows:

\[ S = S_{\min} + (S_{\max} - S_{\min}) \frac{R - T_l}{T_u - T_l} \]
[0038] The upper and lower thresholds are adjusted using an alpha filter, with parameters
dependent on the number of FOIs exceeding each threshold. The generic alpha filter,
for k-1 (k>0) desired responses exceeding threshold T, is given by

\[ T \leftarrow (1 - \alpha)\,T + \alpha R_k \]

where R_k is the kth ranked response. The value of α can be increased to force the
threshold to adapt more quickly. The averaging nature of the mechanism ensures that the
scoring scheme adapts to persistent changes in the image scene without reacting
unnecessarily to individual frame fluctuations. This ensures that scores are consistent
across image frames, but adaptive to differing scenarios.
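As a hedged illustration, the threshold adaptation may be sketched in Python assuming the usual first-order form of an alpha filter, in which the threshold moves a fraction α of the way towards the kth ranked response on each frame. The function name `update_threshold`, the sample responses and the choice to leave the threshold unchanged when fewer than k responses are available are all assumptions of this sketch.

```python
def update_threshold(threshold, ranked_responses, k, alpha):
    """Move the threshold a fraction alpha towards the k-th ranked response.

    ranked_responses must be sorted strongest first. Aiming at the k-th response
    leaves k-1 responses above the threshold once the filter has settled.
    """
    if len(ranked_responses) < k:
        return threshold                    # assumed behaviour: nothing to aim at
    target = ranked_responses[k - 1]
    return (1.0 - alpha) * threshold + alpha * target


threshold = 10.0
frame_responses = [30.0, 20.0, 12.0, 8.0]   # hypothetical ranked responses for one frame
for _ in range(3):                          # the same scene persisting over three frames
    threshold = update_threshold(threshold, frame_responses, k=4, alpha=0.5)
    print(threshold)
```

A larger α reaches the target in fewer frames, at the cost of reacting more strongly to a single unusual frame.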
[0039] The aim is to adjust the upper and lower thresholds to control the number of FOIs
and the associated scores. For the upper threshold, T_u (Figure 6), a large adjustment
is made when the number of FOI responses exceeding the threshold is greater than three
or equal to zero, a small adjustment is made when the threshold is exceeded by one or
two FOI responses, and no change occurs when exactly three FOI responses are above the
threshold, as illustrated in Figure 7. In general this results in two or three FOIs
being given a maximum score.
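The selection of adjustment size for the upper threshold can be expressed, by way of example only, as a choice of α from the count of responses above the threshold. The numerical values of the small and large adjustments are not given in this description; 0.05 and 0.3 below are hypothetical, as is the function name.

```python
def alpha_for_upper_threshold(count_above, desired=3, small=0.05, large=0.3):
    """Pick the alpha-filter gain from the number of responses above the upper threshold."""
    if count_above == desired:
        return 0.0                          # exactly three above: no change
    if count_above == 0 or count_above > desired:
        return large                        # none, or more than three: large adjustment
    return small                            # one or two above: small adjustment


for count in range(6):
    print(count, alpha_for_upper_threshold(count))
```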
[0040] Similarly, the lower threshold is adjusted to allow a set number of FOIs (a number
suited to image size, to avoid cluttering of the image) to achieve at least the minimum
score, S_min, with all FOIs scoring less than S_min being discarded.
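The first-stage (unrefined) scoring may be sketched as below. This is an illustration only: it assumes that the linear scale maps a response at the lower threshold to the minimum score and a response at the upper threshold to the maximum score, and the function name and numerical values are hypothetical.

```python
def unrefined_score(response, t_lower, t_upper, s_min, s_max):
    """Linear score between the thresholds; None means the FOI is discarded."""
    if response < t_lower:
        return None                         # below the lower threshold: discarded
    if response >= t_upper:
        return s_max                        # above the upper threshold: capped
    fraction = (response - t_lower) / (t_upper - t_lower)
    return s_min + fraction * (s_max - s_min)


for r in (5.0, 10.0, 15.0, 25.0):
    print(r, unrefined_score(r, t_lower=10.0, t_upper=20.0, s_min=0.0, s_max=10.0))
```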
[0041] The second stage of the prioritization mechanism, score refinement, is achieved by
convolution with a pair of Laplacian of Gaussian masks of different sizes. Figure
8 illustrates the x and y components of a typical Laplacian of Gaussian mask; light
areas are high values and the darkest areas are zero. This representation indicates
that a strong response would be expected when the x component (to the left in Figure
8) is convolved with an area of the image containing a vertical edge, while the y
component responds to horizontal edges. A combined strong response from each component
is indicative of the presence of a blob in the image.
[0042] Convolving each mask with all image pixels would be time consuming, but at this stage,
the masks need only be convolved at each of the already-identified FOIs. The principle
behind the use of the mask pairs is to reduce the impact of contrast, placing the
emphasis on size-matching. Many filters have the disadvantage of providing similar
responses for small image features of high contrast and larger image features of moderate
contrast. In this embodiment of the invention, however, by inspecting the ratio of
the two responses from the Laplacian of Gaussian mask pairs, the effect of contrast
is reduced. Table 1 lists the theoretical responses of 9x9 pixel and 15x15 pixel masks
for various image feature sizes, each a white square on a black background, where
C is the contrast of the image feature. With images as received in practice, the resulting
ratios are less distinct, but the method nevertheless can provide an efficient means
of eliminating false alarms and aiding prioritization.
Table 1: Laplacian of Gaussian responses for ideal image features
| Mask size (pixels) | 9x9 | 15x15 |
| Sigma | 1.2 | 2.0 |
| Image feature size | | |
| 1x1 | 0.221C | 0.080C |
| 3x3 | 0.764C | 0.511C |
| 5x5 | 0.350C | 0.736C |
| 7x7 | 0.056C | 0.557C |
| 9x9 | 0.004C | 0.272C |
| 11x11 | 0.004C | 0.095C |
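The entries of Table 1 can be explored with the following sketch, offered as an illustration rather than as the implementation used. The mask formula, with its sign convention and 1/(πσ²) normalisation, is an assumption chosen because it appears to reproduce the tabulated 1x1 and 3x3 values; the function names are hypothetical, and the feature is assumed to be an odd-sided square centred on the FOI over a zero background.

```python
import math


def log_mask(size, sigma):
    """Square Laplacian of Gaussian mask, positive at the centre (assumed normalisation)."""
    half = size // 2
    norm = 1.0 / (math.pi * sigma ** 2)
    two_var = 2.0 * sigma ** 2
    return [[norm * (1.0 - (x * x + y * y) / two_var) * math.exp(-(x * x + y * y) / two_var)
             for x in range(-half, half + 1)]
            for y in range(-half, half + 1)]


def square_response(mask, feature, contrast=1.0):
    """Mask response at the centre of a feature x feature square of the given contrast."""
    half = len(mask) // 2
    reach = min(feature // 2, half)         # the mask cannot see beyond its own edge
    weights = sum(mask[half + dy][half + dx]
                  for dy in range(-reach, reach + 1)
                  for dx in range(-reach, reach + 1))
    return contrast * weights


def log_ratio(feature, contrast=1.0):
    """Ratio of the 9x9 (sigma 1.2) response to the 15x15 (sigma 2.0) response."""
    small = square_response(log_mask(9, 1.2), feature, contrast)
    large = square_response(log_mask(15, 2.0), feature, contrast)
    return small / large


for feature in (1, 3, 5, 7):
    print(feature, round(log_ratio(feature, contrast=1.0), 3),
          round(log_ratio(feature, contrast=50.0), 3))
```

Because both responses scale linearly with contrast, the printed ratio for each feature size is the same at either contrast, which is the property the mask pair relies on.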
[0043] The scoring refinement uses the calculated Laplacian of Gaussian ratio to adjust
the FOI score. Using a set of test sequences with marked image feature locations,
test data was collected to find a distribution of and expected ratio for the Laplacian
responses. FOIs with ratios that closely match the expectation are given a large score
increase while FOIs with ratios much further from the expectation are given a score
reduction. For example, with reference to table 1, when seeking image features of
size 3x3, the Laplacian of Gaussian (LoG) ratio for image features of size 1x1 (here
0.221/0.08 = 2.76) would likely be significantly larger than the desired ratio for
a 3x3 image (0.764/0.511=1.50), resulting in a score decrease. Similarly, it can be
seen that features larger than 3x3 pixels yield LoG ratios significantly smaller than
1.5. Thus by reducing the scores of FOIs having LoG ratios outside a preferred range,
e.g. 1 to 2 when seeking targets of size 3x3, emphasis can be given to FOIs of the
preferred size. The range is initially based on the theoretical values for a white
square on a black background, and can be refined using real data. The revised scoring
is maintained within the interval [S_min, S_max] by capping scores at S_max and discarding
any FOIs that now score less than S_min. The result is a prioritized list of FOIs that
favours image features of the expected
size. The refinement logic is illustrated in Figure 9. In operational embodiments
of the invention, these ratios, and corresponding ratios for other image sizes which
the operator may select as of interest, are embodied in look-up tables in a database
46 (figure 1) which are accessed according to the desired target size inputted by
the operator.
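For illustration, the refinement may be sketched using the Table 1 values and the preferred range of 1 to 2 quoted above for 3x3 targets. The two-level bonus and penalty is a simplification assumed for this sketch; the sizes of the increase and reduction, the score limits and the function name `refine_score` are hypothetical.

```python
# Table 1 responses per unit contrast: feature side -> (9x9 mask, 15x15 mask)
TABLE_1 = {1: (0.221, 0.080), 3: (0.764, 0.511), 5: (0.350, 0.736), 7: (0.056, 0.557)}


def refine_score(score, ratio, preferred=(1.0, 2.0), bonus=2.0, penalty=2.0,
                 s_min=1.0, s_max=10.0):
    """Raise the score if the LoG ratio is in the preferred range, otherwise lower it.

    Returns None when the refined score drops below s_min (FOI discarded).
    """
    low, high = preferred
    refined = score + bonus if low <= ratio <= high else score - penalty
    if refined < s_min:
        return None
    return min(refined, s_max)              # cap at the maximum score


for side, (small_mask, large_mask) in TABLE_1.items():
    ratio = small_mask / large_mask
    print(side, round(ratio, 2), refine_score(5.0, ratio))
```

Only the 3x3 feature has a ratio inside the range, so it alone gains score; the 1x1 feature (ratio about 2.76) and the larger features are marked down.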
[0044] The final stage of the algorithm involves the selection of features of interest 48,
Figure 2, based on the refined score. The result is a list of contrasting features
of interest selected according to size, which can be determined by a single parameter
inputted by the operator. These FOIs can conveniently be displayed to the operator
by superimposing them on the original displayed image with on-screen tags so that
those of highest interest are immediately evident.
[0045] Considering the theoretical basis of the invention further, it is a characteristic
of scale-invariant feature transform (SIFT) operations that the response of a filter
is sensitive to both size and contrast, where the band-pass of the filter is selected
to give a peak response for objects of a chosen size. Problems can occur when an object,
which is not of the chosen size, has high contrast such that a filter develops
a significant response which exceeds the response of a lower contrast object of larger
size. As a precursor to developing any normalization process it is necessary to examine
filter responses for ideal target objects, in this case a uniform intensity square.
A range of scale factors σ needs to be considered; these can be selected using the
rule relating the side L of the square and the peak response of a Laplacian SIFT operator:

\[ \sigma = 0.4\,L \]

and so for object sizes of 3x3, 9x9 and 17x17 pixels, the scale factors are 1.2,
3.6 and 6.8 respectively.
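The quoted scale factors are consistent with a factor of 0.4 between the side of the square and σ, as the short check below shows. The observation that the mask side in Tables 1 and 2 equals three times the matched target side (9 for 3, 27 for 9, 51 for 17, 15 for 5) is an inference from the tabulated values rather than a stated rule, and the function names are hypothetical.

```python
def scale_factor(side_pixels):
    """Scale factor giving the peak response for a square of the given side."""
    return 0.4 * side_pixels


def mask_side(side_pixels):
    """Mask side inferred from the tables: three times the matched target side."""
    return 3 * side_pixels


for side in (3, 5, 9, 17):
    print(side, round(scale_factor(side), 1), mask_side(side))
```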
[0046] Consider the response of these SIFT filters to a uniform intensity square target
of contrast C, on a bland background. The effect of the filter is to eliminate the
background response, responding only to the contrast signal of the target. Integrating
the filter impulse response function over the entire 2D space results in zero output;
the target signal can be thought of as a background signal to which a set of target
contrast signals has been added, which generates the filter response. The following
table 2 (being similar to table 1, but more detailed) shows the variation in filter
response as a function of scale factor and target size:
Table 2. Comparison of SIFT Filter Outputs
| Target Size | Mask Size = 9x9 | Mask Size = 27x27 | Mask Size = 51x51 |
| (NxN pixels) | σ1 = 1.2 | σ2 = 3.6 | σ3 = 6.8 |
| 1 | 0.2210C | 0.0246C | 0.0069C |
| 3 | 0.7638C | 0.1994C | 0.0602C |
| 5 | 0.3510C | 0.4512C | 0.1578C |
| 7 | 0.0556C | 0.6529C | 0.2839C |
| 9 | 0.0038C | 0.7250C | 0.4186C |
| 11 | 0.0038C | 0.6652C | 0.5426C |
| 13 | . | 0.5245C | 0.6400C |
| 15 | . | 0.3640C | 0.7008C |
| 17 | . | 0.2260C | 0.7216C |
| 19 | . | 0.1268C | 0.7047C |
| 21 | . | 0.0648C | 0.6569C |
| 23 | . | 0.0303C | 0.5874C |
| 25 | . | 0.0130C | 0.5057C |
| 27 | . | 0.0051C | 0.4204C |
[0047] It can be seen that by considering the ratio of the responses of two different mask
sizes for a given desired target size, the contrast C is eliminated.
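The cancellation of C can be confirmed numerically from Table 2. The sketch below, given for illustration only, uses the σ2 and σ3 columns for the three target sizes of interest; the dictionary and function names are hypothetical.

```python
# Selected rows of Table 2: target side -> responses per unit contrast at (sigma2, sigma3)
TABLE_2 = {
    3: (0.1994, 0.0602),
    9: (0.7250, 0.4186),
    17: (0.2260, 0.7216),
}


def response_ratio(target_side, contrast):
    """R(sigma2) / R(sigma3) for an ideal square target of the given contrast."""
    per_unit_2, per_unit_3 = TABLE_2[target_side]
    return (per_unit_2 * contrast) / (per_unit_3 * contrast)


for side in TABLE_2:
    print(side, round(response_ratio(side, 1.0), 2), round(response_ratio(side, 40.0), 2))
```

Each target size gives the same ratio at both contrasts, and the ratio falls steadily as the target grows, so it can serve as a size indicator.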
[0048] Suppose that we define three categories of target:
| Small | 3x3 pixels |
| Medium | 9x9 pixels |
| Large | 17x17 pixels |
and let the filter response at a pixel to the k-th scale factor be R(σ_k). Based on
Table 2 a possible scheme for classifying object responses is:

[0049] The operation of the embodiment of the invention herein specifically described can
be summarised as follows:
- A band-pass filter is convolved with an input image.
- Significant image features are identified, these being pixels whose filter responses
are local maxima, i.e. each one exceeds that of its eight neighbours and a threshold
value.
- The image features are considered to be features of interest (FOIs), and are sorted
by response.
- Local clusters of pixels are grouped as a single FOI.
- Unrefined scores are based linearly on FOI response values relative to lower and upper
thresholds.
- Scores are refined using the ratio of two Laplacian of Gaussian convolutions.
- A set of prioritized FOIs is sorted according to score.
[0050] The invention comprises any novel feature or combination of features herein described,
whether or not specifically claimed. The appended abstract is repeated here as part
of the specification.
[0051] An image-processing method comprises convolving a selected feature of interest (FOI)
within the image with a mask of a first size, repeating the convolution with a mask
of a second size, and calculating the ratio of the convolution responses, as an indication
of the size of the FOI. Preferably the convolution masks are Laplacian of Gaussian.
The method can be useful for prioritising potential targets in a field of view for
presentation to an operator.
1. An image-processing method comprising convolving a selected feature of interest (hereinafter
FOI) within the image with a mask of a first size, repeating the convolution with
a mask of a second size, and calculating the ratio of the convolution results, as
an indication of the size of the FOI.
2. The method of claim 1, comprising comparing the ratio with a preferred range of ratio
values and assigning to the FOI a score which indicates the closeness of the ratio
to a value associated with a preferred size of FOI.
3. The method of claim 1 or claim 2, wherein the convolution is Laplacian of Gaussian.
4. The method of any of claims 1 to 3, wherein the masks are squares having sides defined
by odd numbers of pixels of the image.
5. The method of any preceding claim, comprising repeating the method for at least one
further selected FOI, and ranking the FOIs according to their ratios or their assigned
scores.
6. The method of claim 5, comprising filtering an input image and selecting as a FOI
a region for which the filter response is a local maximum.
7. The method of claims 2 and 6, comprising selecting as a single FOI two said local
maxima which fall within a template representative of the preferred size of FOI.
8. The method of claim 6 or 7, wherein the input image is filtered by convolving the
input image with a Gaussian blur and a high-pass filter.
9. The method of any of claims 6 to 8, comprising ranking the identified local maxima
in order of filter response magnitude.
10. The method of claim 9, comprising considering for identification as a said local maximum
only pixels having a filter response exceeding a threshold value, and adjusting the
threshold value so as to limit the number of pixels considered for identification
as said local maxima.
11. The method of claim 10, comprising ranking as equal those local maxima whose filter
responses exceed a second higher threshold value.
12. The method of claim 11, comprising adjusting the second threshold value so that the
number of local maxima whose filter responses exceed that threshold value tends towards
a predetermined number.
13. The method of any of claims 10 to 12, comprising adjusting a said threshold by means
of an alpha filter.
14. Image processing apparatus configured to operate the image processing method of any
preceding claim.
15. A computer-readable medium comprising a computer program which when installed and
operated performs the method of any of claims 1 to 13.