Technical Field
[0001] The invention relates to the technical field of deep learning, in particular to a
horse speed calculation system and method based on deep learning.
Technical Background
[0002] Equestrianism is a sport that integrates exercise, fitness and leisure, and has become
increasingly popular in recent years. Speed horse racing has developed from it into a
competition event, in which speed and riding skills decide the winner and the overall
level of the rider is tested. At present, the competition ranking of horses is generally
predicted by observing the state of the horses (including fatigue level, amount of
sweating and movement) with the naked eye. More accurate, scientific and intelligent
means are lacking.
Summary of the Invention
[0003] In view of the shortcomings of the prior art, the present invention aims to provide
a horse speed calculation system and method based on deep learning, which realizes
the use of artificial intelligence technology to observe horses and calculate horse
speeds in a more scientific way.
[0004] In order to achieve the above objectives, the present invention adopts the following
technical solutions:
A method for a horse speed calculation system based on deep learning, comprising the
following steps:
S1. before a start of a competition, horses participating in the competition walk
around a competition field, a camera shoots a video of a target horse walking around
the competition field and sends the video to an optical flow field calculation module;
during a video shooting, the camera rotates to maintain the target horse within a
shooting range;
S2. performing an image extraction from the video obtained by the camera and calculating
an optical flow field between consecutive images:
the optical flow field calculation module decodes the video, input from the camera,
into single frames, and extracts an image every set number of frames N, that is, two
consecutive images are separated by N frames, and calculates the optical flow field
between two consecutive images;
S3. performing an object artificial intelligence detection on the video:
an object artificial intelligence detection module detects objects that appear in
the video, and saves a position and a size of each of the detected objects, comprising
a position and a size of the target horse; a position of an object is a position of
its center point, and a size comprises a length and a width;
S4. an optical flow field filtering module filters out all moving objects from the
optical flow field obtained in step S2 according to the objects detected in step S3;
S5. a camera speed calculation module calculates a direction and a speed of the camera
according to the filtered optical flow field in step S4;
S6. calculating a speed of the target horse between two consecutive images:
according to the position of the target horse detected by the object artificial intelligence
detection module, a displacement of the target horse between two consecutive images
is calculated by subtracting a position of a previous image from a position of the
target horse in a current image, a unit of the displacement is pixel/N frames; accordingly,
a target horse speed calculation module calculates a final camera speed in pixels/N
frames, that is, a speed Hp of the target horse is:
Hp = ω - (ḋ - d)
ω is an angular velocity of the camera, unit is pixel/N frames, ḋ is a position of a center of the target horse in the previous image, and d is a position of a center of the target horse in the current image;
S7. an output module averages all speeds of the target horse between two consecutive
images obtained by calculation to obtain an average speed of the target horse.
[0005] Further, in step S2, the number of frames is set to 5 frames.
[0006] Further, in step S2, a method using Farneback is used to calculate the optical flow
field.
[0007] Further, in step S3, the object artificial intelligence detection module uses a YOLOv3
network to implement an artificial intelligence detection of the objects.
[0008] Further, in step S6, a unit conversion is performed on
Hp to obtain the target horse speed in unit of horse length/second;
first, calculating the target horse speed Ḣp in unit of pixel per second through Hp:
Ḣp = (Hp / N) × fps
fps is a number of frames per second of the video;
then, converting pixels to a horse length, and calculating the target horse speed
Vt in the unit of horse length/second:
Vt = Ḣp / Pixels
Ḣp is the speed of the target horse in unit of pixels per second; Pixels is the number of pixels of the horse length, unit is pixel/horse length.
[0009] Even further, for a length of the target horse detected by the object artificial
intelligence detection module, using an abnormal state detection method RANSAC to
review the detected horse length; assuming that the target horse walks normally within
a short period of time after the start of the video, as new data points appear, a
new regression model is updated based on this set of sampled data; checking the new
data points against the new regression model, and against high-quality data points
of an existing regression model, any abnormal data values will be removed and replaced
with a known effective length.
[0010] The present invention further provides a horse speed calculation system based on
deep learning, comprising:
a camera: the camera is provided in a competition field, and is used to shoot a video
of a target horse walking around the competition field before a competition; during
a shooting process, the camera rotates to keep the target horse in a shooting range;
an optical flow field calculation module: used to extract images from the video obtained
by the camera and calculate an optical flow field between consecutive images;
an object artificial intelligence detection module: used to use an artificial intelligence
to detect and save a position and a size of an object in the video;
an optical flow field filter module: used to filter the optical flow field calculated
by the optical flow field calculation module according to a detection result of the
object artificial intelligence detection module to filter out all moving objects;
a camera speed calculation module: used to calculate a direction and a speed of the
camera using the optical flow field filtered by the optical flow field filter module;
a target horse speed calculation module: used to calculate, according to the position
of the target horse detected by the object artificial intelligence detection module,
a displacement of the target horse between two consecutive images by subtracting a
position of a previous image from a position of the target horse in a current image,
and use the displacement of the horse to adjust the speed of the camera between two
consecutive images calculated by the camera speed calculation module to obtain a speed
Hp of the target horse, an adjustment formula is Hp = ω - (ḋ - d), ω is an angular velocity of the camera, unit is pixel/N frames, ḋ is a position of a center of the target horse in the previous image, and d is a position of a center of the target horse in the current image;
an output module is used to average all speeds of the target horse between two consecutive
images to obtain and output an average speed of the target horse.
[0011] The beneficial effect of the present invention is: the present invention realizes
the use of artificial intelligence technology to observe horses and calculate the
speed of the horses in a more scientific way, so that the competition ranking of the
horses can be effectively predicted.
Description of the Figures
[0012]
Figure 1 is an illustrative diagram of a network architecture of YOLOv3;
Figure 2 is an illustrative diagram of an influence on the histogram of an optical
flow field when optical flow vectors are removed from the moving objects;
Figure 3 is an exemplary diagram showing different examples of horse lengths in different
postures;
Figure 4 is an illustrative diagram of test results of an embodiment of the present
invention.
Detailed Description
[0013] The present invention will be further described below in conjunction with the accompanying
figures. It should be noted that these embodiments are based on the present technical
solution to provide detailed implementation and specific operation procedures, but
the scope of protection of the present invention is not limited to these embodiments.
Embodiment 1
[0014] This embodiment provides a method for a horse speed calculation system based on deep
learning, comprising the following steps:
S1. before a start of a competition, horses participating in the competition walk
around a competition field, a camera shoots a video of a target horse walking around
the competition field and sends the video to an optical flow field calculation module;
during a video shooting, the camera rotates to maintain the target horse within a
shooting range;
[0015] The video is usually recorded in AVI or MP4 format, but real-time RTSP streaming
may also be used. The frame rate of the video is 23 to 26 frames per second.
[0016] S2. performing an image extraction from the video obtained by the camera and calculating
an optical flow field between consecutive images:
The rotation speed ω of the camera is equal to the angle φ (in radians) through which the
camera turns divided by the time t. The linear speed of the target horse (assuming it
stays in the camera's field of view) is proportional to the distance between the camera
and the target horse. However, since the expected output is expressed in horse lengths
and the camera settings (zoom level, distance, etc.) are not fixed, this embodiment uses
pixels per second, that is, the pixel displacement is used to express the rotation speed
of the camera.
[0017] In this embodiment, in order to obtain the pixel displacement, the optical flow between
every two consecutive images is calculated (Horn and Schunck). The optical flow field
calculation module decodes the video input by the camera into single frames, and extracts
an image every 5 frames, that is, two consecutive images are 5 frames apart. Because the
horse walks relatively slowly, the camera also rotates slowly, so skipping frames improves
the overall performance without affecting accuracy. Optical flow is used to estimate the
movement between two related images, such as the image speed or the displacement of
discrete objects (Beauchemin and Barron). Every pixel in the first image has a value.
Assuming that the time between the two images is very short, the pixel values in the
second image are:
I(x, y, t) = I(x + Δx, y + Δy, t + Δt)
I(x, y, t) represents the pixel value of a certain pixel in the second image, x and y
represent the coordinate values of the pixel on the x-axis and y-axis respectively, t
represents time, and Δx, Δy and Δt represent the changes of the x-axis coordinate value,
the y-axis coordinate value and the time respectively, of a certain pixel of the second
image relative to the corresponding pixel on the first image;
[0018] The purpose of the optical flow is to find the corresponding pixels between two images
and express them as a direction and a length (a direction is expressed as angle, and
a length is expressed as the number of pixels). There are several methods of calculating
the optical flow field. In this embodiment, the Farneback method is adopted.
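The frame sampling and the direction/length representation described above can be sketched in a few lines. This is only an illustration under stated assumptions: the helper names `sample_pairs` and `to_polar` are hypothetical, frames are stood in for by their indices, and the direction is taken as an angle in degrees in [0, 360). The Farneback computation itself is not reproduced here.

```python
import math
from typing import Iterator, Sequence, Tuple, TypeVar

Frame = TypeVar("Frame")


def sample_pairs(frames: Sequence[Frame], n: int = 5) -> Iterator[Tuple[Frame, Frame]]:
    """Yield (previous, current) images that are n decoded frames apart."""
    sampled = frames[::n]  # keep every n-th decoded frame
    for prev, curr in zip(sampled, sampled[1:]):
        yield prev, curr


def to_polar(dx: float, dy: float) -> Tuple[float, float]:
    """Express a flow vector as (direction in degrees, length in pixels)."""
    angle = math.degrees(math.atan2(dy, dx)) % 360.0
    return angle, math.hypot(dx, dy)


# Frame indices 0..20 stand in for decoded frames (assumption for the demo).
pairs = list(sample_pairs(list(range(21)), n=5))
print(pairs)               # pairs of frame indices 5 frames apart
print(to_polar(3.0, 4.0))  # direction and length of one flow vector
```

In a real pipeline each pair would be passed to the optical flow routine, and every resulting per-pixel vector would be converted with `to_polar` before the histograms are built.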
[0019] Two images can be modelled as:

[0020] Here, φi(x, y, t) and ηi(x, y, t), i = 1, 2, ..., n, are the basis functions of the
n process models of the two images, and Ai are the coefficients of the spatial and
temporal changes of each process model.
[0021] Assuming ED(u, v) is a measurement deviation, ES(u, v) is a smoothing term, and α is a
smoothing weighting factor, the optical flow field of the two images is the minimum of
the sum of the data term and the weighted smoothing term:
E(u, v) = min [ED(u, v) + α·ES(u, v)]
E(u, v) represents the optical flow field of two images.
[0022] S3. performing an object artificial intelligence detection on the video:
[0023] Since the obtained optical flow field contains various noises, the noise caused by
the movement of other objects (mainly people and horses) must be removed.
[0024] The object artificial intelligence detection module is used to detect objects that
appear in the video, and save a position and a size of each detected object, comprising
the position and the size of the target horse; wherein the detected position will
be used to filter the optical flow vector of related objects. The position and size
of the target horse are also saved as the target displacement for compensation and
calculation of the horse length.
[0025] In this embodiment, the object detection artificial intelligence is derived from
YOLOv3, and YOLOv3 is derived from Darknet, developed by Redmon and Farhadi. Like
other object detection models, YOLOv3 is a fully convolutional neural network (CNN)
that applies residual layers and skip connection techniques, allowing layers from
different depths to contribute to the inference result. The network consists of 53
convolutional layers and a last fully connected layer for inference. Figure 1 shows the
network layout of YOLOv3.
[0026] Compared with other traditional object detection artificial intelligence, the advantage
of YOLOv3 is that only one inference is required, which is much faster than other
multi-dimensional/multi-pass models.
[0027] The input image is divided into a 13×13 grid, and each grid cell predicts up to 5 objects.
For each bounding box, the network will predict 5 coordinates tx, ty, tw, th, to, where
tx, ty are the x-axis coordinate and y-axis coordinate of the center point of the bounding
box respectively, tw, th are the width and height of the bounding box respectively, and
to is the confidence degree of the bounding box; therefore, for the grid cell in column x
and row y with coordinate (cx, cy) and a bounding box whose width and height are pw, ph,
the prediction of the bounding box of an object is:
bx = σ(tx) + cx
by = σ(ty) + cy
bw = pw · e^tw
bh = ph · e^th
Pr(object) × IOU(b, object) = σ(to)
wherein σ() is a logistic activation function or normalization function that keeps the
result value ∈ [0, 1]. Pr(object) is the probability of an object, and IOU(b, object) is
the accuracy of the bounding box. In addition to these coordinates, each prediction
will also have a type associated with it. The YOLOv3 network used in this embodiment
has undergone fine-tuning training on humans and horses. The training is completed
by performing transfer learning on the last fully connected layer, so that the network
can detect humans and horses more accurately.
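As an illustration of the bounding box prediction, the following sketch decodes one raw prediction using the standard YOLO parameterization: a logistic function on the center offsets and the confidence, and an exponential scaling of the prior width and height. The function name `decode_box` and the sample numbers are hypothetical, and the outputs are in grid-cell units; this is not the detection module of the embodiment.

```python
import math
from typing import Tuple


def sigmoid(x: float) -> float:
    """Logistic function, mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def decode_box(t: Tuple[float, float, float, float, float],
               cell: Tuple[int, int],
               prior: Tuple[float, float]) -> Tuple[float, float, float, float, float]:
    """Turn raw outputs (tx, ty, tw, th, to) into (bx, by, bw, bh, confidence)."""
    tx, ty, tw, th, to = t
    cx, cy = cell    # column and row offset of the grid cell
    pw, ph = prior   # width and height of the prior box
    bx = sigmoid(tx) + cx    # center x, inside the cell
    by = sigmoid(ty) + cy    # center y, inside the cell
    bw = pw * math.exp(tw)   # width scaled from the prior
    bh = ph * math.exp(th)   # height scaled from the prior
    return bx, by, bw, bh, sigmoid(to)


# All-zero raw outputs place the box at the cell center with the prior size.
print(decode_box((0.0, 0.0, 0.0, 0.0, 0.0), cell=(6, 6), prior=(2.0, 1.0)))
```

Multiplying the decoded center and size by the cell size in pixels would give the position (center point) and size used in the later steps.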
[0028] The prediction of each frame will provide the position (center point) and size (length
and width) of each object. This information will be used in the following sections
to reduce the noise from the optical flow field and to calculate the displacement
and size of the target horse.
[0029] S4. an optical flow field filtering module filters out all moving objects from the
optical flow field obtained in step S2 (their optical flow vectors are set to zero)
according to the objects detected in step S3. After filtering, the maximum component of
the optical flow field is the camera speed in pixels per 5 frames. Figure 2 shows the
effect on the optical flow field histogram when the optical flow vectors of the moving
objects are removed, where Figure 2(a) is the amplitude histogram before filtering, and
Figure 2(b) is the amplitude histogram after filtering.
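A minimal sketch of this filtering step, assuming the flow field is stored as a nested list indexed `[row][column]` of `(dx, dy)` vectors and that each detected object is given as `(center_x, center_y, length, width)` with the length measured horizontally. Both the layout and the function names are assumptions made for the illustration.

```python
from typing import List, Tuple

Vector = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (center_x, center_y, length, width)


def inside(x: int, y: int, box: Box) -> bool:
    """True if pixel (x, y) lies within the detected bounding box."""
    cx, cy, length, width = box
    return abs(x - cx) <= length / 2 and abs(y - cy) <= width / 2


def filter_moving_objects(flow: List[List[Vector]], boxes: List[Box]) -> List[List[Vector]]:
    """Set the flow vectors covered by any detected object to zero."""
    return [
        [(0.0, 0.0) if any(inside(x, y, b) for b in boxes) else vec
         for x, vec in enumerate(row)]
        for y, row in enumerate(flow)
    ]


# 4x4 field with uniform motion and one detected object in the middle.
flow = [[(2.0, 0.0)] * 4 for _ in range(4)]
filtered = filter_moving_objects(flow, [(1.5, 1.5, 2.0, 2.0)])
print(filtered[1][1], filtered[0][0])  # zeroed inside the box, unchanged outside
```

The remaining non-zero vectors are dominated by background motion, which is what the camera speed calculation relies on.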
[0030] S5. a camera speed calculation module calculates a direction and a speed of the camera
according to the filtered optical flow field in step S4:
Each optical flow vector is formed by two parts: direction (angle) and speed (pixel):
1. Direction: the camera direction value ωφ is equal to the maximum component of the optical flow field direction histogram.
2. Speed: the camera speed value ωm is equal to the maximum component of the optical flow field speed histogram (the
opposite direction has been removed).


wherein ωφ and ωm are the direction and the speed (displacement) of the camera respectively, DH is the optical flow field direction histogram, MH is the optical flow field speed histogram, and Fωφ is a vector filter for direction.
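One possible reading of this step, sketched under assumptions: the "maximum component" of each histogram is taken as its most populated bin, the direction filter keeps only vectors that fall in the dominant direction bin, and zeroed (filtered) vectors are ignored. The bin widths of 10 degrees and 1 pixel and the name `camera_motion` are hypothetical choices.

```python
import math
from collections import Counter
from typing import Iterable, Tuple


def camera_motion(vectors: Iterable[Tuple[float, float]],
                  angle_bin: float = 10.0,
                  speed_bin: float = 1.0) -> Tuple[float, float]:
    """Return (direction in degrees, speed in pixels) as dominant bin centers."""
    polar = [(math.degrees(math.atan2(dy, dx)) % 360.0, math.hypot(dx, dy))
             for dx, dy in vectors if (dx, dy) != (0.0, 0.0)]
    if not polar:
        return 0.0, 0.0  # no background motion left after filtering
    # Direction histogram: most populated angle bin.
    direction_hist = Counter(int(a // angle_bin) for a, _ in polar)
    top_dir = direction_hist.most_common(1)[0][0]
    # Speed histogram restricted to vectors in the dominant direction.
    speed_hist = Counter(int(m // speed_bin) for a, m in polar
                         if int(a // angle_bin) == top_dir)
    top_speed = speed_hist.most_common(1)[0][0]
    return (top_dir + 0.5) * angle_bin, (top_speed + 0.5) * speed_bin


# Mostly leftward background motion, a few stray vectors, some zeroed pixels.
vectors = [(-4.5, -0.4)] * 5 + [(3.0, 0.5)] * 2 + [(0.0, 0.0)] * 3
print(camera_motion(vectors))
```

Finer bins give a more precise estimate but need more vectors per bin to remain stable.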
[0031] S6. calculating a speed of the target horse between two consecutive images:
Since the speed of the target horse may change, and the camera is unlikely to keep
the target in the center of the field of view, this deviation will affect the result.
In order to reduce this effect, the calculated camera speed may be adjusted by the
displacement of the target horse, and this adjustment is based on the displacement
of the target horse in the current image relative to the previous image (that is,
5 frames before), and the unit is pixel/ 5 frames.
[0032] In particular, in step S3, the object artificial intelligence detection module detects
the position and size of each object (human and horse), comprising the position and
size of the target horse. The displacement of the target horse (in pixels/5 frames)
is calculated by subtracting the previous position from the current position of the
target horse. According to this, the target horse speed calculation module calculates
the final camera speed in pixels/5 frames, that is, the speed of the target horse is:
Hp = ω - (ḋ - d)
ω is the angular velocity of the camera (in pixels/5 frames),
ḋ is the previous position of the center of the target horse, and
d is the current position of the center of the target horse. Since the desired result
is in the unit of horse length per second,
Hp is required to be converted. First, the time unit is converted from 5 frames
to 1 second.
fps is the number of frames per second of the video, and the target horse speed
Ḣp in the unit of pixels per second is:
Ḣp = (Hp / 5) × fps
[0033] Then, the target horse speed in the unit of pixels per second is converted into the
target horse speed in the unit of horse length per second.
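The compensation and the time conversion amount to two lines of arithmetic. The sketch below reads the adjustment as Hp = ω - (ḋ - d), the formula given for the target horse speed calculation module, and converts from pixels per N frames to pixels per second by dividing by N and multiplying by fps. Treating the positions as scalar coordinates along the direction of motion is an assumption, as are the function names and the sample numbers.

```python
def horse_speed_per_interval(omega: float, prev_center: float, curr_center: float) -> float:
    """Hp = omega - (d_prev - d_curr), in pixels per N frames."""
    return omega - (prev_center - curr_center)


def to_pixels_per_second(h_p: float, fps: float, n: int = 5) -> float:
    """Convert a speed in pixels per n frames to pixels per second."""
    return h_p / n * fps


# Camera pans 20 px per 5 frames while the horse drifts 4 px ahead in the image.
h_p = horse_speed_per_interval(omega=20.0, prev_center=300.0, curr_center=304.0)
print(h_p)                                  # 24.0 pixels per 5 frames
print(to_pixels_per_second(h_p, fps=25.0))  # pixels per second at 25 fps
```

If the horse stays at the same image position between the two images, the correction term vanishes and the horse speed equals the camera speed.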
[0034] In step S3, the object artificial intelligence detection module detects the position
and size of the target horse. But unlike the position, the size of the target horse
depends on its posture, and the accuracy of the horse length also depends largely
on its posture. Figure 3 (a) and (b) show different horse postures, and the difference
in the length of the two horses is clearly observed. If the object artificial
intelligence detection module fails to detect the target, the previously
known horse length will be used.
[0035] In order to reduce the effect of posture, this embodiment adopts the abnormal state
detection method RANSAC to review the length of the detected horses. Assuming that the
target horse walks normally within a short time after the start of the video, as new data
points appear, a new regression model will be updated based on this set of sampled
data. New data points are checked against the new regression model, and against high-quality
data points of the existing regression model; any abnormal data values will be removed
and replaced with known effective lengths. Pixels are converted to a horse length,
and the target horse speed
Vt is calculated in the unit of horse length/second:
Vt = Ḣp / Pixels
Ḣp is the speed of the target horse in unit of pixels per second;
Pixels is the number of pixels of the horse length, unit is pixel/horse length;
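The length review can be illustrated with a deliberately simplified RANSAC. The sketch below fits a constant model (one typical horse length in pixels) rather than a full regression model, trusts the first few detections as the text assumes, and replaces missing or abnormal values with the last accepted length. The threshold of 15 pixels, the warm-up of 5 samples and the function names are assumptions, not values from this embodiment.

```python
import random
from typing import List, Optional


def ransac_constant(values: List[float], threshold: float,
                    iterations: int = 50, seed: int = 0) -> float:
    """Fit a constant with RANSAC: pick the sample with the largest consensus set."""
    rng = random.Random(seed)
    best_inliers: List[float] = []
    for _ in range(iterations):
        candidate = rng.choice(values)  # one point defines a constant model
        inliers = [v for v in values if abs(v - candidate) <= threshold]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    return sum(best_inliers) / len(best_inliers)  # refit on the consensus set


def clean_lengths(lengths: List[Optional[float]], threshold: float = 15.0,
                  warmup: int = 5) -> List[Optional[float]]:
    """Replace missing or abnormal horse lengths with the last accepted length."""
    accepted: List[float] = []  # high-quality data points seen so far
    cleaned: List[Optional[float]] = []
    for length in lengths:
        if length is None:
            ok = False  # detector missed the horse in this image
        elif len(accepted) < warmup:
            ok = True   # assumption: the horse walks normally at the start
        else:
            ok = abs(length - ransac_constant(accepted, threshold)) <= threshold
        if ok:
            accepted.append(length)
            cleaned.append(length)
        else:
            cleaned.append(accepted[-1] if accepted else None)
    return cleaned


raw = [200.0, 202.0, 198.0, 201.0, 199.0, 120.0, 203.0, None, 310.0, 200.0]
print(clean_lengths(raw))
```

Here the values 120 and 310 and the missed detection are replaced by the most recent accepted length, so the pixel-to-horse-length conversion never uses an implausible length.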
[0036] S7. The output module averages all the calculated target horse speeds between two
consecutive images to obtain the average speed of the target horse. The average speed
is expressed in terms of horse length per second. The final result may be displayed
on real-time video or output to a file.
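The final conversion and averaging can be sketched as follows, assuming one speed in pixels per second and one reviewed horse length in pixels for each pair of consecutive images. The function names and the sample values are hypothetical.

```python
from statistics import mean
from typing import Sequence


def to_horse_lengths_per_second(px_per_second: float, pixels_per_horse_length: float) -> float:
    """Vt = (pixels per second) / (pixels per horse length)."""
    return px_per_second / pixels_per_horse_length


def average_speed(px_speeds: Sequence[float], horse_lengths_px: Sequence[float]) -> float:
    """Average the per-pair speeds, expressed in horse lengths per second."""
    per_pair = [to_horse_lengths_per_second(h, p)
                for h, p in zip(px_speeds, horse_lengths_px)]
    return mean(per_pair)


# Three image pairs with a horse that is 200 px long in each of them.
print(average_speed([100.0, 120.0, 110.0], [200.0, 200.0, 200.0]))
```

Converting each pair with its own horse length before averaging keeps the result meaningful when the zoom level changes during the video.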
[0037] According to the average speed of each horse obtained by the prediction, the ranking
of the competition can be effectively predicted.
[0038] The performance of the method in this embodiment will be further explained through
testing.
[0039] The test set contains more than 3000 sample videos, from which 40 video clips from
venue A and 40 video clips from venue B are randomly sampled.
The duration of each video is between 30 and 45 seconds. Figure 4 shows the test results
on venue B. Figure 5 shows the test results on venue A. The x-axis represents the
error rate in intervals of 5%, and the y-axis represents the number of results
within each interval. The error rate is the difference between the estimated result
and the actual result divided by the absolute value of the actual result:
ε = (R̂T - RT) / |RT| × 100%
ε is the error (percentage),
RT is the actual result (horse length/second),
R̂T is the predicted result obtained by the method of this embodiment.
[0040] The actual speed is estimated manually by measuring the time the horse takes to travel
between two points in the video that are a known distance apart. In addition, the horse
length uses the average length of a race horse (2.4 m), not the actual length of the
target horse. The actual speed is calculated as follows:
Vt = D / (T × L)
Vt = actual speed (horse body/second)
D = the distance between two known points in the sand circle (m)
T = elapsed time (seconds)
L = the length of an average racehorse (2.4 m/horse body)
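The reference speed and the error rate used in the test can be reproduced with the sketch below. It assumes the reference speed is the distance divided by the product of elapsed time and average horse length, which is the only combination of D, T and L that yields horse bodies per second, and it keeps the sign of the error. The function names and the sample numbers are illustrative, not measured data.

```python
def actual_speed(distance_m: float, elapsed_s: float, horse_length_m: float = 2.4) -> float:
    """Reference speed in horse bodies per second: D / (T * L)."""
    return distance_m / (elapsed_s * horse_length_m)


def error_rate(predicted: float, actual: float) -> float:
    """Signed error of the prediction as a percentage of the actual value."""
    return (predicted - actual) / abs(actual) * 100.0


# A horse covering 24 m in 10 s moves at about one horse body per second.
reference = actual_speed(distance_m=24.0, elapsed_s=10.0)
print(reference)
print(error_rate(predicted=1.1, actual=reference))  # about +10 percent
```

A positive value means the method overestimated the speed, and a negative value means it underestimated it.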
[0041] The results showed that more than 86% (69/80) of the test samples fell within the 10%
error range. Further, fewer than 1.5% (1/80) of the test samples fell outside
the 20% error range.
[0042] It can be seen that the accuracy of the method in Embodiment 1 is comparable to the
expected value. The accuracy rate exceeds 90%, and the recall rate exceeds 86%. In
addition, the environment of different competition venues is usually very different
(background, number of moving objects/obstacles and viewing distance). Therefore,
the method of Embodiment 1 is robust to different sand circle conditions and may adapt
to different viewpoints.
Embodiment 2
[0043] This embodiment provides a horse speed calculation system based on deep learning,
comprising:
a camera: the camera is provided in a competition field, and is used to shoot a video
of a target horse walking around the competition field before a competition; during
a shooting process, the camera rotates to keep the target horse in a shooting range;
an optical flow field calculation module: used to extract images from the video obtained
by the camera and calculate an optical flow field between consecutive images;
an object artificial intelligence detection module: used to use a YOLOv3 network to
detect and save the positions and sizes of the objects in the video;
an optical flow field filter module: used to filter the optical flow field calculated
by the optical flow field calculation module according to a detection result of the
object artificial intelligence detection module to filter out all moving objects;
a camera speed calculation module: used to calculate a direction and a speed of the
camera using the optical flow field filtered by the optical flow field filter module;
a target horse speed calculation module: used to calculate, according to the position
of the target horse detected by the object artificial intelligence detection module,
a displacement of the target horse between two consecutive images by subtracting a
position of a previous image from a position of the target horse in a current image;
the displacement of the horse is used to compensate the camera speed between two consecutive
images calculated by the camera speed calculation module to obtain the speed of the
target horse in pixels between the two consecutive images, and a unit conversion is then
performed on the speed of the target horse in pixels;
an output module is used to average all speeds of the target horse between two consecutive
images to obtain and output an average speed of the target horse. The unit of the
average speed is horse length/second.
[0044] For those skilled in the art, various corresponding changes and modifications may
be given based on the above technical solutions and ideas, and all these changes and
modifications should be included in the protection scope of the claims of the present
invention.
1. A method for a horse speed calculation system based on deep learning,
characterized in that, comprising the following steps:
S1. before a start of a competition, horses participating in the competition walk
around a competition field, a camera shoots a video of a target horse walking around
the competition field and sends the video to an optical flow field calculation module;
during a video shooting, the camera rotates to maintain the target horse within a
shooting range;
S2. performing an image extraction from the video obtained by the camera and calculating
an optical flow field between consecutive images:
the optical flow field calculation module decodes the video, input from the camera,
into single frames, and extracts an image every set number of frames N, that is, two
consecutive images are separated by N frames, and calculates the optical flow field
between two consecutive images;
S3. performing an object artificial intelligence detection on the video:
an object artificial intelligence detection module detects objects that appear in
the video, and saves a position and a size of each of the detected objects, comprising
a position and a size of the target horse; a position of an object is a position of
its center point, and a size comprises a length and a width;
S4. an optical flow field filtering module filters out all moving objects from the
optical flow field obtained in step S2 according to the objects detected in step S3;
S5. a camera speed calculation module calculates a direction and a speed of the camera
according to the filtered optical flow field in step S4;
S6. calculating a speed of the target horse between two consecutive images:
according to the position of the target horse detected by the object artificial intelligence
detection module, a displacement of the target horse between two consecutive images
is calculated by subtracting a position of a previous image from a position of the
target horse in a current image, a unit of the displacement is pixel/N frames; accordingly,
a target horse speed calculation module calculates a final camera speed in pixels/N
frames, that is, a speed Hp of the target horse is:
Hp = ω - (ḋ - d)
ω is an angular velocity of the camera, unit is pixel/N frames, ḋ is a position of a center of the target horse in the previous image, and d is a position of a center of the target horse in the current image;
S7. an output module averages all speeds of the target horse between two consecutive
images obtained by calculation to obtain an average speed of the target horse.
2. The method according to claim 1, characterized in that, in step S2, the number of frames is set to 5 frames.
3. The method according to claim 1, characterized in that, in step S2, a method using Farneback is used to calculate the optical flow field.
4. The method according to claim 1, characterized in that, in step S3, the object artificial intelligence detection module uses a YOLOv3 network
to implement an artificial intelligence detection of the objects.
5. The method of claim 1,
characterized in that, in step S6, a unit conversion is performed on
Hp to obtain the target horse speed in unit of horse length/second;
first, calculating the target horse speed Ḣp in unit of pixel per second through Hp:
Ḣp = (Hp / N) × fps
fps is a number of frames per second of the video;
then, converting pixels to a horse length, and calculating the target horse speed
Vt in the unit of horse length/second:
Vt = Ḣp / Pixels
Ḣp is the speed of the target horse in unit of pixels per second; Pixels is the number of pixels of the horse length, unit is pixel/horse length.
6. The method of claim 5, characterized in that, for a length of the target horse detected by the object artificial intelligence
detection module, using an abnormal state detection method RANSAC to review the detected
horse length; assuming that the target horse walks normally within a short period
of time after the start of the video, as new data points appear, a new regression
model is updated based on this set of sampled data; checking the new data points against
the new regression model, and against high-quality data points of an existing regression
model, any abnormal data values will be removed and replaced with a known effective
length.
7. A horse speed calculation system based on deep learning,
characterized in that, comprising:
a camera: the camera is provided in a competition field, and is used to shoot a video
of a target horse walking around the competition field before a competition; during
a shooting process, the camera rotates to keep the target horse in a shooting range;
an optical flow field calculation module: used to extract images from the video obtained
by the camera and calculate an optical flow field between consecutive images;
an object artificial intelligence detection module: used to use an artificial intelligence
to detect and save a position and a size of an object in the video;
an optical flow field filter module: used to filter the optical flow field calculated
by the optical flow field calculation module according to a detection result of the
object artificial intelligence detection module to filter out all moving objects;
a camera speed calculation module: used to calculate a direction and a speed of the
camera using the optical flow field filtered by the optical flow field filter module;
a target horse speed calculation module: used to calculate, according to the position
of the target horse detected by the object artificial intelligence detection module,
a displacement of the target horse between two consecutive images by subtracting a
position of a previous image from a position of the target horse in a current image,
and use the displacement of the horse to adjust the speed of the camera between two
consecutive images calculated by the camera speed calculation module to obtain a speed
Hp of the target horse, an adjustment formula is Hp = ω - (ḋ - d), ω is an angular velocity of the camera, unit is pixel/N frames, ḋ is a position of a center of the target horse in the previous image, and d is a position of a center of the target horse in the current image;
an output module is used to average all speeds of the target horse between two consecutive
images to obtain and output an average speed of the target horse.