DEEP LEARNING-BASED HORSE SPEED CALCULATION SYSTEM AND METHOD THEREOF

(11)	EP 4 105 819 A1

(12)	EUROPEAN PATENT APPLICATION
	published in accordance with Art. 153(4) EPC

(43)	Date of publication:
	21.12.2022 Bulletin 2022/51

(21)	Application number: 21741761.7

(22)	Date of filing: 13.01.2021

(51)	International Patent Classification (IPC):
	G06K 9/00 (2022.01)

(86)	International application number:
	PCT/CN2021/071435

(87)	International publication number:
	WO 2021/143716 (22.07.2021 Gazette 2021/29)

(84)	Designated Contracting States:
	AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR
	Designated Extension States:
	BA ME
	Designated Validation States:
	KH MA MD TN

(30)	Priority: 13.01.2020 CN 202010030062

(71)	Applicant: Target Information Company Limited
	Nt, Hong Kong (HK)

(72)	Inventor:
	KWAN, Kwok Leung New Territories, Hong Kong (HK)

(74)	Representative: Isern Patentes y Marcas S.L.
	Avda. Diagonal, 463 Bis, 2° 08036 Barcelona (ES)

(54)	DEEP LEARNING-BASED HORSE SPEED CALCULATION SYSTEM AND METHOD THEREOF

(57) This invention discloses a horse speed calculation system and method based on deep learning. A camera captures video of horses walking around a competition field before the competition. The average speed of a target horse is then obtained through optical flow field calculation, object detection based on deep learning, optical flow field filtering, camera speed calculation, adjustment of the camera speed by the horse displacement, and unit conversion. This invention realizes the use of artificial intelligence technology to observe horses in a more scientific way and to predict and calculate the horse speed.

Description

Technical Field

[0001] The invention relates to the technical field of deep learning, in particular to a horse speed calculation system and method based on deep learning.

Technical Background

[0002] Equestrianism is a sport that combines exercise, fitness and leisure, and it has attracted a growing number of people in recent years. Speed horse racing, which grew out of equestrianism, has developed into a competition event that is won on speed and riding skill and that tests the overall level of a rider. At present, the competition ranking of horses is generally predicted by observing the state of the horses (including fatigue level, amount of sweating and movement) with the naked eye. More accurate, scientific and intelligent means are lacking.

Summary of the Invention

[0003] In view of the shortcomings of the prior art, the present invention aims to provide a horse speed calculation system and method based on deep learning, which realizes the use of artificial intelligence technology to observe horses and calculate horse speeds in a more scientific way.

[0004] In order to achieve the above objectives, the present invention adopts the following technical solutions:
A method for a horse speed calculation system based on deep learning, comprising the following steps:

S1. before a start of a competition, horses participating in the competition walk around a competition field, a camera shoots a video of a target horse walking around the competition field and sends the video to an optical flow field calculation module; during a video shooting, the camera rotates to maintain the target horse within a shooting range;

S2. performing an image extraction from the video obtained by the camera and calculating an optical flow field between consecutive images:
the optical flow field calculation module decodes the video, input from the camera, into single frames, and extracts an image every set number of frames N, that is, two consecutive images are separated by N frames, and calculates the optical flow field between two consecutive images;

S3. performing an object artificial intelligence detection on the video:
an object artificial intelligence detection module detects objects that appear in the video, and saves a position and a size of each of the detected objects, comprising a position and a size of the target horse; a position of an object is a position of its center point, and a size comprises a length and a width;

S4. an optical flow field filtering module filters out all moving objects from the optical flow field obtained in step S2 according to the objects detected in step S3;

S5. a camera speed calculation module calculates a direction and a speed of the camera according to the filtered optical flow field in step S4;

S6. calculating a speed of the target horse between two consecutive images:

according to the position of the target horse detected by the object artificial intelligence detection module, a displacement of the target horse between two consecutive images is calculated by subtracting the position of the target horse in a previous image from the position of the target horse in a current image, a unit of the displacement is pixel/N frames; accordingly, a target horse speed calculation module calculates a final camera speed in pixels/N frames, that is, a speed H_p of the target horse is:

H_p = ω - (ḋ - d)

ω is an angular velocity of the camera, unit is pixel/N frames, ḋ is a position of a center of the target horse in the previous image, and d is a position of a center of the target horse in the current image;

S7. an output module averages all speeds of the target horse between two consecutive images obtained by calculation to obtain an average speed of the target horse.

[0005] Further, in step S2, the number of frames is set to 5 frames.

[0006] Further, in step S2, a method using Farneback is used to calculate the optical flow field.

[0007] Further, in step S3, the object artificial intelligence detection module uses a YOLOv3 network to implement an artificial intelligence detection of the objects.

[0008] Further, in step S6, a unit conversion is performed on H_p to obtain the target horse speed in unit of horse length/second;

first, calculating the target horse speed Ḣ_p in unit of pixels per second from H_p:

Ḣ_p = H_p × fps / N

fps is a number of frames per second of the video, and N is the set number of frames between two consecutive images;

then, converting pixels to a horse length, and calculating the target horse speed V_t in the unit of horse length/second:

V_t = Ḣ_p / Pixels

Ḣ_p is the speed of the target horse in unit of pixels per second; Pixels is the number of pixels of the horse length, unit is pixel/horse length.

[0009] Even further, for a length of the target horse detected by the object artificial intelligence detection module, using an abnormal state detection method RANSAC to review the detected horse length; assuming that the target horse walks normally within a short period of time after the start of the video, as new data points appear, a new regression model is updated based on this set of sampled data; checking the new data points against the new regression model, and against high-quality data points of an existing regression model, any abnormal data values will be removed and replaced with a known effective length.

[0010] The present invention further provides a horse speed calculation system based on deep learning, comprising:

a camera: the camera is provided in a competition field, and is used to shoot a video of a target horse walking around the competition field before a competition; during a shooting process, the camera rotates to keep the target horse in a shooting range;

an optical flow field calculation module: used to extract images from the video obtained by the camera and calculate an optical flow field between consecutive images;

an object artificial intelligence detection module: used to use an artificial intelligence to detect and save a position and a size of an object in the video;

an optical flow field filter module: used to filter the optical flow field calculated by the optical flow field calculation module according to a detection result of the object artificial intelligence detection module to filter out all moving objects;

a camera speed calculation module: used to calculate a direction and a speed of the camera using the optical flow field filtered by the optical flow field filter module;

a target horse speed calculation module: used to calculate, according to the position of the target horse detected by the object artificial intelligence detection module, a displacement of the target horse between two consecutive images by subtracting the position of the target horse in a previous image from the position of the target horse in a current image, and to use the displacement of the horse to adjust the speed of the camera between two consecutive images calculated by the camera speed calculation module to obtain a speed H_p of the target horse; an adjustment formula is H_p = ω - (ḋ - d), where ω is an angular velocity of the camera, unit is pixel/N frames, ḋ is a position of a center of the target horse in the previous image, and d is a position of a center of the target horse in the current image;

an output module is used to average all speeds of the target horse between two consecutive images to obtain and output an average speed of the target horse.

[0011] The beneficial effect of the present invention is: the present invention realizes the use of artificial intelligence technology to observe horses and calculate the speed of the horses in a more scientific way, so that the competition ranking of the horses can be effectively predicted.

Description of the Figures

[0012]

Figure 1 is an illustrative diagram of a network architecture of YOLOv3;

Figure 2 is an illustrative diagram of an influence on the histogram of an optical flow field when optical flow vectors are removed from the moving objects;

Figure 3 is an exemplary diagram showing different examples of horse lengths in different postures;

Figure 4 is an illustrative diagram of test results of an embodiment of the present invention.

Detailed Description

[0013] The present invention will be further described below in conjunction with the accompanying figures. It should be noted that these embodiments are based on the present technical solution to provide detailed implementation and specific operation procedures, but the scope of protection of the present invention is not limited to these embodiments.

Embodiment 1

[0014] This embodiment provides a method for a horse speed calculation system based on deep learning, comprising the following steps:
S1. before a start of a competition, horses participating in the competition walk around a competition field, a camera shoots a video of a target horse walking around the competition field and sends the video to an optical flow field calculation module; during a video shooting, the camera rotates to maintain the target horse within a shooting range;

[0015] The video is usually recorded in AVI or MP4 encoding, but real-time RTSP streaming may also be used. The video is 23 to 26 frames per second.

[0016] S2. performing an image extraction from the video obtained by the camera and calculating an optical flow field between consecutive images:
The rotation speed ω of the camera is equal to the change φ in the camera angle (in radians) divided by the time t. The linear speed of the target horse (assuming it stays in the camera's field of view) is proportional to the distance between the camera and the target horse. However, since the expected output is expressed in horse lengths and the camera settings (zoom level, distance, etc.) are not fixed, this embodiment uses pixels per second, that is, it expresses the rotation speed of the camera as a pixel displacement.

[0017] In this embodiment, in order to obtain the pixel displacement, the optical flow between every two consecutive images is calculated (Horn and Schunck). The optical flow field calculation module decodes the video input by the camera into single frames and extracts an image every 5 frames, that is, two consecutive images are 5 frames apart. Because the horses move relatively slowly, the camera also rotates slowly, so skipping frames improves the overall performance without affecting accuracy. Optical flow is used to estimate the movement between two related images, such as the image speed or the displacement of discrete objects (Beauchemin and Barron). Every pixel in the first image has a value I(x, y, t). Assuming that the time between the two images is very short, the pixel values in the second image are:

I(x, y, t) = I(x + Δx, y + Δy, t + Δt)

I represents the value of a pixel, x and y represent the coordinate values of the pixel on the x-axis and y-axis respectively, t represents time, and Δx, Δy and Δt represent the changes in the x-axis coordinate value, the y-axis coordinate value and the time, respectively, of a pixel of the second image relative to the corresponding pixel of the first image.
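The frame sampling described in this paragraph can be illustrated with a minimal Python sketch. The helper name `sampled_frame_pairs` and the use of plain frame indices in place of decoded images are assumptions made for illustration; the embodiment does not prescribe an implementation.

```python
from typing import Iterator, Tuple


def sampled_frame_pairs(total_frames: int, step: int = 5) -> Iterator[Tuple[int, int]]:
    """Yield (previous, current) frame indices that are `step` frames apart.

    Hypothetical helper: a real pipeline would pass the decoded images at
    these indices to the optical flow calculation.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    for current in range(step, total_frames, step):
        yield current - step, current


# One second of video at 25 frames per second, sampled every 5 frames.
pairs = list(sampled_frame_pairs(25, step=5))
print(pairs)
```

Each pair corresponds to one optical flow calculation, so a larger step reduces the number of calculations per second of video.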

[0018] The purpose of the optical flow is to find the corresponding pixels between two images and express them as a direction and a length (a direction is expressed as angle, and a length is expressed as the number of pixels). There are several methods of calculating the optical flow field. In this embodiment, the Farneback method is adopted.
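As a minimal sketch of the direction and length representation, the following Python function converts one pixel displacement into an angle and a length. The function name `to_polar` and the choice of degrees in the range [0, 360) are assumptions for illustration only.

```python
import math
from typing import Tuple


def to_polar(dx: float, dy: float) -> Tuple[float, float]:
    """Convert a pixel displacement (dx, dy) into (angle in degrees, length in pixels).

    The angle is measured from the positive x-axis; in image coordinates the
    y-axis usually points downwards, which is an assumption of this sketch.
    """
    angle = math.degrees(math.atan2(dy, dx)) % 360.0
    length = math.hypot(dx, dy)
    return angle, length


angle, length = to_polar(-3.0, 4.0)
print(round(angle, 2), length)
```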

[0019] Two images can be modelled as:

[0020] Here, φ_i(x,y,t) and η_i(x,y,t), i = 1,2,...,n are the basis functions of the n process models of the two images, and A_i are the coefficients of the spatial and temporal changes of each process model.

[0021] Assuming E_D(u, v) is a measurement deviation (data term), E_S(u, v) is a smoothing term, and α is a smoothing weighting factor, the optical flow field of the two images is obtained by minimizing the sum of the data term and the weighted smoothing term:

E(u, v) = min [E_D(u, v) + α E_S(u, v)]

E(u, v) represents the optical flow field of the two images.

[0022] S3. performing an object artificial intelligence detection on the video:

[0023] Since the obtained optical flow field contains various noises, the noise caused by the movement of other objects (mainly people and horses) must be removed.

[0024] The object artificial intelligence detection module is used to detect objects that appear in the video, and save a position and a size of each detected object, comprising the position and the size of the target horse; wherein the detected position will be used to filter the optical flow vector of related objects. The position and size of the target horse are also saved as the target displacement for compensation and calculation of the horse length.

[0025] In this embodiment, the object detection artificial intelligence is derived from YOLOv3, and YOLOv3 is derived from the Darknet developed by Redmon and Farhadi. Like other object detection models, YOLOv3 is a fully convolutional neural network (CNN) that applies residual layers and skip connections, allowing layers at different depths to contribute to the inferred results. The network consists of 53 convolutional layers and a final fully connected layer for inference. Figure 1 shows the network layout of YOLOv3.

[0026] Compared with other traditional object detection artificial intelligence, the advantage of YOLOv3 is that only one inference is required, which is much faster than other multi-dimensional/multi-pass models.

[0027] The input image is divided into a 13×13 grid, and each grid cell predicts up to 5 objects. For each bounding box, the network predicts 5 coordinates t_x, t_y, t_w, t_h, t_o, where t_x, t_y are the x-axis coordinate and y-axis coordinate of the center point of the bounding box respectively, t_w, t_h are the width and height of the bounding box respectively, and t_o is the confidence degree of the bounding box. For the grid cell in column x and row y, with coordinates (c_x, c_y) and a prior bounding box of width p_w and height p_h, the prediction of the bounding box of an object is:

b_x = σ(t_x) + c_x
b_y = σ(t_y) + c_y
b_w = p_w · e^(t_w)
b_h = p_h · e^(t_h)
Pr(object) × IOU(b, object) = σ(t_o)

wherein σ() is a logistic activation function or normalization function that keeps the result value in [0, 1], Pr(object) is the probability of an object, and IOU(b, object) is the accuracy of the bounding box. In addition to these coordinates, each prediction also has a type (class) associated with it. The YOLOv3 network used in this embodiment has undergone fine-tuning training on humans and horses. The training is completed by performing transfer learning on the last fully connected layer, so that the network can detect humans and horses more accurately.
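The box decoding can be sketched in a few lines of Python, following the decoding published for YOLOv3 (sigmoid offsets within the grid cell, exponential scaling of the prior box). The names `Box` and `decode_box` and the sample values are assumptions for illustration; this is not the trained network of the embodiment.

```python
import math
from typing import NamedTuple


class Box(NamedTuple):
    cx: float          # center x, in grid-cell units
    cy: float          # center y, in grid-cell units
    w: float           # width, in the units of the prior box
    h: float           # height, in the units of the prior box
    confidence: float  # Pr(object) * IOU(b, object)


def sigmoid(z: float) -> float:
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def decode_box(t_x: float, t_y: float, t_w: float, t_h: float, t_o: float,
               c_x: float, c_y: float, p_w: float, p_h: float) -> Box:
    """Decode raw network outputs for one grid cell and one prior box."""
    return Box(
        cx=sigmoid(t_x) + c_x,
        cy=sigmoid(t_y) + c_y,
        w=p_w * math.exp(t_w),
        h=p_h * math.exp(t_h),
        confidence=sigmoid(t_o),
    )


# All-zero outputs place the box at the center of cell (6, 4) with the prior size.
box = decode_box(0.0, 0.0, 0.0, 0.0, 0.0, c_x=6.0, c_y=4.0, p_w=2.0, p_h=1.5)
print(box)
```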

[0028] The prediction of each frame will provide the position (center point) and size (length and width) of each object. This information will be used in the following sections to reduce the noise from the optical flow field and to calculate the displacement and size of the target horse.

[0029] S4. an optical flow field filtering module filters out all moving objects from the optical flow field obtained in step S2 (their optical flow vectors are set to zero) according to the objects detected in step S3. After filtering, the maximum component of the optical flow field is the camera speed in pixels per 5 frames. Figure 2 shows the effect on the optical flow field histogram when the optical flow vectors of the moving objects are removed, where Figure 2(a) is the amplitude histogram before filtering, and Figure 2(b) is the amplitude histogram after filtering.
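A minimal Python sketch of this filtering step is given below. It assumes the optical flow field is stored as a row-major grid of (dx, dy) vectors and that each detection is a (center x, center y, width, height) box in pixels; these data layouts and the name `filter_moving_objects` are illustrative assumptions.

```python
from typing import List, Tuple

Vector = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (center_x, center_y, width, height) in pixels


def filter_moving_objects(flow: List[List[Vector]], boxes: List[Box]) -> List[List[Vector]]:
    """Return a copy of `flow` with the vectors inside every detected box set to zero."""
    filtered = [list(row) for row in flow]
    height = len(filtered)
    width = len(filtered[0]) if height else 0
    for cx, cy, w, h in boxes:
        # Clamp the box to the image so partially visible objects are handled.
        x0 = max(0, int(cx - w / 2))
        x1 = min(width, int(cx + w / 2) + 1)
        y0 = max(0, int(cy - h / 2))
        y1 = min(height, int(cy + h / 2) + 1)
        for y in range(y0, y1):
            for x in range(x0, x1):
                filtered[y][x] = (0.0, 0.0)
    return filtered


# Background moves 4 pixels to the left; one detected object moves differently.
flow = [[(-4.0, 0.0)] * 6 for _ in range(4)]
flow[1][2] = (3.0, 1.0)
filtered = filter_moving_objects(flow, [(2.0, 1.0, 2.0, 2.0)])
```

After filtering, only background vectors remain non-zero, so the dominant remaining motion can be attributed to the camera.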

[0030] S5. a camera speed calculation module calculates a direction and a speed of the camera according to the filtered optical flow field in step S4:
Each optical flow vector is formed by two parts: direction (angle) and speed (pixel):

1. Direction: the camera direction value ω_φ is equal to the maximum component of the optical flow field direction histogram.
2. Speed: the camera speed value ω_m is equal to the maximum component of the optical flow field speed histogram (the opposite direction has been removed).

wherein ω_φ and ω_m are the direction and speed (displacement) of the camera, D_H is the optical flow field direction histogram, M_H is the optical flow field speed histogram, and F_ωφ is a vector filter for direction.
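One possible reading of this histogram-based estimate is sketched below in Python. The bin widths, the function name `camera_motion`, and the interpretation of the direction filter as "keep only vectors in the dominant direction bin" are assumptions of this sketch, not details stated in the embodiment.

```python
import math
from collections import Counter
from typing import Iterable, List, Tuple

Vector = Tuple[float, float]


def camera_motion(vectors: Iterable[Vector], angle_bin: float = 10.0,
                  speed_bin: float = 1.0) -> Tuple[float, float]:
    """Estimate (direction in degrees, speed in pixels) as the histogram modes."""
    # Vectors zeroed by the filtering step carry no information and are skipped.
    moving: List[Vector] = [(dx, dy) for dx, dy in vectors if dx != 0.0 or dy != 0.0]
    if not moving:
        return 0.0, 0.0
    bins = int(round(360.0 / angle_bin))

    def angle_index(dx: float, dy: float) -> int:
        angle = math.degrees(math.atan2(dy, dx)) % 360.0
        return int(angle // angle_bin) % bins

    # Direction: the fullest bin of the direction histogram.
    direction_hist = Counter(angle_index(dx, dy) for dx, dy in moving)
    best_angle = direction_hist.most_common(1)[0][0]
    # Speed: the fullest bin of the speed histogram, restricted to that direction.
    speed_hist = Counter(
        int(math.hypot(dx, dy) // speed_bin)
        for dx, dy in moving
        if angle_index(dx, dy) == best_angle
    )
    best_speed = speed_hist.most_common(1)[0][0]
    # Report the center of each winning bin.
    return (best_angle + 0.5) * angle_bin, (best_speed + 0.5) * speed_bin


vectors = [(-4.2, 0.5)] * 50 + [(3.0, 3.0)] * 5 + [(0.0, 0.0)] * 20
direction, speed = camera_motion(vectors)
print(direction, speed)
```

The result is quantized to the bin centers, so narrower bins give finer estimates at the cost of noisier histograms.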

[0031] S6. calculating a speed of the target horse between two consecutive images:
Since the speed of the target horse may change, and the camera is unlikely to keep the target in the center of the field of view, this deviation will affect the result. In order to reduce this effect, the calculated camera speed may be adjusted by the displacement of the target horse, and this adjustment is based on the displacement of the target horse in the current image relative to the previous image (that is, 5 frames before), and the unit is pixel/ 5 frames.

[0032] In particular, in step S3, the object artificial intelligence detection module detects the position and size of each object (human and horse), comprising the position and size of the target horse. The displacement of the target horse (in pixels/5 frames) is calculated by subtracting the previous position from the current position of the target horse. According to this, the target horse speed calculation module calculates the final camera speed in pixels per 5 frames, that is, the speed H_p of the target horse is:

H_p = ω - (ḋ - d)

ω is the angular velocity of the camera (in pixels/5 frames), ḋ is the previous position of the center of the target horse, and d is the current position of the center of the target horse. Since the desired result is in the unit of horse length per second, H_p must be converted. First, the time unit is converted from 5 frames to 1 second. fps is the number of frames per second of the video, and the target horse speed Ḣ_p in the unit of pixels per second is:

Ḣ_p = H_p × fps / 5

[0033] Then, the target horse speed in the unit of pixels per second is converted into the target horse speed in the unit of horse length per second.
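The adjustment H_p = ω - (ḋ - d) and the change of time unit can be sketched as follows. The sketch assumes scalar positions measured along the direction of camera motion and assumes that the conversion to pixels per second multiplies by fps and divides by the 5-frame interval, as the units suggest; the function names and sample values are illustrative.

```python
def horse_speed_per_interval(omega: float, d_prev: float, d_curr: float) -> float:
    """H_p = omega - (d_prev - d_curr), all quantities in pixels per N frames."""
    return omega - (d_prev - d_curr)


def to_pixels_per_second(h_p: float, fps: float, n_frames: int = 5) -> float:
    """Convert a speed in pixels per `n_frames` frames into pixels per second."""
    if n_frames <= 0:
        raise ValueError("n_frames must be positive")
    return h_p * fps / n_frames


# Camera moved 40 px in 5 frames; the horse center drifted from x=320 to x=326.
h_p = horse_speed_per_interval(omega=40.0, d_prev=320.0, d_curr=326.0)
h_p_per_second = to_pixels_per_second(h_p, fps=25.0)
print(h_p, h_p_per_second)
```

In this example the horse drifted 6 pixels ahead of the camera, so its speed is 46 pixels per 5 frames rather than the camera's 40.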

[0034] In step S3, the object artificial intelligence detection module detects the position and size of the target horse. Unlike the position, however, the detected size of the target horse depends on its posture, so the accuracy of the horse length also depends largely on the posture. Figure 3 (a) and (b) show different horse postures, and the difference in the length of the two horses can be clearly observed. When the object artificial intelligence detection module fails to detect the target, the previously known horse length is used.

[0035] In order to reduce such errors, this embodiment adopts the abnormal state detection method RANSAC to review the length of the detected horses. Assuming that the target horse walks normally within a short time after the start of the video, as new data points appear, a new regression model is updated based on this set of sampled data. New data points are checked against the new regression model and against high-quality data points of the existing regression model; any abnormal data values are removed and replaced with known effective lengths. Pixels are then converted to a horse length, and the target horse speed V_t is calculated in the unit of horse length/second:

V_t = Ḣ_p / Pixels

Ḣ_p is the speed of the target horse in unit of pixels per second; Pixels is the number of pixels of the horse length, unit is pixel/horse length;
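The length review and the final unit conversion can be illustrated with a simplified, batch-style Python sketch. It fits a straight line to the detected lengths with a basic RANSAC loop, replaces outliers with the most recent valid length, and then divides the speed in pixels per second by the horse length in pixels. The threshold, iteration count, function names and sample data are all assumptions; the embodiment updates its model incrementally and does not specify these details.

```python
import random
from typing import List, Sequence, Tuple


def ransac_inliers(points: Sequence[Tuple[float, float]], threshold: float,
                   iterations: int = 200, seed: int = 0) -> List[bool]:
    """Fit y = a*x + b by RANSAC and return an inlier flag for each point."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    best_mask = [False] * len(points)
    for _ in range(iterations):
        (x1, y1), (x2, y2) = rng.sample(list(points), 2)
        if x1 == x2:
            continue  # vertical line, cannot be written as y = a*x + b
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        mask = [abs(y - (a * x + b)) <= threshold for x, y in points]
        if sum(mask) > sum(best_mask):
            best_mask = mask
    return best_mask


def review_lengths(lengths: Sequence[float], threshold: float = 15.0) -> List[float]:
    """Replace abnormal horse lengths with the most recent valid length."""
    if len(lengths) < 2:
        return list(lengths)
    mask = ransac_inliers(list(enumerate(lengths)), threshold)
    if not any(mask):
        return list(lengths)
    # Leading outliers fall back to the first valid length.
    last_valid = next(value for value, ok in zip(lengths, mask) if ok)
    reviewed = []
    for value, ok in zip(lengths, mask):
        if ok:
            last_valid = value
        reviewed.append(last_valid)
    return reviewed


def horse_lengths_per_second(px_per_second: float, px_per_horse_length: float) -> float:
    """V_t = (speed in pixels per second) / (pixels per horse length)."""
    return px_per_second / px_per_horse_length


lengths_px = [200, 202, 201, 120, 203, 204, 310, 205]  # made-up detections
reviewed = review_lengths(lengths_px)
v_t = horse_lengths_per_second(230.0, reviewed[-1])
print(reviewed, round(v_t, 3))
```

The two implausible detections (120 and 310 pixels) are replaced by the preceding valid lengths, so a posture change or a partial detection does not distort the converted speed.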

[0036] S7. The output module averages all the calculated target horse speeds between two consecutive images to obtain the average speed of the target horse. The average speed is expressed in terms of horse length per second. The final result may be displayed on real-time video or output to a file.

[0037] According to the average speed of each horse obtained by the prediction, the ranking of the competition can be effectively predicted.

[0038] The performance of the method in this embodiment will be further explained through testing.

[0039] There are more than 3000 sample videos in this test. 40 video clips from venue A and 40 video clips from venue B are randomly sampled from these more than 3000 video clips. The duration of each video is between 30 and 45 seconds. Figure 4 shows the test results on venue B. Figure 5 shows the test results on venue A. The x-axis represents error-rate intervals of 5%, and the y-axis represents the number of results within each interval. The error rate is the difference between the estimated result and the actual result divided by the absolute value of the actual result:

ε = (R_P - R_T) / |R_T| × 100%

ε is the error (percentage), R_T is the actual result (horse length/second), and R_P is the predicted result obtained by the method of this embodiment.
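The error rate and its grouping into 5% intervals can be sketched as follows. The function names and the sample values are illustrative assumptions, and grouping by the absolute value of the error is one possible reading of the histogram described above.

```python
from collections import Counter
from typing import Iterable


def error_rate(predicted: float, actual: float) -> float:
    """Signed error in percent: (predicted - actual) / |actual| * 100."""
    if actual == 0:
        raise ValueError("actual result must be non-zero")
    return (predicted - actual) / abs(actual) * 100.0


def bin_errors(errors: Iterable[float], width: float = 5.0) -> Counter:
    """Count absolute errors per interval; key k covers [k*width, (k+1)*width)."""
    return Counter(int(abs(e) // width) for e in errors)


# Made-up example: predicted 1.10 against an actual 1.00 horse length/second.
example = error_rate(1.10, 1.00)
histogram = bin_errors([2.0, -4.0, 7.5, 12.0])
print(round(example, 2), dict(histogram))
```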

[0040] The actual speed is observed manually and can be estimated by measuring the time taken to travel between two points in the video that are a known distance apart. In addition, the horse length uses the average length of a racehorse (2.4 m), not the actual length of the target horse. The actual speed is calculated as follows:

V_t = D / (T × L)

V_t = actual speed (horse body/second)

D = the distance between two known points in the sand circle (m)

T = elapsed time (seconds)

L = the length of an average racehorse (2.4 m/horse body)
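As a worked example of this ground-truth calculation, the following Python sketch divides the known distance by the elapsed time and the average horse length, which is what the units of D, T and L imply. The function name and the sample distance and time are made up for illustration.

```python
def actual_speed(distance_m: float, elapsed_s: float, horse_length_m: float = 2.4) -> float:
    """Return the speed in horse bodies per second: D / (T * L)."""
    if elapsed_s <= 0 or horse_length_m <= 0:
        raise ValueError("elapsed time and horse length must be positive")
    return distance_m / (elapsed_s * horse_length_m)


# A horse covering 24 m in 10 s travels at about one horse body per second.
speed = actual_speed(24.0, 10.0)
print(round(speed, 3))
```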

[0041] The results showed that more than 86% (69/80) of the test samples fell within the 10% error range. Further, fewer than 1.5% (1/80) of the test samples fell outside the 20% error range.

[0042] It can be seen that the accuracy of the method in Embodiment 1 is comparable to the expected value. The accuracy rate exceeds 90%, and the recall rate exceeds 86%. In addition, the environment of different competition venues is usually very different (background, number of moving objects/obstacles and viewing distance). Therefore, the method of Embodiment 1 is robust to different sand circle conditions and may adapt to different viewpoints.

Embodiment 2

[0043] This embodiment provides a horse speed calculation system based on deep learning, comprising:

an optical flow field calculation module: used to extract images from the video obtained by the camera and calculate an optical flow field between consecutive images;

an object artificial intelligence detection module: used to use a YOLOv3 network to detect and save the positions and sizes of the objects in the video;

a camera speed calculation module: used to calculate a direction and a speed of the camera using the optical flow field filtered by the optical flow field filter module;

a target horse speed calculation module: used to calculate, according to the position of the target horse detected by the object artificial intelligence detection module, a displacement of the target horse between two consecutive images by subtracting a position of a previous image from a position of the target horse in a current image. The displacement of the horse is used to compensate the camera speed between two consecutive images calculated by the camera speed calculation module to obtain the speed of the target horse in pixels between the two consecutive images, then unit conversion on the speed of the target horse in pixels is performed.

an output module is used to average all speeds of the target horse between two consecutive images to obtain and output an average speed of the target horse. The unit of the average speed is horse length/second.

[0044] For those skilled in the art, various corresponding changes and modifications may be given based on the above technical solutions and ideas, and all these changes and modifications should be included in the protection scope of the claims of the present invention.

Claims

1. A method for a horse speed calculation system based on deep learning, characterized in that, comprising the following steps:

S4. an optical flow field filtering module filters out all moving objects from the optical flow field obtained in step S2 according to the objects detected in step S3;

S5. a camera speed calculation module calculates a direction and a speed of the camera according to the filtered optical flow field in step S4;

S6. calculating a speed of the target horse between two consecutive images:

S7. an output module averages all speeds of the target horse between two consecutive images obtained by calculation to obtain an average speed of the target horse.

2. The method according to claim 1, characterized in that, in step S2, the number of frames is set to 5 frames.

3. The method according to claim 1, characterized in that, in step S2, a method using Farneback is used to calculate the optical flow field.

4. The method according to claim 1, characterized in that, in step S3, the object artificial intelligence detection module uses a YOLOv3 network to implement an artificial intelligence detection of the objects.

5. The method of claim 1, characterized in that, in step S6, a unit conversion is performed on H_p to obtain the target horse speed in unit of horse length/second;

first, calculating the target horse speed Ḣ_p in unit of pixels per second from H_p:

Ḣ_p = H_p × fps / N

fps is a number of frames per second of the video, and N is the set number of frames between two consecutive images;

then, converting pixels to a horse length, and calculating the target horse speed V_t in the unit of horse length/second:

V_t = Ḣ_p / Pixels

Ḣ_p is the speed of the target horse in unit of pixels per second; Pixels is the number of pixels of the horse length, unit is pixel/horse length.

6. The method of claim 5, characterized in that, for a length of the target horse detected by the object artificial intelligence detection module, using an abnormal state detection method RANSAC to review the detected horse length; assuming that the target horse walks normally within a short period of time after the start of the video, as new data points appear, a new regression model is updated based on this set of sampled data; checking the new data points against the new regression model, and against high-quality data points of an existing regression model, any abnormal data values will be removed and replaced with a known effective length.

7. A horse speed calculation system based on deep learning, characterized in that, comprising:

an optical flow field calculation module: used to extract images from the video obtained by the camera and calculate an optical flow field between consecutive images;

an object artificial intelligence detection module: used to use an artificial intelligence to detect and save a position and a size of an object in the video;

a camera speed calculation module: used to calculate a direction and a speed of the camera using the optical flow field filtered by the optical flow field filter module;

an output module is used to average all speeds of the target horse between two consecutive images to obtain and output an average speed of the target horse.
