RELATED APPLICATION
FIELD OF THE TECHNOLOGY
[0002] This application relates to the field of computer technologies, and in particular,
to a feature map generation method and apparatus, a computer device, a storage medium,
and a computer program product.
BACKGROUND OF THE DISCLOSURE
[0003] With development of computer technologies, visual positioning technologies emerge.
In the visual positioning technologies, a feature map may be constructed. The feature
map is a data structure and may use a relevant geometric feature (such as a point,
a straight line, or a surface) to represent an observation environment, thereby assisting
a to-be-positioned moving device in positioning. For example, in automatic driving,
an automatic driving vehicle may be positioned by constructing the feature map.
[0004] As applications such as automatic driving become widespread, requirements for positioning accuracy grow increasingly high. However, a feature map constructed in the related art often suffers from low positioning accuracy during use.
SUMMARY
[0005] According to various embodiments of this application, a feature map generation method
and apparatus, a computer device, a computer-readable storage medium, and a computer
program product are provided.
[0006] According to a first aspect, this application provides a feature map generation method,
performed by a computer device, and including: obtaining a plurality of image frames photographed
for a target scene, separately extracting image feature points from each image frame,
and determining corresponding feature descriptors based on a position in a corresponding
image at which the extracted image feature points are located; forming image feature
points with a matching relationship in the image feature points of each image frame
into a feature point set; determining a representative feature point from the feature
point set, and calculating a difference between a feature descriptor corresponding
to a remaining image feature point in the feature point set and a feature descriptor
corresponding to the representative feature point; determining a position error of
the feature point set based on the calculated difference, iteratively updating the
remaining image feature point in the feature point set based on the position error,
and obtaining an updated feature point set in a case that an iteration stop condition
is satisfied; and determining a space feature point corresponding to the updated feature
point set based on a position in a corresponding image at which each image feature
point in the updated feature point set is located, and generating a feature map based
on the space feature point, the feature map being used for positioning a to-be-positioned
moving device in the target scene.
[0007] According to a second aspect, this application further provides a feature map generation
apparatus. The apparatus includes: a feature extraction module, configured to obtain
a plurality of image frames photographed for a target scene, separately extract
image feature points from each image frame, and determine corresponding feature descriptors
based on a position in a corresponding image at which the extracted image feature
points are located; a feature point set determining module, configured to form image
feature points with a matching relationship in the image feature points of each image
frame into a feature point set; a difference calculation module, configured to determine
a representative feature point from the feature point set, and calculate a difference
between a feature descriptor corresponding to a remaining image feature point in the
feature point set and a feature descriptor corresponding to the representative feature
point; a position update module, configured to determine a position error of the feature
point set based on the calculated difference, iteratively update the remaining image
feature point in the feature point set based on the position error, and obtain an
updated feature point set in a case that an iteration stop condition is satisfied;
and a feature map generation module, configured to determine a space feature point
corresponding to the updated feature point set based on a position in a corresponding
image at which each image feature point in the updated feature point set is located,
and generate a feature map based on the space feature point, the feature map being
used for positioning a to-be-positioned moving device in the target scene.
[0008] According to a third aspect, this application further provides a computer device.
The computer device includes a memory and a processor, the memory having computer-readable
instructions stored therein, and the computer-readable instructions, when executed
by the processor, implement operations of the foregoing feature map generation method.
[0009] According to a fourth aspect, this application further provides a computer-readable
storage medium. The computer-readable storage medium has computer-readable instructions
stored thereon, and the computer-readable instructions, when executed by a processor,
implement operations of the foregoing feature map generation method.
[0010] According to a fifth aspect, this application further provides a computer program
product. The computer program product includes a computer program, and the computer
program, when executed by a processor, implements operations of the foregoing feature
map generation method.
[0011] Details of one or more embodiments of this application are provided in the accompanying
drawings and descriptions below. Other features, objectives, and advantages of this
application become apparent from the specification, the accompanying drawings, and
the claims.
BRIEF DESCRIPTION OF THE DRAWINGS
[0012] To describe the technical solutions in embodiments of this application or conventional
technologies more clearly, the following briefly describes the accompanying drawings
for describing the embodiments or the conventional technologies. Apparently, the accompanying
drawings in the following descriptions show merely the embodiments of this application,
and a person of ordinary skill in the art may still obtain other accompanying drawings
from disclosed accompanying drawings without creative efforts.
FIG. 1 is a schematic diagram of an application environment of a feature map generation
method according to an embodiment.
FIG. 2 is a schematic flowchart of a feature map generation method according to an
embodiment.
FIG. 3 is a schematic composition diagram of a feature point set according to an embodiment.
FIG. 4 is a schematic flowchart of generating a feature map based on a space feature
point according to an embodiment.
FIG. 5 is a schematic diagram of determining a corresponding position in an input
image according to an embodiment.
FIG. 6 is a schematic diagram of a structure of a feature extraction model according
to an embodiment.
FIG. 7 is a schematic flowchart of steps of determining positioning information according
to an embodiment.
FIG. 8 is a block diagram of a structure of a feature map generation apparatus according
to an embodiment.
FIG. 9 is a diagram of an internal structure of a computer device according to an
embodiment.
FIG. 10 is a diagram of an internal structure of a computer device according to another
embodiment.
DESCRIPTION OF EMBODIMENTS
[0013] The technical solutions in embodiments of this application are clearly and completely
described below with reference to the accompanying drawings in the embodiments of
this application. Apparently, the described embodiments are merely some rather than
all of the embodiments of this application. All other embodiments obtained by a person
of ordinary skill in the art based on the embodiments of this application without
creative efforts shall fall within the protection scope of this application.
[0014] A feature map generation method provided in all embodiments of this application may
be applied to an intelligent traffic system (ITS) and an intelligent vehicle infrastructure
cooperative system (IVICS).
[0015] The feature map generation method provided in the embodiment of this application
may be applied to an application environment shown in FIG. 1. A moving device 102
communicates with a server 104 via a network. The moving device 102 is either a device that moves autonomously or a device that is moved passively. The device that moves autonomously may be a vehicle, a robot, or the like. The device that is moved passively may be, for example, a terminal carried by a user and moving with the user, such as a smartphone, a tablet computer, or a portable wearable device. A photographing
device is mounted on the moving device 102. The server 104 may be an independent physical
server, a server cluster or a distributed system including a plurality of physical
servers, or a cloud server that provides a cloud computing service. Specifically,
in a feature map generation phase, a photographing device on any moving device may
photograph a target scene to obtain a plurality of image frames and send the photographed
plurality of image frames to the server. The server generates a feature map based
on each image frame and stores the feature map. In a positioning information determining
phase, a to-be-positioned moving device may send inertial measurement data, speed
measurement data, and a target image photographed in the target scene to the server.
The server may determine positioning information of the to-be-positioned moving device
based on the data and the stored feature map, and send the positioning information
to the to-be-positioned moving device.
[0016] It may be understood that in another embodiment, in a case that any moving device
moves in a target scene, a photographing device on the moving device may photograph
the target scene to obtain a plurality of image frames, and then generate a feature
map based on each image frame and store the feature map, so that positioning information
may be determined based on the stored feature map in a case that the moving device
moves in the target scene again. In addition, the feature map generated by the moving
device may be further sent to the server. In a case that another to-be-positioned
moving device moves in the target scene, the feature map may be downloaded, and positioning
information may be determined based on the downloaded feature map. Alternatively,
in a case that another to-be-positioned moving device moves in the target scene, inertial
measurement data, speed measurement data, and a target image photographed in the target
scene may be sent to the server. The server may determine positioning information
of the to-be-positioned moving device based on the data and the stored feature map,
and return the positioning information to the to-be-positioned moving device.
[0017] In all embodiments, as shown in FIG. 2, a feature map generation method is provided.
The method is performed by a computer device. Specifically, the method may be individually
performed by a computer device such as the moving device or the server in FIG. 1,
or may be collaboratively performed by the moving device and the server. In the embodiment
of this application, an example in which the method is applied to the server in FIG.
1 is used for description, and the method includes the following steps.
[0018] Step 202: Obtain a plurality of image frames photographed for a target scene, separately
extract image feature points from each image frame, and determine corresponding feature
descriptors based on a position in a corresponding image at which the extracted image
feature points are located.
[0019] The target scene refers to the specific scene for which the feature map is to be generated. The target scene may specifically be an environment in which a vehicle
is located. For example, the target scene may be a scene determined by a possible
driving route of the vehicle. In a case that the vehicle is driving in the scene,
a plurality of image frames of surroundings are obtained through camera acquisition.
The image feature points are specific pixel points on an image that may be used for
describing a feature of a scene, such as significant edge points, histogram of oriented
gradient features, and Haar features. The feature descriptor has a one-to-one correspondence
with the image feature point. The feature descriptor is a representation of a Gaussian
image gradient statistical result in a neighborhood near the feature point. The feature
descriptor may be used for describing a corresponding image feature point. In the
embodiments of the present disclosure including the embodiments of both the claims
and the specification (hereinafter referred to as "all embodiments of the present
disclosure"), the feature descriptor is preferably configured as a vector for identifying
positions of image feature points.
[0020] Specifically, the moving device may acquire the plurality of image frames and transmit
the plurality of image frames to the server in real time for processing. Alternatively,
the moving device may only be responsible for storing the acquired plurality of image
frames. After image acquisition is completed, the stored plurality of image frames
are inputted to the server in a specific manner for processing. After obtaining the
plurality of image frames photographed for the target scene, the server may extract
the image feature points from each image frame. For each image feature point, the
moving device may determine a feature descriptor corresponding to the image feature
point based on the position in the image at which the image feature point is located,
so that image feature points of each image frame and a feature descriptor of each
image feature point may be obtained.
[0021] In all embodiments, the image feature points may be extracted by using, but are not
limited to, an algorithm such as Good Features to Track. A corresponding function
is provided in a computer vision library OpenCV. In another embodiment, feature point
extraction may alternatively be performed on the image by training a machine learning
model. The machine learning model includes a plurality of convolutional layers. Each
convolutional layer outputs a feature image after performing different processing
on an original image. The feature image represents a possibility that each position
in the original image is a feature point, and an image feature point may be determined
based on the feature image. It may be understood that a plurality of image feature
points may be extracted from each image frame. The plurality of image feature points
refer to at least two image feature points.
[0022] Step 204: Form image feature points with a matching relationship in the image feature
points of each image frame into a feature point set.
[0023] The image feature points with a matching relationship refer to similar image feature
points. In all embodiments, the image feature points with a matching relationship
may be determined based on feature descriptors of the image feature points. In a case
that feature descriptors of two image feature points reach a specific degree of similarity,
it is considered that the two image feature points are matched. In the embodiments
of the present disclosure, taking two frames of images as an example, for two image
feature points that appear in the two frames of images respectively, a position vector
indicated by a feature descriptor corresponding to each image feature point is used
to determine a position distance between the two image feature points. In a case that the distance is less than a target threshold, it is determined that the positions of the two image feature points are approximate, that is, the two image feature points are matched. In other words, when two image feature points appearing in the two image frames respectively match, the two image feature points refer to feature points at the same content position.
[0024] Specifically, the server may divide all image feature points in each image frame
into a set based on a matching relationship between the image feature points, to obtain
a plurality of feature point sets. In these feature point sets, the image feature
points belonging to the same feature point set have a matching relationship with each
other. For example, as shown in FIG. 3, it is assumed that there are three image frames
in total. The first image frame includes image feature points A1, A2, and A3. The
second image frame includes image feature points B1, B2, B3, and B4. The third image
frame includes image feature points C1, C2, and C3. Assuming that A1, B1, and C1 are
image feature points with a matching relationship with each other, A2, B2, and C2
are image feature points with a matching relationship with each other, and A3, B3,
and C3 are image feature points with a matching relationship with each other, A1,
B1, and C1 may form a feature point set 1, A2, B2, and C2 may form a feature point
set 2, and A3, B3, and C3 may form a feature point set 3.
[0025] In all embodiments, it is assumed that there are M frames in the plurality of image frames, and i takes values from 1 to M. N image feature points are first extracted from an i-th image frame. For the method of extracting the image feature points, refer to the foregoing descriptions. In a case that i = 1, that is, for the first image frame, a corresponding feature point set may be created for each image feature point in the first image frame. In a case that i > 1, with j taking values from 1 to N, it is determined whether there is an image feature point in the (i-1)-th image frame that matches the j-th image feature point in the i-th image frame. If there is an image feature point matching the j-th image feature point, the j-th image feature point is added to the feature point set corresponding to that matching image feature point (because the (i-1)-th frame has already been processed, this feature point set necessarily exists). In a case that there is no image feature point matching the j-th image feature point, a feature point set corresponding to the j-th image feature point is created. Once no new image feature point is added to a feature point set, construction of that feature point set may be considered complete. The images are processed frame by frame by using the foregoing method, as sketched below. After processing of the M-th image frame is completed, a plurality of feature point sets are obtained. Each feature point set includes at least one image feature point, or a sequence of image feature points with a matching relationship with each other. It may be understood that in actual application, in a case that the feature map is constructed in real time, M may not be known in advance, but the specific steps are similar to the above, and i only needs to be incremented until all images are processed.
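The following sketch, under the assumption that each frame's feature points are given as dictionaries holding a position and a descriptor vector, implements the frame-by-frame grouping described above; the descriptor-distance matching test and its threshold are illustrative stand-ins for whatever matching rule is used.

```python
import numpy as np

def build_feature_point_sets(frames, match_threshold=0.7):
    """Group matching image feature points across frames into feature point sets.

    `frames` is a list of lists; frames[i][j] is a dict with keys
    "pos" (2-vector) and "desc" (descriptor vector). A point of frame i is
    matched only against frame i-1, as described above.
    """
    sets = []                 # each entry: list of (frame_idx, point_idx)
    prev_assignment = {}      # point index in frame i-1 -> set index

    for i, points in enumerate(frames):
        assignment = {}
        for j, pt in enumerate(points):
            set_idx = None
            if i > 0:
                # look for the best-matching point in the previous frame
                best_k, best_d = None, np.inf
                for k, prev_pt in enumerate(frames[i - 1]):
                    d = np.linalg.norm(pt["desc"] - prev_pt["desc"])
                    if d < best_d:
                        best_k, best_d = k, d
                if best_d < match_threshold and best_k in prev_assignment:
                    set_idx = prev_assignment[best_k]
            if set_idx is None:
                sets.append([])            # create a new feature point set
                set_idx = len(sets) - 1
            sets[set_idx].append((i, j))
            assignment[j] = set_idx
        prev_assignment = assignment
    return sets
```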
[0026] Step 206: Determine a representative feature point from the feature point set, and
calculate a difference between a feature descriptor corresponding to a remaining image
feature point in the feature point set and a feature descriptor corresponding to the
representative feature point.
[0027] The representative feature point refers to an image feature point in the feature point set that may represent the feature point set. The remaining image feature point
in the feature point set refers to an image feature point other than the representative
feature point in the feature point set. For example, it is assumed that a feature
point set includes four image feature points A1, B1, C1, and D1. A1 is a representative
feature point, and B1, C1, and D1 are remaining image feature points. In all embodiments,
the server may randomly select an image feature point from each feature point set
as a respective representative feature point of each feature point set. In another
embodiment, the server may calculate an average feature point of each feature point
set, and determine an image feature point closest to the respective average feature
point in each feature point set as the representative feature point.
[0028] Specifically, to avoid an overall offset while iteratively updating the image feature points in the feature point set, in this embodiment a representative feature point may be determined in each feature point set. In the iterative update process, the position of the representative feature point is kept fixed, and a difference between the feature descriptor corresponding to each remaining image feature point in the feature point set and the feature descriptor corresponding to the representative feature point of the feature point set is calculated, to obtain a difference corresponding to each remaining image feature point. For example, taking a feature point set that includes A1 to D1, it is further assumed that A1 is determined as the representative feature point, and B1, C1, and D1 are determined as the remaining image feature points. A manner of calculating the difference between the feature descriptors corresponding to the remaining image feature points and the feature descriptor corresponding to the representative feature point may be as follows:
[0029] Assume that the feature descriptors corresponding to A1, B1, C1, and D1 are a1, b1, c1, and d1. Taking absolute descriptor differences as an example, the differences are calculated as s1 = |a1 - b1|, s2 = |a1 - c1|, and s3 = |a1 - d1|. Then, s1, s2, and s3 are compared to determine the minimum value as the position error. Assuming that s1 is the minimum value, the positions of the remaining image feature points in the feature point set are updated by using a gradient descent algorithm with reference to s1, so that the image feature point C1 and the image feature point D1 shift towards the direction of the image feature point B1. Then, the iterative update is repeated in the foregoing manner, and when the iteration stop condition is met, an updated feature point set is obtained. The updated feature point set is used to optimize the space feature point from which the feature map is generated, thereby improving the positioning accuracy of the map.
[0030] In all embodiments, for each feature point set, the server may calculate an absolute difference between the feature descriptor corresponding to each remaining image feature point in the feature point set and the feature descriptor corresponding to the representative feature point of the feature point set, to obtain the difference corresponding to each remaining image feature point. In another embodiment, after calculating the absolute difference, the server may calculate a square of the absolute difference to obtain the difference corresponding to each remaining image feature point.
[0031] Step 208: Determine a position error of the feature point set based on the calculated
difference, iteratively update the remaining image feature point in the feature point
set based on the position error, and obtain an updated feature point set in a case
that an iteration stop condition is satisfied.
[0032] For example, the iteration stop condition may be one of the following: the position error reaches a minimum value, a quantity of iterations reaches a preset quantity, or iteration duration reaches preset duration.
[0033] Specifically, because each feature point set determines a space feature point once its image feature points are determined, to improve accuracy of the determined space feature point, an overall position error of the feature point set needs to be reduced. Based on this, in this embodiment, for each feature point set, the server may collect the difference corresponding to each remaining image feature point in the feature point set, and determine the position error of the feature point set based on the collected differences, to iteratively update the position of each image feature point other than the representative feature point in a direction of minimizing the position error. Each update is equivalent to optimizing the position of the image feature point. A position error is then recalculated based on the feature descriptor corresponding to the optimized image feature point, and the next update is performed. This step is repeated to optimize the position of the image feature point a plurality of times, as sketched below. In a case that the iteration stop condition is satisfied, each updated image feature point and the representative feature point belonging to the same feature point set form an updated feature point set. In the update process, a gradient descent algorithm may be used for updating the position of the image feature point.
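This is a minimal sketch of one such update step, assuming a hypothetical `sample_descriptor(frame_id, xy)` helper that returns the descriptor at a (possibly sub-pixel) image position; the central finite-difference gradient is an illustrative stand-in for whatever analytic gradient an implementation would use.

```python
import numpy as np

def update_remaining_points(sample_descriptor, frame_ids, positions, rep_desc,
                            lr=0.5, eps=0.5):
    """One gradient-descent step on the remaining image feature points.

    frame_ids / positions describe the remaining points of one feature point
    set; rep_desc is the (fixed) descriptor of the representative feature
    point. The position error per point is the descriptor difference to
    rep_desc, and positions move in the direction that reduces it.
    """
    new_positions = []
    for fid, p in zip(frame_ids, positions):
        p = np.asarray(p, dtype=float)
        grad = np.zeros(2)
        for axis in range(2):
            dp = np.zeros(2)
            dp[axis] = eps
            e_plus = np.linalg.norm(sample_descriptor(fid, p + dp) - rep_desc)
            e_minus = np.linalg.norm(sample_descriptor(fid, p - dp) - rep_desc)
            grad[axis] = (e_plus - e_minus) / (2.0 * eps)
        new_positions.append(p - lr * grad)   # move towards a smaller error
    return np.asarray(new_positions)
```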
[0034] In all embodiments, to avoid degradation in the optimization process, the server may calculate singular values of a Hessian matrix for the feature point set. If the maximum singular value divided by the minimum singular value is greater than a preset threshold, the update is not performed, as sketched below.
[0035] Step 210: Determine a space feature point corresponding to the updated feature point
set based on a position in a corresponding image at which each image feature point
in the updated feature point set is located, and generate a feature map based on the
space feature point, the feature map being used for positioning a to-be-positioned
moving device in the target scene.
[0036] The space feature point refers to a three-dimensional feature point, that is, a corresponding
point of the feature point on the image in three-dimensional space. The feature map
in this embodiment may be a data structure including a plurality of space feature
points, and a specific form of the data structure is not limited. The to-be-positioned
moving device refers to a moving device that needs to be positioned. The to-be-positioned
moving device and a moving device that sends the plurality of image frames may be
the same moving device, or may be different moving devices. A pose of the image to which an image feature point belongs refers to the pose of the camera at the time of photographing the image frame. This pose may be obtained through posture transformation based on
a pose of the moving device at a same time instant and a relative pose relationship
between the camera and the moving device.
[0037] Specifically, for each updated feature point set, the server may perform triangulation
calculation based on the position in a corresponding image at which each image feature
point in the updated feature point set is located and the pose of the image, to obtain
a space feature point corresponding to each feature point set. Further, the server
may generate the feature map based on each space feature point and store the feature
map. Then, in a subsequent positioning process, the feature map may be used to assist
the to-be-positioned moving device in positioning. The triangulation calculation is
an existing method for mapping a two-dimensional image feature point to a three-dimensional
space feature point. Details are not described herein. It may be understood that a
descriptor of the space feature point may be an average value of descriptors of all
the image feature points that generate the space feature point.
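As an illustration, the following sketch triangulates one space feature point from two observations using OpenCV's `cv2.triangulatePoints`; the two-view restriction and the pose/intrinsic conventions are simplifying assumptions, since the embodiment may triangulate from all frames observing the feature point set.

```python
import cv2
import numpy as np

def triangulate_pair(K, pose1, pose2, xy1, xy2):
    """Triangulate one space feature point from two image observations.

    pose1/pose2 are 3x4 camera poses [R | t] (world -> camera), K is the
    camera intrinsic matrix, xy1/xy2 are the 2D positions of the matched
    image feature points. Sets with more observations would typically use
    a multi-view least-squares solve instead.
    """
    P1 = K @ pose1
    P2 = K @ pose2
    pts1 = np.asarray(xy1, dtype=float).reshape(2, 1)
    pts2 = np.asarray(xy2, dtype=float).reshape(2, 1)
    X_h = cv2.triangulatePoints(P1, P2, pts1, pts2)   # 4x1 homogeneous point
    return (X_h[:3] / X_h[3]).ravel()                 # 3D space feature point
```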
[0038] In all embodiments, the server may specifically determine the pose of the image to
which the image feature point belongs by using the following steps. First, a relative
pose between the moving device and the camera is obtained. The relative pose usually
remains unchanged in a moving process of the moving device and may be obtained through
calibration. Then, a pose of the moving device at each moment is determined based
on inertial measurement data and speed measurement data uploaded by the moving device.
Next, the pose of the moving device at each moment is aligned with an acquisition
moment of the plurality of image frames. The alignment here refers to determining
a pose of the moving device corresponding to each image frame. A data acquisition
moment corresponding to the pose (a moment when the inertial measurement data and
the speed measurement data are acquired) is the same as an acquisition moment of the
image frame (or the same within an allowed error range). Finally, a pose of the image
frame may be obtained by performing posture transformation based on a pose of the
moving device corresponding to each image frame and the relative pose between the
moving device and the camera.
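The following sketch illustrates this alignment under the assumption that device poses and the calibrated device-to-camera extrinsic are available as 4x4 homogeneous transforms; nearest-timestamp association is used for simplicity, although interpolation would work equally well.

```python
import numpy as np

def camera_pose_per_frame(pose_times, device_poses, frame_times, T_device_cam,
                          max_dt=0.02):
    """Align device poses with image acquisition moments, then convert to camera poses.

    device_poses are 4x4 world-from-device transforms sampled at pose_times;
    T_device_cam is the calibrated device-from-camera transform (the relative
    pose above). Frames without a pose within max_dt seconds get None.
    """
    pose_times = np.asarray(pose_times)
    camera_poses = []
    for t in frame_times:
        idx = int(np.argmin(np.abs(pose_times - t)))
        if abs(pose_times[idx] - t) > max_dt:
            camera_poses.append(None)              # no pose within tolerance
            continue
        camera_poses.append(device_poses[idx] @ T_device_cam)
    return camera_poses
```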
[0039] In the foregoing feature map generation method, a plurality of image frames photographed
for a target scene are obtained, image feature points from each image frame are separately
extracted, and corresponding feature descriptors are determined based on a position
in a corresponding image at which the extracted image feature points are located.
Image feature points with a matching relationship in the image feature points of each
image frame are formed into a feature point set. A representative feature point is
determined from the feature point set, and a difference between a feature descriptor
corresponding to a remaining image feature point in the feature point set and a feature
descriptor corresponding to the representative feature point is calculated. A position
error of the feature point set is determined based on the calculated difference, the
remaining image feature point in the feature point set is iteratively updated based
on the position error, and an updated feature point set is obtained in a case that
an iteration stop condition is satisfied. A space feature point corresponding to the
updated feature point set is determined based on a position in a corresponding image
at which each image feature point in the updated feature point set is located, and
a feature map is generated based on the space feature point. Because the position of each image feature point is optimized based on its feature descriptor in the process of generating the feature map, the generated feature map is more robust, which greatly improves positioning accuracy when the feature map is used for positioning.
[0040] In all embodiments, the determining a position error of the feature point set based
on the calculated difference includes: separately using each remaining image feature
point in the feature point set as a target feature point, and separately calculating
matching confidence between each target feature point and the representative feature
point; calculating a position error corresponding to each target feature point based
on the matching confidence and difference corresponding to each target feature point;
and collecting the position error corresponding to each target feature point to obtain
the position error of the feature point set.
[0041] The matching confidence between the target feature point and the representative feature
point is used for representing a matching degree between the target feature point
and the representative feature point. A higher matching degree indicates that the two feature points are more similar.
[0042] Specifically, for each feature point set, the server may separately use each remaining
image feature point in the feature point set as the target feature point. For each
target feature point, the server may calculate the matching confidence between the
target feature point and the representative feature point, then multiply the matching
confidence and the difference to obtain the position error corresponding to the target
feature point, and finally collect the corresponding position error of each target
feature point to obtain the position error of the feature point set. The statistics
collection may be one of summing, calculating an average value, or calculating a median.
[0043] In a specific embodiment, the server may calculate the position error of the feature point set by using the following Formula (1), where j represents the j-th feature point set and the summation runs over the pairs of matched image feature points (u, v) in the feature point set:

$$E_j = \sum_{(u,v)} w_{uv}\,\left\| F_{i(u)}[p_u] - F_{k(v)}[p_v] \right\|^2 \qquad (1)$$

E_j is the position error of the j-th feature point set. u and v represent image feature points: i(u) represents the u-th image feature point of the i-th image frame, and k(v) represents the v-th image feature point of the k-th image frame. w_uv is the matching confidence. p_u represents the position of the image feature point u on its image, and p_v represents the position of the image feature point v on its image. F_i(u)[p_u] represents the descriptor at p_u, and F_k(v)[p_v] represents the descriptor at p_v.
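A minimal sketch of evaluating this position error for a single feature point set is given below; `sample_descriptor` is again a hypothetical descriptor lookup, and the squared-norm form follows the reconstruction of Formula (1) above.

```python
import numpy as np

def set_position_error(sample_descriptor, rep, others, confidences):
    """Evaluate the position error of one feature point set (Formula (1)).

    rep = (frame_id, position) of the representative feature point;
    others = list of (frame_id, position) for the remaining feature points;
    confidences = matching confidence per remaining point.
    """
    rep_desc = sample_descriptor(*rep)
    error = 0.0
    for (fid, xy), w in zip(others, confidences):
        error += w * np.linalg.norm(sample_descriptor(fid, xy) - rep_desc) ** 2
    return error
```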
[0044] In this embodiment, the matching confidence between the image feature points is calculated, and the position error of each image feature point is obtained based on the matching confidence and the difference, so that the position error of each image feature point is more accurate. Consequently, the position error of the feature point set, obtained by collecting the position errors of the image feature points in the feature point set, is also accurate, so that a feature map with high accuracy can be obtained to further improve positioning accuracy.
[0045] In all embodiments, the separately calculating matching confidence between each target
feature point and the representative feature point includes: separately obtaining
a feature descriptor of each target feature point, and obtaining a feature descriptor
of the representative feature point; and separately calculating a vector similarity
between the feature descriptor of each target feature point and the feature descriptor
of the representative feature point, and using each vector similarity as matching
confidence between each target feature point and the representative feature point.
[0046] The vector similarity is used for describing a similarity degree between two vectors.
The feature descriptor is in the form of a vector, so that the vector similarity may
be calculated. In all embodiments, the vector similarity may be, for example, a cosine
similarity.
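A minimal sketch, assuming descriptors are given as NumPy vectors:

```python
import numpy as np

def matching_confidence(desc_a, desc_b):
    """Cosine similarity between two feature descriptors, used as matching confidence."""
    denom = np.linalg.norm(desc_a) * np.linalg.norm(desc_b)
    return float(np.dot(desc_a, desc_b) / max(denom, 1e-12))
```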
[0047] Specifically, the server may separately obtain a feature descriptor of each target
feature point, and obtain a feature descriptor of the representative feature point;
and separately calculate a vector similarity between the feature descriptor of each
target feature point and the feature descriptor of the representative feature point,
and use each vector similarity as matching confidence between each target feature
point and the representative feature point. For example, assuming that a specific
feature point set includes image feature points A1, B1, and C1, and C1 is a representative
feature point, respective feature descriptors of A1, B1, and C1 are separately obtained.
A vector similarity between the feature descriptor of the image feature point A1 and
the representative feature point C1 is calculated as matching confidence between the
image feature point A1 and the representative feature point C1. A vector similarity
between the feature descriptor of the image feature point B1 and the representative
feature point C1 is calculated as matching confidence between the image feature point
B1 and the representative feature point C1.
[0048] In the foregoing embodiment, a vector similarity between feature descriptors is calculated
as matching confidence. Because a feature descriptor describes an image feature point,
the obtained matching confidence is accurate.
[0049] In all embodiments, the determining a representative feature point from the feature
point set includes: calculating an average feature point position corresponding to
the feature point set based on a position in a corresponding image at which each image
feature point in the feature point set is located; and determining an image feature
point of which a distance from the average feature point position satisfies a distance
condition in the feature point set, and using the determined image feature point as
the representative feature point.
[0050] The distance condition includes one of the following: a distance from the average
feature point position is less than or equal to a distance threshold, or a sorting
position is before a sorting threshold in a case that the image feature points are
sorted in ascending order of distances from the average feature point position.
[0051] Specifically, for each feature point set, the server may obtain the position in the corresponding image at which each image feature point in the feature point set is located, sum the position values in each dimension, and then average them to obtain a target value for that dimension. Once a target value is determined for each dimension, the average feature point position corresponding to the feature point set is determined. For example, assuming that a specific feature point set includes image feature points A1, B1, and C1, a position in an image at which A1 is located is (x1, y1), a position in an image at which B1 is located is (x2, y2), and a position in an image at which C1 is located is (x3, y3), the average feature point position corresponding to the feature point set is ((x1 + x2 + x3)/3, (y1 + y2 + y3)/3).
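A minimal sketch of this selection, assuming the per-image positions of one feature point set are given as an (N, 2) array:

```python
import numpy as np

def choose_representative(positions):
    """Pick the image feature point closest to the average feature point position.

    positions: (N, 2) array of per-image positions of one feature point set.
    Returns the index of the representative feature point.
    """
    positions = np.asarray(positions, dtype=float)
    centroid = positions.mean(axis=0)          # average feature point position
    distances = np.linalg.norm(positions - centroid, axis=1)
    return int(np.argmin(distances))
```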
[0052] For each feature point set, after calculating the average feature point position
corresponding to the feature point set, the server may calculate a distance between
the position of each image feature point in the feature point set and the average
feature point position. An image feature point that satisfies the distance condition
is screened based on the calculated distance, and the screened image feature point
is determined as the representative feature point.
[0053] In a specific embodiment, the distance condition includes that a distance from the
average feature point position is less than or equal to a distance threshold. After
the server calculates a distance between each image feature point and an average feature
point position corresponding to a feature point set, the distance is separately compared
with the distance threshold. In a case that a distance between only one image feature
point and an average feature point position corresponding to a feature point set is
less than the distance threshold, the image feature point is determined as the representative
feature point. In a case that distances between a plurality of image feature points
and an average feature point position corresponding to a feature point set are less
than the distance threshold, one of these image feature points may be selected as
a representative feature point. For example, an image feature point with a smallest
distance may be selected as a representative feature point.
[0054] In another specific embodiment, the distance condition includes that a sorting position is before a sorting threshold in a case that the image feature points are sorted in ascending order of distance from the average feature point position. After calculating a distance between each image feature point and the average feature point position corresponding to the feature point set, the server may sort the image feature points in ascending order based on the distances, and select an image feature point whose sorting position is before the sorting threshold as the representative feature point. For example, in a case that the sorting threshold is 2, the image feature point sorted first may be selected as the representative feature point.
[0055] In the foregoing embodiment, an average feature point position corresponding to the
feature point set is calculated based on a position in a corresponding image at which
each image feature point in the feature point set is located. An image feature point
of which a distance from the average feature point position satisfies a distance condition
in the feature point set is determined, and the determined image feature point is
used as the representative feature point. The determined representative feature point
may greatly reflect an overall position characteristic of the feature point set.
[0056] In all embodiments, there are a plurality of feature point sets, and the determining a representative feature point from the feature point set includes: for each feature point set, filtering out the feature point set in a case that the feature point set satisfies a filtering condition; and performing the operation of determining a representative feature point from the feature point set in a case that the feature point set does not satisfy the filtering condition.
[0057] In this embodiment, the filtering condition includes at least one of the following:
a distance between an initial space feature point calculated based on the feature
point set and a photographing device of the plurality of image frames being greater
than a first preset distance threshold; a distance between an initial space feature
point calculated based on the feature point set and a photographing device of the
plurality of image frames being less than a second preset distance threshold, and
the second preset distance threshold being less than the first preset distance threshold;
disparity calculated based on the feature point set being greater than a preset disparity
threshold; or an average reprojection error calculated based on the feature point
set being greater than a preset error threshold.
[0058] The initial space feature point refers to a space feature point determined based
on the position in a corresponding image at which each image feature point in an unupdated
feature point set is located. Filtering out the feature point set means removing the feature point set from the plurality of feature point sets.
[0059] Specifically, for each feature point set in the plurality of feature point sets,
the server may calculate the initial space feature point based on the targeted feature
point set, and may calculate a distance between the initial space feature point and
a photographing device of the plurality of image frames. In a case that the distance
is greater than a first preset distance threshold, to be specific, in a case that
the space feature point is far away from the photographing device, the feature point
set is filtered out. In a case that the distance is less than a second preset distance
threshold, to be specific, in a case that the space feature point is close to the
photographing device, the feature point set is filtered out. The second preset distance
threshold is less than the first preset distance threshold.
[0060] Further, for each remaining feature point set after filtering in the previous step,
the server may further perform disparity calculation based on the feature point set.
In a case that calculated disparity is greater than a preset disparity threshold,
the feature point set is filtered out.
[0061] Further, for each feature point set remaining after the filtering in the previous step, the server may alternatively project the initial space feature point calculated based on the feature point set onto the image to which each image feature point in the feature point set belongs, calculate a distance between each image feature point and the projection feature point projected onto the corresponding image to obtain each projection distance, and calculate an average value of the projection distances to obtain an average reprojection error. In a case that the average reprojection error is greater than the preset error threshold, the feature point set is filtered out. A minimal sketch of these filtering checks is given below.
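The following sketch applies the distance and reprojection-error checks described above to one feature point set; the disparity check is omitted for brevity, the `project` helper and all threshold values are illustrative assumptions, and the choice of which camera distance to test is a simplification.

```python
import numpy as np

def should_filter_out(space_point, camera_centers, image_points, project,
                      d_far=80.0, d_near=0.5, max_reproj=2.0):
    """Apply the filtering conditions above to one feature point set.

    space_point: initial space feature point (3-vector);
    camera_centers: camera positions of the frames observing it;
    image_points: observed 2D positions; project(frame_idx, X) is a
    hypothetical helper projecting X into the given frame.
    """
    dists = [np.linalg.norm(space_point - c) for c in camera_centers]
    if min(dists) > d_far or min(dists) < d_near:       # too far or too close
        return True
    reproj = [np.linalg.norm(project(i, space_point) - np.asarray(p))
              for i, p in enumerate(image_points)]
    if np.mean(reproj) > max_reproj:                    # average reprojection error
        return True
    # the disparity check of the embodiment is omitted here for brevity
    return False
```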
[0062] It may be understood that in some other embodiments, the filtering conditions in a filtering process may alternatively be only a part of the foregoing conditions, and the filtering order based on each filtering condition is not limited to the foregoing order.
[0063] For unfiltered feature point sets, the server may perform the foregoing operation of "determining a representative feature point from the feature point set" to determine the representative feature point of each feature point set, so that position optimization is performed on the image feature points of these feature point sets by using the method provided in the foregoing embodiment, to obtain each updated feature point set. Finally, a space feature point corresponding to each updated feature point set is determined based on the position in the corresponding image at which each image feature point in the updated feature point set is located, to obtain a plurality of space feature points for generating the feature map.
[0064] In the foregoing embodiment, a feature point set that satisfies the filtering condition is filtered out by setting the filtering condition. In this way, robustness of the feature map is further improved, and positioning accuracy is further improved while the feature map is used to assist in positioning.
[0065] In all embodiments, as shown in FIG. 4, the generating a feature map based on the
space feature point includes the following steps.
[0066] Step 402: Determine an average descriptor corresponding to the updated feature point
set based on a feature descriptor of each image feature point in the updated feature
point set.
[0067] Specifically, for each updated feature point set, the server may refer to the following Formula (2) to calculate the average descriptor corresponding to the feature point set:

$$u_j = \frac{1}{\left|\mathcal{F}_j\right|} \sum_{f \in \mathcal{F}_j} f, \quad \mathcal{F}_j \subset \mathbb{R}^D \qquad (2)$$

[0068] u_j is the average descriptor, and j represents the j-th (updated) feature point set. f is a descriptor of an image feature point in the j-th feature point set, F_j represents the feature descriptor set corresponding to the j-th feature point set, and R^D represents D-dimensional real number space.
[0069] Step 404: Select a feature descriptor of which a similarity to the average descriptor
satisfies a similarity condition from the feature descriptors of the image feature
points in the updated feature point set, and use the selected feature descriptor as
a reference descriptor.
[0070] The similarity condition may be one of the following: a similarity is greater than
a preset similarity threshold, or a sorting position is before a sorting threshold
in a case that the image feature points are sorted in descending order based on similarities.
[0071] In a specific embodiment, the similarity condition includes that a similarity is
greater than the preset similarity threshold. For each updated feature point set,
after calculating the average descriptor corresponding to the feature point set, the
server separately calculates a similarity between the feature descriptor of each image
feature point in the feature point set and the average descriptor, and compares each
similarity with the preset similarity threshold. If a similarity corresponding to
only one image feature point is greater than the preset similarity threshold, the
feature descriptor of the image feature point is determined as the reference descriptor.
If similarities corresponding to a plurality of image feature points are greater than
the preset similarity threshold, one of feature descriptors corresponding to these
image feature points may be selected as the reference descriptor. For example, a feature
descriptor with a highest similarity may be selected as the reference descriptor.
[0072] In another specific embodiment, the similarity condition includes that a sorting position is before the sorting threshold in a case that the feature descriptors are sorted in descending order based on the similarities. For each updated feature point set, after calculating the similarity between the feature descriptor of each image feature point in the feature point set and the average descriptor, the server may sort the feature descriptors of the image feature points in descending order based on the similarities. The reference descriptor whose sorting position is before the sorting threshold is selected from the feature descriptors. For example, in a case that the sorting threshold is 2, the feature descriptor sorted first may be selected as the reference descriptor.
[0073] In another specific embodiment, the server may calculate the reference descriptor with reference to the following Formula (3):

$$f_j = \arg\min_{f \in \mathcal{F}_j} \left\| f - u_j \right\| \qquad (3)$$

[0074] f_j is the reference descriptor, and j represents the j-th (updated) feature point set. u_j is the average descriptor, f represents the feature descriptor of each image feature point in the j-th feature point set, and F_j represents the feature descriptor set corresponding to the j-th feature point set.
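The following sketch combines Formulas (2) and (3): it computes the average descriptor of one updated feature point set and then selects, as the reference descriptor, the feature descriptor closest to that average (equivalently, the one with the highest similarity to it).

```python
import numpy as np

def reference_descriptor(descriptors):
    """Average descriptor (Formula (2)) and reference descriptor (Formula (3)).

    descriptors: (N, D) array of feature descriptors of one updated
    feature point set.
    """
    descriptors = np.asarray(descriptors, dtype=float)
    u_j = descriptors.mean(axis=0)                         # average descriptor
    idx = int(np.argmin(np.linalg.norm(descriptors - u_j, axis=1)))
    return u_j, descriptors[idx]                           # (average, reference)
```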
[0075] Step 406: Project the space feature point onto an image to which each image feature
point in the updated feature point set belongs to obtain a plurality of projection
feature points, and determine a feature descriptor corresponding to each projection
feature point based on a position in a corresponding image at which each projection
feature point is located.
[0076] Step 408: Determine a reprojection error corresponding to each projection feature
point based on a difference between the feature descriptor corresponding to the projection
feature point and the reference descriptor.
[0077] Step 410: Collect a reprojection error corresponding to each projection feature point
to obtain a target error, iteratively update the space feature point based on the
target error, obtain a target space feature point corresponding to the updated feature
point set in a case that the iteration stop condition is satisfied, and generate the
feature map based on the target space feature point.
[0078] Specifically, for each updated feature point set, after determining a space feature
point corresponding to each updated feature point set, the server may project the
space feature point onto the image to which each image feature point in the feature
point set belongs to obtain a plurality of projection feature points corresponding
to the space feature point; may further determine a feature descriptor corresponding
to each projection feature point based on a position in a corresponding image at which
each projection feature point is located, then separately calculate a difference between
each projection feature point and the reference descriptor corresponding to the updated
feature point set calculated in step 404 to obtain a respective reprojection error
corresponding to each projection feature point; finally collect the reprojection errors
to obtain a target error corresponding to the updated feature point set, and iteratively
update a space feature point corresponding to the updated feature point set in a direction
of minimizing the target error. In other words, an updated space feature point is
used as a current space feature point, and step 406 is performed again. In a case
that step 406 to step 410 are performed iteratively until the iteration stop condition
is satisfied, the obtained space feature point is the target space feature point,
and the feature map may be generated based on the target space feature point. The
iteration stop condition may be one of the following: the target error reaches a minimum
value, a quantity of iterations reaches a preset quantity, or iteration duration reaches
preset duration.
[0079] In a specific embodiment, during performing the foregoing step 406 to step 410, the server may calculate the target error with reference to Formula (4):

$$E_j^{\mathrm{proj}} = \sum_{i \in Z(j)} \left\| F_i\!\left[\pi\!\left(C_i\left(R_i P_j + t_i\right)\right)\right] - f_j \right\|^2 \qquad (4)$$

E_j^proj is the target error, and j represents the j-th (updated) feature point set. Z(j) represents the set of images to which the image feature points in the j-th feature point set belong, and i represents the i-th image frame. C_i represents the camera intrinsic parameter corresponding to the i-th image frame. P_j is the space feature point corresponding to the j-th feature point set. R_i is the rotation matrix corresponding to the i-th image frame, and t_i is the translation corresponding to the i-th image frame. π(·) denotes projection onto the image plane, F_i[·] denotes the descriptor at the projected position in the i-th image frame, and f_j is the reference descriptor corresponding to the j-th feature point set.
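A minimal sketch of evaluating this target error for one space feature point is shown below; the per-frame `sample` function that returns the descriptor at a projected position is a hypothetical helper, and the dictionary layout of the frame data is purely illustrative.

```python
import numpy as np

def target_error(P_j, f_j, frames):
    """Evaluate the target error of Formula (4) for one space feature point P_j.

    frames: list of dicts with keys "C" (3x3 intrinsics), "R" (3x3 rotation),
    "t" (3-vector translation), and "sample" (a function returning the
    descriptor F_i at a 2D image position).
    """
    error = 0.0
    for fr in frames:
        x_cam = fr["R"] @ P_j + fr["t"]                # camera coordinates
        x_img = fr["C"] @ x_cam
        uv = x_img[:2] / x_img[2]                      # projection feature point
        error += np.linalg.norm(fr["sample"](uv) - f_j) ** 2
    return error
```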
[0080] In the foregoing embodiment, the reference descriptor is determined. The space feature
point is projected onto the image to which each image feature point in the updated
feature point set belongs to obtain the plurality of projection feature points, and
a feature descriptor corresponding to each projection feature point is determined
based on a position in a corresponding image at which the projection feature point
is located. A reprojection error corresponding to each projection feature point is
determined based on a difference between the feature descriptor corresponding to the
projection feature point and the reference descriptor. A respective reprojection error
corresponding to each projection feature point is collected to obtain a target error,
the space feature point is iteratively updated based on the target error, and a target
space feature point is obtained in a case that the iteration stop condition is satisfied.
A position of the space feature point is optimized. The feature map for positioning
generated based on an optimized target space feature point can further improve positioning
accuracy.
[0081] In all embodiments, the plurality of image frames are photographed by a camera mounted
on a target moving device, and the foregoing feature map generation method further
includes: obtaining inertial measurement data and speed measurement data of the target
moving device during photographing the plurality of image frames, and using the inertial
measurement data and the speed measurement data to calculate an initial pose of the
target moving device; and determining pre-integration information based
on the inertial measurement data, constructing a factor graph based on the pre-integration
information and the speed measurement data, and adjusting the initial pose based on
the factor graph to obtain a target pose; and the generating a feature map based on
the space feature point includes: establishing a correspondence relationship between
the space feature point and the target pose, and generating the feature map based
on the correspondence relationship and the space feature point.
[0082] In the embodiments of the present disclosure including the embodiments of both the claims and the specification (hereinafter referred to as "all embodiments of the present disclosure"), the pre-integration information is preferably configured as a relative motion constraint obtained by integrating the inertial measurement data acquired between two adjacent image frames. By pre-integration, the inertial measurements do not need to be re-integrated every time a pose estimate changes during optimization, which speeds up construction and optimization of the factor graph.
[0083] In the embodiments of the present disclosure including the embodiments of both the claims and the specification (hereinafter referred to as "all embodiments of the present disclosure"), in computer vision technology, the factor graph is a graphical model used for representing and solving complex probability models and optimization problems. It decomposes the problem into a plurality of local variables and factors (functions), and the whole problem is described by the relationships between these local variables and factors. Some applications of factor graphs in computer vision include:
- 1. Visual SLAM (Simultaneous Localization and Mapping): In SLAM tasks, a factor graph can be used to represent the position of the robot, the relationships between map feature points, and observed sensor data. By optimizing the factor graph, positioning of the robot and construction of the map can be achieved.
- 2. Three-dimensional reconstruction: In a three-dimensional reconstruction task, a factor graph can be used to represent information such as point cloud data, camera parameters, and feature matching in a scene. By optimizing the factor graph, the structure and appearance of the three-dimensional scene can be restored.
- 3. Motion estimation: In a motion estimation task, a factor graph may be used to represent information such as the motion trajectory of an object and the motion parameters of a camera. By optimizing the factor graph, the motion state of the object and the motion trajectory of the camera can be estimated.
[0084] In all embodiments, the separately extracting image feature points from each image
frame, and determining corresponding feature descriptors based on a position in a
corresponding image at which the extracted image feature points are located includes:
inputting the image into a trained feature extraction model, and outputting a first
tensor corresponding to the image feature points and a second tensor corresponding
to the feature descriptors by using the feature extraction model, the first tensor
being used for describing a possibility of each feature point existing in each area
of the image; performing non-maximum suppression processing on the image based on
the first tensor to determine the image feature points of the image from the image;
and converting the second tensor into a third tensor consistent with a size of the
image, and determining a vector in the third tensor that matches a position in a corresponding
image at which each image feature point is located as a descriptor corresponding to
the image feature point.
[0085] Specifically, the server inputs the image into a trained feature extraction model,
and outputs a first tensor corresponding to the image feature points and a second
tensor corresponding to the feature descriptors by using the feature extraction model.
The first tensor and the second tensor are both tensors with a plurality of channels,
and a size of each channel is smaller than an original input image. A value of each
position in the first tensor is used for describing a possibility of a feature point
existing in each corresponding area in the original input image, that is, a probability
value. For example, it is assumed that a size of an image inputted into the feature
extraction model is H x W, an outputted first tensor may be H/N1 x W/N1 x X1, and
a second tensor may be H/N2 x W/N2 x X2. N1, N2, X1, and X2 are all positive integers
greater than one.
[0086] In all embodiments, in a case that non-maximum suppression processing is performed
on the image based on the first tensor, the server may first convert the first tensor
into a probability map with the same size as an input image. A local maximum value
is searched for in the probability map, and a position of the local maximum value is
determined as a target position. Because the sizes of the probability map and the input image
are consistent, a pixel point in the input image at the same position as the target
position may be directly determined as an image feature point of the input image.
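For illustration only, a minimal Python sketch of this baseline approach is given below.
It assumes a SuperPoint-style layout in which each of the 64 channels of the first tensor
corresponds to one offset inside an 8 x 8 image cell; the cell size, the neighborhood
size, and the 0.05 threshold are assumptions of this description.

# Baseline keypoint extraction (illustrative): reshape the first tensor into a
# full-resolution probability map and keep local maxima above a threshold.
import numpy as np
from scipy.ndimage import maximum_filter

def keypoints_from_probability_map(first_tensor, cell=8, threshold=0.05):
    h, w, c = first_tensor.shape                   # e.g. (H/8, W/8, 64)
    assert c == cell * cell
    # Rearrange each 64-channel vector into its 8 x 8 image cell.
    prob = first_tensor.reshape(h, w, cell, cell)
    prob = prob.transpose(0, 2, 1, 3).reshape(h * cell, w * cell)
    # A pixel is kept if it is the maximum of its neighborhood and its
    # probability exceeds the threshold.
    local_max = prob == maximum_filter(prob, size=5)
    ys, xs = np.nonzero(local_max & (prob > threshold))
    return np.stack([ys, xs], axis=1)              # pixel coordinates in the input image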
[0087] In another embodiment, it is considered that a process of converting the first tensor
into the probability map with the same size as the input image is time-consuming.
In a case that non-maximum suppression processing is performed on the image based
on the first tensor, the server may implement the following steps.
- 1. Obtaining a maximum value of the first tensor at each position in a direction of
the plurality of channels and a channel index corresponding to each maximum value,
and respectively obtaining a third tensor and a fourth tensor.
[0088] Specifically, it is assumed that the first tensor includes N (where N is greater
than or equal to 2) channels, the server may search for a maximum value in a direction
of the N channels for each pixel position in the first tensor. The maximum value found
at each pixel position is used as a numerical value at a corresponding position in
the third tensor, so that the third tensor may be obtained. In addition, a channel
index of the maximum value found at each pixel position is used as the numerical
value at the corresponding position in the fourth tensor, so that the fourth tensor
may be obtained.
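For illustration only, the third tensor and the fourth tensor of step 1 may be obtained
with a channel-wise maximum and argument-of-maximum, for example in Python:

# Building the third and fourth tensors (illustrative): the channel-wise maximum
# and the index of that maximum at every spatial position of the first tensor.
import numpy as np

def channel_max_and_index(first_tensor):
    # first_tensor: shape (H/N, W/N, C), e.g. (80, 60, 64) for a 640 x 480 input
    third_tensor = first_tensor.max(axis=-1)       # per-position confidence
    fourth_tensor = first_tensor.argmax(axis=-1)   # per-position channel index
    return third_tensor, fourth_tensor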
[0089] 2. Determining a target numerical value from the third tensor, and searching for
a neighborhood of a position of the target numerical value in the third tensor, the
neighborhood of the position of the target numerical value including a plurality of target
positions, and an image distance between a corresponding position of each target position
in the image and a corresponding position of the position of the target numerical value
in the image being less than a preset distance threshold.
[0090] Specifically, the server may sort numerical values in the third tensor in ascending
order to obtain a numerical value set, and traverse the numerical values in the numerical
value set in turn. For a traversed numerical value, it is determined whether the traversed
numerical value is less than a preset threshold. If the traversed numerical value
is less than the preset threshold, traversal continues with the next numerical value;
and if the traversed numerical value is greater than the preset threshold, the traversed
numerical value is determined as the target numerical value, and the neighborhood
of the position of the target numerical value in the third tensor is searched. Because
the size of the third tensor is reduced relative to the size of the original input
image, and the image feature points refer to pixel points in the input image, the neighborhood
in which the target numerical value is located needs to be determined based on the position
in the original input image that corresponds to the pixel position of the target numerical
value in the third tensor. To be specific, in a case that the neighborhood in which
the target numerical value is located includes the plurality of target positions,
an image distance between a corresponding position of each target position in the
input image and a corresponding position of the target numerical value in the image
is less than the preset distance threshold. In other words, the corresponding position
of each target position in the input image falls within the neighborhood of the
position in the image to which the position of the target numerical value corresponds.
For example, as shown in FIG. 5, it is assumed that the position of the target numerical
value is a point A, and a corresponding position of the point A in the input image
is a point B. In a case that a dashed-line box in FIG. 5 represents a neighborhood
of the point B, the position in the input image corresponding to each target position
in the neighborhood of the point A in the third tensor falls within the dashed-line
box.
[0091] In all embodiments, considering that features extracted by different channels in
the first tensor are different, the position in the original image corresponding to
a pixel position in the third tensor is related to the channel from which the maximum
value at the pixel position is taken. For a pixel position (i, j) in the third tensor, the
index value at the corresponding position in the fourth tensor is D[i, j], and the corresponding
position of the pixel position in the original image is (N x i + D[i, j]/8, N x j
+ D[i, j]%8), where "/" denotes integer division and "%" denotes the remainder. N is
a reduction ratio of the third tensor relative to the original
input image. For example, assuming that the original input image is 640 x 480, the
first tensor is 80 x 60 x 64, the second tensor is 80 x 60 x 256, the third tensor
is 80 x 60 (where each numerical value represents a maximum value of the first tensor
in 64 dimensions, a decimal type), D is 80 x 60 (where each numerical value represents
an index corresponding to a maximum value of the first tensor in 64 dimensions, an
integer type), and the 64 dimensions of the first tensor correspond to every 8 x 8 area
of the original image, a coordinate of the original image corresponding to a coordinate
(32, 53, 35) of the first tensor is (32 x 8 + 35/8, 53 x 8 + 35%8) = (260, 427).
[0092] Therefore, a distance between the corresponding positions of two pixel positions in the
original input image may be calculated by using the index values in the fourth tensor.
For example, for a pixel position (i, j) and another pixel position (i + n, j + n) in the
third tensor, the distance between the corresponding positions of the two pixel positions
in the original image may be obtained by mapping the pixel position (i, j) and the pixel
position (i + n, j + n) to the original image by using the corresponding index values in
the fourth tensor, and then calculating the distance between the mapped coordinates.
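For illustration only, the coordinate mapping and the distance calculation described
above may be sketched in Python as follows; the helper names are assumptions of this
description, and the reduction ratio N and the 8 x 8 cell size are taken to coincide
as in the example above.

# Mapping a position in the third/fourth tensors back to the original image
# ("/" is integer division and "%" is the remainder; here N equals the 8 x 8 cell size).
def to_image_coordinate(i, j, d, n=8):
    return n * i + d // n, n * j + d % n

# Worked example from the description: position (32, 53) with channel index 35
# maps to (32 * 8 + 35 // 8, 53 * 8 + 35 % 8) = (260, 427).
print(to_image_coordinate(32, 53, 35))             # (260, 427)

# Distance between two tensor positions, measured in original-image pixels.
def image_distance(p, q, fourth_tensor, n=8):
    (i1, j1), (i2, j2) = p, q
    y1, x1 = to_image_coordinate(i1, j1, int(fourth_tensor[i1, j1]), n)
    y2, x2 = to_image_coordinate(i2, j2, int(fourth_tensor[i2, j2]), n)
    return ((y1 - y2) ** 2 + (x1 - x2) ** 2) ** 0.5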
[0093] 3. Determining a target pixel point corresponding to the position of the target numerical
value in the image as an image feature point of the image in a case that a search
result indicates that the target numerical value is greater than a numerical value
corresponding to another position in the neighborhood.
[0094] The target pixel point is determined from the image based on the position of the
target numerical value and a corresponding channel index value. The channel index
value is determined from the fourth tensor based on the position of the target numerical
value. For example, assuming that a coordinate of the pixel position at which a specific
target numerical value in the third tensor is located is (i, j), a corresponding position
of the pixel position in the fourth tensor is also (i, j). Assuming that a numerical
value at the position in the fourth tensor is D[i, j], in a case that a search result
indicates that the target numerical value is greater than a numerical value corresponding
to another position in the neighborhood, the pixel point with a coordinate
(N x i + D[i, j]/8, N x j + D[i, j]%8) in the original input image is determined as
the target pixel point corresponding to the position of the target numerical value. N
is a reduction ratio of the third tensor relative to the original input image.
[0095] In a specific embodiment, a specific structure of the feature extraction model in
the foregoing embodiment may be shown in FIG. 6:
- The first convolutional block is a 3 x 3 full convolutional layer, stride = 1, with 64 output channels.
- The first pooling block is a 3 x 3 max pooling layer, stride = 1, with 64 output channels.
- The second convolutional block is a 3 x 3 full convolutional layer, stride = 2, with 64 output channels.
- The third convolutional block is a 3 x 3 full convolutional layer, stride = 1, with 64 output channels.
- The fourth convolutional block is a 3 x 3 full convolutional layer, stride = 2, with 64 output channels.
- The fifth convolutional block is a 3 x 3 full convolutional layer, stride = 1, with 64 output channels.
- The sixth convolutional block is a 3 x 3 full convolutional layer, stride = 2, with 128 output channels.
- The seventh convolutional block is a 3 x 3 full convolutional layer, stride = 1, with 128 output channels.
- The eighth convolutional block is a 3 x 3 full convolutional layer, stride = 1, with 128 output channels.
- The ninth convolutional block is a 3 x 3 full convolutional layer, stride = 1, with 128 output channels.
- The tenth convolutional block is a 1 x 1 full convolutional layer, stride = 2, with 64 output channels.
- The eleventh convolutional block is a 1 x 1 full convolutional layer, stride = 2, with 64 output channels.
- The twelfth convolutional block is a 1 x 1 full convolutional layer, stride = 2, with 128 output channels.
- The thirteenth convolutional block is a 1 x 1 full convolutional layer, stride = 2, with 128 output channels.
- The fourteenth convolutional block is a 3 x 3 full convolutional layer, stride = 1, with 128 output channels.
- The fifteenth convolutional block is a 3 x 3 full convolutional layer, stride = 1, with 64 output channels.
- The sixteenth convolutional block is a 3 x 3 full convolutional layer, stride = 1, with 128 output channels.
- The seventeenth convolutional block is a 3 x 3 full convolutional layer, stride = 1, with 256 output channels.
[0096] Assuming that a dimension of an input image is H x W, the output of the fifteenth
convolutional block of the feature extraction model is a tensor A of the feature points
with a dimension of H/8 x W/8 x 64, and the tensor B of the descriptors, outputted on the
right branch of FIG. 6, has a dimension of H/8 x W/8 x 256. Specific steps of extracting
a feature point and a descriptor are as follows.
- 1. Obtaining a maximum value and an index corresponding to the maximum value in a
64-channel dimension to obtain two tensors C and D with H/8 x W/8.
- 2. Arranging probability values in the tensor C in descending order as a set E, and
setting a target set F for storing a subscript and confidence of a feature point.
- 3. Traversing the set E, and obtaining subscripts i and j of corresponding values
in the tensor D.
- 4. Skipping traversing C[i, j] if C[i, j] is less than a specific threshold (for example,
may be 0.05).
- 5. Traversing a neighborhood n of C[i, j].
- 6. Calculating a distance between D[i + n, j + n] (or D[i-n, j-n]) and D[i, j], that
is, a distance between a coordinate (8 x (i + n) + D[i + n, j + n]/8, 8 x (j + n)
+ D[i + n, j + n]%8) on the original image and a coordinate (8 x i + D[i, j]/8, 8
x j + D[i, j]%8), and skipping this neighbor if the distance is greater than a specific
distance threshold.
- 7. If C[i + n, j + n] (or C[i-n, j-n]) is greater than C[i, j], exiting the traversal
in step 5; otherwise, continuing to perform step 5.
- 8. If the traversal in step 5 is completed and C[i, j] is greater than every C[i + n,
j + n] (or C[i-n, j-n]), putting C[i, j] and (i x 8 + D[i, j]/8, j x 8 + D[i, j]%8) in
the target set F.
- 9. Continuing to perform step 3.
- 10. Performing bilinear interpolation on the tensor B to obtain a tensor G, a dimension
being H x W x 256, and performing L2 normalization along the channel direction.
- 11. Searching for a corresponding descriptor from the tensor G based on a result of
the target set F. In other words, for a subscript of each image feature point in the
target set F, a position with the same subscript is found from the tensor G. A vector
including values of each channel at the position is used as a feature descriptor of
the image feature point, and the feature descriptor is a 256-dimensional vector. For
example, for a subscript (10, 13) of a specific image feature point in the target
set F, a position corresponding to (10, 13) is found from the tensor G. A vector including
values of each channel at the position is determined as a feature descriptor of the
image feature point.
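For illustration only, a Python sketch of steps 1 to 11 is given below. The tensor names
A and B, the 0.05 confidence threshold, the distance threshold, and the neighborhood
radius are assumptions of this description; traversing in descending order allows the
traversal to stop once values fall below the threshold, which is equivalent to skipping
them.

import numpy as np
from scipy.ndimage import zoom

def extract_keypoints_and_descriptors(tensor_a, tensor_b,
                                      conf_threshold=0.05,
                                      dist_threshold=4.0,
                                      radius=1, cell=8):
    # Step 1: channel-wise maximum (tensor C) and its channel index (tensor D).
    c = tensor_a.max(axis=-1)                      # (H/8, W/8) confidences
    d = tensor_a.argmax(axis=-1)                   # (H/8, W/8) channel indices
    h, w = c.shape

    def to_image(i, j):
        off = int(d[i, j])
        return cell * i + off // cell, cell * j + off % cell

    # Step 2: traverse positions in descending order of confidence (set E).
    order = np.dstack(np.unravel_index(np.argsort(-c, axis=None), c.shape))[0]

    keypoints = []                                 # target set F: (confidence, y, x)
    for i, j in order:                             # step 3
        conf = float(c[i, j])
        if conf < conf_threshold:                  # step 4
            break                                  # remaining values are smaller
        yi, xi = to_image(i, j)
        is_max = True
        for di in range(-radius, radius + 1):      # step 5: traverse the neighborhood
            for dj in range(-radius, radius + 1):
                if di == 0 and dj == 0:
                    continue
                ni, nj = i + di, j + dj
                if not (0 <= ni < h and 0 <= nj < w):
                    continue
                yn, xn = to_image(ni, nj)
                if np.hypot(yn - yi, xn - xi) > dist_threshold:
                    continue                       # step 6: too far apart in the image
                if c[ni, nj] > conf:               # step 7
                    is_max = False
                    break
            if not is_max:
                break
        if is_max:                                 # step 8
            keypoints.append((conf, yi, xi))

    # Step 10: bilinear upsampling of tensor B and L2 normalization per channel vector.
    g = zoom(tensor_b, (cell, cell, 1), order=1)
    g = g / (np.linalg.norm(g, axis=-1, keepdims=True) + 1e-12)

    # Step 11: look up a 256-dimensional descriptor at each keypoint position.
    descriptors = np.array([g[y, x] for _, y, x in keypoints])
    return keypoints, descriptors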
[0097] In the foregoing embodiment, because there is no need to convert the first tensor
into a probability map with the same size as the input image, extraction efficiency
of an image feature point is improved.
[0098] In all embodiments, the obtaining a plurality of image frames photographed for a
target scene includes: obtaining a plurality of original image frames photographed
for the target scene by a fisheye camera, and performing distortion correction on
the plurality of original image frames to obtain the plurality of image frames photographed
for the target scene.
[0099] In this embodiment, the server obtains the plurality of original image frames photographed
for the target scene by the fisheye camera. An imaging model of the fisheye camera
is approximately a unit spherical projection model. Generally, an imaging process
of the fisheye camera is divided into two steps. Firstly, a three-dimensional space
point is linearly projected onto a virtual unit sphere. Then, a point on the unit
sphere is projected onto an image plane. The process is nonlinear. Because the design
of the fisheye camera introduces distortion, an image formed by using the fisheye
camera is distorted. Because the radial distortion is very serious, a distortion model
of the fisheye camera mainly considers radial distortion. A projection function of
the fisheye camera is designed to project as large a scene as possible onto a limited
image plane. Based on different projection functions, a design model of the fisheye
camera may be roughly divided into four types: an equidistant projection model, an
equisolid angle projection model, an orthographic projection model, and a stereographic
projection model. In all embodiments of this application, any one of the four models may
be used to perform distortion correction on the plurality of original image frames
photographed by the fisheye camera to obtain the plurality of image frames photographed
for the target scene.
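For illustration only, distortion correction of a fisheye frame under the equidistant
model may be sketched with OpenCV as follows; the intrinsic matrix, distortion
coefficients, and image size are placeholder values that would come from calibration
of the actual fisheye camera.

# Illustrative fisheye distortion correction with OpenCV's fisheye module.
import cv2
import numpy as np

K = np.array([[300.0, 0.0, 640.0],
              [0.0, 300.0, 480.0],
              [0.0, 0.0, 1.0]])              # assumed camera intrinsics
D = np.array([0.1, 0.01, -0.001, 0.0005])    # assumed fisheye distortion coefficients

raw = np.zeros((960, 1280, 3), dtype=np.uint8)  # placeholder for one original image frame
h, w = raw.shape[:2]
new_K = cv2.fisheye.estimateNewCameraMatrixForUndistortRectify(
    K, D, (w, h), np.eye(3), balance=0.0)
map1, map2 = cv2.fisheye.initUndistortRectifyMap(
    K, D, np.eye(3), new_K, (w, h), cv2.CV_16SC2)
undistorted = cv2.remap(raw, map1, map2, interpolation=cv2.INTER_LINEAR)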
[0100] In the foregoing embodiment, because the plurality of image frames are photographed
by the fisheye camera, which has a wider viewing angle than a pinhole camera, more
environmental information can be sensed and more image feature points can be extracted,
further improving the robustness of the generated feature map and further improving
positioning accuracy.
[0101] In all embodiments, FIG. 7 is a schematic flowchart of determining positioning information
by using a feature map generated in the embodiment of this application, including
the following steps.
[0102] Step 702: Obtain inertial measurement data and speed measurement data of a to-be-positioned
moving device, and a target image photographed by the moving device in the target
scene, and use the inertial measurement data and the speed measurement data to determine
an initial pose of the to-be-positioned moving device.
[0103] The inertial measurement data may be data measured by an inertial measurement unit
(IMU). The speed measurement data may be data measured by a speed sensor. For example,
in a case that the to-be-positioned moving device is a vehicle, the speed measurement
data may be data measured by a wheel speedometer. The inertial measurement data and
speed measurement data here are data measured when the to-be-positioned moving device
moves in the target scene.
[0104] Specifically, the server may receive the inertial measurement data, the speed
measurement data, and the target image photographed in the target scene that are sent
by the to-be-positioned moving device, and calculate the initial pose of the
to-be-positioned moving device by using the inertial measurement data and the speed
measurement data based on a preset kinematic model. The preset kinematic model may
reflect a relationship between a vehicle position, speed, acceleration, and the like,
and time. A specific form of the model is not limited in this embodiment. In actual
application, proper settings may be made according to a requirement. For example, an
existing bicycle model may be improved to obtain a needed model.
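For illustration only, a minimal dead-reckoning sketch based on a planar kinematic model
is given below; the state layout, the time step, and the measurement values are
assumptions of this description and do not represent the specific kinematic model used
in practice.

# Illustrative initial-pose estimation from IMU yaw rate and wheel speed using a
# simple planar kinematic (unicycle-style) model.
import numpy as np

def propagate_pose(pose, yaw_rate, speed, dt):
    """pose = (x, y, yaw); yaw_rate from the IMU gyroscope, speed from the wheel
    speedometer, dt = time step in seconds."""
    x, y, yaw = pose
    x += speed * np.cos(yaw) * dt
    y += speed * np.sin(yaw) * dt
    yaw += yaw_rate * dt
    return x, y, yaw

# Example: integrate a short sequence of measurements to obtain an initial pose.
pose = (0.0, 0.0, 0.0)
for yaw_rate, speed in [(0.01, 2.0), (0.02, 2.1), (0.0, 2.0)]:
    pose = propagate_pose(pose, yaw_rate, speed, dt=0.1)
print(pose)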
[0105] Step 704: Determine, from the generated feature map based on the initial pose, a
space feature point matching a position, to obtain a target space feature point.
[0106] In all embodiments, the server may find, from the feature map based on the position
represented by the initial pose, a space feature point matching the position as the
target space feature point. In another embodiment, the feature map further stores a
corresponding pose for each space feature point. The pose corresponding to the
space feature point may be a pose of the moving device during photographing of the plurality
of image frames in the process of generating the feature map. Further, in a process
of determining the positioning information, the server may compare the initial pose
of the to-be-positioned moving device with the pose corresponding to each space feature
point, and determine a space feature point corresponding to a pose with a highest
matching degree as the target space feature point.
[0107] Step 706: Determine an image feature point matching the target space feature point
from the target image, form the determined image feature point and the target space
feature point into a matching pair, and determine positioning information of the moving
device based on the matching pair.
[0108] Specifically, the server may compare the descriptor corresponding to the target space
feature point with the feature descriptor corresponding to each image feature point
on the target image, determine an image feature point corresponding to the feature
descriptor with a highest similarity as an image feature point matching the target
space feature point, form the determined image feature point and the target space
feature point into a matching pair, and further determine positioning information
of the moving device based on the matching pair. The descriptor corresponding to the
target space feature point may be an average value of feature descriptors of image
feature points in the feature point set corresponding to the target space feature
point.
[0109] In all embodiments, a PnP algorithm may be used to determine the positioning information
based on the matching pair. The PnP algorithm is an existing method. Details are not
described herein. In another embodiment, the determining positioning information
based on the matching pair specifically includes: projecting the space feature point
in the matching pair onto the target image to obtain a projection feature point; calculating
a reprojection error based on the projection feature point and the image feature point
in the matching pair; and determining a pose corresponding to a minimum value of a
least square function of the reprojection error as a corrected pose, and correcting
the initial pose by using the corrected pose to obtain the positioning information.
Further, the server may return the positioning information to the to-be-positioned
moving device.
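For illustration only, the pose estimation from matching pairs may be sketched with
OpenCV's RANSAC PnP solver followed by a reprojection-error refinement, in the spirit
of the correction described above; the function and variable names are assumptions of
this description.

# Illustrative pose estimation from 3-D/2-D matching pairs.
import cv2
import numpy as np

def estimate_pose(space_points, image_points, camera_matrix):
    """space_points: (N, 3) target space feature points from the feature map;
    image_points: (N, 2) matched image feature points in the target image."""
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        space_points.astype(np.float64),
        image_points.astype(np.float64),
        camera_matrix, distCoeffs=None,
        reprojectionError=3.0)
    if not ok:
        return None
    # Optional Levenberg-Marquardt refinement minimizing the reprojection error
    # (available in recent OpenCV versions).
    rvec, tvec = cv2.solvePnPRefineLM(
        space_points.astype(np.float64),
        image_points.astype(np.float64),
        camera_matrix, None, rvec, tvec)
    rotation, _ = cv2.Rodrigues(rvec)
    return rotation, tvec                          # camera pose relative to the map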
[0110] In the foregoing embodiment, because the position of the image feature point is
optimized based on the feature descriptor of the image feature point during generation
of the feature map, the generated feature map is robust, so that positioning accuracy
is greatly improved when the feature map is used for positioning.
[0111] In a specific embodiment, the feature map generation method of this application may
be applied to a parking application scenario, and specifically includes the following
steps.
I. A server generates a feature map.
[0112]
- 1. Obtaining a plurality of original image frames photographed for the target scene
by a fisheye camera, and performing distortion correction on the plurality of original
image frames to obtain the plurality of image frames photographed for the target scene.
[0113] Specifically, a target vehicle equipped with the fisheye camera may run in a garage.
An environment in the garage is photographed by the fisheye camera to obtain a plurality
of original image frames, and the plurality of original image frames are sent to the
server. The server performs distortion correction on the plurality of original image
frames to obtain a plurality of image frames photographed for the target scene.
[0114] The target vehicle here and a to-be-parked vehicle may be the same vehicle or different
vehicles.
[0115] 2. Separately extracting image feature points from each image frame, and determining
corresponding feature descriptors based on a position in a corresponding image at
which the extracted image feature points are located.
[0116] Specifically, for each image frame, the server may input the image into a trained
feature extraction model, and output a first tensor corresponding to the image feature
points and a second tensor corresponding to the feature descriptors by using the feature
extraction model, the first tensor being used for describing a possibility of each
feature point existing in each area of the image; perform non-maximum suppression
processing on the image based on the first tensor to determine the image feature points
of the image from the image; and convert the second tensor into a third tensor consistent
with a size of the image, and determine a vector in the third tensor that matches
a position in a corresponding image at which each image feature point is located as
a descriptor corresponding to the image feature point.
[0117] The first tensor includes a plurality of channels, and the performing non-maximum
suppression processing on the image based on the first tensor to determine the image
feature points of the image from the image includes: obtaining, in a direction of
the plurality of channels, a maximum value at each position in the first tensor and
a channel index corresponding to each maximum value to separately obtain the third
tensor and a fourth tensor; determining a target numerical value from the third tensor,
and searching for a neighborhood of a position of the target numerical value in the
third tensor, the neighborhood of the position of the target numerical value including
a plurality of target positions, and an image distance between a position in the image
corresponding to each target position and a position in the image corresponding to
the position of the target numerical value being less than a preset distance threshold;
and determining a target pixel point in the image corresponding to the position of
the target numerical value as each image feature point of the image in a case that
a search result indicates that the target numerical value is greater than a numerical
value corresponding to another position in the neighborhood. The target pixel point
is determined from the image based on the position of the target numerical value and
a corresponding channel index value. The channel index value is determined from the
fourth tensor based on the position of the target numerical value.
[0118] 3. Forming image feature points with a matching relationship in the image feature
points of each image frame into a feature point set.
[0119] 4. Filtering out the feature point set, for each feature point set, in a case that
the feature point set satisfies a filtering condition; performing step 5 in a case
that the feature point set does not satisfy the filtering condition. The filtering
condition includes at least one of the following: a distance between an initial space
feature point calculated based on the feature point set and a photographing device
of the plurality of image frames being greater than a first preset distance threshold;
a distance between an initial space feature point calculated based on the feature
point set and a photographing device of the plurality of image frames being less than
a second preset distance threshold, and the second preset distance threshold being
less than the first preset distance threshold; disparity calculated based on the feature
point set being greater than a preset disparity threshold; or an average reprojection
error calculated based on the feature point set being greater than a preset error
threshold.
[0120] 5. Determining a representative feature point from the feature point set, and calculating
a difference between a feature descriptor corresponding to a remaining image feature
point in the feature point set and a feature descriptor corresponding to the representative
feature point.
[0121] Specifically, the server determines a representative feature point from the feature
point set by using the following steps: calculating an average feature point position
corresponding to the feature point set based on a position in a corresponding image
at which each image feature point in the feature point set is located; and determining
an image feature point of which a distance from the average feature point position
satisfies a distance condition in the feature point set, and using the determined
image feature point as the representative feature point. The distance condition includes
one of the following: a distance from the average feature point position is less than
or equal to a distance threshold, or a sorting position is before a sorting threshold
in a case that the image feature points are sorted in ascending order of distances
from the average feature point position.
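For illustration only, selecting the representative feature point as the image feature
point closest to the average feature point position may be sketched in Python as follows;
the data layout is an assumption of this description.

# Illustrative selection of a representative feature point: the point closest to
# the average position of the set (one of the distance conditions described above).
import numpy as np

def representative_feature_point(positions):
    """positions: (M, 2) array, the image position of each feature point in one
    feature point set."""
    mean_position = positions.mean(axis=0)
    distances = np.linalg.norm(positions - mean_position, axis=1)
    return int(np.argmin(distances))               # index of the representative point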
[0122] 6. Separately using each remaining image feature point in the feature point set as
a target feature point, and separately calculating matching confidence between each
target feature point and the representative feature point; calculating a position
error corresponding to each target feature point based on the matching confidence
and difference corresponding to each target feature point; and collecting the position
error corresponding to each target feature point to obtain the position error of the
feature point set.
[0123] The separately calculating matching confidence between each target feature point
and the representative feature point includes: separately obtaining a feature descriptor
of each target feature point, and obtaining a feature descriptor of the representative
feature point; and separately calculating a vector similarity between the feature
descriptor of each target feature point and the feature descriptor of the representative
feature point, and using each vector similarity as matching confidence between each
target feature point and the representative feature point.
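For illustration only, one possible way to combine the matching confidence and the
descriptor difference into a position error is sketched below, taking the cosine
similarity as the confidence and a confidence-weighted descriptor difference as the
per-point error; the exact combination is not specified above, and this weighting is
an assumption of this description.

# Illustrative position-error computation for one feature point set.
import numpy as np

def feature_point_set_error(descriptors, representative_index):
    rep = descriptors[representative_index]
    error = 0.0
    for k, desc in enumerate(descriptors):
        if k == representative_index:
            continue
        confidence = float(desc @ rep /
                           (np.linalg.norm(desc) * np.linalg.norm(rep) + 1e-12))
        difference = float(np.linalg.norm(desc - rep))   # descriptor difference
        error += confidence * difference                 # assumed combination
    return error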
[0124] 7. Iteratively updating the remaining image feature point in the feature point set
based on the position error, and obtaining an updated feature point set in a case
that an iteration stop condition is satisfied.
[0125] Specifically, the server may update a position of the remaining image feature point
in the feature point set by using a gradient descent algorithm in a direction of minimizing
the position error, determine a descriptor corresponding to the obtained image feature
point from the third tensor, then recalculate the position error, and repeat the process
continuously until the iteration stop condition is satisfied.
[0126] By using the foregoing steps, a plurality of updated feature point sets may be obtained.
The server may determine whether there is a feature point set satisfying the filtering
condition in these feature point sets again, filter out the feature point set satisfying
the filtering condition, and continue to perform the subsequent steps for a remaining
feature point set after filtering out. For the filtering condition, refer to the description
in the foregoing embodiment.
[0127] 8. Determining a space feature point corresponding to the updated feature point set
based on a position in a corresponding image at which each image feature point in
the updated feature point set is located, so that a plurality of space feature points
may be obtained.
[0128] 9. Optimizing a position of each space feature point, specifically including the
following steps.
[0129] 9.1 For each space feature point, determining an average descriptor corresponding
to the updated feature point set based on a feature descriptor of each image feature
point in the updated feature point set corresponding to the space feature point.
[0130] 9.2 Selecting a feature descriptor of which a similarity to the average descriptor
satisfies a similarity condition from the feature descriptors of the image feature
points in the updated feature point set, and using the selected feature descriptor
as a reference descriptor.
[0131] 9.3 Projecting the space feature point onto an image to which each image feature
point in the updated feature point set belongs to obtain a plurality of projection
feature points, and determining a feature descriptor corresponding to each projection
feature point based on a position in a corresponding image at which each projection
feature point is located.
[0132] 9.4 Determining a reprojection error corresponding to each projection feature point
based on a difference between the feature descriptor corresponding to the projection
feature point and the reference descriptor.
[0133] 9.5. Collecting a respective reprojection error corresponding to each projection
feature point to obtain a target error, iteratively updating the space feature point
based on the target error, and obtaining a target space feature point in a case that
the iteration stop condition is satisfied. The target space feature point is a space
feature point after position optimization.
[0134] 10. Generating a feature map based on each optimized target space feature point,
and storing the feature map.
II. Parking based on the feature map
[0135]
- 1. Downloading the feature map from the server in a case that a to-be-parked vehicle
enters a garage entrance. A user may input a to-be-parked target parking position,
so that the vehicle may plan a parking route from the garage entrance to the target
parking position for the user based on the feature map.
- 2. The vehicle automatically runs based on the planned parking route. During running,
the vehicle is positioned in the following ways:
2.1 Obtaining current inertial measurement data by an IMU, obtaining current speed
measurement data by a wheel speed sensor, and obtaining a current target image photographed
by a camera mounted on the vehicle.
2.2 Using the inertial measurement data and the speed measurement data to determine
a current initial pose.
2.3 Determining, from the stored feature map based on the current initial pose, a
space feature point matching a position, to obtain a target space feature point.
2.4 Determining an image feature point matching the target space feature point from
the target image, forming the determined image feature point and the target space
feature point into a matching pair, and determining a current position based on the
matching pair.
- 3. Automatically running into the target parking position to complete parking in a
case that the current position is the target parking position.
[0136] In another specific embodiment, the feature map generation method of this application
may be applied to an application scenario of vacuum cleaning robot automatic cleaning.
In this application scenario, a vacuum cleaning robot first walks in a to-be-cleaned
area, collects a plurality of image frames in the area, and generates a feature map
based on the feature map generation method provided in the embodiment of this application.
Further, in subsequent automatic cleaning processes, a cleaning route may be planned
by using the feature map, and in an automatic cleaning process, automatic positioning
is performed based on the feature map, to perform a cleaning task based on the planned
cleaning route.
[0137] It is to be understood that although various steps in flowcharts according to each
embodiment are displayed in sequence based on indication of arrows, the steps are
not necessarily performed in sequence based on a sequence indicated by the arrows.
Unless otherwise explicitly specified in this application, execution of the steps
is not strictly limited, and the steps may be performed in other sequences. In addition,
at least some of the steps in the flowcharts according to each embodiment may include
a plurality of steps or a plurality of stages. These steps or stages are not necessarily
performed at a same time instant, but may be performed at different time instants.
These steps or stages are not necessarily performed in sequence, and the steps or
stages may be performed in turn or alternately with other steps or at least some steps
or stages of other steps.
[0138] Based on the same inventive concept, an embodiment of this application further provides
a feature map generation apparatus for implementing the foregoing feature map generation
method. The solutions to the problem provided by the apparatus are similar to the
solutions described in the foregoing method. Therefore, for a specific limitation
of one or more feature map generation apparatus embodiments and positioning information
determining apparatus provided below, refer to a limitation of the feature map generation
method above. Details are not described again herein.
[0139] In all embodiments, as shown in FIG. 8, a feature map generation apparatus 800 is
provided, and includes:
a feature extraction module 802, configured to obtain a plurality of image frames
photographed for a target scene, separately extract image feature points from each
image frame, and determine corresponding feature descriptors based on a position in
a corresponding image at which the extracted image feature points are located;
a feature point set determining module 804, configured to form image feature points
with a matching relationship in the image feature points of each image frame into
a feature point set;
a difference calculation module 806, configured to determine a representative feature
point from the feature point set, and calculate a difference between a feature descriptor
corresponding to a remaining image feature point in the feature point set and a feature
descriptor corresponding to the representative feature point;
a position update module 808, configured to determine a position error of the feature
point set based on the calculated difference, iteratively update the remaining image
feature point in the feature point set based on the position error, and obtain an
updated feature point set in a case that an iteration stop condition is satisfied;
a feature map generation module 810, configured to determine a space feature point
corresponding to the updated feature point set based on a position in a corresponding
image at which each image feature point in the updated feature point set is located,
and generate a feature map based on the space feature point, the feature map being
used for positioning a to-be-positioned moving device in the target scene.
[0140] In the foregoing feature map generation apparatus, a plurality of image frames photographed
for a target scene are obtained, image feature points from each image frame are separately
extracted, and corresponding feature descriptors based on a position in a corresponding
image at which the extracted image feature points are located are determined. Image
feature points with a matching relationship in the image feature points of each image
frame are formed into a feature point set. A representative feature point is determined
from the feature point set, and a difference between a feature descriptor corresponding
to a remaining image feature point in the feature point set and a feature descriptor
corresponding to the representative feature point is calculated. A position error
of the feature point set is determined based on the calculated difference, the remaining
image feature point in the feature point set is iteratively updated based on the position
error, and an updated feature point set is obtained in a case that an iteration stop
condition is satisfied. A space feature point corresponding to the updated feature
point set is determined based on a position in a corresponding image at which each
image feature point in the updated feature point set is located, and a feature map
is generated based on the space feature point. Because the position of the image feature
point is optimized based on the feature descriptor of the image feature point in a
process of generating the feature map, the generated feature map may be robust,
greatly improving positioning accuracy when the feature map is used during
positioning.
[0141] In all embodiments, the position update module 808 is configured to separately use
each remaining image feature point in the feature point set as a target feature point,
and separately calculate matching confidence between each target feature point and
the representative feature point; calculate a position error corresponding to each
target feature point based on the matching confidence and difference corresponding
to each target feature point; and collect the position error corresponding to each
target feature point to obtain the position error of the feature point set.
[0142] In all embodiments, the position update module 808 is further configured to separately
obtain a feature descriptor of each target feature point, and obtain a feature descriptor
of the representative feature point; and separately calculate a vector similarity
between the feature descriptor of each target feature point and the feature descriptor
of the representative feature point, and use each vector similarity as matching confidence
between each target feature point and the representative feature point.
[0143] In all embodiments, the difference calculation module 806 is further configured to
calculate an average feature point position corresponding to the feature point set
based on a position in a corresponding image at which each image feature point in
the feature point set is located; and determine an image feature point of which a
distance from the average feature point position satisfies a distance condition in
the feature point set, and use the determined image feature point as the representative
feature point. The distance condition includes one of the following: a distance from
the average feature point position is less than or equal to a distance threshold,
or a sorting position is before a sorting threshold in a case that the image feature
points are sorted in ascending order of distances from the average feature point position.
[0144] In all embodiments, there are a plurality of feature point sets, and the difference
calculation module 806 is further configured to filter out the feature point set,
for each feature point set, in a case that the feature point set satisfies a filtering
condition; and perform the operation of determining a representative feature point
from the feature point set in a case that the feature point set does not satisfy the
filtering condition. The filtering condition includes at least one of the following:
a distance between an initial space feature point calculated based on the feature
point set and a photographing device of the plurality of image frames being greater
than a first preset distance threshold; a distance between an initial space feature
point calculated based on the feature point set and a photographing device of the
plurality of image frames being less than a second preset distance threshold, and
the second preset distance threshold being less than the first preset distance threshold;
disparity calculated based on the feature point set being greater than a preset disparity
threshold; or an average reprojection error calculated based on the feature point
set being greater than a preset error threshold.
[0145] In all embodiments, the feature map generation module is further configured to: determine
an average descriptor corresponding to the updated feature point set based on a feature
descriptor of each image feature point in the updated feature point set; select a
feature descriptor of which a similarity to the average descriptor satisfies a similarity
condition from the feature descriptors of the image feature points in the updated
feature point set, and use the selected feature descriptor as a reference descriptor;
project the space feature point onto a corresponding image to which each image feature
point in the updated feature point set belongs to obtain a plurality of projection
feature points, and determine a feature descriptor corresponding to each projection
feature point based on a position in a corresponding image at which each projection
feature point is located; determine a reprojection error corresponding to each projection
feature point based on a difference between the feature descriptor corresponding to
the projection feature point and the reference descriptor; and collect a reprojection
error corresponding to each projection feature point to obtain a target error, iteratively
update the space feature point based on the target error, obtain a target space feature
point corresponding to the updated feature point set in a case that the iteration
stop condition is satisfied, and generate the feature map based on the target space
feature point.
[0146] In all embodiments, the feature extraction module is further configured to input
the image into a trained feature extraction model, and output a first tensor corresponding
to the image feature points and a second tensor corresponding to the feature descriptors
by using the feature extraction model, the first tensor being used for describing
a possibility of each feature point existing in each area of the image; perform non-maximum
suppression processing on the image based on the first tensor to determine the image
feature points of the image from the image; and convert the second tensor into a third
tensor consistent with a size of the image, and determine a vector in the third tensor
that matches a position in a corresponding image at which each image feature point
is located as a descriptor corresponding to the image feature point.
[0147] In all embodiments, the first tensor includes a plurality of channels, and the feature
extraction module is further configured to: obtain, in a direction of the plurality
of channels, a maximum value at each position in the first tensor and a channel index
corresponding to each maximum value to separately obtain the third tensor and a fourth
tensor; determine a target numerical value from the third tensor, and search for a
neighborhood of a position of the target numerical value in the third tensor, the
neighborhood of the position of the target numerical value including a plurality of target
positions, and an image distance between a position in the image corresponding to
each target position and a position in the image corresponding to the position of
the target numerical value being less than a preset distance threshold; and determine
a target pixel point in the image corresponding to the position of the target numerical
value as each image feature point of the image in a case that a search result indicates
that the target numerical value is greater than a numerical value corresponding to
another position in the neighborhood, the target pixel point being determined from
the image based on the position of the target numerical value and a corresponding
channel index value, and the channel index value being determined from the fourth
tensor based on the position of the target numerical value.
[0148] In all embodiments, the feature extraction module is further configured to: obtain
a plurality of original image frames photographed for the target scene by a fisheye
camera, and perform distortion correction on the plurality of original image frames
to obtain the plurality of image frames photographed for the target scene.
[0149] In all embodiments, the foregoing apparatus further includes a positioning information
determining module, configured to initially obtain inertial measurement data and speed
measurement data of the to-be-positioned moving device, and a target image photographed
by the moving device in the target scene, and use the inertial measurement data and
the speed measurement data to determine an initial pose of the to-be-positioned moving
device; determine, from the generated feature map based on the initial pose, a space
feature point matching a position, to obtain a target space feature point; determine
an image feature point matching the target space feature point from the target image,
form the determined image feature point and the target space feature point into a
matching pair, and determine positioning information of the moving device based on
the matching pair.
[0150] All or some of the modules in the foregoing feature map generation apparatus may
be implemented by software, hardware, and a combination thereof. The modules may be
embedded in or independent of a processor in a computer device in the form of hardware,
and may alternatively be stored in a memory in the computer device in the form of
software, so that the processor may call and perform operations corresponding to each
module.
[0151] In all embodiments, a computer device is provided. The computer device may be a server,
and an internal structure diagram of the computer device may be shown in FIG. 9. The
computer device includes a processor, a memory, an input/output interface (I/O for
short), and a communication interface. The processor and the memory are connected
to the input/output interface via a system bus. The communication interface is connected
to the system bus via the input/output interface. The processor of the computer device
is configured to provide a computation and control capability. The memory of the computer
device includes a non-volatile storage medium and an internal memory. The non-volatile
storage medium stores an operating system, a computer program, and a database. The
internal memory provides an environment for running of the operating system and the
computer program in the non-volatile storage medium. The database of the computer
device is used for storing feature map data. The input/output interface of the computer
device is configured to exchange information between the processor and an external
device. The communication interface of the computer device is configured to communicate
with an external terminal. The computer program, when executed by the processor, implements
a feature map generation method.
[0152] In all embodiments, a computer device is provided. The computer device may be a terminal
mounted in the foregoing moving device, for example, may be an on-board terminal, and
an internal structure diagram of the computer device may be shown in FIG. 10. The
computer device includes a processor, a memory, an input/output interface, a communication
interface, a display unit, and an input apparatus. The processor and the memory are
connected to the input/output interface via a system bus. The communication interface,
the display unit, and the input apparatus are connected to the system bus via the
input/output interface. The processor of the computer device is configured to provide
a computation and control capability. The memory of the computer device includes a
non-volatile storage medium and an internal memory. The non-volatile storage medium
stores an operating system and a computer program. The internal memory provides an
environment for running of the operating system and the computer program in the non-volatile
storage medium. The input/output interface of the computer device is configured to
exchange information between the processor and an external device. The communication
interface of the computer device is configured to communicate with an external terminal
in a wired or wireless manner. A wireless manner may be implemented by Wi-Fi, a mobile
cellular network, near field communication (NFC), or another technology. The computer
program, when executed by the processor, implements a feature map generation method.
The display unit of the computer device is configured to form a visually visible picture,
and may be a display, a projection apparatus, or a virtual reality imaging apparatus.
The display may be a liquid crystal display or an e-ink display. An input apparatus
of the computer device may be a touch layer covering the display, may be a button,
a trackball, or a touchpad disposed on a housing of the computer device, or may be
an external keyboard, touchpad, mouse, or the like.
[0153] A person skilled in the art may understand that the structure shown in FIG. 9 and
FIG. 10 is merely a block diagram of a partial structure related to a solution in
this application, and does not constitute a limitation to the computer device to which
the solution in this application is applied. Specifically, the computer device may
include more or fewer components than those shown in the figure, or some components
may be combined, or a different component deployment may be used.
[0154] In all embodiments, a computer device is provided, including a memory and a processor,
the memory having computer-readable instructions stored therein, and the computer-readable
instructions, when executed by the processor, implementing operations of the foregoing
feature map generation method.
[0155] In all embodiments, a computer-readable storage medium is provided, having computer-readable
instructions stored thereon, and the computer-readable instructions, when executed
by a processor, implementing operations of the foregoing feature map generation method.
[0156] In all embodiments, a computer program product is provided, including computer-readable
instructions, and the computer-readable instructions, when executed by a processor,
implementing operations of the foregoing feature map generation method.
[0157] User information (including but not limited to user device information, user personal
information, and the like) and data (including but not limited to data used for analysis,
stored data, displayed data, and the like) included in this application are information
and data that are authorized by the user or fully authorized by all parties. Collection,
use, and processing of related data need to comply with relevant laws, regulations,
and standards of relevant countries and regions.
[0158] A person of ordinary skill in the art may understand that all or some of procedures
of the method in the foregoing embodiments may be implemented by computer-readable
instructions instructing relevant hardware. The computer-readable instructions may
be stored in a non-volatile computer-readable storage medium. When the computer-readable
instructions are executed, the procedures of the method embodiments may be implemented.
References to the memory, the database, or another medium used in the embodiments
provided in this application may all include at least one of a non-volatile or a volatile
memory. The non-volatile memory may include a read-only memory (ROM), a magnetic tape,
a floppy disk, a flash memory, an optical memory, a high-density embedded non-volatile
memory, a resistive random access memory (ReRAM), a magneto-resistive random access
memory (MRAM), a ferroelectric random access memory (FRAM), a phase change memory
(PCM), a graphene memory, and the like. The volatile memory may include a random access
memory (RAM), an external cache, or the like. By way of description rather than limitation,
the RAM may be in various forms, such as a static random access memory (SRAM) or a
dynamic random access memory (DRAM). The database involved in various embodiments
provided in this application may include at least one of a relational database and
a non-relational database. The non-relational database may include a blockchain-based
distributed database and the like, which is not limited herein. The processor involved
in various embodiments provided in this application may be a general processor, a
central processing unit, a graphics processor, a digital signal processor, a programmable
logic, a data processing logic based on quantum computing, or the like, which is not
limited herein.
[0159] Technical features of the foregoing embodiments may be randomly combined. To make
description concise, not all possible combinations of the technical features in the
foregoing embodiments are described. However, the combinations of these technical
features shall be considered as falling within the scope recorded by this specification
provided that no conflict exists.
[0160] The foregoing embodiments merely express several implementations of this application.
The descriptions thereof are relatively specific and detailed, but should not be understood
as limitations to the scope of this application. For a person of ordinary skill in
the art, several transformations and improvements can be made without departing from
the idea of this application. These transformations and improvements belong to the
protection scope of this application. Therefore, the protection scope of the patent
of this application shall be subject to the appended claims.
1. A feature map generation method, executable by a computer device, and the method comprising:
obtaining a plurality of image frames photographed for a target scene, separately
extracting image feature points from each image frame, and determining corresponding
feature descriptors based on a position in a corresponding image at which the extracted
image feature points are located;
forming image feature points with a matching relationship in the image feature points
of each image frame into a feature point set;
determining a representative feature point from the feature point set, and calculating
a difference between a feature descriptor corresponding to a remaining image feature
point in the feature point set and a feature descriptor corresponding to the representative
feature point;
determining a position error of the feature point set based on the calculated difference,
iteratively updating the remaining image feature point in the feature point set based
on the position error, and obtaining an updated feature point set in a case that an
iteration stop condition is satisfied; and
determining a space feature point corresponding to the updated feature point set based
on a position in a corresponding image at which each image feature point in the updated
feature point set is located, and generating a feature map based on the space feature
point, the feature map being used for positioning a to-be-positioned moving device
in the target scene.
2. The method according to claim 1, wherein the determining a position error of the feature
point set based on the calculated difference comprises:
separately using each remaining image feature point in the feature point set as a
target feature point, and separately calculating matching confidence between each
target feature point and the representative feature point;
calculating a position error corresponding to each target feature point based on the
matching confidence and difference corresponding to each target feature point; and
collecting the position error corresponding to each target feature point to obtain
the position error of the feature point set.
3. The method according to claim 2, wherein the separately calculating matching confidence
between each target feature point and the representative feature point comprises:
separately obtaining a feature descriptor of each target feature point, and obtaining
a feature descriptor of the representative feature point; and
separately calculating a vector similarity between the feature descriptor of each
target feature point and the feature descriptor of the representative feature point,
and using each vector similarity as matching confidence between each target feature
point and the representative feature point.
4. The method according to claim 1, wherein the determining a representative feature
point from the feature point set comprises:
calculating an average feature point position corresponding to the feature point set
based on a position in a corresponding image at which each image feature point in
the feature point set is located; and
determining an image feature point of which a distance from the average feature point
position satisfies a distance condition in the feature point set, and using the determined
image feature point as the representative feature point,
the distance condition comprising one of the following: a distance from the average
feature point position being less than or equal to a distance threshold, or a sorting
position being before a sorting threshold in a case that the image feature points
are sorted in ascending order of distances from the average feature point position.
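The selection of a representative feature point in claim 4 can be illustrated by the following hypothetical sketch, which computes the average feature point position of the set and applies either variant of the distance condition; the names and default values are assumptions, not part of the claims.

```python
import numpy as np

def pick_representative(points_xy, distance_threshold=None, sorting_threshold=1):
    """Hypothetical sketch of claim 4; points_xy is an (N, 2) array of pixel positions."""
    # Average feature point position of the feature point set.
    mean_xy = points_xy.mean(axis=0)
    distances = np.linalg.norm(points_xy - mean_xy, axis=1)
    if distance_threshold is not None:
        # Variant 1: a point whose distance from the average position is less than
        # or equal to the distance threshold.
        candidates = np.flatnonzero(distances <= distance_threshold)
        if candidates.size:
            return int(candidates[0])
    # Variant 2: sort in ascending order of distance and take a point whose sorting
    # position is before the sorting threshold (here, simply the closest point).
    order = np.argsort(distances)
    return int(order[:sorting_threshold][0])
```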
5. The method according to any one of claims 1 to 4, wherein there are a plurality of
feature point sets, and the determining a representative feature point from the feature
point set comprises:
for each feature point set, filtering out the feature point set in a case that the
feature point set satisfies a filtering condition,
the filtering condition comprising at least one of the following:
a distance between an initial space feature point calculated based on the feature
point set and a photographing device of the plurality of image frames being greater
than a first preset distance threshold;
a distance between an initial space feature point calculated based on the feature
point set and a photographing device of the plurality of image frames being less than
a second preset distance threshold, and the second preset distance threshold being
less than the first preset distance threshold;
disparity calculated based on the feature point set being greater than a preset disparity
threshold; or
an average reprojection error calculated based on the feature point set being greater
than a preset error threshold.
6. The method according to claim 5, the method further comprising:
performing the operation of determining a representative feature point from the feature
point set in a case that the feature point set does not satisfy the filtering condition.
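The filtering conditions of claims 5 and 6 can be read, for illustration, as a simple predicate evaluated per feature point set; the threshold values in the sketch below are illustrative placeholders only and are not values from the claims.

```python
def should_filter(depth_to_camera, disparity, avg_reproj_error,
                  far_thresh=50.0, near_thresh=0.5,
                  disparity_thresh=1.0, reproj_thresh=2.0):
    """Hypothetical sketch of claim 5: returns True if the feature point set is
    filtered out; all thresholds are illustrative placeholders."""
    if depth_to_camera > far_thresh:       # initial space point too far from the camera
        return True
    if depth_to_camera < near_thresh:      # initial space point too close to the camera
        return True
    if disparity > disparity_thresh:       # disparity of the set above the preset threshold
        return True
    if avg_reproj_error > reproj_thresh:   # average reprojection error above the preset threshold
        return True
    return False  # claim 6: keep the set and determine its representative feature point
```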
7. The method according to claim 1, wherein the generating a feature map based on the
space feature point comprises:
determining an average descriptor corresponding to the updated feature point set based
on a feature descriptor of each image feature point in the updated feature point set;
selecting a feature descriptor of which a similarity to the average descriptor satisfies
a similarity condition from the feature descriptors of the image feature points in
the updated feature point set, and using the selected feature descriptor as a reference
descriptor;
projecting the space feature point onto an image to which each image feature point
in the updated feature point set belongs, to obtain a plurality of projection feature
points, and determining a feature descriptor corresponding to each projection feature
point based on a position in a corresponding image at which each projection feature
point is located;
determining a reprojection error corresponding to each projection feature point based
on a difference between the feature descriptor corresponding to the projection feature
point and the reference descriptor; and
collecting a reprojection error corresponding to each projection feature point to
obtain a target error, iteratively updating the space feature point based on the target
error, obtaining a target space feature point corresponding to the updated feature
point set in a case that the iteration stop condition is satisfied, and generating
the feature map based on the target space feature point.
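For illustration, the first steps of claim 7 (average descriptor, reference descriptor, and collection of the target error) might look like the following hypothetical sketch; the iterative update of the space feature point itself is omitted, and all names are assumptions rather than elements of the claims.

```python
import numpy as np

def reference_descriptor(set_descriptors):
    """Hypothetical sketch of the first two steps of claim 7: compute the average
    descriptor of the updated feature point set and pick the set descriptor most
    similar to it (cosine similarity) as the reference descriptor."""
    descs = np.asarray(set_descriptors, dtype=np.float64)          # (N, D)
    avg = descs.mean(axis=0)
    sims = descs @ avg / (np.linalg.norm(descs, axis=1) * np.linalg.norm(avg) + 1e-12)
    return descs[int(np.argmax(sims))]

def target_error(projection_descriptors, ref_desc):
    """Collect the per-projection reprojection errors (descriptor differences to the
    reference descriptor) into the target error used to update the space feature point."""
    return float(sum(np.linalg.norm(d - ref_desc) for d in projection_descriptors))
```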
8. The method according to claim 1, wherein the separately extracting image feature points
from each image frame, and determining corresponding feature descriptors based on
a position in a corresponding image at which the extracted image feature points are
located comprises:
inputting the image into a trained feature extraction model, and outputting a first
tensor corresponding to the image feature points and a second tensor corresponding
to the feature descriptors by using the feature extraction model, the first tensor
being used for describing a possibility of each feature point existing in each area
of the image;
performing non-maximum suppression processing on the image based on the first tensor
to determine the image feature points of the image from the image; and
converting the second tensor into a third tensor consistent with a size of the image,
and determining a vector in the third tensor that matches a position in a corresponding
image at which each image feature point is located as a descriptor corresponding to
the image feature point.
9. The method according to claim 8, wherein the first tensor comprises a plurality of
channels, and the performing non-maximum suppression processing on the image based
on the first tensor to determine the image feature points of the image from the image
comprises:
obtaining, in a direction of the plurality of channels, a maximum value at each position
in the first tensor and a channel index corresponding to each maximum value to separately
obtain the third tensor and a fourth tensor;
determining a target numerical value from the third tensor, and searching for a neighborhood
of a position of the target numerical value in the third tensor, the neighborhood
of the position of the target numerical value comprising a plurality of target positions,
and an image distance between a position in the image corresponding to each target
position and a position in the image corresponding to the position of the target numerical
value being less than a preset distance threshold; and
determining a target pixel point in the image corresponding to the position of the
target numerical value as each image feature point of the image in a case that a search
result indicates that the target numerical value is greater than a numerical value
corresponding to each other position in the neighborhood, and
the target pixel point being determined from the image based on the position of the
target numerical value and a corresponding channel index value, and the channel index
value being determined from the fourth tensor based on the position of the target
numerical value.
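Claims 8 and 9 can be illustrated by the following hypothetical non-maximum suppression sketch for a (C, H, W) score tensor; the channel-to-pixel mapping at the end assumes, purely for illustration, that each channel indexes an offset inside a cell-by-cell image patch, which is not stated in the claims.

```python
import numpy as np

def nms_feature_points(first_tensor, cell=8, radius=4):
    """Hypothetical sketch of claims 8 and 9 for a (C, H, W) score tensor."""
    scores = first_tensor.max(axis=0)        # channel-wise maximum at each position
    channels = first_tensor.argmax(axis=0)   # channel index of each maximum value
    h, w = scores.shape
    keypoints = []
    for y in range(h):
        for x in range(w):
            y0, y1 = max(0, y - radius), min(h, y + radius + 1)
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            neigh = scores[y0:y1, x0:x1]
            # Non-maximum suppression: keep the target value only if it exceeds the
            # value at every other position in its neighborhood.
            if scores[y, x] == neigh.max() and (neigh == scores[y, x]).sum() == 1:
                # Illustrative mapping back to a pixel: the channel index is assumed
                # to encode an offset inside a cell x cell patch of the image.
                idx = int(channels[y, x])
                px = x * cell + idx % cell
                py = y * cell + idx // cell
                keypoints.append((px, py))
    return keypoints
```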
10. The method according to any one of claims 1 to 9, wherein the obtaining a plurality
of image frames photographed for a target scene comprises:
obtaining a plurality of original image frames photographed for the target scene by
a fisheye camera, and performing distortion correction on the plurality of original
image frames to obtain the plurality of image frames photographed for the target scene.
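As an illustration of claim 10, distortion correction of fisheye frames could be performed with OpenCV's fisheye model, assuming the intrinsic matrix and distortion coefficients are known; this is a sketch under those assumptions, not the specific correction prescribed by the claims.

```python
import cv2
import numpy as np

def undistort_fisheye(frames, K, D):
    """Hypothetical sketch of claim 10 using OpenCV's fisheye model.

    frames: list of original image frames photographed by the fisheye camera.
    K:      3x3 intrinsic matrix of the fisheye camera (assumed known).
    D:      4x1 fisheye distortion coefficients (assumed known).
    """
    undistorted = []
    for img in frames:
        # Distortion correction of one original frame; Knew=K keeps the same intrinsics.
        undistorted.append(cv2.fisheye.undistortImage(img, K, D, Knew=K))
    return undistorted
```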
11. The method according to any one of claims 1 to 9, wherein the plurality of image frames
are photographed by a camera mounted on a target moving device, and the method further
comprises:
obtaining inertial measurement data and speed measurement data of the target moving
device during photographing the plurality of image frames, and using the inertial
measurement data and the speed measurement data to calculate an initial pose of the
target moving device; and
determining pre-integration information based on the inertial measurement data, constructing
a factor graph based on the pre-integration information and the speed measurement
data, and adjusting the initial pose based on the factor graph to obtain a target
pose; and
the generating a feature map based on the space feature point comprises:
establishing a correspondence relationship between the space feature point and the
target pose, and generating the feature map based on the correspondence relationship
and the space feature point.
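The pre-integration step of claim 11 can be illustrated, under strong simplifications (first-order integration, gravity and bias terms omitted), by the following hypothetical sketch; the factor graph construction and pose adjustment are not shown.

```python
import numpy as np

def preintegrate(imu_samples, dt):
    """Hypothetical sketch of IMU pre-integration between two image frames.

    imu_samples: iterable of (gyro, accel) pairs, each a length-3 numpy array.
    dt:          sampling interval in seconds.
    Returns the accumulated rotation, velocity and position increments.
    """
    R = np.eye(3)
    v = np.zeros(3)
    p = np.zeros(3)
    for gyro, accel in imu_samples:
        # First-order rotation increment from the angular rate (small-angle approximation).
        wx, wy, wz = gyro * dt
        dR = np.array([[1.0, -wz,  wy],
                       [ wz, 1.0, -wx],
                       [-wy,  wx, 1.0]])
        p = p + v * dt + 0.5 * (R @ accel) * dt * dt
        v = v + (R @ accel) * dt
        R = R @ dR
    return R, v, p
```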
12. The method according to any one of claims 1 to 9, the method further comprising:
obtaining inertial measurement data and speed measurement data of the to-be-positioned
moving device, and a target image photographed by the moving device in the target
scene, and using the inertial measurement data and the speed measurement data to determine
an initial pose of the to-be-positioned moving device;
determining, from the feature map, based on the initial pose, a space feature point
matching a position to obtain a target space feature point; and
determining an image feature point matching the target space feature point from the
target image, forming the determined image feature point and the target space feature
point into a matching pair, and determining positioning information of the moving
device based on the matching pair.
13. The method according to claim 12, wherein the determining positioning information
of the moving device based on the matching pair comprises:
projecting the space feature point in the matching pair onto the target image to obtain
a projection feature point;
calculating a reprojection error based on the projection feature point and the image
feature point in the matching pair; and
determining a pose corresponding to a minimum value of a least square function of
the reprojection error as a corrected pose, and correcting the initial pose by using
the corrected pose to obtain the positioning information.
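For illustration, the pose correction of claim 13 (the pose minimizing a least-squares function of the reprojection error over the 3D-2D matching pairs) can be approximated with OpenCV's solvePnP; the function and parameter names outside the OpenCV call are assumptions.

```python
import cv2
import numpy as np

def correct_pose(space_points, image_points, K, dist=None):
    """Hypothetical sketch of claim 13: solve for the pose minimizing the
    reprojection error over the matching pairs.

    space_points: (N, 3) target space feature points from the feature map.
    image_points: (N, 2) matching image feature points in the target image.
    K:            3x3 camera intrinsic matrix (assumed known).
    """
    dist = np.zeros(4) if dist is None else dist
    # solvePnP internally minimizes the reprojection error of the matching pairs;
    # the resulting rotation/translation is the corrected pose used to refine the
    # initial pose obtained from inertial and speed measurements.
    ok, rvec, tvec = cv2.solvePnP(
        np.asarray(space_points, dtype=np.float64),
        np.asarray(image_points, dtype=np.float64),
        K, dist, flags=cv2.SOLVEPNP_ITERATIVE)
    return ok, rvec, tvec
```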
14. The method according to claim 12, wherein the to-be-positioned moving device comprises
a to-be-parked vehicle or a vacuum cleaning robot.
15. A feature map generation apparatus, the apparatus comprising:
a feature extraction module, configured to obtain a plurality of image frames photographed
for a target scene, separately extract image feature points from each image frame,
and determine corresponding feature descriptors based on a position in a corresponding
image at which the extracted image feature points are located;
a feature point set determining module, configured to form image feature points with
a matching relationship in the image feature points of each image frame into a feature
point set;
a difference calculation module, configured to determine a representative feature
point from the feature point set, and calculate a difference between a feature descriptor
corresponding to a remaining image feature point in the feature point set and a feature
descriptor corresponding to the representative feature point;
a position update module, configured to determine a position error of the feature
point set based on the calculated difference, iteratively update the remaining image
feature point in the feature point set based on the position error, and obtain an
updated feature point set in a case that an iteration stop condition is satisfied;
and
a feature map generation module, configured to determine a space feature point corresponding
to the updated feature point set based on a position in a corresponding image at which
each image feature point in the updated feature point set is located, and generate
a feature map based on the space feature point, the feature map being used for positioning
a to-be-positioned moving device in the target scene.
16. The apparatus according to claim 15, wherein the feature map generation module is
further configured to:
determine an average descriptor corresponding to the updated feature point set based
on a feature descriptor of each image feature point in the updated feature point set;
select a feature descriptor of which a similarity to the average descriptor satisfies
a similarity condition from the feature descriptors of the image feature points in
the updated feature point set, and use the selected feature descriptor as a reference
descriptor;
project the space feature point onto an image to which each image feature point in
the updated feature point set belongs, to obtain a plurality of projection feature
points, and determine a feature descriptor corresponding to each projection feature
point based on a position in a corresponding image at which each projection feature
point is located;
determine a reprojection error corresponding to each projection feature point based
on a difference between the feature descriptor corresponding to the projection feature
point and the reference descriptor; and
collect a reprojection error corresponding to each projection feature point to obtain
a target error, iteratively update the space feature point based on the target error,
obtain a target space feature point corresponding to the updated feature point set
in a case that the iteration stop condition is satisfied, and generate the feature
map based on the target space feature point.
17. The apparatus according to claim 15, wherein the position update module is further configured
to: separately use each remaining image feature point in the feature point set as a
target feature point, and separately calculate matching confidence between each target
feature point and the representative feature point; calculate a position error corresponding
to each target feature point based on the matching confidence and the difference corresponding
to each target feature point; and collect the position error corresponding to each target
feature point to obtain the position error of the feature point set.
18. A computer device, comprising a memory and a processor, the memory having computer-readable
instructions stored therein, and the computer-readable instructions, when executed
by the processor, implementing the operations of the method according to any one of claims
1 to 14.
19. A computer-readable storage medium, having computer-readable instructions stored thereon,
and the computer-readable instructions, when executed by a processor, implementing
the operations of the method according to any one of claims 1 to 14.
20. A computer program product, comprising computer-readable instructions, and the computer-readable
instructions, when executed by a processor, implementing the operations of the method
according to any one of claims 1 to 14.