(19)
(11) EP 3 754 541 A1

(12) EUROPEAN PATENT APPLICATION
published in accordance with Art. 153(4) EPC

(43) Date of publication:
23.12.2020 Bulletin 2020/52

(21) Application number: 19864907.1

(22) Date of filing: 26.09.2019
(51) International Patent Classification (IPC): 
G06K 9/00(2006.01)
G06K 9/62(2006.01)
(86) International application number:
PCT/CN2019/108145
(87) International publication number:
WO 2020/063744 (02.04.2020 Gazette 2020/14)
(84) Designated Contracting States:
AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR
Designated Extension States:
BA ME
Designated Validation States:
KH MA MD TN

(30) Priority: 30.09.2018 CN 201811165758

(71) Applicant: TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED
Shenzhen, Guangdong 518057 (CN)

(72) Inventor:
  • ZHENG, Kesong
    Shenzhen, Guangdong 518057 (CN)

(74) Representative: Gunzelmann, Rainer
Wuesthoff & Wuesthoff
Patentanwälte PartG mbB
Schweigerstraße 2
81541 München (DE)



(54) FACE DETECTION METHOD AND DEVICE, SERVICE PROCESSING METHOD, TERMINAL DEVICE, AND STORAGE MEDIUM


(57) A face detection method and apparatus, a service processing method, a terminal device, and a storage medium. The method comprises: acquiring a to-be-detected target facial image (S201); performing a hierarchy-based fitting training by using a face alignment algorithm and a sample data set to obtain a target face alignment model, and invoking the target face alignment model to perform a face alignment detection on the target facial image to obtain a target key point set of the target facial image (S202); and determining a feature area of the target facial image according to the target key point set (S203).




Description

RELATED APPLICATION



[0001] This disclosure claims priority to Chinese Patent Application No. 201811165758.5, entitled "FACE DETECTION METHOD, SERVICE PROCESSING METHOD, APPARATUS, TERMINAL, AND MEDIUM" and filed with the Chinese Patent Office on September 30, 2018.

FIELD OF THE TECHNOLOGY



[0002] This disclosure relates to the field of image processing technologies, and in particular, to a face detection method and apparatus, a service processing method, a terminal device, and a storage medium.

BACKGROUND OF THE DISCLOSURE



[0003] Image processing is a technology that uses a computer to process an image to achieve a desired result. In the field of image processing, face detection has become a hot research topic. Face detection may include face alignment detection. The so-called face alignment detection, which may also be referred to as face key point detection, detects a facial image to locate key feature points of the face, for example, the eyes, the nose, and the corners of the mouth. How to better perform a face detection on a facial image has become a research focus.

SUMMARY



[0004] Embodiments of this disclosure provide a face detection method and apparatus, a service processing method, a terminal device, and a storage medium, to better perform a face detection on a facial image, thereby improving the accuracy of a detection result.

[0005] According to an aspect, an embodiment of this disclosure provides a face detection method, performed by a terminal device, the method including:

obtaining a to-be-detected target facial image;

performing a hierarchical fitting training by using a face alignment algorithm and a sample data set to obtain a target face alignment model;

invoking the target face alignment model to perform a face alignment detection on the target facial image, to obtain a target key point set of the target facial image; and

determining a feature area of the target facial image according to the target key point set.



[0006] According to another aspect, an embodiment of this disclosure provides a service processing method, performed by a terminal device, the method including:

invoking, in a case that a service request requiring a face alignment detection is detected, a camera apparatus of the terminal device to obtain a target facial image of a requester;

performing the face alignment detection on the target facial image by using a face detection method to obtain a feature area of the target facial image; and

processing a requested service according to the feature area of the target facial image to respond to the service request.



[0007] According to still another aspect, an embodiment of this disclosure provides a face detection apparatus, including:

an obtaining unit, configured to obtain a to-be-detected target facial image;

a training unit, configured to perform a hierarchical fitting training by using a face alignment algorithm and a sample data set to obtain a target face alignment model;

a detection unit, configured to invoke the target face alignment model to perform a face alignment detection on the target facial image, to obtain a target key point set of the target facial image; and

a determination unit, configured to determine a feature area of the target facial image according to the target key point set.



[0008] According to still another aspect, an embodiment of this disclosure provides a terminal device, including a processor, an input device, an output device, and a memory, the processor, the input device, the output device, and the memory being connected to each other, the memory being configured to store a computer program, the computer program including a first program instruction, the processor being configured to invoke the first program instruction to perform the face detection method; or the computer program including a second program instruction, the processor being configured to invoke the second program instruction to perform the service processing method.

[0009] According to still another aspect, an embodiment of this disclosure provides a computer storage medium, the computer storage medium storing a first computer program instruction, the first computer program instruction, when executed, implementing the foregoing face detection method; or the computer storage medium storing a second computer program instruction, the second computer program instruction, when executed, implementing the foregoing service processing method.

BRIEF DESCRIPTION OF THE DRAWINGS



[0010] To describe the technical solutions of the embodiments of this disclosure or the related art more clearly, the following briefly describes the accompanying drawings required for describing the embodiments or the related art. Apparently, the accompanying drawings in the following description show only some embodiments of this disclosure, and a person skilled in the art may still derive other drawings from the accompanying drawings without creative efforts.

FIG. 1a is a schematic diagram of a target facial image according to an embodiment of this disclosure.

FIG. 1b is a schematic diagram of another target facial image according to an embodiment of this disclosure.

FIG. 2 is a schematic flowchart of a face detection method according to an embodiment of this disclosure.

FIG. 3 is a schematic flowchart of a face detection method according to another embodiment of this disclosure.

FIG. 4a is a schematic diagram of a displacement according to an embodiment of this disclosure.

FIG. 4b is a schematic diagram of a rotation according to an embodiment of this disclosure.

FIG. 4c is a schematic diagram of a mirroring according to an embodiment of this disclosure.

FIG. 4d is a schematic diagram of a compression according to an embodiment of this disclosure.

FIG. 5 is a schematic diagram of a division of a face area according to an embodiment of this disclosure.

FIG. 6 is a schematic flowchart of a service processing method according to an embodiment of this disclosure.

FIG. 7 is a diagram of an application scenario of a service processing method according to an embodiment of this disclosure.

FIG. 8 is a diagram of an application scenario of another service processing method according to an embodiment of this disclosure.

FIG. 9 is a schematic structural diagram of a face detection apparatus according to an embodiment of this disclosure.

FIG. 10 is a schematic structural diagram of a service processing apparatus according to an embodiment of this disclosure.

FIG. 11 is a schematic structural diagram of a terminal according to an embodiment of this disclosure.

FIG. 12 is a schematic structural diagram of an implementation environment according to an embodiment of this disclosure.


DESCRIPTION OF EMBODIMENTS



[0011] The following clearly and completely describes the technical solutions in the embodiments of this disclosure with reference to the accompanying drawings in the embodiments of this disclosure.

[0012] A face key point (a key point for short), also referred to as a facial feature point, usually includes points that constitute facial features (the eyebrows, eyes, nose, mouth, and ears) and a facial profile. A method for detecting a facial image and labeling one or more key points in the facial image may be referred to as a face key point detection method or a face alignment detection method. Feature areas in the facial image may be determined by performing a face alignment detection on the facial image. The feature areas herein may include, but are not limited to, an eyebrow area, an eye area, a nose area, a mouth area, and an ear area.

[0013] In an embodiment of this disclosure, a target face alignment model (which may also be referred to as a target face key point detection model) may be provided to implement the face alignment detection. After a to-be-detected target facial image is obtained, the target face alignment model may be invoked to perform the face alignment detection on the target facial image, to determine a plurality of key points and label information of the key points in the target facial image. The key points herein may include, but are not limited to, mouth key points, eyebrow key points, eye key points, nose key points, and ear key points. The label information of the key points may include, but is not limited to, position information (for example, labeling positions of the key points), shape information (for example, being labeled as a dot shape), and feature information. The feature information is used for representing categories of the key points. For example, if the feature information is feature information of eyes, it represents that the key points are the key points of the eyes. In another example, if the feature information is feature information of the nose, it represents that the key points are the key points of the nose.
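
For illustration only, the following minimal Python sketch (all names hypothetical, not part of this disclosure) shows one possible way to represent a key point together with its position, shape, and feature information as described above:

```python
# Illustrative sketch only (names hypothetical, not defined by this disclosure):
# one possible in-memory representation of a key point and its label information.
from dataclasses import dataclass

@dataclass
class KeyPoint:
    x: float       # position information: horizontal coordinate
    y: float       # position information: vertical coordinate
    shape: str     # shape information, e.g. labeled as a dot
    feature: str   # feature information: category, e.g. "eye", "nose", "mouth"

# A target key point set is then simply a collection of such records:
target_key_point_set = [
    KeyPoint(x=120.0, y=88.5, shape="dot", feature="eye"),
    KeyPoint(x=131.0, y=140.2, shape="dot", feature="nose"),
]
```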

[0014] The plurality of key points determined in the target facial image may be shown as gray dots in FIG. 1a. After the plurality of key points are determined, feature areas of the target facial image may be determined based on the label information of the key points. For example, according to the labeled positions of the gray dots in FIG. 1a, an eyebrow area 11, an eye area 12, a nose area 13, a mouth area 14, and an ear area 15 may be separately determined, as shown in FIG. 1b.

[0015] Based on the above descriptions, an embodiment of this disclosure provides a face detection method, implemented by a terminal device, for example, a mobile terminal such as a smartphone or a tablet computer. As shown in FIG. 2, the method may include the following steps S201 to S203.

[0016] S201. Obtain a to-be-detected target facial image.

[0017] The target facial image may be a facial image obtained by the terminal invoking a camera apparatus (for example, a camera) to capture an environmental image in real time, or a stored facial image that the terminal obtains from a local gallery or a cloud photo album. The cloud photo album herein is a web album based on a cloud computing platform.

[0018] In an embodiment, if the terminal detects a triggering event of the face alignment detection, the to-be-detected target facial image may be obtained. The triggering event of the face alignment detection herein may be used as a service request.

[0019] In particular, in a case that it is detected that a user is using an application program based on the face alignment detection, a service request requiring the face alignment detection is monitored; and in a case that the service request is detected, a camera apparatus of the terminal device is invoked to obtain a facial image of a requester as the target facial image.

[0020] The application programs based on the face alignment detection may include, but are not limited to, a facial expression recognition application program, a face changing effect application program, and a smart mapping application program. When the user uses such an application program, the terminal needs to obtain the target facial image and perform the face alignment detection on the target facial image to determine feature areas, so that operations such as facial expression recognition, face changing effects, and smart mapping can be performed based on the feature areas.

[0021] Optionally, the triggering event of the face alignment detection may alternatively be an event that it is detected that the terminal performs identity verification according to the target facial image. When the terminal performs the identity verification according to the target facial image, the face alignment detection first needs to be performed on the target facial image to determine the feature areas, so that operations such as information matching can be performed based on the determined feature areas and preset facial information.

[0022] In another embodiment, if the terminal detects that the user sends an instruction of performing the face alignment detection, the to-be-detected target facial image may be obtained. The instruction may be a speech instruction, a press/click instruction, an instruction to enable a face alignment detection function, or the like.

[0023] S202. Perform a hierarchical fitting training by using a face alignment algorithm and a sample data set to obtain a target face alignment model; and invoke the target face alignment model to perform a face alignment detection on the target facial image, to obtain a target key point set of the target facial image.

[0024] After the terminal obtains the to-be-detected target facial image, the target facial image may be inputted into the target face alignment model, so that the target face alignment model may perform the face alignment detection on the target facial image, thereby obtaining the target key point set of the target facial image.

[0025] The target key point set herein may include a plurality of target key points and label information of the target key points. The target key points may be any one of the following: mouth key points, eyebrow key points, eye key points, nose key points, ear key points, and the like. The label information of the target key points may include position information, shape information, feature information, and the like of the target key points.

[0026] The target face alignment model is obtained by performing the hierarchical fitting training by using the face alignment algorithm and the sample data set. The face alignment algorithm herein may include, but is not limited to, a machine learning regression algorithm, such as the supervised descent method (SDM) algorithm and the local binary features (LBF) algorithm, or a convolutional neural network (CNN) algorithm, such as the tasks-constrained deep convolutional network (TCDCN) algorithm for facial landmark detection by deep multi-task learning and the 3D dense face alignment (3DDFA) algorithm. Based on such an algorithm, an original model may be designed. A training is then performed based on the original model and the sample data set, so that the target face alignment model may be eventually obtained.

[0027] In an embodiment, before the obtaining a to-be-detected target facial image, the method further includes: obtaining the sample data set, the sample data set including a plurality of sample facial images and reference key point sets of the sample facial images, the reference key point set of each sample facial image including a plurality of reference key points and label information of the reference key points; and determining a plurality of feature areas used for representing the sample facial images according to the plurality of reference key points and the label information of the reference key points.

[0028] The feature areas include any one of the following: the eyebrow area, the eye area, the nose area, the mouth area, and the ear area. The face alignment algorithm includes a machine learning regression algorithm or a CNN algorithm.

[0029] The hierarchical fitting training is: determining training priorities of the feature areas of the sample facial images according to loss weights of the feature areas; and performing a fitting training on the feature areas of the sample facial images by using the face alignment algorithm and according to the training priorities.

[0030] Specifically, detection difficulties of the feature areas are different. Different loss weights are set for the feature areas according to the detection difficulties of the feature areas. The feature area with a larger loss weight has a higher priority during training. According to the training priority, the fitting training is performed on the feature areas by using the face alignment algorithm.
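
As a non-authoritative illustration of this prioritization, the following Python sketch assigns loss weights by assumed detection difficulty and derives the training order from them; the concrete weight values are illustrative assumptions only:

```python
# Hedged sketch: set loss weights by assumed detection difficulty and derive
# the training priorities from them. The weight values are illustrative
# assumptions; this disclosure does not fix concrete values here.
loss_weights = {
    "mouth":   0.6,   # hardest to detect -> largest loss weight
    "eye":     0.15,
    "eyebrow": 0.1,
    "nose":    0.1,
    "ear":     0.05,
}

# Feature areas sorted by descending loss weight give the training priority:
training_priority = sorted(loss_weights, key=loss_weights.get, reverse=True)
# ['mouth', 'eye', 'eyebrow', 'nose', 'ear'] -> the mouth area is fitted first
```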

[0031] In an embodiment, an iterative training is performed according to the face alignment algorithm and the sample data set; a difficult sample facial image is then selected from the sample data set; and a result of the iterative training is optimized according to the difficult sample facial image to obtain the target face alignment model. The difficult sample facial image is a sample facial image that is selected from the sample data set and has a relatively high detection difficulty.

[0032] The feature area in which the key points have a higher detection difficulty has a larger loss weight. When the loss weight is larger, the impact on the value of the loss function is larger, and the value of the loss function may be used for describing loss values of the face alignment model under different model parameters.

[0033] In the training process, the model parameters may be continuously changed to reduce the value of the loss function, thereby achieving the objective of model training and optimization. When the value of the loss function meets a preset condition, it indicates that the training is completed. In this case, an obtained face alignment model is the target face alignment model. The preset condition herein may include, but is not limited to, that the value of the loss function is within a preset value range or the value of the loss function is the smallest value.

[0034] Therefore, in the training process, because a feature area with a larger loss weight contributes more to the value of the loss function, the fitting training tends to be performed preferentially on such a feature area. As a result, the trained target face alignment model can accurately perform a key point detection on the feature area with a larger loss weight (that is, the feature area in which the key points are difficult to detect). It can be learned that the target face alignment model obtained through the hierarchical fitting training has relatively high accuracy.

[0035] S203. Determine a feature area of the target facial image according to the target key point set.

[0036] After the target key point set is obtained, the feature area of the target facial image is determined according to label information of the target key points in the target key point set. It can be learned from the above that the label information may include feature information, position information, and the like.

[0037] In an embodiment, the feature area may be determined according to the feature information of the target key points. Specifically, categories of the target key points are determined according to the feature information of the target key points. An area formed by the target key points of the same category is used as a feature area, and the category is used as the category of the feature area. For example, target key points whose feature information is the feature information of the nose are selected; the categories of these target key points are all nose key points, and an area formed by these target key points is used as a nose area.

[0038] In another embodiment, the feature area may be determined according to the position information of the target key points. Specifically, the labeled positions of the target key points may be first determined according to the position information, and target key points in adjacent positions are connected. If a shape obtained by the connection is similar to the shape of any one of the facial features (the eyebrows, eyes, nose, mouth, and ears), an area formed by the target key points in the adjacent positions is determined as a feature area, and the category of the feature area is determined according to the shape. For example, if the shape obtained by connecting the target key points in adjacent positions is similar to the shape of the nose, the area formed by these target key points is determined as a nose area.
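
The following hedged Python sketch illustrates the feature-information-based variant described above; representing a feature area as an axis-aligned bounding box is an assumption made here for illustration:

```python
# Hedged sketch of the feature-information-based variant: group target key
# points by category and take each group's bounding box as the feature area.
from collections import defaultdict

def feature_areas(key_points):
    """key_points: iterable of objects with .x, .y, and .feature attributes."""
    groups = defaultdict(list)
    for kp in key_points:
        groups[kp.feature].append((kp.x, kp.y))   # same category -> same area
    areas = {}
    for category, pts in groups.items():
        xs, ys = zip(*pts)
        areas[category] = (min(xs), min(ys), max(xs), max(ys))  # bounding box
    return areas
```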

[0039] In this embodiment of this disclosure, a face alignment detection is performed by using a target face alignment model. Because the target face alignment model is obtained through a hierarchical fitting training, the target face alignment model may accurately perform a key point detection on feature areas, thereby improving the accuracy of a detection result. The target face alignment model occupies relatively little memory and has a fast running speed, thereby improving the efficiency of the face alignment detection.

[0040] Another embodiment of this disclosure provides a face detection method, implemented by a terminal device, for example, a mobile terminal such as a smartphone or a tablet computer. Based on the embodiment shown in FIG. 2, this embodiment includes specific steps of the hierarchical fitting training. As shown in FIG. 3, the method may include the following steps S301 to S307.

[0041] S301. Obtain a sample data set.

[0042] The sample data set herein may include a plurality of sample facial images and reference key point sets of the sample facial images. The reference key point set of each sample facial image includes a plurality of reference key points and label information of the reference key points. The plurality of reference key points and the label information of the reference key points may be used for representing a plurality of feature areas of the facial images. The feature areas may include any one of the following: an eyebrow area, an eye area, a nose area, a mouth area, and an ear area.

[0043] In an embodiment, the plurality of key points in the reference key point set and the label information of the key points may be obtained by a professional annotator pre-labeling the sample facial images.

[0044] S302. Perform an iterative training according to a face alignment algorithm and the sample data set.

[0045] A specific process of the iterative training may include the following steps S3021 and S3022.

[0046] S3021. Perform a pre-processing on the sample data set to obtain a plurality of training data sets, each training data set including a plurality of pre-processed sample facial images.

The terminal may perform the pre-processing on the sample data set by using different augmentation parameters. The pre-processing may include an augmentation and a normalization, thereby obtaining the plurality of training data sets. The plurality of training data sets may include a first training data set, which may be any one of the plurality of training data sets. Correspondingly, a specific implementation of performing the pre-processing on the sample data set to obtain the plurality of training data sets may be as follows.
First, a first augmentation parameter is obtained, and an augmentation is performed on the sample data set according to the first augmentation parameter to obtain a first augmented data set. The obtained first augmented data set may include a plurality of augmented sample facial images.

[0048] The augmentation herein includes at least one of the following: a displacement, a rotation, a mirroring, and a compression. A corresponding augmentation parameter includes at least one of the following: a displacement parameter, a rotation angle parameter, and a compression ratio parameter.

[0049] The displacement is changing the position of a facial part in the sample facial image. Specifically, the formula shown in Formula 1.1 may be used to perform the displacement on the sample facial image:

$$\mathrm{Rect}(x, y, w, h) \rightarrow \mathrm{Rect}(x + dx,\; y + dy,\; w, h) \quad \text{(Formula 1.1)}$$

where Rect is used for storing parameters that appear in pairs, Rect(x, y, w, h) represents the initial coordinates of the sample facial image, x is a horizontal coordinate, y is a vertical coordinate, w is a width value of the sample facial image, and h is a length value of the sample facial image; and Rect(x+dx, y+dy, w, h) represents the coordinates of the sample facial image after the displacement, dx is a variation of the horizontal coordinate, dy is a variation of the vertical coordinate, and both dx and dy may be used as displacement parameters.

[0050] The initial coordinates of the sample facial image may be coordinates in the upper left corner of the sample facial image, or coordinates in the upper right corner of the sample facial image, or the coordinates of the center point of the sample facial image, which is not limited herein. For example, the initial coordinates of the sample facial image are the coordinates of the center point of the sample facial image. For a schematic diagram of the displacement, reference may be made to FIG. 4a.

[0051] The rotation is rotating the sample facial image clockwise (θ is positive) or counterclockwise (θ is negative) by a rotation angle θ with the center point of the sample facial image as the origin. θ may be used as a rotation angle parameter. Specifically, assuming that the coordinates of the center point of the sample facial image are (x, y), for any pixel (x0, y0) in the sample facial image, the rotation transformation matrix shown in Formula 1.2 may be used to perform the rotation to obtain rotated pixel coordinates (x', y'), where x' = (x − x0)cosθ + (y − y0)(−sinθ) + x0 and y' = (x − x0)sinθ + (y − y0)cosθ + y0. For a schematic diagram of the rotation, reference may be made to FIG. 4b:

$$\begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} x - x_0 \\ y - y_0 \end{pmatrix} + \begin{pmatrix} x_0 \\ y_0 \end{pmatrix} \quad \text{(Formula 1.2)}$$

[0052] The mirroring may include a horizontal mirroring and a vertical mirroring. The horizontal mirroring is switching the left part and the right part of the sample facial image with the vertical central axis of the sample facial image as the center. The vertical mirroring is switching the upper part and the lower part of the sample facial image with the horizontal central axis of the sample facial image as the center. For example, in this embodiment of this disclosure, the horizontal mirroring is performed on the sample facial image. Specifically, for any pixel (x0, y0) in the sample facial image, the formula shown in Formula 1.3 may be used to perform the horizontal mirroring. The coordinates of the pixel obtained after the horizontal mirroring are (x1, y1). w in Formula 1.3 is the width value of the sample facial image. For a schematic diagram of the horizontal mirroring, reference may be made to FIG. 4c:

$$(x_1, y_1) = (w - x_0,\; y_0) \quad \text{(Formula 1.3)}$$

[0053] In other embodiments, when the mirroring is performed on the sample facial image, the vertical mirroring may be performed on the sample facial image, or both the horizontal mirroring and the vertical mirroring may be performed on the sample facial image.

[0054] The compression is saving the sample facial image according to a specified image quality parameter when the sample facial image is saved in an image format. The specified image quality parameter may be determined from a preset quality parameter range, and the preset quality parameter range may be [0, 100%]. A higher image quality parameter results in a higher definition of the saved sample facial image. The image quality parameter herein may be used as a compression ratio parameter, and is, for example, 85%. For a schematic diagram of performing the compression on the sample facial image, reference may be made to FIG. 4d.
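
The four augmentations described above can be illustrated with the following hedged Python sketch based on OpenCV and NumPy; the library choice, function names, and parameter values are illustrative assumptions and are not prescribed by this disclosure. Note that OpenCV treats a positive rotation angle as counterclockwise, the opposite of the clockwise-positive convention of θ above:

```python
# Hedged sketch of the four augmentations (Formulas 1.1 to 1.3 plus the
# compression), using OpenCV and NumPy. All names and values are illustrative.
import cv2
import numpy as np

def shift(img, dx, dy):
    """Displacement (Formula 1.1): move the image content by (dx, dy)."""
    h, w = img.shape[:2]
    m = np.float32([[1, 0, dx], [0, 1, dy]])
    return cv2.warpAffine(img, m, (w, h))

def rotate(img, theta_deg):
    """Rotation (Formula 1.2) about the image center. theta_deg follows the
    clockwise-positive convention above, so it is negated for OpenCV's
    counterclockwise-positive convention."""
    h, w = img.shape[:2]
    m = cv2.getRotationMatrix2D((w / 2.0, h / 2.0), -theta_deg, 1.0)
    return cv2.warpAffine(img, m, (w, h))

def mirror_horizontal(img):
    """Horizontal mirroring (Formula 1.3): x1 = w - x0."""
    return cv2.flip(img, 1)

def compress(img, quality=85):
    """Compression: re-encode as JPEG with the given image quality parameter."""
    ok, buf = cv2.imencode(".jpg", img, [cv2.IMWRITE_JPEG_QUALITY, quality])
    return cv2.imdecode(buf, cv2.IMREAD_COLOR)
```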

[0055] Second, after the first augmented data set is obtained, the sample data set and the first augmented data set may be combined to obtain a combined data set. The plurality of augmented sample facial images in the first augmented data set may be the sample facial images obtained by sequentially performing the displacement, the rotation, the mirroring, and the compression on the sample facial images in the sample data set.

[0056] In other embodiments, the plurality of augmented sample facial images in the first augmented data set may alternatively be the sample facial images obtained by performing only part of the foregoing augmentation on the sample facial images in the sample data set, for example, sample facial images obtained after performing only the displacement, sample facial images obtained after performing only the rotation, or sample facial images obtained after performing only the displacement and the compression.

[0057] Finally, the normalization may be performed on the combined data set to obtain the first training data set. The normalization includes an image normalization and/or a label information normalization.

[0058] The image normalization converts the sample facial images to a floating-point representation and de-centers their pixel values. Specifically, the data type of the sample facial image needs to be transformed first into a floating point type, so that the normalization can be performed on the sample facial image. An image is usually composed of a plurality of image channels; for example, a JPG image is usually composed of the three image channels red, green, and blue (RGB). Therefore, when the normalization is performed on the sample facial image, for any image channel C0 of the sample facial image, an average value m and a variance d of all pixel values of the image channel may be calculated. The normalization is then performed on the value C0i of each pixel i of the image channel by using the formula shown in Formula 1.4 to obtain a new image channel Ci:

$$C_i = \frac{C0_i - m}{d} \quad \text{(Formula 1.4)}$$

[0059] The normalization is performed on the sample facial image, so that pixel values in the normalized sample facial image may be within a preset interval, thereby improving the stability of the sample facial image and the accuracy of the subsequent model training. The preset interval may be determined according to an actual service requirement, and is, for example, [0, 1].
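
A minimal NumPy sketch of the per-channel normalization of Formula 1.4 follows; it assumes an image array of shape (height, width, channels). The text above calls d the variance; the sketch uses the standard deviation, a common practical reading of the divisor:

```python
# Minimal sketch of the per-channel normalization of Formula 1.4 (an
# illustrative assumption, not the prescribed implementation).
import numpy as np

def normalize_image(img):
    img = img.astype(np.float32)          # transform the data type to floating point
    for c in range(img.shape[2]):         # for each image channel C0
        m = img[..., c].mean()            # average value m of all pixel values
        d = img[..., c].std()             # spread d of the pixel values
        img[..., c] = (img[..., c] - m) / (d + 1e-8)   # Formula 1.4; eps guards /0
    return img
```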

[0060] The label information normalization is performing the normalization on the position information in the label information of the reference key points in the sample facial image. Specifically, the formula shown in Formula 1.5 may be used to perform the normalization on the position information (coordinates) of the reference key points:

$$(x, y) \rightarrow \left(\frac{x}{w},\; \frac{y}{h}\right) \quad \text{(Formula 1.5)}$$

where (x, y) represents the coordinates of any one of the reference key points in the sample facial image, w is the width value of the sample facial image, and h is the length value of the sample facial image. Performing the label information normalization on the sample facial image may improve the accuracy of the subsequent model training.
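
A minimal sketch of Formula 1.5 follows (function name hypothetical); coordinates are divided by the image width and length, so they fall in [0, 1] regardless of the image resolution:

```python
# Minimal sketch of the label information normalization of Formula 1.5
# (function name hypothetical, for illustration only).
def normalize_key_points(points, w, h):
    """points: list of (x, y) reference key point coordinates."""
    return [(x / w, y / h) for (x, y) in points]

# Example: a key point at (120.0, 88.5) in a 640 x 480 image maps to
# (0.1875, 0.184...).
```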

[0061] S3022. Perform the iterative training by using the face alignment algorithm and the plurality of training data sets to obtain a first face alignment model.

[0062] The plurality of training data sets obtained in step S3021 may be generally referred to as a first training data set, or the plurality of training data sets may further be divided into a second training data set and a third training data set. During the iterative training, the second training data set is chosen before the third training data set. An augmentation parameter corresponding to the second training data set is greater than an augmentation parameter corresponding to the third training data set. For example, the augmentation parameter corresponding to the second training data set may be: displacement parameters dx = 20 and dy = 20, and a rotation angle parameter θ = 40°; and the augmentation parameter corresponding to the third training data set may be: displacement parameters dx = 5 and dy = 5, and a rotation angle parameter θ = 10°.

[0063] Correspondingly, the specific implementation of performing the iterative training by using the face alignment algorithm and the plurality of training data sets to obtain the first face alignment model may be:
performing training by using the face alignment algorithm and the first training data set to obtain an initial face alignment model. The face alignment algorithm may include, but is not limited to, a machine learning regression algorithm or a CNN algorithm.

[0064] Specifically, an original model may be constructed by using the face alignment algorithm. A training optimization is performed on the original model by using the first training data set to obtain the initial face alignment model, so that the training optimization is further performed on the initial face alignment model based on the second training data set, the third training data set, and even more training data sets. Different augmentation parameters are used for different training data sets.

[0065] The training optimization of the original model may be implemented by using a supervised machine learning optimization algorithm. That is, the known reference key points in the sample facial images of the first training data set are compared with the detection key points output by the original model to obtain position differences. The larger the difference, the more the model parameters of the original model need to be adjusted, until the difference between the detection key points and the reference key points is minimized or falls below a preset threshold. In this case, the initial face alignment model is obtained.

[0066] Next, a loss function of the initial face alignment model is set according to a hierarchical fitting rule. The hierarchical fitting rule may be a rule that is set based on at least one feature area and the loss weight of the feature area. The loss weight is positively correlated to a fitting training order, and a fitting training is preferentially performed on the feature area with a larger loss weight.

[0067] Practice has shown that when the face alignment detection is performed on the facial image, average errors of the mouth key points of the mouth area are usually large. That is, it is more difficult to detect the mouth area, and the accuracy thereof is low. Therefore, during the model training, a fitting training may be preferentially performed on feature areas with high detection difficulties, for example, the mouth area. Therefore, the target face alignment model may accurately perform the key point detection on the feature areas with high detection difficulties.

[0068] Based on this, the plurality of feature areas used for representing the sample facial images may be determined according to the plurality of reference key points and the label information of the reference key points in the sample facial image. For example, the quantity of reference key points is 51, and a schematic diagram of the division of areas may be as shown in FIG. 5. It is to be understood that this quantity of reference key points is only an example and is not limited to 51; the quantity may alternatively be 48, 86, or the like.

[0069] Different loss weights are set for the feature areas according to the detection difficulties of the feature areas. The feature area with a higher detection difficulty has a larger loss weight. The hierarchical fitting rule is determined according to the set loss weight. The hierarchical fitting rule may represent that the loss weight is positively correlated to a fitting training order, and a fitting training is preferentially performed on the feature area with a larger loss weight.

[0070] A loss function shown in Formula 1.6 may then be set according to the hierarchical fitting rule:

$$\mathrm{loss} = \sum_{j} \omega_j \left[ \left(x_j - x'_j\right)^2 + \left(y_j - y'_j\right)^2 \right] \quad \text{(Formula 1.6)}$$

where xj and yj represent the label coordinates of the reference key points, x'j and y'j represent the label coordinates of the detection key points, and ωj represents the loss weight of the j-th reference key point, whose value may be determined according to the loss weight of the feature area to which the reference key point belongs. For example, if the loss weight of the mouth area in the hierarchical fitting rule is 0.6, the loss weights of all the mouth key points in the mouth area are 0.6.
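
For illustration, Formula 1.6 can be computed with the following hedged NumPy sketch, in which ref and det hold the (x, y) coordinates of the reference and detection key points and weights holds one loss weight per key point, taken from its feature area:

```python
# Hedged NumPy sketch of the weighted loss of Formula 1.6 (illustrative only).
import numpy as np

def hierarchical_loss(ref, det, weights):
    ref, det = np.asarray(ref), np.asarray(det)       # shape (n, 2)
    sq = ((ref - det) ** 2).sum(axis=1)               # (xj - x'j)^2 + (yj - y'j)^2
    return float((np.asarray(weights) * sq).sum())    # weighted sum over j
```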

[0071] Finally, the second training data set and the third training data set are sequentially chosen to perform the training on the initial face alignment model according to a principle of reducing the value of the loss function to obtain the first face alignment model.

[0072] In a specific implementation, the second training data set may be first chosen to perform the training on the initial face alignment model according to the principle of reducing the value of the loss function, to obtain an intermediate face alignment model.

[0073] Specifically, when the training is performed based on a target sample facial image in the second training data set, after the face key point detection is performed on the target sample facial image through the initial face alignment model, the value of the loss function is obtained based on the foregoing Formula 1.6. The model parameters of the initial face alignment model are then adjusted, so that the value of the loss function obtained the next time the face key point detection is performed on the target sample facial image becomes smaller. The face key point detection, the calculation of the value of the loss function, and the adjustment of the model parameters are repeatedly performed in this way on all the sample facial images in the second training data set to obtain the intermediate face alignment model.
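
The repeated cycle just described — key point detection, calculation of the loss value, adjustment of the model parameters — can be illustrated with the following PyTorch-style sketch; `model` and `optimizer` are assumed to exist, and the loss mirrors Formula 1.6. This is an illustrative sketch, not the training procedure prescribed by this disclosure:

```python
# Hedged PyTorch-style sketch of one training step (illustrative only).
import torch

def training_step(model, optimizer, image, ref_points, weights):
    det_points = model(image)                           # face key point detection
    sq = ((ref_points - det_points) ** 2).sum(dim=-1)   # per-key-point squared error
    loss = (weights * sq).sum()                         # value of the loss function
    optimizer.zero_grad()
    loss.backward()                                     # gradients w.r.t. model parameters
    optimizer.step()                                    # adjust parameters to reduce the loss
    return loss.item()
```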

[0074] The third training data set is then chosen to perform the training on the intermediate face alignment model to obtain the first face alignment model. For a process of performing the training on the intermediate face alignment model to obtain the first face alignment model, reference may be made to the foregoing descriptions of the training process from the initial face alignment model to the intermediate face alignment model.

[0075] When the augmentation parameter is larger, the complexity of the sample facial images in the corresponding training data set is higher. Therefore, the second training data set with a larger augmentation parameter is used for training first, so that the trained face alignment model may first adapt to a more complex facial image; the third training data set with a smaller augmentation parameter is then used for training, so that the trained face alignment model may adapt to a simpler facial image. Such a training process with descending complexity can improve the efficiency of the model training.

[0076] In each training process, a fitting training is preferentially performed on key points with relatively large loss weights according to the loss weights of the key points in the loss function. For example, among the plurality of feature areas, the mouth area has the largest loss weight. Therefore, during each training, a fitting training is preferentially performed on the key points of the mouth area.

[0077] In other embodiments, the third training data set may be first chosen to perform the training on the initial face alignment model according to a principle of reducing the value of the loss function to obtain the intermediate face alignment model. The second training data set is then chosen to perform the training on the intermediate face alignment model to obtain the first face alignment model.

[0078] According to the accuracy requirement of the face alignment model, in other embodiments, the iterative training may be performed by using more training data sets. For example, the plurality of training data sets may include a second training data set, a third training data set, a fourth training data set, and a fifth training data set, and the iterative training may be performed by using the plurality of training data sets to obtain the first face alignment model. A descending order of the augmentation parameters corresponding to the plurality of training data sets is: the augmentation parameter corresponding to the second training data set > the augmentation parameter corresponding to the third training data set > the augmentation parameter corresponding to the fourth training data set > the augmentation parameter corresponding to the fifth training data set.

[0079] Tests show that after the training optimization of the model is performed sequentially based on the training data sets obtained by using different augmentation parameters, the eventually obtained target face alignment model can perform the face key point detection more accurately and achieve higher robustness.

[0080] S303. Select a difficult sample facial image from the sample data set.

[0081] The specific selection process includes the following steps S3031 and S3032.

[0082] S3031. Invoke the first face alignment model to perform the face alignment detection on the sample data set, to obtain detection key point sets of the sample facial images in the sample data set. The detection key point set includes a plurality of detection key points and label information of the detection key points.

[0083] S3032. Select the difficult sample facial image from the sample data set according to a difference between the reference key point set and the detection key point set.

[0084] In a specific implementation, for each sample facial image, the difference between the reference key point set and the detection key point set may be calculated. A sample facial image whose difference is greater than a preset threshold is selected from the sample data set as the difficult sample facial image.

[0085] The preset threshold may be determined according to the service requirement of the target face alignment model: if the accuracy requirement of the target face alignment model is high, the preset threshold may be a relatively small value; and if the accuracy requirement is low, the preset threshold may be a relatively large value.

[0086] In an embodiment, the Euclidean distance formula shown in Formula 1.7 may be used to calculate the difference between the reference key point set and the detection key point set of each sample facial image:

$$d(p, q) = \sqrt{\sum_{i=1}^{n} \left(p_i - q_i\right)^2} \quad \text{(Formula 1.7)}$$

where pi represents any reference key point in the sample facial image, qi represents the corresponding detection key point in the sample facial image, d(p, q) represents the difference between the reference key point set and the detection key point set, and d(p, q) = d(q, p).

[0087] In another embodiment, the difference between the reference key point set and the detection key point set of each sample facial image is calculated by using a cosine similarity. Specifically, the coordinates of the reference key points in the reference key point set may be represented as a vector to obtain a reference vector set, and the coordinates of the detection key points in the detection key point set may be represented as a vector to obtain a detection vector set. The difference between the reference vector set and the detection vector set is then calculated by using the cosine similarity formula, thereby determining the difference between the reference key point set and the detection key point set of each sample facial image.

[0088] In still another embodiment, the difference between the reference key point set and the detection key point set of each sample facial image is calculated by using a Manhattan distance, a Hamming distance, or a Chebyshev distance.
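
The selection in steps S3031 and S3032 can be illustrated with the following hedged Python sketch using the Euclidean difference of Formula 1.7 (names hypothetical):

```python
# Hedged sketch of S3031/S3032: compute the Euclidean difference per sample
# facial image and keep the samples whose difference exceeds the threshold.
import numpy as np

def select_difficult_samples(samples, model, threshold):
    """samples: list of (image, reference_points) pairs; model(image) returns
    detection key points with the same shape as reference_points."""
    difficult = []
    for image, ref in samples:
        det = model(image)
        d = np.sqrt(((np.asarray(ref) - np.asarray(det)) ** 2).sum())  # Formula 1.7
        if d > threshold:
            difficult.append((image, ref))
    return difficult
```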

[0089] S304. Optimize the first face alignment model by using the difficult sample facial image to obtain the target face alignment model.

[0090] Specifically, an augmentation such as a displacement, a rotation, a mirroring, and a compression may be first performed on the difficult sample facial image. A normalization such as an image normalization and a label information normalization may then be performed on the difficult sample facial image and the augmented difficult sample facial image, to obtain a difficult training data set. Next, the first face alignment model is optimized by using the difficult training data set according to a principle of reducing the value of the loss function to obtain the target face alignment model. That is, after the difficult training data set is obtained, the first face alignment model may further be optimized based on the difficult training data set.

[0091] An optimization of the first face alignment model is mainly to optimize the model parameter in the first face alignment model according to the value of the loss function. For the process of optimization based on the value of the loss function, reference may be made to the foregoing Formula 1.6 and other related descriptions. In the process of optimizing the first face alignment model based on the difficult training data set, the model parameter of the first face alignment model may be continuously changed to reduce the value of the loss function of the first face alignment model, to make the value of the loss function of the first face alignment model meet the preset condition, thereby achieving the objective of optimizing the first face alignment model.

[0092] The target face alignment model trained in steps S302 to S304 has a fast running speed and occupies little memory, thereby reducing the difficulty of deployment on a mobile terminal and improving the detection accuracy of the key points.

[0093] S305. Obtain a to-be-detected target facial image.

[0094] S306. Invoke the target face alignment model to perform a face alignment detection on the target facial image, to obtain a target key point set of the target facial image. The target key point set includes a plurality of target key points and label information of the target key points.

[0095] S307. Determine a feature area of the target facial image according to the target key point set.

[0096] For steps S305 to S307, reference may be made to steps S201 to S203 in the foregoing embodiments of this disclosure, and details are not described herein again.

[0097] In this embodiment of this disclosure, a face alignment detection is performed by using a target face alignment model. Because the target face alignment model is obtained through a hierarchical fitting training, the target face alignment model may accurately perform a key point detection on feature areas, thereby improving the accuracy of a detection result. The target face alignment model occupies relatively little memory and has a fast running speed, thereby improving the efficiency of the face alignment detection.

[0098] Based on the above embodiments of the face detection method, an embodiment of this disclosure further provides a service processing method, implemented by a terminal device, for example, a mobile terminal such as a smartphone or a tablet computer. As shown in FIG. 6, the method may include the following steps S601 to S603.

[0099] S601. Invoke, in a case that a service request requiring a face alignment detection is detected, a camera apparatus of the terminal device to obtain a target facial image of a requester.

[0100] The service request may be automatically generated by the terminal. For example, when the terminal detects that a user turns on a face alignment detection function of the terminal, or a user uses an application program based on the face alignment detection, a service request may be automatically generated.

[0101] Different application programs may correspond to different service requests. For example, a service request corresponding to a smart mapping application program is a smart mapping request, a service request corresponding to a face recognition application program is an identity verification request, and a service request corresponding to a face changing effect application program is a face changing effect processing request. After the service request is detected, the camera apparatus (for example, a camera) of the terminal is invoked to take a photo of the requester to obtain a target facial image of the requester.

[0102] In other embodiments, after the service request is detected, a stored facial image obtained from a local gallery or a cloud photo album is used as the target facial image. Alternatively, when the service request is detected, a facial image displayed on a screen of the terminal is used as the target facial image.

[0103] After receiving the service request, the terminal analyzes the service request to determine a requested service corresponding to the service request. The requested service may include, but is not limited to, any one or more of a face recognition service, an expression recognition service, an age analysis service, a face changing effect service, and a smart mapping service.

[0104] S602. Perform the face alignment detection on the target facial image by using a face detection method to obtain a feature area of the target facial image.

[0105] The face detection method may correspond to the face detection method described in the embodiments shown in FIG. 2 or FIG. 3. When the face alignment detection is performed on the target facial image by using the face detection method, the face alignment detection may be performed by using the target face alignment model mentioned in the foregoing embodiments to obtain the feature area such as the mouth area, the eyebrow area, the eye area, the nose area, and the ear area of the target facial image.

[0106] S603. Process the requested service according to the feature area of the target facial image to respond to the service request.

[0107] After the feature area of the target facial image is determined, the requested service is processed to respond to the service request according to the feature area.

[0108] Specifically, if the requested service is the face changing effect service, after the feature area is determined, information such as the position and size of one or more key points in the feature area is transformed to change a facial shape in the target facial image. For example, the face changing effect service is a service of enlarging the eyes and shrinking the nose. Information such as the position and size of a plurality of key points in the eye area may be transformed to enlarge the eye area, and information such as the position and size of a plurality of key points in the nose area may be transformed to shrink the nose area, to complete the face changing effect service, as shown in FIG. 7.
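
For illustration only, enlarging a feature area by transforming its key points can be sketched as scaling the key points away from the area's centroid; a scale factor below 1 would shrink the nose area analogously. This is an assumed, simplified transform, not the specific effect algorithm of this disclosure:

```python
# Hedged, simplified sketch: enlarge a feature area by scaling its key points
# away from their centroid; scale < 1 would shrink the area, e.g. the nose.
import numpy as np

def scale_area_key_points(points, scale):
    pts = np.asarray(points, dtype=np.float32)   # (n, 2) key point coordinates
    center = pts.mean(axis=0)                    # centroid of the feature area
    return center + scale * (pts - center)       # scale > 1 enlarges the area

enlarged_eye = scale_area_key_points([(100, 80), (120, 78), (140, 82)], 1.3)
```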

[0109] If the requested service is the smart mapping service, after the feature area and a target mapping template are determined, each map in the mapping template is correspondingly added to each feature area to obtain the target facial image processed by the smart mapping. For example, the target mapping template is a mapping template of a dog image. After the feature area is determined, maps such as "dog's ears", "dog's nose", and "dog's mouth" are correspondingly added to the feature areas to complete the smart mapping service, as shown in FIG. 8.

[0110] In this embodiment of this disclosure, after a target facial image is obtained, a face alignment detection is performed by using a face detection method to obtain a feature area of the target facial image, and a requested service is processed to respond to the service request according to the feature area. Because a target face alignment model used in the face detection method is obtained by performing a hierarchical fitting training, a key point detection may be accurately performed on feature areas, thereby improving the accuracy of a service processing result.

[0111] Based on the descriptions of the foregoing face detection method, an embodiment of this disclosure further provides a schematic structural diagram of a face detection apparatus shown in FIG. 9. The face detection apparatus may perform the methods shown in FIG. 2 and FIG. 3. Referring to FIG. 9, the face detection apparatus in an embodiment of this disclosure may include:

an obtaining unit 101, configured to obtain a to-be-detected target facial image;

a training unit 102, configured to perform a hierarchical fitting training by using a face alignment algorithm and a sample data set to obtain a target face alignment model;

a detection unit 103, configured to invoke the target face alignment model to perform a face alignment detection on the target facial image, to obtain a target key point set of the target facial image; and

a determination unit 104, configured to determine a feature area of the target facial image according to the target key point set.



[0112] In an embodiment, the obtaining unit 101 may further be configured to: obtain the sample data set, the sample data set including a plurality of sample facial images and reference key point sets of the sample facial images, the reference key point set of each sample facial image including a plurality of reference key points and label information of the reference key points; and determine a plurality of feature areas used for representing the sample facial images according to the plurality of reference key points and the label information of the reference key points.

[0113] The training unit 102 is specifically configured to: determine training priorities of the feature areas of the sample facial images according to loss weights of the feature areas; and perform a fitting training on the feature areas of the sample facial images by using the face alignment algorithm and according to the training priorities.

[0114] In another embodiment, the feature areas include any one of the following: an eyebrow area, an eye area, a nose area, a mouth area, and an ear area, and the face alignment algorithm includes a machine learning regression algorithm or a CNN algorithm.

[0115] In another embodiment, the obtaining unit 101 is specifically configured to: monitor, in a case that it is detected that a user is using an application program based on the face alignment detection, a service request requiring the face alignment detection; and invoke, in a case that the service request is detected, a camera apparatus of the terminal device to obtain a facial image of a requester as the target facial image.

[0116] In another embodiment, the sample data set includes a plurality of sample facial images, and the training unit 102 is specifically configured to: perform an iterative training according to the face alignment algorithm and the sample data set; select a difficult sample facial image from the sample data set; and optimize a result of the iterative training according to the difficult sample facial image to obtain the target face alignment model.

[0117] In another embodiment, the sample data set further includes reference key point sets of the sample facial images, and the training unit 102 is specifically configured to: perform a pre-processing on the sample data set to obtain a plurality of training data sets, each training data set including a plurality of pre-processed sample facial images; perform the iterative training by using the face alignment algorithm and the plurality of training data sets to obtain a first face alignment model; invoke the first face alignment model to perform the face alignment detection on the sample data set, to obtain detection key point sets of the sample facial images in the sample data set; select the difficult sample facial image from the sample data set according to a difference between the reference key point set and the detection key point set; and optimize the first face alignment model by using the difficult sample facial image to obtain the target face alignment model.

[0118] In another embodiment, the plurality of training data sets include a first training data set, and the first training data set is any one of the plurality of training data sets; and the training unit 102 may be specifically configured to: obtain a first augmentation parameter, and perform an augmentation on the sample data set according to the first augmentation parameter to obtain a first augmented data set, the first augmented data set including a plurality of augmented sample facial images; combine the sample data set and the first augmented data set; and perform a normalization on a combined data set to obtain the first training data set.

[0119] In another embodiment, the plurality of training data sets include a second training data set and a third training data set, and in a case of the iterative training, the second training data set is chosen over the third training data set; and the training unit 102 may be specifically configured to: perform a training by using the face alignment algorithm and the first training data set to obtain an initial face alignment model; set a loss function of the initial face alignment model according to a hierarchical fitting rule; and sequentially choose the second training data set and the third training data set to perform the training on the initial face alignment model according to a principle of reducing the value of the loss function, to obtain the first face alignment model.
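
A minimal sketch of a loss function consistent with such a hierarchical fitting rule follows; the key point index ranges and the weights are illustrative assumptions, not values given by this disclosure.

```python
import numpy as np

# Hypothetical key point index ranges and loss weights per feature area.
AREA_SLICES = {"eye": slice(0, 12), "mouth": slice(12, 30), "nose": slice(30, 39)}
AREA_WEIGHTS = {"eye": 4.0, "mouth": 3.0, "nose": 1.5}

def hierarchical_loss(predicted, reference):
    """predicted/reference: (num_key_points, 2) arrays for one facial image.
    Areas with larger weights dominate the loss and are thus fitted first."""
    total = 0.0
    for area, sl in AREA_SLICES.items():
        total += AREA_WEIGHTS[area] * np.mean((predicted[sl] - reference[sl]) ** 2)
    return total
```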

[0120] In another embodiment, an augmentation parameter corresponding to the second training data set is greater than an augmentation parameter corresponding to the third training data set; and the augmentation parameter includes at least one of the following: a displacement parameter, a rotation angle parameter, and a compression ratio parameter.
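
For concreteness, the three components of an augmentation parameter might be bundled as follows; the numeric values are invented purely to satisfy the stated ordering, with "greater" read as the stronger augmentation.

```python
from dataclasses import dataclass

@dataclass
class AugmentationParameter:
    displacement: float       # pixels
    rotation_angle: float     # degrees
    compression_ratio: float  # fraction of the original size kept

# Invented values: the second training data set is augmented more strongly
# than the third, matching the ordering stated in this embodiment.
second_set_parameter = AugmentationParameter(20.0, 30.0, 0.7)
third_set_parameter = AugmentationParameter(5.0, 10.0, 0.9)
```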

[0121] In another embodiment, the reference key point set of each sample facial image includes a plurality of reference key points and label information of the reference key points, and the training unit 102 is specifically configured to: determine a plurality of feature areas used for representing the sample facial images according to the plurality of reference key points and the label information of the reference key points; set different loss weights for the feature areas according to detection difficulties of the feature areas; and set the hierarchical fitting rule based on at least one feature area and the loss weights of the feature areas, a fitting training being preferentially performed on the feature area with a larger loss weight.

[0122] In another embodiment, the training unit 102 is specifically configured to: choose the second training data set to perform the training on the initial face alignment model to obtain an intermediate face alignment model; and choose the third training data set to perform the training on the intermediate face alignment model to obtain the first face alignment model.

[0123] In another embodiment, the training unit 102 is specifically configured to: perform an augmentation on the difficult sample facial image; perform a normalization on the difficult sample facial image and the augmented difficult sample facial image to obtain a difficult training data set; and optimize the first face alignment model by using the difficult training data set to obtain the target face alignment model.
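
Reusing the augment() sketch above, this optimization step could look like the following; treating fit() as the fine-tuning entry point is an assumption carried over from the toy model, not the disclosed procedure.

```python
import numpy as np

def optimize_with_difficult_samples(first_model, difficult_images):
    # Augment each difficult sample (augment() as sketched earlier).
    augmented = [augment(img) for img in difficult_images]
    # Normalize originals plus augmented copies into the difficult set.
    difficult_set = [img.astype(np.float32) / 255.0
                     for img in list(difficult_images) + augmented]
    first_model.fit(difficult_set)  # assumed fine-tuning entry point
    return first_model              # now the target face alignment model
```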

[0124] In another embodiment, the training unit 102 is specifically configured to: calculate the difference between the reference key point set and the detection key point set for each sample facial image; and select a sample facial image whose difference is greater than a preset threshold from the sample data set as the difficult sample facial image.
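
One reasonable reading of the "difference" here is the mean Euclidean distance between matching key points, compared against the preset threshold; the threshold value below is illustrative only.

```python
import numpy as np

def key_point_difference(reference_set, detection_set):
    """Mean Euclidean distance between matching key points; one plausible
    definition of the per-image difference used in this embodiment."""
    return float(np.linalg.norm(reference_set - detection_set, axis=1).mean())

def select_difficult_samples(images, references, detections, threshold=0.05):
    # The threshold value is illustrative; the disclosure leaves it preset.
    return [img for img, ref, det in zip(images, references, detections)
            if key_point_difference(ref, det) > threshold]
```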

[0125] In another embodiment, the target key point set includes a plurality of target key points and label information of the target key points, and the determination unit is specifically configured to: determine the feature area of the target facial image according to the label information of the target key points.

[0126] In another embodiment, the label information includes feature information, and the determination unit is specifically configured to: determine categories of the target key points according to the feature information of the target key points, use an area formed by the target key points in the same category as a feature area, and use the category as the category of the feature area.
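
A minimal sketch of this grouping, assuming each target key point carries (x, y) coordinates and a category label drawn from its feature information:

```python
from collections import defaultdict

def feature_areas_from_categories(target_key_points):
    """target_key_points: iterable of (x, y, category) triples, where the
    category is taken from the feature information in the label information."""
    groups = defaultdict(list)
    for x, y, category in target_key_points:
        groups[category].append((x, y))
    # The points of each category form one feature area of that category.
    return dict(groups)
```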

[0127] In another embodiment, the label information includes position information, and the determination unit is specifically configured to: determine label positions of the target key points according to the position information, and connect target key points in adjacent positions; and determine, in a case that a shape obtained by the connection is similar to the shape of any one of the facial features, an area formed by the target key points in the adjacent positions as a feature area, and determine the category of the feature area according to the shape.
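
Assuming OpenCV, the shape comparison might use contour matching against per-feature template shapes; the matching method and similarity threshold below are illustrative choices, not the disclosed procedure.

```python
import cv2
import numpy as np

def categorize_area_by_shape(points, templates, similarity_threshold=0.3):
    """points: (n, 2) array of connected, adjacently positioned key points.
    templates: {category: (m, 2) template contour of that facial feature}."""
    contour = points.astype(np.float32).reshape(-1, 1, 2)
    for category, template in templates.items():
        tpl = template.astype(np.float32).reshape(-1, 1, 2)
        score = cv2.matchShapes(contour, tpl, cv2.CONTOURS_MATCH_I1, 0.0)
        if score < similarity_threshold:  # smaller score means more similar
            return category               # the area takes this category
    return None
```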

[0128] In this embodiment of this disclosure, a face alignment detection is performed by using a target face alignment model. Because the target face alignment model is obtained by using a hierarchical fitting training, the target face alignment model may accurately perform a key point detection on feature areas, thereby improving the accuracy of a detection result. The target face alignment model also occupies relatively little memory and runs at a relatively fast speed, thereby improving the efficiency of the face alignment detection.

[0129] Based on the descriptions of the foregoing service processing method, an embodiment of this disclosure further provides a service processing apparatus, a schematic structural diagram of which is shown in FIG. 10. The service processing apparatus may perform the method shown in FIG. 6. Referring to FIG. 10, the service processing apparatus in an embodiment of this disclosure may include:

an obtaining unit 201, configured to invoke, in a case that a service request requiring a face alignment detection is detected, a camera apparatus to obtain a target facial image of a requester;

a detection unit 202, configured to perform the face alignment detection on the target facial image by using the face detection method shown in FIG. 2 or FIG. 3, to obtain a feature area of the target facial image; and

a processing unit 203, configured to process a requested service according to the feature area of the target facial image to respond to the service request.



[0130] In this embodiment of this disclosure, after a target facial image is obtained, a face alignment detection is performed by using a face detection method to obtain a feature area of the target facial image, and a requested service is processed to respond to the service request according to the feature area. Because a target face alignment model used in the face detection method is obtained by performing a hierarchical fitting training, a key point detection may be accurately performed on feature areas, thereby improving the accuracy of a service processing result.

[0131] Based on the descriptions of the foregoing method embodiments and apparatus embodiments, an embodiment of this disclosure further provides a terminal. Referring to FIG. 11, the internal structure of the terminal includes at least a processor 301, an input device 302, an output device 303, and a memory 304. The processor 301, the input device 302, the output device 303, and the memory 304 of the terminal may be connected by a bus or in other manners, for example, by a bus 305 as shown in FIG. 11 in this embodiment of this disclosure. The memory 304 may be configured to store a computer program. The computer program includes a first program instruction and/or a second program instruction. The processor 301 is configured to execute the first program instruction stored in the memory 304 to implement the face detection method shown in FIG. 2 or FIG. 3. In an embodiment, the processor 301 may further be configured to execute the second program instruction stored in the memory 304 to implement the service processing method shown in FIG. 6.

[0132] In an embodiment, the processor 301 may be a central processing unit (CPU). Alternatively, the processor may be another general-purpose processor, for example, a microprocessor, or any conventional processor. The memory 304 may include a read-only memory (ROM) and a random access memory (RAM), and provides instructions and data to the processor 301. Therefore, the processor 301 and the memory 304 are not limited herein.

[0133] In the embodiments of this disclosure, a non-transitory computer storage medium is further provided. The computer storage medium is a memory device in the terminal and is configured to store programs and data. The computer storage medium herein may include an internal storage medium of the terminal and may also include an extended storage medium supported by the terminal. The computer storage medium provides a storage space, storing an operating system of the terminal. In addition, the storage space further stores computer program instructions suitable for being loaded and executed by the processor 301, and the instructions may be one or more computer programs (including program code). The computer storage medium herein may be a high-speed RAM, or a non-volatile memory, for example, at least one magnetic disk memory. Optionally, the computer storage medium may alternatively be at least one computer storage medium located away from the foregoing processor.

[0134] In an embodiment, the processor 301 may load and execute the first computer program instruction stored in the computer storage medium to implement the corresponding steps of the method in the foregoing face detection embodiments. In a specific implementation, the first computer program instruction of the computer storage medium is loaded by the processor 301 to perform the following steps:

obtaining a to-be-detected target facial image;

performing a hierarchical fitting training by using a face alignment algorithm and a sample data set to obtain a target face alignment model;

invoking the target face alignment model to perform a face alignment detection on the target facial image, to obtain a target key point set of the target facial image; and

determining a feature area of the target facial image according to the target key point set.



[0135] In another embodiment, the processor 301 may load and execute the second computer program instruction stored in the computer storage medium to implement the corresponding steps of the method in the foregoing service processing embodiment. In a specific implementation, the second computer program instruction of the computer storage medium is loaded by the processor 301 to perform the following steps:

invoking, in a case that a service request requiring a face alignment detection is detected, a camera apparatus to obtain a target facial image of a requester;

performing the face alignment detection on the target facial image by using the face detection method in FIG. 2 or FIG. 3, to obtain a feature area of the target facial image; and

processing a requested service according to the feature area of the target facial image to respond to the service request.



[0136] FIG. 12 is a schematic structural diagram of an implementation environment according to an embodiment of this disclosure. As shown in FIG. 12, a face detection system 100 includes a user 101 and a terminal device 102. The terminal device 102 includes a camera apparatus 1021, an application program 1022, a face detection apparatus 1023, and an operation button 1024. The application program 1022 has a requirement of a face alignment detection, and is, for example, a facial expression recognition application program, a face changing effect application program, a smart mapping application program or an identity verification application program.

[0137] According to this embodiment of this disclosure, when the terminal device 102 detects that the user 101 is using the application program 1022 based on the face alignment detection, the terminal device 102 monitors, as shown by an arrow 1031, whether the user 101 sends a service request that requires the face alignment detection. When the service request is detected, the terminal device 102 invokes the camera apparatus 1021 to obtain a facial image of a requester (for example, the user 101, or any user other than the user 101) as the target facial image, as shown by an arrow 1032.

[0138] The face detection apparatus 1023 performs a hierarchical fitting training by using the face alignment algorithm and a sample data set, to obtain the target face alignment model; invokes the target face alignment model to perform the face alignment detection on the target facial image, to obtain a target key point set of the target facial image; and determines a feature area of the target facial image according to the target key point set, for example, the feature areas shown in FIG. 1b.
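
Composing the earlier sketches, the runtime flow of FIG. 12 reduces to a few calls; every name below is a placeholder carried over from those sketches (with interfaces adapted schematically), not an API of this disclosure.

```python
# Schematic composition of the earlier sketches (placeholder names only).
frame = capture_target_facial_image()            # camera apparatus 1021
target_key_point_set = target_model.detect(frame)  # face alignment detection
feature_areas = feature_areas_from_categories(target_key_point_set)
# The requested service is then processed according to feature_areas.
```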

[0139] In this embodiment of this disclosure, after a target facial image is obtained, a face alignment detection is performed by using a face detection method to obtain a feature area of the target facial image, and a requested service is processed to respond to the service request according to the feature area. Because a target face alignment model used in the face detection method is obtained by performing a hierarchical fitting training, a key point detection may be accurately performed on feature areas, thereby improving the accuracy of a service processing result.

[0140] For a specific working process of the terminal and units described above, reference may be made to the related descriptions in the foregoing embodiments, and details are not described herein again.

[0141] A person of ordinary skill in the art may understand that all or some of the processes of the methods in the foregoing embodiments may be implemented by a computer program instructing relevant hardware. The program may be stored in a non-transitory computer-readable storage medium. When the program is executed, the processes of the foregoing method embodiments are performed. The storage medium may be a magnetic disk, an optical disc, a ROM, a RAM, or the like.

[0142] The foregoing descriptions are merely some embodiments of this disclosure, and are not intended to limit the scope of this disclosure. A person skilled in the art may make variations to all or some processes of the foregoing embodiments, and equivalent modifications made according to the claims of this disclosure shall still fall within the scope of this disclosure.


Claims

1. A face detection method, performed by a terminal device, comprising:

obtaining a to-be-detected target facial image;

performing a hierarchical fitting training by using a face alignment algorithm and a sample data set to obtain a target face alignment model;

invoking the target face alignment model to perform a face alignment detection on the target facial image, to obtain a target key point set of the target facial image; and

determining a feature area of the target facial image according to the target key point set.


 
2. The method according to claim 1, wherein before the obtaining a to-be-detected target facial image, the method further comprises:

obtaining the sample data set, the sample data set comprising a plurality of sample facial images and reference key point sets of the sample facial images, the reference key point set of each sample facial image comprising a plurality of reference key points and label information of the reference key points; and

determining a plurality of feature areas used for representing the sample facial images according to the plurality of reference key points and the label information of the reference key points; and

the performing a hierarchical fitting training by using a face alignment algorithm and a sample data set to obtain a target face alignment model comprises:

determining training priorities of the feature areas of the sample facial images according to loss weights of the feature areas; and

performing a fitting training on the feature areas of the sample facial images by using the face alignment algorithm and according to the training priorities.


 
3. The method according to claim 2, wherein the feature area comprises any one of the following: an eyebrow area, an eye area, a nose area, a mouth area, and an ear area, and the face alignment algorithm comprises a machine learning regression algorithm or a convolutional neural network (CNN) algorithm.
 
4. The method according to claim 1, wherein the obtaining a to-be-detected target facial image comprises:

monitoring, in a case that it is detected that a user is using an application program based on the face alignment detection, for a service request requiring the face alignment detection; and

invoking, in a case that the service request is detected, a camera apparatus of the terminal device to obtain a facial image of a requester as the target facial image.


 
5. The method according to claim 1, wherein the sample data set comprises a plurality of sample facial images, and the performing a hierarchical fitting training by using a face alignment algorithm and a sample data set to obtain a target face alignment model comprises:

performing an iterative training according to the face alignment algorithm and the sample data set;

selecting a difficult sample facial image from the sample data set; and

optimizing a result of the iterative training according to the difficult sample facial image to obtain the target face alignment model.


 
6. The method according to claim 5, wherein the performing an iterative training according to the face alignment algorithm and the sample data set comprises:

performing a pre-processing on the sample data set to obtain a plurality of training data sets, each training data set comprising a plurality of pre-processed sample facial images; and

performing the iterative training by using the face alignment algorithm and the plurality of training data sets to obtain a first face alignment model;

the sample data set further comprises reference key point sets of the sample facial images, and the selecting a difficult sample facial image from the sample data set comprises:

invoking the first face alignment model to perform the face alignment detection on the sample data set, to obtain detection key point sets of the sample facial images in the sample data set; and

selecting the difficult sample facial image from the sample data set according to a difference between the reference key point set and the detection key point set; and

the optimizing a result of the iterative training according to the difficult sample facial image to obtain the target face alignment model comprises:
optimizing the first face alignment model by using the difficult sample facial image to obtain the target face alignment model.


 
7. The method according to claim 6, wherein the plurality of training data sets comprise a first training data set, and the first training data set is any one of the plurality of training data sets; and
the performing a pre-processing on the sample data set to obtain a plurality of training data sets comprises:

obtaining a first augmentation parameter, and performing an augmentation on the sample data set according to the first augmentation parameter to obtain a first augmented data set, the first augmented data set comprising a plurality of augmented sample facial images;

combining the sample data set and the first augmented data set; and

performing a normalization on a combined data set to obtain the first training data set.


 
8. The method according to claim 7, wherein the plurality of training data sets comprise a second training data set and a third training data set, and in a case of the iterative training, the second training data set is chosen over the third training data set; and
the performing the iterative training by using the face alignment algorithm and the plurality of training data sets to obtain a first face alignment model comprises:

performing a training by using the face alignment algorithm and the first training data set to obtain an initial face alignment model;

setting a loss function of the initial face alignment model according to a hierarchical fitting rule; and

sequentially choosing the second training data set and the third training data set to perform the training on the initial face alignment model according to a principle of reducing the value of the loss function, to obtain the first face alignment model.


 
9. The method according to claim 8, wherein an augmentation parameter corresponding to the second training data set is greater than an augmentation parameter corresponding to the third training data set; and
the augmentation parameter comprises at least one of the following: a displacement parameter, a rotation angle parameter, and a compression ratio parameter.
 
10. The method according to claim 8, wherein the reference key point set of each sample facial image comprises a plurality of reference key points and label information of the reference key points; and
the method further comprises:

determining a plurality of feature areas used for representing the sample facial images according to the plurality of reference key points and the label information of the reference key points;

setting different loss weights for the feature areas according to detection difficulties of the feature areas; and

setting the hierarchical fitting rule based on at least one feature area and the loss weights of the feature areas, a fitting training being preferentially performed on the feature area with a larger loss weight.


 
11. The method according to claim 8, wherein the sequentially choosing the second training data set and the third training data set to perform the training on the initial face alignment model according to a principle of reducing the value of the loss function, to obtain the first face alignment model comprises:

choosing the second training data set to perform the training on the initial face alignment model to obtain an intermediate face alignment model; and

choosing the third training data set to perform the training on the intermediate face alignment model to obtain the first face alignment model.


 
12. The method according to claim 6, wherein the optimizing the first face alignment model by using the difficult sample facial image to obtain the target face alignment model comprises:

performing an augmentation on the difficult sample facial image;

performing a normalization on the difficult sample facial image and the augmented difficult sample facial image to obtain a difficult training data set; and

optimizing the first face alignment model by using the difficult training data set to obtain the target face alignment model.


 
13. The method according to any one of claims 6 to 12, wherein the selecting the difficult sample facial image from the sample data set according to a difference between the reference key point set and the detection key point set comprises:

calculating the difference between the reference key point set and the detection key point set for each sample facial image; and

selecting a sample facial image whose difference is greater than a preset threshold from the sample data set as the difficult sample facial image.


 
14. The method according to claim 1, wherein the target key point set comprises a plurality of target key points and label information of the target key points; and
the determining a feature area of the target facial image according to the target key point set comprises:
determining the feature area of the target facial image according to the label information of the target key points.
 
15. The method according to claim 14, wherein the label information comprises feature information; and
the determining the feature area of the target facial image according to the label information of the target key points comprises:
determining categories of the target key points according to the feature information of the target key points, using an area formed by the target key points in the same category as a feature area, and using the category as the category of the feature area.
 
16. The method according to claim 14, wherein the label information comprises position information; and
the determining the feature area of the target facial image according to the label information of the target key points comprises:

determining label positions of the target key points according to the position information, and connecting target key points in adjacent positions; and

determining, in a case that a shape obtained by connection is similar to the shape of any one of the facial features, an area formed by the target key points in the adjacent positions as a feature area, and determining the category of the feature area according to the shape.


 
17. A service processing method, performed by a terminal device, the method comprising:

invoking, in a case that a service request requiring a face alignment detection is detected, a camera apparatus of the terminal device to obtain a target facial image of a requester;

performing the face alignment detection on the target facial image by using the face detection method according to any one of claims 1 to 16 to obtain a feature area of the target facial image; and

processing a requested service according to the feature area of the target facial image to respond to the service request.


 
18. A face detection apparatus, comprising:

an obtaining unit, configured to obtain a to-be-detected target facial image;

a training unit, configured to perform a hierarchical fitting training by using a face alignment algorithm and a sample data set to obtain a target face alignment model;

a detection unit, configured to invoke the target face alignment model to perform a face alignment detection on the target facial image, to obtain a target key point set of the target facial image; and

a determination unit, configured to determine a feature area of the target facial image according to the target key point set.


 
19. A terminal device, comprising a processor, an input device, an output device, and a memory, the processor, the input device, the output device, and the memory being connected to each other, the memory being configured to store a computer program, the computer program comprising a first program instruction, the processor being configured to invoke the first program instruction to perform the face detection method according to any one of claims 1 to 16; or the computer program comprising a second program instruction, the processor being configured to invoke the second program instruction to perform the service processing method according to claim 17.
 
20. A computer storage medium, the computer storage medium storing a first computer program instruction, the first computer program instruction being suitable to be loaded by a processor to perform the face detection method according to any one of claims 1 to 16; or the computer storage medium storing a second computer program instruction, the second computer program instruction being suitable to be loaded by a processor to perform the service processing method according to claim 17.
 




Drawing
Search report