(19)
(11)EP 3 598 736 B1

(12)EUROPEAN PATENT SPECIFICATION

(45)Mention of the grant of the patent:
13.09.2023 Bulletin 2023/37

(21)Application number: 19184943.9

(22)Date of filing:  08.07.2019
(51)International Patent Classification (IPC): 
H04N 23/60(2023.01)
H04N 23/611(2023.01)
H04N 23/80(2023.01)
G06V 10/82(2022.01)
H04N 23/61(2023.01)
H04N 23/63(2023.01)
G06V 10/764(2022.01)
G06V 20/10(2022.01)
(52)Cooperative Patent Classification (CPC):
G06V 20/10; G06V 10/82; G06V 10/764; H04N 23/61; H04N 23/64; H04N 23/611; H04N 23/80; H04N 23/632; H04N 23/635; H04N 23/63

(54)

METHOD AND APPARATUS FOR PROCESSING IMAGE

VERFAHREN UND VORRICHTUNG ZUR VERARBEITUNG EINES BILDES

PROCÉDÉ ET APPAREIL POUR LE TRAITEMENT D'UNE IMAGE


(84)Designated Contracting States:
AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

(30)Priority: 16.07.2018 CN 201810777660

(43)Date of publication of application:
22.01.2020 Bulletin 2020/04

(73)Proprietor: GUANGDONG OPPO MOBILE TELECOMMUNICATIONS CORP., LTD.
Wusha, Chang'an Dongguan, Guangdong 523860 (CN)

(72)Inventor:
  • LIU, Yaoyong
    Dongguan, Guangdong 523860 (CN)

(74)Representative: Manitz Finsterwald Patent- und Rechtsanwaltspartnerschaft mbB 
Martin-Greif-Strasse 1
80336 München (DE)


(56)References cited:
EP-A1- 2 207 341
CN-A- 105 991 925
CN-A- 107 818 313
US-A1- 2015 010 239
US-A1- 2017 374 246
EP-A1- 3 654 625
CN-A- 107 257 439
US-A1- 2010 329 552
US-A1- 2015 036 921
  
  • BAPPY JAWADUL HASAN ET AL: "Inter-dependent CNNs for joint scene and object recognition", 2016 23RD INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), IEEE, 4 December 2016 (2016-12-04), pages 3386-3391, XP033086101, DOI: 10.1109/ICPR.2016.7900157 [retrieved on 2017-04-13]
  
Note: Within nine months from the publication of the mention of the grant of the European patent, any person may give notice to the European Patent Office of opposition to the European patent granted. Notice of opposition shall be filed in a written reasoned statement. It shall not be deemed to have been filed until the opposition fee has been paid. (Art. 99(1) European Patent Convention).


Description

TECHNICAL FIELD



[0001] The present disclosure relates to the field of computer applications, and in particular to a method and an apparatus for processing an image.

BACKGROUND



[0002] Nowadays, almost all smart mobile terminals are equipped with a camera. However, users without professional photography skills generally cannot take photos of high ornamental value, due to improper composition or camera settings.

[0003] A method for processing an image, executed by a mobile terminal, in accordance with the preamble of claim 1 is known from US 2010/329552 A1.

[0004] US 2017/374246 A1 discloses an image capturing apparatus and a photo composition method thereof.

[0005] US 2015/010239 A1 discloses a photographing method.

[0006] CN 105 991 925 A discloses a scene composition indicating method.

[0007] US 2015/036921 A1 discloses an image composition evaluating apparatus.

SUMMARY



[0008] The invention is defined in the independent claims.

[0009] The aspects of the present disclosure provide a method and an apparatus for processing an image, which can improve an ornamental value of an image.

[0010] A first aspect of the disclosure provides a method for processing an image, which includes operations as follows.

[0011] A preview image to be processed is acquired.

[0012] Scene information is identified from the preview image.

[0013] A composition mode corresponding to the scene information is determined.

[0014] The preview image is composed according to the composition mode.

[0015] A second aspect of the disclosure provides an apparatus for processing an image. The apparatus includes an acquisition module, an identification module, a determination module, and a composition module.

[0016] The acquisition module may be configured to acquire a preview image to be processed.

[0017] The identification module may be configured to identify scene information from the preview image.

[0018] The determination module may be configured to determine a composition mode corresponding to the scene information.

[0019] The composition module may be configured to compose the preview image according to the composition mode.

[0020] According to the aspects of the disclosure, a preview image to be processed is acquired; scene information is identified from the preview image; a composition mode corresponding to the scene information is determined; and the preview image is composed according to the composition mode. Thus, the composition mode for the preview image can be determined based on the scene information, and therefore the ornamental value of the composed image can be improved.

BRIEF DESCRIPTION OF DRAWINGS



[0021] In order to describe technical solutions in the embodiments of the present disclosure or in the related technology more clearly, the drawings to be used in descriptions about the embodiments or the related technology will be briefly introduced below. It is apparent that the drawings merely illustrate some of the embodiments of the present disclosure. Those of ordinary skill in the art may further obtain other drawings according to these drawings without creative work.

FIG. 1 illustrates a flowchart of a method for processing an image according to an embodiment.

FIG. 2 illustrates an architecture diagram of a neural network according to an embodiment.

FIG. 3 illustrates a diagram of categories of shooting scenes according to an embodiment.

FIG. 4 illustrates a flowchart of a method for identifying scene information from a preview image based on a neural network according to an embodiment.

FIG. 5 illustrates an architecture diagram of a neural network according to another embodiment.

FIG. 6 illustrates a flowchart of a method for identifying scene information from a preview image based on a neural network according to another embodiment.

FIG. 7 illustrates a border diagram of a foreground object in a preview image according to one embodiment.

FIG. 8 illustrates a flowchart of a method of determining a composition mode for a preview image based on scene information according to an embodiment.

FIG. 9 illustrates a flowchart of a method of determining a composition mode for a preview image based on scene information according to another embodiment.

FIG. 10 illustrates a flowchart of a method of composing a preview image based on scene information and a composition mode according to an embodiment.

FIG. 11 illustrates a block diagram of an apparatus for processing an image according to an embodiment.

FIG. 12 illustrates an internal structure diagram of a mobile terminal according to an embodiment.

FIG. 13 illustrates an internal structure diagram of a server according to an embodiment.

FIG. 14 illustrates a diagram of an image processing circuit according to an embodiment.


DETAILED DESCRIPTION



[0022] In order to make the objectives, technical solutions and advantages of the present disclosure clearer, the present disclosure will be further elaborated below in conjunction with the drawings and the embodiments. It will be appreciated that specific embodiments described here are only used to explain the present disclosure, and not intended to limit the present disclosure.

[0023] FIG. 1 illustrates a flowchart of a method for processing an image according to an embodiment. As illustrated in FIG. 1, the method for processing an image includes operations illustrated in blocks 102 to 108.

[0024] At block 102, a preview image to be processed is acquired.

[0025] In the present embodiment, the preview image to be processed may include multiple consecutive frames of preview images. The multiple consecutive frames of preview images may be two or more consecutive frames of preview images. The multiple consecutive frames of preview images may be multiple frames of preview images captured by a camera of a computer device within a preset time. For example, if the camera of the computer device captures three frames of preview images within 0.1 seconds, the three frames of preview images may be used as the multiple consecutive frames of preview images.

[0026] In an embodiment, the computer device is further provided with multiple preview windows, each of which presents a respective frame of preview image.

[0027] At block 104, scene information is identified from the preview image.

[0028] In the present embodiment, scene information is identified from the preview image based on a neural network. It will be appreciated that the neural network may be a convolutional neural network (CNN). CNN is a neural network model developed for image classification and recognition based on a traditional multi-layer neural network. Compared with the traditional multi-layer neural network, the CNN introduces a convolution algorithm and a pooling algorithm. The convolution algorithm is a mathematical algorithm for performing a weighted superposition on data in a local region. The pooling algorithm is a mathematical algorithm for sampling data in a local region.
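By way of a non-limiting illustration, the convolution algorithm and the pooling algorithm described above may be sketched as follows. The sketch is a minimal example in plain Python and does not represent the claimed neural network. The function names, the 2x2 kernel weights, the sample image values, and the choice of maximum pooling as the sampling rule are assumptions made for illustration only.

```python
def convolve2d(image, kernel):
    """Weighted superposition over each local region (no padding, stride 1).

    As is usual in CNNs, the kernel is applied without flipping.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def max_pool2d(feature, size=2):
    """Sample each non-overlapping size x size region by keeping its maximum.

    Maximum pooling is an assumed sampling rule; the text only requires
    that data in a local region be sampled.
    """
    return [
        [
            max(feature[r + i][c + j]
                for i in range(size) for j in range(size))
            for c in range(0, len(feature[0]) - size + 1, size)
        ]
        for r in range(0, len(feature) - size + 1, size)
    ]


# Hypothetical 5x5 single-channel image and 2x2 kernel weights.
image = [
    [1, 2, 3, 0, 1],
    [0, 1, 2, 3, 1],
    [1, 0, 1, 2, 2],
    [2, 1, 0, 1, 3],
    [1, 2, 1, 0, 1],
]
kernel = [[1, 0], [0, 1]]

features = convolve2d(image, kernel)   # 4x4 feature map
pooled = max_pool2d(features, size=2)  # 2x2 map with fewer dimensions
```

In this sketch the pooling step reduces the 4x4 feature map to a 2x2 map, which corresponds to the reduction in the number of dimensions attributed to the pooling layer.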

[0029] Specifically, a CNN model consists of alternating convolution layers and pooling layers. As illustrated in FIG. 2, a preview image is input at an input layer 210. Image feature extraction is performed at a convolution layer 220 on each local region of the image input at the input layer, and the image features extracted at the convolution layer are sampled at a pooling layer 230 to reduce the number of dimensions. The sampled image features are then connected together at a number of fully connected layers 240, and the final extracted features are output at a last hidden layer 250. Scene information is identified based on the final extracted features. The scene information includes background category information and foreground object category information. Herein, the background category information may include information related to a category of a background region of the preview image, which indicates which category the background region of the preview image belongs to. The background region may be classified into the following categories: landscape, beach, snow, blue sky, green space, night scene, darkness, backlight, sunrise/sunset, indoor, fireworks, spotlights, etc. The foreground object category information includes information related to a category of a foreground object of the preview image, which indicates which category the foreground object of the preview image belongs to. The foreground objects may be portraits, babies, cats, dogs, foods, etc.

[0030] In an embodiment, a softmax analyzer is configured after the last hidden layer 250 of the CNN, and the final extracted features are analyzed via the softmax analyzer to obtain the probability of a category corresponding to a background in the image and the probability of a category corresponding to a foreground object.
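As a non-limiting illustration, the analysis performed by the softmax analyzer may be sketched as follows. The sketch assumes that the last hidden layer outputs one raw score per category; the score values, the three-category subset, and the function name `softmax` are hypothetical and are used only to show how scores may be converted into category probabilities.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the maximum avoids overflow in exp()
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical subset of the background categories and assumed raw scores.
background_categories = ["landscape", "beach", "snow"]
background_scores = [2.0, 1.0, 0.1]

probabilities = softmax(background_scores)
best_category = background_categories[probabilities.index(max(probabilities))]
```

The same function may be applied to a separate score vector for the foreground object categories to obtain the probability of each foreground object category.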

[0031] Before identifying the background category and the foreground object of the preview image using a neural network, the neural network needs to be trained. The training process may include the following operations.

[0032] First, a training image including at least one background training object (including landscape, beach, snow, blue sky, green space, night scene, darkness, backlight, sunrise/sunset, indoor, fireworks, spotlights, etc.) and at least one foreground training object (including main objects: portraits, babies, cats, dogs, foods, etc.) is input into the neural network. The neural network performs feature extraction according to the background training object and the foreground training object. For example, features may be extracted using scale-invariant feature transform (SIFT) features and histogram of oriented gradient (HOG) features. The background training object is then detected according to an object detection algorithm such as a single shot multibox detector (SSD) or a visual geometry group (VGG) to obtain a first prediction confidence. The foreground training object is detected according to the above object detection algorithm to obtain a second prediction confidence. The first prediction confidence is a degree of confidence that a pixel of a background region in the training image predicted using the neural network belongs to the background training object. The second prediction confidence is a degree of confidence that a pixel of a foreground region in the training image predicted using the neural network belongs to the foreground training object. The background training object and the foreground training object may be pre-labeled in the training image to obtain a first real confidence and a second real confidence. The first real confidence represents a degree of confidence that the pixel pre-labeled in the training image belongs to the background training object. The second real confidence represents a degree of confidence that the pixel pre-labeled in the training image belongs to the foreground training object. 
For each pixel in the image, the real confidence may be expressed as 1 (or positive) to indicate that the pixel belongs to a training object, or 0 (or negative) to indicate that the pixel does not belong to the training object.

[0033] Secondly, a difference between the first prediction confidence and the first real confidence is calculated to obtain a first loss function, and a difference between the second prediction confidence and the second real confidence is calculated to obtain a second loss function. Each of the first loss function and the second loss function may be in a form of a logarithmic function, a hyperbolic function, an absolute value function, and the like.

[0034] Finally, the first loss function and the second loss function are weighted and summed to obtain an objective loss function, and the parameters of the neural network are adjusted according to the objective loss function to realize the training on the neural network.
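The calculation of the objective loss function described above may be illustrated by the following minimal sketch. It assumes a logarithmic form (binary cross-entropy averaged over pixels) for both loss functions, which is only one of the forms mentioned. The confidence values, the weights 0.4 and 0.6, and the function names are hypothetical, and the step of adjusting the network parameters is not shown.

```python
import math


def log_loss(predicted, real, eps=1e-12):
    """Mean logarithmic loss between predicted confidences and 0/1 labels."""
    total = 0.0
    for p, y in zip(predicted, real):
        p = min(max(p, eps), 1.0 - eps)  # keep log() away from 0
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(predicted)


def objective_loss(first_loss, second_loss, w1, w2):
    """Weighted sum of the first and second loss functions."""
    return w1 * first_loss + w2 * second_loss


# Hypothetical per-pixel confidences for a four-pixel training image.
background_predicted = [0.9, 0.8, 0.2, 0.1]  # first prediction confidence
background_real = [1, 1, 0, 0]               # first real confidence
foreground_predicted = [0.1, 0.3, 0.7, 0.9]  # second prediction confidence
foreground_real = [0, 0, 1, 1]               # second real confidence

first_loss = log_loss(background_predicted, background_real)
second_loss = log_loss(foreground_predicted, foreground_real)
total_loss = objective_loss(first_loss, second_loss, w1=0.4, w2=0.6)
```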

[0035] In an embodiment, as illustrated in FIG. 3, the shooting scene of the training image may include a category of the background region, one or more foreground objects, and others. The background region may be classified into the following categories: landscape, beach, snow, blue sky, green space, night scene, darkness, backlight, sunrise/sunset, indoor, fireworks, spotlights, etc. The foreground objects may be portraits, babies, cats, dogs, foods, etc. Others may be text documents, macros, etc.

[0036] At block 106, a composition mode corresponding to the scene information is determined.

[0037] In an embodiment, the scene information includes background category information and foreground object category information. The background category information includes landscape, beach, snow, blue sky, green space, night scene, darkness, backlight, sunrise/sunset, indoor, fireworks, spotlights, etc. The foreground object category information includes portraits, babies, cats, dogs, foods, etc.

[0038] In an embodiment, the composition mode includes a nine-square lattice composition, a cross-shaped composition, a triangular composition, a diagonal composition, etc.

[0039] Specifically, at least one composition mode for multiple pieces of scene information is pre-stored in the computer device, and each piece of scene information corresponds to a respective composition mode. After determining the scene information of the preview image, the computer device calls the composition mode corresponding to the scene information. For example, when the scene information is landscape plus portrait (i.e., the background category information is landscape, and the foreground object category information is a portrait), the computer device may call the nine-square lattice composition mode to make the portrait at a golden section position in the preview image. When the scene information is landscape plus food (i.e., the background category information is landscape, and the foreground object category information is food), the computer device may call the triangular composition mode to highlight the foreground object, i.e., the food.

[0040] In an embodiment, for a same piece of scene information, multiple composition modes may be provided. For example, the scene information of landscape plus portrait may correspond to the nine-square lattice composition mode, and may also correspond to the triangular composition mode. Specifically, the final composition mode may be selected based on the foreground object category information. For example, in the scene information of landscape plus portrait, if there are a large number (three or more) of portraits, the nine-square lattice composition mode may be selected to make each portrait at a display position required by the nine-square lattice composition mode; and if there is only one portrait, the triangular composition mode may be selected to highlight the portrait.
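The selection described in the two preceding paragraphs may be sketched, purely by way of example, as a lookup in a pre-stored table. The table `PRESET_MODES`, the function `select_composition_mode`, and its parameters are hypothetical names. The text does not specify the case of exactly two portraits, so the sketch falls back to the first stored mode in that case as an assumption.

```python
# Hypothetical pre-stored table: scene information -> composition modes.
PRESET_MODES = {
    ("landscape", "portrait"): ["nine-square lattice", "triangular"],
    ("landscape", "food"): ["triangular"],
}


def select_composition_mode(background, foreground, foreground_count=1):
    """Return a composition mode for the scene, or None if none is stored."""
    modes = PRESET_MODES.get((background, foreground), [])
    if not modes:
        return None
    if (background, foreground) == ("landscape", "portrait"):
        if foreground_count >= 3:
            # A large number of portraits: place each on the lattice.
            return "nine-square lattice"
        if foreground_count == 1:
            # A single portrait: highlight it.
            return "triangular"
    # Assumed fallback for cases the text leaves open.
    return modes[0]


mode_group = select_composition_mode("landscape", "portrait", foreground_count=4)
mode_single = select_composition_mode("landscape", "portrait", foreground_count=1)
mode_food = select_composition_mode("landscape", "food")
```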

[0041] At block 108, the preview image is composed according to the composition mode.

[0042] In the present embodiment, different pieces of scene information correspond to the same or different composition modes. Different compositions of the preview image may be implemented according to different composition modes. For example, the composition mode includes a nine-square lattice composition, a cross-shaped composition, a triangular composition, a diagonal composition, etc. The nine-square lattice composition mode is a form of golden section. That is, the preview image is equally divided into nine blocks, and a main object may be arranged on any one of four corners of the center block. The cross-shaped composition is implemented by dividing the preview image into four blocks with a horizontal line and a vertical line passing through a center of the preview image. A main object may be arranged at an intersection of the horizontal and vertical lines, that is, at the center of the preview image. The triangular composition is implemented by arranging a main object at a center of the preview image and placing the main object into a triangle block. The diagonal composition is implemented by arranging the main object (for example, bridge, character, car, etc.) on a diagonal of the preview image.
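The candidate positions implied by the nine-square lattice composition and the cross-shaped composition may be illustrated by the following minimal sketch. Pixel coordinates with the origin at the upper left corner are assumed. The function names and the rule of choosing the lattice corner nearest to the current position of the main object are hypothetical and are not taken from the disclosure.

```python
def nine_square_points(width, height):
    """Four corners of the center block when the image is split into nine."""
    xs = (width / 3, 2 * width / 3)
    ys = (height / 3, 2 * height / 3)
    return [(x, y) for y in ys for x in xs]


def cross_point(width, height):
    """Intersection of the horizontal and vertical center lines."""
    return (width / 2, height / 2)


def nearest_point(position, candidates):
    """Assumed rule: pick the candidate closest to the current position."""
    return min(
        candidates,
        key=lambda c: (c[0] - position[0]) ** 2 + (c[1] - position[1]) ** 2,
    )


# Hypothetical 1200x900 preview image with a main object at (350, 500).
points = nine_square_points(1200, 900)
target = nearest_point((350, 500), points)
center = cross_point(1200, 900)
```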

[0043] Different composition modes corresponding to different pieces of scene information are pre-stored in the computer device, and the preview image is composed based on the detected scene information and the composition mode corresponding to the detected scene information.

[0044] According to the above image processing method, a preview image to be processed is acquired; scene information is identified from the preview image; a composition mode corresponding to the scene information is determined; and the preview image is composed according to the composition mode. In such a manner, the scene information of the preview image can be automatically identified, and each piece of scene information can be matched automatically with one or more respective composition modes, so that a subsequent shooting adjustment prompt for the preview image is provided based on scene information and the corresponding composition mode, and the processed image has a higher ornamental value.

[0045] In an embodiment, the image processing method further includes that: the composed preview images are presented respectively using multiple preview windows. Specifically, multiple preview windows are provided in a screen of the computer device, and each of the multiple preview windows presents a respective frame of preview image. In an embodiment, the preview images adopt different composition modes, and each frame of preview image is presented in a preview window after the composition process. A user can compare the composition effects of the preview images based on the image presented in each preview window, and store one frame of preview image according to the comparison result.

[0046] In an embodiment, the scene information includes background category information and foreground object category information. As illustrated in FIG. 4, the operation of identifying scene information from the preview image includes actions illustrated in blocks 402 to 410.

[0047] At block 402, feature extraction is performed on the preview image using a basic network in a neural network to obtain feature data.

[0048] At block 404, the feature data is input into a classification network in the neural network to perform classification detection on a background of the preview image, and a first confidence map is output. Each pixel in the first confidence map represents a degree of confidence that the pixel of the preview image belongs to a background of the preview image.

[0049] At block 406, the feature data is input into an object detection network in the neural network to detect a foreground object from the preview image, and a second confidence map is output. Each pixel in the second confidence map represents a degree of confidence that the pixel of the preview image belongs to a foreground object.

[0050] At block 408, weighting is performed on the first confidence map and the second confidence map to obtain a final confidence map of the preview image.

[0051] At block 410, background category information and foreground object category information of the preview image are determined according to the final confidence map.

[0052] In the present embodiment, as illustrated in FIG. 5, the neural network includes a basic network 510, a classification network 520 and an object detection network 530. The basic network 510 extracts feature data of the preview image and inputs the feature data into the classification network 520 and the object detection network 530 respectively. The classification network 520 performs classification detection on a background of the preview image to obtain a first confidence map. The object detection network 530 detects a foreground object of the preview image to obtain a second confidence map. Weighting is performed on the first confidence map and the second confidence map to obtain a final confidence map of the preview image. Background category information and foreground object category information of the preview image are determined according to the final confidence map.
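The weighting of the first confidence map and the second confidence map may be sketched, as a non-limiting example, as a per-pixel weighted sum. The weights, the 2x2 map values, and the function name `weight_maps` are assumptions; the text does not state how the weights are chosen or how the categories are read from the final confidence map, so those steps are omitted.

```python
def weight_maps(first_map, second_map, w1, w2):
    """Per-pixel weighted sum of two confidence maps of equal size."""
    return [
        [w1 * a + w2 * b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(first_map, second_map)
    ]


# Hypothetical 2x2 confidence maps.
first_confidence_map = [[0.9, 0.8], [0.2, 0.1]]   # background confidence
second_confidence_map = [[0.1, 0.2], [0.8, 0.9]]  # foreground confidence

final_confidence_map = weight_maps(
    first_confidence_map, second_confidence_map, w1=0.4, w2=0.6
)
```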

[0053] In statistics, a confidence interval of a probability sample is a type of interval estimate of a population parameter of the sample. The confidence interval indicates the extent to which the true value of the population parameter has a certain probability of falling around a measurement result. The confidence is the credibility of a measured value of the measured parameter.

[0054] In an embodiment, as claimed, the scene information further includes foreground object position information. Here, the foreground object position information includes information about a position of a foreground object, including a position of a foreground object in the preview image. As illustrated in FIG. 6, the operation of identifying scene information from the preview image includes actions illustrated in blocks 602 to 606.

[0055] At block 602, a position of a foreground object in the preview image is detected using an object detection network in the neural network, and a border detection map of a detected border is output. The border detection map of the detected border includes a vector for each pixel in the preview image. The vector represents a position of the corresponding pixel relative to the detected border. The detected border is a border of the foreground object detected in the preview image using the neural network.

[0056] At block 604, weighting is performed on the first confidence map, the second confidence map and the border detection map to obtain a final confidence map of the preview image.

[0057] At block 606, background category information, foreground object category information and foreground object position information of the preview image are determined according to the final confidence map.

[0058] Specifically, as illustrated in FIG. 7, the border detection map 710 of the detected border includes a vector for each pixel in the detected border, and the vector represents a position of the corresponding pixel relative to the detected border. The vectors for the corresponding pixel in the border detection map 710 can be represented as a first four-dimensional vector and a second four-dimensional vector. The first four-dimensional vector is x=(x1, x2, x3, x4), and elements in the first four-dimensional vector are respectively distances from the pixel to the upper, lower, left and right boundaries of the border detection map 710 of the detected border of the foreground object. The second four-dimensional vector is x'=(x1', x2', x3', x4'), and elements in the second four-dimensional vector are respectively distances from the pixel to the upper, lower, left and right boundaries of the border detection map 700 of a detected border of a preview image in which the pixel is located. It will be appreciated that the position of the foreground object in the preview image may be determined by detecting the second four-dimensional vectors for all the pixels of the border detection map 710. In an embodiment, the object detection network in the neural network detects a foreground object of the preview image, and outputs the second confidence map and the border detection map 710. Weighting is performed on the first confidence map, the second confidence map and the border detection map 710 to obtain a final confidence map of the preview image. Background category information, foreground object category information and foreground object position information of the preview image may be determined based on the final confidence map. Further, the area of the detected border of the foreground object corresponding to the border detection map 710 is X=(x1+x2)(x3+x4). The border detection map 710 in the present embodiment is a rectangular block diagram. 
In other embodiments, the border detection map is a block diagram of an arbitrary shape, which is not specifically limited herein.
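For a rectangular detected border, the relationship between a pixel, its first four-dimensional vector x=(x1, x2, x3, x4), its second four-dimensional vector x'=(x1', x2', x3', x4'), and the area X=(x1+x2)(x3+x4) may be illustrated by the following minimal sketch. Pixel coordinates with the origin at the upper left corner and the vertical axis pointing downward are assumed. The function names and the numerical values are hypothetical.

```python
def border_area(x):
    """Area X = (x1 + x2) * (x3 + x4) of the detected border."""
    x1, x2, x3, x4 = x  # distances to upper, lower, left, right boundaries
    return (x1 + x2) * (x3 + x4)


def border_from_pixel(px, py, x):
    """Recover (left, top, right, bottom) of the border from one pixel."""
    x1, x2, x3, x4 = x
    return (px - x3, py - x1, px + x4, py + x2)


def second_vector(px, py, width, height):
    """Distances from the pixel to the upper, lower, left and right
    boundaries of the preview image."""
    return (py, height - py, px, width - px)


# Hypothetical pixel at (50, 40) inside a 200x100 preview image.
x = (10, 20, 15, 25)
area = border_area(x)
border = border_from_pixel(50, 40, x)
x_prime = second_vector(50, 40, 200, 100)
```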

[0059] In an embodiment, as illustrated in FIG. 8, the operation of determining a composition mode corresponding to the scene information includes actions illustrated in blocks 802 to 804.

[0060] At block 802, composition feature data related to scene information is generated based on the scene information.

[0061] At block 804, a composition mode corresponding to the composition feature data is acquired from preset composition modes when the composition feature data matches preset composition feature data.

[0062] In an embodiment, the scene information includes background category information and foreground object category information. The composition feature data includes background category data, the size and location of a foreground object, a background environment, etc. Specifically, the computer device pre-stores a large number of preset composition modes, and each of the preset composition modes matches a respective one piece of preset composition feature data. A composition mode corresponding to composition feature data is acquired from the preset composition modes when the composition feature data matches preset composition feature data. For example, when the scene information of the preview image is landscape plus portrait, the composition feature data (such as the size and location of a portrait, and a category of the landscape) related to the scene information is generated. The generated composition feature data and the preset composition feature data stored in advance are compared, and when the generated composition feature data matches the preset composition feature data, the composition mode for the scene of landscape plus portrait corresponding to the composition feature data is acquired from the preset composition modes. Specifically, the computer device pre-stores a great number of excellent composition modes corresponding to different pieces of scene information (for example, landscape plus portrait). Each of the composition modes corresponds to a group of composition feature data. Therefore, the best composition mode for the preview image may be determined by comparing the composition feature data.

[0063] In an embodiment, the operation of determining a composition mode corresponding to the scene information includes that: a composition mode for the preview image is determined based on the background category information and the foreground object category information. Specifically, the computer device pre-stores at least one type of scene in the memory. The computer device calls the composition mode corresponding to a type of scene based on the type of the scene when the type of the scene is determined. For example, when the background category information is landscape and the foreground object category information is a portrait, that is, a scene type of landscape plus portrait, the corresponding composition mode is a nine-square lattice composition mode; and the composition processing result based on the scene information and the composition mode is: a position at one-third of the preview image is determined as the position of each portrait in a composition. When the background category information is landscape and the foreground object category information is food, that is, a scene type of landscape plus food, the corresponding composition mode is: a nine-square lattice composition mode; and the composition processing result based on the scene information and the composition mode is: the central position of the preview image is determined as the position of food in a composition.

[0064] In an embodiment, as illustrated in FIG. 9, the scene information includes foreground object category information, and the operation of determining a composition mode corresponding to the scene information includes actions illustrated in blocks 902 to 906.

[0065] At block 902, a main object of the preview image is determined based on the foreground object category information.

[0066] At block 904, an area of the main object in the preview image is acquired.

[0067] At block 906, a composition mode for the preview image is determined based on the area of the main object in the preview image.

[0068] In the present embodiment, the category of the foreground object is detected using the object detection network in the neural network to determine a main object of the preview image. The border detection map of a detected border of the main object is output to acquire an area of the main object in the preview image. A position of the main object in a composed image is determined based on the area of the main object in the preview image. Specifically, referring to FIG. 7, the area of the main object may be determined based on the border detection map of the detected border of the main object. When the area of the main object is larger than a preset area, the preview image may be determined to be an image taken at close range. The composition mode for the preview image may be determined at this time. For example, a triangular composition mode is adopted to arrange a main object at a center of the preview image to highlight the main object. In other embodiments, a tripartite composition mode may also be adopted, in which the main object is arranged at the golden section line of the preview image, and other foreground objects are arranged near the golden section line to make the preview image compact and powerful.
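The determination based on the area of the main object may be sketched as follows, by way of example only. The preset area of 100000 square pixels, the function name, and the return value of None for a main object that does not exceed the preset area are assumptions; the text does not state which composition mode applies in that case.

```python
def mode_from_main_object_area(x, preset_area):
    """Choose a composition mode from the area of the main object.

    x holds the distances from a pixel to the upper, lower, left and
    right boundaries of the detected border of the main object.
    """
    x1, x2, x3, x4 = x
    area = (x1 + x2) * (x3 + x4)
    if area > preset_area:
        # Treated as an image taken at close range: highlight the main object.
        return "triangular"
    return None  # not specified in the text; left open in this sketch


# Hypothetical border vectors for a large and a small main object.
close_range_mode = mode_from_main_object_area((200, 250, 150, 150), 100000)
distant_mode = mode_from_main_object_area((20, 30, 15, 25), 100000)
```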

[0069] In an embodiment, the image processing method further includes that: the preview image is composed based on the scene information and the composition mode. Specifically, different pieces of scene information correspond to the same or different composition modes. The preview image may be composed based on the scene information and the composition mode. For example, when the scene information is landscape plus portrait (multiple), and the composition mode corresponding to the scene information is a nine-square lattice composition mode, the composition processing result based on the scene information and the composition mode is that a position at one-third of the preview image is determined as the position of each portrait in a composition. When the scene information is landscape plus food, and the corresponding composition mode is a nine-square lattice composition mode, the composition processing result based on the scene information and the composition mode is that the central position of the preview image is determined as the position of food in a composition.

[0070] Here, different composition modes corresponding to different pieces of scene information are pre-stored in the computer device, and the preview image is composed based on detected scene information and a composition mode corresponding to the detected scene information.
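The pre-stored correspondence between scene information and composition modes might be held in a simple lookup table. The sketch below assumes that scene information is reduced to a `(background, foreground)` pair of category labels; the key format, the table name and the label strings are hypothetical, and only the two example pairings given above are included.

```python
from typing import Dict, Optional, Tuple

# Hypothetical key: (background category, foreground object category).
SceneKey = Tuple[str, str]

# Only the two pairings used as examples in the description are listed.
PRESET_MODES: Dict[SceneKey, str] = {
    ("landscape", "portrait"): "nine_square_lattice",
    ("landscape", "food"): "nine_square_lattice",
}


def lookup_mode(background: str, foreground: str) -> Optional[str]:
    """Return the pre-stored composition mode, or None if no entry exists."""
    return PRESET_MODES.get((background, foreground))
```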

[0071] In an embodiment, the composition modes include a nine-square lattice composition, a cross-shaped composition, a triangular composition, a diagonal composition, etc.

[0072] In an embodiment, as illustrated in FIG. 10, the scene information includes foreground object category information and foreground object position information, and the operation of composing the preview image based on the scene information and the composition mode includes actions illustrated in blocks 1002 to 1006.

[0073] At block 1002, a preset position of a foreground object in a composition is determined according to the foreground object category information and the composition mode.

[0074] At block 1004, a real position of the foreground object in the composition is determined based on the preset position and the foreground object position information.

[0075] At block 1006, the foreground object is arranged at the real position of the foreground object in the composition.

[0076] Specifically, preset positions are different for different foreground objects and composition modes. For example, when the foreground object category is a portrait, the preset position of the portrait may be at the one-third of an image according to the nine-square lattice composition mode; and when the foreground object category is food, the preset position of the food may be at the center of the image.
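The preset-position step of block 1002 can be illustrated with the two examples just given. In this sketch the preset position is simplified to a single anchor point in pixel coordinates, and "one-third" is read as the upper-left intersection of the nine-square grid; both simplifications, as well as the function name and label strings, are assumptions made for illustration.

```python
from typing import Optional, Tuple


def preset_anchor(category: str, mode: str,
                  width: float, height: float) -> Optional[Tuple[float, float]]:
    """Return a hypothetical preset anchor point (x, y) for a foreground object."""
    if category == "food":
        # Food is placed at the center of the image.
        return (width / 2, height / 2)
    if category == "portrait" and mode == "nine_square_lattice":
        # Assumed reading of "one-third": the upper-left grid intersection.
        return (width / 3, height / 3)
    # Other category/mode pairs are not specified in the description.
    return None
```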

[0077] A real position of the foreground object in a composition may be determined based on the preset position in the composition and the foreground object position information. For example, the foreground object position information (x1', x2', x3', x4') (see the second four-dimensional vector in FIG. 7) may be acquired based on the border detection map, the determined preset position of the foreground object is (y1', y2', y3', y4'), and the real position of the foreground object (z1', z2', z3', z4') in the composition may be calculated according to the following formulas (1), (2), (3) and (4):









[0078] In the present embodiment, the real position of the foreground object in the composition is calculated based on the foreground object position information (the coordinate of the four-dimensional vector) and the preset position of the foreground object in the composition. Thus, the composition guiding schemes of different composition modes for different foreground objects are unified as a scheme, so that a photographer can learn and operate more easily, thereby improving the user experience.

[0079] FIG. 11 illustrates a block diagram of an apparatus for processing an image according to an embodiment. As illustrated in FIG. 11, the image processing apparatus includes an acquisition module 1110, an identification module 1120, a determination module 1130 and a composition module 1140.

[0080] The acquisition module 1110 is configured to acquire a preview image to be processed.

[0081] The identification module 1120 is configured to identify scene information from the preview image.

[0082] The determination module 1130 is configured to determine a composition mode corresponding to the scene information.

[0083] The composition module 1140 is configured to compose the preview image according to the composition mode.

[0084] In the embodiments of the present disclosure, a preview image to be processed is acquired by the acquisition module 1110; scene information is identified from the preview image by the identification module 1120; a composition mode corresponding to the scene information is determined by the determination module 1130; and the preview image is composed by the composition module 1140 according to the composition mode. The scene information of the preview image can be automatically identified, and each piece of scene information can be matched automatically with a corresponding composition mode, so that subsequent shooting adjustment prompts for the preview image are provided based on different composition modes, and the processed image has a higher ornamental value.
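The cooperation of the four modules can be pictured as a short pipeline. The sketch below is not the apparatus itself: the class name, the use of plain callables for the modules and the `run` method are assumptions chosen to keep the example self-contained.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class ImageProcessingApparatus:
    """Hypothetical wiring of modules 1110, 1120, 1130 and 1140."""
    acquire: Callable[[], Any]           # acquisition module 1110
    identify: Callable[[Any], Any]       # identification module 1120
    determine: Callable[[Any], str]      # determination module 1130
    compose: Callable[[Any, str], Any]   # composition module 1140

    def run(self) -> Any:
        preview = self.acquire()            # acquire the preview image
        scene = self.identify(preview)      # identify scene information
        mode = self.determine(scene)        # determine the composition mode
        return self.compose(preview, mode)  # compose according to the mode
```

Each module is injected as a callable so that, consistent with the later remark that the module division is only illustrative, the stages can be regrouped without changing the calling code.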

[0085] In an embodiment, the identification module 1120 further includes a feature extraction unit, a classification unit, an object detection unit, a calculation unit and a first determination unit.

[0086] The feature extraction unit is configured to perform feature extraction on the preview image using a basic network in a neural network to obtain feature data.

[0087] The classification unit is configured to perform classification detection on a background of the preview image using a classification network in the neural network, and output a first confidence map. Each pixel in the first confidence map represents a degree of confidence that the pixel in the preview image belongs to the background of the preview image.

[0088] The object detection unit is configured to detect a foreground object of the preview image using an object detection network in the neural network, and output a second confidence map. Each pixel in the second confidence map represents a degree of confidence that the pixel in the preview image belongs to the foreground object.

[0089] The calculation unit is configured to perform weighting on the first confidence map and the second confidence map to obtain a final confidence map of the preview image.

[0090] The first determination unit is configured to determine background category information and foreground object category information of the preview image according to the final confidence map.
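One possible reading of the weighting performed by the calculation unit is a per-pixel weighted sum of the two confidence maps. The description does not give the weights or the exact combination rule, so the equal default weights and the function name below are assumptions.

```python
from typing import List

ConfidenceMap = List[List[float]]  # rows of per-pixel confidence values


def weight_maps(first: ConfidenceMap, second: ConfidenceMap,
                w_first: float = 0.5, w_second: float = 0.5) -> ConfidenceMap:
    """Combine two same-sized confidence maps by a per-pixel weighted sum."""
    same_shape = len(first) == len(second) and all(
        len(a) == len(b) for a, b in zip(first, second))
    if not same_shape:
        raise ValueError("confidence maps must have the same shape")
    return [[w_first * a + w_second * b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(first, second)]
```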

[0091] In an embodiment, as claimed, the object detection unit further includes an object position detection sub-unit.

[0092] The object position detection sub-unit is configured to detect a position of a foreground object in the preview image using an object detection network in the neural network, and output a border detection map of a detected border. The border detection map includes a vector for each pixel in the preview image. The vector represents a position of the corresponding pixel relative to the detected border. The detected border is a border of the foreground object detected in the image to be detected using the neural network.

[0093] In this embodiment, the calculation unit is further configured to perform weighting on the first confidence map, the second confidence map and the border detection map to obtain a final confidence map of the preview image.

[0094] In this embodiment, the first determination unit is further configured to determine background category information, foreground object category information and foreground object position information of the preview image according to the final confidence map.
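To make the per-pixel vector of the border detection map concrete, the sketch below assumes that the vector holds the distances from the pixel to the left, top, right and bottom edges of the detected border. That ordering and meaning are assumptions; the description only states that the vector represents the position of the pixel relative to the detected border.

```python
from typing import Tuple

Vector4 = Tuple[float, float, float, float]  # assumed: (d_left, d_top, d_right, d_bottom)
Box = Tuple[float, float, float, float]      # (left, top, right, bottom)


def border_from_vector(x: float, y: float, vec: Vector4) -> Box:
    """Recover the detected border from one pixel and its distance vector."""
    d_left, d_top, d_right, d_bottom = vec
    return (x - d_left, y - d_top, x + d_right, y + d_bottom)
```

Under this assumption every pixel inside the object yields the same border, which is one reason a dense per-pixel map can be redundant yet robust.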

[0095] In an embodiment, the determination module 1130 further includes a generation unit and a second determination unit.

[0096] The generation unit is configured to generate composition feature data related to scene information based on the scene information.

[0097] The second determination unit is configured to acquire a composition mode corresponding to the composition feature data from preset composition modes when the composition feature data matches preset composition feature data.
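The generate-then-match behavior of these two units can be sketched as follows. The encoding of composition feature data as a tuple of category labels, the exact-equality matching rule and all names are assumptions; the description does not specify how the feature data is represented or compared.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Feature = Tuple[str, Tuple[str, ...]]  # hypothetical encoding of feature data


def generate_feature_data(scene: Dict[str, object]) -> Feature:
    """Build comparable feature data from scene information."""
    background = str(scene["background"])
    foreground = tuple(sorted(str(c) for c in scene["foreground"]))  # order-free
    return (background, foreground)


def match_mode(feature: Feature,
               presets: Sequence[Tuple[Feature, str]]) -> Optional[str]:
    """Return the mode of the first preset whose feature data matches."""
    for preset_feature, mode in presets:
        if feature == preset_feature:  # assumed rule: exact equality
            return mode
    return None  # no preset matched
```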

[0098] In an embodiment, the determination module 1130 further includes a third determination unit.

[0099] The third determination unit is configured to determine a composition mode for the preview image based on the background category information and the foreground object category information.

[0100] In an embodiment, the determination module 1130 further includes a fourth determination unit, an area acquisition unit and a fifth determination unit.

[0101] The fourth determination unit is configured to determine a main object of the preview image based on the foreground object category information.

[0102] The area acquisition unit is configured to acquire an area of the main object in the preview image.

[0103] The fifth determination unit is configured to determine a composition mode for the preview image based on the area of the main object in the preview image.

[0104] In an embodiment, the composition module 1140 is further configured to compose a preview image according to scene information and a composition mode.

[0105] In an embodiment, the composition module 1140 further includes a sixth determination unit, a seventh determination unit and a composition unit.

[0106] The sixth determination unit is configured to determine a preset position of a foreground object in a composition according to the foreground object category information and the composition mode.

[0107] The seventh determination unit is configured to determine a real position of the foreground object in the composition based on the preset position and the foreground object position information.

[0108] The composition unit is configured to arrange the foreground object at the real position of the foreground object in the composition.

[0109] Although various operations in the flowcharts in FIG. 1, FIG. 4, FIG. 6, FIG. 8, FIG. 9 and FIG. 10 are displayed in sequence according to the indication of an arrow, these operations are not necessarily performed in the sequence indicated by the arrow. Unless expressly stated herein, there is no strict sequence limitation to these operations, which may be performed in other sequences. Moreover, at least some operations in FIG. 1, FIG. 4, FIG. 6, FIG. 8, FIG. 9 and FIG. 10 may include multiple sub-operations or multiple stages. These sub-operations or stages are not necessarily completed at the same moment but may be performed at different moments, and these sub-operations or stages are not necessarily performed in sequence but may be performed in turns or alternately with other operations, or with at least some of the sub-operations or stages of other operations.

[0110] The division of modules in the above image processing apparatus is only for illustration, and in other embodiments, the image processing apparatus may be divided into different modules as needed to complete all or some functions of the above image processing apparatus.

[0111] The embodiment of the present disclosure also provides a device for processing an image, which is located in a mobile terminal. The device for processing an image includes a processor, and a memory coupled to the processor. The processor is configured to: acquire a preview image to be processed; identify scene information from the preview image; determine a composition mode corresponding to the scene information; and compose the preview image according to the composition mode.

[0112] In some embodiments, the processor may be further configured to generate composition feature data related to the scene information based on the scene information; and acquire a composition mode corresponding to the composition feature data from preset composition modes when the composition feature data matches preset composition feature data.

[0113] In some embodiments, the scene information may include foreground object category information. Accordingly, the processor may be further configured to: determine a main object from the preview image based on the foreground object category information; acquire an area of the main object in the preview image; and determine the composition mode for the preview image based on the area of the main object in the preview image.

[0114] In some embodiments, the scene information includes background category information and foreground object category information. Accordingly, the processor may be further configured to: determine a category of a background of the preview image based on the background category information; determine a category of a foreground object of the preview image based on the foreground object category information; and determine the composition mode for the preview image based on the category of the background of the preview image and the category of the foreground object of the preview image.

[0115] In some embodiments, the composition mode corresponding to the scene information includes a nine-square lattice composition mode and a triangular composition mode. Accordingly, the processor may be further configured to: determine a number of foreground objects of the preview image based on foreground object category information in the scene information; compose, responsive to determining that the number of the foreground objects of the preview image is equal to or greater than a threshold, the preview image according to the nine-square lattice composition mode; and compose, responsive to determining that the number of the foreground objects of the preview image is less than the threshold, the preview image according to the triangular composition mode.
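The count-based selection in this embodiment reduces to a single comparison. In the sketch below the number of foreground objects is taken as the length of a list of detected category labels, which is an assumed representation; the threshold is a required parameter because no value is given in the description.

```python
from typing import Sequence


def mode_by_object_count(foreground_categories: Sequence[str],
                         threshold: int) -> str:
    """Choose between the two modes from the number of foreground objects."""
    count = len(foreground_categories)  # one label per detected object
    if count >= threshold:
        return "nine_square_lattice"
    return "triangular"
```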

[0116] In some embodiments, the scene information includes foreground object category information and foreground object position information. Accordingly, the processor may be configured to: determine a preset position of a foreground object in a composition according to the foreground object category information and the composition mode; determine a real position of the foreground object in the composition based on the preset position of the foreground object and the foreground object position information; and arrange the foreground object at the real position of the foreground object in the composition.

[0117] In some embodiments, the scene information includes background category information and foreground object category information. Accordingly, the processor may be configured to: perform feature extraction on the preview image using a basic network in a neural network to obtain feature data; input the feature data into a classification network in the neural network to perform classification detection on a background of the preview image, and output a first confidence map, wherein each pixel in the first confidence map represents a degree of confidence that the pixel in the preview image belongs to the background of the preview image; input the feature data into an object detection network in the neural network to detect a foreground object from the preview image, and output a second confidence map, wherein each pixel in the second confidence map represents a degree of confidence that the pixel in the preview image belongs to the foreground object; perform weighting on the first confidence map and the second confidence map to obtain a final confidence map of the preview image; and determine the background category information and the foreground object category information of the preview image according to the final confidence map.

[0118] In some embodiments, the scene information further includes foreground object position information. Accordingly, the processor may be configured to: detect a position of the foreground object in the preview image using the object detection network in the neural network, and output a border detection map of a detected border, wherein the border detection map of the detected border comprises a vector for each pixel in the preview image, the vector represents a position of the corresponding pixel relative to the detected border, and the detected border is a border of the foreground object detected in the preview image to be processed using the neural network; perform weighting on the first confidence map, the second confidence map and the border detection map to obtain the final confidence map of the preview image; and determine the background category information, the foreground object category information and the foreground object position information of the preview image based on the final confidence map.

[0119] The embodiment of the present disclosure also provides a mobile terminal. The mobile terminal includes a memory and a processor. The memory stores a computer program. When the computer program is executed by the processor, the processor is enabled to perform the operations of the image processing method.

[0120] The embodiment of the present disclosure also provides a computer-readable storage medium. The computer-readable storage medium has a computer program stored thereon, and the computer program, when executed by a processor, implements the operations of the image processing method.

[0121] FIG. 12 illustrates an internal structure diagram of a mobile terminal according to an embodiment. As illustrated in FIG. 12, the mobile terminal includes a processor 1210, a memory 1220 and a network interface 1230, connected through a system bus. The processor 1210 is configured to provide computing and control capabilities for supporting the operation of the entire mobile terminal. The memory 1220 is configured to store data, programs, or the like. The memory 1220 stores at least one computer program 12224, and the computer program 12224 may be executed by the processor to implement a wireless network communication method applied to the mobile terminal provided in the embodiments of the present disclosure. The memory 1220 may include a non-transitory storage medium 1222 and an internal memory 1224. The non-transitory storage medium 1222 stores an operating system 12222 and a computer program 12224. The computer program 12224 may be executed by the processor 1210 to implement a neural network model processing method or image processing method provided in each of the above embodiments. The internal memory 1224 provides a cache operation environment for the operating system and the computer program in the non-transitory storage medium. The network interface 1230 may be an Ethernet card or a wireless network card for communicating with an external mobile terminal. The mobile terminal may be a mobile phone, a tablet, a personal digital assistant, a wearable device, or the like.

[0122] FIG. 13 illustrates an internal structure diagram of a server (or a cloud, etc.) according to an embodiment. As illustrated in FIG. 13, the server includes a processor 1310, a non-transitory storage medium 1322, an internal memory 1324 and a network interface 1330, connected through a system bus. The processor 1310 is configured to provide computing and control capabilities for supporting the operation of the entire server. The memory 1320 is configured to store data, programs, or the like. The memory 1320 stores at least one computer program 13224, and the computer program 13224 may be executed by the processor 1310 to implement a wireless network communication method applied to the mobile terminal provided in the embodiments of the present disclosure. The memory 1320 may include a non-transitory storage medium 1322 and an internal memory 1324. The non-transitory storage medium 1322 stores an operating system 13222 and a computer program 13224. The computer program 13224 may be executed by the processor 1310 to implement a neural network processing method or image processing method provided in each of the above embodiments. The internal memory 1324 provides a cache operation environment for the operating system and the computer program in the non-transitory storage medium. The network interface 1330 may be an Ethernet card or a wireless network card for communicating with an external mobile terminal. The server may be implemented as a stand-alone server or as a server cluster consisting of multiple servers. A person skilled in the art may understand that the structure illustrated in FIG. 13 is only a partial structure block diagram associated with the solution of the disclosure, and does not limit the server to which the solution of the disclosure is applied. Specifically, the server may include more or fewer parts than those illustrated in the figure, or combine some parts, or have different part arrangements.

[0123] Each module in the neural network model processing apparatus or image processing apparatus provided in the embodiments of the present disclosure may be implemented in the form of a computer program. The computer program may operate on a mobile terminal or a server. A program module formed by the computer program may be stored on the memory of the mobile terminal or the server. The computer program is executed by a processor to implement the operations of the method described in the embodiments of the present disclosure.

[0124] A computer program product including an instruction is provided. When the computer program product operates on a computer, the computer is enabled to perform the neural network model processing method or image processing method.

[0125] The embodiment of the present disclosure also provides a mobile terminal. The mobile terminal includes an image processing circuit. The image processing circuit may be implemented through hardware and/or software components, and may include various processing units defining an image signal processing (ISP) pipeline. FIG. 14 illustrates a diagram of an image processing circuit according to an embodiment. As illustrated in FIG. 14, for convenience of explanation, only various aspects of the image processing technology related to the embodiments of the present disclosure are illustrated.

[0126] As illustrated in FIG. 14, the image processing circuit includes an ISP processor 1440 and a control logic device 1450. Image data captured by an imaging device 1410 is first processed by the ISP processor 1440, and the ISP processor 1440 analyzes the image data to capture image statistics information that can be used to determine one or more control parameters of the imaging device 1410. The imaging device 1410 may include a camera having one or more lenses 1412 and image sensors 1414. The image sensor 1414 may include a color filter array (for example, Bayer filter). The image sensor 1414 may acquire light intensity and wavelength information captured by each of the imaging pixels in the image sensor 1414 and provide a set of original image data that can be processed by the ISP processor 1440. A sensor 1420 (for example, a gyroscope) may provide captured image processing parameters (for example, anti-shake parameters) for the ISP processor 1440 based on a sensor 1420 interface type. The sensor 1420 interface may be a standard mobile imaging architecture (SMIA) interface, another serial or parallel camera interface, or a combination of the above interfaces.

[0127] In addition, the image sensor 1414 may also send original image data to the sensor 1420. The sensor 1420 may provide the original image data for the ISP processor 1440 based on the sensor 1420 interface type, or the sensor 1420 may store the original image data into an image memory 1430.

[0128] The ISP processor 1440 processes the original image data pixel by pixel in a variety of formats. For example, each image pixel may have a bit depth of 8, 10, 12, or 14 bits. The ISP processor 1440 may perform one or more image processing operations on the original image data, and may collect statistical information about the image data. The image processing operations may be performed according to the same or different bit depths.
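Because pixels may arrive at different bit depths, a processing stage that mixes them typically brings values onto a common scale first. The sketch below shows one such normalization to the range [0, 1]; it is an illustrative assumption, not a step stated in the description, and the function name is hypothetical.

```python
SUPPORTED_BIT_DEPTHS = (8, 10, 12, 14)  # the depths named in the text


def normalize_pixel(value: int, bit_depth: int) -> float:
    """Scale a raw pixel value at the given bit depth to the range [0, 1]."""
    if bit_depth not in SUPPORTED_BIT_DEPTHS:
        raise ValueError(f"unsupported bit depth: {bit_depth}")
    max_value = (1 << bit_depth) - 1  # e.g. 255 for 8 bits, 1023 for 10 bits
    if not 0 <= value <= max_value:
        raise ValueError("pixel value out of range for this bit depth")
    return value / max_value
```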

[0129] The ISP processor 1440 may also receive image data from the image memory 1430. For example, the sensor 1420 interface sends the original image data to the image memory 1430, and the original image data in the image memory 1430 is then provided for the ISP processor 1440 for processing. The image memory 1430 may be part of a memory apparatus, a storage device, or a separate dedicated memory within a mobile terminal, and may include direct memory access (DMA) features.

[0130] In response to receiving the original image data from the image sensor 1414 interface or from the sensor 1420 interface or from the image memory 1430, the ISP processor 1440 may perform one or more image processing operations, such as time domain filtering. The processed image data may be sent to the image memory 1430 for additional processing prior to being displayed. The ISP processor 1440 receives processed data from the image memory 1430 and performs image data processing on the processed data in an original domain and in RGB and YCbCr color spaces. The image data processed by the ISP processor 1440 may be output to a display 1470, so as to be viewed by a user and/or further processed by a graphics engine or a graphics processing unit (GPU). Additionally, the data output by the ISP processor 1440 may also be sent to the image memory 1430, and the display 1470 may read image data from the image memory 1430. In an embodiment, the image memory 1430 may be configured to implement one or more frame buffers. Additionally, the data output by the ISP processor 1440 may be sent to an encoder/decoder 1460 to encode/decode image data. The encoded image data may be saved and decompressed before being displayed on the display 1470 device. The encoder/decoder 1460 may be implemented by a CPU or GPU or coprocessor.

[0131] Statistical data determined by the ISP processor 1440 may be sent to a control logic device 1450. For example, the statistical data may include image sensor 1414 statistical information such as auto exposure, auto white balance, auto focus, flicker detection, black level compensation, and lens 1412 shading correction. The control logic device 1450 may include a processor and/or a microcontroller that executes one or more routines (such as firmware). The one or more routines may determine control parameters of the imaging device 1410 and control parameters of the ISP processor 1440 according to the received statistical data. For example, the control parameters of the imaging device 1410 may include sensor 1420 control parameters (such as gain, integration time of exposure control, and anti-shake parameters), camera flash control parameters, lens 1412 control parameters (such as focus or zoom focal length), or a combination of these parameters. The control parameters of the ISP processor 1440 may include a gain level and a color correction matrix for automatic white balance and color adjustment (e.g., during RGB processing), and shading correction parameters of the lens 1412.

[0132] In some embodiments, the image processing circuit may be configured to: generate composition feature data related to the scene information based on the scene information; and acquire a composition mode corresponding to the composition feature data from preset composition modes when the composition feature data matches preset composition feature data.

[0133] In some embodiments, the scene information may include foreground object category information. Accordingly, the image processing circuit may be configured to: determine a main object from the preview image based on the foreground object category information; acquire an area of the main object in the preview image; and determine the composition mode for the preview image based on the area of the main object in the preview image.

[0134] Any reference used in the present disclosure to a memory, storage, a database or other media may include non-transitory and/or transitory memories. The appropriate non-transitory memory may include a read only memory (ROM), a programmable ROM (PROM), an electrically programmable ROM (EPROM), an electrically erasable programmable ROM (EEPROM), or a flash memory. The transitory memory may include a random access memory (RAM), used as an external cache memory. By way of illustration and not limitation, the RAM may be obtained in multiple forms such as a static RAM (SRAM), a dynamic RAM (DRAM), a synchronous DRAM (SDRAM), a double data rate SDRAM (DDR SDRAM), an enhanced SDRAM (ESDRAM), a synchlink DRAM (SLDRAM), a rambus direct RAM (RDRAM), a direct rambus dynamic RAM (DRDRAM), and a rambus dynamic RAM (RDRAM).

[0135] The above embodiments describe only several implementations of the present disclosure, specifically and in detail, but they should not be understood as limiting the scope of the present disclosure. Those of ordinary skill in the art may also make several variations and improvements without departing from the concept of the present disclosure. These variations and improvements fall within the protection scope of the present disclosure. Therefore, the protection scope of the present disclosure should be determined by the appended claims.


Claims

1. A method for processing an image, executed by a mobile terminal, comprising:

acquiring (102) a preview image to be processed;

identifying (104) scene information from the preview image, wherein the scene information comprises background category information, foreground object category information and foreground object position information, the foreground object category information indicating which category a foreground object of the preview image belongs to;

determining (106) a composition mode corresponding to the scene information; and

composing (108) the preview image according to the composition mode,

characterized in that identifying (104) the scene information from the preview image comprises:

performing (402) feature extraction on the preview image using a basic network (510) in a neural network to obtain feature data;

inputting (404) the feature data into a classification network (520) in the neural network to perform classification detection on a background of the preview image, and outputting a first confidence map, wherein each pixel in the first confidence map represents a degree of confidence that the pixel in the preview image belongs to the background of the preview image;

inputting (406) the feature data into an object detection network (530) in the neural network to detect the foreground object from the preview image, and outputting a second confidence map, wherein each pixel in the second confidence map represents a degree of confidence that the pixel in the preview image belongs to the foreground object;

detecting (602) a position of the foreground object in the preview image using the object detection network in the neural network, and outputting a border detection map of a detected border, wherein the border detection map of the detected border comprises a vector for each pixel in the preview image, the vector represents a position of the corresponding pixel relative to the detected border, and the detected border is a border of the foreground object detected in the preview image to be processed using the neural network;

performing (604) weighting on the first confidence map, the second confidence map and the border detection map to obtain the final confidence map of the preview image; and

determining (606) the background category information, the foreground object category information and the foreground object position information of the preview image based on the final confidence map,

wherein determining (106) the composition mode corresponding to the scene information comprises:

generating (802) composition feature data related to the scene information based on the scene information;

comparing the generated composition feature data with preset composition feature data; and

acquiring (804) a composition mode corresponding to the generated composition feature data from preset composition modes when the generated composition feature data matches the preset composition feature data, and

wherein the preset composition modes comprise a nine-square lattice composition mode and a triangular composition mode.


 
2. The method of claim 1, wherein the scene information comprises foreground object category information, and
wherein determining (106) the composition mode corresponding to the scene information comprises:

determining (902) a main object from the preview image based on the foreground object category information;

acquiring (904) an area of the main object in the preview image; and

determining (906) the composition mode for the preview image based on the area of the main object in the preview image.


 
3. The method of claim 1, wherein the scene information comprises background category information and foreground object category information, and
wherein determining (106) the composition mode corresponding to the scene information comprises:

determining a category of a background of the preview image based on the background category information;

determining a category of the foreground object of the preview image based on the foreground object category information; and

determining the composition mode for the preview image based on the category of the background of the preview image and the category of the foreground object of the preview image.


 
4. The method of any one of claims 1 to 3, wherein the scene information comprises foreground object category information and foreground object position information, and
wherein composing (108) the preview image according to the composition mode comprises:

determining (1002) a preset position of a foreground object in a composition according to the foreground object category information and the composition mode;

determining (1004) a real position of the foreground object in the composition based on the preset position of the foreground object and the foreground object position information; and

arranging (1006) the foreground object at the real position of the foreground object in the composition.
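
One possible reading of steps (1002) and (1004) for the nine-square lattice mode is sketched below. It assumes, hypothetically, that the preset positions are the four intersections of the lines dividing the frame into thirds, and that the real position is the intersection nearest to the detected centre of the foreground object. The claim also makes the preset position depend on the object category, which this sketch omits. The function names are illustrative.

```python
def lattice_points(width, height):
    """Assumed preset positions: the four intersections of the third lines."""
    xs = (width / 3, 2 * width / 3)
    ys = (height / 3, 2 * height / 3)
    return [(x, y) for x in xs for y in ys]


def real_position(object_center, width, height):
    """Pick the preset position closest to the detected object centre."""
    cx, cy = object_center
    return min(
        lattice_points(width, height),
        key=lambda p: (p[0] - cx) ** 2 + (p[1] - cy) ** 2,
    )
```

Choosing the nearest intersection keeps the displacement of the foreground object small, which is one plausible reason to combine the preset position with the detected position.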


 
5. An apparatus for processing an image, comprising:

an acquisition module (1110), configured to acquire a preview image to be processed;

an identification module (1120), configured to identify scene information from the preview image, wherein the scene information comprises background category information, foreground object category information and foreground object position information, the foreground object category information indicating which category a foreground object of the preview image belongs to;

a determination module (1130), configured to determine a composition mode corresponding to the scene information; and

a composition module (1140), configured to compose the preview image according to the composition mode,

characterized in that the identification module (1120) comprises:

a feature extraction unit, configured to perform feature extraction on the preview image using a basic network (510) in a neural network to obtain feature data;

a classification unit, configured to perform classification detection on a background of the preview image using a classification network (520) in the neural network, and output a first confidence map, wherein each pixel in the first confidence map represents a degree of confidence that the pixel in the preview image belongs to the background of the preview image;

an object detection unit, configured to detect the foreground object from the preview image using an object detection network (530) in the neural network, and output a second confidence map, wherein each pixel in the second confidence map represents a degree of confidence that the pixel in the preview image belongs to the foreground object;

the object detection unit comprises an object position detection sub-unit, which is configured to detect a position of the foreground object in the preview image using the object detection network in the neural network, and output a border detection map of a detected border, wherein the border detection map of the detected border comprises a vector for each pixel in the preview image, the vector represents a position of the corresponding pixel relative to the detected border, and the detected border is a border of the foreground object detected in the preview image to be processed using the neural network; and

the calculation unit is further configured to perform weighting on the first confidence map, the second confidence map and the border detection map to obtain the final confidence map of the preview image; and

the first determination unit is further configured to determine the background category information, the foreground object category information and the foreground object position information of the preview image according to the final confidence map,

the determination module (1130) comprises:

a generation unit, configured to generate composition feature data related to the scene information based on the scene information;

a comparing unit, configured to compare the generated composition feature data with preset composition feature data; and

a second determination unit, configured to acquire a composition mode corresponding to the generated composition feature data from preset composition modes when the generated composition feature data matches the preset composition feature data,

wherein the preset composition modes comprise a nine-square lattice composition mode and a triangular composition mode.


 
6. The apparatus of claim 5, wherein the determination module (1130) comprises:
a third determination unit, configured to determine the composition mode for the preview image based on background category information and foreground object category information.
 
7. The apparatus of claim 5, wherein the determination module (1130) comprises:

a fourth determination unit, configured to determine a main object from the preview image based on the foreground object category information;

an area acquisition unit, configured to acquire an area of the main object in the preview image; and

a fifth determination unit, configured to determine the composition mode for the preview image based on the area of the main object in the preview image.


 
8. The apparatus of any one of claims 5 to 7, wherein the composition module (1140) comprises:

a sixth determination unit, configured to determine a preset position of a foreground object in a composition according to foreground object category information and the composition mode;

a seventh determination unit, configured to determine a real position of the foreground object in the composition based on the preset position of the foreground object and foreground object position information; and

a composition unit, configured to arrange the foreground object at the real position of the foreground object in the composition.


 


Ansprüche

1. Verfahren zur Verarbeitung eines Bildes, ausgeführt durch ein mobiles Endgerät, umfassend:

Erfassen (102) eines zu verarbeitenden Vorschaubildes;

Identifizieren (104) von Szeneninformationen aus dem Vorschaubild, wobei die Szeneninformationen Hintergrundkategorieinformationen, Vordergrundobjektkategorie-Informationen und Vordergrundobjekt-Positionsinformationen umfassen, und die Vordergrundobjektkategorie-Informationen angeben, zu welcher Kategorie ein Vordergrundobjekt des Vorschaubildes gehört;

Bestimmen (106) eines den Szeneninformationen entsprechenden Kompositionsmodus; und

Zusammenstellen (108) des Vorschaubildes gemäß dem Kompositionsmodus;

dadurch gekennzeichnet, dass das Identifizieren (104) der Szeneninformationen aus dem Vorschaubild umfasst:

Durchführen (402) von Merkmalsextraktion an dem Vorschaubild unter Verwendung eines Basisnetzwerks (510) in einem neuronalen Netzwerk, zum Erhalten von Merkmalsdaten;

Eingeben (404) der Merkmalsdaten in ein Klassifizierungsnetzwerk (520) in dem neuronalen Netzwerk zum Durchführen von Klassifizierungsdetektion an einem Hintergrund des Vorschaubildes und Ausgeben einer ersten Konfidenzkarte, wobei jedes Pixel in der ersten Konfidenzkarte einen Konfidenzgrad dafür wiedergibt, dass das Pixel in dem Vorschaubild zu dem Hintergrund des Vorschaubildes gehört;

Eingeben (406) der Merkmalsdaten in ein Objektdetektionsnetzwerk (530) in dem neuronalen Netzwerk, um das Vordergrundobjekt aus dem Vorschaubild zu detektieren, und Ausgeben einer zweiten Konfidenzkarte, wobei jedes Pixel in der zweiten Konfidenzkarte einen Konfidenzgrad dafür wiedergibt, dass das Pixel in dem Vorschaubild zu dem Vordergrundobjekt gehört;

Detektieren (602) einer Position des Vordergrundobjekts in dem Vorschaubild unter Verwendung des Objektdetektionsnetzwerks in dem neuronalen Netzwerk und Ausgeben einer Randdetektionskarte eines detektierten Randes, wobei die Randdetektionskarte des detektierten Randes einen Vektor für jedes Pixel in dem Vorschaubild umfasst, der Vektor eine Position des entsprechenden Pixels relativ zu dem detektierten Rand wiedergibt und der detektierte Rand ein Rand des in dem Vorschaubild detektierten Vordergrundobjekts ist, das unter Verwendung des neuronalen Netzwerks verarbeitet werden soll;

Durchführen (604) von Gewichtung an der ersten Konfidenzkarte, der zweiten Konfidenzkarte und der Randdetektionskarte, um die endgültige Konfidenzkarte des Vorschaubildes zu erhalten; und

Bestimmen (606) der Hintergrundkategorieinformationen, der Vordergrundobjektkategorie-Informationen und der Vordergrundobjekt-Positionsinformationen des Vorschaubildes, basierend auf der endgültigen Konfidenzkarte,

wobei das Bestimmen (106) des Kompositionsmodus entsprechend den Szeneninformationen umfasst:

Erzeugen (802) von Kompositionsmerkmalsdaten mit Bezug auf die Szeneninformationen, basierend auf den Szeneninformationen;

Vergleichen der erzeugten Kompositionsmerkmalsdaten mit vorgegebenen Kompositionsmerkmalsdaten; und

Erfassen (804) eines Kompositionsmodus entsprechend den erzeugten Kompositionsmerkmalsdaten aus vorgegebenen Kompositionsmodi, wenn die erzeugten Kompositionsmerkmalsdaten den vorgegebenen Kompositionsmerkmalsdaten entsprechen, und

wobei die vorgegebenen Kompositionsmodi einen Neunquadrat-Gitter-Kompositionsmodus und einen Dreiecks-Kompositionsmodus umfassen.


 
2. Verfahren nach Anspruch 1, wobei die Szeneninformationen Vordergrundobjektkategorie-Informationen umfassen, und
wobei das Bestimmen (106) des Kompositionsmodus entsprechend den Szeneninformationen umfasst:

Bestimmen (902) eines Hauptobjekts aus dem Vorschaubild, basierend auf den Vordergrundobjektkategorie-Informationen;

Erfassen (904) eines Gebiets des Hauptobjekts in dem Vorschaubild; und

Bestimmen (906) des Kompositionsmodus für das Vorschaubild, basierend auf dem Gebiet des Hauptobjekts in dem Vorschaubild.


 
3. Verfahren nach Anspruch 1, wobei die Szeneninformationen Hintergrundkategorieinformationen und Vordergrundobjektkategorie-Informationen umfassen, und
wobei das Bestimmen (106) des Kompositionsmodus entsprechend den Szeneninformationen umfasst:

Bestimmen einer Kategorie eines Hintergrunds des Vorschaubildes, basierend auf den Hintergrundkategorieinformationen;

Bestimmen einer Kategorie des Vordergrundobjekts des Vorschaubildes, basierend auf den Vordergrundobjektkategorie-Informationen; und

Bestimmen des Kompositionsmodus für das Vorschaubild, basierend auf der Kategorie des Hintergrunds des Vorschaubildes und der Kategorie des Vordergrundobjekts des Vorschaubildes.


 
4. Verfahren nach einem der Ansprüche 1 bis 3, wobei die Szeneninformationen Vordergrundobjektkategorie-Informationen und Vordergrundobjekt-Positionsinformationen umfassen, und
wobei das Zusammenstellen (108) des Vorschaubildes gemäß dem Kompositionsmodus umfasst:

Bestimmen (1002) einer vorgegebenen Position eines Vordergrundobjekts in einer Komposition gemäß den Vordergrundobjektkategorie-Informationen und dem Kompositionsmodus;

Bestimmen (1004) einer tatsächlichen Position des Vordergrundobjekts in der Komposition, basierend auf der vorgegebenen Position des Vordergrundobjekts und den Vordergrundobjekt-Positionsinformationen; und

Anordnen (1006) des Vordergrundobjekts an der tatsächlichen Position des Vordergrundobjekts in der Komposition.


 
5. Vorrichtung zum Verarbeiten eines Bildes, umfassend:

ein Erfassungsmodul (1110), ausgelegt zum Erfassen eines zu verarbeitenden Vorschaubildes;

ein Identifizierungsmodul (1120), ausgelegt zum Identifizieren von Szeneninformationen aus dem Vorschaubild, wobei die Szeneninformationen Hintergrundkategorieinformationen, Vordergrundobjektkategorie-Informationen und Vordergrundobjekt-Positionsinformationen umfassen, wobei die Vordergrundobjektkategorie-Informationen angeben, zu welcher Kategorie ein Vordergrundobjekt des Vorschaubildes gehört;

ein Bestimmungsmodul (1130), ausgelegt zum Bestimmen eines Kompositionsmodus entsprechend den Szeneninformationen; und

ein Kompositionsmodul (1140), ausgelegt zum Zusammenstellen des Vorschaubildes gemäß dem Kompositionsmodus,

dadurch gekennzeichnet, dass das Identifizierungsmodul (1120) umfasst:

eine Merkmalsextraktionseinheit, ausgelegt zum Durchführen von Merkmalsextraktion an dem Vorschaubild unter Verwendung eines Basisnetzwerks (510) in einem neuronalen Netzwerk, um Merkmalsdaten zu erhalten;

eine Klassifizierungseinheit, ausgelegt zum Durchführen von Klassifizierungsdetektion an einem Hintergrund des Vorschaubildes unter Verwendung eines Klassifizierungsnetzwerks (520) in dem neuronalen Netzwerk und Ausgeben einer ersten Konfidenzkarte, wobei jedes Pixel in der ersten Konfidenzkarte einen Konfidenzgrad dafür wiedergibt, dass das Pixel in dem Vorschaubild zu dem Hintergrund des Vorschaubildes gehört;

eine Objektdetektionseinheit, ausgelegt zum Detektieren des Vordergrundobjekts aus dem Vorschaubild unter Verwendung eines Objektdetektionsnetzwerks (530) in dem neuronalen Netzwerk und Ausgeben einer zweiten Konfidenzkarte, wobei jedes Pixel in der zweiten Konfidenzkarte einen Konfidenzgrad dafür wiedergibt, dass das Pixel in dem Vorschaubild zu dem Vordergrundobjekt gehört;

wobei die Objektdetektionseinheit eine Objektpositionsdetektions-Untereinheit umfasst, die ausgelegt ist zum Detektieren einer Position des Vordergrundobjekts in dem Vorschaubild unter Verwendung des Objektdetektionsnetzwerks in dem neuronalen Netzwerk und Ausgeben einer Randdetektionskarte eines detektierten Randes, wobei die Randdetektionskarte des detektierten Randes einen Vektor für jedes Pixel in dem Vorschaubild umfasst, der Vektor eine Position des dazugehörigen Pixels relativ zu dem detektierten Rand wiedergibt und der detektierte Rand ein Rand des in dem Vorschaubild detektierten Vordergrundobjekts ist, das unter Verwendung des neuronalen Netzwerks verarbeitet werden soll; und

die Berechnungseinheit ferner dazu ausgelegt ist, Gewichtung an der ersten Konfidenzkarte, der zweiten Konfidenzkarte und der Randdetektionskarte durchzuführen, um die endgültige Konfidenzkarte des Vorschaubildes zu erhalten; und

die erste Bestimmungseinheit ferner dazu ausgelegt ist, die Hintergrundkategorieinformationen, die Vordergrundobjektkategorie-Informationen und die Vordergrundobjekt-Positionsinformationen des Vorschaubildes gemäß der endgültigen Konfidenzkarte zu bestimmen,

das Bestimmungsmodul (1130) umfasst:

eine Erzeugungseinheit, ausgelegt zum Erzeugen von Kompositionsmerkmalsdaten mit Bezug auf die Szeneninformationen, basierend auf den Szeneninformationen;

eine Vergleichseinheit, ausgelegt zum Vergleichen der erzeugten Kompositionsmerkmalsdaten mit vorgegebenen Kompositionsmerkmalsdaten; und

eine zweite Bestimmungseinheit, ausgelegt zum Erfassen eines Kompositionsmodus entsprechend den erzeugten Kompositionsmerkmalsdaten aus vorgegebenen Kompositionsmodi, wenn die erzeugten Kompositionsmerkmalsdaten den vorgegebenen Kompositionsmerkmalsdaten entsprechen,

wobei die vorgegebenen Kompositionsmodi einen Neunquadrat-Gitter-Kompositionsmodus und einen Dreiecks-Kompositionsmodus umfassen.


 
6. Vorrichtung nach Anspruch 5, wobei das Bestimmungsmodul (1130) umfasst:
eine dritte Bestimmungseinheit, ausgelegt zum Bestimmen des Kompositionsmodus für das Vorschaubild, basierend auf Hintergrundkategorieinformationen und Vordergrundobjektkategorie-Informationen.
 
7. Vorrichtung nach Anspruch 5, wobei das Bestimmungsmodul (1130) umfasst:

eine vierte Bestimmungseinheit, ausgelegt zum Bestimmen eines Hauptobjekts aus dem Vorschaubild, basierend auf den Vordergrundobjektkategorie-Informationen;

eine Gebietserfassungseinheit, ausgelegt zum Erfassen eines Gebiets des Hauptobjekts in dem Vorschaubild; und

eine fünfte Bestimmungseinheit, ausgelegt zum Bestimmen des Kompositionsmodus für das Vorschaubild, basierend auf dem Gebiet des Hauptobjekts in dem Vorschaubild.


 
8. Vorrichtung nach einem der Ansprüche 5 bis 7, wobei das Kompositionsmodul (1140) umfasst:

eine sechste Bestimmungseinheit, ausgelegt zum Bestimmen einer vorgegebenen Position eines Vordergrundobjekts in einer Komposition gemäß Vordergrundobjektkategorie-Informationen und dem Kompositionsmodus;

eine siebte Bestimmungseinheit, ausgelegt zum Bestimmen einer tatsächlichen Position des Vordergrundobjekts in der Komposition, basierend auf der vorgegebenen Position des Vordergrundobjekts und Vordergrundobjekt-Positionsinformationen; und

eine Kompositionseinheit, ausgelegt zum Anordnen des Vordergrundobjekts an der tatsächlichen Position des Vordergrundobjekts in der Komposition.


 


Revendications

1. Procédé pour le traitement d'une image, exécuté par un terminal mobile, comprenant :

l'acquisition (102) d'une image de prévisualisation à traiter ;

l'identification (104) d'informations de scène à partir de l'image de prévisualisation, les informations de scène comprenant des informations de catégorie d'arrière-plan, des informations de catégorie d'objet de premier plan et des informations de position d'objet de premier plan, les informations de catégorie d'objet de premier plan indiquant à quelle catégorie un objet de premier plan de l'image de prévisualisation appartient ;

la détermination (106) d'un mode de composition correspondant aux informations de scène ; et

la composition (108) de l'image de prévisualisation selon le mode de composition,

caractérisé en ce que l'identification (104) des informations de scène à partir de l'image de prévisualisation comprend :

la réalisation (402) d'une extraction de caractéristiques sur l'image de prévisualisation à l'aide d'un réseau basique (510) dans un réseau neuronal pour obtenir des données de caractéristiques ;

l'entrée (404) des données de caractéristiques dans un réseau de classification (520) dans le réseau neuronal pour réaliser une détection de classification sur un arrière-plan de l'image de prévisualisation, et la fourniture en sortie d'une première carte de confiance, chaque pixel dans la première carte de confiance représentant un degré de confiance que le pixel dans l'image de prévisualisation appartient à l'arrière-plan de l'image de prévisualisation ;

l'entrée (406) des données de caractéristiques dans un réseau de détection d'objet (530) dans le réseau neuronal pour détecter l'objet de premier plan à partir de l'image de prévisualisation, et la fourniture en sortie d'une deuxième carte de confiance, chaque pixel dans la deuxième carte de confiance représentant un degré de confiance que le pixel dans l'image de prévisualisation appartient à l'objet de premier plan ;

la détection (602) d'une position de l'objet de premier plan dans l'image de prévisualisation à l'aide du réseau de détection d'objet dans le réseau neuronal, et la fourniture en sortie d'une carte de détection de bordure d'une bordure détectée, la carte de détection de bordure de la bordure détectée comprenant un vecteur pour chaque pixel dans l'image de prévisualisation, le vecteur représentant une position du pixel correspondant relativement à la bordure détectée, et la bordure détectée étant une bordure de l'objet de premier plan détectée dans l'image de prévisualisation à traiter à l'aide du réseau neuronal ;

la réalisation (604) d'une pondération sur la première carte de confiance, la deuxième carte de confiance et la carte de détection de bordure pour obtenir la carte de confiance finale de l'image de prévisualisation ; et

la détermination (606) des informations de catégorie d'arrière-plan, des informations de catégorie d'objet de premier plan et des informations de position d'objet de premier plan de l'image de prévisualisation sur la base de la carte de confiance finale,

la détermination (106) du mode de composition correspondant aux informations de scène comprenant :

la génération (802) de données de caractéristiques de composition relatives aux informations de scène sur la base des informations de scène ;

la comparaison des données de caractéristiques de composition générées avec des données de caractéristiques de composition prédéfinies ; et

l'acquisition (804) d'un mode de composition correspondant aux données de caractéristiques de composition générées à partir de modes de composition prédéfinis lorsque les données de caractéristiques de composition générées sont appariées aux données de caractéristiques de composition prédéfinies, et

les modes de composition prédéfinis comprenant un mode de composition en grille à neuf carrés et un mode de composition triangulaire.


 
2. Procédé selon la revendication 1, dans lequel les informations de scène comprennent des informations de catégorie d'objet de premier plan, et
dans lequel la détermination (106) du mode de composition correspondant aux informations de scène comprend :

la détermination (902) d'un objet principal à partir de l'image de prévisualisation sur la base des informations de catégorie d'objet de premier plan ;

l'acquisition (904) d'une zone de l'objet principal dans l'image de prévisualisation ; et

la détermination (906) du mode de composition pour l'image de prévisualisation sur la base de la zone de l'objet principal dans l'image de prévisualisation.


 
3. Procédé selon la revendication 1, dans lequel les informations de scène comprennent des informations de catégorie d'arrière-plan et des informations de catégorie d'objet de premier plan, et
dans lequel la détermination (106) du mode de composition correspondant aux informations de scène comprend :

la détermination d'une catégorie d'un arrière-plan de l'image de prévisualisation sur la base des informations de catégorie d'arrière-plan ;

la détermination d'une catégorie de l'objet de premier plan de l'image de prévisualisation sur la base des informations de catégorie d'objet de premier plan ; et

la détermination du mode de composition pour l'image de prévisualisation sur la base de la catégorie de l'arrière-plan de l'image de prévisualisation et de la catégorie de l'objet de premier plan de l'image de prévisualisation.


 
4. Procédé selon l'une quelconque des revendications 1 à 3, dans lequel les informations de scène comprennent des informations de catégorie d'objet de premier plan et des informations de position d'objet de premier plan, et
dans lequel la composition (108) de l'image de prévisualisation selon le mode de composition comprend :

la détermination (1002) d'une position prédéfinie d'un objet de premier plan dans une composition selon les informations de catégorie d'objet de premier plan et le mode de composition ;

la détermination (1004) d'une position réelle de l'objet de premier plan dans la composition sur la base de la position prédéfinie de l'objet de premier plan et des informations de position d'objet de premier plan ; et

l'agencement (1006) de l'objet de premier plan à la position réelle de l'objet de premier plan dans la composition.


 
5. Appareil pour le traitement d'une image, comprenant :

un module d'acquisition (1110), configuré pour acquérir une image de prévisualisation à traiter ;

un module d'identification (1120), configuré pour identifier des informations de scène à partir de l'image de prévisualisation, les informations de scène comprenant des informations de catégorie d'arrière-plan, des informations de catégorie d'objet de premier plan et des informations de position d'objet de premier plan, les informations de catégorie d'objet de premier plan indiquant à quelle catégorie un objet de premier plan de l'image de prévisualisation appartient ;

un module de détermination (1130), configuré pour déterminer un mode de composition correspondant aux informations de scène ; et

un module de composition (1140), configuré pour composer l'image de prévisualisation selon le mode de composition,

caractérisé en ce que le module d'identification (1120) comprend :

une unité d'extraction de caractéristiques, configurée pour réaliser une extraction de caractéristiques sur l'image de prévisualisation à l'aide d'un réseau basique (510) dans un réseau neuronal pour obtenir des données de caractéristiques ;

une unité de classification, configurée pour réaliser une détection de classification sur un arrière-plan de l'image de prévisualisation à l'aide d'un réseau de classification (520) dans le réseau neuronal, et fournir en sortie une première carte de confiance, chaque pixel dans la première carte de confiance représentant un degré de confiance que le pixel dans l'image de prévisualisation appartient à l'arrière-plan de l'image de prévisualisation ;

une unité de détection d'objet, configurée pour détecter l'objet de premier plan à partir de l'image de prévisualisation à l'aide d'un réseau de détection d'objet (530) dans le réseau neuronal, et fournir en sortie une deuxième carte de confiance, chaque pixel dans la deuxième carte de confiance représentant un degré de confiance que le pixel dans l'image de prévisualisation appartient à l'objet de premier plan ;

l'unité de détection d'objet comprend une sous-unité de détection de position d'objet, qui est configurée pour détecter une position de l'objet de premier plan dans l'image de prévisualisation à l'aide du réseau de détection d'objet dans le réseau neuronal, et fournir en sortie une carte de détection de bordure d'une bordure détectée, la carte de détection de bordure de la bordure détectée comprenant un vecteur pour chaque pixel dans l'image de prévisualisation, le vecteur représentant une position du pixel correspondant par rapport à la bordure détectée, et la bordure détectée étant une bordure de l'objet de premier plan détectée dans l'image de prévisualisation à traiter à l'aide du réseau neuronal ; et

l'unité de calcul est en outre configurée pour réaliser une pondération sur la première carte de confiance, la deuxième carte de confiance et la carte de détection de bordure pour obtenir la carte de confiance finale de l'image de prévisualisation ; et

la première unité de détermination est en outre configurée pour déterminer les informations de catégorie d'arrière-plan, les informations de catégorie d'objet de premier plan et les informations de position d'objet de premier plan de l'image de prévisualisation selon la carte de confiance finale,

le module de détermination (1130) comprend :

une unité de génération, configurée pour générer des données de caractéristiques de composition relatives aux informations de scène sur la base des informations de scène ;

une unité de comparaison, configurée pour comparer les données de caractéristiques de composition générées avec des données de caractéristiques de composition prédéfinies ; et

une deuxième unité de détermination, configurée pour acquérir un mode de composition correspondant aux données de caractéristiques de composition générées à partir de modes de composition prédéfinis lorsque les données de caractéristiques de composition générées sont appariées aux données de caractéristiques de composition prédéfinies, et

les modes de composition prédéfinis comprenant un mode de composition en grille à neuf carrés et un mode de composition triangulaire.


 
6. Appareil selon la revendication 5, dans lequel le module de détermination (1130) comprend :
une troisième unité de détermination, configurée pour déterminer le mode de composition pour l'image de prévisualisation sur la base d'informations de catégorie d'arrière-plan et d'informations de catégorie d'objet de premier plan.
 
7. Appareil selon la revendication 5, dans lequel le module de détermination (1130) comprend :

une quatrième unité de détermination, configurée pour déterminer un objet principal à partir de l'image de prévisualisation sur la base des informations de catégorie d'objet de premier plan ;

une unité d'acquisition de zone, configurée pour acquérir une zone de l'objet principal dans l'image de prévisualisation ; et

une cinquième unité de détermination, configurée pour déterminer le mode de composition pour l'image de prévisualisation sur la base de la zone de l'objet principal dans l'image de prévisualisation.


 
8. Appareil selon l'une quelconque des revendications 5 à 7, dans lequel le module de composition (1140) comprend :

une sixième unité de détermination, configurée pour déterminer une position prédéfinie d'un objet de premier plan dans une composition selon les informations de catégorie d'objet de premier plan et le mode de composition ;

une septième unité de détermination, configurée pour déterminer une position réelle de l'objet de premier plan dans la composition sur la base de la position prédéfinie de l'objet de premier plan et des informations de position d'objet de premier plan ; et

une unité de composition, configurée pour agencer l'objet de premier plan à la position réelle de l'objet de premier plan dans la composition.


 




Drawing

Cited references

REFERENCES CITED IN THE DESCRIPTION



This list of references cited by the applicant is for the reader's convenience only. It does not form part of the European patent document. Even though great care has been taken in compiling the references, errors or omissions cannot be excluded and the EPO disclaims all liability in this regard.

Patent documents cited in the description