(19)
(11)EP 3 516 581 B1

(12)EUROPEAN PATENT SPECIFICATION

(45)Mention of the grant of the patent:
06.09.2023 Bulletin 2023/36

(21)Application number: 17777131.8

(22)Date of filing:  15.09.2017
(51)International Patent Classification (IPC): 
H04N 19/12(2014.01)
H04N 21/44(2011.01)
H04N 19/137(2014.01)
G06F 16/783(2019.01)
H04N 19/172(2014.01)
H04N 21/854(2011.01)
H04N 19/177(2014.01)
G06T 7/246(2017.01)
(52)Cooperative Patent Classification (CPC):
H04N 21/44008; H04N 21/85406; H04N 19/172; H04N 19/12; H04N 19/137; H04N 19/177; G06T 7/246; G06F 16/786
(86)International application number:
PCT/US2017/051680
(87)International publication number:
WO 2018/057402 (29.03.2018 Gazette  2018/13)

(54)

AUTOMATIC SELECTION OF CINEMAGRAPHS

AUTOMATISCHE AUSWAHL VON CINEMAGRAMMEN

SÉLECTION AUTOMATIQUE DE CINÉMAGRAPHES


(84)Designated Contracting States:
AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

(30)Priority: 23.09.2016 US 201615275297

(43)Date of publication of application:
31.07.2019 Bulletin 2019/31

(73)Proprietor: Microsoft Technology Licensing, LLC
Redmond, WA 98052-6399 (US)

(72)Inventor:
  • TRÄFF, Gustav
    Redmond Washington 98052-6399 (US)

(74)Representative: Murgitroyd & Company 
Murgitroyd House 165-169 Scotland Street
Glasgow G5 8PL (GB)


(56)References cited:
  
  • JAMES TOMPKIN ET AL: "Towards Moment Imagery: Automatic Cinemagraphs", VISUAL MEDIA PRODUCTION (CVMP), 2011 CONFERENCE FOR, IEEE, 16 November 2011 (2011-11-16), pages 87-93, XP032074521, DOI: 10.1109/CVMP.2011.16 ISBN: 978-1-4673-0117-6
  • ZICHENG LIAO ET AL: "Automated video looping with progressive dynamism", ACM TRANSACTIONS ON GRAPHICS (TOG), vol. 32, no. 4, 1 July 2013 (2013-07-01), page 1, XP055361362, US ISSN: 0730-0301, DOI: 10.1145/2461912.2461950
  
Note: Within nine months from the publication of the mention of the grant of the European patent, any person may give notice to the European Patent Office of opposition to the European patent granted. Notice of opposition shall be filed in a written reasoned statement. It shall not be deemed to have been filed until the opposition fee has been paid. (Art. 99(1) European Patent Convention).


Description

BACKGROUND



[0001] Visual imagery commonly can be classified as either a static image (photograph, painting, etc.) or dynamic imagery (video, animation, etc.). A static image captures a single instant in time. A video provides a temporal narrative through time.

[0002] Another category of visual media that mixes a static image with moving elements has recently become more prevalent. A classic example is an animated Graphics Interchange Format (GIF), originally created to encode short vector-graphics animations within a still image format. Another example of visual media that juxtaposes still and moving images, which has more recently become popular, is referred to as a cinemagraph. Cinemagraphs commonly combine static scenes with a small repeating movement (e.g., a blinking eye or hair motion). In a cinemagraph, the dynamic element normally loops in a sequence of frames.
J. Tompkin et al. "Towards Moment Imagery: Automatic Cinemagraphs" 2011, CVMP, p. 87 describes a system to assist in the production of seamlessly looping cinemagraphs. The system may be automatic and only require the user to select which regions of motion to keep in the output. The user can edit the motion mask by varying a threshold on the motion detection and by correcting mask regions manually. The user may select non-moving motions to freeze at any frame.
Z. Liao et al. "Automated Video Looping with Progressive Dynamism" ACM Transactions on Graphics, 32(4), 2013 describes a representation that captures a spectrum of looping videos with varying levels of dynamism, ranging from a static image to a highly animated loop. Scene liveliness can be adjusted interactively using a slider control. Applications include background images and slideshows, where the desired level of activity may depend on personal taste or mood. The representation also provides a segmentation of the scene into independently looping regions, enabling interactive local adjustment over dynamism. For a landscape scene, this control might correspond to selective animation and de-animation of grass motion, water ripples, and swaying trees.

SUMMARY



[0003] The present invention is defined by the independent claims. Further optional features are defined in the dependent claims.

[0004] This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.

[0005] A method and apparatus are disclosed which may be used to create still images and cinemagraphs. The method and apparatus allow automatic selection between at least these two types of media files from a sequence of digital images that capture a scene. The criteria for selection are based on object classification of objects detected in at least one image of the scene, and on detection of motion throughout the scene. In embodiments, additional criteria may weigh in on the selection, such as camera shake detection and object tracking.

[0006] Many of the attendant features will be more readily appreciated as the same becomes better understood by reference to the following detailed description considered in connection with the accompanying drawings.

DESCRIPTION OF THE DRAWINGS



[0007] The present description will be better understood from the following detailed description read in light of the accompanying drawings, wherein:

FIG. 1 illustrates a flow chart of a method according to an aspect;

FIG. 2 illustrates a flow chart of a method according to an embodiment;

FIG. 3A illustrates a flow chart of a selection according to an embodiment;

FIG. 3B illustrates a flow chart of a selection according to another embodiment;

FIG. 3C illustrates a flow chart of a selection according to a further embodiment;

FIG. 4 illustrates an apparatus according to an aspect.



[0008] The drawings of the FIGs. are not to scale.

DETAILED DESCRIPTION



[0009] The detailed description provided below in connection with the appended drawings is intended as a description of the present embodiments and is not intended to represent the only forms in which the present embodiments may be constructed or utilized. The description sets forth the functions of the embodiments and the steps for constructing and operating the embodiments. However, the same or equivalent functions and sequences may be accomplished by different embodiments.

[0010] The method of FIG. 1 may be used for selecting a suitable media file type, between file types such as a still image and a cinemagraph. The method may also be used for automatically producing media files of the appropriate type. The method may be carried out, for example, by a device comprising a processor, an image processing unit or by a separate processing unit.

[0011] A digital image refers to data captured via exposure of pixels or some other light-sensing element(s) of an image sensor. A media file may be selected from a still image, a video sequence, and any combination thereof as disclosed further.

[0012] The method of FIG. 1 starts by receiving, in operation 101, a sequence of digital images of a scene. "Scene" refers to the whole content in the object area shot by a camera, comprising general background of the scene and any movable or stationary objects therein.

[0013] In general, the sequence of digital images may comprise any appropriate number of images or image frames. The number of images may correspond to, for example, a displayable image sequence having a length of about 1 to 5 seconds. The actual number of frames then depends on the frame rate to be used. For example, with a constant frame rate of 30 fps (frames per second), the number of frames may be about 30 to 150 frames.

[0014] Images of a scene are originally captured by a digital camera during shooting a scene. Being captured during shooting a scene means that the images and frames of the digital image sequence represent sequential moments of the scene and are captured chronologically. The images of the digital image sequence may represent sequential moments separated from each other, for example, by a time interval of 5 to 100 milliseconds.
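The relationship between sequence length, frame rate and capture interval described in the two preceding paragraphs is simple arithmetic. The following Python sketch is given for illustration only; the function names are hypothetical and do not form part of the described method.

```python
def frame_count(duration_s: float, fps: float) -> int:
    """Frames needed to display `duration_s` seconds at a constant `fps`."""
    return round(duration_s * fps)


def capture_interval_ms(fps: float) -> float:
    """Time between consecutive frames, in milliseconds."""
    return 1000.0 / fps


# A sequence of 1 to 5 seconds at 30 fps holds 30 to 150 frames.
shortest, longest = frame_count(1, 30), frame_count(5, 30)

# Capture intervals of 5 ms and 100 ms correspond to 200 fps and 10 fps.
fastest, slowest = capture_interval_ms(200), capture_interval_ms(10)
```
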

[0015] Receiving a sequence of digital images of a scene may include receiving them from a memory in the form of a pre-recorded video or any other image sequence. Receiving may also refer to receiving a sequence of digital images from an active digital camera used to capture the sequence, as discussed further in relation to FIG. 2.

[0016] Object classification in at least one of the received digital images is performed in operation 102. Object classification may include object detection, analysis and association to one or more pre-determined object classes. Objects may have any suitable classifications, and examples may include vehicles, buildings, landmarks, activities, people, faces, facial features etc. These are only a few examples of a wide variety of possible object classes. A list of object classes may be stored in a local memory or accessed remotely.

[0017] Results of the object classification may include a list of objects determined in one or more digital images, their location on the image, and the class of the detected object. Object classification is performed to determine and establish presence and class of an object in the scene, as marked by operation 103.
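As an illustration only, the result of object classification (a list of objects, their locations and their classes) might be represented as in the following Python sketch. The type `DetectedObject`, its fields and the helper `presence_and_classes` are assumptions made for this example; the description does not prescribe any particular data structure or classifier.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class DetectedObject:
    object_class: str               # hypothetical class label, e.g. "face"
    box: Tuple[int, int, int, int]  # location as (x, y, width, height) in pixels
    frame_index: int                # image of the sequence in which it was found


def presence_and_classes(detections: List[DetectedObject]) -> Tuple[bool, List[str]]:
    """Summarise a result as (object present?, sorted distinct classes)."""
    classes = sorted({d.object_class for d in detections})
    return bool(classes), classes


# Made-up detections standing in for the output of a real classifier.
detections = [
    DetectedObject("face", (40, 30, 64, 64), frame_index=0),
    DetectedObject("vehicle", (200, 120, 90, 50), frame_index=0),
    DetectedObject("face", (42, 31, 64, 64), frame_index=1),
]
present, classes = presence_and_classes(detections)
```
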

[0018] The method also comprises performing, in operation 104, motion detection on the received sequence of digital images. Motion detection may include a plurality of techniques aimed at detecting a change in the images of the sequence. The change may only register when the change is significant enough. A map of motions and changes that happen throughout the scene captured in the sequence of digital images can be provided as a result. The motions may include motions of individual objects against a background, areas of an image, the whole image frame and other motions. The motion detection 104 is performed to determine if there is any motion in the scene, and whether this motion is salient. "Salience" of motion may be determined if a motion fulfils a predetermined parameter that can be adjusted by a user or device manufacturer. For example, if a blink of an eye of a person is to be captured as a cinemagraph, then small motions of the eyelids against a substantially stationary face of that person should fulfil the predetermined parameter for a salient motion. In combination with an object classification of the eyes and other facial features, a precise determination of objects and movements surrounding the object can be made.
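The description does not name a particular motion detection technique. Purely as an example, a motion map could be obtained by simple frame differencing, as in the Python sketch below. The technique, the function names and the parameters `pixel_threshold`, `min_activity` and `min_pixels` are assumptions chosen for illustration; the last two stand in for the adjustable "predetermined parameter" that decides salience. Frames are modelled as nested lists of grayscale values.

```python
def motion_map(frames, pixel_threshold=10):
    """Per pixel, the fraction of consecutive frame pairs in which the
    grayscale value changed by more than `pixel_threshold`."""
    if len(frames) < 2:
        raise ValueError("at least two frames are needed to detect a change")
    height, width = len(frames[0]), len(frames[0][0])
    counts = [[0] * width for _ in range(height)]
    for prev, curr in zip(frames, frames[1:]):
        for y in range(height):
            for x in range(width):
                # Small changes are ignored so that only significant ones register.
                if abs(curr[y][x] - prev[y][x]) > pixel_threshold:
                    counts[y][x] += 1
    pairs = len(frames) - 1
    return [[c / pairs for c in row] for row in counts]


def is_salient(mmap, min_activity=0.5, min_pixels=1):
    """Hypothetical salience rule: enough pixels change often enough."""
    active = sum(1 for row in mmap for value in row if value >= min_activity)
    return active >= min_pixels


# A 2x2 scene in which only the top-left pixel changes (e.g. a blinking eyelid).
blink = [[[0, 5], [5, 5]], [[100, 5], [5, 5]], [[0, 5], [5, 5]]]
blink_map = motion_map(blink)
```

A lower `min_pixels` value lets small motions, such as the eyelid example above, qualify as salient, while a higher value restricts salience to larger moving areas.
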

[0019] In an embodiment, motion detection may also be supplemented with image stabilization or correction for changes that occur due to camera movement (as opposed to actual movement of objects in the frame). The correction may include, for example, changing angles of the image frame to adapt to camera shake.

[0020] Object classification, including consequent operations 102 followed by 103, and motion detection, including operations 104 followed by 105, may be performed in any order or in parallel.

[0021] In an embodiment, the method comprises performing image categorization on at least one of the received images. The categories of images may comprise nighttime, daytime, indoors, outdoors, and more abstract image descriptions such as nature, office or urban.

[0022] Based on the determined presence and class of an object in the scene, as well as the detected motion and its salience, a selection of media file type is made automatically in operation 106. The media file type is selected from at least a still image and a cinemagraph. "Cinemagraph" herein refers to a still image comprising at least one moving area, or dynamic area, which includes continuous or momentary movement. The moving area can be formed of parts of digital images from the sequence. Parts of digital images from the sequence can be located approximately in the same area as the moving area of the cinemagraph. The movement in the moving area may be played back on repeat, looped, or randomized. The area that remains dynamic is determined automatically based on the salience of movement and object classification (including object location). In an embodiment, the user may be prompted to adjust or define dynamic areas of the cinemagraph.

[0023] "Automatically" refers generally to performing the operation(s) at issue, for example, selecting the media file type, by an appropriate data processing unit or module according to predetermined rules and procedures, without need for any contribution provided or determination performed by a user of a device incorporating such unit or module.

[0024] After the file type selection in 106, a media file of the selected type is created in operation 107. According to an embodiment, the created media file is stored in a memory in a further operation of the method. In an embodiment, if the selected media file type is a cinemagraph, the method may also provide the user with an option to change it into a still image later. In an embodiment, the media file type is selected between a still image, a cinemagraph and a living image, wherein a "living image" comprises a still image and a preceding image sequence.

[0025] "Living image" refers to a collection of images displayed as a combination of a still image and a short video or other type of sequentially displayed image sequence preceding the still image. By forming such living image, a representation of a captured moment may be generated which corresponds to the general nature of the scene. The length of such short preceding image sequence displayed in connection with the still image may vary, for example, from 200 to 300 ms to one or a couple of seconds.

[0026] The automatic selection based on the presence and class of an object in the scene, and presence and salience of motion in the scene, can be generally made depending on the various combinations of these and other parameters. Examples of criteria for such selection are discussed below. A class of the object may be sufficient to make a selection of cinemagraph, a still image or a living image. A spatial location of the object in the scene, relationship between the detected object and other objects in the scene, and trajectory of the moving object may be factors for making the selection.

[0027] In an embodiment, objects may be additionally classified as "objects more suitable for cinemagraph", "objects more suitable for still image" and, for example, "objects more suitable for living image". Similarly, the additional classification may include "objects not suitable for cinemagraph", and other classes not suitable for e.g. living images. With this additional classification, the selection of media file type can be made on the basis of the object class, the additional object class and the motion that happens in the scene.

[0028] In an embodiment wherein the sequence of digital images is received from a pre-recorded video in operation 101, the method can further comprise creating two or more media files of the selected type from two or more parts of the received pre-recorded video. The method may be used to produce additional content such as select cinemagraphs or living images from an existing video.

[0029] "Receiving" the image sequence refers to any appropriate way of providing available, for automatic processing purposes, data content(s) corresponding to those images. For example, such data may be fully or partially received via any data transmission path from a device, data server or, for example, a cloud service. It may also be stored on any appropriate data storage medium or device. Receiving may also comprise generating the data content at issue, for example, via analysis of some appropriate data, such as a plurality of frames.

[0030] FIG. 2 shows a method according to an embodiment, wherein the operations are prompted by a user command. In this embodiment, the sequence of digital images of a scene is received from an active digital camera. If the method is performed by a device comprising an image processor, the digital camera may be part of this device, or may be a connected standalone digital camera. The camera may be activated by a user or automatically, and continuously capture a video feed while active. The digital camera may be of any type capable of performing such capture of sequential images with short intervals. It may be a stand-alone camera apparatus, such as a compact camera, a digital SLR (single-lens reflex) camera, or a digital mirrorless interchangeable-lens camera. Alternatively, it may be a camera module or element incorporated in an apparatus or device, such as a mobile or wearable device.

[0031] The method comprises buffering 201 three or more frames of the received sequence in a memory, and receiving 202 a user command to create a media file. The digital camera may be configured to create a video stream when active, and the video stream can be received and buffered as a sequence of images according to the method.

[0032] The buffered image frames may be captured using, for example, a video capture mode, or a burst capture mode, or a continuous high speed still image capture mode. The interval of capture may correspond, for example, to any standard video displaying frame rate. In general, the preliminary frames may be captured with an interval of 5 to 100 ms between the consecutive frames.

[0033] The first-in-first-out type buffer sequence having a predetermined number of frames forms an image sequence with a continuously changing set of images. First-in-first-out refers to a principle according to which, when a new image is captured and stored in the buffer, the oldest image is removed from it. Thereby, the buffer holds a predetermined number of most recent images at all times. The FIFO buffer may be, for example, a ring buffer.

[0034] The sequential capturing of frames and the storing of the captured frames into the FIFO buffer may be carried out as a continuous operation always when the camera is in use and ready for image capturing initiated by the user of the camera. Thus, the FIFO buffer may be maintained and updated continuously also when no actual image capturing is initiated by the user. Updating the FIFO buffer sequence by storing the new images is stopped when an image capturing user input is received, whereby the content of the buffer sequence is fixed. Possible reception of an image capturing user input is checked after capturing and storing each new preliminary frame.
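The first-in-first-out buffering and the fixing of the buffer content on user input, as described in the two preceding paragraphs, can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the class `FrameBuffer` and its methods are hypothetical names, and the ring buffer is realised with `collections.deque` and its `maxlen` argument from the Python standard library.

```python
from collections import deque


class FrameBuffer:
    """FIFO buffer that always holds the most recent `capacity` frames."""

    def __init__(self, capacity):
        if capacity < 3:
            # The method of FIG. 2 buffers three or more frames.
            raise ValueError("capacity must be at least 3")
        # A deque with maxlen drops the oldest item when a new one is appended.
        self._frames = deque(maxlen=capacity)
        self._frozen = False

    def push(self, frame):
        """Store a newly captured frame; ignored once the buffer is fixed."""
        if self._frozen:
            return False
        self._frames.append(frame)
        return True

    def freeze(self):
        """Called when the image capturing user input is received."""
        self._frozen = True
        return list(self._frames)


buffer = FrameBuffer(capacity=3)
for frame_id in range(5):      # frames 0..4 arrive from the camera
    buffer.push(frame_id)
sequence = buffer.freeze()     # user presses the shutter button
```

After the call to `freeze`, further calls to `push` leave the buffered sequence unchanged, which corresponds to the buffer content being fixed for the subsequent analysis.
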

[0035] The user command 202 may be a press of a shutter button on an active camera, a touch of a touchscreen in a location assigned to image capture, a voice command to take a picture, or any other command indicating that the user wishes to create a media file.

[0036] The method further includes four operations, grouped as 204, which can be performed in any order or in parallel. The operations include object classification and motion detection, as in the embodiment described with regard to FIG. 1. The additional two operations are monitoring stability and/or tilting of the camera used to capture the digital images, and, if an object is detected in the scene, tracking the detected object. If the image stabilization and monitoring tilt is performed before the other operations of 204, it can provide information on the overall movement and shake of the camera. In case the movement intensity exceeds a certain threshold, a cinemagraph or living image selection may be blocked in advance before the other operations are performed. The object tracking, combined with object classification and movement detection, can give a more specific identification of a salient movement that can be included into a cinemagraph as a moving area. A technical effect of the combination of operations 204 is that they can be used in synergy to make a more accurate selection of an appropriate media file type, also in real time. Other operations may be performed to determine conditions for selection of media file types, according to embodiments.
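The possibility of blocking a cinemagraph or living image selection in advance, before the other operations of 204 are performed, can be illustrated by the following Python sketch. It is a sketch under assumptions only: the function `analyse`, its parameters, the scalar `shake_level` and the callables `classify` and `detect_motion` are hypothetical placeholders for the stability monitoring, object classification and motion detection operations.

```python
ALL_TYPES = frozenset({"still image", "cinemagraph", "living image"})


def analyse(frames, shake_level, shake_threshold, classify, detect_motion):
    """Run the stability check first and skip the remaining operations
    when the camera movement exceeds the threshold."""
    if shake_level > shake_threshold:
        # Cinemagraph and living image are blocked in advance.
        return {"allowed": {"still image"}, "objects": None, "motion": None}
    return {
        "allowed": set(ALL_TYPES),
        "objects": classify(frames),
        "motion": detect_motion(frames),
    }


calls = []


def classify(frames):
    calls.append("classify")
    return ["fountain"]


def detect_motion(frames):
    calls.append("detect_motion")
    return True


shaky = analyse([0, 1, 2], shake_level=0.9, shake_threshold=0.5,
                classify=classify, detect_motion=detect_motion)
calls_after_shaky = list(calls)   # still empty: nothing else was run
steady = analyse([0, 1, 2], shake_level=0.1, shake_threshold=0.5,
                 classify=classify, detect_motion=detect_motion)
```
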

[0037] The operations 204 may be carried out continuously over the buffered sequence of digital images, or over a subset of the buffered sequence of digital images. Data produced with these operations may also be buffered or stored.

[0038] After the results of the four operations 204 are analyzed in operation 205, the method further comprises selecting a media file type in 206 based on the combined results of the analysis 205. Once the media file type is selected, a media file of the selected type is created next in operation 207.

[0039] A technical effect of any of the above methods and embodiments can consist in improved user experience in selecting and creating media files of the listed types. This can be achieved by removing the necessity of manual selection between a cinemagraph, a still image or a living image as an output. This can be useful both in automatic selection of capture mode, if the methods are used in devices with a camera during capture, and in automatic video editing, if the received image sequence is from a pre-recorded video.

[0040] FIGs. 3A-3C illustrate conditions that can serve as a basis for the automatic selection between media file types according to the methods described above.

[0041] In an embodiment illustrated in FIG. 3A, the condition 301 is established by the results of object classification and salient motion. Namely, a cinemagraph is selected and created at 303 if the scene includes an object of a predetermined class, and a motion in at least one area of the scene. The conditions may be specified further according to various implementations. In a cinemagraph created this way at 303, the salient motion associated with an object of a known class may constitute the dynamic (moving) area, wherein the rest of the scene may be captured in a still image part. If no motion is determined to be present in the scene, and no objects can be classified, a still image is selected and created instead at 302.

[0042] As an example only, a fountain may be recognized as an object class, and the motion of water may be detected as salient motion, which can result in selecting a cinemagraph as a media file type, and creating a cinemagraph in which the water coming out of the fountain would constitute the moving area. In this and other examples, salient motion may be determined as motion in the dynamic areas of a scene that is repeatable, and may be recognized based on the object classes. This provides an example of criteria for selecting a cinemagraph as a media file type.
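The condition of FIG. 3A, together with the fountain example, can be sketched in Python as follows. The function name and its parameters are hypothetical. The description only states the outcome when both conditions hold and when neither holds; treating the remaining cases (a known object without salient motion, or motion without a known object) as a still image is an assumption made for this sketch.

```python
def select_media_type(object_classes, salient_motion, predetermined_classes):
    """Return "cinemagraph" when an object of a predetermined class is
    present and salient motion was detected, otherwise "still image"."""
    has_known_object = any(c in predetermined_classes for c in object_classes)
    if has_known_object and salient_motion:
        return "cinemagraph"
    # Assumption: every other combination falls back to a still image.
    return "still image"


# Hypothetical list of classes regarded as suitable for a cinemagraph.
suitable = {"fountain", "face"}

with_fountain = select_media_type(["fountain"], True, suitable)
empty_scene = select_media_type([], False, suitable)
```
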

[0043] In an embodiment, two or more areas of the scene can include detected salient motion. In this case, once the salient motion is detected 311 in localized areas of the scene, a cinemagraph can be automatically selected as the preferred media file type. This is illustrated in FIG. 3B. The cinemagraph may be created at 313 with multiple moving areas. After creating the cinemagraph 313, the device may prompt the user to enter a selection of moving areas that are to remain in a cinemagraph. This can be prompted, for example, on a screen of the device, to be selected by any suitable input means. When this selection is received at 314, changes to the cinemagraph are saved, and the cinemagraph is stored in a memory 315.

[0044] Receiving further user input 314 on cinemagraph dynamic areas can have an effect on the overall area accuracy when a cinemagraph is created, and provide the ability for a user to customize it to his or her taste.

[0045] FIG. 3C shows more criteria related to motion in the scene. The determination of motion intensity made at 321 may be based on detected stability and/or tilting state of the camera exceeding a predetermined threshold at the time when the user command to create a media file is received. This results in creating 322 a still image and cutting off the selection, which can help preserve resources of a device. In another embodiment, a still image is still selected as the media file type if the motion is detected in a majority of the scene and/or exceeds a predetermined intensity. For example, intense motion in the majority of a scene can be achieved if the camera used to capture the image sequence is moving at a high speed. Even if the camera itself is stable and has no tilt, a scene captured from the window of a fast train is likely to have plenty of intense motion in most of the frame. A still image is created in this scenario as well.
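For illustration only, the criteria of FIG. 3C that lead to a still image can be expressed as a single predicate, as in the Python sketch below. All names and numeric scales are assumptions: `shake_level` stands for the detected stability and/or tilting state, `moving_fraction` for the share of the scene in which motion is detected, and "majority" is taken here to mean more than half of the scene.

```python
def still_image_forced(shake_level, shake_threshold,
                       moving_fraction, motion_intensity, intensity_threshold):
    """True when the selection is cut off in favour of a still image."""
    if shake_level > shake_threshold:
        # Camera stability and/or tilt exceeds the predetermined threshold.
        return True
    # Motion in the majority of the scene and/or motion that is too intense.
    return moving_fraction > 0.5 or motion_intensity > intensity_threshold


# Stable camera on a fast train: most of the frame is moving intensely.
train_window = still_image_forced(0.0, 0.5, moving_fraction=0.9,
                                  motion_intensity=0.8, intensity_threshold=0.6)

# Stable camera, small and gentle motion in a limited area.
quiet_scene = still_image_forced(0.0, 0.5, moving_fraction=0.05,
                                 motion_intensity=0.2, intensity_threshold=0.6)
```
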

[0046] In an embodiment, a still image may be selected as the media file type on the basis of an object class detection, wherein the detected object class is pre-determined to limit the selection to a still image when no other objects, or no objects of a different object class, are detected in the same scene.

[0047] In an embodiment not covered by the claims, the media file type is selected from a still image, a cinemagraph and a living image of the scene, the living image comprising a still image and a preceding image sequence. The additional media file type may be selected based on the same measurements and results as described above in relation to selection between a still image and a cinemagraph. In an alternative embodiment, a living image of the scene is selected as the media file type if a motion in the scene fulfils at least one predetermined parameter, as shown in 323. The predetermined parameters may be based on the motion of a detected object, and the object class. For example, if motion of an object does not constitute a repeatable motion, but the movement intensity and object class is suitable for a living image, then a living image may be selected as the media file type. In an embodiment, if the motion is detected in a majority of a scene, for example due to movement of the camera or a zooming action, a living image may be selected as the media file type. The selection may also be based on the direction and trajectory of the movement, its intensity and objects that can be tracked in the moving scene.

[0048] After a successful selection, the living image is created at 324.

[0049] In the above, aspects mainly related to method embodiments are discussed. In the following, more emphasis will be given on device and apparatus aspects.

[0050] What is described above with regard to definitions, details, ways of implementation, and advantageous effects of the methods applies, mutatis mutandis, to the device and apparatus aspects discussed below. The same applies vice versa. Further, the following apparatuses and devices are examples of equipment for performing the methods described above. The other way around, the previous methods are examples of possible ways of operation of the apparatuses and devices described below.

[0051] The apparatus 400 of FIG. 4 comprises an image processing unit 401. The image processing unit is configured to select, from a received image sequence of a scene, a media file type that will be created, wherein the selection is between a cinemagraph and a still image. In the following, the operation of the image processing unit, when in use, is discussed.

[0052] When in use, the image processing unit 401 receives a sequence 410 of digital images of a scene 420. The sequence 410 may be received from a video or an active digital camera. The apparatus 400 may comprise a digital camera or store the video in a memory (not shown in the Figure). The most recent image 411 is highlighted in the figure for exemplary purposes only, and comprises an approximate "moment of interest". This may be, for example, a point of capture if the device 400 comprises a camera and a user gives a command to make a picture.

[0053] The image processing unit 401 performs object classification 421 in at least one of the received images to determine the presence and class of an object in the scene, and motion detection 422 through the sequence of received images 410 to determine the presence and salience of motion in the scene. The intermediary images in which the object and motion are also present are denoted by 420'.

[0054] The functions 421 and 422 are illustrated as icons with arrows on the left side of the image sensor. In an embodiment, each of the functions 421, 422 and 423 can be implemented in separate units.

[0055] Additional functions may also be implemented in the apparatus 400, wherein the image processing unit 401 can be configured to detect stability and/or tilting 423 of the camera, and to perform object tracking (not shown in the Figure).

[0056] The objects 430 and 431 are provided in FIG. 4 as examples only. The detected object 431 is a silhouette of a moving person, and the object 430 is a source of light illustrated schematically as a four point star. For example purposes, the source of light 430 has varying intensity and looks slightly different on different images.

[0057] In this example, after obtaining the results of motion detection 422 and object classification 421, as well as other functions 423, the image processing unit 401 determines the presence of a moving silhouette 431 and recognizes the object class as "person". The movement is quite fast and directional, which is not reversible and would not form a loop. Therefore, the area of the scene 420 with a moving silhouette 431 does not include salient motion suitable for a cinemagraph. The image processing unit 401 may be configured to select and create a still image 415 as the resulting media file. In an embodiment, the image processing unit may be configured to track the moving silhouette and to select a living image 413 as the resulting media file. For example, the living image may highlight a particular movement of the person against an otherwise quiet background.

[0058] However, the star 430 may also be classified as "light source" after object classification. The light is produced unevenly, creating minor change in the shape of the light source 430 between different image frames. This change can be detected after the image processing unit performs motion detection 422 of the images 410. Since the change in shape can be displayed in a repeatable manner and does not constitute intense movement, the image processing unit 401 may be configured to select and create a cinemagraph 414 wherein the silhouette 431 is still, but the light source 430 is part of a dynamic area of the cinemagraph 414.

[0059] In the example shown in FIG. 4, the camera that was used to capture the image sequence 410 is static, hence detecting the stability and/or tilting 423 of the camera does not limit the outcome to a still image 415. In other examples, wherein the majority of the image is moving, or excessive camera shake is detected, the image processing unit 401 may be configured to skip the selection according to other parameters and only create a still image 415.

[0060] Being "configured to" perform the above operations when in use refers to the capability of and suitability of the image processing unit for such operations. This may be achieved in various ways. For example, the image processing unit may comprise at least one processor and at least one memory coupled to the at least one processor, the memory storing program code instructions which, when run on the at least one processor, cause the processor to perform the action(s) at issue. Alternatively, or in addition, the functionally described features can be performed, at least in part, by one or more hardware logic components. For example, and without limitation, illustrative types of hardware logic components that can be used include Field-programmable Gate Arrays (FPGAs), Application-specific Integrated Circuits (ASICs), Application-specific Standard Products (ASSPs), System-on-a-chip systems (SOCs), Complex Programmable Logic Devices (CPLDs), Graphics Processing Units (GPUs) etc.

[0061] The apparatus 400 may be implemented as an apparatus of any type capable of being configured to carry out the operation steps as discussed above. Examples include a laptop, a personal computer, and any other types of general purpose data processing apparatuses and devices. It may also be implemented as a mobile device, such as a mobile phone, a smart phone, a tablet computer, or a wearable device of any appropriate type.

[0062] Being illustrated as one unit in the schematic drawing of FIG. 4 does not necessitate that the image processing unit 401 is implemented as a single element or component. It may comprise two or more sub-units or sub-systems which each may be implemented using one or more physical components or elements.

[0063] Instead of, or in addition to the operations described above, the image processing unit 401 of the apparatus 400 of FIG. 4 may be configured to operate, when in use, according to any of the methods discussed above with reference to FIGs. 1 to 3C.
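
One of the methods referred to above buffers three or more frames (201) and analyzes the frames preceding a user command to create a media file (202). A minimal sketch of such buffering, assuming a fixed-capacity ring buffer, is given below. The class `FrameBuffer`, its method names, and the default capacity are hypothetical and serve only to illustrate the idea.

```python
from collections import deque
from typing import Deque, Generic, List, TypeVar

F = TypeVar("F")  # frame type is left open in this sketch


class FrameBuffer(Generic[F]):
    """Keeps the most recent frames so they can be analyzed on a user command."""

    MIN_FRAMES = 3  # the method buffers three or more frames

    def __init__(self, capacity: int = 8) -> None:  # capacity is an assumed value
        if capacity < self.MIN_FRAMES:
            raise ValueError("capacity must be at least three frames")
        # A deque with maxlen drops the oldest frame automatically when full.
        self._frames: Deque[F] = deque(maxlen=capacity)

    def push(self, frame: F) -> None:
        """Store a newly received frame."""
        self._frames.append(frame)

    def on_user_command(self) -> List[F]:
        """Return the buffered frames preceding the command, oldest first."""
        if len(self._frames) < self.MIN_FRAMES:
            raise RuntimeError("not enough buffered frames for analysis")
        return list(self._frames)
```

The frames returned by `on_user_command` would then be passed to object classification and motion detection; how those steps are carried out is outside the scope of this sketch.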

[0064] By the methods and the apparatuses shown in FIGs. 1 to 4, an automatic decision on cinemagraph capture can be achieved.

[0065] Although some of the present embodiments may be described and illustrated herein as being implemented in a smartphone, a mobile phone, a digital camera or a tablet computer, these are only examples of a device and not a limitation. As those skilled in the art will appreciate, the present embodiments are suitable for application in a variety of different types of devices that comprise a digital camera and/or are capable of processing digital images.

[0066] Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.

[0067] It will be understood that the benefits and advantages described above may relate to one embodiment or may relate to several embodiments. The embodiments are not limited to those that solve any or all of the stated problems or those that have any or all of the stated benefits and advantages. It will further be understood that reference to 'an' item refers to one or more of those items.

[0068] The operations of the methods described herein may be carried out in any suitable order, or simultaneously where appropriate.

[0069] The operations of the methods described herein may be performed by an apparatus comprising means for performing each of the operations.

[0070] The term "comprising" is used in this specification to mean including the features followed thereafter, without excluding the presence of one or more additional features.


Claims

1. A method, comprising:

receiving (101) a sequence of digital images of a scene;

performing, by a processor, object classification (102) in at least one of the received digital images to determine the presence, location, and class of an object in the scene;

performing, by a processor, motion detection (104) in the received sequence of digital images to determine the presence and salience of motion in the scene;

automatically selecting (106), by a processor, a media file type on the basis of: the determined presence and class of an object in the scene, and presence and salience of motion in the scene; and

creating (107) a media file of the selected media file type from the received sequence of digital images;

wherein the media file type is selected from a still image and a cinemagraph, and

wherein the cinemagraph comprises a still image and moving images, combining static scenes with repeating motion associated with an object, and

characterized in that:

cinemagraph is selected (303, 313) as the media file type if an object of at least one predetermined class is present in the scene and the salient motion in the location of the object repeats and is below a predetermined intensity level; and

still image is selected (302, 312) as the media file type if no objects can be classified in the scene, or no motion is determined to be present in the scene, or if motion is detected in a majority of the scene and exceeds the predetermined intensity level.


 
2. A method as defined in claim 1, wherein the sequence of digital images of a scene is received from a digital camera used to capture the sequence of digital images.
 
3. A method as defined in claim 2, comprising:

buffering (201) three or more frames of the received sequence of digital images of a scene in a memory, and

receiving (202) a user command to create a media file,

wherein the object classification and motion detection are performed in the buffered five or more frames preceding the user command to create a media file.
 
4. A method as defined in claim 3, comprising monitoring (204) stability and/or tilting state of the camera used to capture the digital images,
wherein the conditions for selecting a media file type also comprise the measured stability and/or tilting of the camera used to capture the digital images.
 
5. A method as defined in claim 4, wherein a still image (302) is selected as the media file type if the monitored stability and/or tilting state of the camera exceed a predetermined threshold at the time when the user command to create a media file is received.
 
6. A method as defined in claim 1, wherein a cinemagraph is selected as the media file type if salient motion is detected in two or more areas of the scene, the method further comprising receiving a user selection of at least one of the two or more areas of the scene wherein salient motion was detected, and creating a cinemagraph with localized movement in the one or more areas of the scene selected by the user.
 
7. A method as defined in claim 1, wherein the sequence of digital images of a scene is received from a pre-recorded video.
 
8. A method as defined in claim 7, comprising creating two or more media files of the selected media file type from two or more parts of the received pre-recorded video.
 
9. A method as defined in claim 1, wherein, if the presence of an object is detected in the scene, the method further comprises tracking the detected object in the received sequence of digital images.
 
10. An apparatus (400), comprising an image processing unit (401) configured to:

receive a sequence of digital images (410) of a scene (420);

perform object classification (421) in at least one of the received digital images to determine the presence, location, and class of an object in the scene;

perform motion detection (422) in the received sequence of digital images to determine the presence and salience of motion in the scene;

select a media file type on the basis of: the determined presence and class of an object in the scene, and presence and salience of motion of the object in the scene; and

create a media file of the selected media file type from the received sequence of digital images,

wherein the media file type is selected from a still image and a cinemagraph, and

wherein the cinemagraph comprises a still image and moving images, combining static scenes with repeating motions associated with an object, and

characterized in that:

cinemagraph is selected as the media file type if an object of at least one predetermined class is present in the scene and the salient motion in the location of the object repeats and is below a predetermined intensity level; and

still image is selected as the media file type if no objects can be classified in the scene, or no motion is determined to be present in the scene, or if motion is detected in a majority of the scene and exceeds the predetermined intensity level.


 
11. An apparatus as defined in claim 10, comprising a digital camera unit comprising a viewfinder configured to capture digital images of a scene, the image processing unit being connected to the digital camera unit to receive the captured digital images of a scene.
 


Ansprüche

1. Verfahren, umfassend:

Empfangen (101) einer Abfolge digitaler Bilder einer Szene;

Durchführen, durch einen Prozessor, einer Objektklassifizierung (102) in mindestens einem der empfangenen digitalen Bilder, um das Vorhandensein, den Ort und die Klasse eines Objekts in der Szene zu bestimmen;

Durchführen, durch einen Prozessor, einer Bewegungserkennung (104) in der empfangenen Abfolge digitaler Bilder, um das Vorhandensein und Auffälligkeiten von Bewegung in der Szene zu bestimmen;

automatisches Auswählen (106), durch einen Prozessor, eines Mediendateityps auf Basis von: dem bestimmten Vorhandensein und der Klasse eines Objekts in der Szene und dem Vorhandensein und Auffälligkeiten von Bewegung in der Szene; und

Erstellen (107) einer Mediendatei des ausgewählten Mediendateityps aus der empfangenen Abfolge digitaler Bilder;

wobei der Mediendateityp aus einem Standbild und einem Cinemagramm ausgewählt wird, und

wobei das Cinemagramm ein Standbild und bewegte Bilder umfasst, die statische Szenen mit sich wiederholender Bewegung, die einem Objekt zugeordnet ist, kombinieren, und

dadurch gekennzeichnet, dass:

Cinemagramm als Mediendateityp ausgewählt wird (303, 313), wenn ein Objekt mindestens einer vorgegebenen Klasse in der Szene vorhanden ist und die auffällige Bewegung am Ort des Objekts sich wiederholt und unter einem vorgegebenen Intensitätsgrad liegt; und

Standbild als Mediendateityp ausgewählt wird (302, 312), wenn keine Objekte in der Szene klassifiziert werden können, oder bestimmt wird, dass keine Bewegung in der Szene vorhanden ist, oder wenn in einem Großteil der Szene Bewegung erkannt wird und den vorgegebenen Intensitätsgrad überschreitet.


 
2. Verfahren nach Anspruch 1, wobei die Abfolge digitaler Bilder einer Szene von einer Digitalkamera empfangen wird, die zum Aufnehmen der Abfolge digitaler Bilder verwendet wird.
 
3. Verfahren nach Anspruch 2, umfassend:

Puffern (201) von drei oder mehr Einzelbildern der empfangenen Abfolge digitaler Bilder einer Szene in einem Speicher, und

Empfangen (202) eines Benutzerbefehls zum Erstellen einer Mediendatei,

wobei die Objektklassifizierung und Bewegungserkennung in den gepufferten fünf oder mehr Einzelbildern durchgeführt werden, die dem Benutzerbefehl zum Erstellen einer Mediendatei vorangehen.


 
4. Verfahren nach Anspruch 3, umfassend Überwachen (204) von Stabilität und/oder Neigungszustand der Kamera, die zum Aufnehmen der digitalen Bilder verwendet wird,
wobei die Bedingungen zum Auswählen eines Mediendateityps auch die gemessene Stabilität und/oder die Neigung der Kamera umfassen, die zum Aufnehmen der digitalen Bilder verwendet wird.
 
5. Verfahren nach Anspruch 4, wobei ein Standbild (302) als der Mediendateityp ausgewählt wird, wenn die überwachte Stabilität und/oder der Neigungszustand der Kamera zu dem Zeitpunkt, wenn der Benutzerbefehl zum Erstellen einer Mediendatei empfangen wird, eine vorgegebene Schwelle überschreitet.
 
6. Verfahren nach Anspruch 1, wobei ein Cinemagramm als Mediendateityp ausgewählt wird, wenn in zwei oder mehr Bereichen der Szene eine auffällige Bewegung erkannt wird, wobei das Verfahren weiter Empfangen einer Benutzerauswahl von mindestens einem der zwei oder mehr Bereiche der Szene, in dem auffällige Bewegung erkannt wurde, und Erstellen eines Cinemagramms mit lokalisierter Bewegung in einem oder mehreren Bereichen der Szene, die von dem Benutzer ausgewählt wurden, umfasst.
 
7. Verfahren nach Anspruch 1, wobei die Abfolge digitaler Bilder einer Szene aus einem zuvor aufgezeichneten Video empfangen wird.
 
8. Verfahren nach Anspruch 7, das Erstellen von zwei oder mehr Mediendateien des ausgewählten Mediendateityps aus zwei oder mehr Teilen des empfangenen, zuvor aufgezeichneten Videos umfasst.
 
9. Verfahren nach Anspruch 1, wobei, wenn das Vorhandensein eines Objekts in der Szene erkannt wird, das Verfahren weiter Verfolgen des erkannten Objekts in der empfangenen Abfolge digitaler Bilder umfasst.
 
10. Einrichtung (400), die eine Bildverarbeitungseinheit (401) umfasst, die für Folgendes konfiguriert ist:

Empfangen einer Abfolge digitaler Bilder (410) einer Szene (420);

Durchführen von Objektklassifizierung (421) in mindestens einem der empfangenen digitalen Bilder, um das Vorhandensein, den Ort und die Klasse eines Objekts in der Szene zu bestimmen;

Durchführen von Bewegungserkennung (422) in der empfangenen Abfolge digitaler Bilder, um das Vorhandensein und Auffälligkeiten von Bewegung in der Szene zu bestimmen;

Auswählen eines Mediendateityps auf Basis von: dem bestimmten Vorhandensein und der Klasse eines Objekts in der Szene und Vorhandensein und Auffälligkeiten von Bewegung des Objekts in der Szene; und

Erstellen einer Mediendatei des ausgewählten Mediendateityps aus der empfangenen Abfolge digitaler Bilder,

wobei der Mediendateityp aus einem Standbild und einem Cinemagramm ausgewählt wird, und

wobei das Cinemagramm ein Standbild und bewegte Bilder umfasst, die statische Szenen mit sich wiederholenden Bewegungen, die einem Objekt zugeordnet sind, kombinieren, und

dadurch gekennzeichnet, dass:

Cinemagramm als Mediendateityp ausgewählt wird, wenn ein Objekt mindestens einer vorgegebenen Klasse in der Szene vorhanden ist und die auffällige Bewegung an dem Ort des Objekts sich wiederholt und unter einem vorgegebenen Intensitätsgrad liegt; und

Standbild als Mediendateityp ausgewählt wird, wenn keine Objekte in der Szene klassifiziert werden können, oder bestimmt wird, dass keine Bewegung in der Szene vorhanden ist, oder wenn in einem Großteil der Szene Bewegung erkannt wird und den vorgegebenen Intensitätsgrad überschreitet.


 
11. Einrichtung nach Anspruch 10, umfassend eine Digitalkameraeinheit, die einen Sucher umfasst, der zum Aufnehmen digitaler Bilder einer Szene konfiguriert ist, wobei die Bildverarbeitungseinheit mit der Digitalkameraeinheit verbunden ist, um die aufgenommenen digitalen Bilder einer Szene zu empfangen.
 


Revendications

1. Procédé, comprenant :

la réception (101) d'une séquence d'images numériques d'une scène ;

la réalisation, par un processeur, d'une classification d'objet (102) dans au moins une des images numériques reçues pour déterminer la présence, la position et la classe d'un objet dans la scène ;

la réalisation, par un processeur, d'une détection de mouvement (104) dans la séquence reçue d'images numériques pour déterminer la présence et l'importance du mouvement dans la scène ;

la sélection automatique (106), par un processeur, d'un type de fichier multimédia sur la base de : la présence et la classe déterminées d'un objet dans la scène, et la présence et l'importance du mouvement dans la scène ; et

la création (107) d'un fichier multimédia du type de fichier multimédia sélectionné à partir de la séquence reçue d'images numériques ;

dans lequel le type de fichier multimédia est sélectionné parmi une image fixe et une cinémagraphie, et

dans lequel la cinémagraphie comprend une image fixe et des images animées, combinant des scènes statiques avec un mouvement répété associé à un objet, et

caractérisé en ce que :

une cinémagraphie est sélectionnée (303, 313) comme type de fichier multimédia si un objet d'au moins une classe prédéterminée est présent dans la scène et que le mouvement important à la position de l'objet se répète et est inférieur à un niveau d'intensité prédéterminé ; et

une image fixe est sélectionnée (302, 312) comme type de fichier multimédia si aucun objet ne peut être classé dans la scène, ou si aucun mouvement n'est déterminé comme étant présent dans la scène, ou si un mouvement est détecté dans une majorité de la scène et dépasse le niveau d'intensité prédéterminé.


 
2. Procédé selon la revendication 1, dans lequel la séquence d'images numériques d'une scène est reçue d'une caméra numérique utilisée pour capturer la séquence d'images numériques.
 
3. Procédé selon la revendication 2, comprenant :

la mise en mémoire tampon (201) de trois trames ou plus de la séquence reçue d'images numériques d'une scène dans une mémoire, et

la réception (202) d'une commande utilisateur pour créer un fichier multimédia,

dans lequel la classification d'objet et la détection de mouvement sont réalisées dans les cinq trames mises en mémoire tampon ou plus précédant la commande utilisateur pour créer un fichier multimédia.


 
4. Procédé selon la revendication 3, comprenant la surveillance (204) de la stabilité et/ou l'état d'inclinaison de la caméra utilisée pour capturer les images numériques,
dans lequel les conditions de sélection d'un type de fichier multimédia comprennent également la stabilité et/ou l'inclinaison mesurées de la caméra utilisée pour capturer les images numériques.
 
5. Procédé selon la revendication 4, dans lequel une image fixe (302) est sélectionnée en tant que type de fichier multimédia si la stabilité et/ou l'état d'inclinaison surveillés de la caméra dépassent un seuil prédéterminé au moment où la commande utilisateur pour créer un fichier multimédia est reçue.
 
6. Procédé selon la revendication 1, dans lequel une cinémagraphie est sélectionnée comme type de fichier multimédia si un mouvement important est détecté dans deux zones ou plus de la scène, le procédé comprenant en outre la réception d'une sélection utilisateur d'au moins l'une des deux zones ou plus de la scène dans lequel un mouvement important a été détecté, et la création d'une cinémagraphie avec un mouvement localisé dans les une ou plusieurs zones de la scène sélectionnées par l'utilisateur.
 
7. Procédé selon la revendication 1, dans lequel la séquence d'images numériques d'une scène est reçue à partir d'une vidéo préenregistrée.
 
8. Procédé selon la revendication 7, comprenant la création de deux fichiers multimédia ou plus du type de fichier multimédia sélectionné à partir de deux parties ou plus de la vidéo préenregistrée reçue.
 
9. Procédé selon la revendication 1, dans lequel, si la présence d'un objet est détectée dans la scène, le procédé comprend en outre le suivi de l'objet détecté dans la séquence reçue d'images numériques.
 
10. Appareil (400), comprenant une unité de traitement d'image (401) configurée pour :

recevoir une séquence d'images numériques (410) d'une scène (420) ;

réaliser une classification d'objet (421) dans au moins une des images numériques reçues pour déterminer la présence, la position et la classe d'un objet dans la scène ;

réaliser une détection de mouvement (422) dans la séquence reçue d'images numériques pour déterminer la présence et l'importance du mouvement dans la scène ;

sélectionner un type de fichier multimédia sur la base de : la présence et la classe déterminées d'un objet dans la scène, et la présence et l'importance du mouvement de l'objet dans la scène ; et

créer un fichier multimédia du type de fichier multimédia sélectionné à partir de la séquence d'images numériques reçue,

dans lequel le type de fichier multimédia est sélectionné parmi une image fixe et une cinémagraphie, et

dans lequel la cinémagraphie comprend une image fixe et des images animées, combinant des scènes statiques avec des mouvements répétés associés à un objet, et

caractérisé en ce que :

une cinémagraphie est sélectionnée comme type de fichier multimédia si un objet d'au moins une classe prédéterminée est présent dans la scène et que le mouvement important à la position de l'objet se répète et est inférieur à un niveau d'intensité prédéterminé ; et

une image fixe est sélectionnée comme type de fichier multimédia si aucun objet ne peut être classé dans la scène, ou si aucun mouvement n'est déterminé comme étant présent dans la scène, ou si un mouvement est détecté dans une majorité de la scène et dépasse le niveau d'intensité prédéterminé.


 
11. Appareil selon la revendication 10, comprenant une unité caméra numérique comprenant un viseur configuré pour capturer des images numériques d'une scène, l'unité de traitement d'image étant connectée à l'unité caméra numérique pour recevoir les images numériques capturées d'une scène.
 




Cited references

REFERENCES CITED IN THE DESCRIPTION



This list of references cited by the applicant is for the reader's convenience only. It does not form part of the European patent document. Even though great care has been taken in compiling the references, errors or omissions cannot be excluded and the EPO disclaims all liability in this regard.

Non-patent literature cited in the description