(19)
(11) EP 4 030 772 A1

(12) EUROPEAN PATENT APPLICATION
published in accordance with Art. 153(4) EPC

(43) Date of publication:
20.07.2022 Bulletin 2022/29

(21) Application number: 21817039.7

(22) Date of filing: 30.03.2021
(51) International Patent Classification (IPC): 
H04N 21/858(2011.01)
(52) Cooperative Patent Classification (CPC):
H04N 21/233; H04N 21/4402; H04N 21/439; H04N 19/61; H04N 21/2343; H04N 21/44; H04N 21/845; H04N 21/858
(86) International application number:
PCT/CN2021/084179
(87) International publication number:
WO 2021/244116 (09.12.2021 Gazette 2021/49)
(84) Designated Contracting States:
AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR
Designated Extension States:
BA ME
Designated Validation States:
KH MA MD TN

(30) Priority: 04.06.2020 CN 202010501593

(71) Applicant: Tencent Technology (Shenzhen) Company Limited
Shenzhen City, Guangdong 518057 (CN)

(72) Inventors:
  • HU, Ying
    Shenzhen, Guangdong 518057 (CN)
  • XU, Xiaozhong
    Shenzhen, Guangdong 518057 (CN)
  • LIU, Shan
    Shenzhen, Guangdong 518057 (CN)

(74) Representative: Gunzelmann, Rainer 
Wuesthoff & Wuesthoff Patentanwälte PartG mbB Schweigerstraße 2
81541 München (DE)

   


(54) DATA PROCESSING METHOD, DEVICE AND APPARATUS FOR IMMERSIVE MEDIA, AND STORAGE MEDIUM


(57) Provided in embodiments of the present invention are a data processing method for immersive media and a related apparatus. The method comprises: acquiring a media file format data box of immersive media, the media file format data box comprising a scaling policy for the ith scaling region of the immersive media in a target scaling mode, where i is a positive integer; and scaling the ith scaling region of the immersive media according to the media file format data box.




Description

RELATED APPLICATION



[0001] This application claims priority to Chinese Patent Application No. 202010501593.5 filed on June 4, 2020, which is incorporated herein by reference in its entirety.

FIELD OF THE TECHNOLOGY



[0002] This disclosure relates to the field of computer technologies and the field of virtual reality (VR) technologies, and in particular, to a data processing method, apparatus, and device for immersive media, and a computer-readable storage medium.

BACKGROUND OF THE DISCLOSURE



[0003] Immersive media transmission solutions in the related art support user-initiated zooming of immersive media. For video content that supports zooming at a specific playback time in a specific screen region, a server may prepare videos of a plurality of zoom ratio versions for that region. When a user performs a zoom operation, the content playback device requests the videos of all the zoom ratio versions from the server, and the video finally presented at a specific zoom ratio and resolution is determined by the user's zoom behavior. However, the zoom behavior depends entirely on the actual zoom operation of the user. Because the zoom behavior of the user cannot be known in advance, the content playback device needs to request videos at all zoom resolutions before the user performs zooming, which inevitably causes bandwidth waste.

SUMMARY



[0004] Embodiments of this disclosure include data processing methods, apparatuses, and devices for immersive media, and computer-readable storage media, for example, to save transmission bandwidth.

[0005] The embodiments of this disclosure provide a data processing method for immersive media, applicable to a computer device, the method including:

obtaining a media file format data box of immersive media, the media file format data box including a zoom policy of the ith zoom region of the immersive media in a target zoom mode, i being a positive integer; and

performing zoom processing on the ith zoom region of the immersive media according to the media file format data box.



[0006] In the embodiments of this disclosure, a media file format data box of immersive media is obtained, the media file format data box including a zoom policy of the ith zoom region of the immersive media in a target zoom mode, i being a positive integer; and zoom processing is performed on the ith zoom region of the immersive media according to the media file format data box. In view of the above, in the target zoom mode, a client does not need to request encapsulated files of all zoom resolution versions, thereby saving the transmission bandwidth.

[0007] The embodiments of this disclosure further provide a data processing method for immersive media, applicable to a content production device, the method including:

obtaining zoom information of immersive media;

configuring a media file format data box of the immersive media according to the zoom information of the immersive media, the media file format data box including a zoom policy of the ith zoom region of the immersive media in a target zoom mode, i being a positive integer; and

adding the media file format data box of the immersive media into an encapsulated file of the immersive media.



[0008] In the embodiments of this disclosure, a media file format data box is configured according to immersive media and zoom information of the immersive media, and the media file format data box of the immersive media is added into an encapsulated file of the immersive media. Therefore, a content playback device can request, according to the media file format data box, a video file corresponding to a target zoom mode at a current resolution from a server and consume it without requesting videos of all zoom resolution versions, thereby saving the transmission bandwidth.

[0009] The embodiments of this disclosure provide a data processing method for immersive media, applicable to a content playback device, the method including:

obtaining an encapsulated file of immersive media, the encapsulated file including a media file format data box of the immersive media, the media file format data box including a zoom policy of the ith zoom region of the immersive media in a target zoom mode, i being a positive integer; and

parsing the encapsulated file, and displaying the parsed immersive media; and

performing zoom processing on the ith zoom region of the immersive media according to the media file format data box in response to displaying the ith zoom region of the immersive media.



[0010] In the embodiments of this disclosure, an encapsulated file of immersive media is parsed to obtain a media file format data box of the immersive media, and zoom processing is performed on the ith zoom region of the immersive media according to the media file format data box. In view of the above, in the target zoom mode, a content playback device (client) does not need to request videos of all zoom resolution versions, thereby saving the transmission bandwidth. In addition, when the client consumes a video file corresponding to a target zoom mode at a current resolution, the client automatically presents, according to the target zoom mode, a zoom effect specified by an immersive media content producer, so that a user can obtain a best viewing experience.
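The playback-side behavior described above can be sketched as follows. This is only an illustrative model, not the disclosure's implementation: all class, field, and function names here (ZoomPolicy, MediaFileFormatBox, play, and so on) are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class ZoomPolicy:
    region_index: int    # i, the index of the zoom region (positive integer)
    zoom_ratio: int      # zoom ratio to apply to region i

@dataclass
class MediaFileFormatBox:
    target_zoom_mode: bool        # whether the producer-defined zoom mode applies
    policies: List[ZoomPolicy]    # one zoom policy per zoom region

def play(encapsulated_file: Dict) -> List[int]:
    """Parse the encapsulated file and, while displaying each zoom region,
    apply the zoom policy carried in the media file format data box."""
    box: MediaFileFormatBox = encapsulated_file["media_file_format_box"]
    zoomed = []
    if box.target_zoom_mode:
        for policy in box.policies:
            # Zoom region i per the producer-specified policy instead of
            # requesting every zoom-resolution version from the server.
            zoomed.append(policy.region_index)
    return zoomed

file = {"media_file_format_box": MediaFileFormatBox(
    target_zoom_mode=True,
    policies=[ZoomPolicy(1, 8), ZoomPolicy(2, 16)])}
regions = play(file)   # -> [1, 2]
```

In this sketch, only the regions covered by the data box's zoom policies are zoomed, which is what makes requesting all zoom-resolution versions unnecessary.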

[0011] The embodiments of this disclosure provide a data processing apparatus for immersive media, including:

an obtaining unit, configured to obtain a media file format data box of immersive media, the media file format data box including a zoom policy of the ith zoom region of the immersive media in a target zoom mode, i being a positive integer; and

a processing unit, configured to perform zoom processing on the ith zoom region of the immersive media according to the media file format data box.



[0012] The embodiments of this disclosure provide another data processing apparatus for immersive media, including:

an obtaining unit, configured to obtain zoom information of immersive media; and

a processing unit, configured to configure a media file format data box of the immersive media according to the zoom information of the immersive media, the media file format data box including a zoom policy of the ith zoom region of the immersive media in a target zoom mode, i being a positive integer; and add the media file format data box of the immersive media into an encapsulated file of the immersive media.



[0013] The embodiments of this disclosure provide another data processing apparatus for immersive media, including:

an obtaining unit, configured to obtain an encapsulated file of immersive media, the encapsulated file including a media file format data box of the immersive media, the media file format data box including a zoom policy of the ith zoom region of the immersive media in a target zoom mode, i being a positive integer; and

a processing unit, configured to parse the encapsulated file and display the parsed immersive media; and perform zoom processing on the ith zoom region of the immersive media according to the media file format data box in response to displaying the ith zoom region of the immersive media.



[0014] The embodiments of this disclosure provide a data processing device for immersive media, including:

one or more processors and one or more memories,

the one or more memories storing at least one segment of program code, the at least one segment of program code being loaded and executed by the one or more processors, to implement any of the data processing methods for immersive media provided in the embodiments of this disclosure.



[0015] The embodiments of this disclosure further provide a computer-readable storage medium, storing at least one segment of program code, the at least one segment of program code being loaded and executed by a processor, to implement any of the data processing methods for immersive media provided in the embodiments of this disclosure.

[0016] In the embodiments of this disclosure, a media file format data box and a media presentation description file of immersive media are extended to support a target (director) zoom mode, so that a content production device can formulate different zoom policies at different resolutions for a user according to an intention of an immersive media content producer, and a client requests a corresponding video file from a server according to a zoom policy corresponding to a current resolution and consumes it. In view of the above, in the target zoom mode, the client does not need to request encapsulated files of all zoom resolution versions, thereby saving the transmission bandwidth. In addition, when the client consumes an encapsulated file corresponding to a target zoom mode at a current resolution, the client automatically presents, according to the target zoom mode, a zoom effect specified by an immersive media content producer, so that the user can obtain a best viewing experience.

BRIEF DESCRIPTION OF THE DRAWINGS



[0017] 

FIG. 1A is an architectural diagram of an immersive media system according to an embodiment of this disclosure.

FIG. 1B is a basic block diagram of video encoding according to an embodiment of this disclosure.

FIG. 1C is a schematic diagram of 6DoF according to an embodiment of this disclosure.

FIG. 1D is a schematic diagram of 3DoF according to an embodiment of this disclosure.

FIG. 1E is a schematic diagram of 3DoF+ according to an embodiment of this disclosure.

FIG. 1F is a schematic diagram of input image division according to an embodiment of this disclosure.

FIG. 2 is a schematic diagram of the ith zoom region according to an embodiment of this disclosure.

FIG. 3 is a flowchart of a data processing method for immersive media according to an embodiment of this disclosure.

FIG. 4 is a flowchart of another data processing method for immersive media according to an embodiment of this disclosure.

FIG. 5 is a flowchart of another data processing method for immersive media according to an embodiment of this disclosure.

FIG. 6 is a schematic structural diagram of a data processing apparatus for immersive media according to an embodiment of this disclosure.

FIG. 7 is a schematic structural diagram of another data processing apparatus for immersive media according to an embodiment of this disclosure.

FIG. 8 is a schematic structural diagram of a content production device according to an embodiment of this disclosure.

FIG. 9 is a schematic structural diagram of a content playback device according to an embodiment of this disclosure.


DESCRIPTION OF EMBODIMENTS



[0018] Technical solutions in embodiments of this disclosure are clearly and completely described below with reference to the accompanying drawings in the embodiments of this disclosure. The described embodiments are merely some rather than all of the embodiments of this disclosure. Other embodiments obtained by a person of ordinary skill in the art based on the embodiments of this disclosure without making creative efforts shall fall within the protection scope of this disclosure.

[0019] The embodiments of this disclosure relate to a data processing technology for immersive media. Immersive media refers to a media file that provides immersive media content, enabling a user immersed in the media content to obtain visual, auditory, and other sensory experiences as in the real world. In some embodiments, the immersive media may be three degrees of freedom (3DoF) immersive media, 3DoF+ immersive media, or 6DoF immersive media. The immersive media content includes video content represented in various forms in a three-dimensional (3D) space, for example, 3D video content represented in a sphere form. In some embodiments, the immersive media content may be virtual reality (VR) video content, panoramic video content, sphere video content, or 360-degree video content. Therefore, the immersive media may also be referred to as a VR video, a panoramic video, a sphere video, or a 360-degree video. In addition, the immersive media content further includes audio content synchronized with the video content represented in the 3D space.

[0020] FIG. 1A is an architectural diagram of an immersive media system according to an exemplary embodiment of this disclosure. As shown in FIG. 1A, the immersive media system includes a content production device and a content playback device. The content production device may be a computer device used by a provider of immersive media (e.g., a content producer of immersive media). The computer device may be a terminal (e.g., a personal computer (PC) or an intelligent mobile device such as a smartphone) or a server. The content playback device may be a computer device used by a user of immersive media (e.g., a viewer). The computer device may be a terminal (e.g., a PC, an intelligent mobile device such as a smartphone, or a VR device such as a VR helmet or VR glasses). A data processing procedure for immersive media includes a data processing procedure on the side of the content production device and a data processing procedure on the side of the content playback device.

[0021] The data processing procedure on the side of the content production device mainly includes: (1) obtaining and producing procedures for media content of immersive media; and (2) encoding and file encapsulation procedures for the immersive media. The data processing procedure on the side of the content playback device mainly includes: (3) file decapsulation and decoding procedures for the immersive media; and (4) a rendering procedure for the immersive media. In addition, a transmission procedure of immersive media is involved between the content production device and the content playback device. The transmission procedure may be carried out based on various transmission protocols. The transmission protocol herein may include, but is not limited to, the Dynamic Adaptive Streaming over HTTP (DASH) protocol, the HTTP Live Streaming (HLS) protocol, the Smart Media Transport Protocol (SMTP), the Transmission Control Protocol (TCP), and the like.

[0022] The procedures involved in the data processing procedures for immersive media are respectively described below in detail.

[0023] FIG. 1B is a basic block diagram of video encoding according to an exemplary embodiment of this disclosure. The procedures involved in the data processing procedures for immersive media are described in detail with reference to FIG. 1A and FIG. 1B.

I. Data processing procedure on the side of the content production device


(1) Obtaining of media content of immersive media



[0024] Media content of immersive media may be obtained in two modes: by acquiring an audio-visual scene of the real world through a capturing device, or by generating the media content through a computer. In some embodiments, the capturing device may refer to a hardware assembly disposed in the content production device. For example, the capturing device refers to a microphone, a camera, a sensor, or the like of the terminal. In some embodiments, the capturing device may alternatively be a hardware device connected to the content production device, for example, a camera connected to the server, configured to provide a service of obtaining media content of immersive media for the content production device. The capturing device may include, but is not limited to, an audio device, a camera device, and a sensing device. The audio device may include an audio sensor, a microphone, or the like. The camera device may include an ordinary camera, a stereo camera, a light field camera, or the like. The sensing device may include a laser device, a radar device, or the like. There may be a plurality of capturing devices. The capturing devices are deployed at specific positions in a real space to simultaneously capture audio content and video content from different angles in the space. The captured audio content and video content are synchronized temporally and spatially. Due to the different obtaining modes, compression encoding modes corresponding to media content of different immersive media may also be different.

(2) Production procedure of media content of immersive media



[0025] The captured audio content itself is content adapted to be audio-encoded for immersive media. Only after being subjected to a series of production procedures can the captured video content become content adapted to be video-encoded for immersive media. The production procedures include:

① Concatenation. Because the captured video content is photographed by the capturing devices from different angles, concatenation can refer to concatenating the video content photographed from the angles into a complete 360-degree visual panoramic video capable of reflecting a real space. That is, the concatenated video can be a panoramic video (or a sphere video) represented in a 3D space.

② Projection. Projection can refer to a procedure of mapping a concatenated 3D video onto a two-dimensional (2D) image. The 2D image formed through projection can be referred to as a projected image. A projection mode may include, but is not limited to, latitude and longitude map projection or regular hexahedron projection.
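As a worked illustration of the latitude and longitude map projection mentioned in ②, a viewing direction on the sphere (given by yaw and pitch angles) can be mapped to pixel coordinates on the projected 2D image. The mapping below is the standard equirectangular formula and the image dimensions are arbitrary assumptions; the disclosure does not fix a particular formula.

```python
def equirect_project(yaw_deg: float, pitch_deg: float,
                     width: int, height: int) -> tuple:
    """Map a viewing direction on the sphere to (x, y) pixel coordinates
    on the projected 2D image (latitude/longitude map projection)."""
    # Longitude (yaw) spans [-180, 180) -> x in [0, width)
    x = (yaw_deg + 180.0) / 360.0 * width
    # Latitude (pitch) spans [90, -90] top to bottom -> y in [0, height)
    y = (90.0 - pitch_deg) / 180.0 * height
    return (x, y)

# The centre of view (yaw=0, pitch=0) lands at the image centre.
centre = equirect_project(0.0, 0.0, 3840, 1920)   # -> (1920.0, 960.0)
```

Each pixel of the resulting projected image then goes through the encoding procedure described in the next subsection.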



[0026] Only a panoramic video can be captured when the foregoing capturing devices are adopted. After such a video is processed by the content production device and transmitted to the content playback device for corresponding data processing, a user on the side of the content playback device can only view 360-degree video information by performing some specific actions (e.g., rotating the head), and cannot obtain a corresponding video change by performing a non-specific action (e.g., moving the head), resulting in a poor VR experience. Therefore, depth information matching the panoramic video information needs to be additionally provided, to enable the user to obtain better immersion and a better VR experience, which involves a variety of production technologies. Common production technologies include the six degrees of freedom (6DoF) production technology. FIG. 1C is a schematic diagram of 6DoF according to an exemplary embodiment of this disclosure. 6DoF is divided into window 6DoF, omnidirectional 6DoF, and 6DoF. Window 6DoF means that rotational movements of a user around the X and Y axes are constrained, and translational movements of the user along the Z axis are constrained. For example, the user cannot see beyond the frame of a window, and the user cannot pass through the window. Omnidirectional 6DoF means that rotational movements of a user around the X, Y, and Z axes are constrained. For example, the user cannot freely pass through 3D 360-degree VR content in a constrained movement region. 6DoF means that a user can freely translate along the X, Y, and Z axes. For example, the user can freely walk in 3D 360-degree VR content. Similar to 6DoF, there are also 3DoF and 3DoF+ production technologies. FIG. 1D is a schematic diagram of 3DoF according to an exemplary embodiment of this disclosure. As shown in FIG. 1D, 3DoF means that a user is fixed at a center point of a 3D space, and the head of the user rotates around the X, Y, and Z axes to view a screen provided by media content. FIG. 1E is a schematic diagram of 3DoF+ according to an exemplary embodiment of this disclosure. As shown in FIG. 1E, 3DoF+ means that when a virtual scene provided by immersive media has specific depth information, the head of a user can move within a limited space based on 3DoF to view a screen provided by media content.

(3) Encoding procedure for media content of immersive media



[0027] The projected image may be encoded directly, or the projected image may be encoded after being regionally encapsulated. In modern mainstream video coding technologies, using High Efficiency Video Coding (HEVC), Versatile Video Coding (VVC), and Audio Video Coding Standard (AVS) as an example, a hybrid encoding framework is used to perform a series of operations and processing on an inputted original video signal as follows:
  1) Block partition structure: An inputted image is partitioned into a plurality of non-overlapping processing units according to the size of a processing unit, and similar compression operations are performed on all the processing units. Such a processing unit is referred to as a coding tree unit (CTU) or a largest coding unit (LCU). A CTU may be further partitioned more finely, to obtain one or more basic coding units, referred to as coding units (CUs). Each CU is the most basic element in an encoding process. FIG. 1F is a schematic diagram of input image division according to an embodiment of this disclosure. Various possible encoding modes for each CU are described below.
  2) Predictive coding: Predictive coding includes modes such as intra prediction and inter prediction. After the original video signal is predicted by using a selected reconstructed video signal, a residual video signal is obtained. The content production device needs to select, for a current CU, the most suitable one of a plurality of possible predictive coding modes, and inform the content playback device of the selection.
    a. Intra prediction: A predicted signal comes from a region in the same image that has been encoded and reconstructed.
    b. Inter prediction: A predicted signal comes from another image (referred to as a reference image) that has been encoded and that is different from the current image.
  3) Transform & quantization: A transform operation, such as a Discrete Fourier Transform (DFT) or a Discrete Cosine Transform (DCT), is performed on the residual video signal, to transform the signal into the transform domain, where it is represented by transform coefficients. A lossy quantization operation is performed on the signal in the transform domain, which loses some information, so that the quantized signal is beneficial to a compressed expression. In some video encoding standards, there may be more than one transform mode to select from; therefore, the content production device also needs to select one of the transform modes for a currently encoded CU, and inform the content playback device. The fineness of quantization usually depends on a quantization parameter (QP). A larger QP value means that coefficients within a larger range are quantized to the same output, which usually brings larger distortion and a lower bit rate. Conversely, a smaller QP value means that coefficients within a smaller range are quantized to the same output, which usually brings smaller distortion at a higher bit rate.
  4) Entropy coding or statistical coding: Statistical compression coding is performed on the quantized transform domain signals according to the frequency of occurrence of each value, and finally a binarized (0 or 1) compressed bitstream is outputted. In addition, entropy coding is also performed on other information generated through encoding, such as the selected mode and motion vectors, to reduce the bit rate. Statistical coding is a lossless coding mode that can effectively reduce the bit rate required for expressing the same signal. Common statistical coding modes include Variable Length Coding (VLC) and Content Adaptive Binary Arithmetic Coding (CABAC).
  5) Loop filtering: Operations of inverse quantization, inverse transform, and prediction compensation (the reverse of operations 2) to 4) above) are performed on an encoded image, to obtain a reconstructed decoded image. Compared with the original image, the reconstructed image differs in some information due to the impact of quantization, resulting in distortion. Performing a filtering operation on the reconstructed image, for example, deblocking, sample adaptive offset (SAO) filtering, or adaptive loop filter (ALF) filtering, can effectively reduce the degree of distortion produced by quantization. Because the filtered reconstructed image is used as a reference for encoding subsequent images and for predicting future signals, the foregoing filtering operation is also referred to as loop filtering, that is, a filtering operation within the encoding loop.
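The QP-versus-distortion trade-off described in step 3) can be illustrated with a simple scalar quantizer. The step-size formula step = 2^((QP-4)/6) is an HEVC-style assumption used only for illustration, not a rule stated in this disclosure.

```python
def quantize(coeffs, qp):
    """Scalar quantization with an HEVC-style step size: the step size
    doubles every 6 QP values (step = 2 ** ((qp - 4) / 6))."""
    step = 2.0 ** ((qp - 4) / 6.0)
    levels = [round(c / step) for c in coeffs]         # lossy: many-to-one
    reconstructed = [lvl * step for lvl in levels]     # inverse quantization
    distortion = sum((c - r) ** 2 for c, r in zip(coeffs, reconstructed))
    return levels, reconstructed, distortion

coeffs = [12.7, -3.4, 0.8, 25.1]
_, _, d_low = quantize(coeffs, 22)   # finer quantization, smaller step
_, _, d_high = quantize(coeffs, 37)  # coarser quantization, larger step
# The larger QP maps a wider range of coefficients to the same level,
# so the reconstruction error (distortion) grows while the bit rate drops.
```

The same dequantization step (level times step size) reappears on the decoding side described later in this document.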


[0028] When the six degrees of freedom production technology is adopted (referred to as 6DoF, in which a user can move freely in the simulated scene), a specific encoding mode (e.g., point cloud encoding) needs to be adopted in the video encoding procedure.

(4) Encapsulation procedure for immersive media



[0029] An audio bitstream and a video bitstream are encapsulated according to a file format of immersive media (e.g., International Organization for Standardization (ISO) base media file format (ISOBMFF)) into a file container to form a media file resource of the immersive media. The media file resource may be a media file or a media segment that forms a media file of the immersive media. In addition, metadata of the media file resource of the immersive media is recorded by using media presentation description (MPD) information according to requirements of the file format of the immersive media. The metadata herein is a general term for information related to presentation of the immersive media. The metadata may include description information for media content, description information for a viewport, signaling information related to presentation of the media content, and the like. As shown in FIG. 1A, the content production device will store the media presentation description information and media file resource formed after the data processing procedure.
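The ISOBMFF container mentioned above organizes data as nested "boxes", each beginning with a 32-bit big-endian size (covering the 8-byte header) and a four-character type. A minimal sketch of writing and reading one box follows; the 'ftyp' box type is a real ISOBMFF box, but the payload bytes shown are only a placeholder.

```python
import struct

def write_box(box_type: bytes, payload: bytes) -> bytes:
    """Serialize one ISOBMFF box: 4-byte big-endian size (including the
    8-byte header), then the 4-character box type, then the payload."""
    assert len(box_type) == 4
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload

def read_box(data: bytes, offset: int = 0):
    """Parse one box header and return (type, payload, next_offset)."""
    size, box_type = struct.unpack_from(">I4s", data, offset)
    payload = data[offset + 8 : offset + size]
    return box_type, payload, offset + size

blob = write_box(b"ftyp", b"isom\x00\x00\x02\x00")
box_type, payload, _ = read_box(blob)
# box_type is b"ftyp"; payload round-trips unchanged
```

Because every box is self-describing in this way, metadata boxes such as the media file format data box discussed in this disclosure can be appended to an encapsulated file without disturbing the existing boxes.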

II. Data processing procedure on the side of the content playback device:


(1) File decapsulation and decoding procedures for immersive media



[0030] The content playback device can adaptively and dynamically obtain a media file resource of immersive media and the corresponding media presentation description information from the content production device, as recommended by the content production device or according to user requirements on the side of the content playback device. For example, the content playback device may determine an orientation and a position of a user according to tracking information of the head/eyes/body of the user, and then dynamically request the corresponding media file resource from the content production device based on the determined orientation and position. The media file resource and the media presentation description information are transmitted from the content production device to the content playback device through a transmission mechanism (e.g., DASH or SMT). The file decapsulation procedure on the side of the content playback device is the reverse of the file encapsulation procedure on the side of the content production device. The content playback device decapsulates the media file resource according to the requirements of the file format of the immersive media, to obtain an audio bitstream and a video bitstream. The decoding procedure on the side of the content playback device is the reverse of the encoding procedure on the side of the content production device. The content playback device performs audio decoding on the audio bitstream to restore the audio content. The procedure of decoding the video bitstream by the content playback device includes the following: ① The video bitstream is decoded to obtain a 2D projected image. ② Reconstruction is performed on the projected image according to the media presentation description information, to convert the projected image into a 3D image. The reconstruction herein refers to re-projecting the 2D projected image into a 3D space.

[0031] It can be understood according to the foregoing encoding procedure that on the side of the content playback device, for each CU, after obtaining a compressed bitstream, the content playback device first performs entropy decoding to obtain various mode information and quantized transform coefficients. Inverse quantization and inverse transform are performed on the coefficients, to obtain a residual signal. On the other hand, a predicted signal corresponding to the CU may be obtained according to the encoding mode information, and a reconstructed signal can be obtained by adding the residual signal and the predicted signal. Finally, a loop filtering operation needs to be performed on a reconstructed value of the decoded image before a final output signal is generated.
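The per-CU reconstruction described above (inverse quantization of the residual, then adding the predicted signal) can be sketched as follows. A plain scalar dequantizer stands in for the full inverse transform chain, which is an illustrative simplification rather than the decoder of any particular standard.

```python
def reconstruct_cu(quantized_coeffs, predicted_samples, step):
    """Sketch of per-CU reconstruction: inverse-quantize the residual
    levels and add the result to the predicted signal. Loop filtering
    would follow this step before the image is output."""
    residual = [lvl * step for lvl in quantized_coeffs]   # inverse quantization
    return [p + r for p, r in zip(predicted_samples, residual)]

# A prediction of 100 plus a dequantized residual of 2*8 yields 116.
samples = reconstruct_cu([2, 0, -1], [100, 100, 100], step=8)   # -> [116, 100, 92]
```

This mirrors the encoder's quantization step: the decoder can only recover each coefficient to the nearest multiple of the step size, which is the source of the quantization distortion noted earlier.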

(2) Rendering procedure for the immersive media



[0032] The content playback device renders audio content obtained by audio decoding and a 3D image obtained by video decoding according to metadata related to rendering and the viewport in the media presentation description information, and implements playback and output of the 3D image after completing the rendering. When the 3DoF and 3DoF+ production technologies are adopted, the content playback device mainly renders the 3D image based on a current viewpoint, parallax, depth information, and the like. When the 6DoF production technology is adopted, the content playback device mainly renders the 3D image in the viewport based on the current viewpoint. The viewpoint refers to a viewing position point of the user. The parallax refers to a line-of-sight difference caused by the two eyes of the user or a line-of-sight difference generated due to a movement. The viewport refers to a viewed region.

[0033] The immersive media system supports data boxes. A data box refers to a data block or object including metadata; that is, the data box includes metadata of the corresponding media content. Immersive media may include a plurality of data boxes, for example, a rotation data box, a coverage information data box, a media file format data box, and the like. In a scenario of the immersive media system, to improve the user's viewing experience, a content producer usually adds more diverse presentation forms for the media content of immersive media, and zooming is one of the important presentation forms. The zoom policy can be configured in the media file format data box of the immersive media, for example, in an ISOBMFF data box. Description information corresponding to the zoom policy may be configured in a zoom description signaling file, for example, in a sphere region zooming descriptor or a 2D region zooming descriptor. According to related encoding standards (e.g., AVS) for immersive media, for the syntax of the media file format data box of the immersive media, reference may be made to Table 1 below:

Table 1



[0034] 
aligned(8) class RegionWiseZoomingStruct() {
    unsigned int(8) num_regions;
    for (i = 0; i < num_regions; i++) {
        unsigned int(32) zoom_reg_width[i];
        unsigned int(32) zoom_reg_height[i];
        unsigned int(32) zoom_reg_top[i];
        unsigned int(32) zoom_reg_left[i];
        unsigned int(8) zoom_ratio;
        unsigned int(8) zoom_algorithm_type;
        unsigned int(8) zoom_symbolization_type;
        unsigned int(8) zoom_area_type;
    }
}
string zoom_description;

[0035] The semantic meanings of the syntax shown in Table 1 above are as follows: num_regions indicates a quantity of sphere regions corresponding to a same omnidirectional video or a quantity of zoom regions in 2D regions on a projected image. zoom_reg_width[i] indicates a width of the ith zoom region. zoom_reg_height[i] indicates a height of the ith zoom region. zoom_reg_top[i] indicates a vertical offset of the ith zoom region. zoom_reg_left[i] indicates a horizontal offset of the ith zoom region. FIG. 2 is a schematic diagram of the ith zoom region according to an exemplary embodiment of this disclosure. As shown in FIG. 2, 201 represents a width of a projected image to which the ith zoom region belongs, 202 represents a height of the projected image to which the ith zoom region belongs, 203 represents the horizontal offset zoom_reg_left[i] of the ith zoom region, 204 represents the vertical offset zoom_reg_top[i] of the ith zoom region, 205 represents the height zoom_reg_height[i] of the ith zoom region, and 206 represents the width zoom_reg_width[i] of the ith zoom region. zoom_ratio indicates a zoom ratio of the ith zoom region and is in units of 2⁻³, i being a positive integer. When a value of zoom_ratio is set to 0, it indicates that a size of the ith zoom region after zoom processing is performed thereon is the same as a size thereof on which no zoom processing is performed. When the value of zoom_ratio is set to non-0, the value of zoom_ratio indicates an actual ratio or an approximate ratio between the size of the ith zoom region after zoom processing is performed thereon and the size (original size) thereof on which no zoom processing is performed. zoom_algorithm_type indicates a zoom algorithm type used when the ith zoom region is rendered. A mapping relationship between a value of zoom_algorithm_type and the zoom algorithm type is shown in Table 2:
Table 2
Value Description
0 Raised zoom
1 Spherical zoom (ensuring a minimal center distortion)
2 Disc-shaped uniform zoom
3..255 Undefined


[0036] zoom_symbolization_type indicates a boundary symbol type of the ith zoom region. zoom_area_type indicates a type of the ith zoom region, and a mapping relationship between a value of zoom_area_type and the type of the zoom region is shown in Table 3:
Table 3
Value Description
0 Zoom region for director editing, that is, zooming a video according to a creative intention of a content provider
1 Zoom region selected according to measurement results of viewing statistics
2..239 Reserved
240..255 Undefined


[0037] zoom_description carries text description of the ith zoom region.
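As an illustration of the semantics above, the RegionWiseZoomingStruct of Table 1 could be parsed as sketched below. The big-endian byte order (conventional for ISOBMFF boxes) and the null-terminated encoding of zoom_description are assumptions of this sketch, not mandated by the table:

```python
import struct

def parse_region_wise_zooming(buf: bytes) -> dict:
    """Sketch parser for the RegionWiseZoomingStruct of Table 1.

    Field names and widths follow the table; byte order is assumed
    big-endian, and zoom_description is assumed null-terminated.
    """
    pos = 0
    (num_regions,) = struct.unpack_from(">B", buf, pos)
    pos += 1
    regions = []
    for _ in range(num_regions):
        # 4 x unsigned int(32): width, height, vertical offset, horizontal offset
        width, height, top, left = struct.unpack_from(">IIII", buf, pos)
        pos += 16
        # 4 x unsigned int(8): ratio, algorithm type, symbolization type, area type
        ratio, algo, symbol, area = struct.unpack_from(">BBBB", buf, pos)
        pos += 4
        regions.append({
            "zoom_reg_width": width, "zoom_reg_height": height,
            "zoom_reg_top": top, "zoom_reg_left": left,
            "zoom_ratio": ratio,                 # in units of 2^-3
            "zoom_algorithm_type": algo,         # see Table 2
            "zoom_symbolization_type": symbol,
            "zoom_area_type": area,              # see Table 3
        })
    # Trailing text description of the zoom region.
    end = buf.index(b"\x00", pos)
    description = buf[pos:end].decode("utf-8")
    return {"num_regions": num_regions, "regions": regions,
            "zoom_description": description}
```

For example, a zoom_ratio value of 16 would correspond to a scale factor of 16 × 2⁻³ = 2, that is, doubling the region size.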

[0038] Description information corresponding to a zoom policy of a media file format data box of immersive media is stored in a zoom description signaling file of the immersive media. The zoom description signaling file may include at least one of a sphere region zooming (SphereRegionZooming, SRWZ) descriptor or a two-dimensional (2D) region zooming (2DRegionZooming, 2DWZ) descriptor.

[0039] The sphere region zooming (SphereRegionZooming, SRWZ) descriptor is a supplemental property (SupplementalProperty) element of which a scheme identifier (@schemeIdUri) is equal to "urn:avs:ims:2018:srwz". The SRWZ descriptor indicates a sphere region of an omnidirectional video in an omnidirectional video track carried by a representation hierarchy corresponding thereto and one or more zoom regions of the sphere region on a projected image of the omnidirectional video.

[0040] When there is an SRWZ descriptor applicable to the representation hierarchy, and a sphere region zooming data box (SphereRegionZoomingBox) also exists in a track corresponding to the representation hierarchy, the SRWZ descriptor carries information equivalent to SphereRegionZoomingBox. The content playback device can request, according to the SRWZ descriptor, to obtain a video file corresponding to a sphere region zooming operation on the omnidirectional video. The SRWZ descriptor includes elements and properties defined in Table 4 below.
Table 4
Element and property Use Data type Description
sphRegionZoom 1 omaf: sphRegionZoomType Container element, of which a property and an element indicate a sphere region and a zoom region corresponding thereto.
sphRegionZoom @shape_type Optional xs:unsignedByte Indicate a shape type of a sphere region. If a value is set to 0, a sphere region is indicated by four large circles, and if a value is set to 1, a sphere region is indicated by two azimuths and two elevation angle circles.
sphRegionZoom@remaining _area_flag Optional xs:boolean A value of 0 represents that all sphere regions are defined by the SphRegionZoom. sphRegionInfo element, a value of 1 represents that all sphere regions except the last sphere region are defined by the SphRegionZoom. sphRegionInfo element, and the last remaining sphere region is a sphere region not covered by a sphere region set defined by the SphRegionZoom. sphRegionInfo element in a content coverage range.
sphRegionZoom@view_idc_presence_flag Optional xs:boolean A value of 0 represents that the SphRegionZoom.sphRegionInfo@view_idc property does not exist. A value of 1 represents that the SphRegionZoom.sphRegionInfo@view_idc property exists, indicating a relationship between a sphere region and a specific view (a left view, a right view, or both) or a monocular image.
sphRegionZoom@default_vi ew_idc Condition required omaf:ViewType A value of 0 represents that a sphere region is a monocular image, a value of 1 represents that a sphere region is a left view of a stereo image, a value of 2 represents that a sphere region is a right view of a stereo image, and a value of 3 represents that a sphere region includes a left view and a right view of a stereo image.
sphRegionZoom.sphRegionInfo 1..255 omaf:sphRegionInfoType Element, of which a property describes sphere region information in the sphRegionZoom element. There is at least one specified sphere region.
sphRegionZoom.sphRegionI nfo@view_idc Condition required omaf: ViewType A value of 0 represents that a sphere region is a monocular image, a value of 1 represents that a sphere region is a left view of a stereo image, a value of 2 represents that a sphere region is a right view of a stereo image, and a value of 3 represents that a sphere region includes a left view and a right view of a stereo image.
sphRegionZoom.sphRegionInfo@centre_azimuth Condition required omaf:Range1 Indicate an azimuth of a center point of a sphere region in units of 2⁻¹⁶ degrees.
sphRegionZoom.sphRegionInfo@centre_elevation Condition required omaf:Range2 Indicate an elevation angle of a center point of a sphere region in units of 2⁻¹⁶ degrees.
sphRegionZoom.sphRegionInfo@centre_tilt Condition required omaf:Range1 Indicate a tilt angle of a center point of a sphere region in units of 2⁻¹⁶ degrees.
sphRegionZoom.sphRegionInfo@azimuth_range Condition required omaf:HRange Define an azimuth range of a sphere region in units of 2⁻¹⁶ degrees using a center point thereof.
sphRegionZoom.sphRegionInfo@elevation_range Condition required omaf:HRange Define an elevation angle range of a sphere region in units of 2⁻¹⁶ degrees using a center point thereof.
sphRegionZoom.zoomInfo 1..255 omaf:zoomInfoType Element, of which a property describes zoom region information corresponding to a sphere region defined by the sphRegionZoom.sphRegionInfo element. There is at least one specified zoom region.
sphRegionZoom.zoomInfo@ zoom_region_left Condition required xs:unsignedShort Specify a horizontal coordinate of the upper left corner of a zoom region in a projected image in unit of a brightness sample.
sphRegionZoom.zoomInfo@zoom_region_top Condition required xs:unsignedShort Specify a vertical coordinate of the upper left corner of a zoom region in a projected image in unit of a brightness sample.
sphRegionZoom.zoomInfo@ zoom_region_width Condition required xs:unsignedShort Specify a width of a zoom region in a projected image in unit of a brightness sample.
sphRegionZoom.zoomInfo@zoom_region_height Condition required xs:unsignedShort Specify a height of a zoom region in a projected image in unit of a brightness sample.
SphRegionZoom.zoomInfo@ zoom_ratio Condition required xs:unsignedByte Indicate a zoom ratio of a zoom region in a projected image.
SphRegionZoom.zoomInfo@ zoom_algorithm_type Optional omaf: listofUnsignedByte Indicate a zoom algorithm of a zoom region in a projected image.
SphRegionZoom.zoomInfo@ zoom_symbolization_type Optional omaf: listofUnsignedByte Indicate a symbolized type of a zoom boundary of a zoom region in a projected image.
SphRegionZoom.zoomInfo@ zoom_description Optional xs: string Indicate description information of a zoom region in a projected image.
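The descriptor of Table 4 is carried as XML inside a DASH MPD. The sketch below assembles a minimal SRWZ SupplementalProperty element; the element and attribute spellings follow Table 4, while the omitted attributes, the absence of namespace prefixes, and the exact serialization are assumptions of this illustration:

```python
import xml.etree.ElementTree as ET

def make_srwz_descriptor(centre_azimuth, centre_elevation,
                         zoom_left, zoom_top, zoom_width, zoom_height,
                         zoom_ratio):
    """Sketch of an SRWZ SupplementalProperty element (Table 4)."""
    prop = ET.Element("SupplementalProperty",
                      schemeIdUri="urn:avs:ims:2018:srwz")
    sph = ET.SubElement(prop, "sphRegionZoom")
    # Sphere region: center point in units of 2^-16 degrees.
    ET.SubElement(sph, "sphRegionInfo", {
        "centre_azimuth": str(centre_azimuth),
        "centre_elevation": str(centre_elevation),
    })
    # Corresponding zoom region on the projected image, in brightness samples.
    ET.SubElement(sph, "zoomInfo", {
        "zoom_region_left": str(zoom_left),
        "zoom_region_top": str(zoom_top),
        "zoom_region_width": str(zoom_width),
        "zoom_region_height": str(zoom_height),
        "zoom_ratio": str(zoom_ratio),  # units of 2^-3
    })
    return prop
```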


[0041] The 2D region zooming (2DRegionZooming, 2DWZ) descriptor corresponding to a media file format data box of immersive media is a supplemental property (SupplementalProperty) element of which a scheme identifier (@schemeIdUri) is equal to "urn:mpeg:mpegI:omaf:2018:2dwz". The 2DWZ descriptor indicates a 2D region on a projected image of an omnidirectional video in an omnidirectional video track carried by a representation hierarchy corresponding thereto and one or more zoom regions of the 2D region on the projected image of the omnidirectional video.

[0042] When there is a 2DWZ descriptor applicable to the representation hierarchy, and a 2D region zooming data box (2DRegionZoomingBox) also exists in a track corresponding to the representation hierarchy, the 2DWZ descriptor carries information equivalent to 2DRegionZoomingBox. The content playback device can request, according to the 2DWZ descriptor, to obtain a video file corresponding to a 2D region zooming operation on the projected image of the omnidirectional video. The 2DWZ descriptor includes elements and properties defined in Table 5 below.
Table 5
Element and property Use Data type Description
twoDRegionZoom 1 omaf:twoDRegionZoomType Container element, of which a property and an element indicate a 2D region and a zoom region corresponding thereto.
twoDRegionZoom@remaining_area_flag Optional xs:boolean A value of 0 represents that all 2D regions are defined by the twoDRegionZoom.twoDRegionInfo element, a value of 1 represents that all 2D regions except the last 2D region are defined by the twoDRegionZoom.twoDRegionInfo element, and the last remaining 2D region is a 2D region not covered by a 2D region set defined by the twoDRegionZoom.twoDRegionInfo element in a content coverage range.
twoDRegionZoom@view_idc_presence_flag Optional xs:boolean A value of 0 represents that the twoDRegionZoom.twoDRegionInfo@view_idc property does not exist. A value of 1 represents that the twoDRegionZoom.twoDRegionInfo@view_idc property exists, indicating a relationship between a 2D region and a specific view (a left view, a right view, or both) or a monocular image.
twoDRegionZoom@default_view_idc Condition required omaf:ViewType A value of 0 represents that a 2D region is a monocular image, a value of 1 represents that a 2D region is a left view of a stereo image, a value of 2 represents that a 2D region is a right view of a stereo image, and a value of 3 represents that a 2D region includes a left view and a right view of a stereo image.
twoDRegionZoom.twoDRegionInfo 1..255 omaf:twoDRegionInfoType Element, of which a property describes 2D region information in the twoDRegionZoom element. There is at least one specified 2D region.
twoDRegionZoom.twoDRegionInfo@view_idc Condition required omaf:ViewType A value of 0 represents that a 2D region is a monocular image, a value of 1 represents that a 2D region is a left view of a stereo image, a value of 2 represents that a 2D region is a right view of a stereo image, and a value of 3 represents that a 2D region includes a left view and a right view of a stereo image.
twoDRegionZoom.twoDRegionInfo@left_offset Condition required xs:unsignedShort Specify a horizontal coordinate of the upper left corner of a 2D region in a projected image in unit of a brightness sample.
twoDRegionZoom.twoDRegionInfo@top_offset Condition required xs:unsignedShort Specify a vertical coordinate of the upper left corner of a 2D region in a projected image in unit of a brightness sample.
twoDRegionZoom.twoDRegionInfo@region_width Condition required xs:unsignedShort Specify a width of a 2D region in a projected image in unit of a brightness sample.
twoDRegionZoom.twoDRegionInfo@region_height Condition required xs:unsignedShort Specify a height of a 2D region in a projected image in unit of a brightness sample.
twoDRegionZoom.zoomInfo 1..255 omaf:zoomInfoType Element, of which a property describes zoom region information corresponding to a 2D region defined by the twoDRegionZoom.twoDRegionInfo element. There is at least one specified zoom region.
twoDRegionZoom.zoomInfo@zoom_region_left Condition required xs:unsignedShort Specify a horizontal coordinate of the upper left corner of a zoom region in a projected image in unit of a brightness sample.
twoDRegionZoom.zoomInfo@zoom_region_top Condition required xs:unsignedShort Specify a vertical coordinate of the upper left corner of a zoom region in a projected image in unit of a brightness sample.
twoDRegionZoom.zoomInfo@zoom_region_width Condition required xs:unsignedShort Specify a width of a zoom region in a projected image in unit of a brightness sample.
twoDRegionZoom.zoomInfo@zoom_region_height Condition required xs:unsignedShort Specify a height of a zoom region in a projected image in unit of a brightness sample.
twoDRegionZoom.zoomInfo@zoom_ratio Condition required xs:unsignedByte Indicate a zoom ratio of a zoom region in a projected image.
twoDRegionZoom.zoomInfo@zoom_algorithm_type Optional omaf:listofUnsignedByte Indicate a zoom algorithm of a zoom region in a projected image.
twoDRegionZoom.zoomInfo@zoom_symbolization_type Optional omaf:listofUnsignedByte Indicate a symbolized type of a zoom boundary of a zoom region in a projected image.
twoDRegionZoom.zoomInfo@zoom_description Optional xs:string Indicate description information of a zoom region in a projected image.


[0043] According to the media file format data box shown in Table 1, with reference to the description information in the sphere region zooming descriptor shown in Table 4 and the 2D region zooming descriptor shown in Table 5, only an autonomous zoom operation on immersive media by a user on the side of the content playback device can be supported. As can be learned from the above, an autonomous zoom behavior of a user may cause bandwidth waste, and a better viewing experience cannot be obtained. To save bandwidth while improving the user viewing experience, in the embodiments of this disclosure, the media file format data box and the media presentation description file of the related immersive media are extended. For the syntax of the extended media file format data box, reference may be made to Table 6 below:

Table 6



[0044] 
aligned(8) class RegionWiseZoomingStruct() {
    unsigned int(8) num_regions;
    for (i = 0; i < num_regions; i++) {
        unsigned int(32) zoom_reg_width[i];
        unsigned int(32) zoom_reg_height[i];
        unsigned int(32) zoom_reg_top[i];
        unsigned int(32) zoom_reg_left[i];
        unsigned int(8) zoom_ratio;
        unsigned int(8) zoom_algorithm_type;
        unsigned int(8) zoom_symbolization_type;
        unsigned int(8) zoom_area_type;
        string zoom_description;
    }
    unsigned bit(1) auto_zoom_flag;
    bit(7) reserved;
    if (auto_zoom_flag == 1) {
        unsigned int(8) zoom_steps;
        for (i = 0; i < zoom_steps; i++) {
            unsigned int(8) zoom_ratio;
            unsigned int(8) zoom_duration;
            unsigned int(8) zoom_duration_unit;
        }
    }
}

[0045] Semantic meanings of the extended syntax newly added to Table 6 above relative to Table 1 are the following ① to ④:

① The zoom flag field auto_zoom_flag indicates whether to enable a target zoom mode (e.g., a director zoom mode). When a value of auto_zoom_flag is set to an effective value, it indicates that the target zoom mode is enabled, that is, zoom processing needs to be performed on the ith zoom region in the target zoom mode. When the value of auto_zoom_flag is set to an ineffective value, it indicates that the target zoom mode is disabled, that is, zoom processing does not need to be performed on the ith zoom region in the target zoom mode, i being a positive integer. The effective value and the ineffective value are set according to the requirements of the encoding standard. Using the AVS standard as an example, the effective value is 1, and the ineffective value is 0.

② The zoom step field zoom_steps indicates that a quantity of zoom steps included when the zoom processing is performed on the ith zoom region of the immersive media in the target zoom mode is m, m being a positive integer; that is, it indicates that zoom processing needs to be performed on the ith zoom region in the target zoom mode m times.

③ One zoom step corresponds to one zoom ratio field zoom_ratio, so m zoom steps correspond to m zoom_ratio fields. The jth zoom_ratio field indicates a zoom ratio adopted when the jth zoom step of the zoom processing is performed on the ith zoom region of the immersive media. The zoom_ratio field is in units of 2⁻³, j being a positive integer and j≤m. When a value of the jth zoom_ratio field is 0, the jth zoom_ratio field indicates that a size of the ith zoom region of the immersive media after the jth zoom step of the zoom processing is performed thereon in the target zoom mode is the same as a size thereof before the zoom processing is performed thereon. When the value of the jth zoom_ratio field is non-0, the jth zoom_ratio field indicates that a ratio between the size of the ith zoom region of the immersive media after the jth zoom step of the zoom processing is performed thereon in the target zoom mode and the size thereof before the zoom processing is performed thereon is the value of the jth zoom_ratio field.

④ One zoom step corresponds to one zoom duration field zoom_duration and one duration unit field zoom_duration_unit, so m zoom steps correspond to m zoom_duration fields and m zoom_duration_unit fields. The jth zoom_duration field indicates a value of a duration when the jth zoom step of the zoom processing is performed on the ith zoom region of the immersive media, and the value of the zoom_duration field is a non-zero value. The jth zoom_duration_unit field is used for indicating a unit of measure of the duration when the jth zoom step of the zoom processing is performed on the ith zoom region of the immersive media, the zoom_duration_unit field being in units of seconds and being a non-zero value.
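The extended fields ① to ④ can be illustrated by the sketch below, which parses the portion of Table 6 that follows the per-region loop and derives the total duration of an automatic zoom. The bit layout (auto_zoom_flag as the top bit of a byte whose remaining 7 bits are reserved) is an assumption consistent with the `unsigned bit(1)` / `bit(7) reserved` declarations:

```python
import struct

def parse_auto_zoom_extension(buf: bytes, pos: int = 0):
    """Sketch parser for the extended fields of Table 6 (after the
    per-region loop): auto_zoom_flag, then zoom_steps step records."""
    flags = buf[pos]
    pos += 1
    auto_zoom_flag = (flags >> 7) & 1  # top bit; remaining 7 bits reserved
    steps = []
    if auto_zoom_flag == 1:
        num_steps = buf[pos]           # zoom_steps = m
        pos += 1
        for _ in range(num_steps):
            ratio, duration, unit = struct.unpack_from(">BBB", buf, pos)
            pos += 3
            steps.append({"zoom_ratio": ratio,        # units of 2^-3
                          "zoom_duration": duration,  # in zoom_duration_unit
                          "zoom_duration_unit": unit})  # seconds, non-zero
    return auto_zoom_flag, steps

def total_zoom_seconds(steps) -> int:
    # Each step lasts zoom_duration * zoom_duration_unit seconds.
    return sum(s["zoom_duration"] * s["zoom_duration_unit"] for s in steps)
```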



[0046] The zoom description signaling file includes at least one of the following: a sphere region zooming descriptor or a 2D region zooming descriptor. For semantic meanings of the extended syntax of the sphere region zooming descriptor, reference may be made to Table 7 below:
Table 7
Element and property Use Data type Description
sphRegionZoom 1 omaf: sphRegionZoomType Container element, of which a property and an element indicate a sphere region and a zoom region corresponding thereto.
sphRegionZoom @shape_type Optional xs:unsignedByte Indicate a shape type of a sphere region. If a value is set to 0, a sphere region is indicated by four large circles, and if a value is set to 1, a sphere region is indicated by two azimuths and two elevation angle circles.
sphRegionZoom@remaining _area_flag Optional xs:boolean A value of 0 represents that all sphere regions are defined by the SphRegionZoom.sphRegionInfo element, a value of 1 represents that all sphere regions except the last sphere region are defined by the SphRegionZoom.sphRegionInfo element, and the last remaining sphere region is a sphere region not covered by a sphere region set defined by the SphRegionZoom.sphRegionInfo element in a content coverage range.
sphRegionZoom @view_idc_presence_flag Optional xs:boolean A value of 0 represents that the SphRegionZoom.sphRegionInfo @view_idc property does not exist. A value of 1 represents that the SphRegionZoom.sphRegionInfo @view_idc property exists, indicating a relationship between a sphere region and a specific view (a left view, a right view, or both) or a monocular image.
sphRegionZoom@default_vi ew_idc Condition required omaf: ViewType A value of 0 represents that a sphere region is a monocular image, a value of 1 represents that a sphere region is a left view of a stereo image, a value of 2 represents that a sphere region is a right view of a stereo image, and a value of 3 represents that a sphere region includes a left view and a right view of a stereo image.
sphRegionZoom.sphRegionInfo 1..255 omaf:sphRegionInfoType Element, of which a property describes sphere region information in the sphRegionZoom element. There is at least one specified sphere region.
sphRegionZoom.sphRegionInfo@view_idc Condition required omaf:ViewType A value of 0 represents that a sphere region is a monocular image, a value of 1 represents that a sphere region is a left view of a stereo image, a value of 2 represents that a sphere region is a right view of a stereo image, and a value of 3 represents that a sphere region includes a left view and a right view of a stereo image.
sphRegionZoom.sphRegionInfo@centre_azimuth Condition required omaf:Range1 Indicate an azimuth of a center point of a sphere region in units of 2⁻¹⁶ degrees.
sphRegionZoom.sphRegionInfo@centre_elevation Condition required omaf:Range2 Indicate an elevation angle of a center point of a sphere region in units of 2⁻¹⁶ degrees.
sphRegionZoom.sphRegionInfo@centre_tilt Condition required omaf:Range1 Indicate a tilt angle of a center point of a sphere region in units of 2⁻¹⁶ degrees.
sphRegionZoom.sphRegionInfo@azimuth_range Condition required omaf:HRange Define an azimuth range of a sphere region in units of 2⁻¹⁶ degrees using a center point thereof.
sphRegionZoom.sphRegionInfo@elevation_range Condition required omaf:HRange Define an elevation angle range of a sphere region in units of 2⁻¹⁶ degrees using a center point thereof.
sphRegionZoom.zoomInfo 1..255 omaf:zoomInfoType Element, of which a property describes zoom region information corresponding to a sphere region defined by the sphRegionZoom.sphRegionInfo element. There is at least one specified zoom region.
sphRegionZoom.zoomInfo@ zoom_region_left Condition required xs:unsignedShort Specify a horizontal coordinate of the upper left corner of a zoom region in a projected image in unit of a brightness sample.
sphRegionZoom.zoomInfo@zoom_region_top Condition required xs:unsignedShort Specify a vertical coordinate of the upper left corner of a zoom region in a projected image in unit of a brightness sample.
sphRegionZoom.zoomInfo@zoom_region_width Condition required xs:unsignedShort Specify a width of a zoom region in a projected image in unit of a brightness sample.
sphRegionZoom.zoomInfo@zoom_region_height Condition required xs:unsignedShort Specify a height of a zoom region in a projected image in unit of a brightness sample.
SphRegionZoom.zoomInfo@zoom_ratio Condition required xs:unsignedByte Indicate a zoom ratio of a zoom region in a projected image.
SphRegionZoom.zoomInfo@zoom_algorithm_type Optional omaf:listofUnsignedByte Indicate a zoom algorithm of a zoom region in a projected image.
SphRegionZoom.zoomInfo@zoom_symbolization_type Optional omaf:listofUnsignedByte Indicate a symbolized type of a zoom boundary of a zoom region in a projected image.
SphRegionZoom.zoomInfo@zoom_description Optional xs:string Indicate description information of a zoom region in a projected image.
SphRegionZoom.zoomInfo@auto_zoom_flag Condition required xs:boolean Indicate whether to enable a director zoom mode.
SphRegionZoom.zoomInfo@zoom_ratio Condition required xs:unsignedByte Indicate a zoom ratio at which a zoom step is actually performed, which is in units of 2⁻³. When being 0, a value of the field indicates that the region has not been zoomed. When being non-0, the value of the field indicates an actual ratio or an approximate ratio between the size of the region after zooming and an original size thereof.
SphRegionZoom.zoomInfo@zoom_duration Condition required xs:unsignedByte Indicate a duration of each zoom step, which is in units of the zoom_duration_unit field, where a value of the field cannot be 0.
SphRegionZoom.zoomInfo@zoom_duration_unit Condition required xs:unsignedByte zoom_duration_unit indicates a unit of measure of a zoom step duration, which is in units of seconds, where a value of the field cannot be 0.


[0047] It can be learned by comparing Table 7 with Table 4 that description information of a zoom policy in the target zoom mode (e.g., the director zoom mode) is added to the extended sphere region zooming descriptor in this embodiment of this disclosure relative to the sphere region zooming descriptor in the related standard, and includes the elements and properties in Table 7 above, SphRegionZoom.zoomInfo@auto_zoom_flag, SphRegionZoom.zoomInfo@zoom_ratio, SphRegionZoom.zoomInfo@zoom_duration, and SphRegionZoom.zoomInfo@zoom_duration_unit, as well as related descriptions of the elements and properties.

[0048] For semantic meanings of the extended syntax of the 2D region zooming descriptor, reference may be made to Table 8 below:
Table 8
Element and property Use Data type Description
twoDRegionZoom 1 omaf:twoDRegionZoom Type Container element, of which a property and an element indicate a 2D region and a zoom region corresponding thereto.
twoDRegionZoom@remaining_area_flag Optional xs:boolean A value of 0 represents that all 2D regions are defined by the twoDRegionZoom.twoDRegionInfo element, a value of 1 represents that all 2D regions except the last 2D region are defined by the twoDRegionZoom.twoDRegionInfo element, and the last remaining 2D region is a 2D region not covered by a 2D region set defined by the twoDRegionZoom.twoDRegionInfo element in a content coverage range.
twoDRegionZoom@view_idc_presence_flag Optional xs:boolean A value of 0 represents that the twoDRegionZoom.twoDRegionInfo@view_idc property does not exist. A value of 1 represents that the twoDRegionZoom.twoDRegionInfo@view_idc property exists, indicating a relationship between a 2D region and a specific view (a left view, a right view, or both) or a monocular image.
twoDRegionZoom@default_view_idc Condition required omaf:ViewType A value of 0 represents that a 2D region is a monocular image, a value of 1 represents that a 2D region is a left view of a stereo image, a value of 2 represents that a 2D region is a right view of a stereo image, and a value of 3 represents that a 2D region includes a left view and a right view of a stereo image.
twoDRegionZoom.twoDRegionInfo 1..255 omaf:twoDRegionInfoType Element, of which a property describes 2D region information in the twoDRegionZoom element. There is at least one specified 2D region.
twoDRegionZoom.twoDRegionInfo@view_idc Condition required omaf:ViewType A value of 0 represents that a 2D region is a monocular image, a value of 1 represents that a 2D region is a left view of a stereo image, a value of 2 represents that a 2D region is a right view of a stereo image, and a value of 3 represents that a 2D region includes a left view and a right view of a stereo image.
twoDRegionZoom.twoDRegionInfo@left_offset Condition required xs:unsignedShort Specify a horizontal coordinate of the upper left corner of a 2D region in a projected image in unit of a brightness sample.
twoDRegionZoom.twoDRegionInfo@top_offset Condition required xs:unsignedShort Specify a vertical coordinate of the upper left corner of a 2D region in a projected image in unit of a brightness sample.
twoDRegionZoom.twoDRegionInfo@region_width Condition required xs:unsignedShort Specify a width of a 2D region in a projected image in unit of a brightness sample.
twoDRegionZoom.twoDRegionInfo@region_height Condition required xs:unsignedShort Specify a height of a 2D region in a projected image in unit of a brightness sample.
twoDRegionZoom.zoomInfo 1..255 omaf:zoomInfoType Element, of which a property describes zoom region information corresponding to a 2D region defined by the twoDRegionZoom.twoDRegionInfo element. There is at least one specified zoom region.
twoDRegionZoom.zoomInfo@ zoom_region_left Condition required xs:unsignedShort Specify a horizontal coordinate of the upper left corner of a zoom region in a projected image in unit of a brightness sample.
twoDRegionZoom.zoomInfo@zoom_region_top Condition required xs:unsignedShort Specify a vertical coordinate of the upper left corner of a zoom region in a projected image in unit of a brightness sample.
twoDRegionZoom.zoomInfo@zoom_region_width Condition required xs:unsignedShort Specify a width of a zoom region in a projected image in unit of a brightness sample.
twoDRegionZoom.zoomInfo@zoom_region_height Condition required xs:unsignedShort Specify a height of a zoom region in a projected image in unit of a brightness sample.
twoDRegionZoom.zoomInfo@ zoom_ratio Condition required xs:unsignedByte Indicate a zoom ratio of a zoom region in a projected image.
twoDRegionZoom.zoomInfo@ zoom_algorithm_type Optional omaf: listofUnsignedByte Indicate a zoom algorithm of a zoom region in a projected image.
twoDRegionZoom.zoomInfo@ zoom_symbolization_type Optional omaf: listofUnsignedByte Indicate a symbolized type of a zoom boundary of a zoom region in a projected image.
twoDRegionZoom.zoomInfo@ zoom_description Optional xs: string Indicate description information of a zoom region in a projected image.
twoDRegionZoom.zoomInfo@ auto_zoom_flag Condition required xs:boolean Indicate whether to enable a director zoom mode.
twoDRegionZoom.zoomInfo@zoom_ratio Condition required xs:unsignedByte Indicate a zoom ratio at which a zoom step is actually performed, which is in units of 2⁻³. When being 0, a value of the field indicates that the region has not been zoomed. When being non-0, the value of the field indicates an actual ratio or an approximate ratio between the size of the region after zooming and an original size thereof.
twoDRegionZoom.zoomInfo@zoom_duration Condition required xs:unsignedByte Indicate a duration of each zoom step, which is in units of the zoom_duration_unit field, where a value of the field cannot be 0.
twoDRegionZoom.zoomInfo@zoom_duration_unit Condition required xs:unsignedByte zoom_duration_unit indicates a unit of measure of a zoom step duration, which is in units of seconds, where a value of the field cannot be 0.


[0049] It can be learned by comparing Table 8 with Table 5 that description information of a zoom policy in the target zoom mode (e.g., the director zoom mode) is added to the extended 2D region zooming descriptor in this embodiment of this disclosure relative to the 2D region zooming descriptor in the related standard, and includes the elements and properties in Table 8 above, twoDRegionZoom.zoomInfo@auto_zoom_flag, twoDRegionZoom.zoomInfo@zoom_ratio, twoDRegionZoom.zoomInfo@zoom_duration, and twoDRegionZoom.zoomInfo@zoom_duration_unit, as well as related descriptions of the elements and properties.

[0050] According to the media file format data box shown in Table 6 above in the embodiments of this disclosure, with reference to the descriptions of the zoom policy in the sphere region zooming descriptor shown in Table 7 and the 2D region zooming descriptor shown in Table 8, in the target zoom mode (e.g., the director zoom mode), the user on the side of the content playback device can obtain, based on an MPD file, a video file corresponding to the current resolution on the side of the content playback device and consume it without requesting videos of all zoom resolution versions, thereby saving transmission bandwidth. In addition, when the content playback device consumes a video file corresponding to a target zoom mode at a current resolution, the content playback device automatically presents, according to the target zoom mode, a zoom effect specified by the immersive media content producer, so that the user can obtain the best viewing experience.

[0051] FIG. 3 is a flowchart of a data processing method for immersive media according to an exemplary embodiment of this disclosure. The method may be performed by the content production device or the content playback device in the immersive media system. The method includes the following steps S301 and S302:

[0052] In step S301, obtain a media file format data box of immersive media, the media file format data box including a zoom policy of the ith zoom region of the immersive media in a target zoom mode, i being a positive integer.

[0053] In step S302, perform zoom processing on the ith zoom region of the immersive media according to the media file format data box.

[0054] In steps S301 and S302, for the syntax of the media file format data box of the immersive media, reference may be made to Table 6 above. The target zoom mode refers to a mode in which zoom processing is performed on the ith zoom region according to a zoom policy when the ith zoom region in the immersive media satisfies a zoom condition (e.g., a playback progress of the immersive media reaches a preset position, or a field of view of the user turns to a preset region). The zoom policy is generated according to zoom information specified by an immersive media content producer. For example, assuming that the zoom information specified by the immersive media content producer is that, when the field of view of the user turns to the ith zoom region, the ith zoom region is enlarged to 2 times its original size, the zoom policy corresponding to the zoom information carries position information (e.g., coordinates) of the ith zoom region, a zoom condition, size information (a width and a height), and a zoom ratio.
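The zoom-policy fields discussed above can be sketched as a simple structure. This is an illustrative Python sketch, not the normative ISOBMFF syntax of Table 6; the `effective_ratio` helper is an assumption that follows the description of the ratio field (in units of 2⁻³, with 0 as the ineffective "restore original size" value) given later in this disclosure.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ZoomPolicy:
    """Sketch of the per-region zoom policy carried in the data box."""
    auto_zoom_flag: int = 0        # 1 = zoom processing is performed in the target zoom mode
    zoom_steps: int = 0            # number m of zoom steps
    zoom_ratio: List[int] = field(default_factory=list)          # one per step, units of 2^-3
    zoom_duration: List[int] = field(default_factory=list)       # one per step, non-zero
    zoom_duration_unit: List[int] = field(default_factory=list)  # in seconds, non-zero

    def effective_ratio(self, j: int) -> float:
        """Real zoom factor of the jth step (1-based); 0 means restore the original size."""
        raw = self.zoom_ratio[j - 1]
        return raw * 2 ** -3 if raw != 0 else 1.0
```

For example, a policy with `zoom_ratio=[16, 0, 32]` yields effective factors of 2.0, 1.0, and 4.0 for its three steps.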

[0055] In an implementation, the media file format data box may be an ISO base media file format (ISOBMFF) data box, and the target zoom mode may be a director zoom mode.

[0056] Before zoom processing is performed on the ith zoom region of the immersive media according to the media file format data box, a zoom description signaling file of the immersive media may be obtained first, the zoom description signaling file including description information of the zoom policy. The zoom description signaling file includes at least one of the following: a sphere region zooming descriptor or a 2D region zooming descriptor. The sphere region zooming descriptor is encapsulated in a representation hierarchy in a media presentation description file in the immersive media, and a quantity of the sphere region zooming descriptors in the representation hierarchy is less than or equal to 1. For the syntax of the sphere region zooming descriptor, reference may be made to Table 7. The 2D region zooming descriptor is encapsulated in a representation hierarchy in a media presentation description file in the immersive media, and a quantity of the 2D region zooming descriptors in the representation hierarchy is less than or equal to 1. For the syntax of the 2D region zooming descriptor, reference may be made to Table 8. After the user enables the target zoom mode, the content playback device presents the immersive media file according to the zoom description signaling file and the media file format data box of the immersive media.

[0057] In the embodiments of this disclosure, a media file format data box of immersive media is obtained, the media file format data box including a zoom policy of the ith zoom region of the immersive media in a target zoom mode, i being a positive integer; and zoom processing is performed on the ith zoom region of the immersive media according to the media file format data box. In view of the above, in the target zoom mode, a content playback device does not need to request videos of all zoom resolution versions, thereby saving the transmission bandwidth.

[0058] FIG. 4 is a flowchart of another data processing method for immersive media according to an exemplary embodiment of this disclosure. The method may be performed by the content production device in the immersive media system. The method includes the following steps S401 to S403:

[0059] In step S401, obtain zoom information of immersive media.

[0060] The zoom information is generated according to an intention of a content producer. For example, the content producer can perform zoom processing on the immersive media during the production procedure. In an implementation, the content producer may first perform zoom processing on the ith zoom region of the immersive media, for example, first zoom out on the ith zoom region for a few minutes and then zoom in for a few minutes, or zoom out by a certain factor and then zoom in by a certain factor, and so on, and then specify the zoom information according to the zoom effect of the zoom processing performed on the ith zoom region. Alternatively, when the content producer determines the resolution of the immersive media, the content producer may directly specify the zoom information according to the resolution without first performing zoom processing on the ith zoom region of the immersive media. The zoom information is used for indicating a corresponding zoom parameter when zoom processing is performed on the ith zoom region and includes, but is not limited to, a position or size (e.g., a width, a height, and coordinates) of the ith zoom region, a zoom step performed on the ith zoom region (e.g., zooming out and then zooming in), a zoom ratio (e.g., the factor by which the region is zoomed out or zoomed in), a duration of the zoom step (e.g., zooming out for a few minutes and then zooming in for a few minutes), and the like.

[0061] In step S402, configure a media file format data box of the immersive media according to the zoom information of the immersive media, the media file format data box including a zoom policy of the ith zoom region of the immersive media in a target zoom mode, i being a positive integer.

[0062] With reference to Table 6 above, the configuration procedure of step S402 may include the following (1) to (4):
  (1) The zoom policy includes a zoom flag field auto_zoom_flag. The zoom flag field is set to an effective value when the zoom information of the immersive media indicates that zoom processing needs to be performed on the ith zoom region in the target zoom mode. For example, a value of auto_zoom_flag is set to 1.
  (2) The zoom policy includes a zoom step field zoom_steps. The zoom step field is set to m when the zoom information indicates that m zoom steps need to be performed when zoom processing is performed on the ith zoom region of the immersive media in the target zoom mode, m being a positive integer.
  (3) When one zoom step corresponds to one zoom ratio field zoom_ratio, m zoom steps correspond to m zoom ratio fields. The jth zoom step in the m zoom steps corresponds to the jth zoom ratio field in the m zoom_ratio fields, j being a positive integer and j≤m. The jth zoom ratio field is set to an ineffective value when the zoom information indicates that a size of the ith zoom region of the immersive media after the jth zoom step of the zoom processing is performed thereon is the same as a size thereof before the zoom processing is performed thereon. The jth zoom ratio field is set to an effective value when the zoom information indicates that the size of the ith zoom region of the immersive media after the jth zoom step of the zoom processing is performed thereon is different from the size thereof before the zoom processing is performed thereon, the effective value being a ratio, in units of 2⁻³, between the size of the ith zoom region after the jth zoom step of the zoom processing is performed thereon and the size thereof before the zoom processing is performed thereon. For example, if, in the zoom information of the immersive media, the zoom information of the jth zoom step of the zoom processing performed on the ith zoom region indicates enlarging the ith zoom region to 2 times its original size, a value of the jth zoom ratio field in the m zoom ratio fields may be set to 16.
  (4) When one zoom step corresponds to one zoom duration zoom_duration and one unit of measure of the duration zoom_duration_unit, m zoom steps correspond to m zoom_duration fields and m zoom_duration_unit fields. The jth zoom step corresponds to the jth zoom duration field and the jth zoom duration unit field, j being a positive integer and j≤m. A value of a duration when the jth zoom step is performed on the ith zoom region as indicated in the zoom information is set as a value of the jth zoom duration field. A unit of measure of the duration when the jth zoom step is performed on the ith zoom region as indicated in the zoom information is set as a value of the jth zoom duration unit field. For example, when the zoom information of the immersive media indicates zooming in on the ith zoom region for 3 minutes when the jth zoom step of the zoom processing is performed on the ith zoom region, a value of the jth zoom duration field in the m zoom duration fields is set to 3, and a value of the jth zoom duration unit field in the m zoom duration unit fields is set to 60.
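The configuration procedure of steps (1) to (4) can be sketched as follows. This is an illustrative Python sketch under stated assumptions: the `zoom_info` input structure (a list of steps, each with a real-valued `ratio` relative to the original size and a duration in seconds) is a hypothetical representation of the producer-specified zoom information, not a format defined by this disclosure.

```python
def configure_zoom_policy(zoom_info: dict) -> dict:
    """Fill the zoom-policy fields of a media file format data box from
    producer-specified zoom information, following steps (1)-(4)."""
    steps = zoom_info["steps"]
    box = {
        "auto_zoom_flag": 1 if steps else 0,  # (1) effective value when zooming is needed
        "zoom_steps": len(steps),             # (2) m zoom steps
        "zoom_ratio": [],                     # (3) one field per step, in units of 2^-3
        "zoom_duration": [],                  # (4) value of each step's duration...
        "zoom_duration_unit": [],             # ...and its unit of measure, in seconds
    }
    for step in steps:
        ratio = step["ratio"]
        # (3) ineffective value 0 when the size equals the original, else ratio / 2^-3
        box["zoom_ratio"].append(0 if ratio == 1.0 else int(ratio * 8))
        # (4) express the duration as value x unit; whole minutes are stored as (n, 60)
        seconds = step["duration_seconds"]
        if seconds % 60 == 0:
            box["zoom_duration"].append(seconds // 60)
            box["zoom_duration_unit"].append(60)
        else:
            box["zoom_duration"].append(seconds)
            box["zoom_duration_unit"].append(1)
    return box
```

Feeding in three steps (enlarge to 2x for 180 s, restore for 240 s, enlarge to 4x for 180 s) produces the field values 16/0/32 and 3/4/3 with unit 60 described in the surrounding text.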


[0063] In addition, a zoom description signaling file of the immersive media may be further configured according to the zoom information, the zoom description signaling file including description information of the zoom policy. For the syntax of the zoom description signaling file, reference may be made to Table 7 and Table 8. For a mode of configuring the extended fields in the zoom description signaling file, reference may be made to the foregoing mode of configuring the corresponding fields in the media file format data box, and details are not described herein again.

[0064] The solution of this embodiment of this disclosure is explained below in detail using an example. Zoom information specified by an immersive media content producer for a video A is as follows: zoom a region B from the 10th minute to the 20th minute (00:10:00 to 00:20:00) of the video A; the region B is enlarged to 2 times an original size from the 10th minute to the 13th minute (00:10:00-00:13:00), the region B is restored to the original size from the 13th minute to the 17th minute (00:13:00-00:17:00), and the region B is enlarged to 4 times the original size from the 17th minute to the 20th minute (00:17:00-00:20:00). Therefore, the content production device sets, according to the zoom information specified by the content producer for the video A, a value of the zoom flag field to 1 and a value of the zoom step field to 3. A value of a zoom ratio field of a zoom step 1 is set to 16 (16×2⁻³=2), a value of a duration field is set to 3, and a value of a duration unit field is set to 60. It is to be understood that the duration is calculated as 3×60s=180s, that is, 3 minutes. Similarly, a value of a zoom ratio field of a zoom step 2 is set to 0, a value of a duration field is set to 4, and a value of a duration unit field is set to 60. A value of a zoom ratio field of a zoom step 3 is set to 32 (32×2⁻³=4), a value of a duration field is set to 3, and a value of a duration unit field is set to 60.
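The field values of the video A example decode back into real zoom factors and durations as follows (an illustrative check; the ratio field is in units of 2⁻³ and the ineffective value 0 restores the original size):

```python
# (zoom_ratio, zoom_duration, zoom_duration_unit) for the three steps of video A
fields = [(16, 3, 60), (0, 4, 60), (32, 3, 60)]

decoded = [(r * 2 ** -3 if r else 1.0, d * u) for r, d, u in fields]
print(decoded)  # [(2.0, 180), (1.0, 240), (4.0, 180)]
# i.e., x2 for 3 minutes, original size for 4 minutes, x4 for 3 minutes
```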

[0065] The content production device may configure, according to the zoom information specified by the content producer, media file format data boxes at various resolutions and corresponding zoom description signaling files for the immersive media. For example, the content production device configures, according to the zoom information specified by the content producer, a media file format data box 1 and a zoom description signaling file 1 at a 4K resolution (4096×2160 pixels) for the video A, which are used for indicating that the video A presents a zoom effect of "zoom in by 2 times→original size→zoom in by 4 times" when zoom processing is performed on the video A at the 4K resolution. In addition, the content production device configures a media file format data box 2 and a zoom description signaling file 2 at a 2K resolution for the video A, which are used for indicating that the video A presents a zoom effect of "zoom in by 1.5 times→original size→zoom in by 3 times" when zoom processing is performed on the video A at the 2K resolution.

[0066] In step S403, add the media file format data box of the immersive media into an encapsulated file of the immersive media.

[0067] In an implementation, the content production device adds immersive media with the same content but different resolutions and media file format data boxes corresponding thereto respectively to encapsulated files of the immersive media.

[0068] In some embodiments, the content production device may package all the media file format data boxes of the immersive media at different resolutions, and send the packaged file to the content playback device, so that the content playback device requests a corresponding encapsulated file according to a current resolution and the packaged file.

[0069] In the embodiments of this disclosure, the content production device configures a media file format data box according to immersive media and zoom information of the immersive media, and adds the media file format data box of the immersive media into an encapsulated file of the immersive media. Therefore, a content playback device can request, according to the media file format data box, a video file corresponding to a target zoom mode at a current resolution from a server and consume it without requesting videos of all zoom resolution versions, thereby saving the transmission bandwidth.

[0070] FIG. 5 is a flowchart of another data processing method for immersive media according to an exemplary embodiment of this disclosure. The method may be performed by the content playback device in the immersive media system. The method includes the following steps S501 to S503:

[0071] In step S501, obtain an encapsulated file of immersive media, the encapsulated file including a media file format data box of the immersive media, the media file format data box including a zoom policy of the ith zoom region of the immersive media in a target zoom mode, i being a positive integer.

[0072] In step S502, parse the encapsulated file, and display the parsed immersive media.

[0073] In an implementation, the content playback device first decapsulates the encapsulated file, to obtain an encoded file of the immersive media and the media file format data box of the immersive media, and then decodes and displays the encoded file of the immersive media.

[0074] In step S503, perform zoom processing on the ith zoom region of the immersive media according to the media file format data box in response to displaying the ith zoom region of the immersive media.

[0075] With reference to Table 6 above, the zoom processing procedure of step S503 may include the following (1)-(4):
  (1) The zoom policy includes a zoom flag field auto_zoom_flag. The content playback device performs zoom processing on the ith zoom region of the immersive media in the target zoom mode when a value of the zoom flag field is an effective value. The zoom processing may be requesting, from a server, and displaying a video corresponding to a size of the ith zoom region after the zoom processing is performed thereon.
  (2) The zoom policy includes a zoom step field zoom_steps. Zoom processing is performed on the ith zoom region of the immersive media m times in the target zoom mode when a value of the zoom step field is m, m being a positive integer. For example, the content playback device needs to perform zoom processing on the ith zoom region of the immersive media 3 times in the target zoom mode when the value of the zoom step field is 3.
  (3) When one zoom step corresponds to one zoom ratio field zoom_ratio, m zoom steps correspond to m zoom ratio fields. The jth zoom step in the m zoom steps corresponds to the jth zoom ratio field in the m zoom_ratio fields, j being a positive integer and j≤m. When a value of the jth zoom ratio field is an ineffective value, the ith zoom region is restored in the target zoom mode to its size before the zoom processing is performed thereon. When the value of the jth zoom ratio field is an effective value, the jth zoom step of the zoom processing is performed on the ith zoom region of the immersive media in the target zoom mode according to the effective value, so that a ratio between the size of the ith zoom region of the immersive media after the jth zoom step is performed thereon and the size of the ith zoom region of the immersive media before the zoom processing is performed thereon reaches the effective value.
  (4) When one zoom step corresponds to one zoom duration zoom_duration and one unit of measure of the duration zoom_duration_unit, m zoom steps correspond to m zoom durations and m units of measure of the durations. The jth zoom step corresponds to the jth zoom duration field and the jth zoom duration unit field, j being a positive integer and j≤m. The jth zoom step of the zoom processing is performed on the ith zoom region of the immersive media in the target zoom mode, with the duration of the jth zoom step indicated jointly by the jth zoom duration field and the jth zoom duration unit field. It is to be understood that, within the zoom duration, the content playback device continuously performs zoom processing on the image in the ith zoom region of the immersive media until the end of the zoom duration. For example, when the immersive media displays a total of 20 frames of images within the zoom duration, the content playback device performs zoom processing on the ith zoom regions of the 20 frames of images and displays them.
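The playback-side interpretation in steps (1) to (4) can be sketched as follows. This is an illustrative Python sketch, not the normative parsing procedure; the dictionary keys mirror the field names of Table 6.

```python
def zoom_schedule(box: dict):
    """Interpret a media file format data box per steps (1)-(4): yield
    (target_factor, duration_in_seconds) for each of the m zoom steps."""
    if box.get("auto_zoom_flag") != 1:       # (1) no zooming unless the flag is effective
        return
    m = box["zoom_steps"]                     # (2) m zoom steps
    for j in range(m):
        raw = box["zoom_ratio"][j]            # (3) units of 2^-3; 0 restores the original size
        factor = raw * 2 ** -3 if raw else 1.0
        secs = box["zoom_duration"][j] * box["zoom_duration_unit"][j]  # (4)
        yield factor, secs
```

For the encapsulated file 1 described below (16/0/32 with durations 3/4/3 in units of 60 s), this yields a x2 step for 180 s, a restore step for 240 s, and a x4 step for 180 s.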


[0076] In addition, the content playback device can obtain an MPD file of the immersive media before obtaining the encapsulated file of the immersive media. The MPD file includes zoom description signaling files at various resolutions. The content playback device obtains an encapsulated file corresponding to a current resolution on the side of the content playback device, and presents a zoom effect of the immersive media in the encapsulated file according to the implementation of the foregoing step (1) to step (4).

[0077] The solution of this embodiment of this disclosure is explained below in further detail using an example. It is assumed that both a user 1 and a user 2 have selected the director zoom mode; a basic resolution consumed by the user 1 is 4K, and the user 1 requests, from a server, a video file corresponding to a 4K resolution under a representation hierarchy in which the director zoom mode is started; and a basic resolution consumed by the user 2 is 2K, and the user 2 requests, from the server, a video file corresponding to a 2K resolution under a representation hierarchy in which the director zoom mode is started. The server receives the requests from the user 1 and the user 2, encapsulates the video files corresponding to the 4K resolution and the 2K resolution respectively, and pushes them to the user 1 and the user 2 respectively. An encapsulated file 1 of the immersive media received by the user 1 includes:

auto_zoom_flag=1; zoom_steps=3;

step1: zoom_ratio=16; zoom_duration=3; zoom_duration_unit=60;

step2: zoom_ratio=0; zoom_duration=4; zoom_duration_unit=60;

step3: zoom_ratio=32; zoom_duration=3; zoom_duration_unit=60;



[0078] An encapsulated file 2 of the immersive media received by user 2 includes:

auto_zoom_flag=1; zoom_steps=3;

step1: zoom_ratio=12; zoom_duration=3; zoom_duration_unit=60;

step2: zoom_ratio=0; zoom_duration=4; zoom_duration_unit=60;

step3: zoom_ratio=24; zoom_duration=3; zoom_duration_unit=60;



[0079] In addition, the encapsulated file 1 of the immersive media and the encapsulated file 2 of the immersive media received by the user 1 and the user 2 may further include position information and size information of a zoom region i, and a condition for performing zoom processing. Assuming that the condition for performing zoom processing is to perform zoom processing on the zoom region i when a playback progress reaches the 10th minute, a content playback device 1 used by the user 1 enlarges the zoom region i to 2 times an original size thereof from the 10th minute to the 13th minute (00:10:00-00:13:00), restores the zoom region i to the original size from the 13th minute to the 17th minute (00:13:00-00:17:00), enlarges the zoom region i to 4 times the original size from the 17th minute to the 20th minute (00:17:00-00:20:00), and ends the zoom processing at the 20th minute (00:20:00). Similarly, a content playback device 2 used by the user 2 enlarges the zoom region i to 1.5 times an original size thereof from the 10th minute to the 13th minute (00:10:00-00:13:00), restores the zoom region i to the original size from the 13th minute to the 17th minute (00:13:00-00:17:00), enlarges the zoom region i to 3 times the original size from the 17th minute to the 20th minute (00:17:00-00:20:00), and ends the zoom processing at the 20th minute (00:20:00).
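The timeline followed by content playback device 1 can be reproduced from the decoded zoom steps. This is an illustrative Python sketch; the assumption that the zoom condition fires exactly at playback position 00:10:00 comes from the example above.

```python
def fmt(seconds: int) -> str:
    """Format a playback position as HH:MM:SS."""
    return f"{seconds // 3600:02d}:{seconds % 3600 // 60:02d}:{seconds % 60:02d}"

start = 10 * 60                               # zoom condition fires at 00:10:00
steps = [(2.0, 180), (1.0, 240), (4.0, 180)]  # (factor, duration_s) decoded from file 1

segments, t = [], start
for factor, dur in steps:
    segments.append((fmt(t), fmt(t + dur), factor))
    t += dur
print(segments)
# [('00:10:00', '00:13:00', 2.0), ('00:13:00', '00:17:00', 1.0), ('00:17:00', '00:20:00', 4.0)]
```

The three segments match the x2, original-size, and x4 intervals described in the paragraph above, ending at 00:20:00.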

[0080] In the embodiments of this disclosure, the content playback device parses an encapsulated file of immersive media to obtain a media file format data box of the immersive media, and zoom processing is performed on the ith zoom region of the immersive media according to the media file format data box. In view of the above, in the target zoom mode, a content playback device does not need to request videos of all zoom resolution versions, thereby saving the transmission bandwidth. In addition, when the content playback device consumes a video file corresponding to a target zoom mode at a current resolution, the content playback device automatically presents, according to the target zoom mode, a zoom effect specified by an immersive media content producer, so that a user can obtain an improved viewing experience.

[0081] The method in the embodiments of this disclosure is described in detail above. For ease of better implementing the foregoing solutions in the embodiments of this disclosure, an apparatus in an embodiment of this disclosure is correspondingly provided in the following.

[0082] FIG. 6 is a schematic structural diagram of a data processing apparatus for immersive media according to an exemplary embodiment of this disclosure. The data processing apparatus for immersive media may be a computer program (including program code) run on a content production device. For example, the data processing apparatus for immersive media may be application software on a content production device. As shown in FIG. 6, the data processing apparatus for immersive media includes an obtaining unit 601 and a processing unit 602.

[0083] In an exemplary embodiment, the data processing apparatus for immersive media may be configured to perform the corresponding steps in the method shown in FIG. 3.

[0084] The obtaining unit 601 is configured to obtain a media file format data box of immersive media, the media file format data box including a zoom policy of the ith zoom region of the immersive media in a target zoom mode, i being a positive integer.

[0085] The processing unit 602 is configured to perform zoom processing on the ith zoom region of the immersive media according to the media file format data box.

[0086] In an implementation, the media file format data box includes an international organization for standardization base media file format data box; and the target zoom mode includes a director zoom mode.

[0087] In an implementation, the zoom policy includes a zoom flag field.

[0088] When a value of the zoom flag field is an effective value, the zoom flag field is used for indicating that zoom processing needs to be performed on the ith zoom region of the immersive media in the target zoom mode.

[0089] In an implementation, the zoom policy includes a zoom step field, a value of the zoom step field being m, m being a positive integer. The zoom step field is used for indicating that a quantity of zoom steps included when the zoom processing is performed on the ith zoom region of the immersive media in the target zoom mode is m.

[0090] In an implementation, the zoom processing includes m zoom steps, m being a positive integer. The zoom policy includes m zoom ratio fields. The jth zoom step in the m zoom steps corresponds to the jth zoom ratio field in the m zoom ratio fields, j being a positive integer and j≤m.

[0091] The jth zoom ratio field is used for indicating a zoom ratio adopted when the jth zoom step in the zoom processing is performed on the ith zoom region of the immersive media. The zoom ratio is in units of 2⁻³.

[0092] When a value of the jth zoom ratio field is an ineffective value, the jth zoom ratio field is used for indicating that a size of the ith zoom region of the immersive media after the jth zoom step of the zoom processing is performed in the target zoom mode thereon is the same as a size thereof before the zoom processing is performed thereon.

[0093] When the value of the jth zoom ratio field is an effective value, the jth zoom ratio field is used for indicating that a ratio between the size of the ith zoom region of the immersive media after the jth zoom step of the zoom processing is performed in the target zoom mode thereon and the size thereof before the zoom processing is performed thereon is the value of the jth zoom ratio field.

[0094] In an implementation, the zoom processing includes m zoom steps, m being a positive integer. The zoom policy includes m zoom duration fields and m zoom duration unit fields. The jth zoom step in the m zoom steps corresponds to the jth zoom duration field in the m zoom duration fields and the jth zoom duration unit field in the m zoom duration unit fields, j being a positive integer and j≤m.

[0095] The jth zoom duration field is used for indicating a value of a duration when the jth zoom step of the zoom processing is performed on the ith zoom region of the immersive media, the zoom duration field being a non-zero value.

[0096] The jth zoom duration unit field is used for indicating a unit of measure of the duration when the jth zoom step of the zoom processing is performed on the ith zoom region of the immersive media, the unit of measure being in unit of second, and the zoom duration unit field being a non-zero value.

[0097] In an implementation, the obtaining unit 601 is further configured to:
obtain a zoom description signaling file of the immersive media, the zoom description signaling file including description information of the zoom policy.

[0098] In an implementation, the zoom description signaling file includes at least one of the following: a sphere region zooming descriptor or a 2D region zooming descriptor.

[0099] The sphere region zooming descriptor is encapsulated in a representation hierarchy in a media presentation description file in the immersive media, and a quantity of the sphere region zooming descriptors in the representation hierarchy is less than or equal to 1.

[0100] The 2D region zooming descriptor is encapsulated in the representation hierarchy in the media presentation description file in the immersive media, and a quantity of the 2D region zooming descriptors in the representation hierarchy is less than or equal to 1.

[0101] In some embodiments, the data processing apparatus for immersive media may be configured to perform the corresponding steps in the method shown in FIG. 4.

[0102] The obtaining unit 601 is configured to obtain zoom information of immersive media.

[0103] The processing unit 602 is configured to configure a media file format data box of the immersive media according to the zoom information of the immersive media, the media file format data box including a zoom policy of the ith zoom region of the immersive media in a target zoom mode, i being a positive integer; and add the media file format data box of the immersive media into an encapsulated file of the immersive media.

[0104] In an implementation, the zoom policy includes a zoom flag field. The processing unit 602 is further configured to configure a media file format data box of the immersive media according to the zoom information of the immersive media, for example:
set the zoom flag field to an effective value when the zoom information indicates that zoom processing needs to be performed on the ith zoom region of the immersive media in the target zoom mode.

[0105] In an implementation, the zoom policy includes a zoom step field. The processing unit 602 is further configured to configure a media file format data box of the immersive media according to the zoom information of the immersive media, for example:
set the zoom step field to m when the zoom information indicates that m zoom steps need to be performed when zoom processing is performed on the ith zoom region of the immersive media in the target zoom mode, m being a positive integer.

[0106] In an implementation, the zoom processing includes m zoom steps, m being a positive integer. The zoom policy includes m zoom ratio fields. The jth zoom step in the m zoom steps corresponds to the jth zoom ratio field in the m zoom ratio fields, j being a positive integer and j≤m. The processing unit 602 is further configured to configure a media file format data box of the immersive media according to the zoom information of the immersive media, for example:

set the jth zoom ratio field to an ineffective value when the zoom information indicates that a size of the ith zoom region of the immersive media after the jth zoom step of the zoom processing is performed thereon is the same as a size thereof before the zoom processing is performed thereon; and

set the jth zoom ratio field to an effective value when the zoom information indicates that the size of the ith zoom region of the immersive media after the jth zoom step of the zoom processing is performed thereon is different from the size thereof before the zoom processing is performed thereon, the effective value being a ratio between the size of the ith zoom region after the jth zoom step of the zoom processing is performed thereon and the size thereof before the zoom processing is performed thereon.



[0107] In an implementation, the zoom processing includes m zoom steps, m being a positive integer. The zoom policy includes m zoom duration fields and m zoom duration unit fields. The jth zoom step in the m zoom steps corresponds to the jth zoom duration field in the m zoom duration fields and the jth zoom duration unit field in the m zoom duration unit fields, j being a positive integer and j≤m. The processing unit 602 is further configured to configure a media file format data box of the immersive media according to the zoom information of the immersive media, for example:
set a value of a duration when the jth zoom step is performed on the ith zoom region as indicated in the zoom information as a value of the jth zoom duration field; and set a unit of measure of the duration when the jth zoom step is performed on the ith zoom region as indicated in the zoom information as a value of the jth zoom duration unit field.

[0108] In an implementation, the processing unit 602 is further configured to:

configure a zoom description signaling file of the immersive media according to the zoom information, the zoom description signaling file including description information of the zoom policy; and

encapsulate the zoom description signaling file into a representation hierarchy in the media presentation description file in the immersive media.



[0109] According to an embodiment of the present disclosure, the units of the data processing apparatus for immersive media shown in FIG. 6 may be separately or wholly combined into one or several other units, or one (or more) of the units herein may further be divided into multiple units of smaller functions. In this way, the same operations can be implemented, and implementation of the technical effects of this embodiment of the present disclosure is not affected. The foregoing units are divided based on logical functions. In an actual application, a function of one unit may also be implemented by a plurality of units, or functions of a plurality of units may be implemented by one unit. In other embodiments of this disclosure, the data processing apparatus for immersive media may also include other units. In an actual application, these functions may also be cooperatively implemented by other units or by a plurality of units working together. According to another embodiment of this disclosure, a computer program (including program code) that can perform the steps in the corresponding method shown in FIG. 3 or FIG. 4 may be run on a general-purpose computing device, such as a computer, which includes processing elements and storage elements such as a central processing unit (CPU), a random access memory (RAM), and a read-only memory (ROM), to construct the data processing apparatus for immersive media shown in FIG. 6 and implement the data processing method for immersive media in the embodiments of this disclosure. The computer program may be recorded on, for example, a computer-readable recording medium, and may be loaded into the foregoing computing device by using the computer-readable recording medium and run on the computing device.

[0110] Based on the same concept, a principle and beneficial effects of resolving a problem by the data processing apparatus for immersive media provided in the embodiments of this disclosure are similar to a principle and beneficial effects of resolving a problem by the data processing method for immersive media in the embodiments of this disclosure. Reference may be made to the principle and beneficial effects of the implementation of the method. For brevity, details are not described herein again.

[0111] FIG. 7 is a schematic structural diagram of another data processing apparatus for immersive media according to an exemplary embodiment of this disclosure. The data processing apparatus for immersive media may be a computer program (including program code) run on a content playback device. For example, the data processing apparatus for immersive media may be application software on a content playback device. As shown in FIG. 7, the data processing apparatus for immersive media includes an obtaining unit 701 and a processing unit 702.

[0112] In an exemplary embodiment, the data processing apparatus for immersive media may be configured to perform the corresponding steps in the method shown in FIG. 3.

[0113] The obtaining unit 701 is configured to obtain a media file format data box of immersive media, the media file format data box including a zoom policy of the ith zoom region of the immersive media in a target zoom mode, i being a positive integer.

[0114] The processing unit 702 is configured to perform zoom processing on the ith zoom region of the immersive media according to the media file format data box.

[0115] In an implementation, the media file format data box includes an International Organization for Standardization Base Media File Format (ISOBMFF) data box. The target zoom mode includes a director zoom mode.

[0116] In an implementation, the zoom policy includes a zoom flag field.

[0117] When a value of the zoom flag field is an effective value, the zoom flag field is used for indicating that zoom processing needs to be performed on the ith zoom region of the immersive media in the target zoom mode.

[0118] In an implementation, the zoom policy includes a zoom step field, a value of the zoom step field being m, m being a positive integer. The zoom step field is used for indicating that a quantity of zoom steps included when the zoom processing is performed on the ith zoom region of the immersive media in the target zoom mode is m.

[0119] In an implementation, the zoom processing includes m zoom steps, m being a positive integer. The zoom policy includes m zoom ratio fields. The jth zoom step in the m zoom steps corresponds to the jth zoom ratio field in the m zoom ratio fields, j being a positive integer and j≤m.

[0120] The jth zoom ratio field is used for indicating a zoom ratio adopted when the jth zoom step in the zoom processing is performed on the ith zoom region of the immersive media. The zoom ratio is expressed in units of 2⁻³.

[0121] When a value of the jth zoom ratio field is an ineffective value, the jth zoom ratio field is used for indicating that a size of the ith zoom region of the immersive media after the jth zoom step of the zoom processing is performed thereon in the target zoom mode is the same as a size thereof before the zoom processing is performed.

[0122] When the value of the jth zoom ratio field is an effective value, the jth zoom ratio field is used for indicating that a ratio between the size of the ith zoom region of the immersive media after the jth zoom step of the zoom processing is performed thereon in the target zoom mode and the size thereof before the zoom processing is performed is the value of the jth zoom ratio field.

[0123] In an implementation, the zoom processing includes m zoom steps, m being a positive integer. The zoom policy includes m zoom duration fields and m zoom duration unit fields. The jth zoom step in the m zoom steps corresponds to the jth zoom duration field in the m zoom duration fields and the jth zoom duration unit field in the m zoom duration unit fields, j being a positive integer and j≤m.

[0124] The jth zoom duration field is used for indicating a value of a duration when the jth zoom step of the zoom processing is performed on the ith zoom region of the immersive media, the zoom duration field being a non-zero value.

[0125] The jth zoom duration unit field is used for indicating a unit of measure of the duration when the jth zoom step of the zoom processing is performed on the ith zoom region of the immersive media, the unit of measure being expressed in seconds, and the zoom duration unit field being a non-zero value.
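The fields of the zoom policy described above can be gathered into a single structure. The following is a minimal, non-normative sketch in Python; the class and field names are illustrative assumptions, not the syntax of the data box itself:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class ZoomPolicy:
    """Hypothetical in-memory view of the zoom policy carried in the
    media file format data box (names are illustrative, not normative)."""
    zoom_flag: int                  # effective (non-zero) value: zoom the region
    zoom_step_count: int            # m: quantity of zoom steps
    zoom_ratios: List[int]          # m zoom ratio fields, one per step
    zoom_durations: List[int]       # m zoom duration fields (non-zero)
    zoom_duration_units: List[int]  # m zoom duration unit fields (non-zero)

    def is_consistent(self) -> bool:
        # Each of the m zoom steps must have its own ratio, duration,
        # and duration-unit field, and the duration fields are non-zero.
        m = self.zoom_step_count
        return (len(self.zoom_ratios) == m
                and len(self.zoom_durations) == m
                and len(self.zoom_duration_units) == m
                and all(d != 0 for d in self.zoom_durations)
                and all(u != 0 for u in self.zoom_duration_units))

policy = ZoomPolicy(zoom_flag=1, zoom_step_count=2,
                    zoom_ratios=[8, 16],
                    zoom_durations=[5, 10],
                    zoom_duration_units=[1, 1])
print(policy.is_consistent())  # True
```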

[0126] In an implementation, the obtaining unit 701 is further configured to:
obtain a zoom description signaling file of the immersive media, the zoom description signaling file including description information of the zoom policy.

[0127] In an implementation, the zoom description signaling file includes at least one of the following: a sphere region zooming descriptor or a 2D region zooming descriptor.

[0128] The sphere region zooming descriptor is encapsulated in a representation hierarchy in a media presentation description file in the immersive media, and a quantity of the sphere region zooming descriptors in the representation hierarchy is less than or equal to 1.

[0129] The 2D region zooming descriptor is encapsulated in the representation hierarchy in the media presentation description file in the immersive media, and a quantity of the 2D region zooming descriptors in the representation hierarchy is less than or equal to 1.
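The cardinality constraint above (at most one descriptor of each kind per representation hierarchy) can be checked mechanically. The following sketch assumes a DASH-style media presentation description; the scheme URIs for the two zooming descriptors are made up for illustration:

```python
import xml.etree.ElementTree as ET

# Toy media presentation description; the schemeIdUri values are
# hypothetical placeholders for the sphere region zooming descriptor
# and the 2D region zooming descriptor.
MPD = """<MPD xmlns="urn:mpeg:dash:schema:mpd:2011">
  <Period><AdaptationSet>
    <Representation id="r0">
      <SupplementalProperty schemeIdUri="urn:example:sphere-region-zooming"/>
      <SupplementalProperty schemeIdUri="urn:example:2d-region-zooming"/>
    </Representation>
  </AdaptationSet></Period>
</MPD>"""

NS = {"mpd": "urn:mpeg:dash:schema:mpd:2011"}

def descriptor_counts_ok(mpd_xml: str) -> bool:
    """Each representation may carry at most one descriptor per scheme."""
    root = ET.fromstring(mpd_xml)
    for rep in root.iter("{urn:mpeg:dash:schema:mpd:2011}Representation"):
        schemes = [sp.get("schemeIdUri")
                   for sp in rep.findall("mpd:SupplementalProperty", NS)]
        for uri in set(schemes):
            if schemes.count(uri) > 1:
                return False
    return True

print(descriptor_counts_ok(MPD))  # one of each kind per representation: True
```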

[0130] In some embodiments, the data processing apparatus for immersive media may be configured to perform the corresponding steps in the method shown in FIG. 5.

[0131] The obtaining unit 701 is configured to obtain an encapsulated file of immersive media, the encapsulated file including a media file format data box of the immersive media, the media file format data box including a zoom policy of the ith zoom region of the immersive media in a target zoom mode, i being a positive integer.

[0132] The processing unit 702 is configured to parse the encapsulated file and display the parsed immersive media; and perform zoom processing on the ith zoom region of the immersive media according to the media file format data box in response to displaying the ith zoom region of the immersive media.

[0133] In an implementation, the zoom policy includes a zoom flag field. The processing unit 702 is further configured to perform zoom processing on the ith zoom region of the immersive media according to the media file format data box, for example:
perform zoom processing on the ith zoom region of the immersive media in the target zoom mode when a value of the zoom flag field is an effective value.

[0134] In an implementation, the zoom policy includes a zoom step field, a value of the zoom step field being m, m being a positive integer. The processing unit 702 is further configured to perform zoom processing on the ith zoom region of the immersive media according to the media file format data box, for example:
perform zoom processing on the ith zoom region of the immersive media in the target zoom mode m times.

[0135] In an implementation, the zoom processing includes m zoom steps, m being a positive integer. The zoom policy includes m zoom ratio fields. The jth zoom step in the m zoom steps corresponds to the jth zoom ratio field in the m zoom ratio fields, j being a positive integer and j≤m. The processing unit 702 is further configured to perform zoom processing on the ith zoom region of the immersive media according to the media file format data box, for example:

perform the jth zoom step of the zoom processing on the ith zoom region of the immersive media in the target zoom mode when a value of the jth zoom ratio field is an ineffective value, to make a size of the ith zoom region of the immersive media after the jth zoom step of the zoom processing is performed thereon the same as a size of the ith zoom region of the immersive media before the zoom processing is performed thereon; and

perform, when the value of the jth zoom ratio field is an effective value, the jth zoom step of the zoom processing on the ith zoom region of the immersive media in the target zoom mode according to the effective value, to make a ratio between the size of the ith zoom region of the immersive media after the jth zoom step is performed thereon and the size of the ith zoom region of the immersive media before the zoom processing is performed thereon reach the effective value.
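The two cases above can be sketched in a few lines. This is a minimal illustration under two assumptions that are not stated normatively in the text: the ineffective value is 0, and an effective value expresses the ratio in units of 2⁻³ (so a stored value of 8 means an unchanged size):

```python
# Assumed conventions (illustrative only): ineffective value = 0;
# an effective value is the ratio in units of 2^-3.
RATIO_UNIT = 2 ** -3

def apply_zoom_step(size: float, ratio_field: int) -> float:
    """Return the region size after one zoom step of the zoom processing."""
    if ratio_field == 0:
        # Ineffective value: size after the step equals the original size.
        return size
    # Effective value: size after the step / size before = ratio.
    return size * ratio_field * RATIO_UNIT

size = 100.0
for ratio_field in (0, 16, 4):   # unchanged, x2.0, x0.5
    size = apply_zoom_step(size, ratio_field)
print(size)  # 100 -> 100 -> 200 -> 100.0
```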



[0136] In an implementation, the zoom processing includes m zoom steps, m being a positive integer. The zoom policy includes m zoom duration fields and m zoom duration unit fields. The jth zoom step in the m zoom steps corresponds to the jth zoom duration field in the m zoom duration fields and the jth zoom duration unit field in the m zoom duration unit fields, j being a positive integer and j≤m. The processing unit 702 is further configured to perform zoom processing on the ith zoom region of the immersive media according to the media file format data box, for example:
perform the jth zoom step of the zoom processing on the ith zoom region of the immersive media in the target zoom mode according to a common indication of the jth zoom duration field and the jth zoom duration unit field.
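The "common indication" of the paired fields can be read as: the unit field fixes the measure (in seconds) and the value field counts how many units the jth zoom step lasts. A small sketch under that assumed reading:

```python
def zoom_step_duration_seconds(duration_value: int, duration_unit: int) -> int:
    """Combine the paired fields: the jth zoom duration unit field gives the
    unit of measure in seconds, and the jth zoom duration field gives how
    many such units the step lasts. Both fields are non-zero values."""
    assert duration_value != 0 and duration_unit != 0
    return duration_value * duration_unit

# e.g. a duration field of 5 with a 2-second unit: a 10-second zoom step
print(zoom_step_duration_seconds(5, 2))  # 10
```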

[0137] In an implementation, the processing unit 702 is further configured to:
obtain a zoom description signaling file of the immersive media, the zoom description signaling file including description information of the zoom policy.

[0138] The obtaining unit 701 is further configured to obtain an encapsulated file of immersive media, for example:
obtain an encapsulated file of the immersive media according to the zoom description signaling file.

[0139] According to an embodiment of the present disclosure, the units of the data processing apparatus for immersive media shown in FIG. 7 may be separately or wholly combined into one or several other units, or one (or more) of the units herein may further be divided into multiple units of smaller functions. In this way, the same operations can be implemented, and implementation of the technical effects of this embodiment of the present disclosure is not affected. The foregoing units are divided based on logical functions. In an actual application, a function of one unit may also be implemented by a plurality of units, or functions of a plurality of units may be implemented by one unit. In other embodiments of this disclosure, the data processing apparatus for immersive media may also include other units. In an actual application, these functions may also be implemented cooperatively by other units, or jointly by a plurality of units. According to another embodiment of this disclosure, a computer program (including program code) that can perform the steps in the corresponding method shown in FIG. 3 or FIG. 5 may be run on a general-purpose computing device, such as a computer, which includes processing elements and storage elements such as a central processing unit (CPU), a random access memory (RAM), and a read-only memory (ROM), to construct the data processing apparatus for immersive media shown in FIG. 7 and implement the data processing method for immersive media in the embodiments of this disclosure. The computer program may be recorded on, for example, a computer-readable recording medium, and may be loaded into the foregoing computing device by using the computer-readable recording medium and run on the computing device.

[0140] Based on the same concept, a principle and beneficial effects of resolving a problem by the data processing apparatus for immersive media provided in the embodiments of this disclosure are similar to a principle and beneficial effects of resolving a problem by the data processing method for immersive media in the embodiments of this disclosure. Reference may be made to the principle and beneficial effects of the implementation of the method. For brevity, details are not described herein again.

[0141] FIG. 8 is a schematic structural diagram of a content production device according to an exemplary embodiment of this disclosure. The content production device may be a computer device used by a provider of immersive media. The computer device may be a terminal (such as a PC or an intelligent mobile device (such as a smartphone)) or a server. As shown in FIG. 8, the content production device includes a capturing device 801, a processor 802, a memory 803, and a transmitter 804.

[0142] The capturing device 801 is configured to capture the audio-visual scene of the real world to obtain original data of the immersive media (including audio content and video content synchronized temporally and spatially). The capturing device 801 may include, but is not limited to, an audio device, a camera device, and a sensing device. The audio device may include an audio sensor, a microphone, or the like. The camera device may include an ordinary camera, a stereo camera, a light field camera, or the like. The sensing device may include a laser device, a radar device, or the like.

[0143] The processor 802 (also referred to as a central processing unit (CPU)) is the processing core of the content production device. The processor 802 is adapted to implement one or more instructions, and to load and execute the one or more instructions to implement the data processing method for immersive media shown in FIG. 3 or FIG. 4.

[0144] The memory 803 is a memory device in the content production device, and is configured to store a program and a media resource. It may be understood that the memory 803 herein may include an internal storage medium in the content production device and certainly may also include an extended storage medium supported by the content production device. The memory may be a high-speed RAM memory, or may be a non-volatile memory such as at least one magnetic disk storage, and optionally, may be at least one memory far away from the foregoing processor. The memory 803 provides a storage space. The storage space is configured to store an operating system of the content production device. In addition, the storage space is further configured to store a computer program. The computer program includes program instructions. In addition, the program instructions are adapted to be invoked and executed by the processor 802, to perform the steps of the data processing method for immersive media. In addition, the memory 803 may be further configured to store an immersive media file formed after processing performed by the processor. The immersive media file includes a media file resource and media presentation description information.

[0145] The transmitter 804 is configured to implement transmission and interaction between the content production device and another device, for example, implement transmission of immersive media between the content production device and a content playback device. That is, the content production device transmits a media resource related to the immersive media to the content playback device through the transmitter 804.

[0146] Referring to FIG. 8 again, the processor 802 may include a converter 821, an encoder 822, and an encapsulator 823.

[0147] The converter 821 is configured to perform a series of conversions on captured video content, so that the video content becomes content adapted to be video-encoded for immersive media. The conversion may include: concatenation and projection. In some embodiments, the conversion further includes region encapsulation. The converter 821 may convert captured 3D video content into a 2D image and provide the 2D image to the encoder for video encoding.

[0148] The encoder 822 is configured to perform audio encoding on captured audio content to form an audio bitstream of the immersive media, and is further configured to perform video encoding on the 2D image obtained by the converter 821 through conversion, to obtain a video bitstream.

[0149] The encapsulator 823 is configured to encapsulate the audio bitstream and the video bitstream into a file container according to a file format of the immersive media (such as ISOBMFF) to form a media file resource of the immersive media, the media file resource being a media file or a media segment that forms a media file of the immersive media; and to record, according to requirements of the file format of the immersive media, metadata of the media file resource of the immersive media using media presentation description information. The encapsulated file of the immersive media obtained by the encapsulator through processing is stored in the memory and provided to the content playback device on demand for presentation of the immersive media.

[0150] In an exemplary embodiment, the processor 802 (e.g., devices included in the processor 802) performs the steps of the data processing method for immersive media shown in FIG. 3 by invoking one or more instructions on the memory 803. In some embodiments, the memory 803 stores one or more first instructions. The one or more first instructions are adapted to be loaded by the processor 802 to perform the following steps:

obtaining a media file format data box of immersive media, the media file format data box including a zoom policy of the ith zoom region of the immersive media in a target zoom mode, i being a positive integer; and

performing zoom processing on the ith zoom region of the immersive media according to the media file format data box.



[0151] In an implementation, the media file format data box includes an International Organization for Standardization Base Media File Format (ISOBMFF) data box. The target zoom mode includes a director zoom mode.

[0152] In an implementation, the zoom policy includes a zoom flag field, and when a value of the zoom flag field is an effective value, the zoom flag field is used for indicating that zoom processing needs to be performed on the ith zoom region of the immersive media in the target zoom mode.

[0153] In an implementation, the zoom policy includes a zoom step field, a value of the zoom step field being m, m being a positive integer. The zoom step field is used for indicating that a quantity of zoom steps included when the zoom processing is performed on the ith zoom region of the immersive media in the target zoom mode is m.

[0154] In an implementation, the zoom processing includes m zoom steps, m being a positive integer. The zoom policy includes m zoom ratio fields. The jth zoom step in the m zoom steps corresponds to the jth zoom ratio field in the m zoom ratio fields, j being a positive integer and j≤m.

[0155] The jth zoom ratio field is used for indicating a zoom ratio adopted when the jth zoom step in the zoom processing is performed on the ith zoom region of the immersive media. The zoom ratio is expressed in units of 2⁻³.

[0156] When a value of the jth zoom ratio field is an ineffective value, the jth zoom ratio field is used for indicating that a size of the ith zoom region of the immersive media after the jth zoom step of the zoom processing is performed thereon in the target zoom mode is the same as a size thereof before the zoom processing is performed.

[0157] When the value of the jth zoom ratio field is an effective value, the jth zoom ratio field is used for indicating that a ratio between the size of the ith zoom region of the immersive media after the jth zoom step of the zoom processing is performed thereon in the target zoom mode and the size thereof before the zoom processing is performed is the value of the jth zoom ratio field.

[0158] In an implementation, the zoom processing includes m zoom steps, m being a positive integer. The zoom policy includes m zoom duration fields and m zoom duration unit fields. The jth zoom step in the m zoom steps corresponds to the jth zoom duration field in the m zoom duration fields and the jth zoom duration unit field in the m zoom duration unit fields, j being a positive integer and j≤m.

[0159] The jth zoom duration field is used for indicating a value of a duration when the jth zoom step of the zoom processing is performed on the ith zoom region of the immersive media, the zoom duration field being a non-zero value.

[0160] The jth zoom duration unit field is used for indicating a unit of measure of the duration when the jth zoom step of the zoom processing is performed on the ith zoom region of the immersive media, the unit of measure being expressed in seconds, and the zoom duration unit field being a non-zero value.

[0161] In an implementation, the computer program on the memory 803 is loaded by the processor 802 to further perform the following step:
obtaining a zoom description signaling file of the immersive media, the zoom description signaling file including description information of the zoom policy.

[0162] In an implementation, the zoom description signaling file includes at least one of the following: a sphere region zooming descriptor or a 2D region zooming descriptor.

[0163] The sphere region zooming descriptor is encapsulated in a representation hierarchy in a media presentation description file in the immersive media, and a quantity of the sphere region zooming descriptors in the representation hierarchy is less than or equal to 1.

[0164] The 2D region zooming descriptor is encapsulated in the representation hierarchy in the media presentation description file in the immersive media, and a quantity of the 2D region zooming descriptors in the representation hierarchy is less than or equal to 1.

[0165] In some embodiments, the processor (that is, devices included in the processor 802) performs the steps of the data processing method for immersive media shown in FIG. 4 by invoking one or more instructions on the memory 803. In some embodiments, the memory 803 stores one or more second instructions. The one or more second instructions are adapted to be loaded by the processor 802 to perform the following steps:

obtaining zoom information of immersive media;

configuring a media file format data box of the immersive media according to the zoom information of the immersive media, the media file format data box including a zoom policy of the ith zoom region of the immersive media in a target zoom mode, i being a positive integer; and

adding the media file format data box of the immersive media into an encapsulated file of the immersive media.
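Adding the data box into the encapsulated file amounts to serializing it as an ISOBMFF box: a 32-bit size, a four-character type, and the payload. The following is a minimal sketch; the four-character code 'zoom' and the payload layout are made up for illustration and are not defined by this disclosure:

```python
import struct

def make_box(box_type: bytes, payload: bytes) -> bytes:
    """Serialize a minimal ISOBMFF box: big-endian 32-bit size
    (covering the 8-byte header plus payload), 4-byte type, payload."""
    assert len(box_type) == 4
    return struct.pack(">I", 8 + len(payload)) + box_type + payload

# Hypothetical payload: zoom_flag=1 and zoom_step_count=2 as two bytes.
payload = struct.pack(">BB", 1, 2)
box = make_box(b"zoom", payload)
print(len(box), box[4:8])  # 10 b'zoom'
```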



[0166] In an implementation, the zoom policy includes a zoom flag field. When the one or more second instructions are adapted to be loaded by the processor 802 to configure a media file format data box of the immersive media according to the zoom information of the immersive media, the following step is performed:
setting the zoom flag field to an effective value when the zoom information indicates that zoom processing needs to be performed on the ith zoom region of the immersive media in the target zoom mode.

[0167] In an implementation, the zoom policy includes a zoom step field. When the one or more second instructions are adapted to be loaded by the processor 802 to configure a media file format data box of the immersive media according to the zoom information of the immersive media, the following step is performed:
setting the zoom step field to m when the zoom information indicates that m zoom steps need to be performed when zoom processing is performed on the ith zoom region of the immersive media in the target zoom mode, m being a positive integer.
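The flag and step-count configuration described in [0166] and [0167] can be sketched as follows, with a plain dictionary standing in for the data box and illustrative key names for the zoom information:

```python
def configure_zoom_fields(zoom_info: dict) -> dict:
    """Derive data-box fields from zoom information.
    Key names ('zoom_required', 'zoom_steps') are illustrative."""
    box = {}
    if zoom_info.get("zoom_required"):
        box["zoom_flag"] = 1                 # set to an effective value
        box["zoom_step_count"] = len(zoom_info.get("zoom_steps", []))  # m
    return box

print(configure_zoom_fields(
    {"zoom_required": True, "zoom_steps": [{"ratio": 16}, {"ratio": 8}]}))
# {'zoom_flag': 1, 'zoom_step_count': 2}
```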

[0168] In an implementation, the zoom processing includes m zoom steps, m being a positive integer. The zoom policy includes m zoom ratio fields. The jth zoom step in the m zoom steps corresponds to the jth zoom ratio field in the m zoom ratio fields, j being a positive integer and j≤m. When the one or more second instructions are adapted to be loaded by the processor 802 to configure a media file format data box of the immersive media according to the zoom information of the immersive media, the following step is performed:

setting the jth zoom ratio field to an ineffective value when the zoom information indicates that a size of the ith zoom region of the immersive media after the jth zoom step of the zoom processing is performed thereon is the same as a size thereof before the zoom processing is performed thereon; and

setting the jth zoom ratio field to an effective value when the zoom information indicates that the size of the ith zoom region of the immersive media after the jth zoom step of the zoom processing is performed thereon is different from the size thereof before the zoom processing is performed thereon, the effective value being a ratio between the size of the ith zoom region after the jth zoom step of the zoom processing is performed thereon and the size thereof before the zoom processing is performed thereon.



[0169] In an implementation, the zoom processing includes m zoom steps, m being a positive integer. The zoom policy includes m zoom duration fields and m zoom duration unit fields. The jth zoom step in the m zoom steps corresponds to the jth zoom duration field in the m zoom duration fields and the jth zoom duration unit field in the m zoom duration unit fields, j being a positive integer and j≤m. When the one or more second instructions are adapted to be loaded by the processor 802 to configure a media file format data box of the immersive media according to the zoom information of the immersive media, the following steps are performed:
setting the jth zoom duration field to the value, indicated in the zoom information, of the duration when the jth zoom step is performed on the ith zoom region; and setting the jth zoom duration unit field to the unit of measure, indicated in the zoom information, of the duration when the jth zoom step is performed on the ith zoom region.
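The per-step duration configuration above reduces to copying paired values out of the zoom information into the two field lists; a sketch with illustrative key names:

```python
def configure_duration_fields(zoom_info_steps):
    """Copy per-step durations from zoom information into paired
    duration-value / duration-unit field lists (key names are
    illustrative, not normative)."""
    values, units = [], []
    for step in zoom_info_steps:
        values.append(step["duration_value"])   # jth zoom duration field
        units.append(step["duration_unit"])     # jth zoom duration unit field
    return values, units

vals, units = configure_duration_fields(
    [{"duration_value": 5, "duration_unit": 1},
     {"duration_value": 3, "duration_unit": 2}])
print(vals, units)  # [5, 3] [1, 2]
```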

[0170] In an implementation, the computer program on the memory 803 is loaded by the processor 802 to further perform the following steps:

configuring a zoom description signaling file of the immersive media according to the zoom information, the zoom description signaling file including description information of the zoom policy; and

encapsulating the zoom description signaling file into a representation hierarchy in the media presentation description file in the immersive media.



[0171] Based on the same inventive concept, a principle and beneficial effects of resolving a problem by the data processing device for immersive media provided in the embodiments of this disclosure are similar to a principle and beneficial effects of resolving a problem by the data processing method for immersive media in the embodiments of this disclosure. Reference may be made to the principle and beneficial effects of the implementation of the method. For brevity, details are not described herein again.

[0172] FIG. 9 is a schematic structural diagram of a content playback device according to an embodiment of this disclosure. The content playback device may be a computer device used by a user of immersive media. The computer device may be a terminal (such as a PC, an intelligent mobile device (such as a smartphone), or a VR device (such as a VR helmet or VR glasses)). As shown in FIG. 9, the content playback device includes a receiver 901, a processor 902, a memory 903, and a display/playback apparatus 904.

[0173] The receiver 901 is configured to implement transmission and interaction with another device, for example, implement transmission of immersive media between a content production device and the content playback device. That is, the content playback device receives, through the receiver 901, a media resource related to the immersive media transmitted by the content production device.

[0174] The processor 902 (also referred to as a central processing unit (CPU)) is the processing core of the content playback device. The processor 902 is adapted to implement one or more instructions, and to load and execute the one or more instructions to implement the data processing method for immersive media shown in FIG. 3 or FIG. 5.

[0175] The memory 903 is a memory device in the content playback device, and is configured to store a program and a media resource. It may be understood that the memory 903 herein may include an internal storage medium in the content playback device and certainly may also include an extended storage medium supported by the content playback device. The memory 903 may be a high-speed RAM memory, or may be a non-volatile memory such as at least one magnetic disk storage, and optionally, may be at least one memory far away from the foregoing processor. The memory 903 provides a storage space. The storage space is configured to store an operating system of the content playback device. In addition, the storage space is further configured to store a computer program. The computer program includes program instructions. In addition, the program instructions are adapted to be invoked and executed by the processor 902, to perform the steps of the data processing method for immersive media. In addition, the memory 903 may be further configured to store a 3D image of the immersive media after processing performed by the processor 902, audio content corresponding to the 3D image, information required for rendering the 3D image and the audio content, and the like.

[0176] The display/playback apparatus 904 is configured to output a sound and a 3D image obtained through rendering.

[0177] Referring to FIG. 9 again, the processor 902 may include a parser 921, a decoder 922, a converter 923, and a renderer 924.

[0178] The parser 921 is configured to perform file decapsulation on an encapsulated file of the immersive media from the content production device, for example, decapsulate a media file resource according to requirements of a file format of the immersive media, to obtain an audio bitstream and a video bitstream; and provide the audio bitstream and the video bitstream to the decoder 922.

[0179] The decoder 922 performs audio decoding on the audio bitstream to obtain audio content and provides the audio content to the renderer for audio rendering. In addition, the decoder 922 decodes the video bitstream to obtain a 2D image. According to metadata provided in media presentation description information, when the metadata indicates that a region encapsulation procedure has been performed on the immersive media, the 2D image refers to an encapsulated image, and when the metadata indicates that no region encapsulation procedure has been performed on the immersive media, the 2D image refers to a projected image.

[0180] The converter 923 is configured to convert the 2D image into a 3D image. When a region encapsulation procedure has been performed on the immersive media, the converter 923 first performs region decapsulation on the encapsulated image to obtain a projected image, and then, performs reconstruction on the projected image to obtain a 3D image. When no region encapsulation procedure has been performed on the immersive media, the converter 923 directly performs reconstruction on the projected image to obtain a 3D image.

[0181] The renderer 924 is configured to render the audio content and the 3D image of the immersive media. The audio content and the 3D image are rendered according to metadata related to rendering and a viewport in the media presentation description information, and are outputted by the display/playback apparatus after the rendering is completed.

[0182] In an exemplary embodiment, the processor 902 (e.g., devices included in the processor 902) performs the steps of the data processing method for immersive media shown in FIG. 3 by invoking one or more instructions on the memory 903. In some embodiments, the memory 903 stores one or more first instructions. The one or more first instructions are adapted to be loaded by the processor 902 to perform the following steps:

obtaining a media file format data box of immersive media, the media file format data box including a zoom policy of the ith zoom region of the immersive media in a target zoom mode, i being a positive integer; and

performing zoom processing on the ith zoom region of the immersive media according to the media file format data box.



[0183] In an implementation, the media file format data box includes an international organization for standardization base media file format data box. The target zoom mode includes a director zoom mode.

[0184] In an implementation, the zoom policy includes a zoom flag field, and when a value of the zoom flag field is an effective value, the zoom flag field is used for indicating that zoom processing needs to be performed on the ith zoom region of the immersive media in the target zoom mode.

[0185] In an implementation, the zoom policy includes a zoom step field, a value of the zoom step field being m, m being a positive integer. The zoom step field is used for indicating that a quantity of zoom steps included when the zoom processing is performed on the ith zoom region of the immersive media in the target zoom mode is m.

[0186] In an implementation, the zoom processing includes m zoom steps, m being a positive integer. The zoom policy includes m zoom ratio fields. The jth zoom step in the m zoom steps corresponds to the jth zoom ratio field in the m zoom ratio fields, j being a positive integer and j≤m.

[0187] The jth zoom ratio field is used for indicating a zoom ratio adopted when the jth zoom step in the zoom processing is performed on the ith zoom region of the immersive media. The zoom ratio is expressed in units of 2⁻³.

[0188] When a value of the jth zoom ratio field is an ineffective value, the jth zoom ratio field is used for indicating that a size of the ith zoom region of the immersive media after the jth zoom step of the zoom processing is performed in the target zoom mode thereon is the same as a size thereof before the zoom processing is performed thereon.

[0189] When the value of the jth zoom ratio field is an effective value, the jth zoom ratio field is used for indicating that a ratio between the size of the ith zoom region of the immersive media after the jth zoom step of the zoom processing is performed in the target zoom mode thereon and the size thereof before the zoom processing is performed thereon is the value of the jth zoom ratio field.
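The semantics of the jth zoom ratio field in paragraphs [0187] to [0189] can be sketched as follows. The choice of 0 as the ineffective value and the reading of the ratio unit as 2⁻³ (one eighth) are assumptions for illustration; the embodiments do not fix concrete field encodings here.

```python
# Sketch of interpreting the j-th zoom ratio field ([0187]-[0189]).
# The ineffective value (0) and the unit of 2**-3 are illustrative assumptions.

RATIO_UNIT = 2 ** -3  # each increment of the field represents 1/8

def size_after_step(size_before_processing, zoom_ratio_field):
    if zoom_ratio_field == 0:
        # ineffective value: size after the j-th step equals the size
        # before the zoom processing is performed
        return size_before_processing
    # effective value: size after the j-th step is the indicated ratio
    # of the size before the zoom processing is performed
    return size_before_processing * zoom_ratio_field * RATIO_UNIT

print(size_after_step(1920, 0))   # ineffective -> 1920 (unchanged)
print(size_after_step(1920, 16))  # 16 * 1/8 = ratio 2 -> 3840.0
```

Note that under this reading the ratio relates the region size after the jth step to the size before the zoom processing as a whole, not to the size after the previous step.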

[0190] In an implementation, the zoom processing includes m zoom steps, m being a positive integer. The zoom policy includes m zoom duration fields and m zoom duration unit fields. The jth zoom step in the m zoom steps corresponds to the jth zoom duration field in the m zoom duration fields and the jth zoom duration unit field in the m zoom duration unit fields, j being a positive integer and j≤m.

[0191] The jth zoom duration field is used for indicating a value of a duration when the jth zoom step of the zoom processing is performed on the ith zoom region of the immersive media, the zoom duration field being a non-zero value.

[0192] The jth zoom duration unit field is used for indicating a unit of measure of the duration when the jth zoom step of the zoom processing is performed on the ith zoom region of the immersive media, the unit of measure being in unit of second, and the zoom duration unit field being a non-zero value.
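The pairing of the jth zoom duration field with the jth zoom duration unit field in paragraphs [0190] to [0192] can be sketched as below. Treating the unit field as a multiplier that scales the duration value into seconds is an assumption for illustration; the embodiments specify only that both fields are non-zero and that the unit of measure is the second.

```python
# Sketch combining the j-th zoom duration field and duration unit field
# ([0190]-[0192]). Field semantics are an illustrative assumption.

def step_duration_seconds(duration_field, duration_unit_field):
    # Both fields are non-zero values per [0191]-[0192].
    assert duration_field != 0 and duration_unit_field != 0
    # The unit field scales the duration value into seconds.
    return duration_field * duration_unit_field

# e.g. a duration value of 4 with a unit of 0.5 s gives 2.0 s for the j-th step
print(step_duration_seconds(4, 0.5))  # -> 2.0
```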

[0193] In an implementation, the computer program on the memory 903 is loaded by the processor 902 to further perform:
obtaining a zoom description signaling file of the immersive media, the zoom description signaling file including description information of the zoom policy.

[0194] In an implementation, the zoom description signaling file includes at least one of the following: a sphere region zooming descriptor or a 2D region zooming descriptor.

[0195] The sphere region zooming descriptor is encapsulated in a representation hierarchy in a media presentation description file in the immersive media, and a quantity of the sphere region zooming descriptors in the representation hierarchy is less than or equal to 1.

[0196] The 2D region zooming descriptor is encapsulated in the representation hierarchy in the media presentation description file in the immersive media, and a quantity of the 2D region zooming descriptors in the representation hierarchy is less than or equal to 1.

[0197] In some embodiments, the processor 902 (e.g., devices included in the processor 902) performs the steps of the data processing method for immersive media shown in FIG. 5 by invoking one or more instructions on the memory 903. In some embodiments, the memory 903 stores one or more second instructions. The one or more second instructions are adapted to be loaded by the processor 902 to perform the following steps:

obtaining an encapsulated file of immersive media, the encapsulated file including a media file format data box of the immersive media, the media file format data box including a zoom policy of the ith zoom region of the immersive media in a target zoom mode, i being a positive integer;

parsing the encapsulated file, and displaying the parsed immersive media; and

performing zoom processing on the ith zoom region of the immersive media according to the media file format data box in response to displaying the ith zoom region of the immersive media.



[0198] In an implementation, the zoom policy includes a zoom flag field. When the one or more second instructions are adapted to be loaded by the processor 902 to perform zoom processing on the ith zoom region of the immersive media according to the media file format data box, the following step is performed:
performing zoom processing on the ith zoom region of the immersive media in the target zoom mode when a value of the zoom flag field is an effective value.

[0199] In an implementation, the zoom policy includes a zoom step field, a value of the zoom step field being m, m being a positive integer. When the one or more second instructions are adapted to be loaded by the processor 902 to perform zoom processing on the ith zoom region of the immersive media according to the media file format data box, the following step is performed:
performing zoom processing on the ith zoom region of the immersive media in the target zoom mode m times.

[0200] In an implementation, the zoom processing includes m zoom steps, m being a positive integer. The zoom policy includes m zoom ratio fields. The jth zoom step in the m zoom steps corresponds to the jth zoom ratio field in the m zoom ratio fields, j being a positive integer and j≤m. When the one or more second instructions are adapted to be loaded by the processor 902 to perform zoom processing on the ith zoom region of the immersive media according to the media file format data box, the following steps are performed:

performing the jth zoom step of the zoom processing on the ith zoom region of the immersive media in the target zoom mode when a value of the jth zoom ratio field is an ineffective value, to make a size of the ith zoom region of the immersive media after the jth zoom step of the zoom processing is performed thereon the same as a size of the ith zoom region of the immersive media before the zoom processing is performed thereon; and

performing, when the value of the jth zoom ratio field is an effective value, the jth zoom step of the zoom processing on the ith zoom region of the immersive media in the target zoom mode according to the effective value, to make a ratio between the size of the ith zoom region of the immersive media after the jth zoom step is performed thereon and the size of the ith zoom region of the immersive media before the zoom processing is performed thereon reach the effective value.
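The per-step behavior above, applied across all m zoom steps, can be sketched as follows. As before, the ineffective value of 0 and the direct use of the field value as the ratio are assumptions for illustration, not the prescribed encoding.

```python
# Sketch of applying the m zoom steps of [0200]: each step either leaves
# the region at its size before the zoom processing (ineffective ratio,
# assumed 0) or scales that size by the effective ratio value.

def apply_zoom_steps(size_before_processing, ratio_fields):
    size = size_before_processing
    for ratio in ratio_fields:
        if ratio == 0:
            # ineffective value: same size as before the zoom processing
            size = size_before_processing
        else:
            # effective value: ratio relative to the pre-processing size
            size = size_before_processing * ratio
    return size

print(apply_zoom_steps(100, [0, 2, 4]))  # final step yields 100 * 4 -> 400
```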



[0201] In an implementation, the zoom processing includes m zoom steps, m being a positive integer. The zoom policy includes m zoom duration fields and m zoom duration unit fields. The jth zoom step in the m zoom steps corresponds to the jth zoom duration field in the m zoom duration fields and the jth zoom duration unit field in the m zoom duration unit fields, j being a positive integer and j≤m. When the one or more second instructions are adapted to be loaded by the processor 902 to perform zoom processing on the ith zoom region of the immersive media according to the media file format data box, the following step is performed:
performing the jth zoom step of the zoom processing on the ith zoom region of the immersive media in the target zoom mode according to a common indication of the jth zoom duration field and the jth zoom duration unit field.

[0202] In an implementation, the computer program on the memory 903 is loaded by the processor 902 to further perform:
obtaining a zoom description signaling file of the immersive media, the zoom description signaling file including description information of the zoom policy.

[0203] When the processor 902 obtains an encapsulated file of the immersive media through the receiver 901, the following step is performed:
obtaining an encapsulated file of the immersive media according to the zoom description signaling file.

[0204] Based on the same inventive concept, a principle and beneficial effects of resolving a problem by the data processing device for immersive media provided in the embodiments of this disclosure are similar to a principle and beneficial effects of resolving a problem by the data processing method for immersive media in the embodiments of this disclosure. Reference may be made to the principle and beneficial effects of the implementation of the method. For brevity, details are not described herein again.

[0205] What is disclosed above are merely exemplary embodiments of this disclosure, which are not intended to limit the scope of the claims of this disclosure. Therefore, equivalent variations made in accordance with the claims of this disclosure shall fall within the scope of this disclosure.


Claims

1. A data processing method for immersive media, applicable to a computer device, the method comprising:

obtaining a media file format data box of immersive media, the media file format data box comprising a zoom policy of the ith zoom region of the immersive media in a target zoom mode, i being a positive integer; and

performing zoom processing on the ith zoom region of the immersive media according to the media file format data box.


 
2. The method according to claim 1, wherein the media file format data box comprises an international organization for standardization base media file format data box; and the target zoom mode comprises a director zoom mode.
 
3. The method according to claim 1 or 2, wherein the zoom policy comprises a zoom flag field, and when a value of the zoom flag field is an effective value, the zoom flag field indicates that zoom processing needs to be performed on the ith zoom region of the immersive media in the target zoom mode.
 
4. The method according to claim 1 or 2, wherein the zoom policy comprises a zoom step field, a value of the zoom step field being m, m being a positive integer; and the zoom step field is used for indicating that a quantity of zoom steps comprised when the zoom processing is performed on the ith zoom region of the immersive media in the target zoom mode is m.
 
5. The method according to claim 1 or 2, wherein the zoom processing comprises m zoom steps, m being a positive integer; the zoom policy comprises m zoom ratio fields; the jth zoom step in the m zoom steps corresponds to the jth zoom ratio field in the m zoom ratio fields, j being a positive integer, and j being less than or equal to m;

the jth zoom ratio field is used for indicating a zoom ratio adopted when the jth zoom step in the zoom processing is performed on the ith zoom region of the immersive media; the zoom ratio is expressed in units of 2⁻³;

when a value of the jth zoom ratio field is an ineffective value, the jth zoom ratio field is used for indicating that a size of the ith zoom region of the immersive media after the jth zoom step of the zoom processing is performed in the target zoom mode thereon is the same as a size thereof before the zoom processing is performed thereon; and

when the value of the jth zoom ratio field is an effective value, the jth zoom ratio field is used for indicating that a ratio between the size of the ith zoom region of the immersive media after the jth zoom step of the zoom processing is performed in the target zoom mode thereon and the size thereof before the zoom processing is performed thereon is the value of the jth zoom ratio field.


 
6. The method according to claim 1 or 2, wherein the zoom processing comprises m zoom steps, m being a positive integer; the zoom policy comprises m zoom duration fields and m zoom duration unit fields; the jth zoom step in the m zoom steps corresponds to the jth zoom duration field in the m zoom duration fields and the jth zoom duration unit field in the m zoom duration unit fields, j being a positive integer and j≤m;

the jth zoom duration field is used for indicating a value of a duration when the jth zoom step of the zoom processing is performed on the ith zoom region of the immersive media, the zoom duration field being a non-zero value; and

the jth zoom duration unit field is used for indicating a unit of measure of the duration when the jth zoom step of the zoom processing is performed on the ith zoom region of the immersive media, the unit of measure being in unit of second, and the zoom duration unit field being a non-zero value.


 
7. The method according to claim 1, wherein the method further comprises:

obtaining a zoom description signaling file of the immersive media, the zoom description signaling file comprising description information of the zoom policy; and

correspondingly, the performing zoom processing on the ith zoom region of the immersive media in an encapsulated file according to the media file format data box comprises:
performing zoom processing on the ith zoom region of the immersive media in the encapsulated file according to the zoom policy comprised in the media file format data box and the description information of the zoom policy comprised in the zoom description signaling file.


 
8. The method according to claim 7, wherein the zoom description signaling file comprises at least one of the following: a sphere region zooming descriptor or a two-dimensional (2D) region zooming descriptor;

the sphere region zooming descriptor is encapsulated in a representation hierarchy in a media presentation description file in the immersive media, and a quantity of the sphere region zooming descriptors in the representation hierarchy is less than or equal to 1; and

the 2D region zooming descriptor is encapsulated in the representation hierarchy in the media presentation description file in the immersive media, and a quantity of the 2D region zooming descriptors in the representation hierarchy is less than or equal to 1.


 
9. A data processing method for immersive media, applicable to a content production device, the method comprising:

obtaining zoom information of immersive media;

configuring a media file format data box of the immersive media according to the zoom information of the immersive media, the media file format data box comprising a zoom policy of the ith zoom region of the immersive media in a target zoom mode, i being a positive integer; and

adding the media file format data box of the immersive media into an encapsulated file of the immersive media.


 
10. The method according to claim 9, wherein the zoom policy comprises a zoom flag field; and
the configuring a media file format data box of the immersive media according to the zoom information of the immersive media comprises:
setting the zoom flag field to an effective value when the zoom information indicates that zoom processing needs to be performed on the ith zoom region of the immersive media in the target zoom mode.
 
11. The method according to claim 9, wherein the zoom policy comprises a zoom step field; and
the configuring a media file format data box of the immersive media according to the zoom information of the immersive media comprises:
setting the zoom step field to m when the zoom information indicates that m zoom steps need to be performed when zoom processing is performed on the ith zoom region of the immersive media in the target zoom mode, m being a positive integer.
 
12. The method according to claim 9, wherein the zoom processing comprises m zoom steps, m being a positive integer; the zoom policy comprises m zoom ratio fields; the jth zoom step in the m zoom steps corresponds to the jth zoom ratio field in the m zoom ratio fields, j being a positive integer and j≤m; and
the configuring a media file format data box of the immersive media according to the zoom information of the immersive media comprises:

setting the jth zoom ratio field to an ineffective value in a case that the zoom information indicates that a size of the ith zoom region of the immersive media after the jth zoom step of the zoom processing is performed thereon is the same as a size thereof before the zoom processing is performed thereon; and

setting the jth zoom ratio field to an effective value in a case that the zoom information indicates that the size of the ith zoom region of the immersive media after the jth zoom step of the zoom processing is performed thereon is different from the size thereof before the zoom processing is performed thereon, the effective value being a ratio between the size of the ith zoom region after the jth zoom step of the zoom processing is performed thereon and the size thereof before the zoom processing is performed thereon.


 
13. The method according to claim 9, wherein the zoom processing comprises m zoom steps, m being a positive integer; the zoom policy comprises m zoom duration fields and m zoom duration unit fields; the jth zoom step in the m zoom steps corresponds to the jth zoom duration field in the m zoom duration fields and the jth zoom duration unit field in the m zoom duration unit fields, j being a positive integer and j≤m; and
the configuring a media file format data box of the immersive media according to the zoom information of the immersive media comprises:
setting the jth zoom duration field to a value of a duration of the jth zoom step performed on the ith zoom region as indicated in the zoom information; and setting the jth zoom duration unit field to a unit of measure of the duration of the jth zoom step performed on the ith zoom region as indicated in the zoom information.
 
14. The method according to claim 9, wherein the method further comprises:

configuring a zoom description signaling file of the immersive media according to the zoom information, the zoom description signaling file comprising description information of the zoom policy; and

encapsulating the zoom description signaling file into a representation hierarchy in the media presentation description file in the immersive media.


 
15. A data processing method for immersive media, applicable to a content playback device, the method comprising:

obtaining an encapsulated file of immersive media, the encapsulated file comprising a media file format data box of the immersive media, the media file format data box comprising a zoom policy of the ith zoom region of the immersive media in a target zoom mode, i being a positive integer;

parsing the encapsulated file, and displaying the parsed immersive media; and

performing zoom processing on the ith zoom region of the immersive media according to the media file format data box in response to displaying the ith zoom region of the immersive media.


 
16. The method according to claim 15, wherein the zoom policy comprises a zoom flag field; and the performing zoom processing on the ith zoom region of the immersive media according to the media file format data box comprises:
performing zoom processing on the ith zoom region of the immersive media in the target zoom mode when a value of the zoom flag field is an effective value.
 
17. The method according to claim 15, wherein the zoom policy comprises a zoom step field, a value of the zoom step field being m, m being a positive integer; and
the performing zoom processing on the ith zoom region of the immersive media according to the media file format data box comprises:
performing zoom processing on the ith zoom region of the immersive media in the target zoom mode m times.
 
18. The method according to claim 15, wherein the zoom processing comprises m zoom steps, m being a positive integer; the zoom policy comprises m zoom ratio fields; the jth zoom step in the m zoom steps corresponds to the jth zoom ratio field in the m zoom ratio fields, j being a positive integer and j≤m; and
the performing zoom processing on the ith zoom region of the immersive media according to the media file format data box comprises:

performing the jth zoom step of the zoom processing on the ith zoom region of the immersive media in the target zoom mode in a case that a value of the jth zoom ratio field is an ineffective value, to make a size of the ith zoom region of the immersive media after the jth zoom step of the zoom processing is performed thereon the same as a size of the ith zoom region of the immersive media before the zoom processing is performed thereon; and

performing, in a case that the value of the jth zoom ratio field is an effective value, the jth zoom step of the zoom processing on the ith zoom region of the immersive media in the target zoom mode according to the effective value, to make a ratio between the size of the ith zoom region of the immersive media after the jth zoom step is performed thereon and the size of the ith zoom region of the immersive media before the zoom processing is performed thereon reach the effective value.


 
19. The method according to claim 15, wherein the zoom processing comprises m zoom steps, m being a positive integer; the zoom policy comprises m zoom duration fields and m zoom duration unit fields; the jth zoom step in the m zoom steps corresponds to the jth zoom duration field in the m zoom duration fields and the jth zoom duration unit field in the m zoom duration unit fields, j being a positive integer and j≤m; and
the performing zoom processing on the ith zoom region of the immersive media according to the media file format data box comprises:
performing the jth zoom step of the zoom processing on the ith zoom region of the immersive media in the target zoom mode according to a common indication of the jth zoom duration field and the jth zoom duration unit field.
 
20. The method according to claim 15, wherein the method further comprises:

obtaining a zoom description signaling file of the immersive media, the zoom description signaling file comprising description information of the zoom policy; and

the obtaining an encapsulated file of immersive media comprises:
obtaining an encapsulated file of the immersive media according to the zoom description signaling file.


 
21. A data processing apparatus for immersive media, comprising:

an obtaining unit, configured to obtain a media file format data box of immersive media, the media file format data box comprising a zoom policy of the ith zoom region of the immersive media in a target zoom mode, i being a positive integer; and

a processing unit, configured to perform zoom processing on the ith zoom region of the immersive media according to the media file format data box.


 
22. A data processing apparatus for immersive media, comprising:

an obtaining unit, configured to obtain zoom information of immersive media; and

a processing unit, configured to configure a media file format data box of the immersive media according to the zoom information of the immersive media, the media file format data box comprising a zoom policy of the ith zoom region of the immersive media in a target zoom mode, i being a positive integer; and add the media file format data box of the immersive media into an encapsulated file of the immersive media.


 
23. A data processing apparatus for immersive media, comprising:

an obtaining unit, configured to obtain an encapsulated file of immersive media, the encapsulated file comprising a media file format data box of the immersive media, the media file format data box comprising a zoom policy of the ith zoom region of the immersive media in a target zoom mode, i being a positive integer; and

a processing unit, configured to parse the encapsulated file and display the parsed immersive media; and perform zoom processing on the ith zoom region of the immersive media according to the media file format data box in response to displaying the ith zoom region of the immersive media.


 
24. A data processing device for immersive media, comprising: one or more processors and one or more memories,
the one or more memories storing at least one segment of program code, the at least one segment of program code being loaded and executed by the one or more processors, to implement the data processing method for immersive media according to any one of claims 1 to 20.
 
25. A computer-readable storage medium, storing at least one segment of program code, the at least one segment of program code being loaded and executed by a processor, to implement the data processing method for immersive media according to any one of claims 1 to 20.
 




Drawing
Search report
Cited references

REFERENCES CITED IN THE DESCRIPTION



This list of references cited by the applicant is for the reader's convenience only. It does not form part of the European patent document. Even though great care has been taken in compiling the references, errors or omissions cannot be excluded and the EPO disclaims all liability in this regard.

Patent documents cited in the description