METHOD AND APPARATUS FOR TRANSCODING VIDEO, DEVICE, AND MEDIUM

(19)

(11)	EP 3 934 253 A1

(12)	EUROPEAN PATENT APPLICATION

(43)	Date of publication:
	05.01.2022 Bulletin 2022/01

(21)	Application number: 20215139.5

(22)	Date of filing: 17.12.2020

(51)	International Patent Classification (IPC):
	H04N 19/40 (2014.01)

(52)	Cooperative Patent Classification (CPC):
	H04N 21/23418; H04N 21/8456; H04N 21/234309

(84)	Designated Contracting States:
	AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR
	Designated Extension States:
	BA ME
	Designated Validation States:
	KH MA MD TN

(30)	Priority: 29.06.2020 CN 202010611335

(71)	Applicant: Beijing Baidu Netcom Science and Technology Co., Ltd.
	Beijing 100085 (CN)

(72)	Inventors:
	XING, Huaifei Beijing, 100085 (CN) CHANG, Yanlong Beijing, 100085 (CN)

(74)	Representative: advotec.
	Patent- und Rechtsanwaltspartnerschaft Tappe mbB Widenmayerstraße 4 80538 München (DE)

(56)	References cited:

(54)	METHOD AND APPARATUS FOR TRANSCODING VIDEO, DEVICE, AND MEDIUM

(57) Embodiments of the present disclosure provide a method and an apparatus for transcoding a video, a related electronic device, and a related storage medium. An input attribute of a video and a target attribute are obtained (S101). A segment transcoding speed of the video is determined (S102) based on the input attribute and the target attribute. The segment transcoding speed indicates a transcoding speed of a video segment obtained by segmenting the video. The number of video segments of the video is determined (S103) based on a preset target transcoding speed of the video and the segment transcoding speed. The video is segmented (S104) based on a video length of the video and the number of video segments to obtain the video segments. The video segments are transcoded (S105) based on the segment transcoding speed.

Description

FIELD

[0001] Embodiments of the present disclosure relate to a field of computer technologies and particularly to a field of cloud computing, deep learning and video processing technologies, and especially relate to a method and an apparatus for transcoding a video, a device, and a medium.

BACKGROUND

[0002] Video transcoding is widely used in fields such as broadcast television, Internet video, and live video to provide users with live or VOD (video on demand) services. Video transcoding mainly refers to converting a video from one video coding format to another video coding format (also referred to as a target coding format). Transcoding may reduce the bandwidth needed to transmit the video, thereby reducing the distribution cost.

[0003] Presently, in order to improve a transcoding speed, an existing video transcoding solution adopts a strategy of segmenting a video into a preset number of video segments. For example, all videos are segmented into the same number of video segments and each video segment is transcoded. However, because videos differ from one another, a strategy of segmenting every video into the same preset number of video segments may not suit all videos, which reduces the video transcoding efficiency.

SUMMARY

[0004] Embodiments of the present disclosure provide a method and an apparatus for transcoding a video, a device, and a medium, to make the segmentation of a video in a video transcoding process more reasonable and to improve the video transcoding efficiency.

[0005] According to an aspect of embodiments of the present disclosure, a method for transcoding a video is provided. The method includes: obtaining an input attribute of a video and a target attribute; determining a segment transcoding speed of the video based on the input attribute and the target attribute, the segment transcoding speed indicating a transcoding speed of a video segment obtained by segmenting the video; determining the number of video segments of the video based on a preset target transcoding speed of the video and the segment transcoding speed; segmenting the video based on a video length of the video and the number of video segments to obtain the video segments; and transcoding the video segments based on the segment transcoding speed.

[0006] According to another aspect of embodiments of the present disclosure, an apparatus for transcoding a video is provided. The apparatus includes: a first obtaining module, a first determining module, a second determining module, a processing module, and a first transcoding module. The first obtaining module is configured to obtain an input attribute of a video and a target attribute. The first determining module is configured to determine a segment transcoding speed of the video based on the input attribute and the target attribute. The segment transcoding speed indicates a transcoding speed of a video segment obtained by segmenting the video. The second determining module is configured to determine the number of video segments of the video based on a preset target transcoding speed of the video and the segment transcoding speed. The processing module is configured to segment the video based on a video length of the video and the number of video segments to obtain the video segments. The first transcoding module is configured to transcode the video segments based on the segment transcoding speed.

[0007] According to another aspect of embodiments of the present disclosure, an electronic device is provided. The electronic device includes: at least one processor and a memory. The memory is communicatively coupled to the at least one processor. The memory is configured to store instructions executed by the at least one processor. When the instructions are executed by the at least one processor, the at least one processor is caused to execute the method for transcoding the video according to any one of embodiments of the present disclosure.

[0008] According to another aspect of embodiments of the present disclosure, a non-transitory computer readable storage medium is provided. The non-transitory computer readable storage medium has computer instructions stored thereon. The computer instructions are configured to enable a computer to execute the method for transcoding the video according to any one of embodiments of the present disclosure.

[0009] In embodiments of the present disclosure, the segment transcoding speed of the video is determined based on the input attribute of the video and the target attribute. The number of video segments is determined based on the preset target transcoding speed and the segment transcoding speed. The video is segmented into video segments and the video segments are transcoded, thereby improving the rationality of segmenting the video in the video transcoding process. The transcoding speed of the video may be preset, thereby improving the video transcoding efficiency.

[0010] It should be understood that the description in the Summary is not intended to identify key or important features of embodiments of the present disclosure, and is not used to limit the scope of the present disclosure. Other features of the present disclosure will be easily understood from the following description.

BRIEF DESCRIPTION OF THE DRAWINGS

[0011] The accompanying drawings are used for better understanding the solution and do not constitute a limitation of the present disclosure.

FIG. 1 is a schematic diagram illustrating a method for transcoding a video according to embodiments of the present disclosure.

FIG. 2 is a flow chart illustrating a method for transcoding a video according to embodiments of the present disclosure.

FIG. 3 is a flow chart illustrating a method for transcoding a video according to embodiments of the present disclosure.

FIG. 4 is a flow chart illustrating a method for transcoding a video according to embodiments of the present disclosure.

FIG. 5 is a flow chart illustrating a method for transcoding a video according to embodiments of the present disclosure.

FIG. 6 is a block diagram illustrating an apparatus for transcoding a video according to embodiments of the present disclosure.

FIG. 7 is a block diagram illustrating an electronic device according to embodiments of the present disclosure.

DETAILED DESCRIPTION

[0012] Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings. The description includes various details of embodiments of the present disclosure to facilitate understanding, and these details should be regarded as merely examples. Therefore, it should be recognized by those skilled in the art that various changes and modifications may be made to the embodiments described herein without departing from the scope and spirit of the present disclosure. Meanwhile, for clarity and conciseness, descriptions of well-known functions and structures are omitted in the following description.

[0013] FIG. 1 is a schematic diagram illustrating a method for transcoding a video according to embodiments of the present disclosure. As illustrated in FIG. 1, during the video transcoding, video transcoding tasks may be continuously obtained via a task interface (such as a Console/API (application programming interface)) and added to a transcoding task queue. A task scheduler fetches a video transcoding task to be processed from the transcoding task queue based on a preset task scheduling strategy. Available computing resources are called to execute the video transcoding task, thereby improving a task processing efficiency. The preset task scheduling strategy may include, but is not limited to, preferentially executing a video transcoding task with a higher priority based on preset priorities of the video transcoding tasks; or preferentially executing a video transcoding task with a lower complexity based on complexities of the video transcoding tasks. As illustrated in FIG. 1, the complexities of the video transcoding tasks may be predicted by utilizing a coding complexity predictor.
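The two example scheduling strategies can be illustrated with a minimal sketch. The class name TranscodingTaskQueue, its methods, and the task names are hypothetical and chosen only for illustration; the disclosure does not prescribe a particular data structure, and a binary heap is merely one possible choice.

```python
import heapq
from dataclasses import dataclass, field
from itertools import count


@dataclass(order=True)
class QueuedTask:
    sort_key: float                  # smaller key is scheduled first
    seq: int                         # insertion order, breaks ties
    name: str = field(compare=False)


class TranscodingTaskQueue:
    """Hypothetical task queue supporting the two example strategies."""

    def __init__(self, strategy="priority"):
        if strategy not in ("priority", "complexity"):
            raise ValueError(f"unknown strategy: {strategy}")
        self.strategy = strategy
        self._heap = []
        self._seq = count()

    def add(self, name, priority=0, complexity=0.0):
        # Higher priority first (negate the key) or lower complexity first.
        key = -priority if self.strategy == "priority" else complexity
        heapq.heappush(self._heap, QueuedTask(key, next(self._seq), name))

    def next_task(self):
        return heapq.heappop(self._heap).name


queue = TranscodingTaskQueue(strategy="priority")
queue.add("task_a", priority=1)
queue.add("task_b", priority=5)
queue.add("task_c", priority=5)
order = [queue.next_task() for _ in range(3)]
print(order)
```

In this sketch, tasks with equal priority leave the queue in the order in which they were added; that tie-breaking rule is an assumption, not something stated in the disclosure.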

[0014] In some embodiments of the present disclosure, before the video transcoding task is executed, computing resources of the computing device may be managed and divided in advance to form multiple transcoding instance containers. In this way, the video transcoding tasks may be executed by the multiple transcoding instance containers respectively. The multiple transcoding instance containers may not be influenced by each other, such that a distributed processing of the video transcoding tasks is implemented and the video transcoding efficiency is improved. As illustrated in FIG. 1, the multiple transcoding instance containers may be obtained through the division based on an idle CPU (central processing unit) cluster, an idle GPU (graphics processing unit) cluster and an available computing chip (such as an FPGA chip) cluster of the computing device. The number of transcoding instance containers may be set based on actually available resources. The number of transcoding instance containers illustrated in FIG. 1 should not be understood as a limitation to embodiments of the present disclosure. The transcoding of each video to be transcoded according to embodiments of the present disclosure is described below.

[0015] FIG. 2 is a flow chart illustrating a method for transcoding a video according to embodiments of the present disclosure. The method may be applicable to transcode a video. In detail, the method may be applicable to video cloud transcoding. That is, a cloud computing infrastructure is employed for the video transcoding. The method according to embodiments of the present disclosure may be executed by an apparatus for transcoding a video. The apparatus may be implemented by software and/or hardware, and may be integrated on any electronic device with a computing capability, such as a server.

[0016] As illustrated in FIG. 2, the method for transcoding a video according to embodiments of the present disclosure may include the following.

[0017] At block S101, an input attribute of a video to be transcoded and a target attribute are obtained.

[0018] The input attribute of the video to be transcoded refers to attributes of the video itself, which may be determined by analyzing the video to be transcoded with a video analysis technology. For example, the input attribute includes at least one of resolution, code rate, encoder type, and complexity of a preset number of video pictures. The complexity includes a temporal complexity and a spatial complexity. The encoder type may include an encoder standard of an encoder for encoding the video to be transcoded, such as H.264, H.265, VP9 or AV1. The temporal complexity may be used to represent a time variation of a sequence of video pictures. The spatial complexity may be used to represent a texture complexity of a video picture (also called a frame). For example, a preset number of video pictures may be periodically extracted from the video to be transcoded based on a time sequence. The temporal complexity and the spatial complexity of the extracted video pictures are then obtained with an existing algorithm and used as the input attribute of the video to be transcoded.
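The disclosure refers only to "an existing algorithm" for the two complexities and does not name one. The following sketch therefore uses simple stand-in measures that are assumptions: the mean absolute difference between neighbouring samples for the spatial complexity, and the mean absolute difference between consecutive sampled pictures for the temporal complexity. Pictures are represented as nested lists of luma values, and all function names are hypothetical.

```python
def sample_indices(total_frames, preset_count):
    """Pick `preset_count` picture indices at a regular period, in time order."""
    step = max(total_frames // preset_count, 1)
    return list(range(0, total_frames, step))[:preset_count]


def spatial_complexity(frame):
    """Mean absolute difference between neighbouring samples of one picture."""
    height, width = len(frame), len(frame[0])
    diffs = []
    for y in range(height):
        for x in range(width):
            if x + 1 < width:    # horizontal neighbour
                diffs.append(abs(frame[y][x + 1] - frame[y][x]))
            if y + 1 < height:   # vertical neighbour
                diffs.append(abs(frame[y + 1][x] - frame[y][x]))
    return sum(diffs) / len(diffs) if diffs else 0.0


def temporal_complexity(frames):
    """Mean absolute sample difference between consecutive sampled pictures."""
    if len(frames) < 2:
        return 0.0
    per_pair = []
    for prev, cur in zip(frames, frames[1:]):
        diffs = [abs(c - p)
                 for row_p, row_c in zip(prev, cur)
                 for p, c in zip(row_p, row_c)]
        per_pair.append(sum(diffs) / len(diffs))
    return sum(per_pair) / len(per_pair)


flat = [[10, 10], [10, 10]]        # no texture
checker = [[0, 255], [255, 0]]     # strong texture
print(sample_indices(100, 4))
print(spatial_complexity(flat), spatial_complexity(checker))
print(temporal_complexity([flat, checker]))
```

A flat picture yields a spatial complexity of zero, while the checkerboard yields the maximum neighbour difference, which matches the intuition that the spatial complexity reflects texture.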

[0019] The target attribute, also referred to as transcoding parameters for obtaining a transcoded video, may be preset depending on a transcoding requirement or a quality of the transcoded video. The target attribute includes at least one of resolution, code rate, encoder type, and the number of key pictures. The key pictures may include I pictures, P pictures and B pictures in the video. In some embodiments, the target attribute may include the number of B pictures.

[0020] A processing speed of transcoding the video may be affected by all attributes of the video. Therefore, in embodiments of the present disclosure, the input attribute of the video to be transcoded and the target attribute are taken into consideration as a factor for accurately determining a segment transcoding speed of the video to be transcoded, thereby providing a foundation for setting the number of video segments.

[0021] At block S102, a segment transcoding speed of the video to be transcoded is determined based on the input attribute and the target attribute. The segment transcoding speed indicates a transcoding speed of a video segment. The video segment is a segment of the video obtained by segmenting the video to be transcoded.

[0022] In detail, a predetermined mapping relationship among input attributes of videos, target attributes and segment transcoding speeds may be utilized to determine the segment transcoding speed of a current video to be transcoded. The transcoding speed may also be called a transcoding velocity. The mapping relationship among the input attributes, the target attributes and the segment transcoding speeds indicates an influence rule of the input and target attributes on the segment transcoding speeds. The mapping relationship may be determined by statistically analyzing rules among sample input attributes of sample videos to be transcoded, sample target attributes and sample segment transcoding speeds of the sample videos.
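One simple way to realise such a mapping relationship is a lookup table built by averaging measured speeds per attribute combination. This is a sketch under assumptions: the function names, the tuple keys, and the speed values are invented for illustration, and averaging is only one possible form of statistical analysis.

```python
from collections import defaultdict
from statistics import mean


def build_speed_mapping(samples):
    """Average the measured segment speeds per (input, target) attribute pair.

    `samples` is an iterable of (input_key, target_key, measured_speed).
    """
    grouped = defaultdict(list)
    for input_key, target_key, speed in samples:
        grouped[(input_key, target_key)].append(speed)
    return {key: mean(speeds) for key, speeds in grouped.items()}


def lookup_segment_speed(mapping, input_key, target_key):
    """Return the mapped segment speed, or None for an unseen combination."""
    return mapping.get((input_key, target_key))


# Hypothetical sample data: (resolution, encoder type) before and after.
samples = [
    (("1080p", "H.264"), ("720p", "H.265"), 2.0),
    (("1080p", "H.264"), ("720p", "H.265"), 4.0),
    (("2160p", "H.265"), ("1080p", "AV1"), 0.5),
]
mapping = build_speed_mapping(samples)
print(lookup_segment_speed(mapping, ("1080p", "H.264"), ("720p", "H.265")))
```

A table of this kind cannot answer for attribute combinations that never occurred in the samples, which is one reason a learned model may be preferred.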

[0023] At block S103, the number of video segments of the video to be transcoded is determined based on a preset target transcoding speed of the video to be transcoded and the segment transcoding speed.

[0024] The preset target transcoding speed of the video to be transcoded is a speed value preset based on a transcoding requirement, and refers to the speed at which the whole video to be transcoded is processed in a transcoding process. Presetting the target transcoding speed means that the transcoding speed of the video to be transcoded is controllable based on the transcoding requirement. The transcoding speed of the video is the quotient of a video length of the video divided by a transcoding duration of the video. The segment transcoding speed is the quotient of a segment length of the video segment divided by a transcoding duration of the video segment. In a process of distributed video transcoding, in order to improve the efficiency of the video transcoding, each video segment may have the same transcoding duration, and the transcoding duration of each video segment may be equal to the transcoding duration of the whole video. This reduces the occurrence of a condition in which some nodes sit idle while slower nodes prolong the transcoding duration of the whole video. Under this condition, x video segments transcoded in parallel, each at the segment transcoding speed v1, process the whole video at a speed of x times v1. Therefore, after the segment transcoding speed v1 of the video to be transcoded is determined, the number x of video segments of the video to be transcoded may be determined by dividing a preset target transcoding speed v2 of the video to be transcoded by the segment transcoding speed v1, that is, x=v2/v1, where x is an integer. Generally, the more video segments there are, the higher the video transcoding efficiency is.
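The relation x=v2/v1 can be sketched as follows. The disclosure states that x is an integer but does not say how a non-integer quotient is handled; rounding up, so that the target speed is met or exceeded, is an assumption of this sketch, as is the function name.

```python
import math


def number_of_segments(target_speed, segment_speed):
    """Return x = v2 / v1, rounded up to an integer of at least 1.

    target_speed  -- preset target transcoding speed v2 of the whole video
    segment_speed -- predicted segment transcoding speed v1
    """
    if target_speed <= 0 or segment_speed <= 0:
        raise ValueError("speeds must be positive")
    # Rounding up is an assumption; the source only requires an integer x.
    return max(1, math.ceil(target_speed / segment_speed))


# A segment is predicted to transcode at 2x real time and the whole video
# should transcode at 8x real time, so four parallel segments are needed.
print(number_of_segments(8.0, 2.0))
```

When the target speed is lower than the segment speed, the sketch returns a single segment, i.e. the video is not split.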

[0025] At block S104, the video to be transcoded is segmented based on a video length of the video to be transcoded and the number of video segments, to obtain the video segments.

[0026] The video to be transcoded may be segmented based on the video length of the video to be transcoded and the number of video segments, to determine a time interval corresponding to each video segment. That is, a time stamp is determined for each video segment.
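As an illustration, the time interval of each video segment can be derived from the video length and the number of video segments. Equal-length segments are an assumption of this sketch, as is the function name; a practical splitter may also need to align the cut points with picture boundaries, which is not handled here.

```python
def segment_intervals(video_length, num_segments):
    """Split [0, video_length] into `num_segments` equal (start, end) intervals."""
    if video_length <= 0 or num_segments < 1:
        raise ValueError("video_length must be positive and num_segments >= 1")
    segment_length = video_length / num_segments
    intervals = []
    for i in range(num_segments):
        start = i * segment_length
        # Pin the last end to the video length to avoid rounding drift.
        end = video_length if i == num_segments - 1 else (i + 1) * segment_length
        intervals.append((start, end))
    return intervals


# A 120-second video split into four segments.
intervals = segment_intervals(120.0, 4)
print(intervals)
```

The start time of each interval can serve as the time stamp of the corresponding video segment.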

[0027] At block S105, the video segments are transcoded based on the segment transcoding speed.

[0028] After transcoding of each video segment is completed to obtain multiple transcoded video segments, the multiple transcoded video segments may be gathered based on the time stamp of each video segment to obtain the transcoded video. The transcoded video may be stored or issued subsequently.
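Gathering by time stamp can be sketched as a sort followed by a continuity check. The tuple layout (start, end, payload) and the function name are hypothetical; the payload stands in for whatever handle identifies a transcoded video segment.

```python
def gather_segments(transcoded, tolerance=1e-6):
    """Order transcoded segments by start time and return their payloads.

    `transcoded` is a list of (start, end, payload) tuples in arbitrary
    completion order. A gap or overlap between neighbours raises ValueError.
    """
    ordered = sorted(transcoded, key=lambda segment: segment[0])
    for (_, prev_end, _), (next_start, _, _) in zip(ordered, ordered[1:]):
        if abs(prev_end - next_start) > tolerance:
            raise ValueError(f"segments do not join at {prev_end} / {next_start}")
    return [payload for _, _, payload in ordered]


# Segments may finish out of order on different nodes.
finished = [(60.0, 90.0, "seg2.out"), (0.0, 30.0, "seg0.out"), (30.0, 60.0, "seg1.out")]
print(gather_segments(finished))
```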

[0029] In some embodiments of the present disclosure, the segment transcoding speed of the video to be transcoded is determined based on the input attribute of the video to be transcoded and the target attribute. That is, the segment transcoding speed of the video is accurately determined based on fine-grained factors, which provides a foundation for reasonably determining the number of video segments. The number of video segments of the video to be transcoded is determined based on the preset target transcoding speed of the video to be transcoded and the segment transcoding speed. The video to be transcoded is segmented to obtain the video segments and the video segments are transcoded. Therefore, the video is segmented more reasonably in the video transcoding process, and the problem in the prior art that the video is unreasonably divided is solved. In some embodiments of the present disclosure, the target transcoding speed of the video may be preset, such that the number of video segments may be determined based on the target transcoding speed, thereby ensuring an effective improvement of the video transcoding efficiency.

[0030] On the basis of the above, in some embodiments of the present disclosure, before the segment transcoding speed of the video to be transcoded is determined based on the input attribute and the target attribute, the method also includes obtaining resource configuration information of container instances. Correspondingly, determining the segment transcoding speed of the video to be transcoded based on the input attribute and the target attribute includes determining the segment transcoding speed of the video to be transcoded based on the input attribute, the target attribute, and the resource configuration information.

[0031] In some embodiments of the present disclosure, the container instance refers to an independent resource space for performing the video transcoding task. The independent resource space is obtained by managing and dividing the computing resources on the computing device in advance. The resource configuration information of the container instances may include configuration information of CPU, configuration information of GPU and configuration information of a computing chip. For example, the configuration information includes a model of the CPU, a main frequency of the CPU, the number of cores of the CPU, a model of the GPU, a main frequency of the GPU, the number of cores of the GPU, and a model of the computing chip.

[0032] In detail, a predetermined mapping relationship among input attributes of videos, target attributes, resource configuration information of container instances and segment transcoding speeds may be utilized to determine the segment transcoding speed of a current video to be transcoded. The mapping relationship indicates an influence rule of the input attributes of videos, the target attributes and the resource configuration information of container instances on the segment transcoding speeds. The mapping relationship may be determined by statistically analyzing rules among sample input attributes of a large amount of sample videos to be transcoded, sample target attributes, resource configuration information of container instances used during performing the transcoding task, and segment transcoding speeds of the large amount of sample videos. By considering the resource configuration information of the container instances while determining the segment transcoding speed of the video to be transcoded, the container resource may be fully used for performing the transcoding task, thereby maximizing the utilization of container resources.

[0033] Further, transcoding the video segments based on the segment transcoding speed includes calling the container instances to transcode the video segments based on the segment transcoding speed. By calling the multiple container instances simultaneously, distributed transcoding processing may be performed on the video to be transcoded, thereby improving the video transcoding efficiency.
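Calling several container instances at the same time can be sketched with a thread pool, one worker per video segment. This is only an illustration: transcode_segment is a hypothetical placeholder that stands in for dispatching a segment to a container instance, and a thread pool is an assumed substitute for a real distributed scheduler.

```python
from concurrent.futures import ThreadPoolExecutor


def transcode_segment(interval):
    """Placeholder for sending one segment to a container instance."""
    start, end = interval
    # A real implementation would invoke the transcoder here.
    return {"start": start, "end": end, "status": "done"}


def transcode_all(intervals):
    """Transcode every segment concurrently, one worker per segment."""
    if not intervals:
        return []
    with ThreadPoolExecutor(max_workers=len(intervals)) as pool:
        # map() returns results in input order, whatever the completion order.
        return list(pool.map(transcode_segment, intervals))


results = transcode_all([(0.0, 30.0), (30.0, 60.0), (60.0, 90.0)])
print([r["status"] for r in results])
```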

[0034] Further, before determining the segment transcoding speed of the video to be transcoded based on the input attribute, the target attribute and the resource configuration information, the method according to embodiments of the present disclosure also includes obtaining a preset speed level of the segment transcoding speed of the video to be transcoded. Correspondingly, determining the segment transcoding speed of the video to be transcoded based on the input attribute, the target attribute and the resource configuration information includes determining the segment transcoding speed of the video to be transcoded based on the input attribute, the target attribute, the resource configuration information, and the preset speed level.

[0035] The preset speed level refers to a level of the transcoding speed set in advance, such as slowest, slower, slow, fast, faster and fastest. The video transcoding speed is negatively correlated with the video transcoding quality. Therefore, the transcoding speed level may be preset based on different transcoding requirements, and the segment transcoding speed meeting the transcoding speed level may be determined. In this way, embodiments of the present disclosure may meet different requirements.

[0036] In detail, a predetermined mapping relationship among input attributes of videos, target attributes, resource configuration information of container instances, preset speed levels, and segment transcoding speeds may be utilized to determine the segment transcoding speed of the current video to be transcoded. The predetermined mapping relationship indicates an influence rule of the input attributes of videos, the target attributes, the resource configuration information of container instances and the preset speed levels on the segment transcoding speeds. The mapping relationship may be determined by statistically analyzing rules among sample input attributes of a large amount of sample videos to be transcoded, sample target attributes, resource configuration information of container instances used during performing the video transcoding task, the speed levels, and the segment transcoding speeds of the large amount of sample videos.
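Before such a mapping relationship can be analyzed statistically, the four kinds of information typically have to be expressed numerically. The sketch below shows one possible encoding; the dictionary keys, the choice of fields, and the ordinal encoding of the speed level are all assumptions made for illustration.

```python
SPEED_LEVELS = ["slowest", "slower", "slow", "fast", "faster", "fastest"]
ENCODER_TYPES = ["H.264", "H.265", "VP9", "AV1"]


def one_hot(value, vocabulary):
    """Encode a categorical value as a 0/1 vector over `vocabulary`."""
    if value not in vocabulary:
        raise ValueError(f"unknown value: {value!r}")
    return [1.0 if value == item else 0.0 for item in vocabulary]


def build_features(input_attr, target_attr, resource_cfg, speed_level):
    """Flatten the four kinds of information into one numeric vector."""
    return (
        [float(input_attr["width"] * input_attr["height"]),
         float(input_attr["code_rate_kbps"])]
        + one_hot(input_attr["encoder"], ENCODER_TYPES)
        + [float(target_attr["width"] * target_attr["height"]),
           float(target_attr["code_rate_kbps"])]
        + one_hot(target_attr["encoder"], ENCODER_TYPES)
        + [float(resource_cfg["cpu_cores"]), float(resource_cfg["cpu_ghz"])]
        + [float(SPEED_LEVELS.index(speed_level))]  # ordinal: 0 is slowest
    )


features = build_features(
    {"width": 1920, "height": 1080, "code_rate_kbps": 8000, "encoder": "H.264"},
    {"width": 1280, "height": 720, "code_rate_kbps": 3000, "encoder": "H.265"},
    {"cpu_cores": 8, "cpu_ghz": 2.5},
    "fast",
)
print(len(features))
```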

[0037] FIG. 3 is a flow chart illustrating a method for transcoding a video according to embodiments of the present disclosure. The method illustrated in FIGS. 1 and 2 may be described in detail in FIG. 3. Further, the method illustrated in FIG. 3 may be combined with the method illustrated in FIGS. 1 and 2, which is not limited in the disclosure. As illustrated in FIG. 3, the method may include the following.

[0038] At block S201, an input attribute of a video to be transcoded and a target attribute are obtained.

[0039] At block S202, a segment transcoding speed is determined with a transcoding speed prediction model based on the input attribute and the target attribute. The transcoding speed prediction model is trained in advance. The segment transcoding speed refers to a transcoding speed of a video segment obtained after the video to be transcoded is segmented.

[0040] In some embodiments of the present disclosure, a mapping relationship among input attributes of videos, target attributes and segment transcoding speeds may be represented by the pre-trained model. The transcoding speed prediction model may be any model with a function for predicting the segment transcoding speed.

[0041] Further, the method according to embodiments of the present disclosure may also include obtaining a sample input attribute of a sample video and a sample target attribute; marking a segment transcoding speed of the sample video to obtain a marked result; and training the transcoding speed prediction model based on the sample input attribute, the sample target attribute and the marked result. In detail, the transcoding speed prediction model may be trained based on a support vector machine (SVM) model, a random forest (RF) model or an eXtreme gradient boosting (XGBoost) model.
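The train-then-predict workflow can be sketched without external libraries. Note that the sketch deliberately swaps in a k-nearest-neighbour regressor as a dependency-free stand-in; it is not one of the SVM, RF or XGBoost models named above. The class name, the two-dimensional features (input and target megapixels), and the marked speeds are invented for illustration.

```python
import math


class KnnSpeedPredictor:
    """k-nearest-neighbour regressor used as a stand-in prediction model."""

    def __init__(self, k=1):
        self.k = k
        self._samples = []

    def fit(self, features, marked_speeds):
        """Store sample feature vectors with their marked segment speeds."""
        if len(features) != len(marked_speeds):
            raise ValueError("features and marked_speeds differ in length")
        self._samples = list(zip(features, marked_speeds))
        return self

    def predict(self, feature):
        """Average the marked speeds of the k closest training samples."""
        if not self._samples:
            raise RuntimeError("model has not been trained")
        nearest = sorted(
            self._samples, key=lambda sample: math.dist(sample[0], feature)
        )[: self.k]
        return sum(speed for _, speed in nearest) / len(nearest)


# Hypothetical samples: (input megapixels, target megapixels) -> marked speed.
train_x = [(2.07, 0.92), (2.07, 2.07), (8.29, 2.07), (8.29, 8.29)]
train_y = [3.0, 2.0, 0.8, 0.4]
model = KnnSpeedPredictor(k=1).fit(train_x, train_y)
print(model.predict((2.0, 1.0)))
```

In practice the features would usually be scaled to comparable ranges before a distance-based model is applied; that step is omitted here for brevity.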

[0042] At block S203, the number of video segments of the video to be transcoded is determined based on a preset target transcoding speed of the video to be transcoded and the segment transcoding speed.

[0043] At block S204, the video to be transcoded is segmented based on a video length of the video to be transcoded and the number of video segments, to obtain the video segments.

[0044] At block S205, the video segments are transcoded based on the segment transcoding speed.

[0045] In embodiments of the present disclosure, by utilizing the transcoding speed prediction model, the segment transcoding speed of the video to be transcoded is determined based on the input attribute of the video to be transcoded and the target attribute, thereby improving the rationality and the accuracy of determining the segment transcoding speed. The number of video segments of the video to be transcoded is determined based on the preset target transcoding speed of the video to be transcoded and the segment transcoding speed, thereby improving the rationality of segmenting the video in the video transcoding process. As a result, the existing problem that the video is unreasonably segmented may be solved, and the video transcoding efficiency may be improved.

[0046] On the basis of the above, it should be noted that more factors may be taken into consideration to determine the segment transcoding speed of the video to be transcoded. For example, the resource configuration information of the container instances and/or the preset speed level of the segment transcoding speed may be considered. Correspondingly, during training the transcoding speed prediction model, in addition to the sample input attributes, the sample target attributes and the marked results of the sample video segments, the resource configuration information of the container instances and/or the speed levels corresponding to the segment transcoding speeds of the sample videos may also be used as inputs for training the transcoding speed prediction model to obtain a final transcoding speed prediction model, thereby improving the accuracy of predicting the segment transcoding speed by utilizing the transcoding speed prediction model.

[0047] FIG. 4 is a flow chart illustrating a method for transcoding a video according to embodiments of the present disclosure. The method in FIGS. 1 to 3 may be described in detail in FIG. 4. FIG. 4 should not be understood as a specific limitation of embodiments of the present disclosure. As illustrated in FIG. 4, the method may include a prediction stage and a training stage. In detail, the method may include the following.

[0048] At block S21, a video to be transcoded is input.

[0049] At block S22, the video is analyzed to determine an input attribute of the video. A preset target attribute, preset resource configuration information and a preset speed level of a segment transcoding speed are obtained.

[0050] As illustrated in FIG. 4, the block S22 may proceed to the block S27. At block S27, the input attribute of the video and the target attribute may be outputted as input data of the training stage of a model.

[0051] At block S23, a segment transcoding speed is predicted with a transcoding speed prediction model.

[0052] At block S24, the number of video segments of the video to be transcoded is determined.

[0053] The number of video segments of the video to be transcoded is determined based on a preset target transcoding speed of the video to be transcoded and the segment transcoding speed.

[0054] At block S25, distributed transcoding instances are called to execute transcoding tasks of video segments respectively.

[0055] In detail, the number of transcoding instances called may be equal to the number of video segments. The transcoding instances called may be configured to execute the transcoding tasks of the video segments simultaneously. The block S25 may proceed to the block S28. At block S28, the segment transcoding speed of each video segment, the transcoding speed level, and resource configuration information of the instance containers are outputted as input data of the training stage of the model. Compared with outputting a preset speed level and preset resource configuration information before performing the transcoding task, outputting the speed level of the segment transcoding speed and the resource configuration information of the instance containers after performing the transcoding task may ensure the authenticity of the above-mentioned information. The information output at blocks S27 and S28 may be determined by analyzing a transcoding task log.

[0056] At block S26, the transcoded video segments are gathered to form a target video.

[0057] In detail, the transcoded video segments may be merged with a video editing technology based on the time stamp of each video segment.

[0058] At block S27, the input attribute of the video and the target attribute are output.

[0059] At block S28, the segment transcoding speed of each video segment, the speed level and the resource configuration information of the instance containers are output.

[0060] At block S29, sample input attributes of sample videos to be transcoded, sample target attributes, resource configuration information of instance containers and speed levels of the sample video segments are collected as training sample data.

[0061] At block S30, the transcoding speed prediction model is updated constantly by an on-line learning platform based on the collected training sample data.

[0062] In embodiments of the present disclosure, a method of reinforcement learning is employed to continuously expand the training data of the model, such that the sample data is more realistic, and the model is continuously updated and iterated to improve the accuracy of the model. After determining that a new model is better, the new model may be downloaded and used in the prediction stage to determine the segment transcoding speed of a current video to be transcoded.
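The collect, retrain and replace cycle can be sketched as follows. This is a simplified illustration under assumptions: the disclosure does not say how a new model is judged to be better, so the sketch compares mean absolute error on a held-out sample set; the per-key averaging "model" is a trivial stand-in; and all names and values are hypothetical.

```python
def train_mean_per_key(samples):
    """Stand-in trainer: predict the mean measured speed per attribute key."""
    by_key = {}
    for key, speed in samples:
        by_key.setdefault(key, []).append(speed)
    overall = sum(speed for _, speed in samples) / len(samples)
    table = {key: sum(v) / len(v) for key, v in by_key.items()}
    return lambda key: table.get(key, overall)


def mean_absolute_error(predict, samples):
    return sum(abs(predict(key) - speed) for key, speed in samples) / len(samples)


class OnlineModelUpdater:
    """Grow the training data from finished tasks and swap in better models."""

    def __init__(self, train_fn, initial_samples, holdout):
        self.train_fn = train_fn
        self.samples = list(initial_samples)
        self.holdout = list(holdout)
        self.active = train_fn(self.samples)

    def report(self, key, measured_speed):
        """Record one finished segment, e.g. parsed from a transcoding task log."""
        self.samples.append((key, measured_speed))

    def refresh(self):
        """Retrain; replace the active model only if the hold-out error drops."""
        candidate = self.train_fn(self.samples)
        if mean_absolute_error(candidate, self.holdout) < mean_absolute_error(
            self.active, self.holdout
        ):
            self.active = candidate
            return True
        return False


updater = OnlineModelUpdater(
    train_mean_per_key, [("1080p", 2.0)], holdout=[("720p", 4.0)]
)
updater.report("720p", 4.0)
replaced = updater.refresh()
print(replaced, updater.active("720p"))
```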

[0063] FIG. 5 is a flow chart illustrating a method for transcoding a video according to embodiments of the present disclosure. FIG. 5 describes the method of FIGS. 1 to 4 in further detail, and the method of FIGS. 1 to 4 may be combined with the method illustrated in FIG. 5. As illustrated in FIG. 5, the method may include the following.

[0064] At block S301, an input attribute of a video to be transcoded and a target attribute are obtained.

[0065] For example, the input attribute includes at least one of resolution, code rate, encoder type, and complexity of a preset number of video pictures. The complexity includes a temporal complexity and a spatial complexity. The target attribute includes at least one of resolution, code rate, encoder type, and the number of key pictures.

[0066] At block S302, a segment transcoding speed of the video to be transcoded is determined based on the input attribute and the target attribute. The segment transcoding speed indicates a transcoding speed of a video segment obtained by segmenting the video to be transcoded.

[0067] At block S303, the number of video segments of the video to be transcoded is determined based on a preset target transcoding speed of the video to be transcoded and the segment transcoding speed.
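As a non-limiting illustration, the determination at block S303 may be sketched as follows. The sketch assumes that both speeds are expressed in the same unit (for example, as multiples of the playback speed) and that the number of video segments is the ratio of the two speeds rounded up. The function name and the rounding rule are assumptions made for this example only.

```python
import math


def segment_count(target_speed: float, segment_speed: float) -> int:
    """Number of segments needed so that parallel transcoding reaches target_speed.

    Both speeds are assumed to share one unit, e.g. multiples of playback speed.
    """
    if target_speed <= 0 or segment_speed <= 0:
        raise ValueError("speeds must be positive")
    # Round up so that the combined speed is not below the target; use at
    # least one segment when a single instance is already fast enough.
    return max(1, math.ceil(target_speed / segment_speed))
```

For example, under these assumptions a preset target transcoding speed of 8 and a segment transcoding speed of 1.5 yield 6 video segments.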

[0068] At block S304, audio data and picture data of the video to be transcoded are separated to obtain audio data to be transcoded and a picture sequence to be transcoded.

[0069] In detail, audio and pictures may be extracted from the video to be transcoded with an available audio-video separation tool, to obtain the audio data to be transcoded and the picture sequence to be transcoded.

[0070] It should be noted that block S304 may be executed before or after blocks S301-S303. The logical order illustrated in FIG. 5 should not be understood as a limitation on embodiments of the present disclosure. Separating the audio data and the picture sequence may be performed at any stage before segmenting the video to be transcoded, as long as only the picture sequence is segmented and the audio data is not.

[0071] At block S305, the picture sequence to be transcoded is segmented based on a video length of the video to be transcoded and the number of video segments, to obtain segments of the picture sequence.

[0072] In detail, the picture sequence to be transcoded is segmented based on the video length of the video to be transcoded and the number of video segments, to determine a time interval corresponding to each segment of the picture sequence. That is, a time stamp is determined for each segment of the picture sequence.
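As a non-limiting illustration, determining the time interval of each segment may be sketched as follows. The sketch assumes segments of equal duration. The function name is hypothetical, and an actual implementation may additionally align the segment boundaries with key pictures.

```python
from typing import List, Tuple


def segment_intervals(video_length: float, count: int) -> List[Tuple[float, float]]:
    """Split [0, video_length] into `count` equal (start, end) intervals in seconds."""
    if video_length <= 0 or count < 1:
        raise ValueError("video_length must be positive and count at least 1")
    step = video_length / count
    intervals = []
    for index in range(count):
        start = index * step
        # Pin the last end to video_length to avoid floating-point drift.
        end = video_length if index == count - 1 else (index + 1) * step
        intervals.append((start, end))
    return intervals
```

The start of each interval may serve as the time stamp by which the transcoded segments are later merged.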

[0073] At block S306, the segments of the picture sequence are transcoded based on the segment transcoding speed to obtain transcoded segments of the picture sequence.

[0074] The block S306 may proceed to the block S308. At block S308, the transcoded segments of the picture sequence are merged.

[0075] At block S307, the audio data to be transcoded is transcoded based on the preset target transcoding speed of the video to be transcoded to obtain transcoded audio data.

[0076] In some embodiments of the present disclosure, the audio data is not segmented for the following reasons. If the audio data is segmented and the audio segments are transcoded into certain formats, such as the AAC format, impulse noise may appear at the junctions between audio segments. The impulse noise results from the specific coding standard and affects the quality of the transcoded audio. Therefore, in the present disclosure, the whole audio is transcoded based on the preset target transcoding speed of the video to be transcoded without segmenting the audio, which may not only improve the quality of the transcoded audio, but also enable the transcoding duration of the audio data to be the same as the transcoding duration of the picture sequence.

[0077] At block S308, the transcoded segments of the picture sequence are merged based on time stamps of the segments of the picture sequence to obtain a target picture sequence.

[0078] At block S309, the target picture sequence is combined with the transcoded audio data to obtain a target video.

[0079] In detail, an available video synthesis tool may be used to combine a target picture sequence with the transcoded audio data, to obtain the transcoded video.

[0080] With the embodiments of the present disclosure, the audio data and the picture data of the video to be transcoded are separated to obtain the audio data to be transcoded and the picture sequence to be transcoded. The picture sequence to be transcoded is segmented and the segments of the picture sequence are transcoded to obtain transcoded segments of the picture sequence. The audio data to be transcoded is transcoded. The transcoded video is obtained by combining the transcoded segments of the picture sequence and the transcoded audio data. In this way, not only is the quality of the transcoded audio improved, but the video transcoding efficiency is also improved.
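As a non-limiting illustration, the flow of blocks S306 to S309 may be sketched as follows. The two transcoding functions are hypothetical stand-ins that merely upper-case text in place of real encoding. The tuple layout of a segment and the use of a thread pool in place of container instances are likewise assumptions made for this example only.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Segment = Tuple[float, float, List[str]]  # (start, end, pictures)


def transcode_picture_segment(segment: Segment) -> Segment:
    """Hypothetical stand-in for transcoding one segment of the picture sequence."""
    start, end, pictures = segment
    return (start, end, [picture.upper() for picture in pictures])


def transcode_audio(audio: str) -> str:
    """Hypothetical stand-in for transcoding the whole, unsegmented audio."""
    return audio.upper()


def transcode_video(picture_segments: List[Segment], audio: str) -> Dict[str, object]:
    """Transcode picture segments in parallel with the unsegmented audio."""
    with ThreadPoolExecutor(max_workers=len(picture_segments) + 1) as pool:
        audio_future = pool.submit(transcode_audio, audio)
        segment_futures = [pool.submit(transcode_picture_segment, segment)
                           for segment in picture_segments]
        transcoded = [future.result() for future in segment_futures]
        transcoded_audio = audio_future.result()
    # Merge by time stamp (block S308), then combine with the audio (block S309).
    transcoded.sort(key=lambda segment: segment[0])
    pictures = [picture for _, _, frames in transcoded for picture in frames]
    return {"pictures": pictures, "audio": transcoded_audio}
```

The audio is submitted as a single task, so no junctions between audio segments are created, while the segments of the picture sequence are processed concurrently.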

[0081] FIG. 6 is a block diagram illustrating an apparatus for transcoding a video according to an embodiment of the present disclosure. In embodiments of the present disclosure, the apparatus may be applicable to a scenario in which a video is transcoded. In detail, the apparatus may be applicable to video cloud transcoding, that is, a scenario in which a cloud computing infrastructure is employed for the video transcoding. The apparatus according to embodiments of the present disclosure may be implemented by software and/or hardware, and may be integrated on any electronic device with a computing capability, such as a server.

[0082] As illustrated in FIG. 6, the apparatus 400 for transcoding the video according to embodiments of the present disclosure may include a first obtaining module 401, a first determining module 402, a second determining module 403, a processing module 404, and a first transcoding module 405.

[0083] The first obtaining module 401 is configured to obtain an input attribute of a video to be transcoded and a target attribute.

[0084] The first determining module 402 is configured to determine a segment transcoding speed of the video to be transcoded based on the input attribute and the target attribute. The segment transcoding speed indicates a transcoding speed of a video segment obtained by segmenting the video to be transcoded.

[0085] The second determining module 403 is configured to determine the number of video segments of the video to be transcoded based on a preset target transcoding speed of the video to be transcoded and the segment transcoding speed.

[0086] The processing module 404 is configured to segment the video to be transcoded based on a video length of the video to be transcoded and the number of video segments to obtain the video segments.

[0087] The first transcoding module 405 is configured to transcode the video segments based on the segment transcoding speed.
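As a non-limiting illustration, the cooperation of the modules 401 to 405 may be sketched in software as follows. The class name, the attribute names, and the callable signatures are assumptions made for this example only; each callable stands in for one module and may be backed by any suitable implementation.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple


@dataclass
class TranscodingApparatus:
    """Hypothetical composition of the five modules of FIG. 6."""
    obtain_attributes: Callable[[Any], Tuple[Any, Any]]       # module 401
    determine_segment_speed: Callable[[Any, Any], float]      # module 402
    determine_segment_count: Callable[[float, float], int]    # module 403
    segment_video: Callable[[Any, int], List[Any]]            # module 404
    transcode_segments: Callable[[List[Any], float], List[Any]]  # module 405

    def run(self, video: Any, target_speed: float) -> List[Any]:
        """Chain the modules in the order of blocks S101 to S105."""
        input_attribute, target_attribute = self.obtain_attributes(video)
        speed = self.determine_segment_speed(input_attribute, target_attribute)
        count = self.determine_segment_count(target_speed, speed)
        segments = self.segment_video(video, count)
        return self.transcode_segments(segments, speed)
```

Keeping each module behind its own callable allows, for example, the first determining module to be backed by a prediction model without changing the other modules.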

[0088] In embodiments of the present disclosure, the input attribute includes at least one of resolution, code rate, encoder type, and complexity of a preset number of video pictures. The complexity includes a temporal complexity and a spatial complexity.

[0089] The target attribute includes at least one of resolution, code rate, encoder type, and the number of key pictures.

[0090] In embodiments of the present disclosure, the apparatus also includes a second obtaining module, configured to obtain resource configuration information of container instances.

[0091] The first determining module 402 is configured to determine the segment transcoding speed of the video to be transcoded based on the input attribute, the target attribute, and the resource configuration information.

[0092] In embodiments of the present disclosure, the first transcoding module 405 is configured to: call the container instances to transcode the video segments based on the segment transcoding speed.

[0093] In embodiments of the present disclosure, the apparatus also includes a third obtaining module, configured to obtain a preset speed level of the segment transcoding speed of the video to be transcoded.

[0094] The first determining module 402 is configured to determine the segment transcoding speed of the video to be transcoded based on the input attribute, the target attribute, the resource configuration information, and the preset speed level.

[0095] In embodiments of the present disclosure, the first determining module 402 is configured to: determine the segment transcoding speed using a transcoding speed prediction model based on the input attribute and the target attribute.

[0096] In embodiments of the present disclosure, the apparatus also includes a fourth obtaining module, a marking module, and a training module.

[0097] The fourth obtaining module is configured to obtain sample input attributes of sample videos and sample target attributes.

[0098] The marking module is configured to mark segment transcoding speeds of the sample videos to obtain marked results.

[0099] The training module is configured to train the transcoding speed prediction model based on the sample input attributes, the sample target attributes and the marked results.

[0100] In embodiments of the present disclosure, the apparatus also includes a separating module. The separating module is configured to separate audio data and picture data of the video to be transcoded to obtain audio data to be transcoded and a picture sequence to be transcoded before the video to be transcoded is segmented based on the video length of the video to be transcoded and the number of video segments.

[0101] The processing module 404 is configured to segment the picture sequence to be transcoded based on the video length of the video to be transcoded and the number of video segments, to obtain segments of the picture sequence.

[0102] The first transcoding module 405 is configured to transcode the segments of the picture sequence based on the segment transcoding speed to obtain transcoded segments of the picture sequence.

[0103] In embodiments of the present disclosure, the apparatus also includes a second transcoding module, configured to transcode the audio data to be transcoded based on the preset target transcoding speed of the video to be transcoded to obtain transcoded audio data.

[0104] In embodiments of the present disclosure, the apparatus also includes a third determining module and a fourth determining module.

[0105] The third determining module is configured to merge transcoded segments of the picture sequence based on time stamps of the segments of the picture sequence to obtain a target picture sequence.

[0106] The fourth determining module is configured to combine the target picture sequence with the transcoded audio data to obtain a target video.

[0107] The apparatus 400 for transcoding the video according to embodiments of the present disclosure may execute the method for transcoding the video according to embodiments of the present disclosure, and has functional modules for executing the method and the corresponding beneficial effects. For content not described in detail in the apparatus embodiments of the present disclosure, reference may be made to the description in any method embodiment of the present disclosure.

[0108] According to embodiments of the present disclosure, an electronic device and a readable storage medium are also provided.

[0109] FIG. 7 is a block diagram illustrating an electronic device capable of implementing a method for transcoding a video according to embodiments of the present disclosure. The electronic device is intended to represent various forms of digital computers, such as a laptop computer, a desktop computer, a workstation, a personal digital assistant, a server, a blade server, a mainframe computer and other suitable computers. The electronic device may also represent various forms of mobile devices, such as a personal digital processing device, a cellular phone, a smart phone, a wearable device and other similar computing devices. The components, connections and relationships of the components, and functions of the components illustrated herein are merely examples, and are not intended to limit the embodiments of the present disclosure described and/or claimed herein.

[0110] As illustrated in FIG. 7, the electronic device includes: one or more processors 501, a memory 502, and interfaces for connecting various components, including a high-speed interface and a low-speed interface. Various components are connected to each other via different buses, and may be mounted on a common main board or in other ways as required. The processor may process instructions executed within the electronic device, including instructions stored in or on the memory to display graphical information of the GUI (graphical user interface) on an external input/output device (such as a display device coupled to an interface). In other implementations, multiple processors and/or multiple buses may be used together with multiple memories if desired. Similarly, multiple electronic devices may be connected, and each device provides some necessary operations (for example, as a server array, a group of blade servers, or a multiprocessor system). In FIG. 7, a processor 501 is taken as an example.

[0111] The memory 502 is a non-transitory computer readable storage medium provided by embodiments of the present disclosure. The memory is configured to store instructions executable by at least one processor, to enable the at least one processor to execute the method for transcoding the video provided by embodiments of the present disclosure. The non-transitory computer readable storage medium provided by embodiments of the present disclosure is configured to store computer instructions. The computer instructions are configured to enable a computer to execute the method for transcoding the video provided by embodiments of the present disclosure.

[0112] As the non-transitory computer readable storage medium, the memory 502 may be configured to store non-transitory software programs, non-transitory computer executable programs and modules, such as program instructions/modules (such as the first obtaining module 401, the first determining module 402, the second determining module 403, the processing module 404, and the first transcoding module 405 illustrated in FIG. 6) corresponding to the method for transcoding the video according to embodiments of the present disclosure. The processor 501 is configured to execute various functional applications and data processing of the server by running the non-transitory software programs, instructions and modules stored in the memory 502, thereby implementing the method for transcoding the video according to the above method embodiments.

[0113] The memory 502 may include a storage program region and a storage data region. The storage program region may store an operating system and an application required by at least one function. The storage data region may store data created according to usage of the electronic device. In addition, the memory 502 may include a high-speed random-access memory, and may also include a non-transitory memory, such as at least one disk memory device, a flash memory device, or other non-transitory solid-state memory device. In some embodiments, the memory 502 may optionally include memories remotely located from the processor 501, and these remote memories may be connected to the electronic device via a network. Examples of the above network include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network and combinations thereof.

[0114] The electronic device capable of implementing the method for transcoding the video may also include: an input device 503 and an output device 504. The processor 501, the memory 502, the input device 503, and the output device 504 may be connected via a bus or by other means. In FIG. 7, connection via a bus is taken as an example.

[0115] The input device 503 may receive inputted digital or character information, and generate key signal input related to user setting and function control of the electronic device capable of implementing the method for transcoding the video. The input device 503 may be, for example, a touch screen, a keypad, a mouse, a track pad, a touch pad, an indicator stick, one or more mouse buttons, a trackball, a joystick or other input device. The output device 504 may include a display device, an auxiliary lighting device (e.g., LED), a haptic feedback device (e.g., a vibration motor), and the like. The display device may include, but is not limited to, a liquid crystal display (LCD), a light emitting diode (LED) display, and a plasma display. In some embodiments, the display device may be a touch screen.

[0116] The various implementations of the system and technologies described herein may be implemented in a digital electronic circuit system, an integrated circuit system, an application specific integrated circuit (ASIC), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include: being implemented in one or more computer programs. The one or more computer programs may be executed and/or interpreted on a programmable system including at least one programmable processor. The programmable processor may be a special purpose or general purpose programmable processor, may receive data and instructions from a storage system, at least one input device, and at least one output device, and may transmit data and instructions to the storage system, the at least one input device, and the at least one output device.

[0117] These computing programs (also called programs, software, software applications, or codes) include machine instructions of programmable processors, and may be implemented by utilizing high-level procedures and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine readable medium" and "computer readable medium" refer to any computer program product, device, and/or apparatus (such as, a magnetic disk, an optical disk, a memory, a programmable logic device (PLD)) for providing machine instructions and/or data to a programmable processor, including a machine readable medium that receives machine instructions as a machine readable signal. The term "machine readable signal" refers to any signal for providing the machine instructions and/or data to the programmable processor.

[0118] To provide interaction with a user, the system and technologies described herein may be implemented on a computer. The computer has a display device (such as, a CRT (cathode ray tube) or a LCD (liquid crystal display) monitor) for displaying information to the user, a keyboard and a pointing device (such as, a mouse or a trackball), through which the user may provide the input to the computer. Other types of devices may also be configured to provide interaction with the user. For example, the feedback provided to the user may be any form of sensory feedback (such as, visual feedback, auditory feedback, or tactile feedback), and the input from the user may be received in any form (including acoustic input, voice input or tactile input).

[0119] The system and technologies described herein may be implemented in a computing system including a background component (such as, a data server), a computing system including a middleware component (such as, an application server), or a computing system including a front-end component (such as, a user computer having a graphical user interface or a web browser through which the user may interact with embodiments of the system and technologies described herein), or a computing system including any combination of such background component, middleware component and front-end component. Components of the system may be connected to each other via digital data communication in any form or medium (such as, a communication network). Examples of the communication network include a local area network (LAN), a wide area network (WAN), and the Internet.

[0120] The computer system may include a client and a server. The client and the server are generally remote from each other and generally interact via the communication network. A relationship between the client and the server is generated by computer programs operated on a corresponding computer and having a client-server relationship with each other.

[0121] In embodiments of the present disclosure, firstly, the segment transcoding speed of the video to be transcoded is determined based on the input attribute of the video to be transcoded and the target attribute. Then, the number of video segments of the video to be transcoded is determined based on the preset target transcoding speed and the segment transcoding speed. The video to be transcoded is segmented and transcoding is performed on the video segments, thereby making the segmentation of the video in the video transcoding process more rational. The transcoding speed of the whole video may be preset, which ensures an effective improvement of the video transcoding efficiency.

[0122] It should be understood that steps may be reordered, added or deleted by utilizing the flows in the various forms illustrated above. For example, the steps described in the present disclosure may be executed in parallel, sequentially or in different orders, so long as the desired results of the technical solution disclosed in embodiments of the present disclosure can be achieved. There is no limitation here.

[0123] It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made based on design requirements and other factors.

Claims

1. A method for transcoding a video, comprising:

obtaining (S101; S201; S301) an input attribute of a video and a target attribute;

determining (S102; S302) a segment transcoding speed of the video based on the input attribute and the target attribute, the segment transcoding speed indicating a transcoding speed of a video segment obtained by segmenting the video;

determining (S103; S203; S303) the number of video segments of the video based on a preset target transcoding speed of the video and the segment transcoding speed;

segmenting (S104; S204) the video based on a video length of the video and the number of video segments to obtain the video segments; and

transcoding (S105; S205) the video segments based on the segment transcoding speed.

2. The method of claim 1, wherein the input attribute comprises at least one of resolution, code rate, encoder type, and complexity of a preset number of video pictures, the complexity comprising a temporal complexity and a spatial complexity; and
the target attribute comprises at least one of resolution, code rate, encoder type, and the number of key pictures.

3. The method of claim 1 or 2, further comprising:

obtaining resource configuration information of container instances; and

determining the segment transcoding speed based on the input attribute, the target attribute and the resource configuration information.

4. The method of claim 3, wherein transcoding the video segments based on the segment transcoding speed comprises:
calling the container instances to transcode the video segments based on the segment transcoding speed.

5. The method of claim 3 or 4, further comprising:

obtaining a preset speed level of the segment transcoding speed; and

determining the segment transcoding speed based on the input attribute, the target attribute, the resource configuration information, and the preset speed level.

6. The method of any one of claims 1 to 5, wherein determining (S102; S302) the segment transcoding speed based on the input attribute and the target attribute comprises:
determining (S202) the segment transcoding speed using a transcoding speed prediction model, based on the input attribute and the target attribute.

7. The method of claim 6, further comprising:

obtaining sample input attributes of sample videos and sample target attributes;

marking segment transcoding speeds of the sample videos to obtain marked results; and

training the transcoding speed prediction model based on the sample input attributes, the sample target attributes and the marked results.

8. The method of any one of claims 1 to 7, further comprising:

separating (S304) audio data and picture data of the video to obtain audio data and a picture sequence;

segmenting (S305) the picture sequence based on the video length and the number of video segments, to obtain segments of the picture sequence; and

transcoding (S306) the segments of the picture sequence based on the segment transcoding speed to obtain transcoded segments of the picture sequence.

9. The method of claim 8, further comprising:
transcoding (S307) the audio data based on the preset target transcoding speed to obtain transcoded audio data.

10. The method of claim 9, further comprising:

merging (S308) the transcoded segments of the picture sequence based on time stamps of the segments of the picture sequence to obtain a target picture sequence; and

combining (S309) the target picture sequence with the transcoded audio data to obtain a target video.

11. An apparatus (400) for transcoding a video, comprising:

a first obtaining module (401), configured to obtain an input attribute of a video and a target attribute;

a first determining module (402), configured to determine a segment transcoding speed of the video based on the input attribute and the target attribute, the segment transcoding speed indicating a transcoding speed of a video segment obtained by segmenting the video;

a second determining module (403), configured to determine the number of video segments of the video based on a preset target transcoding speed of the video and the segment transcoding speed;

a processing module (404), configured to segment the video based on a video length of the video and the number of video segments to obtain the video segments; and

a first transcoding module (405), configured to transcode the video segments based on the segment transcoding speed.

12. The apparatus (400) of claim 11, further comprising:

a second obtaining module, configured to obtain resource configuration information of container instances,

wherein the first determining module is configured to determine the segment transcoding speed based on the input attribute, the target attribute, and the resource configuration information.

13. The apparatus of claim 11 or 12, wherein the first transcoding module is configured to:
call the container instances to transcode the video segments based on the segment transcoding speed.

14. An electronic device, comprising:

at least one processor; and

a memory, communicatively coupled to the at least one processor,

wherein the memory is configured to store instructions executable by the at least one processor, and when the instructions are executed by the at least one processor, the at least one processor is caused to implement a method for transcoding a video of any one of claims 1 to 10.

15. A non-transitory computer readable storage medium having computer instructions stored thereon, wherein the computer instructions are configured to enable a computer to execute a method for transcoding a video of any one of claims 1 to 10.

Drawing

Search report