(19)
(11)EP 3 404 652 B1

(12)EUROPEAN PATENT SPECIFICATION

(45)Mention of the grant of the patent:
04.12.2019 Bulletin 2019/49

(21)Application number: 17738118.3

(22)Date of filing:  10.01.2017
(51)Int. Cl.: 
G10L 21/0316  (2013.01)
H04S 1/00  (2006.01)
G10L 19/005  (2013.01)
G10L 19/008  (2013.01)
G10L 21/0208  (2013.01)
(86)International application number:
PCT/CN2017/070692
(87)International publication number:
WO 2017/121304 (20.07.2017 Gazette  2017/29)

(54)

AUDIO DATA PROCESSING METHOD AND TERMINAL

AUDIODATENVERARBEITUNGSVERFAHREN UND -ENDGERÄT

PROCÉDÉ ET TERMINAL DE TRAITEMENT DE DONNÉES AUDIO


(84)Designated Contracting States:
AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

(30)Priority: 14.01.2016 CN 201610025708

(43)Date of publication of application:
21.11.2018 Bulletin 2018/47

(73)Proprietor: Tencent Technology (Shenzhen) Company Limited
Shenzhen, Guangdong 518057 (CN)

(72)Inventor:
  • YANG, Jiang
    Shenzhen Guangdong 518057 (CN)

(74)Representative: V.O. 
P.O. Box 87930 Carnegieplein 5
2508 DH Den Haag (NL)


(56)References cited:
WO-A1-94/17517
CN-A- 101 640 053
CN-A- 103 905 843
US-A1- 2009 018 836
US-B1- 6 255 576
CN-A- 101 425 291
CN-A- 101 789 240
US-A1- 2006 085 197
US-A1- 2014 236 584
  
  • FERREIRA A J S: "A new frequency domain approach to time-scale expansion of audio signals", ACOUSTICS, SPEECH AND SIGNAL PROCESSING, 1998. PROCEEDINGS OF THE 1998 IEEE INTERNATIONAL CONFERENCE ON SEATTLE, WA, USA 12-15 MAY 1998, NEW YORK, NY, USA,IEEE, US, vol. 6, 12 May 1998 (1998-05-12), pages 3577-3580, XP010279577, DOI: 10.1109/ICASSP.1998.679649 ISBN: 978-0-7803-4428-0
  
Note: Within nine months from the publication of the mention of the grant of the European patent, any person may give notice to the European Patent Office of opposition to the European patent granted. Notice of opposition shall be filed in a written reasoned statement. It shall not be deemed to have been filed until the opposition fee has been paid. (Art. 99(1) European Patent Convention).


Description

RELATED APPLICATION



[0001] This application claims priority to Chinese Patent Application No. 201610025708.1, entitled "AUDIO DATA PROCESSING METHOD AND APPARATUS" filed with the Chinese Patent Office on January 14, 2016, which is incorporated by reference in its entirety.

FIELD OF THE TECHNOLOGY



[0002] This application relates to the field of audio data processing technologies, and in particular, to an audio data processing method and terminal.

BACKGROUND OF THE DISCLOSURE



[0003] Audio data processing technologies allow people to collect sounds by using an audio monitoring unit, to generate audio data, and to store the data. The stored audio data may be played by using an audio player when necessary, thereby reproducing the sounds. The wide application of audio data processing technologies makes it very easy to record and reproduce sounds, and plays a vital role in people's life and work.

[0004] Currently, when an audio data stream is processed, one frame of audio data may need to be inserted between two adjacent frames of audio data. For example, for some special sound effects, one frame of audio data is inserted between two adjacent frames of audio data in an audio data stream of either a left channel or a right channel, so that there is a difference of one frame of audio data between the audio data stream of the left channel and the audio data stream of the right channel, thereby achieving a surround sound effect. For another example, when the audio data stream of the left channel and the audio data stream of the right channel are out of synchronization, audio data may be inserted into one of the audio data streams to resynchronize the two channels.

[0005] Different approaches of time-scale audio data modification are presented, for example, in WO 94/17517 A1, US 2009/018836 A1 and US 6255576 B1.

[0006] However, currently, when audio data is inserted between two adjacent frames of audio data in an audio data stream, the inserted audio data is usually simply one of the two frames of audio data, so that apparent noise occurs at the insertion point during audio play after the insertion. This problem needs to be resolved. Similarly, noise occurs when one frame of audio data is deleted from the audio data stream.

SUMMARY



[0007] According to embodiments of this application, an audio data processing method and a terminal are provided.

[0008] An audio data processing method, including:

obtaining a first audio frame and a second audio frame adjacent to each other from an audio data stream, the first audio frame preceding the second audio frame in a time sequence of the audio data stream;

determining a frame segmentation position, wherein a sampling point value at the frame segmentation position in the first audio frame and a sampling point value at the frame segmentation position in the second audio frame satisfy a distance closeness condition;

obtaining respective sampling point value preceding the frame segmentation position in the second audio frame and respective sampling point value following the frame segmentation position in the first audio frame;

sequentially stitching the respective sampling point value obtained from the second audio frame and the respective sampling point value obtained from the first audio frame, to generate a third audio frame; and

inserting the third audio frame between the first audio frame and the second audio frame.



[0009] An audio data processing method, including:

obtaining a first audio frame and a second audio frame adjacent to each other from an audio data stream, the first audio frame preceding the second audio frame in a time sequence of the audio data stream;

determining a frame segmentation position, wherein a sampling point value at the frame segmentation position in the first audio frame and a sampling point value at the frame segmentation position in the second audio frame satisfy a distance closeness condition;

obtaining respective sampling point value preceding the frame segmentation position in the first audio frame and respective sampling point value following the frame segmentation position in the second audio frame, sequentially stitching the respective sampling point value, to generate a fourth audio frame; and

replacing the first audio frame and the second audio frame with the fourth audio frame.



[0010] A terminal, including a memory and a processor, wherein computer readable instructions are stored in the memory, the computer readable instructions, when executed by the processor, causing the processor to perform operations including:

obtaining a first audio frame and a second audio frame adjacent to each other from an audio data stream, the first audio frame preceding the second audio frame in a time sequence of the audio data stream;

determining a frame segmentation position, wherein a sampling point value at the frame segmentation position in the first audio frame and a sampling point value at the frame segmentation position in the second audio frame satisfy a distance closeness condition; and

obtaining respective sampling point value preceding the frame segmentation position in the second audio frame and respective sampling point value following the frame segmentation position in the first audio frame, sequentially stitching the respective sampling point value, to generate a third audio frame, and inserting the third audio frame between the first audio frame and the second audio frame.



[0011] A terminal, including a memory and a processor, wherein computer readable instructions are stored in the memory, the computer readable instructions, when executed by the processor, causing the processor to perform operations including:

obtaining a first audio frame and a second audio frame adjacent to each other from an audio data stream, the first audio frame preceding the second audio frame in a time sequence of the audio data stream;

determining a frame segmentation position, wherein a sampling point value at the frame segmentation position in the first audio frame and a sampling point value at the frame segmentation position in the second audio frame satisfy a distance closeness condition; and

obtaining respective sampling point value preceding the frame segmentation position in the first audio frame and respective sampling point value following the frame segmentation position in the second audio frame, sequentially stitching the respective sampling point value, to generate a fourth audio frame, and replacing the first audio frame and the second audio frame with the fourth audio frame.



[0012] Details of one or more embodiments of this application are shown in the following accompanying drawings and descriptions. Other features, objectives and advantages of this application will be apparent according to the specification, accompanying drawings and claims.

BRIEF DESCRIPTION OF THE DRAWINGS



[0013] To describe the technical solutions of the embodiments of this application or the existing technology more clearly, the following briefly describes the accompanying drawings required for describing the embodiments or the existing technology. Apparently, the accompanying drawings in the following description show merely some embodiments of this application, and a person of ordinary skill in the art may still derive other drawings from these accompanying drawings without creative efforts.

FIG. 1 is a schematic structural diagram of a terminal for implementing an audio data processing method according to an embodiment;

FIG. 2 is a schematic flowchart of an audio data processing method according to an embodiment;

FIG. 3A is a schematic diagram of inserting an audio frame between a first audio frame and a second audio frame adjacent to each other according to an embodiment;

FIG. 3B is a schematic diagram of deleting one of a first audio frame and a second audio frame adjacent to each other according to an embodiment;

FIG. 4 is a partial distribution diagram of sampling point values of a first audio frame according to an embodiment;

FIG. 5 is a partial distribution diagram of sampling point values of a second audio frame according to an embodiment;

FIG. 6 is a partial distribution diagram of sampling point values obtained by overlapping a first audio frame and a second audio frame according to an embodiment;

FIG. 7A is a schematic diagram of a process of segmenting an audio frame, stitching an audio frame and inserting an audio frame according to an embodiment;

FIG. 7B is a schematic diagram of a process of segmenting an audio frame, stitching an audio frame and replacing an audio frame according to an embodiment;

FIG. 8 is a schematic diagram of a process of reserving a copy and performing play processing according to an embodiment;

FIG. 9 is a schematic flowchart of steps of determining a frame segmentation position according to an embodiment;

FIG. 10 is a schematic diagram showing that a first fitted curve of a first audio frame and a second fitted curve of a second audio frame are in a same coordinate system according to an embodiment;

FIG. 11 is a schematic flowchart of an audio data processing method according to another embodiment;

FIG. 12 is a structural block diagram of a terminal according to an embodiment;

FIG. 13 is a structural block diagram of a terminal according to another embodiment; and

FIG. 14 is a structural block diagram of a frame segmentation position determining module in FIG. 12 or FIG. 13 according to an embodiment.


DESCRIPTION OF EMBODIMENTS



[0014] To make the objectives, technical solutions, and advantages of this application clearer and more comprehensible, the following further describes this application in detail with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely used to explain this application but are not intended to limit this application.

[0015] As shown in FIG. 1, in an embodiment, a terminal 100 for implementing an audio data processing method is provided, including a processor, a non-volatile storage medium, an internal memory, an input apparatus and an audio output interface connected by using a system bus. The processor provides computing and control capabilities that support operation of the terminal 100. The processor is configured to perform an audio data processing method. The non-volatile storage medium includes at least one of a magnetic storage medium, an optical storage medium, or a flash storage medium. Computer readable instructions are stored in the non-volatile storage medium. The computer readable instructions, when executed by the processor, cause the processor to perform an audio data processing method. The input apparatus includes at least one of a physical button, a trackball, a touchpad, a physical interface used for connecting an external control device, or a touch layer overlapping with a display screen. The external control device may be, for example, a mouse, a multimedia wire-controlled apparatus, or the like. The terminal 100 includes various electronic devices capable of processing audio data, such as a desktop computer, a portable notebook computer, a mobile phone, a music player, or a smartwatch.

[0016] As shown in FIG. 2, in an embodiment, an audio data processing method is provided. In this embodiment, an example is used for description in which the method is applied to the terminal 100 in FIG. 1. The method specifically includes the following steps:
Step 202: Obtain a first audio frame and a second audio frame adjacent to each other from an audio data stream, the first audio frame preceding the second audio frame in time sequence.

[0017] Specifically, the audio data stream includes a series of sampling point values having a time sequence. The sampling point values are obtained by sampling original analog sound signals according to a particular audio sampling rate. A series of sampling point values can describe a sound. The audio sampling rate refers to a quantity of sampling points collected within one second, with a unit being Hertz (Hz). A higher audio sampling rate indicates a higher acoustic frequency that can be described.

[0018] An audio frame includes a specific quantity of sampling point values in time sequence. If the coding format of an audio data stream defines audio frames, those audio frames may be used directly. If the coding format defines no audio frame but only a series of sampling point values having a time sequence, audio frames may be segmented from the series of sampling point values according to a preset frame length. The preset frame length refers to a preset quantity of sampling point values included in one audio frame.
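The segmentation by a preset frame length can be sketched as follows. This is a minimal illustration only: the function name `split_into_frames` is hypothetical, and dropping a trailing incomplete frame is an assumption, since the text does not say how leftover sampling point values are handled.

```python
def split_into_frames(samples, frame_length):
    """Segment a series of sampling point values into audio frames.

    frame_length is the preset quantity of sampling point values per frame.
    Assumption: trailing values that do not fill a whole frame are dropped.
    """
    if frame_length <= 0:
        raise ValueError("frame_length must be positive")
    # Index just past the last complete frame.
    usable = len(samples) - len(samples) % frame_length
    return [list(samples[i:i + frame_length])
            for i in range(0, usable, frame_length)]


frames = split_into_frames(list(range(10)), 4)
first_frame, second_frame = frames[0], frames[1]  # two adjacent frames
```
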

[0019] The first audio frame and the second audio frame obtained from the audio data stream are adjacent, and the first audio frame precedes the second audio frame in time sequence. That is, when performing play processing on the audio data stream, the first audio frame is played before the second audio frame. The first audio frame and the second audio frame are two adjacent audio frames between which an audio frame needs to be inserted.

[0020] For example, referring to FIG. 3A, an audio data stream includes a first audio frame A, a second audio frame B, ... arranged in time sequence. When an audio frame needs to be inserted, an audio frame F needs to be inserted between the first audio frame A and the second audio frame B. Referring to FIG. 3B, when an audio frame needs to be deleted, sampling point values amounting to one of the first audio frame A and the second audio frame B need to be deleted, so that one audio frame G is retained.

[0021] Step 204: Determine a frame segmentation position, where a sampling point value at the frame segmentation position in the first audio frame and a sampling point value at the frame segmentation position in the second audio frame satisfy a distance near condition.

[0022] Specifically, the frame segmentation position is a position where the first audio frame and the second audio frame are segmented, and is a position relative to an audio frame. A distance refers to an absolute value of a difference between two sampling point values that are at corresponding positions in two audio frames. For example, refer to the partial distribution diagram shown in FIG. 4 of sampling point values of a first audio frame A and the partial distribution diagram shown in FIG. 5 of sampling point values of a second audio frame B. An absolute value of a difference between the first sampling point value of the first audio frame A and the first sampling point value of the second audio frame B is a distance between the first sampling point value of the first audio frame A and the first sampling point value of the second audio frame B.

[0023] The distance near condition is a quantization condition used for determining whether two sampling point values are near each other. In an embodiment, the distance near condition may be that the distance is equal to 0, or that the two sampling point values are not equal but near, for example, that the distance is less than or equal to a threshold. The threshold may be preset, or may be determined dynamically according to the sampling point values of the first audio frame and/or the second audio frame. For example, the threshold may be obtained by multiplying an average value of the sampling point values of the first audio frame and/or the second audio frame by a preset percentage.
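A threshold-based distance near condition might look like the sketch below. The names `dynamic_threshold` and `is_near` are hypothetical. The threshold here is derived from the mean of the absolute sampling point values of both frames; that choice is an assumption made for illustration, because a plain mean of signed audio samples tends toward zero, and the text only speaks of "an average value".

```python
def distance(value_a, value_b):
    """Distance: absolute value of the difference of two sampling point values."""
    return abs(value_a - value_b)


def dynamic_threshold(frame_a, frame_b, percentage):
    """Threshold = average magnitude of both frames times a preset percentage.

    Assumption: the average is taken over absolute values of both frames.
    """
    values = list(frame_a) + list(frame_b)
    if not values:
        raise ValueError("frames must not be empty")
    mean_magnitude = sum(abs(v) for v in values) / len(values)
    return mean_magnitude * percentage


def is_near(value_a, value_b, threshold):
    """Distance near condition: distance is 0 or at most the threshold."""
    return distance(value_a, value_b) <= threshold
```
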

[0024] In an embodiment, the terminal may calculate a distance of each sampling point value pair of the first audio frame and the second audio frame, to select a sampling point value pair having the smallest distance. The frame segmentation position is a position corresponding to the selected sampling point value pair having the smallest distance. In this case, the distance near condition is that a distance of the sampling point value pair corresponding to the frame segmentation position in the first audio frame and the second audio frame is the smallest. The sampling point value pair herein refers to two sampling point values at same positions in the two audio frames. The positions of the sampling point values are positions of the sampling point values relative to the audio frames to which the sampling point values belong.

[0025] For example, the partial distribution diagram of sampling point values shown in FIG. 6 is obtained by overlapping FIG. 4 and FIG. 5, so that it is convenient to compare the partial distribution of sampling point values of the audio frame A with the partial distribution of sampling point values of the audio frame B. Assuming that the frame segmentation position is S, the sampling point value at S of the audio frame A and the sampling point value at S of the audio frame B are very near or even equal, that is, the absolute value of their difference is very small or zero. The sampling point value at S in the audio frame A and the sampling point value at S in the audio frame B therefore satisfy the distance near condition.
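The smallest-distance selection of a frame segmentation position can be sketched as below. The function name is hypothetical, positions are 0-based, and resolving ties in favor of the earliest position is an assumption.

```python
def smallest_distance_position(frame_a, frame_b):
    """Return the position whose sampling point value pair has the smallest distance.

    Positions are 0-based and relative to the frame. On ties, the earliest
    position is returned (an assumption; the text does not specify).
    """
    if len(frame_a) != len(frame_b) or len(frame_a) == 0:
        raise ValueError("frames must be non-empty and of equal length")
    return min(range(len(frame_a)),
               key=lambda i: abs(frame_a[i] - frame_b[i]))


# Distances per position are 7, 1, 7, 0, so position 3 is selected.
position = smallest_distance_position([0, 5, 9, 4], [7, 6, 2, 4])
```
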

[0026] Step 206: Obtain sampling point values preceding the frame segmentation position in the second audio frame and sampling point values following the frame segmentation position in the first audio frame and sequentially stitch the sampling point values, to generate a third audio frame, and insert the third audio frame between the first audio frame and the second audio frame; or obtain sampling point values preceding the frame segmentation position in the first audio frame and sampling point values following the frame segmentation position in the second audio frame and sequentially stitch the sampling point values, to generate a fourth audio frame, and replace the first audio frame and the second audio frame with the fourth audio frame.

[0027] Specifically, when an audio frame needs to be inserted, sampling point values preceding the frame segmentation position in the second audio frame are obtained, and sampling point values following the frame segmentation position in the first audio frame are obtained. The total quantity of the obtained sampling point values is exactly equal to the length of one audio frame. The sampling point values from the second audio frame and the sampling point values from the first audio frame are sequentially stitched, to generate a third audio frame. In the third audio frame, the sampling point values from the second audio frame keep their order in the second audio frame, and the sampling point values from the first audio frame keep their order in the first audio frame. Finally, the generated third audio frame is inserted between the first audio frame and the second audio frame.

[0028] For example, referring to FIG. 7A, the first audio frame A is segmented into a preceding part and a following part by the frame segmentation position S, and the second audio frame B is also segmented into a preceding part and a following part by the frame segmentation position S. The preceding part refers to sampling point values preceding the frame segmentation position S. Correspondingly, the following part refers to sampling point values following the frame segmentation position S. The preceding part of the second audio frame B and the following part of the first audio frame A are stitched, to obtain a third audio frame F, and then the third audio frame F obtained by stitching may be inserted between the first audio frame A and the second audio frame B.
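The insertion case can be sketched as follows. The function names are hypothetical, and grouping the sampling point value at position `s` itself with the following part is an assumption; the text only requires that the two parts together have the length of one audio frame.

```python
def build_third_frame(first_frame, second_frame, s):
    """Stitch the part of the second frame preceding s with the part of the
    first frame following s.

    Assumption: the value at position s (0-based) belongs to the following part.
    """
    if len(first_frame) != len(second_frame):
        raise ValueError("frames must have equal length")
    return list(second_frame[:s]) + list(first_frame[s:])


def insert_between(first_frame, second_frame, s):
    """Return the frame sequence after insertion: first, third, second."""
    third = build_third_frame(first_frame, second_frame, s)
    return [list(first_frame), third, list(second_frame)]


A = [1, 2, 3, 4]
B = [5, 6, 7, 8]
stream = insert_between(A, B, 2)
```

Note that the junctions A to F and F to B both reproduce the original junction between the end of A and the start of B, so the only new junction is the one inside F at position `s`.
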

[0029] When an audio frame needs to be deleted, sampling point values preceding the frame segmentation position in the first audio frame are obtained, and sampling point values following the frame segmentation position in the second audio frame are obtained. The total quantity of the obtained sampling point values is exactly equal to the length of one audio frame. The sampling point values from the first audio frame and the sampling point values from the second audio frame are sequentially stitched, to obtain a fourth audio frame. In the fourth audio frame, the sampling point values from the first audio frame keep their order in the first audio frame, and the sampling point values from the second audio frame keep their order in the second audio frame. Finally, the first audio frame and the second audio frame are replaced with the generated fourth audio frame.

[0030] For example, referring to FIG. 7B, a first audio frame D is segmented into a preceding part and a following part by the frame segmentation position S, and a second audio frame E is also segmented into a preceding part and a following part by the frame segmentation position S. The preceding part refers to sampling point values preceding the frame segmentation position S. Correspondingly, the following part refers to sampling point values following the frame segmentation position S. The preceding part of the first audio frame D and the following part of the second audio frame E are stitched, to obtain a fourth audio frame G, and then the first audio frame D and the second audio frame E may be replaced with the fourth audio frame G obtained by stitching.
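The deletion case mirrors the insertion case and can be sketched as below. The function name is hypothetical, and, as an assumption, the sampling point value at position `s` is grouped with the following part.

```python
def build_fourth_frame(first_frame, second_frame, s):
    """Stitch the part of the first frame preceding s with the part of the
    second frame following s; the result replaces both original frames.

    Assumption: the value at position s (0-based) belongs to the following part.
    """
    if len(first_frame) != len(second_frame):
        raise ValueError("frames must have equal length")
    return list(first_frame[:s]) + list(second_frame[s:])


first = [1, 2, 3, 4]
second = [5, 6, 7, 8]
fourth = build_fourth_frame(first, second, 2)  # two frames shrink to one
```
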

[0031] In the foregoing audio data processing method, when an audio frame needs to be inserted, a part preceding a frame segmentation position in a second audio frame and a part following the frame segmentation position in a first audio frame are stitched, to obtain a third audio frame, and the third audio frame is inserted between the first audio frame and the second audio frame. After the insertion, a preceding part of the third audio frame is the preceding part of the second audio frame, and a following part of the third audio frame is the following part of the first audio frame. Because the first audio frame and the second audio frame join seamlessly in the original audio data stream, the first audio frame joins seamlessly with the preceding part of the third audio frame, and the following part of the third audio frame joins seamlessly with the second audio frame. In addition, because the third audio frame satisfies the distance near condition at the frame segmentation position, no excessively large and sudden change occurs at the stitched area. In this way, noise caused by jumps between audio frames when an audio frame is inserted is substantially overcome.

[0032] When an audio frame needs to be deleted, a part preceding the frame segmentation position in the first audio frame and a part following the frame segmentation position in the second audio frame are stitched, to obtain a fourth audio frame, and the first audio frame and the second audio frame are replaced with the fourth audio frame. After the replacement, a preceding part of the fourth audio frame is the preceding part of the first audio frame, and a following part of the fourth audio frame is the following part of the second audio frame. Because the first audio frame joins seamlessly with the audio frame preceding it, and the second audio frame joins seamlessly with the audio frame following it, after the replacement the fourth audio frame joins seamlessly with the audio frame preceding the first audio frame and with the audio frame following the second audio frame. In addition, because the fourth audio frame satisfies the distance near condition at the frame segmentation position, no excessively large and sudden change occurs at the stitched area. In this way, noise caused by jumps between audio frames when an audio frame is deleted is substantially overcome.

[0033] In an embodiment, the audio data processing method further includes: when performing real-time play processing on the audio data stream, reserving copies of sampling point values with a length of at least one audio frame. In addition, step 202 includes: when an instruction for inserting an audio frame is detected, obtaining a first audio frame according to a reserved copy of a sampling point value on which play processing is performed before a sampling point value on which play processing is being performed, and obtaining a second audio frame according to sampling point values with a length of one audio frame following the sampling point value on which play processing is being performed.

[0034] Play processing refers to restoring a sound signal according to sampling point values. Reserving copies of sampling point values with a length of at least one audio frame means reserving a copy of at least one audio frame. Specifically, referring to FIG. 8, when the terminal performs play processing on a sampling point value A1, the terminal reserves a copy A1' of the sampling point value A1. A copy of each sampling point value on which play processing is performed before the sampling point value A1 is also reserved. A total length of the reserved copies is a length of at least one audio frame.

[0035] After a length of one audio frame, the terminal performs play processing on a sampling point value B1, and a copy B1' of the sampling point value B1 is also reserved. In this case, the reserved copies include at least a copy A' of the audio frame A. Assuming that the terminal detects an instruction for inserting an audio frame, the terminal uses the copies of the sampling point values with a length of one audio frame, from the sampling point value A1 to the sampling point value B1, on which play processing has been performed, as the first audio frame A, and uses an audio frame B with a length of one audio frame following the sampling point value B1 as the second audio frame.

[0036] In this embodiment, by reserving a copy of at least one audio frame when real-time play processing is performed on an audio data stream, a terminal may immediately respond to an instruction for inserting an audio frame when detecting the instruction, without waiting for a time length of one audio frame, thereby improving efficiency of inserting an audio frame.
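One possible way to reserve copies during real-time play processing is a fixed-size buffer, sketched below with `collections.deque`. The class `PlaybackHistory` and its method names are hypothetical, and keeping exactly one frame of copies is an assumption; the text only requires a length of at least one audio frame.

```python
from collections import deque


class PlaybackHistory:
    """Keeps copies of the most recently played sampling point values."""

    def __init__(self, frame_length):
        self.frame_length = frame_length
        # Assumption: exactly one frame of copies is kept; older copies
        # are discarded automatically by the bounded deque.
        self._copies = deque(maxlen=frame_length)

    def on_sample_played(self, value):
        """Reserve a copy of a value on which play processing is performed."""
        self._copies.append(value)

    def frames_for_insertion(self, upcoming):
        """Return (first_frame, second_frame) when an insert instruction arrives.

        first_frame comes from the reserved copies; second_frame is taken
        from the values that follow the one currently being played.
        """
        if len(self._copies) < self.frame_length:
            raise ValueError("fewer than one frame of copies reserved")
        if len(upcoming) < self.frame_length:
            raise ValueError("fewer than one frame of upcoming values")
        return list(self._copies), list(upcoming[:self.frame_length])


history = PlaybackHistory(frame_length=3)
for value in [1, 2, 3, 4]:
    history.on_sample_played(value)
first_frame, second_frame = history.frames_for_insertion([5, 6, 7, 8])
```

Because the first audio frame is already available in the buffer, no waiting for one frame of playback is needed before stitching can start.
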

[0037] As shown in FIG. 9, in an embodiment, step 204 specifically includes the following steps:
Step 902: Obtain a candidate position, where a sampling point value at the candidate position in the first audio frame and a sampling point value at a corresponding candidate position in the second audio frame satisfy a distance near condition.

[0038] A candidate position is a selected position in an audio frame that may be used as a frame segmentation position. Specifically, the terminal may traverse all positions in an audio frame, and determine, each time a position is traversed, whether a sampling point value pair at corresponding positions in the first audio frame and the second audio frame satisfies a distance near condition. If the sampling point value pair satisfies the distance near condition, the position traversed is added to a candidate position set, and the traversal is continued. If the sampling point value pair does not satisfy the distance near condition, the traversal is continued. If the candidate position set is still empty after the traversal, a preset position (such as a middle position in an audio frame) or a position of a sampling point value pair with a smallest distance may be selected and added to the candidate position set.
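The traversal with its fallback can be sketched as follows. The function name is hypothetical, a threshold-based distance near condition is assumed, and of the two fallbacks mentioned in the text the smallest-distance position is used here.

```python
def candidate_positions(first_frame, second_frame, threshold):
    """Traverse all positions and collect those whose sampling point value
    pair satisfies a threshold-based distance near condition.

    Fallback when nothing qualifies: the position with the smallest distance
    (the text also allows a preset position such as the middle of the frame).
    """
    if len(first_frame) != len(second_frame) or len(first_frame) == 0:
        raise ValueError("frames must be non-empty and of equal length")
    distances = [abs(a - b) for a, b in zip(first_frame, second_frame)]
    candidates = [i for i, d in enumerate(distances) if d <= threshold]
    if not candidates:
        candidates = [min(range(len(distances)), key=lambda i: distances[i])]
    return candidates
```
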

[0039] The distance near condition is a quantization condition used for determining whether two sampling point values are near each other. In an embodiment, the distance near condition may be that the distance is equal to 0, or that the two sampling point values are not equal but near, for example, that the distance is less than or equal to a threshold. The threshold may be preset, or may be determined dynamically according to the sampling point values of the first audio frame and/or the second audio frame.

[0040] In an embodiment, the terminal may calculate a distance of each sampling point value pair in the first audio frame and the second audio frame and rank the distances in ascending order. Then, the terminal may add, to the candidate position set, the positions corresponding to a preset quantity of the smallest distances, or the positions corresponding to the smallest distances that make up a preset percentage of all calculated distances. In this case, the distance near condition is that the distance of the sampling point value pair at a candidate position is among the preset quantity of smallest distances, or among the preset percentage of smallest distances, when all calculated distances are ranked in ascending order.
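The rank-based selection can be sketched as below. The function names are hypothetical; rounding the percentage up and keeping at least one position are assumptions made so that the candidate position set is never empty for non-empty frames.

```python
import math


def ranked_positions(first_frame, second_frame):
    """Positions ordered by ascending distance of their value pairs."""
    distances = [abs(a - b) for a, b in zip(first_frame, second_frame)]
    # sorted() is stable, so equal distances keep their positional order.
    return sorted(range(len(distances)), key=lambda i: distances[i])


def top_count(first_frame, second_frame, count):
    """Candidates: positions of a preset quantity of smallest distances."""
    return ranked_positions(first_frame, second_frame)[:count]


def top_percentage(first_frame, second_frame, percentage):
    """Candidates: positions of the smallest distances that make up a preset
    percentage (0..1) of all distances. Assumption: round up, keep >= 1."""
    ranked = ranked_positions(first_frame, second_frame)
    keep = max(1, math.ceil(len(ranked) * percentage))
    return ranked[:keep]
```
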

[0041] In an embodiment, the distance near condition is: A product of a first difference and a second difference is less than or equal to 0. The first difference is a difference between the sampling point value at the candidate position in the first audio frame and the sampling point value at the corresponding candidate position in the second audio frame. The second difference is a difference between a sampling point value at a position following the candidate position in the first audio frame and a sampling point value at a corresponding position in the second audio frame.

[0042] Specifically, assuming that a first audio frame A is $[a_1, a_2, \ldots, a_m]$, and a second audio frame B is $[b_1, b_2, \ldots, b_m]$, a distance near condition may be expressed by using the following formula (1):

$$(a_i - b_i)\,(a_{i+1} - b_{i+1}) \le 0, \quad 1 \le i \le m-1 \qquad (1)$$

where $i$ represents a candidate position in the first audio frame A and the second audio frame B, and may be referred to as a sampling point value sequence number, and $m$ is a length of one audio frame; $(a_i - b_i)$ is a first difference, which represents a difference between a sampling point value $a_i$ at the candidate position $i$ in the first audio frame A and a sampling point value $b_i$ at a corresponding candidate position $i$ in the second audio frame B; $(a_{i+1} - b_{i+1})$ is a second difference, which represents a difference between a sampling point value $a_{i+1}$ at a position $i+1$ following the candidate position $i$ in the first audio frame A and a sampling point value $b_{i+1}$ at a corresponding position $i+1$ in the second audio frame B; and formula (1) represents that a product of the first difference $(a_i - b_i)$ and the second difference $(a_{i+1} - b_{i+1})$ is less than or equal to 0.

[0043] The distance near condition expressed in formula (1) is used for finding an intersection of a first fitted curve constituted by sampling point values of the first audio frame and a second fitted curve constituted by sampling point values of the second audio frame. If the intersection falls exactly on a position of a sampling point value, the position is added to the candidate position set. If the intersection is not a position of any sampling point value, a position nearest to the intersection among all positions in the audio frames may be added to the candidate position set. For example, an intersection X is formed by the first fitted curve and the second fitted curve in FIG. 10, so that a position S1 or S2 nearest to the intersection X may be added to the candidate position set. Alternatively, the intersection of the two curves may be determined in another manner. For example, algebraic expressions of the two fitted curves are respectively obtained first, and then the intersection is directly determined by means of function computation. Determining an intersection by using the distance near condition expressed in formula (1) is more efficient.
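
The sign-change test of formula (1) can be sketched as below. The function name `crossing_candidates` is hypothetical and positions are 0-based here, whereas the text counts from 1; the sketch only illustrates the condition and is not the implementation of the terminal.

```python
def crossing_candidates(frame_a, frame_b):
    """Positions i where (a_i - b_i) * (a_{i+1} - b_{i+1}) <= 0."""
    if len(frame_a) != len(frame_b):
        raise ValueError("frames must have the same length")
    diffs = [a - b for a, b in zip(frame_a, frame_b)]
    # A non-positive product means the difference changes sign (or is zero)
    # between position i and position i + 1, so the two curves meet there.
    return [i for i in range(len(diffs) - 1) if diffs[i] * diffs[i + 1] <= 0]


print(crossing_candidates([3, 2, 1, 0, -1], [0, 1, 1, 1, 1]))  # [1, 2]
```

In the example the differences are 3, 1, 0, -1 and -2, so the product is non-positive only at positions 1 and 2, on either side of the point where the two frames coincide.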

[0044] Step 904: Obtain a sum of distances of all sampling point value pairs within a discrete position range that has a preset length and covers the candidate position in the first audio frame and the second audio frame.

[0045] The discrete position range that has a preset length and covers the candidate position includes the candidate position and a specific quantity of further discrete positions, that is, the discrete position range has a preset length. Preferably, equal quantities of discrete positions are selected preceding and following the candidate position, to constitute the discrete position range together with the candidate position. Alternatively, different quantities of discrete positions may be selected preceding and following the candidate position, to constitute the discrete position range together with the candidate position. The positions in the discrete position range are preferably adjacent to each other in sequence. Alternatively, discrete positions may be selected at intervals, to constitute the discrete position range together with the candidate position.

[0046] Specifically, the terminal may select candidate positions one by one from the candidate position set, and obtain a sum of distances of all sampling point value pairs within the discrete position range that has a preset length and covers the selected candidate positions in the first audio frame and the second audio frame.

[0047] In an embodiment, the sum of distances of all sampling point value pairs within a discrete position range that has a preset length and covers the candidate positions in the first audio frame and the second audio frame may be obtained by using the following formula (2):

$$R_n = \sum_{j=n}^{n+2N} \left| a_j - b_j \right| \tag{2}$$

where n is obtained by subtracting N from the candidate position, and N may be selected from [1, (m-1)/2], is preferably selected from [2, (m-1)/100], and is most preferably 5. The candidate position is n+N, and the discrete position range is a range [n, ..., n+N, ..., n+2N] that has a length of 2N+1 and that is constituted by the candidate position n+N together with N positions selected leftward and N positions selected rightward from the candidate position n+N. |aj-bj| is a distance of each sampling point value pair (aj, bj) within the discrete position range in the first audio frame A and the second audio frame B. Rn is a sum of distances of all sampling point value pairs (aj, bj) within the discrete position range in the first audio frame A and the second audio frame B.

[0048] Step 906: Determine a candidate position corresponding to a smallest distance sum as a frame segmentation position.

[0049] Specifically, to select a best candidate position from the candidate position set as the frame segmentation position, distance sums of all candidate positions in the candidate position set may be respectively calculated first, and then a candidate position corresponding to a smallest distance sum may be selected as the frame segmentation position. This may be specifically expressed by using the following formula (3):

$$T = \min_{n} R_n \tag{3}$$

where T is a target function. By optimizing the target function T and obtaining the candidate position n that corresponds to the smallest distance sum, the frame segmentation position n+N is obtained. The determined frame segmentation position also satisfies the distance near condition: A product of a first difference and a second difference is less than or equal to 0. The first difference is a difference between a sampling point value at a frame segmentation position in the first audio frame and a sampling point value at a corresponding frame segmentation position in the second audio frame. The second difference is a difference between a sampling point value at a position following the frame segmentation position in the first audio frame and a sampling point value at a corresponding position in the second audio frame.
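
Formulas (2) and (3) together can be sketched as follows. The names `distance_sum`, `pick_segmentation_position` and `half_width` (standing for N) are hypothetical, and positions are 0-based. The text does not say what happens when the window would extend past a frame edge; shifting the window inward so that it keeps its preset length and still covers the candidate is an assumption of this sketch.

```python
def distance_sum(frame_a, frame_b, candidate, half_width=5):
    """Sum of |a_j - b_j| over a window of 2 * half_width + 1 positions."""
    length = 2 * half_width + 1
    if len(frame_a) != len(frame_b) or length > len(frame_a):
        raise ValueError("frames must match and be at least one window long")
    # Assumption: near a frame edge the window is shifted inward so that it
    # keeps its preset length and still covers the candidate position.
    start = min(max(candidate - half_width, 0), len(frame_a) - length)
    return sum(abs(frame_a[j] - frame_b[j]) for j in range(start, start + length))


def pick_segmentation_position(frame_a, frame_b, candidates, half_width=5):
    """Candidate with the smallest distance sum; ties go to the earliest one."""
    return min(candidates,
               key=lambda c: distance_sum(frame_a, frame_b, c, half_width))


a = [0, 0, 0, 0, 0, 0, 0, 0]
b = [5, -5, -5, -1, 1, 2, 3, 9]
print(pick_segmentation_position(a, b, candidates=[0, 3], half_width=1))  # 3
```

Both candidates satisfy the sign-change condition, but the curves cross steeply at position 0 (distance sum 15) and gently at position 3 (distance sum 7), so position 3 is chosen.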

[0050] In step 904 to step 906, a frame segmentation position is a candidate position found nearest to an intersection of the first fitted curve and the second fitted curve. Step 904 is a specific step of obtaining a local similarity of the first audio frame and the second audio frame at the corresponding candidate position, and step 906 is a specific step of determining a frame segmentation position according to the local similarity. The local similarity at the candidate position refers to a degree to which the first fitted curve is similar to the second fitted curve within a specific range near the candidate position. A smaller value calculated by using formula (2) indicates a higher degree of similarity. The more similar the first fitted curve is to the second fitted curve near the candidate position, the more similar the slopes of the two curves are, and the more smoothly the third audio frame obtained by segmentation and stitching transitions, so that noise is better suppressed.

[0051] The local similarity may alternatively be obtained by calculating a cross-correlation by using a cross-correlation function. Assuming that there are two functions f(t) and g(t), the cross-correlation function is defined as R(u)=f(t)*g(-t), where * denotes convolution, and reflects a matching degree between the two functions at different relative positions. The cross-correlation function may thus represent a similarity between two signals. However, when cross-correlation is calculated on only a few points, as in this solution, two independent but codirectional sampling point values may yield a relatively great cross-correlation, which suggests a great similarity between the two curves even though the determined position is not the best frame segmentation position. This disadvantage of the cross-correlation function is overcome by the local similarity obtained by using formula (2). In formula (2), sampling point values at all positions play approximately equal roles in the calculation. In addition, because the absolute value of a difference is used to measure the contribution of the sampling point value at each position, a difference between slopes preceding and following an intersection may be well described, and a most suitable candidate position may be found as a frame segmentation position.
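
A small numeric comparison may make the contrast concrete. The sketch below evaluates a zero-lag correlation (a plain dot product, used here as a simplified stand-in for the cross-correlation function) and the sum of absolute differences of formula (2) on two three-sample windows. The function names and all sample values are invented for illustration.

```python
def zero_lag_correlation(window_a, window_b):
    """Dot product of two equally long windows (correlation at zero lag)."""
    return sum(a * b for a, b in zip(window_a, window_b))


def sum_abs_diff(window_a, window_b):
    """Sum of |a_j - b_j| over the window, as in formula (2)."""
    return sum(abs(a - b) for a, b in zip(window_a, window_b))


# Window 1: both curves rise in the same direction but stay far apart.
far_a, far_b = [10, 11, 12], [1, 2, 3]
# Window 2: the curves nearly coincide around a common zero crossing.
near_a, near_b = [5, 0, -5], [4, 0, -4]

print(zero_lag_correlation(far_a, far_b), zero_lag_correlation(near_a, near_b))  # 68 40
print(sum_abs_diff(far_a, far_b), sum_abs_diff(near_a, near_b))  # 27 2
```

The correlation scores the distant window higher (68 against 40), whereas the sum of absolute differences correctly ranks the nearly coincident window as the better match (2 against 27).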

[0052] In an embodiment, the audio data processing method further includes: for the first audio frame and the second audio frame adjacent to each other and obtained from the audio data stream of a designated channel when a sound effect is turned on, performing steps of obtaining sampling point values preceding a frame segmentation position in the second audio frame and sampling point values following the frame segmentation position in the first audio frame and sequentially stitching the sampling point values, to generate a third audio frame, and inserting the third audio frame between the first audio frame and the second audio frame, and performing fade-in processing on the inserted third audio frame, so that the inserted third audio frame gradually transitions from a no-sound effect state to an intact-sound effect state according to a time sequence.

[0053] Specifically, step 202, step 204 and the first half of step 206, that is, inserting an audio frame, are performed on the audio data stream of the designated channel. An instruction for turning on a sound effect is the instruction for inserting an audio frame. The turned-on sound effect is a sound effect based on channel asynchronization. By inserting one audio frame into the designated channel, the audio data stream of the designated channel lags behind the remaining channels by one audio frame, so that one sound reaches the ears of a listener later than another sound by a time length of one audio frame, thereby generating a surround sound effect.

[0054] The no-sound effect state is a state before a sound effect is turned on, and the intact-sound effect state is a state after a sound effect is turned on. By performing fade-in processing on the third audio frame, the inserted third audio frame gradually transitions from the no-sound effect state to the intact-sound effect state according to a time sequence of sampling point values in the third audio frame, thereby achieving gentle transition of a sound effect. For example, if the volume in the intact-sound effect state needs to be increased fivefold, the volume multiple may be gradually increased until it reaches 5, so that the third audio frame may be seamlessly stitched with the second audio frame in the intact-sound effect state. The gradual transition may be a linear transition or a curved transition.

[0055] In this embodiment, when a sound effect is turned off, step 202, step 204 and the second half of step 206, that is, replacing an audio frame, may be performed on the audio data stream of the designated channel, and fade-out processing may be performed on the obtained fourth audio frame, so that the obtained fourth audio frame gradually transitions from an intact-sound effect state to a no-sound effect state according to a time sequence. The fade-out processing is opposite to the fade-in processing, and is a processing process of gradually removing impact of a sound effect.
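
A linear fade-in and fade-out can be sketched as a gain ramp over one frame. The sketch assumes that the sound effect is a pure volume change, as in the fivefold example; the function name `ramp_gain` and its parameters are hypothetical, and a curved transition would replace the linear step with another monotonic function.

```python
def ramp_gain(frame, start_gain, end_gain):
    """Scale samples by a gain moving linearly from start_gain to end_gain."""
    n = len(frame)
    if n < 2:
        return [s * end_gain for s in frame]
    step = (end_gain - start_gain) / (n - 1)
    # The i-th sample in time order receives the i-th gain on the ramp.
    return [s * (start_gain + step * i) for i, s in enumerate(frame)]


frame = [1.0, 1.0, 1.0, 1.0, 1.0]
print(ramp_gain(frame, 1.0, 5.0))  # fade-in:  [1.0, 2.0, 3.0, 4.0, 5.0]
print(ramp_gain(frame, 5.0, 1.0))  # fade-out: [5.0, 4.0, 3.0, 2.0, 1.0]
```

The last sample of the fade-in frame already carries the full gain, so it joins the following frame in the intact-sound effect state without a jump in volume.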

[0056] In this embodiment, two audio frames of a designated channel are replaced with one audio frame, to delete one audio frame, so that the designated channel is restored to be synchronous with another channel. A sound effect based on channel asynchronization may be quickly turned on/turned off, thereby improving sound effect switching.

[0057] In an embodiment, for the first audio frame and the second audio frame adjacent to each other and obtained from the audio data stream of a designated channel when a sound effect is turned on, the following steps may be performed: obtaining sampling point values preceding a frame segmentation position in the first audio frame and sampling point values following the frame segmentation position in the second audio frame and sequentially stitching the sampling point values, to generate a fourth audio frame, and replacing the first audio frame and the second audio frame with the fourth audio frame, and performing fade-in processing on the obtained fourth audio frame, so that the obtained fourth audio frame gradually transitions from a no-sound effect state to an intact-sound effect state according to a time sequence.

[0058] In this embodiment, when a sound effect is turned off, step 202, step 204 and the first half of step 206 may be performed on a designated channel: obtaining sampling point values preceding a frame segmentation position in the second audio frame and sampling point values following the frame segmentation position in the first audio frame and sequentially stitching the sampling point values, to generate a third audio frame, inserting the third audio frame between the first audio frame and the second audio frame, and performing fade-out processing on the inserted third audio frame, so that the inserted third audio frame gradually transitions from an intact-sound effect state to a no-sound effect state according to a time sequence. According to this embodiment, a sound effect based on channel asynchronization may also be quickly turned on/turned off, thereby improving sound effect switching.

[0059] As shown in FIG. 11, in an embodiment, an audio data processing method is provided, including the following steps:

Step 1102: When a sound effect is turned on, obtain a first audio frame and a second audio frame adjacent to each other from an audio data stream of a designated channel, the first audio frame preceding the second audio frame in time sequence.

Step 1104: Obtain a first candidate position, where a sampling point value at the first candidate position in the first audio frame and a sampling point value at a corresponding first candidate position in the second audio frame satisfy a distance near condition. The distance near condition may be: A product of a first difference and a second difference is less than or equal to 0. The first difference is a difference between the sampling point value at the candidate position in the first audio frame and the sampling point value at the corresponding candidate position in the second audio frame. The second difference is a difference between a sampling point value at a position following the candidate position in the first audio frame and a sampling point value at a corresponding position in the second audio frame.

Step 1106: Obtain a sum of distances of all sampling point value pairs within a discrete position range that has a preset length and covers the first candidate position in the first audio frame and the second audio frame.

Step 1108: Determine a first candidate position corresponding to a smallest distance sum as a first frame segmentation position.

Step 1110: Obtain sampling point values preceding the first frame segmentation position in the second audio frame and sampling point values following the first frame segmentation position in the first audio frame and sequentially stitch the sampling point values, to generate a third audio frame.

Step 1112: Insert the third audio frame between the first audio frame and the second audio frame.

Step 1114: Perform fade-in processing on the inserted third audio frame, so that the inserted third audio frame gradually transitions from a no-sound effect state to an intact-sound effect state according to a time sequence.

Step 1116: When a sound effect is turned off, obtain a fifth audio frame and a sixth audio frame adjacent to each other from an audio data stream of a designated channel, the fifth audio frame preceding the sixth audio frame in time sequence. The fifth audio frame is equivalent to the first audio frame in step 206 in the embodiment shown in FIG. 2 for generating the fourth audio frame, and the sixth audio frame is equivalent to the second audio frame in step 206 in the embodiment shown in FIG. 2 for generating the fourth audio frame.

Step 1118: Obtain a second candidate position, where a sampling point value at the second candidate position in the fifth audio frame and a sampling point value at a corresponding second candidate position in the sixth audio frame satisfy a distance near condition. The distance near condition may be: A product of a first difference and a second difference is less than or equal to 0. The first difference is a difference between a sampling point value at a candidate position in the fifth audio frame and a sampling point value at a corresponding candidate position in the sixth audio frame. The second difference is a difference between a sampling point value at a position following the candidate position in the fifth audio frame and a sampling point value at a corresponding position in the sixth audio frame.

Step 1120: Obtain a sum of distances of all sampling point value pairs within a discrete position range that has a preset length and covers the second candidate position in the fifth audio frame and the sixth audio frame.

Step 1122: Determine a second candidate position corresponding to a smallest distance sum as a second frame segmentation position.

Step 1124: Obtain sampling point values preceding the second frame segmentation position in the fifth audio frame and sampling point values following the second frame segmentation position in the sixth audio frame and sequentially stitch the sampling point values, to generate a fourth audio frame.

Step 1126: Replace the fifth audio frame and the sixth audio frame with the fourth audio frame.

Step 1128: Perform fade-out processing on the obtained fourth audio frame, so that the obtained fourth audio frame gradually transitions from an intact-sound effect state to a no-sound effect state according to a time sequence.
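
The insertion and replacement steps above can be sketched on a list of frames as follows. The sketch leaves out the search for the frame segmentation position (steps 1104 to 1108 and 1118 to 1122) and the fade processing (steps 1114 and 1128) and takes the position as a given argument. The names `stitch`, `turn_on` and `turn_off` are hypothetical. The text does not specify whether the sample at the segmentation position belongs to the preceding or the following part; here `split` simply counts the samples taken from the leading frame, which is an assumption.

```python
def stitch(leading, trailing, split):
    """First `split` samples of `leading`, then the rest of `trailing`."""
    return leading[:split] + trailing[split:]


def turn_on(frames, index, split):
    """Insert a third frame between frames[index] and frames[index + 1]."""
    first, second = frames[index], frames[index + 1]
    third = stitch(second, first, split)  # start of second + end of first
    return frames[:index + 1] + [third] + frames[index + 1:]


def turn_off(frames, index, split):
    """Replace frames[index] and frames[index + 1] with one fourth frame."""
    fifth, sixth = frames[index], frames[index + 1]
    fourth = stitch(fifth, sixth, split)  # start of fifth + end of sixth
    return frames[:index] + [fourth] + frames[index + 2:]


stream = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
print(turn_on(stream, 0, 2))   # [[1, 2, 3, 4], [5, 6, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
print(turn_off(stream, 1, 2))  # [[1, 2, 3, 4], [5, 6, 11, 12]]
```

Turning the effect on lengthens the channel by one frame and turning it off shortens it by one frame, which is what delays the designated channel and later restores its synchronization.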



[0060] In the foregoing audio data processing method, when an audio frame needs to be inserted, a part preceding a frame segmentation position in a second audio frame and a part following the frame segmentation position in a first audio frame are stitched, to obtain a third audio frame, and the third audio frame is inserted between the first audio frame and the second audio frame. After the insertion, a preceding part of the third audio frame is the preceding part of the second audio frame, and a following part of the third audio frame is the following part of the first audio frame. Because the first audio frame and the second audio frame are seamlessly stitched, the preceding part of the third audio frame and the first audio frame may be seamlessly stitched, and the following part of the third audio frame and the second audio frame may be seamlessly stitched. In addition, because the third audio frame satisfies a distance near condition at the frame segmentation position, an excessively great and sudden change does not occur at the stitched area. In this way, noise caused by skipping between audio frames when an audio frame is inserted is substantially overcome.

[0061] When an audio frame needs to be deleted, a part preceding the frame segmentation position in the first audio frame and a part following the frame segmentation position in the second audio frame are stitched, to obtain a fourth audio frame, and the first audio frame and the second audio frame are replaced with the fourth audio frame. After the replacement, a preceding part of the fourth audio frame is the preceding part of the first audio frame, and a following part of the fourth audio frame is the following part of the second audio frame. Because the first audio frame and a preceding audio frame are seamlessly stitched, and the second audio frame and a following audio frame are seamlessly stitched, the fourth audio frame and the audio frame preceding the first audio frame may be seamlessly stitched, and the fourth audio frame and the audio frame following the second audio frame may be seamlessly stitched after the replacement. In addition, because the fourth audio frame satisfies a distance near condition at the frame segmentation position, an excessively great and sudden change does not occur at the stitched area. In this way, noise caused by skipping between audio frames when an audio frame is deleted is substantially overcome.

[0062] This application further provides a terminal. An internal structure of the terminal may correspond to the structure shown in FIG. 1. Some or all of the following modules may be implemented by software, hardware or a combination thereof. As shown in FIG. 12, in an embodiment, the terminal 1200 includes an audio frame obtaining module 1201 and a frame segmentation position determining module 1202, and further includes at least one of an audio frame insertion module 1203 or an audio frame replacement module 1204.

[0063] The audio frame obtaining module 1201 is configured to obtain a first audio frame and a second audio frame adjacent to each other from an audio data stream, the first audio frame preceding the second audio frame in time sequence.

[0064] Specifically, the audio data stream includes a series of sampling point values having a time sequence. The sampling point values are obtained by sampling original analog sound signals according to a particular audio sampling rate. A series of sampling point values can describe a sound. The audio sampling rate refers to a quantity of sampling points collected within one second, with a unit being Hertz. A higher audio sampling rate indicates a higher acoustic frequency that can be described.

[0065] An audio frame includes sampling point values having a time sequence and specific quantity. According to a coding format of an audio data stream, if there is an audio frame in the coding format, the audio frame may be directly used. If there is no audio frame but a series of sampling point values having a time sequence, an audio frame may be segmented from the series of sampling point values having a time sequence according to a preset frame length. The preset frame length refers to a preset quantity of sampling point values included in one audio frame.
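
Segmenting a series of sampling point values into audio frames of a preset frame length can be sketched as below. The function name `split_into_frames` is hypothetical, and dropping a trailing incomplete frame is an assumption, since the text does not say how leftover samples are handled.

```python
def split_into_frames(samples, frame_length):
    """Cut a time-ordered sample sequence into frames of equal length."""
    if frame_length <= 0:
        raise ValueError("frame_length must be positive")
    # Assumption: samples that do not fill a whole frame are left out.
    usable = len(samples) - len(samples) % frame_length
    return [samples[i:i + frame_length] for i in range(0, usable, frame_length)]


print(split_into_frames(list(range(10)), 4))  # [[0, 1, 2, 3], [4, 5, 6, 7]]
```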

[0066] The first audio frame and the second audio frame obtained by the audio frame obtaining module 1201 from the audio data stream are adjacent, and the first audio frame precedes the second audio frame in time sequence. That is, when performing play processing on the audio data stream, the first audio frame is played before the second audio frame. The first audio frame and the second audio frame are two adjacent audio frames between which an audio frame needs to be inserted.

[0067] The frame segmentation position determining module 1202 is configured to determine a frame segmentation position, where a sampling point value at the frame segmentation position in the first audio frame and a sampling point value at the frame segmentation position in the second audio frame satisfy a distance near condition.

[0068] Specifically, the frame segmentation position is a position where the first audio frame and the second audio frame are segmented, and is a position relative to an audio frame. A distance refers to an absolute value of a difference between two sampling point values that are at corresponding positions in two audio frames. For example, refer to the partial distribution diagram, shown in FIG. 4, of sampling point values of a first audio frame A and the partial distribution diagram, shown in FIG. 5, of sampling point values of a second audio frame B. An absolute value of a difference between the first sampling point value of the first audio frame A and the first sampling point value of the second audio frame B is a distance between the first sampling point value of the first audio frame A and the first sampling point value of the second audio frame B.

[0069] The distance near condition is a quantization condition used for determining whether two sampling point values are near each other. In an embodiment, the distance near condition may be that the distance is equal to 0, or that the two sampling point values are not equal but near, for example, that the distance is less than or equal to a threshold. The threshold may be preset, or may be determined dynamically according to the sampling point values of the first audio frame and/or the second audio frame. For example, the threshold may be obtained by multiplying an average value of the sampling point values of the first audio frame and/or the second audio frame by a preset percentage.
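
A threshold-based distance near condition can be sketched as follows. The names `dynamic_threshold`, `is_near` and `percentage` are hypothetical. The text speaks of "an average value of the sampling point values"; averaging the absolute values over both frames is an assumption made here so that positive and negative samples do not cancel out.

```python
def dynamic_threshold(frame_a, frame_b, percentage=0.1):
    """Threshold as a percentage of the mean absolute sample value."""
    samples = list(frame_a) + list(frame_b)
    if not samples:
        raise ValueError("frames must not be empty")
    # Assumption: use magnitudes, since signed audio samples average near 0.
    return percentage * sum(abs(s) for s in samples) / len(samples)


def is_near(a, b, threshold):
    """Distance near condition: |a - b| does not exceed the threshold."""
    return abs(a - b) <= threshold


a = [10, -10, 20, -20]
b = [10, -10, 20, -20]
limit = dynamic_threshold(a, b, percentage=0.5)
print(limit, is_near(3, 10, limit), is_near(3, 11, limit))  # 7.5 True False
```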

[0070] In an embodiment, the frame segmentation position determining module 1202 may calculate a distance of each sampling point value pair of the first audio frame and the second audio frame, to select a sampling point value pair having the smallest distance. The frame segmentation position is a position corresponding to the selected sampling point value pair having the smallest distance. In this case, the distance near condition is that a distance of the sampling point value pair corresponding to the frame segmentation position in the first audio frame and the second audio frame is the smallest. The sampling point value pair herein refers to two sampling point values at same positions in the two audio frames. The positions of the sampling point values are positions of the sampling point values relative to the audio frames to which the sampling point values belong.

[0071] The audio frame insertion module 1203 is configured to obtain sampling point values preceding the frame segmentation position in the second audio frame and sampling point values following the frame segmentation position in the first audio frame and sequentially stitch the sampling point values, to generate a third audio frame, and insert the third audio frame between the first audio frame and the second audio frame.

[0072] Specifically, when an audio frame needs to be inserted, the audio frame insertion module 1203 obtains sampling point values preceding the frame segmentation position in the second audio frame, and obtains sampling point values following the frame segmentation position in the first audio frame. A quantity of the obtained sampling point values is exactly equal to a length of one audio frame. The sampling point values of the second audio frame and the sampling point values of the first audio frame are sequentially stitched, to generate a third audio frame. In addition, the sampling point values from the second audio frame are ranked according to a sequence in the second audio frame, and the sampling point values from the first audio frame are ranked according to a sequence in the first audio frame. Finally, the generated third audio frame is inserted between the first audio frame and the second audio frame.

[0073] The audio frame replacement module 1204 is configured to obtain sampling point values preceding the frame segmentation position in the first audio frame and sampling point values following the frame segmentation position in the second audio frame and sequentially stitch the sampling point values, to generate a fourth audio frame, and replace the first audio frame and the second audio frame with the fourth audio frame.

[0074] When an audio frame needs to be deleted, the audio frame replacement module 1204 obtains sampling point values preceding the frame segmentation position in the first audio frame, and obtains sampling point values following the frame segmentation position in the second audio frame. A quantity of the obtained sampling point values is exactly equal to a length of one audio frame. The sampling point values of the first audio frame and the sampling point values of the second audio frame are sequentially stitched, to obtain a fourth audio frame. In addition, the sampling point values from the first audio frame are ranked according to a sequence in the first audio frame, and the sampling point values from the second audio frame are ranked according to a sequence in the second audio frame. Finally, the first audio frame and the second audio frame are replaced with the generated fourth audio frame.

[0075] For the terminal 1200, when an audio frame needs to be inserted, a part preceding a frame segmentation position in a second audio frame and a part following the frame segmentation position in a first audio frame are stitched, to obtain a third audio frame, and the third audio frame is inserted between the first audio frame and the second audio frame. After the insertion, a preceding part of the third audio frame is the preceding part of the second audio frame, and a following part of the third audio frame is the following part of the first audio frame. Because the first audio frame and the second audio frame are seamlessly stitched, the preceding part of the third audio frame and the first audio frame may be seamlessly stitched, and the following part of the third audio frame and the second audio frame may be seamlessly stitched. In addition, because the third audio frame satisfies a distance near condition at the frame segmentation position, an excessively great and sudden change does not occur at the stitched area. In this way, noise caused by skipping between audio frames when an audio frame is inserted is substantially overcome.

[0076] When an audio frame needs to be deleted, a part preceding the frame segmentation position in the first audio frame and a part following the frame segmentation position in the second audio frame are stitched, to obtain a fourth audio frame, and the first audio frame and the second audio frame are replaced with the fourth audio frame. After the replacement, a preceding part of the fourth audio frame is the preceding part of the first audio frame, and a following part of the fourth audio frame is the following part of the second audio frame. Because the first audio frame and a preceding audio frame are seamlessly stitched, and the second audio frame and a following audio frame are seamlessly stitched, the fourth audio frame and the audio frame preceding the first audio frame may be seamlessly stitched, and the fourth audio frame and the audio frame following the second audio frame may be seamlessly stitched after the replacement. In addition, because the fourth audio frame satisfies a distance near condition at the frame segmentation position, an excessively great and sudden change does not occur at the stitched area. In this way, noise caused by skipping between audio frames when an audio frame is deleted is substantially overcome.

[0077] As shown in FIG. 13, in an embodiment, the terminal 1200 further includes a copy reservation module 1205, configured to: when real-time play processing is performed on the audio data stream, reserve copies of sampling point values with a length of at least one audio frame.

[0078] The audio frame obtaining module 1201 is further configured to: when an instruction for inserting an audio frame is detected, obtain a first audio frame according to a reserved copy of a sampling point value on which play processing is performed before a sampling point value on which play processing is being performed, and obtain a second audio frame according to sampling point values with a length of one audio frame following the sampling point value on which play processing is being performed.

[0079] Play processing refers to restoring a sound signal according to sampling point values. Reserving copies of sampling point values with a length of at least one audio frame refers to reserving a copy of at least one audio frame. Specifically, referring to FIG. 8, when play processing is performed on a sampling point value A1, the copy reservation module 1205 reserves a copy A1 of the sampling point value A1. Copies of the sampling point values on which play processing was performed before the sampling point value A1 are also reserved. A total length of the reserved copies is a length of at least one audio frame.

[0080] Play processing is being performed on a sampling point value B1 after a length of one audio frame, and the copy reservation module 1205 further reserves a copy B1 of the sampling point value B1. In this case, the reserved copies include at least a copy A' of the audio frame A. Assuming that the audio frame obtaining module 1201 detects an instruction for inserting an audio frame, the audio frame obtaining module 1201 uses copies of sampling point values with a length of one audio frame from the sampling point value A1 to the sampling point value B1 on which play processing is performed as the first audio frame A, and uses an audio frame B with a length of one audio frame and following the sampling point value B1 as the second audio frame.

[0081] In this embodiment, by reserving a copy of at least one audio frame when real-time play processing is performed on an audio data stream, a terminal may immediately respond to an instruction for inserting an audio frame when detecting the instruction, without waiting for a time length of one audio frame, thereby improving efficiency of inserting an audio frame.
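
Reserving copies during real-time play can be sketched with a fixed-size buffer. The class `PlaybackHistory` and its methods are hypothetical; the sketch keeps exactly one frame of copies, which is the minimum the text requires, and uses `collections.deque` with `maxlen` so that the oldest copy is dropped automatically.

```python
from collections import deque


class PlaybackHistory:
    """Keeps copies of the most recently played sampling point values."""

    def __init__(self, frame_length):
        self.frame_length = frame_length
        # A deque with maxlen discards the oldest entry when it is full.
        self._copies = deque(maxlen=frame_length)

    def on_sample_played(self, sample):
        """Reserve a copy of a sample at the moment it is played."""
        self._copies.append(sample)

    def first_frame(self):
        """Reserved copies as the first audio frame, or None if too few."""
        if len(self._copies) < self.frame_length:
            return None
        return list(self._copies)


history = PlaybackHistory(frame_length=3)
for sample in [1, 2, 3, 4, 5]:
    history.on_sample_played(sample)
print(history.first_frame())  # [3, 4, 5]
```

When an insertion instruction arrives, the first audio frame is available from the buffer immediately, and the second audio frame is read from the samples that are about to be played.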

[0082] As shown in FIG. 14, in an embodiment, the frame segmentation position determining module 1202 includes: a candidate position obtaining module 1202a, a similarity measurement module 1202b, and a determining module 1202c.

[0083] The candidate position obtaining module 1202a is configured to obtain a candidate position, where a sampling point value at the candidate position in the first audio frame and a sampling point value at a corresponding candidate position in the second audio frame satisfy a distance near condition. The similarity measurement module 1202b is configured to obtain a local similarity of the first audio frame and the second audio frame at the corresponding candidate position. The determining module 1202c is configured to determine a frame segmentation position according to the local similarity.

[0084] The candidate position obtaining module 1202a is configured to obtain a candidate position, where a sampling point value at the candidate position in the first audio frame and a sampling point value at a corresponding candidate position in the second audio frame satisfy a distance near condition.

[0085] A candidate position is a selected position in an audio frame that may be used as a frame segmentation position, and candidate positions are discrete. Each sampling point value corresponds to a discrete position. Specifically, the candidate position obtaining module 1202a may traverse all positions in an audio frame, and determine, each time a position is traversed, whether a sampling point value pair at corresponding positions in the first audio frame and the second audio frame satisfies a distance near condition. If the sampling point value pair satisfies the distance near condition, the candidate position obtaining module 1202a adds the position traversed to a candidate position set, and the traversal is continued. If the sampling point value pair does not satisfy the distance near condition, the traversal is continued. If the candidate position set is still empty after the traversal, the candidate position obtaining module 1202a may select and add a preset position (such as a middle position in an audio frame) or a position of a sampling point value pair with a smallest distance to the candidate position set.

[0086] The distance near condition is a quantitative condition used for determining whether two sampling point values are near each other. In an embodiment, the distance near condition may be that the distance between the two sampling point values is equal to 0, or that the two sampling point values are not equal but near, for example, that the distance is less than or equal to a threshold. The threshold may be preset, or may be determined according to the dynamics of the sampling point values of the first audio frame and/or the second audio frame.
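The traversal and the threshold form of the distance near condition might be sketched as follows. The sketch assumes that the distance is the absolute difference of a sampling point value pair, uses 0-based positions, and applies the smallest-distance fallback when no position qualifies; the function name `candidate_positions` is hypothetical.

```python
def candidate_positions(first, second, threshold):
    """Return positions whose sampling point value pair is within `threshold`."""
    if len(first) != len(second) or not first:
        raise ValueError("frames must be non-empty and of equal length")

    # Distance of each sampling point value pair (assumed: absolute difference).
    distances = [abs(a - b) for a, b in zip(first, second)]

    # Traverse all positions and keep those that satisfy the condition.
    candidates = [i for i, d in enumerate(distances) if d <= threshold]

    # Fallback for an empty set: the position of the pair with the
    # smallest distance (a preset position would be the other option).
    if not candidates:
        candidates = [min(range(len(distances)), key=distances.__getitem__)]
    return candidates
```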

[0087] In an embodiment, the candidate position obtaining module 1202a may calculate the distance of each sampling point value pair in the first audio frame and the second audio frame and rank the distances in ascending order. Then, the terminal 1200 may add the positions corresponding to a preset quantity of the smallest distances to the candidate position set. In this case, the distance near condition is that the distance of the sampling point value pair at a candidate position is among the preset quantity of smallest distances of all calculated distances. Alternatively, the candidate position obtaining module 1202a may add the positions corresponding to a preset percentage of all calculated distances, taken in ascending order starting from the smallest distance. In this case, the distance near condition is that the distance of the sampling point value pair at a candidate position is among the smallest distances that make up the preset percentage of all calculated distances.
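A minimal sketch of the rank-based variants follows. It assumes absolute differences as distances, 0-based positions, and that at least one position is always kept; the function name `nearest_positions` and its parameters `count` and `fraction` are hypothetical.

```python
def nearest_positions(first, second, count=None, fraction=None):
    """Return the positions with the smallest pair distances.

    Exactly one of `count` (preset quantity) or `fraction`
    (preset percentage, given as a value between 0 and 1) must be set.
    """
    if (count is None) == (fraction is None):
        raise ValueError("give exactly one of count or fraction")

    distances = [abs(a - b) for a, b in zip(first, second)]

    # Rank positions by distance in ascending order; sorted() is stable,
    # so equal distances keep their original position order.
    order = sorted(range(len(distances)), key=lambda i: distances[i])

    if count is None:
        # Assumption: round down, but never return an empty set.
        count = max(1, int(len(order) * fraction))
    return sorted(order[:count])
```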

[0088] In an embodiment, the distance near condition is: A product of a first difference and a second difference is less than or equal to 0. The first difference is a difference between the sampling point value at the candidate position in the first audio frame and the sampling point value at the corresponding candidate position in the second audio frame. The second difference is a difference between a sampling point value at a position following the candidate position in the first audio frame and a sampling point value at a corresponding position in the second audio frame.

[0089] Specifically, assuming that a first audio frame A is [a_1, a_2, ..., a_m], and a second audio frame B is [b_1, b_2, ..., b_m], a distance near condition may be expressed by using the following formula (1):

$$(a_i - b_i)\,(a_{i+1} - b_{i+1}) \le 0 \qquad (1)$$

where i represents a candidate position in the first audio frame A and the second audio frame B, and may be referred to as a sampling point value sequence number, and m is a length of one audio frame; (a_i - b_i) is a first difference, which represents a difference between a sampling point value a_i at the candidate position i in the first audio frame A and a sampling point value b_i at a corresponding candidate position i in the second audio frame B; (a_{i+1} - b_{i+1}) is a second difference, which represents a difference between a sampling point value a_{i+1} at a position i+1 following the candidate position i in the first audio frame A and a sampling point value b_{i+1} at a corresponding position i+1 in the second audio frame B; and formula (1) represents that a product of the first difference (a_i - b_i) and the second difference (a_{i+1} - b_{i+1}) is less than or equal to 0.
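The condition of formula (1) can be checked with a short sketch. It uses 0-based positions, whereas the text numbers sampling point values from 1, and the function name `crossing_positions` is hypothetical.

```python
def crossing_positions(first, second):
    """Return positions i where the two frames touch or cross between i and i+1."""
    # Difference a_i - b_i at every position.
    diffs = [a - b for a, b in zip(first, second)]

    # Formula (1): the product of the first and second difference is <= 0,
    # i.e. the difference changes sign or one of the two differences is zero.
    return [i for i in range(len(diffs) - 1) if diffs[i] * diffs[i + 1] <= 0]
```

For the frames [1, 2, 3, 4] and [4, 3, 2, 1] the differences are [-3, -1, 1, 3], so only position 1 (0-based) satisfies the condition.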

[0090] The distance near condition expressed in formula (1) is used for finding an intersection of a first fitted curve constituted by sampling point values of the first audio frame and a second fitted curve constituted by sampling point values of the second audio frame. If the intersection falls exactly at the position of a sampling point value, the position is added to the candidate position set. If the intersection is not at the position of any sampling point value, the position nearest to the intersection among all positions in the audio frames may be added to the candidate position set. For example, an intersection X is formed by the first fitted curve and the second fitted curve in FIG. 10, so that a position S1 or S2 nearest to the intersection X may be added to the candidate position set. Alternatively, the intersection of the two curves may be determined in another manner. For example, algebraic expressions of the two fitted curves are respectively obtained first, and then the intersection is directly determined by means of function computation. Compared with such manners, the distance near condition expressed in formula (1) is more efficient for determining an intersection.
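Choosing between the two positions on either side of an intersection might look like the sketch below. It assumes that each fitted curve is a straight line between neighbouring sampling point values, which is one possible fitting and is not stated in this specification; the function name `nearest_to_crossing` is hypothetical and positions are 0-based.

```python
def nearest_to_crossing(first, second, i):
    """Return i or i+1, whichever lies nearer to the crossing between them."""
    d0 = first[i] - second[i]          # first difference
    d1 = first[i + 1] - second[i + 1]  # second difference
    if d0 * d1 > 0:
        raise ValueError("formula (1) is not satisfied at this position")
    if d0 == d1:
        # Only possible when both differences are zero: the frames
        # coincide at both positions, so either one is an intersection.
        return i
    # With linear interpolation the difference reaches zero at this
    # fractional offset between position i (t = 0) and i+1 (t = 1).
    t = d0 / (d0 - d1)
    return i if t <= 0.5 else i + 1
```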

[0091] The similarity measurement module 1202b is further configured to obtain a sum of distances of all sampling point value pairs within a discrete position range that has a preset length and covers the candidate position in the first audio frame and the second audio frame.

[0092] The discrete position range that has a preset length and covers the candidate position includes the candidate position together with a specific quantity of other discrete positions, that is, the discrete position range has a preset length. Specifically, the similarity measurement module 1202b may select candidate positions one by one from the candidate position set, and obtain, for each selected candidate position, a sum of distances of all sampling point value pairs within the discrete position range that has a preset length and covers the selected candidate position in the first audio frame and the second audio frame.

[0093] In an embodiment, the similarity measurement module 1202b may obtain the sum of distances of all sampling point value pairs within a discrete position range that has a preset length and covers the candidate positions in the first audio frame and the second audio frame by using the following formula (2):

$$R_n = \sum_{j=n}^{n+2N} \lvert a_j - b_j \rvert \qquad (2)$$

where n is obtained by subtracting N from the candidate position; N may be selected from [1, (m-1)/2], is preferably selected from [2, (m-1)/100], and is most preferably 5. The candidate position is n+N, and the discrete position range is a discrete position range [n, ..., n+N, ..., 2N+n] that has a length of 2N+1 and that is constituted by the candidate position n+N together with N positions selected leftward and N positions selected rightward from the candidate position n+N. |a_j - b_j| is a distance of each sampling point value pair (a_j, b_j) within the discrete position range in the first audio frame A and the second audio frame B. R_n is a sum of distances of all sampling point value pairs (a_j, b_j) within the discrete position range in the first audio frame A and the second audio frame B.
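Formula (2) can be illustrated with a minimal sketch. It uses 0-based positions and raises an error when the range would extend beyond the frame, which is an assumption because the text does not say how such candidates are handled; the names `distance_sum` and `half_width` (for N) are hypothetical.

```python
def distance_sum(first, second, candidate, half_width=5):
    """Sum |a_j - b_j| over the 2N+1 positions centred on `candidate`."""
    start = candidate - half_width  # n in formula (2)
    stop = candidate + half_width   # n + 2N in formula (2)
    if start < 0 or stop >= len(first):
        raise ValueError("range does not fit inside the frame")
    return sum(abs(first[j] - second[j]) for j in range(start, stop + 1))
```

For the frames [1, 2, 3, 4, 5] and [5, 4, 3, 2, 1], the candidate position 2 with N = 1 gives 2 + 0 + 2 = 4.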

[0094] The determining module 1202c is configured to determine a candidate position corresponding to a smallest distance sum as a frame segmentation position.

[0095] The similarity measurement module 1202b is configured to obtain a local similarity of the first audio frame and the second audio frame at the corresponding candidate position, and the determining module 1202c is configured to determine a frame segmentation position according to the local similarity.

[0096] Specifically, to select a best candidate position from the candidate position set as the frame segmentation position, distance sums of all candidate positions in the candidate position set may be respectively calculated first, and then a candidate position corresponding to a smallest distance sum may be selected as the frame segmentation position. This may be specifically expressed by using the following formula (3):

$$T = \min_{n} R_n = \min_{n} \sum_{j=n}^{n+2N} \lvert a_j - b_j \rvert \qquad (3)$$

where T is a target function. By optimizing the target function T and obtaining the candidate position n that corresponds to the smallest distance sum, the frame segmentation position n+N is obtained. The determined frame segmentation position also satisfies the distance near condition: A product of a first difference and a second difference is less than or equal to 0. The first difference is a difference between a sampling point value at a frame segmentation position in the first audio frame and a sampling point value at a corresponding frame segmentation position in the second audio frame. The second difference is a difference between a sampling point value at a position following the frame segmentation position in the first audio frame and a sampling point value at a corresponding position in the second audio frame.
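The selection expressed by formula (3) might be sketched as follows. The sketch assumes 0-based positions and skips candidates whose range does not fit inside the frame, a choice that is not taken from this specification; the function name `frame_segmentation_position` is hypothetical.

```python
def frame_segmentation_position(first, second, candidates, half_width=5):
    """Return the candidate with the smallest distance sum, or None."""
    best_pos, best_sum = None, None
    for pos in candidates:
        start, stop = pos - half_width, pos + half_width
        if start < 0 or stop >= len(first):
            continue  # assumption: ignore candidates too close to an edge
        # Distance sum over the 2N+1 positions around the candidate.
        total = sum(abs(first[j] - second[j]) for j in range(start, stop + 1))
        if best_sum is None or total < best_sum:
            best_pos, best_sum = pos, total
    return best_pos
```

When several candidates share the smallest sum, this sketch keeps the earliest one; the text does not prescribe a tie-breaking rule.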

[0097] In this embodiment, a frame segmentation position is the candidate position found nearest to an intersection of the first fitted curve and the second fitted curve. The local similarity at the candidate position refers to the degree to which the first fitted curve is similar to the second fitted curve within a specific range near the candidate position. A smaller distance sum calculated by using formula (2) indicates a higher degree of similarity. The more similar the first fitted curve is to the second fitted curve near the candidate position, the more similar the slopes of the two curves are, and the more smoothly the third audio frame obtained by segmentation and stitching transitions, so that noise is better suppressed.

[0098] The local similarity may alternatively be obtained by calculating cross-correlation by using cross-correlation functions, which may also represent the similarity between two signals. However, when cross-correlation is calculated on only a few points, as in this solution, two independent but codirectional sampling point values may produce a relatively great cross-correlation, which would indicate a greater similarity between the two curves even though the determined position is not the best frame segmentation position. This disadvantage of calculating cross-correlation by using the cross-correlation functions is overcome by the local similarity obtained by using formula (2). In formula (2), the sampling point values at all positions play approximately equal roles in the calculation. In addition, by using the absolute value of a difference as the measure of the contribution of the sampling point value at each position, the difference between the slopes preceding and following an intersection may be well described, and a most suitable candidate position may be found as the frame segmentation position.
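The contrast drawn above can be made concrete with a small numerical sketch. It assumes an unnormalized, zero-lag cross-correlation (a plain sum of products), which is only one of several possible definitions; the sample values and function names are hypothetical.

```python
def zero_lag_correlation(x, y):
    """Unnormalized cross-correlation at lag 0: sum of products."""
    return sum(a * b for a, b in zip(x, y))


def distance_sum(x, y):
    """Local similarity in the sense of formula (2): sum of |a_j - b_j|."""
    return sum(abs(a - b) for a, b in zip(x, y))


# Two short windows with the same sign but very different levels.
far_a, far_b = [10, 10, 10], [1, 1, 1]
# Two short windows that coincide exactly.
near_a, near_b = [1, 1, 1], [1, 1, 1]

# The correlation ranks the far-apart pair higher (30 versus 3),
# while the distance sum ranks the coinciding pair as more similar
# (0 versus 27).
corr_far = zero_lag_correlation(far_a, far_b)
corr_near = zero_lag_correlation(near_a, near_b)
dist_far = distance_sum(far_a, far_b)
dist_near = distance_sum(near_a, near_b)
```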

[0099] In an embodiment, the audio frame insertion module 1203 is further configured to: for the first audio frame and the second audio frame adjacent to each other and obtained from the audio data stream of a designated channel when a sound effect is turned on, obtain sampling point values preceding a frame segmentation position in the second audio frame and sampling point values following the frame segmentation position in the first audio frame and sequentially stitch the sampling point values, to generate a third audio frame, and insert the third audio frame between the first audio frame and the second audio frame, and perform fade-in processing on the inserted third audio frame, so that the inserted third audio frame gradually transitions from a no-sound effect state to an intact-sound effect state according to a time sequence.
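The stitching, insertion, and fade-in described above might be sketched as follows. The sketch makes several assumptions that are not stated in this specification: positions are 0-based, the sampling point value at the frame segmentation position itself is taken from the frame that supplies the preceding part, the fade is a linear ramp, and the no-sound-effect and intact-sound-effect versions of the frame are available as `dry` and `wet`. All names are hypothetical.

```python
def stitch(head_source, tail_source, pos):
    """Join head_source[0..pos] with tail_source[pos+1..]."""
    return list(head_source[:pos + 1]) + list(tail_source[pos + 1:])


def insert_frame(first, second, pos):
    """Return the sequence first, third, second with the third frame inserted."""
    # Third frame: values up to pos from the SECOND frame,
    # values after pos from the FIRST frame.
    third = stitch(second, first, pos)
    return list(first), third, list(second)


def fade_in(dry, wet):
    """Blend from the no-effect signal to the full-effect signal over one frame."""
    n = len(dry)
    out = []
    for i, (d, w) in enumerate(zip(dry, wet)):
        gain = i / (n - 1) if n > 1 else 1.0  # linear ramp from 0 to 1
        out.append((1 - gain) * d + gain * w)
    return out
```

With this ordering, the end of the first frame is followed by the start of the third frame (taken from the second frame), and the end of the third frame (taken from the first frame) is followed by the start of the second frame, so both joins reproduce the original transition between the adjacent frames.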

[0100] In this embodiment, the audio frame replacement module 1204 is further configured to: when a sound effect is turned off, obtain sampling point values preceding the frame segmentation position in the first audio frame and sampling point values following the frame segmentation position in the second audio frame and sequentially stitch the sampling point values, to generate a fourth audio frame, and replace the first audio frame and the second audio frame with the fourth audio frame, and perform fade-out processing on the obtained fourth audio frame, so that the obtained fourth audio frame gradually transitions from an intact-sound effect state to a no-sound effect state according to a time sequence.
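A corresponding sketch for replacement and fade-out is given below, under the same kind of assumptions: 0-based positions, the value at the frame segmentation position taken from the first frame, a linear ramp, and hypothetical names throughout.

```python
def replace_pair(frames, k, pos):
    """Replace frames[k] and frames[k+1] with one stitched fourth frame."""
    first, second = frames[k], frames[k + 1]
    # Fourth frame: values up to pos from the FIRST frame,
    # values after pos from the SECOND frame.
    fourth = list(first[:pos + 1]) + list(second[pos + 1:])
    return frames[:k] + [fourth] + frames[k + 2:]


def fade_out(dry, wet):
    """Blend from the full-effect signal to the no-effect signal over one frame."""
    n = len(dry)
    out = []
    for i, (d, w) in enumerate(zip(dry, wet)):
        gain = 1 - i / (n - 1) if n > 1 else 0.0  # linear ramp from 1 to 0
        out.append((1 - gain) * d + gain * w)
    return out
```

Replacing two frames with one shortens the stream by one frame length, which is the counterpart of the insertion case.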

[0101] In an embodiment, the audio frame replacement module 1204 is further configured to: for the first audio frame and the second audio frame adjacent to each other and obtained from the audio data stream of a designated channel when a sound effect is turned on, obtain sampling point values preceding a frame segmentation position in the first audio frame and sampling point values following the frame segmentation position in the second audio frame and sequentially stitch the sampling point values, to generate a fourth audio frame, and replace the first audio frame and the second audio frame with the fourth audio frame, and perform fade-out processing on the obtained fourth audio frame, so that the obtained fourth audio frame gradually transitions from an intact-sound effect state to a no-sound effect state according to a time sequence.

[0102] In an embodiment, the audio frame insertion module 1203 is further configured to: for the first audio frame and the second audio frame adjacent to each other and obtained from the audio data stream of a designated channel when a sound effect is turned off, obtain sampling point values preceding a frame segmentation position in the second audio frame and sampling point values following the frame segmentation position in the first audio frame and sequentially stitch the sampling point values, to generate a third audio frame, and insert the third audio frame between the first audio frame and the second audio frame, and perform fade-out processing on the third audio frame, so that the inserted third audio frame gradually transitions from an intact-sound effect state to a no-sound effect state according to a time sequence.

[0103] A person of ordinary skill in the art may understand that all or some of the procedures of the methods in the embodiments may be implemented by a computer program instructing relevant hardware. The program may be stored in a computer readable storage medium. When the program runs, the procedures of the methods in the embodiments are performed. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disc, or a read-only memory (ROM), or may be a random access memory (RAM) or the like.

[0104] The technical features of the foregoing embodiments may be arbitrarily combined. For brevity, not all possible combinations of the technical features in the foregoing embodiments are described. However, these technical features should be considered as falling within the protection scope of this specification as long as no conflict occurs.

[0105] The described embodiments are merely some embodiments of this application and are described specifically and in detail. However, they should not be understood as a limitation on the patent scope of the present disclosure. It should be noted that a person of ordinary skill in the art may further make variations and improvements without departing from the concept of this application, and these variations and improvements shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the appended claims.


Claims

1. An audio data processing method, comprising:

obtaining a first audio frame and a second audio frame adjacent to each other from an audio data stream, the first audio frame preceding the second audio frame in a time sequence of the audio data stream;

determining a frame segmentation position, wherein a sampling point value at the frame segmentation position in the first audio frame and a sampling point value at the frame segmentation position in the second audio frame satisfy a distance closeness condition; and

obtaining respective sampling point value preceding the frame segmentation position in the second audio frame and respective sampling point value following the frame segmentation position in the first audio frame;

sequentially stitching the respective sampling point value obtained from the second audio frame and the respective sampling point value obtained from the first audio frame, to generate a third audio frame; and

inserting the third audio frame between the first audio frame and the second audio frame.


 
2. The method according to claim 1, further comprising:

when performing a real-time play processing on the audio data stream, reserving copies of sampling point values with a length of at least one audio frame, wherein

obtaining the first audio frame and the second audio frame adjacent to each other from the audio data stream comprises:
when an instruction for inserting an audio frame is detected, obtaining, as the first audio frame, a reserved copy of sampling point values on which play processing was performed before a sampling point value on which play processing is currently being performed, and obtaining, as the second audio frame, sampling point values with a length of one audio frame following the sampling point value on which play processing is currently being performed.


 
3. The method according to claim 1, wherein determining the frame segmentation position comprises:

obtaining a plurality of candidate positions, wherein, for each of the plurality of candidate positions, a sampling point value at the candidate position in the first audio frame and a sampling point value at a corresponding candidate position in the second audio frame satisfy the distance closeness condition;

for each of the plurality of candidate positions, obtaining a respective local similarity of the first audio frame and the second audio frame at the candidate position; and

determining the frame segmentation position according to the respective local similarity.


 
4. The method according to claim 1, wherein determining the frame segmentation position comprises:

obtaining a plurality of candidate positions, wherein, for each of the at least one candidate position, a sampling point value at the candidate position in the first audio frame and a sampling point value at a corresponding candidate position in the second audio frame satisfy the distance closeness condition;

for each of the plurality of candidate positions, obtaining a respective sum of distances of all sampling point value pairs within a discrete position range that has a preset length and covers the candidate position in the first audio frame and the second audio frame; and

determining a candidate position corresponding to a smallest distance sum as the frame segmentation position.


 
5. The method according to claim 4, wherein the distance closeness condition comprises:

a first requirement that a product of a first difference and a second difference is less than or equal to 0, wherein

the first difference is a difference between the sampling point value at the candidate position in the first audio frame and the sampling point value at the corresponding candidate position in the second audio frame; and

the second difference is a difference between a sampling point value at a position following the candidate position in the first audio frame and a sampling point value at a corresponding position in the second audio frame.


 
6. The method according to claim 1, further comprising:
for the first audio frame and the second audio frame adjacent to each other and obtained from the audio data stream of a designated channel, when a sound effect is turned on, obtaining respective sampling point value preceding the frame segmentation position in the second audio frame and respective sampling point value following the frame segmentation position in the first audio frame, sequentially stitching the respective sampling point value, to generate the third audio frame, and inserting the third audio frame between the first audio frame and the second audio frame, and performing a fade-in processing on the inserted third audio frame, so that the inserted third audio frame gradually transitions from a no-sound effect state to an intact-sound effect state according to a time sequence.
 
7. An audio data processing method, comprising:

obtaining a first audio frame and a second audio frame adjacent to each other from an audio data stream, the first audio frame preceding the second audio frame in a time sequence of the audio data stream;

determining a frame segmentation position, wherein a sampling point value at the frame segmentation position in the first audio frame and a sampling point value at the frame segmentation position in the second audio frame satisfy a distance closeness condition;

obtaining respective sampling point value preceding the frame segmentation position in the first audio frame and respective sampling point value following the frame segmentation position in the second audio frame, sequentially stitching the respective sampling point value, to generate a fourth audio frame; and

replacing the first audio frame and the second audio frame with the fourth audio frame.


 
8. The method according to claim 7, further comprising:

when performing a real-time play processing on the audio data stream, reserving copies of sampling point values with a length of at least one audio frame, and

obtaining the first audio frame and the second audio frame adjacent to each other from the audio data stream comprises:
when an instruction for inserting an audio frame is detected, obtaining, as the first audio frame, a reserved copy of sampling point values on which play processing was performed before a sampling point value on which play processing is currently being performed, and obtaining, as the second audio frame, sampling point values with a length of one audio frame following the sampling point value on which the play processing is currently being performed.


 
9. The method according to claim 7, wherein determining the frame segmentation position comprises:

obtaining a plurality of candidate positions, wherein, for each of the plurality of candidate positions, a sampling point value at the candidate position in the first audio frame and a sampling point value at a corresponding candidate position in the second audio frame satisfy the distance closeness condition;

for each of the plurality of candidate positions, obtaining a respective local similarity of the first audio frame and the second audio frame at the corresponding candidate position; and determining the frame segmentation position according to the respective local similarity.


 
10. The method according to claim 7, wherein determining the frame segmentation position comprises:

obtaining a plurality of candidate positions, wherein, for each of the plurality of candidate positions, a sampling point value at the candidate position in the first audio frame and a sampling point value at a corresponding candidate position in the second audio frame satisfy a distance closeness condition;

for each of the plurality of candidate positions, obtaining a respective sum of distances of all sampling point value pairs within a discrete position range that has a preset length and covers the candidate position in the first audio frame and the second audio frame; and

determining a candidate position corresponding to a smallest distance sum as the frame segmentation position.


 
11. The method according to claim 10, wherein the distance closeness condition comprises:

a first requirement that a product of a first difference and a second difference is less than or equal to 0, wherein

the first difference is a difference between the sampling point value at the candidate position in the first audio frame and the sampling point value at the corresponding candidate position in the second audio frame; and

the second difference is a difference between a sampling point value at a position following the candidate position in the first audio frame and a sampling point value at a corresponding position in the second audio frame.


 
12. The method according to claim 7, further comprising:
for the first audio frame and the second audio frame adjacent to each other and obtained from the audio data stream of a designated channel, when a sound effect is turned on, obtaining respective sampling point value preceding the frame segmentation position in the first audio frame and respective sampling point value following the frame segmentation position in the second audio frame, sequentially stitching the respective sampling point value, to generate the fourth audio frame, replacing the first audio frame and the second audio frame with the fourth audio frame, and performing a fade-in processing on the obtained fourth audio frame, so that the obtained fourth audio frame gradually transitions from a no-sound effect state to an intact-sound effect state according to a time sequence.
 
13. A terminal, comprising a memory and a processor, wherein computer readable instructions are stored in the memory, the computer readable instructions, when executed by the processor, causing the processor to perform operations comprising:

obtaining a first audio frame and a second audio frame adjacent to each other from an audio data stream, the first audio frame preceding the second audio frame in a time sequence of the audio data stream;

determining a frame segmentation position, wherein a sampling point value at the frame segmentation position in the first audio frame and a sampling point value at the frame segmentation position in the second audio frame satisfy a distance closeness condition; and

obtaining respective sampling point value preceding the frame segmentation position in the second audio frame and respective sampling point value following the frame segmentation position in the first audio frame, sequentially stitching the respective sampling point value, to generate a third audio frame, and inserting the third audio frame between the first audio frame and the second audio frame.


 
14. The terminal according to claim 13, wherein the computer readable instructions, when executed by the processor, further cause the processor to perform operations comprising:

when performing a real-time play processing on the audio data stream, reserving copies of sampling point values with a length of at least one audio frame, and

obtaining the first audio frame and the second audio frame adjacent to each other from the audio data stream comprises:
when an instruction for inserting an audio frame is detected, obtaining, as the first audio frame, a reserved copy of sampling point values on which play processing was performed before a sampling point value on which play processing is currently being performed, obtaining, as the second audio frame, sampling point values with a length of one audio frame following the sampling point value on which play processing is currently being performed.


 
15. A terminal, comprising a memory and a processor, wherein computer readable instructions are stored in the memory, the computer readable instructions, when executed by the processor, causing the processor to perform operations comprising:

obtaining a first audio frame and a second audio frame adjacent to each other from an audio data stream, the first audio frame preceding the second audio frame in a time sequence of the audio data stream;

determining a frame segmentation position, wherein a sampling point value at the frame segmentation position in the first audio frame and a sampling point value at the frame segmentation position in the second audio frame satisfy a distance closeness condition; and

obtaining respective sampling point value preceding the frame segmentation position in the first audio frame and respective sampling point value following the frame segmentation position in the second audio frame, sequentially stitching the respective sampling point value, to generate a fourth audio frame, and replacing the first audio frame and the second audio frame with the fourth audio frame.


 


Ansprüche

1. Audiodatenverarbeitungsverfahren, umfassend:

Erhalten eines ersten Audio-Frames und eines zweiten Audio-Frames, die nebeneinander liegen, aus einem Audiodatenstrom, wobei der erste Audio-Frame dem zweiten Audio-Frame in einer Zeitsequenz des Audiodatenstroms vorausgeht;

Bestimmen einer Rahmensegmentierungsposition, wobei ein Abtastpunktwert an der Rahmensegmentierungsposition in dem ersten Audio-Frame und ein Abtastpunktwert an der Rahmensegmentierungsposition in dem zweiten Audio-Frame eine Entfernungsnähebedingung erfüllen; und

Erhalten eines jeweiligen Abtastpunktwerts vor der Rahmensegmentierungsposition in dem zweiten Audio-Frame und eines jeweiligen Abtastpunktwerts nach der Rahmensegmentierungsposition in dem ersten Audio-Frame;

sequentielles Zusammenfügen des jeweiligen Abtastpunktwerts, der aus dem zweiten Audio-Frame erhalten wurde, und des jeweiligen Abtastpunktwerts, der aus dem ersten Audio-Frame erhalten wurde, um einen dritten Audio-Frame zu erzeugen; und

Einfügen des dritten Audio-Frames zwischen dem ersten Audio-Frame und dem zweiten Audio-Frame.


 
2. Verfahren nach Anspruch 1, ferner umfassend:

Reservieren von Kopien von Abtastpunktwerten mit einer Länge von wenigstens einem Audio-Frame, wenn eine Echtzeit-Abspielverarbeitung des Audiodatenstroms ausgeführt wird, wobei

Erhalten des ersten Audio-Frames und des zweiten Audio-Frames, die nebeneinanderliegen, aus dem Audiodatenstrom Folgendes umfasst:
wenn eine Anweisung zum Einfügen eines Audio-Frames erkannt wird, Erhalten als erster Audio-Frame einer reservierten Kopie von Abtastpunktwerten, an denen eine Abspielverarbeitung vor einem Abtastpunktwert, an dem eine Abspielverarbeitung gerade ausgeführt wird, ausgeführt wurde, und Erhalten von Abtastpunktwerten mit einer Länge von einem Audio-Frame als dem zweiten Audio-Frame nach dem Abtastpunktwert, an dem die Abspielverarbeitung gerade ausgeführt wird.


 
3. Verfahren nach Anspruch 1, wobei das Bestimmen der Rahmensegmentierungsposition Folgendes umfasst:

Erhalten mehrerer Kandidatenpositionen, wobei für jede der mehreren Kandidatenpositionen ein Abtastpunktwert an der Kandidatenposition in dem ersten Audio-Frame und ein Abtastpunktwert an einer entsprechenden Kandidatenposition in dem zweiten Audio-Frame die Entfernungsnähebedingung erfüllen;

Erhalten einer jeweiligen lokalen Ähnlichkeit des ersten Audio-Frames und des zweiten Audio-Frames an der Kandidatenposition für jede der mehreren Kandidatenpositionen; und

Bestimmen der Rahmensegmentierungsposition nach der jeweiligen lokalen Ähnlichkeit.


 
4. Verfahren nach Anspruch 1, wobei das Bestimmen der Rahmensegmentierungsposition Folgendes umfasst:

Erhalten mehrerer Kandidatenpositionen, wobei für jede der mehreren Kandidatenpositionen ein Abtastpunktwert an der Kandidatenposition in dem ersten Audio-Frame und ein Abtastpunktwert an einer entsprechenden Kandidatenposition in dem zweiten Audio-Frame die Entfernungsnähebedingung erfüllen;

Erhalten einer jeweiligen Summe von Abständen aller Abtastpunktwertepaare innerhalb eines diskreten Positionsbereichs, der eine voreingestellte Länge aufweist und die Kandidatenposition in dem ersten Audio-Frame und dem zweiten Audio-Frame abdeckt, für jede der mehreren Kandidatenpositionen; und

Bestimmen einer Kandidatenposition, die einer kleinsten Abstandssumme entspricht, als die Rahmensegmentierungsposition.


 
5. Verfahren nach Anspruch 4, wobei die Entfernungsnähebedingung Folgendes umfasst:

eine erste Anforderung, dass ein Produkt aus einer ersten Differenz und einer zweiten Differenz kleiner oder gleich 0 ist, wobei

die erste Differenz eine Differenz zwischen dem Abtastpunktwert an der Kandidatenposition in dem ersten Audio-Frame und dem Abtastpunktwert an der entsprechenden Kandidatenposition in dem zweiten Audio-Frame ist; und

die zweite Differenz eine Differenz zwischen einem Abtastpunktwert an einer Position nach der Kandidatenposition in dem ersten Audio-Frame und einem Abtastpunktwert an einer entsprechenden Position in dem zweiten Audio-Frame ist.
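Claims 4 and 5 together describe a two-stage search: collect candidate positions where the difference between the frames changes sign (or is zero), then keep the candidate whose neighbourhood has the smallest summed distance. The following Python sketch makes several assumptions that the claims leave open: the distance of a sample pair is the absolute difference, the "position after the candidate position" is the next index, and the preset range is a symmetric window of `half_width` samples on each side, clipped at the frame edges. All names are hypothetical.

```python
def candidate_positions(first, second):
    """Indices i where (first[i]-second[i]) * (first[i+1]-second[i+1]) <= 0."""
    return [
        i
        for i in range(len(first) - 1)
        if (first[i] - second[i]) * (first[i + 1] - second[i + 1]) <= 0
    ]


def distance_sum(first, second, center, half_width):
    """Sum of |first[k]-second[k]| over a window covering `center`."""
    lo = max(0, center - half_width)
    hi = min(len(first), center + half_width + 1)
    return sum(abs(a - b) for a, b in zip(first[lo:hi], second[lo:hi]))


def choose_split(first, second, half_width=2):
    """Candidate with the smallest distance sum, or None if there is none."""
    candidates = candidate_positions(first, second)
    if not candidates:
        return None
    return min(candidates, key=lambda i: distance_sum(first, second, i, half_width))


if __name__ == "__main__":
    a = [0, 4, 0, 1, 0, 0]
    b = [2, 2, 2, 0, 0, 0]
    print(candidate_positions(a, b))
    print(choose_split(a, b, half_width=1))
```

The sign-change test finds positions where the two waveforms cross, so the sample values on either side of the splice are close. The windowed sum then prefers the crossing where the frames also agree in their surroundings.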


 
6. Verfahren nach Anspruch 1, ferner umfassend:
für den ersten Audio-Frame und den zweiten Audio-Frame, die nebeneinander liegen und aus dem Audiodatenstrom eines bestimmten Kanals erhalten werden, wenn ein Klangeffekt eingeschaltet wird, Erhalten des jeweiligen Abtastpunktwerts vor der Rahmensegmentierungsposition in dem zweiten Audio-Frame und des jeweiligen Abtastpunktwerts nach der Rahmensegmentierungsposition in dem ersten Audio-Frame, sequentielles Zusammenfügen des jeweiligen Abtastpunktwerts, um den dritten Audio-Frame zu erzeugen, und Einfügen des dritten Audio-Frames zwischen den ersten Audio-Frame und den zweiten Audio-Frame und Ausführen einer Einblendverarbeitung auf dem eingefügten dritten Audio-Frame, so dass der eingefügte dritte Audio-Frame nach einer Zeitsequenz allmählich von einem Zustand ohne Klangeffekt in einen Zustand mit intaktem Klangeffekt übergeht.
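The fade-in processing of claim 6 can be sketched as a crossfade between the unprocessed frame and the same frame with the sound effect applied. The claim does not fix the shape of the transition. The linear ramp below, and the names `dry` (frame without effect) and `wet` (frame with the full effect), are assumptions for illustration only.

```python
def fade_in_effect(dry, wet):
    """Blend from `dry` to `wet` across one frame using a linear ramp."""
    if len(dry) != len(wet):
        raise ValueError("dry and wet frames must have the same length")
    n = len(dry)
    denom = max(n - 1, 1)  # avoid division by zero for one-sample frames
    out = []
    for i in range(n):
        w = i / denom  # weight of the effect: 0.0 at the start, 1.0 at the end
        out.append((1.0 - w) * dry[i] + w * wet[i])
    return out


if __name__ == "__main__":
    print(fade_in_effect([10, 10, 10], [20, 20, 20]))
```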
 
7. Audiodatenverarbeitungsverfahren, umfassend:

Erhalten eines ersten Audio-Frames und eines zweiten Audio-Frames, die nebeneinander liegen, aus einem Audiodatenstrom, wobei der erste Audio-Frame dem zweiten Audio-Frame in einer Zeitsequenz des Audiodatenstroms vorausgeht;

Bestimmen einer Rahmensegmentierungsposition, wobei ein Abtastpunktwert an der Rahmensegmentierungsposition in dem ersten Audio-Frame und ein Abtastpunktwert an der Rahmensegmentierungsposition in dem zweiten Audio-Frame eine Entfernungsnähebedingung erfüllen;

Erhalten eines jeweiligen Abtastpunktwerts vor der Rahmensegmentierungsposition in dem ersten Audio-Frame und eines jeweiligen Abtastpunktwerts nach der Rahmensegmentierungsposition in dem zweiten Audio-Frame, sequentielles Zusammenfügen des jeweiligen Abtastpunktwerts, um einen vierten Audio-Frame zu erzeugen; und

Ersetzen des ersten Audio-Frames und des zweiten Audio-Frames durch den vierten Audio-Frame.
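Claim 7 mirrors claim 1 but shortens the stream instead of lengthening it. A minimal Python sketch follows, under the same assumptions as before: equal-length lists of numeric samples, an integer index `split`, and "before" meaning indices below `split`. The function name is hypothetical.

```python
def build_fourth_frame(first, second, split):
    """Join the head of the first frame to the tail of the second frame."""
    if len(first) != len(second):
        raise ValueError("frames must have the same length")
    if not 0 <= split <= len(first):
        raise ValueError("split must lie inside the frame")
    # Samples before the split come from the FIRST frame,
    # samples from the split onward come from the SECOND frame.
    return list(first[:split]) + list(second[split:])


if __name__ == "__main__":
    a = [1, 2, 3, 4]
    b = [5, 6, 3, 8]
    # The two original frames are replaced by this single frame.
    print(build_fourth_frame(a, b, 2))
```

Two frames of input become one frame of output, so one frame length of audio is removed while the splice still falls where the sample values of both frames are close.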


 
8. Verfahren nach Anspruch 7, ferner umfassend:

Reservieren von Kopien von Abtastpunktwerten mit einer Länge von wenigstens einem Audio-Frame, wenn eine Echtzeit-Abspielverarbeitung des Audiodatenstroms ausgeführt wird, wobei

Erhalten des ersten Audio-Frames und des zweiten Audio-Frames, die nebeneinanderliegen, aus dem Audiodatenstrom Folgendes umfasst:
wenn eine Anweisung zum Einfügen eines Audio-Frames erkannt wird, Erhalten als erster Audio-Frame einer reservierten Kopie von Abtastpunktwerten, an denen eine Abspielverarbeitung vor einem Abtastpunktwert, an dem eine Abspielverarbeitung gerade ausgeführt wird, ausgeführt wurde, und Erhalten von Abtastpunktwerten mit einer Länge von einem Audio-Frame als dem zweiten Audio-Frame nach dem Abtastpunktwert, an dem die Abspielverarbeitung gerade ausgeführt wird.


 
9. Verfahren nach Anspruch 7, wobei das Bestimmen der Rahmensegmentierungsposition Folgendes umfasst:

Erhalten mehrerer Kandidatenpositionen, wobei für jede der mehreren Kandidatenpositionen ein Abtastpunktwert an der Kandidatenposition in dem ersten Audio-Frame und ein Abtastpunktwert an einer entsprechenden Kandidatenposition in dem zweiten Audio-Frame die Entfernungsnähebedingung erfüllen;

Erhalten einer jeweiligen lokalen Ähnlichkeit des ersten Audio-Frames und des zweiten Audio-Frames an der entsprechenden Kandidatenposition für jede der mehreren Kandidatenpositionen; und

Bestimmen der Rahmensegmentierungsposition nach der jeweiligen lokalen Ähnlichkeit.


 
10. Verfahren nach Anspruch 7, wobei das Bestimmen der Rahmensegmentierungsposition Folgendes umfasst:

Erhalten mehrerer Kandidatenpositionen, wobei für jede der mehreren Kandidatenpositionen ein Abtastpunktwert an der Kandidatenposition in dem ersten Audio-Frame und ein Abtastpunktwert an einer entsprechenden Kandidatenposition in dem zweiten Audio-Frame eine Entfernungsnähebedingung erfüllen;

Erhalten einer jeweiligen Summe von Abständen aller Abtastpunktwertepaare innerhalb eines diskreten Positionsbereichs, der eine voreingestellte Länge aufweist und die Kandidatenposition in dem ersten Audio-Frame und dem zweiten Audio-Frame abdeckt, für jede der mehreren Kandidatenpositionen; und

Bestimmen einer Kandidatenposition, die einer kleinsten Abstandssumme entspricht, als die Rahmensegmentierungsposition.


 
11. Verfahren nach Anspruch 10, wobei die Entfernungsnähebedingung Folgendes umfasst:

eine erste Anforderung, dass ein Produkt aus einer ersten Differenz und einer zweiten Differenz kleiner oder gleich 0 ist, wobei

die erste Differenz eine Differenz zwischen dem Abtastpunktwert an der Kandidatenposition in dem ersten Audio-Frame und dem Abtastpunktwert an der entsprechenden Kandidatenposition in dem zweiten Audio-Frame ist; und

die zweite Differenz eine Differenz zwischen einem Abtastpunktwert an einer Position nach der Kandidatenposition in dem ersten Audio-Frame und einem Abtastpunktwert an einer entsprechenden Position in dem zweiten Audio-Frame ist.


 
12. Verfahren nach Anspruch 7, ferner umfassend:
für den ersten Audio-Frame und den zweiten Audio-Frame, die nebeneinander liegen und aus dem Audiodatenstrom eines bestimmten Kanals erhalten werden, wenn ein Klangeffekt eingeschaltet wird, Erhalten des jeweiligen Abtastpunktwerts vor der Rahmensegmentierungsposition in dem ersten Audio-Frame und des jeweiligen Abtastpunktwerts nach der Rahmensegmentierungsposition in dem zweiten Audio-Frame, sequentielles Zusammenfügen des jeweiligen Abtastpunktwerts, um den vierten Audio-Frame zu erzeugen, Ersetzen des ersten Audio-Frames und des zweiten Audio-Frames durch den vierten Audio-Frame und Ausführen einer Einblendverarbeitung auf dem erhaltenen vierten Audio-Frame, so dass der erhaltene vierte Audio-Frame nach einer Zeitsequenz allmählich von einem Zustand ohne Klangeffekt in einen Zustand mit intaktem Klangeffekt übergeht.
 
13. Terminal, umfassend einen Speicher und einen Prozessor, wobei computerlesbare Anweisungen in dem Speicher gespeichert werden, wobei die computerlesbaren Anweisungen, wenn sie von dem Prozessor ausgeführt werden, bewirken, dass der Prozessor Operationen ausführt, umfassend:

Erhalten eines ersten Audio-Frames und eines zweiten Audio-Frames, die nebeneinander liegen, aus einem Audiodatenstrom, wobei der erste Audio-Frame dem zweiten Audio-Frame in einer Zeitsequenz des Audiodatenstroms vorausgeht;

Bestimmen einer Rahmensegmentierungsposition, wobei ein Abtastpunktwert an der Rahmensegmentierungsposition in dem ersten Audio-Frame und ein Abtastpunktwert an der Rahmensegmentierungsposition in dem zweiten Audio-Frame eine Entfernungsnähebedingung erfüllen; und

Erhalten eines jeweiligen Abtastpunktwerts vor der Rahmensegmentierungsposition in dem zweiten Audio-Frame und eines jeweiligen Abtastpunktwerts nach der Rahmensegmentierungsposition in dem ersten Audio-Frame, sequentielles Zusammenfügen des jeweiligen Abtastpunktwerts, um einen dritten Audio-Frame zu erzeugen, und Einfügen des dritten Audio-Frames zwischen dem ersten Audio-Frame und dem zweiten Audio-Frame.


 
14. Terminal nach Anspruch 13, wobei die computerlesbaren Anweisungen, wenn sie vom Prozessor ausgeführt werden, bewirken, dass der Prozessor Operationen ausführt, die Folgendes umfassen:

Reservieren von Kopien von Abtastpunktwerten mit einer Länge von wenigstens einem Audio-Frame, wenn eine Echtzeit-Abspielverarbeitung des Audiodatenstroms ausgeführt wird, wobei

Erhalten des ersten Audio-Frames und des zweiten Audio-Frames, die nebeneinanderliegen, aus dem Audiodatenstrom Folgendes umfasst:
wenn eine Anweisung zum Einfügen eines Audio-Frames erkannt wird, Erhalten als erster Audio-Frame einer reservierten Kopie von Abtastpunktwerten, an denen eine Abspielverarbeitung vor einem Abtastpunktwert, an dem eine Abspielverarbeitung gerade ausgeführt wird, ausgeführt wurde, und Erhalten von Abtastpunktwerten mit einer Länge von einem Audio-Frame als dem zweiten Audio-Frame nach dem Abtastpunktwert, an dem die Abspielverarbeitung gerade ausgeführt wird.


 
15. Terminal, umfassend einen Speicher und einen Prozessor, wobei computerlesbare Anweisungen in dem Speicher gespeichert werden, wobei die computerlesbaren Anweisungen, wenn sie von dem Prozessor ausgeführt werden, bewirken, dass der Prozessor Operationen ausführt, umfassend:

Erhalten eines ersten Audio-Frames und eines zweiten Audio-Frames, die nebeneinander liegen, aus einem Audiodatenstrom, wobei der erste Audio-Frame dem zweiten Audio-Frame in einer Zeitsequenz des Audiodatenstroms vorausgeht;

Bestimmen einer Rahmensegmentierungsposition, wobei ein Abtastpunktwert an der Rahmensegmentierungsposition in dem ersten Audio-Frame und ein Abtastpunktwert an der Rahmensegmentierungsposition in dem zweiten Audio-Frame eine Entfernungsnähebedingung erfüllen; und

Erhalten eines jeweiligen Abtastpunktwerts vor der Rahmensegmentierungsposition in dem ersten Audio-Frame und eines jeweiligen Abtastpunktwerts nach der Rahmensegmentierungsposition in dem zweiten Audio-Frame, sequentielles Zusammenfügen des jeweiligen Abtastpunktwerts, um einen vierten Audio-Frame zu erzeugen, und Ersetzen des ersten Audio-Frames und des zweiten Audio-Frames durch den vierten Audio-Frame.


 


Revendications

1. Procédé de traitement de données audio, comprenant :
l'obtention d'une première trame audio et d'une deuxième trame audio adjacentes l'une à l'autre à partir d'un flux de données audio, la première trame audio précédant la deuxième trame audio dans une séquence temporelle du flux de données audio ;
la détermination d'une position de segmentation de trame, une valeur de point d'échantillonnage à la position de segmentation de trame dans la première trame audio et une valeur de point d'échantillonnage à la position de segmentation de trame dans la deuxième trame audio satisfaisant une condition de proximité en distance ; et
l'obtention de valeur de point d'échantillonnage respective précédant la position de segmentation de trame dans la deuxième trame audio et de valeur de point d'échantillonnage respective suivant la position de segmentation de trame dans la première trame audio ;
le raccordement séquentiel de la valeur de point d'échantillonnage respective obtenue à partir de la deuxième trame audio et de la valeur de point d'échantillonnage respective obtenue à partir de la première trame audio, afin de générer une troisième trame audio ; et
l'insertion de la troisième trame audio entre la première trame audio et la deuxième trame audio.
 
2. Procédé selon la revendication 1, comprenant en outre :

lors de la réalisation d'un traitement de lecture en temps réel sur le flux de données audio, la réservation de copies de valeurs de point d'échantillonnage avec une longueur d'au moins une trame audio, dans lequel

l'obtention de la première trame audio et de la deuxième trame audio adjacentes l'une à l'autre à partir du flux de données audio comprend :
lorsqu'une instruction d'insertion d'une trame audio est détectée, l'obtention, comme première trame audio, d'une copie réservée de valeurs de point d'échantillonnage sur lesquelles un traitement de lecture a été effectué avant une valeur de point d'échantillonnage sur laquelle un traitement de lecture est actuellement en train d'être effectué, et l'obtention, comme deuxième trame audio, de valeurs de point d'échantillonnage avec une longueur d'une trame audio suivant la valeur de point d'échantillonnage sur laquelle un traitement de lecture est actuellement en train d'être effectué.


 
3. Procédé selon la revendication 1, dans lequel la détermination de la position de segmentation de trame comprend :

l'obtention d'une pluralité de positions candidates, une valeur de point d'échantillonnage à la position candidate dans la première trame audio et une valeur de point d'échantillonnage à une position candidate correspondante dans la deuxième trame audio satisfaisant la condition de proximité en distance pour chaque position candidate de la pluralité de positions candidates ;

pour chaque position candidate de la pluralité de positions candidates, l'obtention d'une similarité locale respective de la première trame audio et de la deuxième trame audio à la position candidate ; et

la détermination de la position de segmentation de trame en fonction de la similarité locale respective.


 
4. Procédé selon la revendication 1, dans lequel la détermination de la position de segmentation de trame comprend :

l'obtention d'une pluralité de positions candidates, une valeur de point d'échantillonnage à la position candidate dans la première trame audio et une valeur de point d'échantillonnage à une position candidate correspondante dans la deuxième trame audio satisfaisant la condition de proximité en distance pour chaque position candidate de la pluralité de positions candidates ;

pour chaque position candidate de la pluralité de positions candidates, l'obtention d'une somme respective de distances de toutes les paires de valeurs de point d'échantillonnage à l'intérieur d'une plage de positions discrètes qui a une longueur prédéfinie et couvre la position candidate dans la première trame audio et la deuxième trame audio ; et

la détermination d'une position candidate correspondant à une plus petite somme de distances comme position de segmentation de trame.


 
5. Procédé selon la revendication 4, dans lequel la condition de proximité en distance comprend :

une première exigence qu'un produit d'une première différence et d'une deuxième différence soit inférieur ou égal à 0, dans lequel

la première différence est une différence entre la valeur de point d'échantillonnage à la position candidate dans la première trame audio et la valeur de point d'échantillonnage à la position candidate correspondante dans la deuxième trame audio ; et

la deuxième différence est une différence entre une valeur de point d'échantillonnage à une position suivant la position candidate dans la première trame audio et une valeur de point d'échantillonnage à une position correspondante dans la deuxième trame audio.


 
6. Procédé selon la revendication 1, comprenant en outre :
pour la première trame audio et la deuxième trame audio adjacentes l'une à l'autre et obtenues à partir du flux de données audio d'un canal désigné, lorsqu'un effet sonore est activé, l'obtention de valeur de point d'échantillonnage respective précédant la position de segmentation de trame dans la deuxième trame audio et de valeur de point d'échantillonnage respective suivant la position de segmentation de trame dans la première trame audio, le raccordement séquentiel de la valeur de point d'échantillonnage respective, afin de générer la troisième trame audio, et l'insertion de la troisième trame audio entre la première trame audio et la deuxième trame audio, et la réalisation d'un traitement d'ouverture en fondu sur la troisième trame audio insérée, de manière que la troisième trame audio insérée passe progressivement d'un état sans effet sonore à un état d'effet sonore entier selon une séquence temporelle.
 
7. Procédé de traitement de données audio, comprenant :

l'obtention d'une première trame audio et d'une deuxième trame audio adjacentes l'une à l'autre à partir d'un flux de données audio, la première trame audio précédant la deuxième trame audio dans une séquence temporelle du flux de données audio ;

la détermination d'une position de segmentation de trame, une valeur de point d'échantillonnage à la position de segmentation de trame dans la première trame audio et une valeur de point d'échantillonnage à la position de segmentation de trame dans la deuxième trame audio satisfaisant une condition de proximité en distance ;

l'obtention de valeur de point d'échantillonnage respective précédant la position de segmentation de trame dans la première trame audio et de valeur de point d'échantillonnage respective suivant la position de segmentation de trame dans la deuxième trame audio, le raccordement séquentiel de la valeur de point d'échantillonnage respective, afin de générer une quatrième trame audio ; et

le remplacement de la première trame audio et de la deuxième trame audio par la quatrième trame audio.


 
8. Procédé selon la revendication 7, comprenant en outre :

lors de la réalisation d'un traitement de lecture en temps réel sur le flux de données audio, la réservation de copies de valeurs de point d'échantillonnage avec une longueur d'au moins une trame audio, dans lequel

l'obtention de la première trame audio et de la deuxième trame audio adjacentes l'une à l'autre à partir du flux de données audio comprend :
lorsqu'une instruction d'insertion d'une trame audio est détectée, l'obtention, comme première trame audio, d'une copie réservée de valeurs de point d'échantillonnage sur lesquelles un traitement de lecture a été effectué avant une valeur de point d'échantillonnage sur laquelle un traitement de lecture est actuellement en train d'être effectué, et l'obtention, comme deuxième trame audio, de valeurs de point d'échantillonnage avec une longueur d'une trame audio suivant la valeur de point d'échantillonnage sur laquelle un traitement de lecture est actuellement en train d'être effectué.


 
9. Procédé selon la revendication 7, dans lequel la détermination de la position de segmentation de trame comprend :

l'obtention d'une pluralité de positions candidates, une valeur de point d'échantillonnage à la position candidate dans la première trame audio et une valeur de point d'échantillonnage à une position candidate correspondante dans la deuxième trame audio satisfaisant la condition de proximité en distance pour chaque position candidate de la pluralité de positions candidates ;

pour chaque position candidate de la pluralité de positions candidates, l'obtention d'une similarité locale respective de la première trame audio et de la deuxième trame audio à la position candidate ; et

la détermination de la position de segmentation de trame en fonction de la similarité locale respective.


 
10. Procédé selon la revendication 7, dans lequel la détermination de la position de segmentation de trame comprend :

l'obtention d'une pluralité de positions candidates, une valeur de point d'échantillonnage à la position candidate dans la première trame audio et une valeur de point d'échantillonnage à une position candidate correspondante dans la deuxième trame audio satisfaisant une condition de proximité en distance pour chaque position candidate de la pluralité de positions candidates ;

pour chaque position candidate de la pluralité de positions candidates, l'obtention d'une somme respective de distances de toutes les paires de valeurs de point d'échantillonnage à l'intérieur d'une plage de positions discrètes qui a une longueur prédéfinie et couvre la position candidate dans la première trame audio et la deuxième trame audio ; et

la détermination d'une position candidate correspondant à une plus petite somme de distances comme position de segmentation de trame.


 
11. Procédé selon la revendication 10, dans lequel la condition de proximité en distance comprend :

une première exigence qu'un produit d'une première différence et d'une deuxième différence soit inférieur ou égal à 0, dans lequel

la première différence est une différence entre la valeur de point d'échantillonnage à la position candidate dans la première trame audio et la valeur de point d'échantillonnage à la position candidate correspondante dans la deuxième trame audio ; et

la deuxième différence est une différence entre une valeur de point d'échantillonnage à une position suivant la position candidate dans la première trame audio et une valeur de point d'échantillonnage à une position correspondante dans la deuxième trame audio.


 
12. Procédé selon la revendication 7, comprenant en outre :
pour la première trame audio et la deuxième trame audio adjacentes l'une à l'autre et obtenues à partir du flux de données audio d'un canal désigné, lorsqu'un effet sonore est activé, l'obtention de valeur de point d'échantillonnage respective précédant la position de segmentation de trame dans la première trame audio et de valeur de point d'échantillonnage respective suivant la position de segmentation de trame dans la deuxième trame audio, le raccordement séquentiel de la valeur de point d'échantillonnage respective, afin de générer la quatrième trame audio, le remplacement de la première trame audio et de la deuxième trame audio par la quatrième trame audio, et la réalisation d'un traitement d'ouverture en fondu sur la quatrième trame audio obtenue, de manière que la quatrième trame audio obtenue passe progressivement d'un état sans effet sonore à un état d'effet sonore entier selon une séquence temporelle.
 
13. Terminal, comprenant une mémoire et un processeur, dans lequel des instructions lisibles par ordinateur sont stockées dans la mémoire, les instructions lisibles par ordinateur, lorsqu'elles sont exécutées par le processeur, amenant le processeur à effectuer des opérations comprenant :

l'obtention d'une première trame audio et d'une deuxième trame audio adjacentes l'une à l'autre à partir d'un flux de données audio, la première trame audio précédant la deuxième trame audio dans une séquence temporelle du flux de données audio ;

la détermination d'une position de segmentation de trame, une valeur de point d'échantillonnage à la position de segmentation de trame dans la première trame audio et une valeur de point d'échantillonnage à la position de segmentation de trame dans la deuxième trame audio satisfaisant une condition de proximité en distance ; et

l'obtention de valeur de point d'échantillonnage respective précédant la position de segmentation de trame dans la deuxième trame audio et de valeur de point d'échantillonnage respective suivant la position de segmentation de trame dans la première trame audio, le raccordement séquentiel de la valeur de point d'échantillonnage respective, afin de générer une troisième trame audio, et l'insertion de la troisième trame audio entre la première trame audio et la deuxième trame audio.


 
14. Terminal selon la revendication 13, dans lequel les instructions lisibles par ordinateur, lorsqu'elles sont exécutées par le processeur, amènent en outre le processeur à effectuer des opérations comprenant :

lors de la réalisation d'un traitement de lecture en temps réel sur le flux de données audio, la réservation de copies de valeurs de point d'échantillonnage avec une longueur d'au moins une trame audio, dans lequel

l'obtention de la première trame audio et de la deuxième trame audio adjacentes l'une à l'autre à partir du flux de données audio comprend :
lorsqu'une instruction d'insertion d'une trame audio est détectée, l'obtention, comme première trame audio, d'une copie réservée de valeurs de point d'échantillonnage sur lesquelles un traitement de lecture a été effectué avant une valeur de point d'échantillonnage sur laquelle un traitement de lecture est actuellement en train d'être effectué, et l'obtention, comme deuxième trame audio, de valeurs de point d'échantillonnage avec une longueur d'une trame audio suivant la valeur de point d'échantillonnage sur laquelle un traitement de lecture est actuellement en train d'être effectué.


 
15. Terminal, comprenant une mémoire et un processeur, dans lequel des instructions lisibles par ordinateur sont stockées dans la mémoire, les instructions lisibles par ordinateur, lorsqu'elles sont exécutées par le processeur, amenant le processeur à effectuer des opérations comprenant :

l'obtention d'une première trame audio et d'une deuxième trame audio adjacentes l'une à l'autre à partir d'un flux de données audio, la première trame audio précédant la deuxième trame audio dans une séquence temporelle du flux de données audio ;

la détermination d'une position de segmentation de trame, une valeur de point d'échantillonnage à la position de segmentation de trame dans la première trame audio et une valeur de point d'échantillonnage à la position de segmentation de trame dans la deuxième trame audio satisfaisant une condition de proximité en distance ; et

l'obtention de valeur de point d'échantillonnage respective précédant la position de segmentation de trame dans la première trame audio et de valeur de point d'échantillonnage respective suivant la position de segmentation de trame dans la deuxième trame audio, le raccordement séquentiel de la valeur de point d'échantillonnage respective, afin de générer une quatrième trame audio, et le remplacement de la première trame audio et de la deuxième trame audio par la quatrième trame audio.


 




Drawing

REFERENCES CITED IN THE DESCRIPTION



This list of references cited by the applicant is for the reader's convenience only. It does not form part of the European patent document. Even though great care has been taken in compiling the references, errors or omissions cannot be excluded and the EPO disclaims all liability in this regard.

Patent documents cited in the description