(19)
(11)EP 3 662 468 B1

(12)EUROPEAN PATENT SPECIFICATION

(45)Mention of the grant of the patent:
04.11.2020 Bulletin 2020/45

(21)Application number: 19783935.0

(22)Date of filing:  26.09.2019
(51)International Patent Classification (IPC): 
G10L 21/0364(2013.01)
H03G 7/00(2006.01)
G10L 25/18(2013.01)
H03G 9/02(2006.01)
(86)International application number:
PCT/US2019/053142
(87)International publication number:
WO 2020/069120 (02.04.2020 Gazette  2020/14)

(54)

DISTORTION REDUCING MULTI-BAND COMPRESSOR WITH DYNAMIC THRESHOLDS BASED ON SCENE SWITCH ANALYZER GUIDED DISTORTION AUDIBILITY MODEL

VERZERRUNGSREDUZIERENDER MULTIBANDVERDICHTER MIT DYNAMISCHEN SCHWELLWERTEN AUF BASIS EINES DURCH EINEN SZENENUMSCHALTANALYSATOR GEFÜHRTEN VERZERRUNGSHÖRBARKEITSMODELLS

COMPRESSEUR MULTIBANDE AVEC RÉDUCTION DE DISTORSIONS AVEC SEUILS DYNAMIQUES BASÉ SUR UN MODÈLE D'AUDIBILITÉ DE DISTORSION GUIDÉ PAR ANALYSEUR DE COMMUTATEUR DE SCÈNE


(84)Designated Contracting States:
AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

(30)Priority: 28.09.2018 WO PCT/CN2018/108287
29.01.2019 US 201962798149 P
04.02.2019 EP 19155298

(43)Date of publication of application:
10.06.2020 Bulletin 2020/24

(73)Proprietor: Dolby Laboratories Licensing Corporation
San Francisco, CA 94103 (US)

(72)Inventor:
  • MA, Yuanxing
    Beijing 100020 (CN)

(74)Representative: Dolby International AB Patent Group Europe 
Apollo Building, 3E Herikerbergweg 1-35
1101 CN Amsterdam Zuidoost (NL)


(56)References cited:
WO-A1-2014/179021
US-B2- 9 419 577
US-A1- 2012 321 096
US-B2- 9 619 199
  
      
    Note: Within nine months from the publication of the mention of the grant of the European patent, any person may give notice to the European Patent Office of opposition to the European patent granted. Notice of opposition shall be filed in a written reasoned statement. It shall not be deemed to have been filed until the opposition fee has been paid. (Art. 99(1) European Patent Convention).


    Description

    Technical Field



    [0001] The present disclosure generally relates to audio presentation and, in particular, to distortion reduction during presentation.

    Background



    [0002] Many audio playback systems contain amplifiers and speakers with limited output capabilities. Mobile phones and tablets are two extreme examples where the design is rigidly limited by the dimension and power requirements of the device. In such systems it is common for the audio to distort as the playback level is increased, and oftentimes the characteristics of this distortion are frequency dependent. Therefore, it is common practice to apply multi-band compression to the audio signal prior to playback to reduce distortion and attempt to maximize playback level on a playback device. A distortion threshold is specified for each frequency band of the signal, and a compressor applies an independent gain to each band to ensure that the signal level in each band does not exceed the corresponding distortion threshold. A problem with such a compressor is that the gains applied for the purposes of distortion reduction might be content dependent. The thresholds set in order to eliminate perceived distortion for a narrowband signal are oftentimes lower than what is required for broadband signals, since a broadband signal may itself significantly mask some of the distortion which it induces, whereas a narrowband signal may be much less effective at masking its induced distortion. To address this problem, the applicant proposed a multi-band compressor augmented with a distortion audibility model that gives an audibility measure, which is then utilized to dynamically modify the thresholds of the compressor to achieve maximum playback level with minimal perceived distortion, as illustrated in Figure 1.

    [0003] The international search report cites WO 2014/179021 A1 ("D1"), US 9 619 199 B2 ("D2"), US 2012/321096 A1 ("D3") and US 9 419 577 B2 ("D4"). D1 describes dynamically adjusting thresholds of a compressor. An input audio signal having a number of frequency band components is processed. A compressor performs, on each frequency band component, a compression operation having a corresponding time-varying threshold to produce gains. Each gain is applied to a delayed corresponding frequency band component to produce processed band components, which are summed to produce an output signal. D2 describes generating, for audio content received in a source audio format, default gains based on a default dynamic range compression (DRC) curve, and generating non-default gains for a non-default gain profile. Based on the default gains and non-default gains, differential gains are generated. An audio signal comprising the audio content, the default DRC curve, and differential gains is generated. D3 describes applying dynamic gain modifications to an audio signal at least partly in response to auditory events. An input channel is divided into auditory events by detecting changes in a measurable characteristic of the input signal with respect to time.

    [0004] D4 describes a distortion reducing multi-band compressor with timbre preservation. Timbre preservation is achieved by determining a time-varying threshold in each of a plurality of frequency bands as a function of a respective fixed threshold for the frequency band and, at least in part, an audio signal level and a fixed threshold outside such frequency band. If a particular frequency band receives significant gain reduction due to being above or approaching its fixed threshold, then the time-varying thresholds of one or more other frequency bands are also decreased so that those bands receive some gain reduction.

    Summary



    [0005] The present application introduces a scene switch analyzer, to determine if a scene switch has occurred in the input audio signal, to guide the distortion audibility model. This scene switch analyzer makes sure that the rapid change of compressor thresholds only happens at the same moment as the scene switches, so as to give a more natural experience. Generally, a scene switch occurs when a paragraph of content is comprised of narrowband signals, and the following paragraph is comprised of broadband signals, or vice versa. For example, if the vocal comes in after a piano solo, it is considered as a scene switch, thus the compressor thresholds could change rapidly as the distortion audibility measure changes. A scene switch also occurs when one piece of content is comprised of narrowband signals, and the next piece of content in the playlist is comprised of broadband signals, or vice versa. For example, a low-quality narrowband user-generated content (UGC) is followed by a professional broadband content.

    [0006] Hence, when there is no scene switch in the input audio signal, slow smoothing of the dynamic compressor thresholds is applied such that they change slowly. This can be obtained by using a large attack time constant and/or release time constant of a one pole smoother used for the smoothing. When a scene switch is detected, fast smoothing is applied to allow for a rapid change of the compressor thresholds by using a smaller attack time constant and/or release time constant of the smoother.
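
    The slow/fast behavior described above can be illustrated with a minimal Python sketch. It is not taken from the specification: the function names, the use of a single coefficient instead of separate attack and release constants, and the numeric values are all assumptions chosen for illustration. A coefficient close to 1 changes the threshold slowly; a small coefficient lets it move quickly.

```python
def smooth_threshold(prev, target, scene_switch,
                     alpha_slow=0.999, alpha_fast=0.5):
    """One step of a one-pole smoother for a compressor threshold.

    alpha_slow / alpha_fast are hypothetical coefficients: the closer the
    coefficient is to 1, the more slowly the threshold follows its target.
    """
    alpha = alpha_fast if scene_switch else alpha_slow
    return alpha * prev + (1.0 - alpha) * target


def run(targets, switches, start):
    """Smooth a sequence of unsmoothed thresholds, one flag per sample."""
    out, state = [], start
    for target, switch in zip(targets, switches):
        state = smooth_threshold(state, target, switch)
        out.append(state)
    return out
```

    With these illustrative values, a threshold starting at 1.0 and heading toward 0.0 moves only to 0.999 after one step without a scene switch, but halves on every step while a scene switch is flagged.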

    [0007] In some implementations, a scene switch analyzer receives an input audio signal having a plurality of frequency band components. The scene switch analyzer determines whether a scene switch has occurred in the input audio signal. The frequency band components of the input audio signal are processed. In response to determining that a scene switch has not occurred, a distortion audibility model applies slow smoothing to compressor thresholds of the frequency band components. In response to determining that a scene switch has occurred, the distortion audibility model applies fast smoothing or no smoothing to the compressor thresholds of the frequency band components.

    [0008] In some implementations, the scene switch includes a switch between a broadband signal and a narrowband signal, or vice versa. The broadband signal corresponds to a vocal sound or a professional movie content, and the narrowband signal corresponds to an instrumental sound, e.g., a piano sound or a low-quality narrowband UGC content.

    [0009] In some implementations, determining whether a scene switch has occurred in the input audio signal is based on all frequency band components of an input audio signal. For example, determining whether a scene switch has occurred in the input audio signal is based on a time-varying estimation of the centroid of the signal power spectrum, or on an estimation of the cutoff band of the signal power spectrum, where the signal power spectrum is estimated by smoothing each frequency band component signal. Specifically, the scene switch analyzer computes the time-varying estimation of the signal power spectrum centroid at least by performing operations including estimating a signal power spectrum by smoothing each frequency band component signal and determining the centroid of the signal power spectrum using the estimated signal power spectrum. Determining whether the scene switch has occurred in the input audio signal can include the following operations: smoothing the centroid; determining a difference between the centroid and the smoothed centroid; and determining whether the scene switch has occurred based on whether the difference satisfies a threshold. In addition, the scene switch analyzer computes the estimation of the cutoff band of the signal power spectrum at least by performing operations including estimating a signal power spectrum by smoothing each frequency band component signal and determining the cutoff band of the signal power spectrum using the estimated signal power spectrum. Determining whether the scene switch has occurred in the input audio signal can include the following operations: smoothing the cutoff band; determining a difference between the cutoff band and the smoothed cutoff band; and determining whether the scene switch has occurred based on whether the difference satisfies a threshold.

    [0010] In some implementations, after determining whether the scene switch has occurred, the scene switch analyzer provides one or more control signals to the distortion audibility model to guide the smoothing of the compressor thresholds of the frequency band components of the input audio signal. In addition, in some implementations, the one or more control signals guide the change of the time constants, including the attack time constant and/or the release time constant. In some implementations, the one or more control signals are produced by a function mapped to the range [0, 1], which can be a step function or a sigmoid function.

    [0011] In some implementations, a scene switch analyzer for determining whether a scene switch has occurred in the input audio signal includes one or more computing devices operable to cause some or all of the operations described above to be performed.

    [0012] In some implementations, a computer-readable medium stores instructions executable by one or more processors to cause some or all of operations described above to be performed.

    Brief Description of the Figures



    [0013] The included Figures are for illustrative purposes and serve only to provide examples of possible structures and operations for the disclosed inventive methods, systems and computer-readable media. These figures in no way limit any changes in form and detail that may be made by one skilled in the art without departing from the scope of the disclosed implementations.

    Figure 1 shows a schematic view of a prior compressor incorporating a distortion audibility model (DAM) for dynamically adjusting thresholds of the compressor responsive to an input audio signal.

    Figure 2 shows a schematic view of a compressor 100 incorporating a scene switch analyzer (SSA) to guide a distortion audibility model (DAM) dynamically adjusting thresholds of the compressor responsive to an input audio signal, according to some implementations.

    Figure 3 shows a flow chart of a method 200 of audio signal processing by a compressor 100 disclosed herein, performed according to some implementations.

    Figure 4 shows an example of a method 300 of dynamically adjusting thresholds of the compressor responsive to an input audio signal based on determining whether a scene switch has occurred in the input audio signal, performed according to some implementations.

    Figure 5 shows another example of a method 400 of dynamically adjusting thresholds of the compressor responsive to an input audio signal based on determining whether a scene switch has occurred in the input audio signal, performed according to some implementations.

    Figures 6A and 6B show two examples of the function of one or more control signals, i.e., step function and sigmoid function, respectively, according to some implementations.


    Detailed Description



    [0014] As mentioned above, a multi-band compressor augmented with a distortion audibility model gives an audibility measure, which is then utilized to dynamically modify the thresholds of the compressor to achieve maximum playback level with minimal perceived distortion. A plurality of dynamic (time-varying) thresholds are determined according to the plurality of frequency band components, wherein each time-varying threshold corresponds to a respective frequency band component. The compressor then performs a compression operation on each frequency band component, wherein the compression has the corresponding time-varying threshold to produce a gain for each frequency band component. However, the problem with such a distortion audibility model augmented compressor is that when applied to mobile devices, whose dimensions are rigidly limited, the perceived distortion for a narrowband signal is harder to eliminate, and thus the threshold set for narrowband signals is oftentimes much lower than that required for broadband signals. That means a small change in the distortion audibility measure will cause a large threshold change, resulting in a considerable change in output volume. When such a rapid and pronounced change occurs at unexpected moments, it has a negative impact on the listening experience.

    [0015] To address this problem, the present application discloses techniques that incorporate a scene switch analyzer configured to guide a distortion audibility model to smooth the dynamic (time-varying) thresholds, which can be applied by a multi-band compressor. Some examples of methods, systems and computer-readable medium implementing said techniques for dynamically adjusting the thresholds of a compressor responsive to an input audio signal are disclosed as follows.

    [0016] Figure 2 depicts the multi-band compressor 100 incorporating a scene switch analyzer (SSA) to guide a distortion audibility model (DAM) dynamically adjusting thresholds of the compressor responsive to an input audio signal, according to some implementations. In figure 2, a filtering module in the form of a filterbank 104 receives an input signal x[n]. Filterbank 104 is configured to filter input signal x[n] to separate input signal x[n] into a number of frequency band components x1[n]-xB[n]. In some implementations, filterbank 104 is configured as a multi-band filter implemented as a number B of bandpass filters, where each bandpass filter corresponds to a respective frequency band component. For example, the output of each band b may be computed as the input signal x[n] convolved with a bandpass filter response hb[n] as represented in Equation (1):
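
    Equation (1) is not reproduced in this text. A form consistent with the sentence above, offered as a reconstruction rather than the granted wording, is the following, where * denotes convolution:

```latex
x_b[n] = h_b[n] * x[n], \qquad b = 1, \ldots, B \tag{1}
```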

    In Figure 2, a scene switch analyzer 108 receives the frequency band components x1[n]-xB[n] output from filterbank 104; and based on its analysis, a scene switch analyzer 108 creates one or more control signals Ck[n]. In some implementations, Ck[n] is computed, potentially, as a function of all band signals xb[n] across bands b=1 ...B, as represented in Equation (2):
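
    Equation (2) is not reproduced in this text. Read from the sentence above, it can be written abstractly as follows; the operator name SSA(·) is a placeholder assumed here for the analyzer's mapping:

```latex
C_k[n] = \mathrm{SSA}\bigl(\{\, x_i[n] \mid i = 1, \ldots, B \,\}\bigr) \tag{2}
```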

    Next, one or more control signals Ck[n] are fed into a distortion audibility model 112 to guide it to compute each time-varying threshold Db[n] based on all frequency band components x1[n]-xB[n] and fixed thresholds Lb across bands b=1 ...B, as represented in Equation (3):
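
    Equation (3) is not reproduced in this text. Based on the inputs named above (all band components, the fixed thresholds, and the control signals), an abstract reconstruction is given below; the operator name DAM(·) is a placeholder assumed here:

```latex
D_b[n] = \mathrm{DAM}\bigl(\{\, x_i[n],\, L_i \mid i = 1, \ldots, B \,\},\; C_k[n]\bigr) \tag{3}
```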

    Wherein, in some implementations, a scene switch analyzer 108 can create only one control signal to guide computing all time-varying thresholds Db[n] for all frequency band components x1[n]-xB[n]; in some other implementations, rather than only one control signal, a scene switch analyzer 108 can create a plurality of control signals to guide computing all time-varying thresholds Db[n] for all frequency band components x1[n]-xB[n], for example, the number of the control signals correspond to the number of the frequency band components. Next, each frequency band component is passed into a compression function 116 along with the limit thresholds Db[n] to create the time-varying gains gb[n], as represented in Equation (4):
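
    Equation (4) is not reproduced in this text. Since each gain depends on one band component and its limit threshold, a reconstruction in abstract form is shown below; CF(·) is a placeholder name assumed here for compression function 116:

```latex
g_b[n] = \mathrm{CF}\bigl(x_b[n],\, D_b[n]\bigr) \tag{4}
```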

    Finally, the processed output signal y[n] is computed by summing delayed versions of all of the frequency band components x1[n]-xB[n] multiplied by their corresponding gains g1[n]-gB[n]. In Figure 2, the multiplier units 120 are configured to multiply the gains with the delayed frequency band components to produce the processed band components y1[n]-yB[n], which are summed at a summing unit 124 to produce the output signal y[n]. For example, a delay d can be designed to take into account any processing delay associated with the computation of the gains. Equation (5) shows a representation of the generation of the processed signal y[n]:
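
    Equation (5) is not reproduced in this text. From the description above (gains multiplied with band components delayed by d, then summed over all bands), a consistent reconstruction is:

```latex
y[n] = \sum_{b=1}^{B} g_b[n]\, x_b[n-d] \tag{5}
```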



    [0017] Figure 3 shows a flow chart of a method 200 of audio signal processing by a compressor 100 disclosed herein, performed according to some implementations. Figure 3 is described with the example of Figure 2. At 204 of Figure 3, the frequency band components x1[n]-xB[n] are received as inputs to SSA 108, as explained above. At 208, SSA 108 produces one or more control signals Ck[n] based on all of the frequency band components x1[n]-xB[n]. At 212, DAM 112 computes the time-varying thresholds Db[n] based on all frequency band components x1[n]-xB[n] and fixed thresholds across bands according to the control signals Ck[n]. At 216, each compression function 116 is configured to perform a compression operation on corresponding frequency band components x1[n]-xB[n] using corresponding time-varying thresholds Db[n] to produce gains g1[n]-gB[n]. At 220, each gain gb[n] is applied to a delayed version of each corresponding frequency band component xb[n], for instance, using multiplier units 120, to produce processed band components y1[n]-yB[n]. At 224, processed band components y1[n]-yB[n] are summed at summing unit 124 to produce output signal y[n].
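
    Steps 216 to 224 can be sketched in Python as follows. This is an illustrative sketch only: the specification does not define the compression function, so compression_gain below is a hypothetical per-sample hard limiter, and the names process, bands, thresholds and delay are assumptions. The thresholds Db[n] are taken as already computed at step 212.

```python
def compression_gain(x, threshold):
    """Hypothetical compression function: unity gain at or below the
    threshold, otherwise the gain that scales |x| down to the threshold."""
    level = abs(x)
    if level <= threshold or level == 0.0:
        return 1.0
    return threshold / level


def process(bands, thresholds, delay=0):
    """bands[b][n] holds x_b[n]; thresholds[b][n] holds D_b[n].

    Returns y[n], the sum over bands of g_b[n] * x_b[n - delay].
    """
    num_samples = len(bands[0])
    y = [0.0] * num_samples
    for x_b, d_b in zip(bands, thresholds):
        for n in range(num_samples):
            gain = compression_gain(x_b[n], d_b[n])            # step 216
            delayed = x_b[n - delay] if n >= delay else 0.0    # x_b[n - d]
            y[n] += gain * delayed                             # steps 220, 224
    return y
```

    A practical compressor would normally derive its gains from a smoothed level estimate rather than from instantaneous samples; the per-sample limiter is used here only to keep the sketch short.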

    [0018] Therefore, rather than Db[n] being decided solely by the DAM, the SSA also takes the frequency band components x1[n]-xB[n] and, based on its analysis, gives one or more control signals Ck[n] that guide the DAM in smoothing Db[n]. For example, Ck[n] guides the change of the time constants: it gives smaller time constants during a scene switch, to allow rapid changes, and larger time constants when there is no scene switch, to smooth out the fluctuations. By contrast, the attack and release time constants of the typical fast-attack/slow-release one pole smoother for Db[n] applied by the prior compressor would be fixed.

    [0019] Figure 4 shows an example of a method 300 of dynamically adjusting thresholds of the compressor responsive to an input audio signal based on determining whether a scene switch has occurred in the input audio signal, performed according to some implementations. It has been found that the centroid of the signal power spectrum can be a good indicator of scene switch cases, especially when a vocal comes in after a piano solo, or vice versa. Therefore, in this exemplary embodiment, the scene switch analyzer 108 operates by computing a time-varying estimation of the signal power spectrum centroid. At 304, the signal power spectrum sb[n] may be estimated by smoothing the per-band signal, i.e., each frequency band component signal xb[n], with a fast-attack/slow-release one pole smoother, as represented in Equation (6):
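
    Equation (6) is not reproduced in this text. A reconstruction of a fast-attack/slow-release one pole smoother is given below. It assumes that the smoothed quantity is the instantaneous band power x_b^2[n] and that the attack branch applies when the input is at or above the previous state; neither detail is stated in the surrounding text.

```latex
s_b[n] =
\begin{cases}
\alpha_A\, s_b[n-1] + (1-\alpha_A)\, x_b^2[n], & x_b^2[n] \ge s_b[n-1] \\
\alpha_R\, s_b[n-1] + (1-\alpha_R)\, x_b^2[n], & \text{otherwise}
\end{cases}
\tag{6}
```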

    Where αA is the attack time constant and αR is the release time constant of a fast-attack/slow-release one pole smoother. This signal power spectrum sb[n] is then represented in dB, in Equation (7):
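
    Equation (7) is not reproduced in this text. Assuming the usual power-to-decibel conversion, and writing the dB-domain spectrum as S_b[n], a reconstruction is:

```latex
S_b[n] = 10 \log_{10}\bigl(s_b[n]\bigr) \tag{7}
```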

    Next, at 308, the centroid of the signal power spectrum C[n] is determined by the estimated signal power spectrum, as represented in Equation (8):
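
    Equation (8) is not reproduced in this text. A reconstruction as a weighted mean of band center frequencies is given below, where S_b[n] denotes the dB-domain spectrum and each band is weighted by its level plus a fixed 130 dB offset. The exact form is an assumption based on the terms named in the surrounding text.

```latex
C[n] = \frac{\sum_{b=1}^{B} f_b \,\bigl(S_b[n] + 130\bigr)}{\sum_{b=1}^{B} \bigl(S_b[n] + 130\bigr)} \tag{8}
```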

    wherein fb is the center frequency of the band and, preferably, the fixed offset 130 dB is chosen so that all potentially audible signal, generally louder than -130 dB, would be counted into the signal power spectrum. Then, at 312, the centroid of the signal power spectrum would also be smoothed with a fast-attack/slow-release one pole smoother to obtain the smoothed version centroid Cs[n], as represented in Equation (9):
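
    Equation (9) is not reproduced in this text. Applying the same fast-attack/slow-release one pole form to the centroid gives the reconstruction below; the attack condition (input at or above the previous state) is an assumption.

```latex
C_s[n] =
\begin{cases}
\alpha_A\, C_s[n-1] + (1-\alpha_A)\, C[n], & C[n] \ge C_s[n-1] \\
\alpha_R\, C_s[n-1] + (1-\alpha_R)\, C[n], & \text{otherwise}
\end{cases}
\tag{9}
```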

    Next, at 316, the difference between the centroid C[n] and the smoothed centroid Cs[n] is determined and then compared with the threshold, preferably, the threshold of 500Hz is chosen which is effective to indicate the occurrence of scene switch, to produce one or more control signals Ck[n], which could be mapped to the range [0, 1], as represented in Equation (10):
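
    Equation (10) is not reproduced in this text. One plausible reading is sketched below: the magnitude of the difference is passed through a mapping function f(·) with range [0, 1] whose threshold is set to 500 Hz. Taking the absolute value, and placing the threshold inside f(·), are assumptions made here.

```latex
C_k[n] = f\bigl(\,\lvert C[n] - C_s[n] \rvert\,\bigr), \qquad x_{Th} = 500\ \text{Hz} \tag{10}
```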

    At 320, Ck[n] guides the change of the time constants, such as, the attack time constant αA, as represented in Equation (11):
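
    Equation (11) is not reproduced in this text. A plausible reconstruction, assuming the control signal interpolates linearly between a slow and a fast setting, is:

```latex
\alpha_A[n] = C_k[n]\, \alpha_{Afast} + \bigl(1 - C_k[n]\bigr)\, \alpha_{Aslow} \tag{11}
```

    Under this reading, Ck[n] = 1 selects the fast setting and Ck[n] = 0 selects the slow one.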

    Where αAfast and αAslow could be set to a plurality of different values, for example, to slightly different values or the same value for each band; wherein, preferably, αAfast is set to one half of αAslow, or even smaller, to create a potentially more natural listening experience during a dramatic scene switch.
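
    Steps 304 to 320 can be sketched in Python as follows. This is a hedged illustration, not the patented implementation: the function names, the clamping of negative weights, the power floor, the use of the absolute difference, and the step-function mapping are all assumptions. The 130 dB offset and the 500 Hz threshold come from the text.

```python
import math


def one_pole(prev, x, alpha_attack, alpha_release):
    """Fast-attack/slow-release one-pole smoother (one update step)."""
    alpha = alpha_attack if x >= prev else alpha_release
    return alpha * prev + (1.0 - alpha) * x


def centroid_hz(power, center_freqs, offset_db=130.0, floor=1e-13):
    """Centroid of the band power spectrum, weighted by level in dB + offset.

    Bands at or below the floor get zero weight (an assumption made so
    that weights never go negative).
    """
    weights = [max(10.0 * math.log10(max(p, floor)) + offset_db, 0.0)
               for p in power]
    total = sum(weights)
    if total <= 0.0:
        return 0.0
    return sum(f * w for f, w in zip(center_freqs, weights)) / total


def attack_coefficient(c, c_smoothed, alpha_fast, alpha_slow,
                       threshold_hz=500.0):
    """Map the centroid deviation to an attack coefficient via a step."""
    ck = 1.0 if abs(c - c_smoothed) >= threshold_hz else 0.0
    return ck * alpha_fast + (1.0 - ck) * alpha_slow
```

    In use, one_pole would be applied per band to obtain the smoothed power, centroid_hz to obtain C[n], one_pole again to obtain the smoothed centroid, and attack_coefficient to select the coefficient that the threshold smoother uses for the current sample.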

    [0020] Next, at 324, the time constants, such as the attack time constant αA of Equation (11), are applied to guide the smoothing of Db[n], as represented in Equations (12) and (13), respectively:
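
    Equations (12) and (13) are not reproduced in this text. A reconstruction follows, using the unsmoothed per-band limit threshold d_b[n] as the smoother input. The attack condition (input at or above the previous state) is an assumption; Equation (13) is Equation (12) with the attack coefficient set to zero.

```latex
D_b[n] =
\begin{cases}
\alpha_A\, D_b[n-1] + (1-\alpha_A)\, d_b[n], & d_b[n] \ge D_b[n-1] \\
\alpha_R\, D_b[n-1] + (1-\alpha_R)\, d_b[n], & \text{otherwise}
\end{cases}
\tag{12}

D_b[n] =
\begin{cases}
d_b[n], & d_b[n] \ge D_b[n-1] \\
\alpha_R\, D_b[n-1] + (1-\alpha_R)\, d_b[n], & \text{otherwise}
\end{cases}
\tag{13}
```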



    Where db[n] is the unsmoothed per-band limit threshold generated in DAM. In some implementations, the Equation (12) illustrates the regular fast-attack/slow-release smoothing to Db[n]; in addition, if the most rapid changes are needed, the αA and the αAfast could even be set as zero; in this case, the DAM is guided to apply no smoothing when a scene switch is detected during an attack of db[n], as represented in Equation (13).

    [0021] In addition to or instead of utilizing the centroid as represented in Figure 4, other characteristics of the input signal spectrum could be leveraged to assist the detection of a scene switch as well. Figure 5 shows another example of a method 400 of dynamically adjusting thresholds of the compressor responsive to an input audio signal based on determining whether a scene switch has occurred in the input audio signal, performed according to some implementations. In this exemplary embodiment, the cutoff band of the signal power spectrum could be an alternative indicator of scene switch cases; in particular, the cutoff band could be a good indicator of the introduction of musical instruments that feature different bandwidths. At 404, the signal power spectrum may be estimated by smoothing the per-band signal with a fast-attack/slow-release one pole smoother and then represented in dB, in a manner similar to Equation (6). Next, at 408, the cutoff band of the signal power spectrum bcutoff[n] is determined by the estimated signal power spectrum, as represented in Equation (14):



    [0022] Then, at 412, the cutoff band of the signal power spectrum would also be smoothed with a fast-attack/slow-release one pole smoother to obtain the smoothed version cutoff band bcutoff[n], as represented similarly in Equation (9). Next, at 416, the difference between the cutoff band and the smoothed cutoff band is determined and then compared with the threshold to produce one or more control signals Ck[n], as represented similarly in Equation (10). At 420, Ck[n] guides the change of the time constants, as represented similarly in Equation (11). Next, at 424, the time constants could be applied to guide the smoothing to Db[n], as represented similarly in Equations (12) and (13).
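
    The cutoff-band variant can be sketched in Python as follows. Equation (14) is not available in this text, so the definition of the cutoff band used here (the highest band whose level exceeds a floor) is an assumption, as are the function names, the -130 dB floor borrowed from the centroid discussion, and the two-band decision threshold.

```python
def cutoff_band(spectrum_db, floor_db=-130.0):
    """Return the 1-based index of the highest band above the floor.

    Returns 0 when no band exceeds the floor. This definition is a
    hypothetical stand-in for the missing Equation (14).
    """
    cutoff = 0
    for b, level in enumerate(spectrum_db, start=1):
        if level > floor_db:
            cutoff = b
    return cutoff


def cutoff_switch(cutoff, cutoff_smoothed, threshold_bands=2.0):
    """Flag a scene switch when the cutoff band departs from its smoothed
    value by at least threshold_bands (an illustrative value)."""
    return abs(cutoff - cutoff_smoothed) >= threshold_bands
```

    The smoothed cutoff band would be tracked with the same kind of one pole smoother used for the centroid, and the resulting flag would select the fast or slow time constant for smoothing Db[n].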

    [0023] Figures 6A and 6B show two examples of the function of one or more control signals Ck[n], i.e., step function and sigmoid function, respectively, according to some implementations. Generally, the function f(.) of the control signals Ck[n] could be mapped to the range [0, 1]. In one embodiment as illustrated by figure 6A, the mapping function f(.) would be a very simple example, i.e., the step function, as represented in Equation (15):
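
    Equation (15) is not reproduced in this text. A step function with threshold x_Th and range [0, 1] can be written as below; whether the boundary case x = x_Th maps to 1 or 0 is an assumption.

```latex
f(x) =
\begin{cases}
1, & x \ge x_{Th} \\
0, & \text{otherwise}
\end{cases}
\tag{15}
```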

    Where xTh is the threshold. In addition, in the other preferable embodiment as illustrated by figure 6B, the mapping function f(.) would be the sigmoid function as represented in Equation (16):
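
    Equation (16) is not reproduced in this text. Assuming the standard logistic form centered on the threshold x_Th with scale factor a, a reconstruction is:

```latex
f(x) = \frac{1}{1 + e^{-a\,(x - x_{Th})}} \tag{16}
```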

    Where xTh is the threshold and a is a scale factor. Figure 6B further shows the three specific embodiments of the Sigmoid function where the scale factor is set as 1, 2 and 10 respectively. Using the sigmoid function could potentially assist to generate more consistent audio outputs across floating point and fixed point platforms with different word lengths.
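
    The two mapping functions can be compared with a short Python sketch. The function names are assumptions, and the sigmoid is written in a numerically stable two-branch form so that large arguments do not overflow; it is the standard logistic function, which is assumed here to match the sigmoid of Figure 6B.

```python
import math


def step(x, x_th):
    """Step mapping: 1 at or above the threshold, 0 below it."""
    return 1.0 if x >= x_th else 0.0


def sigmoid(x, x_th, a=1.0):
    """Logistic mapping centered on x_th with scale factor a.

    Uses two branches so that exp() is only ever called with a
    non-positive argument.
    """
    z = a * (x - x_th)
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)
```

    A larger scale factor makes the sigmoid approach the step function, while a smaller one spreads the transition between slow and fast smoothing over a wider range of the input.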

    [0024] Instead of guiding the attack time constant, an alternative is that one or more control signals Ck[n] could be created to guide other parameters, such as the release time constant αR, by following the general steps from 304/404 to 320/420 described above; wherein some of the parameters used in the steps from 304/404 to 320/420 could be changed, such as changing the smoothing scheme, by changing the time constants used, of the signal power spectrum Sb[n] at 312/412, or changing the mapping function at 316/416, etc.

    [0025] The techniques of the scene switch analyzer described herein could be implemented by one or more computing devices. For example, a controller of a special-purpose computing device may be hard-wired to perform the disclosed operations or cause such operations to be performed and may include digital electronic circuitry such as one or more application-specific integrated circuits (ASICs) or field programmable gate arrays (FPGA) persistently programmed to perform operations or cause operations to be performed. In some implementations, custom hard-wired logic, ASICs and/or FPGAs with custom programming are combined to accomplish the techniques.

    [0026] In some other implementations, a general purpose computing device could include a controller incorporating a central processing unit (CPU) programmed to cause one or more of the disclosed operations to be performed pursuant to program instruction in firmware, memory, other storage, or a combination thereof.

    [0027] The term "computer-readable storage medium" as used herein refers to any medium that stores instructions and/or data that cause a computer or type of machine to operate in a specific fashion. Any of the models, analyzers and operations described herein may be implemented as or caused to be implemented by software code executable by a processor of a controller using a suitable computer language. The software code may be stored as a series of instructions on a computer-readable medium for storage. Examples of suitable computer-readable storage media include random access memory (RAM), read only memory (ROM), a magnetic medium, an optical medium, a solid state drive, flash memory, and any other memory chip or cartridge. The computer-readable storage medium may be any combination of such storage devices. Any such computer-readable storage medium may reside on or within a single computing device or an entire computer system, and may be among other computer-readable storage media within a system or network.

    [0028] While the subject matter of this application has been particularly shown and described with reference to implementations thereof, it will be understood by those skilled in the art that changes in the form and details of the disclosed implementations may be made without departing from the scope of this disclosure. Examples of some of these implementations are illustrated in the accompanying drawings, and specific details are set forth in order to provide a thorough understanding thereof. It should be noted that implementations may be practiced without some or all of these specific details. In addition, well known features may not have been described in detail to promote clarity. Finally, although advantages have been discussed herein with reference to some implementations, it will be understood that the scope should not be limited by reference to such advantages. Rather, the scope should be determined with reference to the appended claims.


    Claims

    1. A method (200, 300, 400) of dynamically adjusting thresholds of a compressor (100) responsive to an input audio signal, the method comprising:

    receiving, by a scene switch analyzer (108), an input audio signal having a plurality of frequency band components;

    determining, by the scene switch analyzer, whether a scene switch has occurred in the input audio signal, wherein a scene switch is determined to have occurred when the input audio signal transitions from being a broadband signal to being a narrowband signal or vice versa;

    providing, by the scene switch analyzer, one or more control signals to a distortion audibility model (112) to guide smoothing to compressor thresholds of the frequency band components by guiding a change of an attack time constant and/or a release time constant of a smoother; and

    processing the frequency band components of the input audio signal, including:

    in response to determining that scene switch has not occurred, the control signal indicating use of a large time constant thereby applying slow smoothing to compressor thresholds of the frequency band components; and

    in response to determining that scene switch has occurred, the one or more control signals indicating use of a small or zero-valued time constant thereby applying fast smoothing or no smoothing to the compressor thresholds of the frequency band components.


     
    2. The method of claim 1, wherein the broadband signal corresponds to a vocal sound or a professional movie content, and the narrowband signal corresponds to an instrumental sound or a low-quality narrowband user-generated content (UGC).
     
    3. The method of claim 1 or 2, wherein determining whether a scene switch has occurred in the input audio signal is based on all frequency band components of an input audio signal.
     
    4. The method of claim 3, wherein determining whether a scene switch has occurred in the input audio signal is based on a time-varying estimation of a signal power spectrum centroid.
     
    5. The method of claim 4, wherein the scene switch analyzer computes the time-varying estimation of the signal power spectrum centroid at least by performing operations comprising:

    estimating (304) a signal power spectrum by smoothing each frequency band component signal; and

    determining (308) the centroid of the signal power spectrum using the estimated signal power spectrum.


     
    6. The method of claim 5, wherein determining whether the scene switch has occurred in the input audio signal comprises:

    smoothing (312) the centroid;

    determining (316) a difference between the centroid and the smoothed centroid; and

    determining whether the scene switch has occurred based on whether the difference satisfied a threshold.


     
    7. The method of any of claims 3-6, wherein determining whether a scene switch has occurred in the input audio signal is based on the estimation of the cutoff band of the signal power spectrum.
     
    8. The method of claim 7, wherein the scene switch analyzer computes the estimation of the cutoff band of the signal power spectrum at least by performing operations comprising:

    estimating (404) a signal power spectrum by smoothing each frequency band component signal; and

    determining (408) the cutoff band of the signal power spectrum using the estimated signal power spectrum.


     
    9. The method of claim 8, wherein determining whether the scene switch has occurred in the input audio signal comprises:

    smoothing (412) the cutoff band;

    determining (416) a difference between the cutoff band and the smoothed cutoff band; and

    determining whether the scene switch has occurred based on whether the difference satisfies a threshold.


     
    10. The method of any one of the preceding claims, wherein a function of one or more control signals for guiding the change of the attack time constant and/or the release time constant is mapped to the range [0, 1], and wherein said attack time constant and/or release time constant is changed by being multiplied by said function.
     
    11. The method of any one of the preceding claims, further comprising:
    performing (216), by the compressor, on each frequency band component, a compression operation having the corresponding threshold to produce a plurality of gains, each gain corresponding to a respective frequency band component.
     
    12. A scene switch analyzer (108) comprising:

    one or more computing devices; and

    a computer-readable storage medium storing instructions that, when executed by one or more processors, cause the one or more processors to perform operations of any one of claims 1 to 11.


     
    13. A computer-readable storage medium storing instructions that, when executed by one or more computing devices, cause the one or more computing devices to perform operations of any one of claims 1 to 11.
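
    The detection steps of claims 4 to 9 and the time-constant scaling of claim 10 can be illustrated with a minimal sketch. It is not the claimed implementation. The following are assumptions made only for illustration: all function and class names, the use of the band index as the frequency axis, the -60 dB floor used to locate the cutoff band, the detection threshold of 2 bands, the one-pole smoothers, the combination of both features with a logical OR, and the exponential mapping from time constant to smoothing coefficient.

```python
import math


def smooth(prev, new, alpha):
    """One-pole smoother; alpha near 1 is slow, alpha = 0 passes `new` through."""
    return alpha * prev + (1.0 - alpha) * new


def centroid(power):
    """Power-weighted mean band index of a power spectrum (cf. step 308)."""
    total = sum(power)
    if total <= 0.0:
        return 0.0
    return sum(band * p for band, p in enumerate(power)) / total


def cutoff_band(power, floor_db=-60.0):
    """Highest band whose power is within `floor_db` of the peak (cf. step 408).

    The relative floor is an assumption; the claims do not define the rule.
    """
    peak = max(power)
    if peak <= 0.0:
        return 0
    limit = peak * 10.0 ** (floor_db / 10.0)
    return max(band for band, p in enumerate(power) if p >= limit)


class SceneSwitchAnalyzer:
    """Hypothetical analyzer flagging wideband/narrowband transitions."""

    def __init__(self, power_alpha=0.5, feature_alpha=0.95, threshold=2.0):
        self.power_alpha = power_alpha      # smoothing of band powers (304/404)
        self.feature_alpha = feature_alpha  # smoothing of features (312/412)
        self.threshold = threshold          # assumed threshold, in bands
        self.power = None
        self.smoothed = None

    def process(self, band_power):
        """Return True if a scene switch is flagged for this frame."""
        if self.power is None:
            self.power = list(band_power)
        else:
            self.power = [smooth(p, x, self.power_alpha)
                          for p, x in zip(self.power, band_power)]
        features = (centroid(self.power), float(cutoff_band(self.power)))
        if self.smoothed is None:
            self.smoothed = features
        else:
            self.smoothed = tuple(smooth(s, f, self.feature_alpha)
                                  for s, f in zip(self.smoothed, features))
        # Difference between each feature and its smoothed value (316/416).
        return any(abs(f - s) > self.threshold
                   for f, s in zip(features, self.smoothed))


def smoothing_coefficient(tau, control, frame_period):
    """Scale a time constant by a control value in [0, 1] (cf. claim 10).

    A scaled time constant of zero yields coefficient 0, i.e. no smoothing.
    """
    scaled_tau = tau * control
    if scaled_tau <= 0.0:
        return 0.0
    return math.exp(-frame_period / scaled_tau)
```

    In this sketch, a control value of 1 leaves the attack or release time constant unchanged, so the thresholds are smoothed slowly, while a control value of 0 collapses the time constant so that `smooth(previous, target, 0.0)` returns the new threshold immediately.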
     


    Ansprüche

    1. Verfahren (200, 300, 400) zum dynamischen Einstellen von Schwellenwerten eines Verdichters (100) als Reaktion auf ein Eingangsaudiosignal, wobei das Verfahren umfasst:

    Empfangen eines Eingangsaudiosignals mit einer Vielzahl von Frequenzbandkomponenten durch einen Szenenumschaltanalysator (108);

    Bestimmen, durch den Szenenumschaltanalysator, ob eine Szenenumschaltung in dem Eingangsaudiosignal aufgetreten ist, wobei bestimmt wird, dass eine Szenenumschaltung aufgetreten ist, wenn das Eingangsaudiosignal von einem Breitbandsignal zu einem Schmalbandsignal übergeht oder umgekehrt;

    Bereitstellen eines oder mehrerer Steuersignale durch den Szenenumschaltanalysator an ein Verzerrungshörbarkeitsmodell (112), um die Glättung zu Verdichterschwellenwerten der Frequenzbandkomponenten zu führen, indem eine Änderung einer Angriffszeitkonstante und/oder einer Freigabezeitkonstante von einem Glätter gesteuert wird; und

    Verarbeiten der Frequenzbandkomponenten des Eingangsaudiosignals, einschließlich, dass:

    als Reaktion auf das Bestimmen, dass keine Szenenumschaltung aufgetreten ist, das Steuersignal die Verwendung einer großen Zeitkonstante angibt, wodurch eine langsame Glättung auf die Verdichterschwellenwerte der Frequenzbandkomponenten angewendet wird; und dass

    als Reaktion auf das Bestimmen, dass eine Szenenumschaltung aufgetreten ist, das eine oder die mehreren Steuersignale die Verwendung einer kleinen oder nullwertigen Zeitkonstante angeben, wodurch eine schnelle Glättung oder keine Glättung auf die Verdichterschwellenwerte der Frequenzbandkomponenten angewendet wird.


     
    2. Verfahren nach Anspruch 1, wobei das Breitbandsignal einem Vokalton oder einem professionellen Filminhalt entspricht und das Schmalbandsignal einem Instrumentalton oder einem vom Benutzer erzeugten Schmalbandinhalt geringer Qualität (UGC) entspricht.
     
    3. Verfahren nach Anspruch 1 oder 2, wobei das Bestimmen, ob eine Szenenumschaltung in dem Eingangsaudiosignal aufgetreten ist, auf allen Frequenzbandkomponenten eines Eingangsaudiosignals basiert.
     
    4. Verfahren nach Anspruch 3, wobei das Bestimmen, ob eine Szenenumschaltung in dem Eingangsaudiosignal aufgetreten ist, auf einer zeitvariablen Schätzung eines Signalleistungsspektrum-Schwerpunkts basiert.
     
    5. Verfahren nach Anspruch 4, wobei der Szenenumschaltanalysator die zeitvariable Schätzung des Signalleistungsspektrum-Schwerpunkts mindestens durch Durchführen von Operationen berechnet, die umfassen:

    Schätzen (304) eines Signalleistungsspektrums durch Glätten jedes Frequenzbandkomponentensignals; und

    Bestimmen (308) des Schwerpunkts des Signalleistungsspektrums unter Verwendung des geschätzten Signalleistungsspektrums.


     
    6. Verfahren nach Anspruch 5, wobei das Bestimmen, ob die Szenenumschaltung in dem Eingangsaudiosignal aufgetreten ist, umfasst:

    Glätten (312) des Schwerpunkts;

    Bestimmen (316) eines Unterschieds zwischen dem Schwerpunkt und dem geglätteten Schwerpunkt; und

    Bestimmen, ob die Szenenumschaltung aufgetreten ist, basierend darauf, ob die Differenz einen Schwellenwert erfüllt.


     
    7. Verfahren nach einem der Ansprüche 3 bis 6, wobei das Bestimmen, ob eine Szenenumschaltung in dem Eingangsaudiosignal aufgetreten ist, auf der Schätzung des Trennbandes des Signalleistungsspektrums basiert.
     
    8. Verfahren nach Anspruch 7, wobei der Szenenumschaltanalysator die Schätzung des Trennbandes des Signalleistungsspektrums mindestens durch Durchführen von Operationen berechnet, die umfassen:

    Schätzen (404) eines Signalleistungsspektrums durch Glätten jedes Frequenzbandkomponentensignals; und

    Bestimmen (408) des Trennbandes des Signalleistungsspektrums unter Verwendung des geschätzten Signalleistungsspektrums.


     
    9. Verfahren nach Anspruch 8, wobei das Bestimmen, ob die Szenenumschaltung in dem Eingangsaudiosignal aufgetreten ist, umfasst:

    Glätten (412) des Trennbandes;

    Bestimmen (416) eines Unterschieds zwischen dem Trennband und dem geglätteten Trennband; und

    Bestimmen, ob die Szenenumschaltung aufgetreten ist, basierend darauf, ob die Differenz einen Schwellenwert erfüllt.


     
    10. Verfahren nach einem der vorstehenden Ansprüche, wobei eine Funktion eines oder mehrerer Steuersignale zum Führen der Änderung der Angriffszeitkonstante und/oder der Freigabezeitkonstante auf den Bereich [0, 1] abgebildet wird und wobei die Angriffszeitkonstante und/oder die Freigabezeitkonstante geändert werden, indem sie mit dieser Funktion multipliziert werden.
     
    11. Verfahren nach einem der vorstehenden Ansprüche, weiter umfassend: Durchführen (216) einer Verdichtungsoperation durch den Verdichter an jeder Frequenzbandkomponente mit dem entsprechenden Schwellenwert, um eine Vielzahl von Verstärkungen zu erzeugen, wobei jede Verstärkung einer jeweiligen Frequenzbandkomponente entspricht.
     
    12. Szenenumschaltanalysator (108), umfassend:

    eine oder mehrere Computervorrichtungen; und

    ein computerlesbares Speichermedium, das Anweisungen speichert, die, wenn sie durch einen oder mehrere Prozessoren ausgeführt werden, den einen oder die mehreren Prozessoren veranlassen, die Operationen nach einem der Ansprüche 1 bis 11 durchzuführen.


     
    13. Computerlesbares Speichermedium, das Anweisungen speichert, die, wenn sie durch eine oder mehrere Computervorrichtungen ausgeführt werden, die eine oder die mehreren Computervorrichtungen veranlassen, die Operationen nach einem der Ansprüche 1 bis 11 durchzuführen.
     


    Revendications

    1. Procédé (200, 300, 400) de réglage dynamique de seuils d'un compresseur (100) en réponse à un signal audio d'entrée, le procédé comprenant les étapes consistant à :

    recevoir, par un analyseur de commutation de scène (108), un signal audio d'entrée ayant une pluralité de composantes de bande de fréquence ;

    déterminer, par l'analyseur de commutation de scène, si une commutation de scène s'est produite dans le signal audio d'entrée, dans lequel une commutation de scène est déterminée comme s'étant produite lorsque le signal audio d'entrée passe d'un signal à large bande à un signal à bande étroite ou vice versa ;

    fournir, par l'analyseur de commutation de scène, un ou plusieurs signaux de commande à un modèle d'audibilité de distorsion (112) pour guider un lissage à des seuils de compresseur des composantes de bande de fréquence en guidant une modification d'une constante de temps d'attaque et/ou d'une constante de temps de libération d'un dispositif de lissage ; et

    traiter les composantes de bande de fréquence du signal audio d'entrée, incluant :

    en réponse à la détermination du fait qu'une commutation de scène ne s'est pas produite, le signal de commande indiquant l'utilisation d'une constante de temps élevée appliquant ainsi un lissage lent à des seuils de compresseur des composantes de bande de fréquence ; et

    en réponse à la détermination du fait qu'une commutation de scène s'est produite, les un ou plusieurs signaux de commande indiquant l'utilisation d'une constante de temps faible ou nulle appliquant ainsi un lissage rapide ou aucun lissage aux seuils de compresseur des composantes de bande de fréquence.


     
    2. Procédé selon la revendication 1, dans lequel le signal à large bande correspond à un son vocal ou un contenu de film professionnel, et le signal à bande étroite correspond à un son instrumental ou un contenu généré par les utilisateurs (UGC) à bande étroite de faible qualité.
     
    3. Procédé selon la revendication 1 ou 2, dans lequel la détermination du fait qu'une commutation de scène s'est produite dans le signal audio d'entrée se base sur toutes les composantes de bande de fréquence d'un signal audio d'entrée.
     
    4. Procédé selon la revendication 3, dans lequel la détermination du fait qu'une commutation de scène s'est produite dans le signal audio d'entrée se base sur une estimation variable dans le temps d'un centroïde de spectre de puissance de signal.
     
    5. Procédé selon la revendication 4, dans lequel l'analyseur de commutation de scène calcule l'estimation variable dans le temps du centroïde de spectre de puissance de signal au moins en effectuant des opérations comprenant :

    l'estimation (304) d'un spectre de puissance de signal en lissant chaque signal de composante de bande de fréquence ; et

    la détermination (308) du centroïde du spectre de puissance de signal en utilisant le spectre de puissance de signal estimé.


     
    6. Procédé selon la revendication 5, dans lequel la détermination du fait que la commutation de scène s'est produite dans le signal audio d'entrée comprend :

    le lissage (312) du centroïde ;

    la détermination (316) d'une différence entre le centroïde et le centroïde lissé ; et

    la détermination du fait que la commutation de scène s'est produite sur la base du fait que la différence satisfait à un seuil.


     
    7. Procédé selon l'une quelconque des revendications 3 à 6, dans lequel la détermination du fait qu'une commutation de scène s'est produite dans le signal audio d'entrée se base sur l'estimation de la bande de découpe du spectre de puissance de signal.
     
    8. Procédé selon la revendication 7, dans lequel l'analyseur de commutation de scène calcule l'estimation de la bande de découpe du spectre de puissance de signal au moins en effectuant des opérations comprenant :

    l'estimation (404) d'un spectre de puissance de signal en lissant chaque signal de composante de bande de fréquence ; et

    la détermination (408) de la bande de découpe du spectre de puissance de signal en utilisant le spectre de puissance de signal estimé.


     
    9. Procédé selon la revendication 8, dans lequel la détermination du fait que la commutation de scène s'est produite dans le signal audio d'entrée comprend :

    le lissage (412) de la bande de découpe ;

    la détermination (416) d'une différence entre la bande de découpe et la bande de découpe lissée ; et

    la détermination du fait que la commutation de scène s'est produite sur la base du fait que la différence satisfait à un seuil.


     
    10. Procédé selon l'une quelconque des revendications précédentes, dans lequel une fonction d'un ou plusieurs signaux de commande pour guider la modification de la constante de temps d'attaque et/ou de la constante de temps de libération est cartographiée sur la plage [0, 1], et dans lequel ladite constante de temps d'attaque et/ou constante de temps de libération est modifiée en étant multipliée par ladite fonction.
     
    11. Procédé selon l'une quelconque des revendications précédentes, comprenant en outre :
    la réalisation (216), par le compresseur, sur chaque composante de bande de fréquence, d'une opération de compression présentant le seuil correspondant pour produire une pluralité de gains, chaque gain correspondant à une composante de bande de fréquence respective.
     
    12. Analyseur de commutation de scène (108) comprenant :

    un ou plusieurs dispositifs informatiques ; et

    un support de stockage lisible par ordinateur stockant des instructions qui, lorsqu'elles sont exécutées par un ou plusieurs processeurs, amènent les un ou plusieurs processeurs à effectuer des opérations selon l'une quelconque des revendications 1 à 11.


     
    13. Support de stockage lisible par ordinateur stockant des instructions qui, lorsqu'elles sont exécutées par un ou plusieurs dispositifs informatiques, amènent les un ou plusieurs dispositifs informatiques à effectuer des opérations selon l'une quelconque des revendications 1 à 11.
     




    Cited references

    REFERENCES CITED IN THE DESCRIPTION



    This list of references cited by the applicant is for the reader's convenience only. It does not form part of the European patent document. Even though great care has been taken in compiling the references, errors or omissions cannot be excluded and the EPO disclaims all liability in this regard.

    Patent documents cited in the description