[Technical Field]
[0001] The present invention relates to an apparatus for independently controlling a volume
of a speech signal extracted from an audio signal and a method thereof, and more particularly,
to an apparatus for independently controlling a volume of a speech signal by inverting
a phase of a gain value corresponding to one of the left and right channels when the
phases of the left and right channels are inverted relative to each other, and a method thereof.
[Background Art]
[0002] Generally, an audio amplifying technology is used to amplify a low-frequency signal
in a home entertainment system, a stereo system and other consumer electronic devices
and implement various listening environments (e.g., concert hall, etc.). For instance,
a separate dialog volume (SDV) is a technology for extracting a speech signal (e.g.,
dialog) from a stereo/multi-channel audio signal and then independently controlling
a volume of the extracted speech signal, in order to solve the problem that speech is
difficult to hear when viewing a television program or movie.
[Disclosure]
[Technical Problem]
[0003] Generally, a method and apparatus for controlling a volume of a speech signal included
in an audio/video signal enable a speech signal to be efficiently controlled according
to a request made by a user in various devices for playing back an audio signal such
as television receivers, digital multimedia broadcast (DMB) players, personal media
players (PMP) and the like.
[0004] However, when the phases of the left and right channel signals are inverted, either
due to such a cause as an error in transmission or intentionally, the correlation between
the left and right channel signals has a negative value even though the signal is a mono
signal (e.g., the input signal is spread widely rather than concentrated on a specific
point of the sound). In this case, the corresponding signal is not recognized as a speech
signal due to the characteristics of the SDV algorithm, so the corresponding volume cannot
be controlled.
[0005] Meanwhile, if operation of the SDV algorithm needs to be manually controlled according
to a request made by a user, it may be inconvenient for the user to use the television
receiver or the like.
[Technical Solution]
[0006] Accordingly, the present invention is directed to an apparatus for independently
controlling a volume of a speech signal extracted from an audio signal and method
thereof that substantially obviate one or more of the problems due to limitations
and disadvantages of the related art.
[0007] An object of the present invention is to provide an apparatus for independently controlling
a volume of a speech signal of an inverse-phase audio signal and method thereof, in
which a sign of a final gain value corresponding to one channel of the audio signal
is changed or a value of the final gain corresponding to one channel of the audio
signal is adjusted through a process for determining whether an input signal is an
inverse-phase mono signal including left and right channels whose phases are inverted
relative to each other.
[0008] Another object of the present invention is to provide an apparatus for independently
controlling a volume of a speech signal by automatically controlling a timing point
of activating an SDV.
[Advantageous Effects]
[0009] Accordingly, the present invention provides the following effects or advantages.
[0010] First of all, in an inverse-phase input audio signal, it is possible to control a volume
of a speech signal by changing a sign of a final gain or adjusting a value of the
final gain corresponding to one of the left and right channels of the audio signal.
[0011] Secondly, in an inverse-phase input audio signal, it is possible to control a volume
of a speech signal by inverting a phase of either a left or right channel of the audio
signal.
[0012] Thirdly, by determining an inter-channel correlation of an input audio signal, it
is possible to check whether a phase of the input audio signal is inverted.
[0013] Fourthly, by automatically controlling a timing point of activating SDV, it is possible
to independently control a volume of a speech signal.
[Description of Drawings]
[0014] The accompanying drawings, which are included to provide a further understanding
of the invention and are incorporated in and constitute a part of this specification,
illustrate embodiments of the invention and together with the description serve to
explain the principles of the invention.
[0015] In the drawings:
FIG. 1 is a diagram for a process for playing back an audio signal via TV or the like;
FIG. 2 is a diagram for a process for playing back an audio signal via a TV or the
like in a general mono signal environment or an inverse-phase mono signal environment;
FIG. 3 is a diagram of a mixing model for a speech signal controlling technology;
FIG. 4 is a graph of analysis of a stereo signal using time-frequency tiles;
FIG. 5 is a block diagram of a speech signal control system including an inverse phase
detecting unit according to an embodiment of the present invention;
FIG. 6 is a block diagram of a speech signal control system including an auto SDV
detecting unit according to an embodiment of the present invention;
FIG. 7 is a block diagram of an audio signal processing apparatus due to characteristics
of a detected sound according to an embodiment of the present invention;
FIG. 8 is a block diagram of a speech signal control system including an ICLD detecting
unit according to an embodiment of the present invention;
FIG. 9 is a partial diagram of a remote controller including a remote controller volume
button having an SDV controller for controlling a dialog volume;
FIG. 10 and FIG. 11 are diagrams for a method of notifying dialog volume control information
via OSD (on screen display) of a television receiver; and
FIG. 12 is a block diagram for an example of a digital television system 1200 performing
a dialog amplification technology.
[Best Mode]
[0016] Additional features and advantages of the invention will be set forth in the description
which follows, and in part will be apparent from the description, or may be learned
by practice of the invention. The objectives and other advantages of the invention
will be realized and attained by the structure particularly pointed out in the written
description and claims thereof as well as the appended drawings.
[0017] To achieve these and other advantages and in accordance with the purpose of the present
invention, as embodied and broadly described, a method for processing an audio signal
includes obtaining a stereophonic audio signal including a speech component signal
and other component signals, obtaining gain values for each channel of the audio signal,
determining whether the audio signal is an inverse-phase mono signal including left
and right channel whose phase is inverted, inverting a phase of the obtained gain
value corresponding to the one channel of the audio signal when the audio signal is
an inverse-phase mono signal, modifying the speech component signal based on the inverted
phase of the gain value, and generating a modified audio signal including the modified
speech component signal, wherein the modified audio signal is in-phase mono signal.
[0018] Preferably, the modified audio signal is inverse-phase mono signal.
[0019] Preferably, the determining further includes determining inter-channel correlation
between two channels of the audio signal, comparing one or more threshold values with
the inter-channel correlation, and determining whether the audio signal is an inverse-phase
mono signal based on results of the comparison.
[0020] More preferably, the inter-channel correlation is determined per sub-band. In this
case, the audio signal is an inverse-phase mono signal if a sum of the inter-channel
correlations is smaller than one or more threshold.
[0021] More preferably, the inter-channel correlation is determined per sub-band, and the
audio signal is an inverse-phase mono signal if a sum of the inter-channel correlations
is smaller than one or more threshold.
[0022] Preferably, the determining further includes determining inter-channel correlation
between two channels of the audio signal, comparing one or more threshold values with
the number of the inter-channel correlation which is minus and determining whether
the audio signal is an inverse-phase mono signal based on results of the comparison.
[0023] More preferably, the inter-channel correlation is determined per sub-band, and the
audio signal is an inverse-phase mono signal if the number of the inter-channel correlation
which is minus is larger than one or more threshold.
[0024] To further achieve these and other advantages and in accordance with the purpose
of the present invention, a method for processing an audio signal includes obtaining
a stereophonic audio signal including a speech component signal and other component
signals, determining whether the audio signal is an inverse-phase mono signal including
left and right channel whose phase is inverted, inverting a phase of the one channel
of the audio signal when the audio signal is an inverse-phase mono signal, obtaining
gain values for each channel of the audio signal, modifying the speech component signal
based on the obtained gain values, and generating a modified audio signal including
the modified speech component signal, wherein the modified audio signal is in-phase
mono signal.
[0025] To further achieve these and other advantages and in accordance with the purpose
of the present invention, an apparatus for processing an audio signal includes a gain
obtaining unit obtaining a stereophonic audio signal including a speech component
signal and other component signals, and obtaining gain values for each channel of
the audio signal, an inverse phase detecting unit determining whether the audio signal
is an inverse-phase mono signal including left and right channel whose phase is inverted,
a gain modification unit inverting a phase of the obtained gain value corresponding
to the one channel of the audio signal when the audio signal is an inverse-phase mono
signal, and a signal modification unit modifying the speech component signal based
on the inverted phase of the gain values, and generating a modified audio signal including
the modified speech component signal, wherein the modified audio signal is in-phase
mono signal.
[0026] To further achieve these and other advantages and in accordance with the purpose
of the present invention, an apparatus for processing an audio signal includes a gain
obtaining unit obtaining a stereophonic audio signal including a speech component
signal and other component signals, an inverse phase detecting unit determining whether
the audio signal is an inverse-phase mono signal including left and right channel
whose phase is inverted, and a signal modification unit inverting a phase of the one
channel of the audio channel when the audio signal is an inverse-phase mono signal,
obtaining gain values for each channel of the audio signal, modifying the speech component
signal based on the obtained gain values, and generating a modified audio signal including
the modified speech component signal, wherein the modified audio signal is in-phase
mono signal.
[0027] It is to be understood that both the foregoing general description and the following
detailed description are exemplary and explanatory and are intended to provide further
explanation of the invention as claimed.
[Mode for Invention]
[0028] Reference will now be made in detail to the preferred embodiments of the present
invention, examples of which are illustrated in the accompanying drawings. First of
all, terminologies or words used in this specification and claims are not construed
as limited to the general or dictionary meanings and should be construed as the meanings
and concepts matching the technical idea of the present invention based on the principle
that an inventor is able to appropriately define the concepts of the terminologies
to describe the inventor's invention in best way. The embodiment disclosed in this
disclosure and configurations shown in the accompanying drawings are just one preferred
embodiment and do not represent all technical idea of the present invention. Therefore,
it is understood that the present invention covers the modifications and variations
of this invention provided they come within the scope of the appended claims and their
equivalents at the timing point of filing this application.
[0029] Particularly, 'information' in this disclosure is the terminology that generally
includes values, parameters, coefficients, elements and the like and its meaning can
be construed as different occasionally, by which the present invention is non-limited.
[0030] A speech signal (particularly, dialog component) volume control technology according
to the present invention may relate to an audio signal processing apparatus and method
for modifying a speech signal in an inverse-phase mono signal environment in which
phases of left and right channels are inverted due to error in transmission or intentionally.
First of all, in the following description, an audio signal processing apparatus and
method for modifying a speech signal in a general environment instead of an inverse-phase
mono signal environment will be explained.
[0031] FIG. 1 is a diagram for a process for playing back an audio signal via TV or the
like.
[0032] Referring to FIG. 1, a speech signal C is applied as an equal signal to left and
right speakers and is then delivered to both ears of a listener through a listening
space where the listener is located. In doing so, SDV extracts the speech signal C applied
as the same signal to the left and right channels and then controls a volume of the
extracted speech signal to be heard by a listener clearly or unclearly. In case of
such a mono signal as news, when the SDV extracts the same signal from the left and
right channel signals, a whole signal is extracted. When the SDV controls a speech
signal, and more particularly, when a dialog volume is controlled, it brings an effect
of controlling a whole volume.
[0033] FIG. 2 is a diagram for a process for playing back an audio signal via a TV or the
like in a general mono signal environment or an inverse-phase mono signal environment.
[0034] Referring to FIG. 2, powers and phases of left and right channel signals are equal
in a general mono signal environment. Yet, in order to give a slight stereo effect
to a mono signal environment of a specific broadcast, the left and right channel
signals can be transmitted with their phases inverted relative to each other.
This is called an inverse-phase mono signal environment. The inverse-phase mono
signal environment can arise if a signal intentionally inverted by a broadcasting
station is transmitted, if an erroneous signal attributable to an error in transmission
is transmitted, or if an original signal has this characteristic.
In the inverse-phase mono signal environment, although the left and right channel signals
carry the same signal, since the phases of the left and right signals are inverted,
a general SDV fails to find the common component of the left and right channel signals.
Hence, it is unable to extract any speech component at all.
[0035] FIG. 3 is a block diagram of a mixing model 300 for dialog enhancement techniques.
In the model 300, a listener receives audio signals from left and right channels.
An audio signal s corresponds to localized sound from a direction determined by a
factor a. Independent audio signals n1 and n2 correspond to laterally reflected or
reverberated sound, often referred to as ambient sound or ambience. Stereo signals can
be recorded or mixed such that for a given audio source the source audio signal goes
coherently into the left and right audio signal channels with specific directional cues
(e.g., level difference, time difference), and the laterally reflected or reverberated
independent signals n1 and n2 go into channels determining auditory event width and
listener envelopment cues. The model 300 can be represented mathematically as a
perceptually motivated decomposition of a stereo signal with one audio source capturing
the localization of the audio source and ambience.

[0036] To get a decomposition that is effective in non-stationary scenarios with multiple
concurrently active audio sources, the decomposition of [1] can be carried out independently
in a number of frequency bands and adaptively in time

where i is a subband index and k is a subband time index.
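The relations referred to as [1] and [2] do not appear in this text. As a hedged reconstruction
from the description of the mixing model, assuming that the factor a scales the localized
sound s in the second channel only and that A(i,k) is the subband counterpart of a, they
can be sketched as:

```latex
% [1] time-domain mixing model (reconstruction, not the original figure)
\begin{aligned}
x_1(n) &= s(n) + n_1(n) \\
x_2(n) &= a\,s(n) + n_2(n)
\end{aligned}

% [2] the same decomposition per subband i and subband time index k
\begin{aligned}
X_1(i,k) &= S(i,k) + N_1(i,k) \\
X_2(i,k) &= A(i,k)\,S(i,k) + N_2(i,k)
\end{aligned}
```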
[0037] FIG. 4 is a graph illustrating a decomposition of a stereo signal using time-frequency
tiles. In each time-frequency tile 200 with indices i and k, the signals S, N1, N2
and decomposition gain factor A can be estimated independently. For brevity of notation,
the subband and time indices i and k are ignored in the following description.
[0038] When using a subband decomposition with perceptually motivated subband bandwidths,
the bandwidth of a subband can be chosen to be equal to one critical band. S, N1, N2,
and A can be estimated approximately every t milliseconds (e.g., 20 ms) in each
subband. For low computational complexity, a short-time Fourier transform (STFT)
implemented with a fast Fourier transform (FFT) can be used. Given stereo subband signals
X1 and X2, estimates of S, A, N1, N2 can be determined. A short-time estimate of a power
of X1 can be denoted

[0039] where E{.} is a short-time averaging operation. For other signals, the same convention
can be used, i.e., P_X2, P_S and P_N = P_N1 = P_N2 are the corresponding short-time power
estimates. The power of N1 and N2 is assumed to be the same, i.e., it is assumed that
the amount of lateral independent sound is the same for left and right channels.
[0040] Given the subband representation of the stereo signal, the power (P_X1, P_X2) and
the normalized cross-correlation can be determined. The normalized cross-correlation
between left and right channels is

[0041] A, P_S, P_N can be computed as a function of the estimated P_X1, P_X2 and Φ. Three
equations relating the known and unknown variables are:

[0042] Equations [5] can be solved for A, P_S, and P_N, to yield

with
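The equations of paragraphs [0038] to [0042] do not appear in this text. The following is a
hedged reconstruction, assuming real-valued subband signals following X1 = S + N1 and
X2 = A·S + N2, with S, N1 and N2 mutually independent and N1, N2 of equal power P_N. The
auxiliary symbols B and C are introduced here for compactness and are assumptions of this
sketch.

```latex
% Short-time power and normalized cross-correlation
P_{X_1} = E\{X_1^2\}, \qquad
\Phi = \frac{E\{X_1 X_2\}}{\sqrt{E\{X_1^2\}\,E\{X_2^2\}}}

% [5] three equations relating known (P_{X_1}, P_{X_2}, \Phi) and unknown (A, P_S, P_N)
\begin{aligned}
P_{X_1} &= P_S + P_N \\
P_{X_2} &= A^2 P_S + P_N \\
\Phi    &= \frac{A\,P_S}{\sqrt{P_{X_1} P_{X_2}}}
\end{aligned}

% Solution: C = A P_S, and A is the positive root of C A^2 - (P_{X_2} - P_{X_1}) A - C = 0
A = \frac{B}{2C}, \qquad
P_S = \frac{2C^2}{B}, \qquad
P_N = P_{X_1} - \frac{2C^2}{B}

% with
B = P_{X_2} - P_{X_1} + \sqrt{(P_{X_1} - P_{X_2})^2 + 4 P_{X_1} P_{X_2} \Phi^2}, \qquad
C = \Phi \sqrt{P_{X_1} P_{X_2}}
```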

[0043] Next, the least squares estimates of S, N1, N2 are computed as a function of A, P_S,
and P_N. For each i and k, the signal S can be estimated as

where w1 and w2 are real-valued weights. The estimation error is

[0044] The weights w1 and w2 are optimal in a least squares sense when the error E is
orthogonal to X1 and X2,
i.e.,

yielding two equations

from which the weights are computed,
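The equations of paragraphs [0043] and [0044] do not appear in this text. A hedged
reconstruction, assuming the subband model X1 = S + N1 and X2 = A·S + N2 with mutually
independent S, N1 and N2 and equal ambience power P_N, is:

```latex
% Estimate of S and its estimation error
\hat{S} = w_1 X_1 + w_2 X_2
E = S - \hat{S} = (1 - w_1 - w_2 A)\,S - w_1 N_1 - w_2 N_2

% Orthogonality conditions
E\{E X_1\} = 0, \qquad E\{E X_2\} = 0

% yielding two equations
\begin{aligned}
(1 - w_1 - w_2 A)\,P_S - w_1 P_N &= 0 \\
A\,(1 - w_1 - w_2 A)\,P_S - w_2 P_N &= 0
\end{aligned}

% from which the weights follow
w_1 = \frac{P_S P_N}{(A^2 + 1) P_S P_N + P_N^2}, \qquad
w_2 = \frac{A\,P_S P_N}{(A^2 + 1) P_S P_N + P_N^2}
```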

[0045] The estimate of N1 can be

[0046] The estimation error is

[0047] Again, the weights are computed such that the estimation error is orthogonal to X1
and X2, resulting in

[0048] The weights for computing the least squares estimate of N2,

are
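The closed-form weights of paragraphs [0043] to [0048] do not appear in this text. The sketch
below is a reconstruction rather than the original formulas: it assumes the model
X1 = S + N1, X2 = A·S + N2 with independent S, N1, N2, equal ambience power P_N > 0, and the
estimates Ŝ = w1·X1 + w2·X2, N̂1 = w3·X1 + w4·X2, N̂2 = w5·X1 + w6·X2. The function names are
hypothetical. The second function checks numerically that each estimation error is
orthogonal to X1 and X2.

```python
def least_squares_weights(a, p_s, p_n):
    """Return (w1, ..., w6) for the assumed model; requires p_n > 0."""
    d = (a * a + 1.0) * p_s * p_n + p_n * p_n  # common denominator
    w1 = p_s * p_n / d
    w2 = a * p_s * p_n / d
    w3 = (a * a * p_s * p_n + p_n * p_n) / d
    w4 = -a * p_s * p_n / d
    w5 = -a * p_s * p_n / d
    w6 = (p_s * p_n + p_n * p_n) / d
    return w1, w2, w3, w4, w5, w6


def orthogonality_residuals(a, p_s, p_n):
    """Return E{E*X1}, E{E*X2} for the S, N1 and N2 estimates (all ~0)."""
    w1, w2, w3, w4, w5, w6 = least_squares_weights(a, p_s, p_n)
    # Error of S estimate: (1 - w1 - a*w2) S - w1 N1 - w2 N2
    k_s = (1.0 - w1 - a * w2) * p_s
    res_s = (k_s - w1 * p_n, a * k_s - w2 * p_n)
    # Error of N1 estimate: (-w3 - a*w4) S + (1 - w3) N1 - w4 N2
    k_1 = (-w3 - a * w4) * p_s
    res_1 = (k_1 + (1.0 - w3) * p_n, a * k_1 - w4 * p_n)
    # Error of N2 estimate: (-w5 - a*w6) S - w5 N1 + (1 - w6) N2
    k_2 = (-w5 - a * w6) * p_s
    res_2 = (k_2 - w5 * p_n, a * k_2 + (1.0 - w6) * p_n)
    return res_s + res_1 + res_2
```

Under these assumptions, w2 equals A·w1 and w4 equals w5, so the cross terms of the two
ambience estimates share one value.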

[0049] In some implementations, the least squares estimates can be post-scaled, such that
the power of the estimates equals P_S and P_N = P_N1 = P_N2. The power of Ŝ is

[0050] Thus, for obtaining an estimate of S with power P_S, Ŝ is scaled

with similar reasoning, N̂1 and N̂2 are scaled
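The post-scaling equations do not appear in this text. A hedged reconstruction, assuming
Ŝ = w1·X1 + w2·X2, N̂1 = w3·X1 + w4·X2 and N̂2 = w5·X1 + w6·X2 under the model
X1 = S + N1, X2 = A·S + N2 with independent S, N1, N2 and equal ambience power P_N (the
primed symbols for the scaled estimates are notation assumed for this sketch), is:

```latex
% Power of the unscaled estimate of S
P_{\hat{S}} = (w_1 + A w_2)^2 P_S + (w_1^2 + w_2^2) P_N

% Scaled estimate of S with power P_S
\hat{S}' = \sqrt{\frac{P_S}{P_{\hat{S}}}}\;\hat{S}

% Scaled ambience estimates with power P_N
\hat{N}_1' = \sqrt{\frac{P_N}{(w_3 + A w_4)^2 P_S + (w_3^2 + w_4^2) P_N}}\;\hat{N}_1, \qquad
\hat{N}_2' = \sqrt{\frac{P_N}{(w_5 + A w_6)^2 P_S + (w_5^2 + w_6^2) P_N}}\;\hat{N}_2
```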

[0051] Given the previously described signal decomposition, a signal that is similar to
the original stereo signal can be obtained by applying [2] at each time and for each
subband and converting the subbands back to the time domain.
[0052] For generating the signal with modified dialog gain, the subbands are computed as

where g(i,k) is a gain factor in dB which is computed such that the dialog gain is modified
as desired.
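The subband formula of paragraph [0052] does not appear in this text. The following is a
minimal sketch of one plausible form, assuming the output subbands are Y1 = 10^(g/20)·S + N1
and Y2 = 10^(g/20)·A·S + N2, so that only the dialog component is scaled by the dB gain. The
function name and argument layout are hypothetical.

```python
def modify_dialog_gain(s_hat, n1_hat, n2_hat, a, gain_db):
    """Scale the estimated dialog component by gain_db and re-add the ambience.

    s_hat, n1_hat, n2_hat: per-tile estimates of S, N1, N2 (equal-length lists).
    a: decomposition gain factor A (taken as constant here for simplicity).
    gain_db: dialog gain g in dB; 0 dB leaves the decomposition unchanged.
    """
    g = 10.0 ** (gain_db / 20.0)  # dB to linear amplitude factor
    y1 = [g * s + n1 for s, n1 in zip(s_hat, n1_hat)]
    y2 = [g * a * s + n2 for s, n2 in zip(s_hat, n2_hat)]
    return y1, y2
```

With gain_db set to 0, the outputs reduce to S + N1 and A·S + N2, which is the sense in
which the stereo signal is left unmodified where g(i,k) is 0 dB.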
[0053] These observations imply g(i,k) is set to 0 dB at very low frequencies and above
8 kHz, to potentially modify the stereo signal as little as possible.
[0054] As mentioned in the foregoing description, X1 and X2 indicate left and right input
signals of SDV in Formula 2, respectively. And, Y1 and Y2 indicate left and right output
signals of the SDV in Formula 21, respectively. Yet, in the inverse-phase mono signal
environment where an input has an inverse phase, X2 = -X1 holds for the left and right
input signals of SDV. If this is inserted in the formula and then developed, it becomes
Y1 = X1 and Y2 = X2 [A = 1]. Consequently, if an input has an opposite phase, a general
SDV recognizes the input as a background sound in which no speech signal exists at all
and then outputs the input intact.
[0055] Yet, the inverse-phase mono signal environment is not a situation having no speech
signal at all. Instead, the inverse-phase mono signal environment is generated to
forcibly give a stereo effect or occurs due to an error in the course of transmission.
Hence, the whole signal needs to be recognized as a speech signal and then processed.
[0056] In order to prevent X1 and X2 from being canceled out in generating Y1 and Y2 in
Formula 21, it is necessary to invert a phase of either X1 or X2 or a phase of a gain
value corresponding to either X1 or X2.
[0057] Using the above formulas, the relation between X and Y can be represented as follows.

[0058] In this case,

indicates a gain X1Y1, w2 + w4 indicates a gain X1Y2,

indicates a gain X2Y2, and Aw2 + w4 indicates a gain X2Y1.
[0059] In Formula 22, the speech signal is canceled out because the components passed
through the gains X1Y2 and X2Y1, whose phase is inverted, are added to the components
of the original phase. Hence, it is possible to output a non-canceled speech signal
by inverting a phase of either X1 or X2 or a phase of a gain.
[0060] The present invention relates to a method of independently controlling a speech signal
in an input signal having an inverted phase by inverting a phase of a gain, by which
the present invention is non-limited. In an inverse-phase mono signal environment, if
phases of the gains X1Y2 and X2Y1 are inverted, Y1 and Y2 can be outputted while phases
of X1 and X2 are maintained. Namely, a speech signal can be outputted by being controlled
(e.g., a dialog volume is controlled) while an inverse-phase mono signal environment is
maintained. On the other hand, if phases of the gains X2Y1 and X2Y2 are inverted, Y1 and
Y2 are outputted as a general mono environment signal having the same phase as the input
X1 instead of the inverse-phase mono signal environment. If phases of the gains X1Y1 and
X1Y2 are inverted, Y1 and Y2 are outputted as a general mono environment signal having
the same phase as the input X2.
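The three cases above can be checked with a small numerical sketch. It reads a gain XiYj as
the gain from input Xi to output Yj, and it assumes hypothetical mono-case gain values
(direct gains (G + 1)/2 and cross gains (G - 1)/2, for a linear dialog gain G) chosen so
that an in-phase mono input is simply scaled by G. These values and all names are
assumptions for illustration, not values given in this description.

```python
def mono_gains(g):
    """Hypothetical gains for a mono input: keys are (input, output) indices."""
    direct = (g + 1.0) / 2.0
    cross = (g - 1.0) / 2.0
    return {(1, 1): direct, (2, 2): direct, (1, 2): cross, (2, 1): cross}


def invert(gains, keys):
    """Return a copy of gains with the sign (phase) of the selected gains flipped."""
    return {k: (-v if k in keys else v) for k, v in gains.items()}


def apply_gains(x1, x2, gains):
    """Y1 and Y2 as sums of both inputs weighted by their gains."""
    y1 = gains[(1, 1)] * x1 + gains[(2, 1)] * x2
    y2 = gains[(1, 2)] * x1 + gains[(2, 2)] * x2
    return y1, y2


G = 2.0               # linear dialog gain requested by the user
x1, x2 = 0.5, -0.5    # one sample of an inverse-phase mono input, X2 = -X1
base = mono_gains(G)

unmodified = apply_gains(x1, x2, base)                                # input passes intact
keep_inverse = apply_gains(x1, x2, invert(base, {(1, 2), (2, 1)}))    # scaled, still inverse-phase
in_phase_x1 = apply_gains(x1, x2, invert(base, {(2, 1), (2, 2)}))     # scaled, phase of X1
in_phase_x2 = apply_gains(x1, x2, invert(base, {(1, 1), (1, 2)}))     # scaled, phase of X2
```

Without any inversion the cross terms cancel the requested gain, which matches the
observation that a general SDV outputs an inverse-phase input intact.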
[0061] FIG. 5 is a block diagram of a speech signal control system including an inverse
phase detecting unit according to an embodiment of the present invention.
[0062] Referring to FIG. 5, a speech signal is estimated by a speech signal estimation unit
520 using an input signal. A prescribed gain (e.g., a gain set by a user) is applicable
to the estimated speech signal. Subsequently, a gain of an output signal is obtained
by a gain obtaining unit 540. Meanwhile, it is determined whether an input signal
is an inverse-phase mono signal through an inverse phase detecting unit 520. A sign
or value of the gain obtained by the gain obtaining unit 540 is modified by a gain
modification unit 550. Thus, the speech signal can be modified. For clarity and convenience
of description of the present invention, a method of estimating or controlling a speech
signal on a whole band of an input audio signal is explained, by which the present
invention is non-limited. Namely, according to a prescribed embodiment, the system
500 includes an analysis filterbank, a power estimator, a signal estimator, a post
scaling module, a signal synthesis module and a synthesis filterbank. Hence, it may
be more efficient if an input audio signal is divided into a plurality of subbands and
a speech signal is then estimated per subband by a speech signal estimator (not shown
in the drawing). The elements of the speech signal control system 500 can exist as
separate processes. And, processes of two or more elements can be combined
into one element.
[0063] The present invention needs to determine whether an input signal environment is an
inverse-phase mono signal environment through the inverse phase detecting unit 520.
According to a prescribed embodiment, the inverse phase detecting unit 520 checks
the inter-channel correlation of an input signal frame per subband. If the sum of the
per-subband correlations fails to reach a threshold value, the corresponding frame is
regarded as an inverse-phase mono signal frame. Alternatively, the inverse phase
detecting unit 520 checks the inter-channel correlation of an input signal frame per
subband. If the number of subbands whose correlation is negative is greater than a
threshold value, the corresponding frame can be regarded as an inverse-phase mono signal
frame. Furthermore, the above methods can be used together.
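The two detection rules can be sketched as follows. This is a minimal illustration under
stated assumptions: the normalized correlation is computed from real-valued subband
samples, the threshold values are hypothetical parameters, and when both rules are enabled
they are combined with a logical AND, which is only one possible way of using them
together.

```python
import math


def subband_correlation(left, right):
    """Normalized inter-channel correlation of one subband, in [-1, 1]."""
    num = sum(l * r for l, r in zip(left, right))
    den = math.sqrt(sum(l * l for l in left) * sum(r * r for r in right))
    return num / den if den > 0.0 else 0.0


def is_inverse_phase_frame(correlations, sum_threshold=None, count_threshold=None):
    """Classify a frame from its per-subband correlations.

    sum_threshold: frame is inverse-phase if the sum of correlations is below it.
    count_threshold: frame is inverse-phase if more subbands than this are negative.
    If both are given, both rules must agree (an assumption of this sketch).
    """
    checks = []
    if sum_threshold is not None:
        checks.append(sum(correlations) < sum_threshold)
    if count_threshold is not None:
        negatives = sum(1 for c in correlations if c < 0.0)
        checks.append(negatives > count_threshold)
    return bool(checks) and all(checks)
```

For example, a frame whose right channel is the negated left channel gives a correlation
of -1 in every subband, so both rules classify it as an inverse-phase mono signal frame.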
[0064] FIG. 6 is a block diagram of a speech signal control system including an auto SDV
detecting unit according to an embodiment of the present invention. If a dialog
of an audio signal is considerably greater than a noise component of the audio signal
or an outside noise, necessity of SDV is reduced. Hence, it is possible to determine a
method of SDV operation by automatically determining necessity of the SDV operation.
Referring to FIG. 6, the speech signal control system includes an auto SDV detecting
unit 610 and an SDV processing unit 620. It is possible to vary a presence or non-presence
of the SDV operation and an extent of gain by automatically determining the necessity
of the SDV operation via the auto SDV detecting unit 610. In particular, a speech
signal is estimated by a speech signal estimation unit 630. A gain of an output signal
is obtained by a gain obtaining unit 640. And, a gain modification unit 650 changes
a sign of a gain or modifies a value of the gain as determined by the auto SDV detecting
unit 610. And, a signal modification unit 660 can modify the speech signal based on
the modified gain.
[0065] According to a prescribed embodiment, first of all, the auto SDV detecting unit 610
determines to perform the SDV operation only if a power P_C of a dialog component signal
is smaller than a power P_n of a noise component within a signal or a power P_s of an
outside noise (the comparison can be limited to a specific ratio). Secondly, the auto SDV
detecting unit 610 is able to determine whether to perform the SDV operation by attaching
such a device for measuring an outside noise as a microphone and the like to an outside
of an application provided with an SDV device and then measuring an extent of an outside
noise obtained through this device. Optionally, the auto SDV detecting unit 610 can use
both of the above methods together.
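The power-based decision can be sketched as below. This is a minimal illustration, not the
claimed method: the function name, the single ratio parameter applied to both comparisons,
and the treatment of a missing outside-noise measurement are assumptions.

```python
def should_activate_sdv(p_dialog, p_signal_noise, p_outside_noise=None, ratio=1.0):
    """Decide whether the SDV operation is needed for the current signal.

    p_dialog: power P_C of the dialog component.
    p_signal_noise: power P_n of the noise component within the signal.
    p_outside_noise: power P_s of the outside noise (e.g., from a microphone),
        or None if no outside measurement is available.
    ratio: hypothetical scaling of the noise power used in the comparison.
    """
    if p_dialog < ratio * p_signal_noise:
        return True
    if p_outside_noise is not None and p_dialog < ratio * p_outside_noise:
        return True
    return False
```

When the function returns False, the input can be passed through intact, which corresponds
to the case where the dialog is already considerably stronger than the noise.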
[0066] By determining a presence or non-presence of the SDV operation according to the above
method, either the SDV is activated according to an input signal or a noise extent of an
outside environment, or an input can be outputted intact. According to an input signal
or a value of noise of an outside environment, it is possible to vary a value of a gain
for a dialog component of an audio signal. An auto SDV method with reference to a
power according to an embodiment of the present invention is explained, by which the
present invention is non-limited. And, the present invention is able to take other
formulas and parameters including absolute values and the like into consideration.
[0067] FIG. 7 is a block diagram of an audio signal processing apparatus due to characteristics
of a detected sound according to an embodiment of the present invention.
[0068] Referring to FIG. 7, independent sound quality reinforcing methods are applicable
to a dialog, directional sound and surround sound, which are detected using an SDV
process unit 710, respectively. In particular, a signal processing can be differently
performed according to a characteristic of a detected sound. For instance, it is able
to perform equalization for sound quality reinforcement or sound color change per
signal, watermark and other signal processes using a sound discriminated after SDV
as an input. In case of a dialog, such a signal process as voice cancellation for
commercial and other usages can be performed. In case of a directional sound, such
a signal process as sound widening for surround effect enhancement can be performed.
In case of a surround sound, such a signal process as 3D sound effect enhancement
can be performed. Meanwhile, by obtaining a characteristic of a signal inputted from
the SDV process unit 710, it is possible to discriminate a dialog or a directional sound
through a frequency, an imaged position or the like. And, the dialog is mostly located
at a center due to its characteristics and its position is not changed. In particular,
in case that an inter-channel level difference (ICLD) varies little, it is highly possible
that an input signal is a dialog.
[0069] FIG. 8 is a block diagram of a speech signal control system including an ICLD detecting
unit according to an embodiment of the present invention.
[0070] Referring to FIG. 8, an SDV process unit 820 calculates an ICLD per band for an input
signal frame and then delivers the information to an ICLD variation detecting unit
810. The ICLD variation detecting unit 810 then compares the delivered per-band ICLD
information of a current frame to per-band ICLD information of a preceding frame. If
there is no variation of the ICLD or only a small variation of the ICLD exists (so that
the frame is determined as a dialog), classification of the input signal frame is handed
over to the SDV process unit 820. If the ICLD variation is large, the ICLD variation
detecting unit 810 determines that the input signal frame is not a dialog even if the SDV
process unit 820 determines that the input signal frame is a dialog, and this information
can then be used for the gain control.
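A minimal sketch of the frame-to-frame ICLD comparison follows. It assumes that the ICLD of
a band is the level ratio of the left and right band powers in dB, and that the variation
is summarized as the mean absolute change over bands compared against a hypothetical
threshold. The function names, the epsilon guard and the 3 dB default are assumptions for
illustration.

```python
import math


def icld_db(p_left, p_right, eps=1e-12):
    """Inter-channel level difference of one band in dB (eps avoids log of zero)."""
    return 10.0 * math.log10((p_left + eps) / (p_right + eps))


def frame_icld(left_powers, right_powers):
    """Per-band ICLD values for one frame."""
    return [icld_db(l, r) for l, r in zip(left_powers, right_powers)]


def dialog_allowed(curr_icld, prev_icld, max_mean_change_db=3.0):
    """Return True if the ICLD variation is small enough for a dialog frame.

    True: the classification is left to the SDV process unit.
    False: the frame is treated as non-dialog regardless of the SDV decision.
    """
    changes = [abs(c - p) for c, p in zip(curr_icld, prev_icld)]
    return sum(changes) / len(changes) <= max_mean_change_db
```

A centered dialog keeps the per-band ICLD close to 0 dB from frame to frame, so its
variation stays under the threshold, whereas a sound that moves between channels does not.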
[0071] FIG. 9 is a partial diagram of a remote controller including a remote controller
volume button having an SDV controller for controlling a dialog volume.
[0072] Referring to FIG. 9, a main volume control button 910 for increasing or decreasing
a main volume (e.g., a volume of a whole signal) is located top to bottom. And, a
speech signal volume control button 920 for increasing or decreasing a volume of such
a specific audio signal as a speech signal computed via a speech signal estimation
unit can be located right to left. The remote controller volume button is one embodiment
of a device for controlling a speech signal volume, by which the present invention
is non-limited.
[0073] FIG. 10 and FIG. 11 are diagrams for a method of notifying dialog volume control
information via OSD (on screen display) of a television receiver.
[0074] Referring to FIG. 10, a length of a volume bar indicates a main volume, while a width
of the volume bar indicates a level of a dialog volume. In particular, if the length
of the volume bar increases more, it may indicate that a level of the main volume
is raised higher. If the width of the volume bar increases more, it may mean that
a level of the dialog volume is raised higher.
[0075] Referring to FIG. 11, a dialog volume level can be represented using a color of a
volume bar instead of a width of the volume bar. In particular, if a density of color
of a volume bar increases, it may mean that a level of a dialog volume is raised.
[0076] FIG. 12 is a block diagram of an example digital television system 1200 for implementing
the features and processes described in reference to FIGS. 1 to 11. Digital television (DTV)
is a telecommunication system for broadcasting and receiving moving pictures and sound
by means of digital signals. DTV uses digitally modulated data, which is digitally
compressed and requires decoding by a specially designed television set, or a standard
receiver with a set-top box, or a PC fitted with a television card. Although the system
in FIG. 12 is a DTV system, the disclosed implementations for dialog enhancement can
also be applied to analog TV systems or any other systems capable of dialog enhancement.
[0077] In some implementations, the system 1200 can include an interface 1202, a demodulator
1204, a decoder 1206, an audio/visual output 1208, a user input interface 1210, one
or more processors 1212 and one or more computer readable mediums 1214 (e.g., RAM,
ROM, SDRAM, hard disk, optical disk, flash memory, SAN, etc.). Each of these components
is coupled to one or more communication channels 1216 (e.g., buses). In some implementations,
the interface 1202 includes various circuits for obtaining an audio signal or a combined
audio/video signal. For example, in an analog television system an interface can include
antenna electronics, a tuner or mixer, a radio frequency (RF) amplifier, a local oscillator,
an intermediate frequency (IF) amplifier, one or more filters, a demodulator, an audio
amplifier, etc. Other implementations of the system 1200 are possible, including implementations
with more or fewer components.
[0078] The interface 1202 can be a DTV tuner for receiving a digital television signal including
video and audio content. The demodulator 1204 extracts video and audio signals from
the digital television signal. If the video and audio signals are encoded (e.g., MPEG
encoded), the decoder 1206 decodes those signals. The A/V output 1208 can be any device
capable of displaying video and playing audio (e.g., TV display, computer monitor, LCD,
speakers, audio systems).
[0079] In some implementations, dialog volume levels can be displayed to the user using
a display device on a remote controller or an On Screen Display (OSD), for example,
and the user input interface can include circuitry (e.g., a wireless or infrared receiver)
and/or software for receiving and decoding infrared or wireless signals generated
by a remote controller. A remote controller can include a separate dialog volume control
key or button, or a master volume control button and dialog volume control button
described in reference to FIGS. 10-11.
[0080] In some implementations, the one or more processors can execute code stored in the
computer-readable medium 1214 to implement the features and operations 1218, 1220,
1222, 1226, 1228, 1230 and 1232.
[0081] The computer-readable medium further includes an operating system 1218, analysis/synthesis
filterbanks 1220, a power estimator 1222, a signal estimator 1224, a post-scaling
module 1226 and a signal synthesizer 1228.
[0082] While the present invention has been described and illustrated herein with reference
to the preferred embodiments thereof, it will be apparent to those skilled in the
art that various modifications and variations can be made therein without departing
from the spirit and scope of the invention. Thus, it is intended that the present
invention covers the modifications and variations of this invention that come within
the scope of the appended claims and their equivalents.
[Industrial Applicability]
[0083] Accordingly, the present invention is applicable to encoding/decoding of audio signals.
[Claims]
1. A method for processing an audio signal, comprising:
obtaining a stereophonic audio signal including a speech component signal and other
component signals;
obtaining gain values for each channel of the audio signal;
determining whether the audio signal is an inverse-phase mono signal including left
and right channels whose phases are inverted;
inverting a phase of the obtained gain value corresponding to one channel of the
audio signal when the audio signal is an inverse-phase mono signal;
modifying the speech component signal based on the inverted phase of the gain value;
and
generating a modified audio signal including the modified speech component signal,
wherein the modified audio signal is an in-phase mono signal.
2. The method of claim 1, wherein the modified audio signal is an inverse-phase mono signal.
3. The method of claim 1 or 2, wherein the determining further comprises:
determining inter-channel correlation between two channels of the audio signal;
comparing one or more threshold values with the inter-channel correlation; and
determining whether the audio signal is an inverse-phase mono signal based on results
of the comparison.
4. The method of claim 3, wherein the inter-channel correlation is determined per sub-band,
and the audio signal is an inverse-phase mono signal if a sum of the inter-channel
correlations is smaller than one or more thresholds.
5. The method of claim 1 or 2, wherein the determining further comprises:
determining inter-channel correlation between two channels of the audio signal;
comparing one or more threshold values with the number of inter-channel correlations
that are negative; and
determining whether the audio signal is an inverse-phase mono signal based on results
of the comparison.
6. The method of claim 5, wherein the inter-channel correlation is determined per sub-band,
and the audio signal is an inverse-phase mono signal if the number of negative inter-channel
correlations is larger than one or more thresholds.
7. A method for processing an audio signal, the method comprising:
obtaining a stereophonic audio signal including a speech component signal and other
component signals;
determining whether the audio signal is an inverse-phase mono signal including left
and right channels whose phases are inverted;
inverting a phase of one channel of the audio signal when the audio signal is
an inverse-phase mono signal;
obtaining gain values for each channel of the audio signal;
modifying the speech component signal based on the obtained gain values; and
generating a modified audio signal including the modified speech component signal,
wherein the modified audio signal is an in-phase mono signal.
8. The method of claim 7, wherein the determining further comprises:
determining inter-channel correlation between two channels of the audio signal;
comparing one or more threshold values with the inter-channel correlation; and
determining whether the audio signal is an inverse-phase mono signal based on results
of the comparison.
9. The method of claim 8, wherein the inter-channel correlation is determined per sub-band,
and the audio signal is an inverse-phase mono signal if a sum of the inter-channel
correlations is smaller than one or more thresholds.
10. The method of claim 7, wherein the determining further comprises:
determining inter-channel correlation between two channels of the audio signal;
comparing one or more threshold values with the number of inter-channel correlations
that are negative; and
determining whether the audio signal is an inverse-phase mono signal based on results
of the comparison.
11. The method of claim 10, wherein the inter-channel correlation is determined per sub-band,
and the audio signal is an inverse-phase mono signal if the number of negative inter-channel
correlations is larger than one or more thresholds.
12. An apparatus for processing an audio signal, the apparatus comprising:
a gain obtaining unit obtaining a stereophonic audio signal including a speech component
signal and other component signals, and obtaining gain values for each channel of
the audio signal;
an inverse phase detecting unit determining whether the audio signal is an inverse-phase
mono signal including left and right channels whose phases are inverted;
a gain modification unit inverting a phase of the obtained gain value corresponding
to one channel of the audio signal when the audio signal is an inverse-phase mono
signal; and
a signal modification unit modifying the speech component signal based on the inverted
phase of the gain values, and generating a modified audio signal including the modified
speech component signal,
wherein the modified audio signal is an in-phase mono signal.
13. An apparatus for processing an audio signal, the apparatus comprising:
a gain obtaining unit obtaining a stereophonic audio signal including a speech component
signal and other component signals;
an inverse phase detecting unit determining whether the audio signal is an inverse-phase
mono signal including left and right channels whose phases are inverted; and
a signal modification unit inverting a phase of one channel of the audio signal
when the audio signal is an inverse-phase mono signal, obtaining gain values for each
channel of the audio signal, modifying the speech component signal based on the obtained
gain values, and generating a modified audio signal including the modified speech
component signal,
wherein the modified audio signal is an in-phase mono signal.
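
For illustration only, the detection recited in claims 3 to 6 and the channel inversion recited in claim 7 can be sketched as follows. This sketch forms no part of the claims. The normalized cross-correlation formula, the default threshold values, the function names and the representation of each channel as a list of sub-band sample lists are all assumptions; the claims do not fix any of them.

```python
import math


def band_correlation(left, right):
    """Normalized inter-channel correlation of one sub-band (assumed formula)."""
    cross = sum(l * r for l, r in zip(left, right))
    energy = math.sqrt(sum(l * l for l in left) * sum(r * r for r in right))
    # A silent sub-band carries no phase information; treat it as uncorrelated.
    return cross / energy if energy > 0.0 else 0.0


def detect_by_sum(left_bands, right_bands, threshold=0.0):
    """Sum criterion: the summed sub-band correlations fall below a threshold."""
    corrs = [band_correlation(l, r) for l, r in zip(left_bands, right_bands)]
    return sum(corrs) < threshold


def detect_by_count(left_bands, right_bands, threshold=0):
    """Count criterion: the number of negative sub-band correlations exceeds a threshold."""
    corrs = [band_correlation(l, r) for l, r in zip(left_bands, right_bands)]
    negatives = sum(1 for c in corrs if c < 0.0)
    return negatives > threshold


def invert_one_channel(bands):
    """Invert the phase of one channel by negating every sub-band sample."""
    return [[-x for x in band] for band in bands]
```

In this sketch, a right channel that is the exact negative of the left channel is flagged by both criteria, and applying `invert_one_channel` to it restores an in-phase pair. Inverting the sign of the gain value for that channel, as in claim 1, would play the corresponding role when the gains are applied rather than the samples.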