[Technical Field]
[0001] Apparatuses and methods consistent with exemplary embodiments relate to audio encoding/decoding,
and more particularly, to an audio signal processing method capable of minimizing
deterioration of sound quality when a multi-channel audio signal is restored, an audio
encoding apparatus, an audio decoding apparatus, and a terminal adopting the same.
[Background Art]
[0002] Recently, along with the spread of multimedia content, user demand for a more
realistic and rich sound source environment has increased. To satisfy this demand,
research on multi-channel audio has been actively conducted.
[0003] A multi-channel audio signal requires a highly efficient data compression ratio according
to the transmission environment. Specifically, a spatial parameter is used to restore
the multi-channel audio signal. In the process of extracting the spatial parameter,
distortion may occur due to the effect of a reverberation signal, and deterioration
of sound quality may then occur when the multi-channel audio signal is restored.
[0004] Therefore, a multi-channel audio codec technique is required that is capable of reducing
or removing the deterioration of sound quality which may occur when a multi-channel audio
signal is restored using a spatial parameter.
[Disclosure]
[Technical Problem]
[0005] Aspects of one or more exemplary embodiments provide an audio signal processing method
capable of minimizing deterioration of sound quality when a multi-channel audio signal
is restored, an audio encoding apparatus, an audio decoding apparatus, and a terminal
adopting the same.
[Technical Solution]
[0006] According to an aspect of one or more exemplary embodiments, there is provided an
audio signal processing method including: when a first plurality of input channels
are down-mixed to a second plurality of output channels, comparing locations of the
first plurality of input channels with locations of the second plurality of output
channels; down-mixing channels of the first plurality of input channels, which have
the same locations as those of the second plurality of output channels, to channels
at the same locations among the second plurality of output channels; searching for
at least one adjacent channel for each of the remaining channels among the first plurality
of input channels; determining a weighting factor for the searched adjacent channel
in consideration of at least one of a distance between channels, a correlation between
signals, and an error during restoration; and down-mixing each of the remaining channels
among the first plurality of input channels to the adjacent channel based on the determined
weighting factor.
[Description of Drawings]
[0007]
FIG. 1 is a block diagram of an audio signal processing system according to an exemplary
embodiment;
FIG. 2 is a block diagram of an audio encoding apparatus according to an exemplary
embodiment;
FIG. 3 is a block diagram of an audio decoding apparatus according to an exemplary
embodiment;
FIG. 4 illustrates channel matching between a 10.2-channel audio signal and a 5.1-channel
audio signal, according to an exemplary embodiment;
FIG. 5 is a flowchart of a down-mixing method according to an exemplary embodiment;
FIG. 6 is a flowchart of an up-mixing method according to an exemplary embodiment;
FIG. 7 is a block diagram of a spatial parameter encoding apparatus according to an
exemplary embodiment;
FIGS. 8A and 8B illustrate variable quantization steps according to energy values
in frequency bands of each frame for a down-mixed channel;
FIG. 9 is a graph showing per-frequency-band energy distribution of spectral data
for all channels;
FIGS. 10A to 10C are graphs showing a total bitrate adjusted by changing a threshold
frequency;
FIG. 11 is a flowchart of a method of generating a spatial parameter, according to
an exemplary embodiment;
FIG. 12 is a flowchart of a method of generating a spatial parameter, according to
another exemplary embodiment;
FIG. 13 is a flowchart of an audio signal processing method according to an exemplary
embodiment;
FIGS. 14A to 14C show an example for describing operation 1110 of FIG. 11 or operation
1330 of FIG. 13;
FIG. 15 shows another example for describing operation 1110 of FIG. 11 or operation
1330 of FIG. 13;
FIGS. 16A to 16D show another example for describing operation 1110 of FIG. 11 or
operation 1330 of FIG. 13;
FIG. 17 is a graph showing a total sum of angle parameters;
FIG. 18 is a diagram for describing calculation of angle parameters, according to an exemplary
embodiment;
FIG. 19 is a block diagram of an audio signal processing system integrating a multi-channel
codec and a core codec, according to an exemplary embodiment;
FIG. 20 is a block diagram of an audio encoding apparatus according to an exemplary
embodiment; and
FIG. 21 is a block diagram of an audio decoding apparatus according to an exemplary
embodiment.
[Mode for Invention]
[0008] The present invention may be subject to various kinds of changes and modifications and
various changes in form, and specific embodiments are illustrated in the drawings and described
in detail in the specification. However, it should be understood that the specific
embodiments do not limit the present invention to a specific form of disclosure but include
every modification, equivalent, or replacement within the spirit and technical scope
of the present invention. In the following description, well-known functions or constructions
are not described in detail so as not to obscure the invention with unnecessary detail.
[0009] Although terms such as 'first' and 'second' can be used to describe various elements,
the elements are not limited by these terms. These terms are used only to distinguish
one element from another.
[0010] The terminology used in the present invention is used only to describe specific embodiments
and is not intended to limit the present invention. Although the terms used in the present
invention are selected, as far as possible, from general terms currently in wide use while
taking the functions of the present invention into account, they may vary
according to the intention of one of ordinary skill in the art, judicial precedents,
or the appearance of new technology. In addition, in specific cases, terms intentionally
selected by the applicant may be used, and in this case, the meaning of the terms
will be disclosed in the corresponding description of the invention. Accordingly, the
terms used in the present invention should be defined not by their simple names
but by their meaning and the contents throughout the present invention.
[0011] The singular forms are intended to include the plural forms as well, unless the context
clearly indicates otherwise. In the present invention, it should be understood that
terms such as 'include' and 'have' are used to indicate the existence of a disclosed
feature, number, step, operation, element, part, or combination thereof, without
excluding in advance the possibility of the existence or addition of one or more other
features, numbers, steps, operations, elements, parts, or combinations thereof.
[0012] The present invention will now be described more fully with reference to the accompanying
drawings, in which exemplary embodiments of the present invention are shown. Like
reference numerals in the drawings denote like elements, and thus their repetitive
description will be omitted.
[0013] FIG. 1 is a block diagram of an audio signal processing system 100 according to an
exemplary embodiment. The audio signal processing system 100 corresponds to a multimedia
device and may include a dedicated voice communication terminal, such as a telephone
or a mobile phone, a dedicated broadcasting or music terminal, such as a TV or an
MP3 player, or a hybrid of the two types of terminal, but is not limited
thereto. The audio signal processing system 100 may be used as a client, a server,
or a transducer disposed between a client and a server.
[0014] Referring to FIG. 1, the audio signal processing system 100 includes an encoding
apparatus 110 and a decoding apparatus 120. According to an exemplary embodiment,
the audio signal processing system 100 may include both the encoding apparatus 110
and the decoding apparatus 120, and according to another exemplary embodiment, the
audio signal processing system 100 may include any one of the encoding apparatus 110
and the decoding apparatus 120.
[0015] The encoding apparatus 110 receives an original signal formed with a plurality of
channels, i.e., a multi-channel audio signal, and generates a down-mixed audio signal
by down-mixing the original signal. The encoding apparatus 110 generates a prediction
parameter and encodes the prediction parameter. The prediction parameter is applied
to restore the original signal from the down-mixed audio signal. In detail, the prediction
parameter is a value associated with a down-mix matrix used for down-mixing the original
signal, each coefficient value included in the down-mix matrix, and the like. For
example, the prediction parameter may include a spatial parameter. The prediction
parameter may vary according to a product specification, a design specification, and
the like of the encoding apparatus 110 or the decoding apparatus 120 and may be set
as an experimentally optimized value. Herein, a channel may indicate a speaker.
[0016] The decoding apparatus 120 generates a restored signal corresponding to the original
signal, i.e., the multi-channel audio signal, by up-mixing the down-mixed audio signal
using the prediction parameter.
[0017] FIG. 2 is a block diagram of an audio encoding apparatus 200 according to an exemplary
embodiment.
[0018] Referring to FIG. 2, the audio encoding apparatus 200 may include a down-mixing unit
210, a side information generation unit 220, and an encoding unit 230. The components
may be integrated as at least one module and implemented as at least one processor
(not shown).
[0019] The down-mixing unit 210 receives an N-channel audio signal and down-mixes the received
N-channel audio signal. The down-mixing unit 210 may generate a mono-channel audio
signal or an M-channel audio signal by down-mixing the N-channel audio signal (M<N).
For example, the down-mixing unit 210 may generate a three-channel audio signal or
six-channel audio signal by down-mixing a 10.2-channel audio signal so as to correspond
to a 2.1-channel audio signal or a 5.1-channel audio signal.
[0020] According to an exemplary embodiment, the down-mixing unit 210 generates a first
mono channel by selecting and down-mixing two of the N channels and generates a second
mono channel by down-mixing the generated first mono channel and another channel.
A final mono-channel audio signal or the M-channel audio signal may be generated by
repeating a process of down-mixing a mono channel generated as a down-mixing result
and another channel.
[0021] To down-mix the N-channel audio signal while minimizing entropy, it is preferable
that similar channels be down-mixed together. Therefore, the down-mixing unit 210 may down-mix
a multi-channel audio signal at a relatively high compression ratio by down-mixing
channels having a high correlation with each other.
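By way of illustration only, the sequential, correlation-driven down-mixing of paragraphs [0020] and [0021] might be sketched as follows in Python. This is a minimal sketch assuming equal-length time-domain channel signals; all function and field names are illustrative rather than part of the embodiments:

```python
import numpy as np

def downmix_to_mono(channels):
    """Sequentially down-mix N channels to one mono channel, pairing the two
    most correlated channels at each step and recording side information."""
    chans = [np.asarray(c, dtype=float) for c in channels]
    side_info = []  # one entry per down-mixing step (N-1 steps for N channels)
    while len(chans) > 1:
        # find the pair of channels with the highest normalized correlation
        best_pair, best_corr = (0, 1), -np.inf
        for i in range(len(chans)):
            for j in range(i + 1, len(chans)):
                denom = np.linalg.norm(chans[i]) * np.linalg.norm(chans[j])
                corr = float(np.dot(chans[i], chans[j]) / denom) if denom else 0.0
                if corr > best_corr:
                    best_pair, best_corr = (i, j), corr
        i, j = best_pair
        a, b = chans[i], chans[j]
        # intensity information needed later to separate the two channels
        intensity = float(np.linalg.norm(a) /
                          (np.linalg.norm(a) + np.linalg.norm(b) + 1e-12))
        side_info.append({"pair": best_pair, "intensity": intensity,
                          "correlation": best_corr})
        chans = [c for k, c in enumerate(chans) if k not in best_pair] + [a + b]
    return chans[0], side_info
```

Each step records which channels were paired and the intensity information; the phase information and the per-frequency-band refinement described in paragraphs [0022] to [0024] below are omitted for brevity.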
[0022] The side information generation unit 220 generates side information required to restore
multiple channels from a down-mixed channel. Every time the down-mixing unit 210 sequentially
down-mixes multiple channels, the side information generation unit 220 generates side
information required to restore the multiple channels from the down-mixed channel.
At this time, the side information generation unit 220 may generate information for
determining intensities of two channels to be down-mixed and information for determining
phases of the two channels.
[0023] In addition, every time down-mixing is performed, the side information generation
unit 220 generates information indicating which channels have been down-mixed. When
channels are down-mixed in an order based on a correlation calculation instead of
a fixed order, the side information generation unit 220 may generate a down-mixing
order of the channels as the side information.
[0024] The side information generation unit 220 repeats generation of information required
to restore channels down-mixed to a mono channel every time down-mixing is performed.
For example, if a mono channel is generated by sequentially down-mixing 12 channels
11 times, information on a down-mixing order, information for determining intensities
of channels, and information for determining phases of the channels are generated
11 times. According to an exemplary embodiment, when information for determining intensities
of channels and information for determining phases of the channels are generated for
each of a plurality of frequency bands, if the number of frequency bands is k, 11*k
pieces of information for determining intensities of channels may be generated, and
11*k pieces of information for determining phases of channels may be generated.
[0025] The encoding unit 230 may encode the mono-channel audio signal or the M-channel audio
signal down-mixed and generated by the down-mixing unit 210. If the audio signal output
from the down-mixing unit 210 is an analog signal, the analog signal is converted
into a digital signal, and symbols are encoded according to a predetermined algorithm.
The encoding algorithm is not limited, and any algorithm for generating a bitstream
by encoding an audio signal may be used in the encoding unit 230. In addition, the
encoding unit 230 may encode the side information generated to restore a multi-channel
audio signal from a mono-channel audio signal by the side information generation unit
220.
[0026] FIG. 3 is a block diagram of an audio decoding apparatus 300 according to an exemplary
embodiment.
[0027] Referring to FIG. 3, the audio decoding apparatus 300 may include an extraction unit
310, a decoding unit 320, and an up-mixing unit 330. The components may be integrated
as at least one module and implemented as at least one processor (not shown).
[0028] The extraction unit 310 extracts encoded audio and encoded side information from
received audio data, i.e., a bitstream. The encoded audio may be generated by down-mixing
N channels to a mono channel or M channels (M<N) and encoding an audio signal according
to a predetermined algorithm.
[0029] The decoding unit 320 decodes the encoded audio and the encoded side information
extracted by the extraction unit 310. In this case, the decoding unit 320 decodes
the encoded audio and the encoded side information by using the same algorithm as
used for encoding. As a result of the audio decoding, a mono-channel audio signal
or an M-channel audio signal is restored.
[0030] The up-mixing unit 330 restores an N-channel audio signal before down-mixing by up-mixing
the audio signal decoded by the decoding unit 320. At this time, the up-mixing unit
330 restores the N-channel audio signal based on the side information decoded by the
decoding unit 320.
[0031] That is, the up-mixing unit 330 up-mixes a down-mixed audio signal to a multi-channel
audio signal by reversely performing a down-mixing process with reference to side
information that is a spatial parameter. At this time, channels are sequentially separated
from a mono channel by referring to the side information in which information on a
down-mixing order of the channels is included. The channels may be sequentially separated
from the mono channel by determining intensities and phases of channels, which have
been down-mixed, according to information for determining the intensities and phases
of the channels, which have been down-mixed.
[0032] FIG. 4 illustrates channel matching between a 10.2-channel audio signal 410 and a
5.1-channel audio signal 420, according to an exemplary embodiment.
[0033] When an input multi-channel audio signal is a 10.2-channel audio signal, a multi-channel
audio signal, such as a 7.1-channel audio signal, a 5.1-channel audio signal, or a
2.0-channel audio signal, down-mixed to a smaller number of channels than the 10.2
channels may be used as an output multi-channel audio signal.
[0034] As shown in FIG. 4, when the 10.2-channel audio signal 410 is down-mixed to the 5.1-channel
audio signal 420, if FL and RL channels in the 5.1 channels are confirmed as adjacent
channels of an LW channel in the 10.2 channels, weighting factors of the FL and RL
channels may be determined in consideration of a location, a correlation or an error
during restoration. According to an exemplary embodiment, if it is determined that
a weighting factor of the FL channel is 0, and a weighting factor of the RL channel
is 1, a channel signal of the LW channel in the 10.2 channels may be down-mixed to
the RL channel in the 5.1 channels.
[0035] In addition, L and Ls channels in the 10.2 channels may be allocated to the FL and
RL channels in the 5.1 channels at the same locations, respectively.
[0036] FIG. 5 is a flowchart of a down-mixing method according to an exemplary embodiment.
[0037] Referring to FIG. 5, in operation 510, the number and locations of input channels
are checked from first layout information. For example, the first layout information
is IC(1), IC(2), ..., IC(N), and locations of N input channels may be checked from
the first layout information.
[0038] In operation 520, the number and locations of down-mixed channels, i.e., output channels,
are checked from second layout information. For example, the second layout information
is DC(1), DC(2), ..., DC(M), and locations of M output channels (M<N) may be checked
from the second layout information.
[0039] In operation 530, starting from the first input channel IC(1), it is determined
whether an output channel exists at the same location as the input channel.
[0040] In operation 540, if an output channel exists at the same location as the input channel,
the channel signal of the corresponding input channel is allocated to the output channel
at that location. For example, if the locations of an input channel IC(n) and an output
channel DC(m) are the same, DC(m) may be updated as DC(m)=DC(m)+IC(n).
[0041] In operation 550, if no output channel exists at the same location as the input
channel IC(n), it is determined whether output channels adjacent to the input channel
IC(n) exist, again proceeding from the first input channel IC(1).
[0042] In operation 560, if it is determined in operation 550 that a plurality of adjacent
channels exist, a channel signal of the input channel IC(n) is distributed to each
of the plurality of adjacent channels by using a predetermined weighting factor corresponding
to each of the plurality of adjacent channels. For example, if it is determined that
DC(i), DC(j), and DC(k) of the output channels are adjacent channels of the input
channel IC(n), weighting factors w_i, w_j, and w_k may be set for the pair of the input
channel IC(n) and the output channel DC(i), the pair of IC(n) and DC(j), and the pair
of IC(n) and DC(k), respectively. The channel signal of the input channel IC(n) may then
be distributed by using the set weighting factors w_i, w_j, and w_k such that
DC(i)=DC(i)+w_i*IC(n), DC(j)=DC(j)+w_j*IC(n), and DC(k)=DC(k)+w_k*IC(n).
[0043] A weighting factor may be set by a method described below.
[0044] According to an exemplary embodiment, a weighting factor may be determined according
to a relationship between the plurality of adjacent channels and the input channel IC(n).
As the relationship, at least one of a distance between each of the plurality of adjacent
channels and the input channel IC(n), a correlation between the channel signal of each of
the plurality of adjacent channels and the channel signal of the input channel IC(n),
and an error during restoration in the plurality of adjacent channels may be considered.
[0045] According to another exemplary embodiment, a weighting factor may be determined as
0 or 1 according to the relationship between the plurality of adjacent channels and the
input channel IC(n). For example, the weighting factor of the adjacent channel closest
to the input channel IC(n) may be determined as 1, and the weighting factors of the
remaining adjacent channels may be determined as 0. Alternatively, the weighting factor
of the adjacent channel whose channel signal has the highest correlation with the channel
signal of the input channel IC(n) may be determined as 1, and the weighting factors of
the remaining adjacent channels may be determined as 0. Alternatively, the weighting
factor of the adjacent channel having the least error during restoration among the
plurality of adjacent channels may be determined as 1, and the weighting factors of the
remaining adjacent channels may be determined as 0.
[0046] In operation 570, it is determined whether all the input channels have been checked;
if not, the method proceeds to operation 530, and operations 530 to 560 are repeated.
[0047] In operation 580, if all the input channels have been checked, configuration information
and a corresponding spatial parameter of down-mixed channels having the signals allocated
in operation 540 and the signals distributed in operation 560 are finally generated.
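For illustration, operations 510 to 580 may be sketched as follows, assuming for simplicity that each channel location is given as a scalar azimuth value and that inverse-distance weighting, one of the options named in [0044], is used; all names are illustrative:

```python
import numpy as np

def downmix_by_layout(inputs, in_pos, out_pos, n_adjacent=2):
    """Sketch of operations 510 to 580: input channels whose locations match
    an output channel are allocated directly (operations 530/540); the rest
    are distributed to adjacent output channels with weighting factors
    (operations 550/560), here inverse-distance weights."""
    out = [np.zeros_like(np.asarray(inputs[0], dtype=float)) for _ in out_pos]
    spatial_params = []
    for n, (sig, p) in enumerate(zip(inputs, in_pos)):
        sig = np.asarray(sig, dtype=float)
        if p in out_pos:  # same-location channel exists: direct allocation
            m = out_pos.index(p)
            out[m] = out[m] + sig  # DC(m) = DC(m) + IC(n)
            continue
        # find the n_adjacent nearest output channels
        dists = [abs(p - q) for q in out_pos]
        adj = sorted(range(len(out_pos)), key=lambda m: dists[m])[:n_adjacent]
        # weighting factors, here inversely proportional to distance
        inv = [1.0 / (dists[m] + 1e-12) for m in adj]
        weights = [v / sum(inv) for v in inv]
        for m, w in zip(adj, weights):
            out[m] = out[m] + w * sig  # DC(m) = DC(m) + w * IC(n)
        spatial_params.append({"input": n, "adjacent": adj, "weights": weights})
    # operation 580: down-mixed channels and the corresponding parameters
    return out, spatial_params
```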
[0048] The down-mixing method according to an exemplary embodiment may be performed in units
of channels, frames, frequency bands, or frequency spectra, and thus the granularity
of the performance improvement may be adjusted according to circumstances. Herein, a frequency
band is a unit of grouping samples of an audio spectrum and may have a uniform or
non-uniform length reflecting a critical band. In the case of non-uniform lengths,
one frame may be set so that the number of samples included in each frequency band
gradually increases from the first sample to the last sample. If multiple bitrates
are supported, the number of samples included in each of the frequency bands corresponding
to the different bitrates may be set to be the same. The number of samples included in
one frame or one frequency band may be determined in advance.
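A minimal sketch of such non-uniform band grouping, in which the number of samples per band gradually increases, might look as follows; the growth rule used here is an illustrative assumption, and actual band edges would reflect critical bands:

```python
def band_boundaries(n_samples, n_bands):
    """Sketch of non-uniform frequency-band grouping: the number of samples
    per band gradually increases from the first band to the last."""
    raw = [(k + 1) ** 1.5 for k in range(n_bands)]   # illustrative growth rule
    scale = n_samples / sum(raw)
    sizes = [max(1, round(r * scale)) for r in raw]
    sizes[-1] += n_samples - sum(sizes)              # absorb rounding error
    bounds, start = [], 0
    for s in sizes:
        bounds.append((start, start + s))
        start += s
    return bounds

# e.g. band_boundaries(640, 8) yields 8 bands whose widths grow with frequency
```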
[0049] In the down-mixing method according to an exemplary embodiment, a weighting factor
used for channel down-mixing may be determined in correspondence with the layout of
down-mixed channels and the layout of input channels. Accordingly, the down-mixing method
may adaptively deal with various layouts, and a weighting factor may be determined
in consideration of the locations of channels, a correlation between channel signals,
or an error during restoration, thereby improving sound quality. In addition, since the
down-mixed channels are configured in consideration of the locations of channels, a
correlation between channel signals, or an error during restoration, if an audio decoding
apparatus has the same number of channels as the down-mixed channels, the user may not
perceive subjective deterioration of sound quality even when listening to only the
down-mixed channels without a separate up-mixing process.
[0050] FIG. 6 is a flowchart of an up-mixing method according to an exemplary embodiment.
[0051] Referring to FIG. 6, in operation 610, configuration information and a corresponding
spatial parameter of down-mixed channels, which are generated through the process
as shown in FIG. 5, are received.
[0052] In operation 620, an input channel audio signal is restored by up-mixing the down-mixed
channels using the configuration information and the corresponding spatial parameter
of the down-mixed channels, which are received in operation 610.
[0053] FIG. 7 is a block diagram of a spatial parameter encoding apparatus 700 according
to an exemplary embodiment, which may be included in the encoding unit 230 of FIG.
2.
[0054] Referring to FIG. 7, the spatial parameter encoding apparatus 700 may include an
energy calculation unit 710, a quantization step determination unit 720, a quantization
unit 730, and a multiplexing unit 740. The components may be integrated as at least
one module and implemented as at least one processor (not shown).
[0055] The energy calculation unit 710 receives a down-mixed channel signal provided from
the down-mixing unit (refer to 210 of FIG. 2) and calculates an energy value in units
of channels, frames, frequency bands, or frequency spectra. Herein, an example of
an energy value may be a norm value.
[0056] The quantization step determination unit 720 determines a quantization step by using
the energy value calculated in units of channels, frames, frequency bands, or frequency
spectra, which is provided from the energy calculation unit 710. For example, a quantization
step may be small for a channel, a frame, a frequency band, or a frequency spectrum
having a large energy value, and a quantization step may be large for a channel, a
frame, a frequency band, or a frequency spectrum having a small energy value. In this
case, two quantization steps may be set, and one of the two quantization steps may
be selected according to a result of comparing an energy value with a predetermined
threshold value. When a quantization step is adaptively allocated in correspondence
with a distribution of energy values, a quantization step matching a distribution
of energy values may be selected. Accordingly, bits to be allocated for quantization
may be adjusted based on auditory importance, thereby improving sound quality. According
to an exemplary embodiment, a total bitrate may be adjusted by variably changing a
threshold frequency while maintaining a weighting factor allocated according to energy
distribution of each down-mixed channel.
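By way of example, the threshold-based selection between two quantization steps described in [0056] may be sketched as follows; the step sizes and the uniform quantizer are illustrative assumptions:

```python
def select_quant_steps(band_energies, threshold, step_small=0.5, step_large=2.0):
    """Per band, choose the small quantization step when the energy value
    (e.g. a norm value) meets the threshold, else the large step."""
    return [step_small if e >= threshold else step_large for e in band_energies]

def quantize_params(params, steps):
    """Uniform quantization of spatial parameters with per-band step sizes."""
    return [int(round(p / s)) for p, s in zip(params, steps)]

# Moving the threshold changes how many bands receive the small step, and
# thereby the total bitrate (cf. FIGS. 10A to 10C).
```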
[0057] The quantization unit 730 quantizes a spatial parameter in units of channels, frames,
frequency bands, or frequency spectra by using the quantization step determined by the
quantization step determination unit 720 and lossless-encodes the quantized spatial parameter.
[0058] The multiplexing unit 740 generates a bitstream by multiplexing the lossless-encoded
spatial parameter and a lossless-encoded down-mixed audio signal.
[0059] FIGS. 8A and 8B illustrate variable quantization steps according to energy values
in frequency bands of each frame for a down-mixed channel, wherein a channel 1 and
a channel 2 are down-mixed, and a channel 3 and a channel 4 are down-mixed. In FIGS.
8A and 8B, d0 denotes energy values of a down-mixed channel of the channel 1 and the
channel 2, and d1 denotes energy values of a down-mixed channel of the channel 3 and
the channel 4.
[0060] In FIGS. 8A and 8B, two quantization steps are set; a hatched portion corresponds
to a frequency band having an energy value that is equal to or greater than a predetermined
threshold value, and thus a small quantization step is set for the hatched portion.
[0061] FIG. 9 is a graph showing per-frequency-band energy distribution of spectral data
for all channels, and FIGS. 10A to 10C are graphs showing a total bitrate adjusted
by changing a threshold frequency in consideration of the energy distribution, in a state
where a weighting factor is allocated according to the energy value of each channel.
[0062] FIG. 10A shows an example in which a small quantization step is set to left parts,
i.e., low-frequency regions 110a, 120a, and 130a less than an initial threshold frequency
100a, and a large quantization step is set to right parts, i.e., high-frequency bands
110b, 120b, and 130b greater than the initial threshold frequency 100a, based on the
initial threshold frequency 100a. FIG. 10B shows an example in which a threshold frequency
100b higher than the initial threshold frequency 100a is used to increase regions
140a, 150a, and 160a for which the small quantization step is set, thereby increasing
a total bitrate. FIG. 10C shows an example in which a threshold frequency 100c lower
than the initial threshold frequency 100a is used to decrease regions 170a, 180a,
and 190a for which the small quantization step is set, thereby decreasing a total
bitrate.
[0063] FIG. 11 is a flowchart of a method of generating a spatial parameter, according to
an exemplary embodiment, which may be performed by the encoding apparatus 200 of FIG.
2.
[0064] Referring to FIG. 11, in operation 1110, N angle parameters are generated.
[0065] In operation 1120, (N-1) angle parameters among the N angle parameters are independently
encoded.
[0066] In operation 1130, the remaining one angle parameter is predicted from the (N-1)
angle parameters.
[0067] In operation 1140, the remaining one angle parameter is residual-encoded against the
predicted angle parameter to generate a residue of the remaining one angle parameter.
[0068] FIG. 12 is a flowchart of a method of generating a spatial parameter, according to
another exemplary embodiment, which may be performed by the decoding apparatus 300
of FIG. 3.
[0069] Referring to FIG. 12, in operation 1210, (N-1) angle parameters among N angle parameters
are received.
[0070] In operation 1220, the remaining one angle parameter is predicted from the (N-1)
angle parameters.
[0071] In operation 1230, the remaining one angle parameter is generated by adding the predicted
angle parameter and a residue.
[0072] FIG. 13 is a flowchart of an audio signal processing method according to an exemplary
embodiment.
Referring to FIG. 13, in operation 1310, first to nth channel signals ch1 to chn,
which constitute a multi-channel signal, are down-mixed. In detail, the first to nth channel
signals ch1 to chn may be down-mixed to one mono signal DM. Operation 1310 may be
performed by the down-mixing unit 210.
[0074] In operation 1320, (n-1) channel signals among the first to nth channel signals ch1
to chn are summed, or the first to nth channel signals ch1 to chn are summed. In detail,
channel signals except for a reference channel signal among the first to nth channel
signals ch1 to chn may be summed, and the summed signal becomes a first sum signal.
Alternatively, the first to nth channel signals ch1 to chn may be summed, and the
summed signal becomes a second sum signal.
[0075] In operation 1330, a first spatial parameter may be generated using a correlation
between the first sum signal that is a signal generated in operation 1320 and the
reference channel signal. Alternatively, in operation 1330, instead of generating
the first spatial parameter, a second spatial parameter may be generated using a correlation
between the second sum signal that is a signal generated in operation 1320 and the
reference channel signal.
[0076] The reference channel signal may be each of the first to nth channel signals ch1
to chn. Therefore, the number of reference channel signals may be n, and n spatial
parameters corresponding to the n reference channel signals may be generated.
[0077] Therefore, operation 1330 may further include generating first to nth spatial parameters
by setting each of the first to nth channel signals ch1 to chn as a reference channel
signal.
[0078] Operations 1320 and 1330 may be performed by the down-mixing unit 210.
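For illustration, operations 1320 and 1330 may be sketched as follows, representing each spatial parameter as a normalized correlation and a magnitude ratio between the reference channel signal and the first or second sum signal; this parameter representation is an assumption for the sketch, and all names are illustrative:

```python
import numpy as np

def spatial_parameters(channels, use_second_sum=False):
    """Sketch of operations 1320 and 1330: for each reference channel, form
    the first sum signal (all channels except the reference) or the second
    sum signal (all channels), then derive a spatial parameter from their
    correlation and relative magnitude."""
    chans = [np.asarray(c, dtype=float) for c in channels]
    dm = np.sum(chans, axis=0)  # mono down-mix of operation 1310
    params = []
    for k, ref in enumerate(chans):
        s = dm if use_second_sum else dm - ref
        denom = np.linalg.norm(ref) * np.linalg.norm(s)
        corr = float(np.dot(ref, s) / denom) if denom else 0.0
        ratio = float(np.linalg.norm(ref) / (np.linalg.norm(s) + 1e-12))
        params.append({"ref": k, "correlation": corr, "magnitude_ratio": ratio})
    return dm, params  # n spatial parameters, one per reference channel
```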
[0079] In operation 1340, the spatial parameter SP generated in operation 1330 is encoded
and transmitted to the decoding apparatus (refer to 300 of FIG. 3). In addition, the
mono signal DM generated in operation 1310 is encoded and transmitted to the decoding
apparatus (refer to 300 of FIG. 3). In detail, the encoded spatial parameter SP and
the encoded mono signal DM may be included in a transmission stream TS and transmitted
to the decoding apparatus (refer to 300 of FIG. 3). The spatial parameter SP included
in the transmission stream TS indicates a spatial parameter set including the first
to nth spatial parameters.
[0080] Operation 1340 may be performed by the encoding apparatus (refer to 200 of FIG. 2).
[0081] FIGS. 14A to 14C show an example for describing operation 1110 of FIG. 11 or operation
1330 of FIG. 13. Hereinafter, an operation of generating the first sum signal and
the first spatial parameter is described in detail with reference to FIGS. 14A to
14C.
[0082] FIGS. 14A to 14C illustrate a case where a multi-channel signal includes the first
to third channel signals ch1, ch2, and ch3. In addition, FIGS. 14A to 14C illustrate
a vector sum of signals as the sum of signals, wherein the sum of signals indicates
down-mixing, and various down-mixing methods may be used instead of the vector sum method.
[0083] FIGS. 14A to 14C illustrate cases where a reference channel signal is the first channel
signal ch1, the second channel signal ch2, and the third channel signal ch3, respectively.
[0084] Referring to FIG. 14A, when the reference channel signal is the first channel signal
ch1, the side information generation unit (refer to 220 of FIG. 2) generates a sum
signal 1410 by summing (ch2+ch3) the second and third channel signals ch2 and ch3
except for the reference channel signal. Thereafter, the side information generation
unit (refer to 220 of FIG. 2) generates a spatial parameter by using a correlation
(ch1, ch2+ch3) between the first channel signal ch1 that is the reference channel
signal and the sum signal 1410. The spatial parameter includes information indicating
the correlation between the reference channel signal and the sum signal 1410 and information
indicating a relative signal magnitude of the reference channel signal and the sum
signal 1410.
[0085] Referring to FIG. 14B, when the reference channel signal is the second channel signal
ch2, the side information generation unit (refer to 220 of FIG. 2) generates a sum
signal 1420 by summing (ch1+ch3) the first and third channel signals ch1 and ch3 except
for the reference channel signal. Thereafter, the side information generation unit
(refer to 220 of FIG. 2) generates a spatial parameter by using a correlation (ch2,
ch1+ch3) between the second channel signal ch2 that is the reference channel signal
and the sum signal 1420.
[0086] Referring to FIG. 14C, when the reference channel signal is the third channel signal
ch3, the side information generation unit (refer to 220 of FIG. 2) generates a sum
signal 1430 by summing (ch1+ch2) the first and second channel signals ch1 and ch2
except for the reference channel signal. Thereafter, the side information generation
unit (refer to 220 of FIG. 2) generates a spatial parameter by using a correlation
(ch3, ch1+ch2) between the third channel signal ch3 that is the reference channel
signal and the sum signal 1430.
[0087] When a multi-channel signal includes three channel signals, the number of reference
channel signals is 3, and three spatial parameters may be generated. The generated
spatial parameters are encoded by the encoding apparatus (refer to 200 of FIG. 2)
and transmitted to the decoding apparatus (refer to 300 of FIG. 3) via a network (not
shown).
[0088] A mono signal DM obtained by down-mixing the first to third channel signals ch1,
ch2, and ch3 is the same as a sum signal of the first to third channel signals ch1,
ch2, and ch3 and may be represented by DM=ch1+ch2+ch3. Therefore, a relationship ch1=DM-(ch2+ch3)
is valid.
[0089] The decoding apparatus 300 receives and decodes first spatial parameters that are
the spatial parameters described with reference to FIGS. 14A to 14C. The decoding
apparatus (refer to 300 of FIG. 3) restores original channel signals by using the
decoded mono signal and the decoded spatial parameters. As described above, the relationship
ch1=DM-(ch2+ch3) is valid. The spatial parameter generated with reference to FIG. 14A
may include a parameter indicating the relative magnitude of the first channel signal
ch1 and the sum signal 1410 (ch2+ch3) and a parameter indicating the similarity between
the first channel signal ch1 and the sum signal 1410 (ch2+ch3); thus, the first channel
signal ch1 and the sum signal 1410 (ch2+ch3) may be restored by using that spatial
parameter and the mono signal DM. In the same way, the second channel signal ch2 and
the sum signal 1420 (ch1+ch3), and the third channel signal ch3 and the sum signal 1430
(ch1+ch2), may be restored by using the spatial parameters generated with reference to
FIGS. 14B and 14C, respectively. That is, the up-mixing unit (refer to 330 of FIG. 3)
may restore all of the first to third channel signals ch1, ch2, and ch3.
[0090] FIG. 15 shows another example for describing operation 1110 of FIG. 11 or operation
1330 of FIG. 13. Hereinafter, an operation of generating the second sum signal and
the second spatial parameter is described in detail with reference to FIG. 15. FIG.
15 illustrates a case where a multi-channel signal includes the first to third channel
signals ch1, ch2, and ch3. In addition, FIG. 15 illustrates a vector sum of signals
as a sum of signals.
[0091] Referring to FIG. 15, the second sum signal is a signal obtained by summing the first
to third channel signals ch1, ch2, and ch3, and thus, a signal 1520 (ch1+ch2+ch3)
obtained by adding the third channel signal ch3 to a signal 1510 obtained by summing
the first and second channel signals ch1 and ch2 is the second sum signal.
[0092] First, a spatial parameter between the first channel signal ch1 and the second sum
signal 1520 with the first channel signal ch1 as a reference channel signal is generated.
In detail, a spatial parameter including at least one of a first parameter and a second
parameter may be generated by using a correlation (ch1, ch1+ch2+ch3) between the first
channel signal ch1 and the second sum signal 1520.
[0093] Next, a spatial parameter is generated by using a correlation (ch2, ch1+ch2+ch3)
between the second channel signal ch2 and the second sum signal 1520 with the second
channel signal ch2 as a reference channel signal. Finally, a spatial parameter is
generated by using a correlation (ch3, ch1+ch2+ch3) between the third channel signal
ch3 and the second sum signal 1520 with the third channel signal ch3 as a reference
channel signal.
[0094] The decoding apparatus (refer to 300 of FIG. 3) receives and decodes second spatial
parameters that are the spatial parameters described with reference to FIG. 15. Thereafter,
the decoding apparatus (refer to 300 of FIG. 3) restores original channel signals
by using the decoded mono signal and the decoded spatial parameters. The decoded mono
signal corresponds to the signal (ch1+ch2+ch3) obtained by summing the multiple channel signals.
[0095] Therefore, the first channel signal ch1 may be restored by using the spatial parameter,
which is generated using the correlation (ch1, ch1+ch2+ch3) between the first channel
signal ch1 and the second sum signal 1520, and the decoded mono signal. Similarly,
the second channel signal ch2 may be restored by using the spatial parameter generated
using the correlation (ch2, ch1+ch2+ch3) between the second channel signal ch2 and
the second sum signal 1520. In addition, the third channel signal ch3 may be restored
by using the spatial parameter generated using the correlation (ch3, ch1+ch2+ch3)
between the third channel signal ch3 and the second sum signal 1520.
[0096] FIGS. 16A to 16D show another example for describing operation 1110 of FIG. 11 or
operation 1330 of FIG. 13.
[0097] First, in the encoding apparatus 200 of FIG. 2, the spatial parameter generated by
the side information generation unit 220 may include an angle parameter as a first
parameter. The angle parameter indicates, as a predetermined angle value, a signal
magnitude correlation between a reference channel signal, which is any one of the first
to nth channel signals ch1 to chn, and the remaining channel signals except for the
reference channel signal among the first to nth channel signals ch1 to chn. The angle
parameter may be named a global vector angle (GVA). In addition, the angle parameter
may be a parameter representing the relative magnitude of the reference channel signal
and a first sum signal as an angle value.
[0098] The side information generation unit 220 may generate first to nth angle parameters
with each of the first to nth channel signals ch1 to chn as a reference channel signal.
Hereinafter, an angle parameter generated with a kth channel signal chk as a reference
channel signal is referred to as a kth angle parameter.
[0099] FIG. 16A shows a case where a multi-channel signal received by the encoding apparatus
includes the first to third channel signals ch1, ch2, and ch3. FIGS. 16B, 16C, and
16D show cases where a reference channel signal is the first channel signal ch1, the
second channel signal ch2, and the third channel signal ch3, respectively.
[0100] Referring to FIG. 16B, when a reference channel signal is the first channel signal
ch1, the side information generation unit (refer to 220 of FIG. 2) sums (ch2+ch3)
the second and third channel signals ch2 and ch3 that are the remaining channel signals
except for the reference channel signal and obtains a first angle parameter angle1
1622 that is an angle parameter between a sum signal 1620 and the first channel signal
ch1.
[0101] In detail, the first angle parameter angle1 1622 may be obtained as the inverse tangent
of the value obtained by dividing the absolute value of the sum signal (ch2+ch3) 1620
by the absolute value of the first channel signal ch1, i.e., angle1 = arctan(|ch2+ch3|/|ch1|).
[0102] Referring to FIG. 16C, a second angle parameter angle2 1632 with the second channel
signal ch2 as a reference channel signal may be obtained from inverse tangent of a
value obtained by dividing an absolute value of a sum signal (ch1+ch3) 1630 by an
absolute value of the second channel signal ch2.
[0103] Referring to FIG. 16D, a third angle parameter angle3 1642 with the third channel
signal ch3 as a reference channel signal may be obtained from inverse tangent of a
value obtained by dividing an absolute value of a sum signal (ch1+ch2) 1640 by an
absolute value of the third channel signal ch3.
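A minimal sketch of this angle parameter computation follows, taking the vector sum sample-wise over equal-length signals; the silent-channel convention anticipates the exceptional case described in [0107] below, and the function name is illustrative:

```python
import numpy as np

def gva_angles(channels):
    """Global vector angles: for each reference channel k, the angle is
    arctan(|sum of the other channels| / |ch_k|), in degrees."""
    chans = [np.asarray(c, dtype=float) for c in channels]
    total = np.sum(chans, axis=0)
    angles = []
    for ref in chans:
        others = total - ref                       # vector sum of the rest
        n_ref, n_sum = np.linalg.norm(ref), np.linalg.norm(others)
        if n_ref == 0.0:
            angles.append(90.0)  # convention for a silent reference channel
        else:
            angles.append(float(np.degrees(np.arctan(n_sum / n_ref))))
    # exceptional case ([0107] below): if all channels are silent, each angle
    # is 90 degrees; forcing one angle to 0 restores the expected total
    if all(np.linalg.norm(c) == 0.0 for c in chans):
        angles[0] = 0.0
    return angles  # for n = 3 the total converges to about 180 degrees
```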
[0104] FIG. 17 is a graph showing a total sum of angle parameters, wherein an x axis indicates
an angle value, and a y axis indicates a distribution probability. In addition, in
the angle value, one unit corresponds to 6 degrees. For example, a value of 30 in
the x axis indicates 180 degrees.
[0105] In detail, the total sum of the n angle parameters, calculated with each of the first
to nth channel signals as a reference channel signal, converges to a predetermined
value. The converged value may vary according to the value of n and may be optimized
through simulations or experiments. For example, when n is 3, the converged value may
be 180 degrees.
[0106] Referring to FIG. 17, when n is 3, the total sum of the three angle parameters converges
to about 30 units, i.e., about 180 degrees 1710, as shown in FIG. 17. The graph of
FIG. 17 is obtained through a simulation or an experiment.
[0107] Exceptionally, the total sum of the three angle parameters may converge to about
45 units, i.e., about 270 degrees 1720. The sum converges to about 270 degrees 1720
when each angle parameter has a value of 90 degrees because all three channel signals
are silent. In this exceptional case, if the value of any one of the three angle
parameters is changed to 0, the total sum of the three angle parameters converges to
about 180 degrees 1710. When all three channel signals are silent, the down-mixed mono
signal also has a value of 0, and even if the mono signal is up-mixed and decoded, the
result is 0. Therefore, changing the value of one angle parameter to 0 does not change
the up-mixing and decoding result, and thus the value of any one of the three angle
parameters may safely be changed to 0.
[0108] FIG. 18 is a diagram for describing the calculation of angle parameters, according
to an exemplary embodiment, wherein a multi-channel signal includes the first to third
channel signals ch1, ch2, and ch3. According to an exemplary embodiment, a spatial
parameter may be generated that includes the angle parameters except for a kth angle
parameter among the first to nth angle parameters and a residue of the kth angle
parameter, which is used to calculate the kth angle parameter.
[0109] Referring to FIG. 18, when the first channel signal ch1 is a reference channel signal,
a first angle parameter is calculated and encoded, and the encoded first angle parameter
is included in a predetermined bit region 1810 and is transmitted to the decoding
apparatus (refer to 300 of FIG. 3). When the second channel signal ch2 is a reference
channel signal, a second angle parameter is calculated and encoded, and the encoded
second angle parameter is included in a predetermined bit region 1830 and is transmitted
to the decoding apparatus (refer to 300 of FIG. 3).
[0110] When a third angle parameter is the kth angle parameter described above, a residue
of the kth angle parameter may be obtained as follows.
[0111] Since a total sum of the n angle parameters is converged to a predetermined value,
a value of the kth angle parameter may be obtained by subtracting values of the angle
parameters except for the kth angle parameter among the n angle parameters from the
predetermined value. In detail, when n is 3, if not all the three channel signals
are silent, a total sum of the three angle parameters is converged to about 180 degrees.
Therefore, a value of the third angle parameter = 180 degrees - (a value of the first
angle parameter + a value of the second angle parameter). The third angle parameter
may be predicted using a correlation between the first to third angle parameters.
[0112] In detail, the side information generation unit (refer to 220 of FIG. 2) predicts
a value of the kth angle parameter among the first to nth angle parameters. A predetermined
bit region 1870 indicates a data region in which the predicted value of the kth angle
parameter is included.
[0113] Thereafter, the side information generation unit (refer to 220 of FIG. 2) compares
the predicted value of the kth angle parameter with an original value of the kth angle
parameter. A predetermined bit region 1850 indicates a data region in which a value
of the third angle parameter angle3 1642 calculated with reference to FIG. 16D is
included.
[0114] Thereafter, the side information generation unit (refer to 220 of FIG. 2) generates
a difference between the predicted value 1870 of the kth angle parameter and the original
value 1850 of the kth angle parameter as a residue of the kth angle parameter. A predetermined
bit region 1890 indicates a data region in which the residue of the kth angle parameter
is included.
[0115] The encoding apparatus (refer to 200 of FIG. 2) encodes a spatial parameter including
angle parameters (parameters included in the data regions 1810 and 1830) except for
the kth angle parameter among the first to nth angle parameters and the residue (parameter
included in the data region 1890) of the kth angle parameter and transmits the encoded
spatial parameter to the decoding apparatus (refer to 300 of FIG. 3).
[0116] The decoding apparatus (refer to 300 of FIG. 3) receives a spatial parameter including
the angle parameters except for the kth angle parameter among the first to nth angle
parameters and the residue of the kth angle parameter.
[0117] The decoding unit (refer to 320 of FIG. 3) in the decoding apparatus (refer to 300
of FIG. 3) restores the kth angle parameter by using the received spatial parameter
and a predetermined value.
[0118] In detail, the decoding unit (refer to 320 of FIG. 3) may generate the kth angle
parameter by subtracting the values of the angle parameters except for the kth angle
parameter among the first to nth angle parameters from the predetermined value and
compensating the subtraction result with the residue of the kth angle parameter.
[0119] The residue of the kth angle parameter has a smaller data size than the value of the kth
angle parameter. Therefore, when the spatial parameter including the angle parameters
except for the kth angle parameter among the first to nth angle parameters and the
residue of the kth angle parameter is transmitted to the decoding apparatus (refer
to 300 of FIG. 3), a data amount transmitted and received between the encoding apparatus
(refer to 200 of FIG. 2) and the decoding apparatus (refer to 300 of FIG. 3) may be
reduced.
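The prediction and residual coding of the kth angle parameter, and its restoration at the decoding side, may be sketched as follows; the predetermined value of 180 degrees corresponds to the n=3 case, and the function names are illustrative:

```python
def encode_angles(angles, k, predetermined=180.0):
    """Encoder side ([0109]-[0115]): transmit all angle parameters except the
    kth, plus the residue of the kth angle against its predicted value."""
    others = [a for i, a in enumerate(angles) if i != k]
    predicted = predetermined - sum(others)  # e.g. 180 - (angle1 + angle2)
    residue = angles[k] - predicted          # small, hence cheap to transmit
    return others, residue

def decode_kth_angle(others, residue, predetermined=180.0):
    """Decoder side ([0117]-[0118]): rebuild the kth angle parameter from the
    transmitted angles, the predetermined value, and the residue."""
    return (predetermined - sum(others)) + residue

# round trip: decode_kth_angle(*encode_angles([70.0, 60.0, 50.0], k=2)) == 50.0
```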
[0120] When angle parameters are generated for, for example, three channels, which channel's
angle parameter has been residual-coded may be signaled by using the values 0, 1, and 2.
That is, when the three values are independently encoded, 2 bits * 3 = 6 bits are
required, but only 5 bits may be required according to the method described below.
[0121] When D=A+B*3+C*9 (the range of D is 0 to 26), if the value of D is known at decoding,
A, B, and C may be obtained by C=floor(D/9); D'=mod(D,9); B=floor(D'/3); A=mod(D',3).
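A sketch of this packing and unpacking follows; since the 27 possible combinations fit within 5 bits (2^5 = 32), one bit is saved relative to independent 2-bit coding:

```python
def pack_channel_ids(a, b, c):
    """Pack three ternary values (each 0, 1, or 2) into one value D in 0..26,
    which fits in 5 bits instead of the 6 bits of independent 2-bit coding."""
    assert all(v in (0, 1, 2) for v in (a, b, c))
    return a + b * 3 + c * 9  # D = A + B*3 + C*9

def unpack_channel_ids(d):
    """Inverse mapping: C = floor(D/9), D' = D mod 9, B = floor(D'/3),
    A = D' mod 3."""
    c, rem = divmod(d, 9)
    b, a = divmod(rem, 3)
    return a, b, c

# e.g. pack_channel_ids(2, 1, 0) == 5 and unpack_channel_ids(5) == (2, 1, 0)
```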
[0122] FIG. 19 is a block diagram of an audio signal processing system 1900 integrating
a multi-channel codec and a core codec, according to an exemplary embodiment.
[0123] The audio signal processing system 1900 shown in FIG. 19 includes an encoding apparatus
1910 and a decoding apparatus 1940. According to an exemplary embodiment, the audio
signal processing system 1900 may include both the encoding apparatus 1910 and the
decoding apparatus 1940, and according to another exemplary embodiment, the audio
signal processing system 1900 may include any one of the encoding apparatus 1910 and
the decoding apparatus 1940.
[0124] The encoding apparatus 1910 may include a multi-channel encoder 1920 and a core encoder
1930, and the decoding apparatus 1940 may include a core decoder 1950 and a multi-channel
decoder 1960.
[0125] Examples of a codec algorithm used in the core encoder 1930 and the core decoder
1950 may be AC-3, enhanced AC-3, and AAC using the modified discrete cosine transform
(MDCT), but are not limited thereto.
[0126] FIG. 20 is a block diagram of an audio encoding apparatus 2000 according to an exemplary
embodiment, which integrates a multi-channel encoder 2010 and a core encoder 2040.
[0127] The audio encoding apparatus 2000 shown in FIG. 20 includes the multi-channel encoder
2010 and the core encoder 2040, wherein the multi-channel encoder 2010 may include
a transform unit 2020 and a down-mixing unit 2030, and the core encoder 2040 may include
an envelope encoding unit 2050, a bit allocation unit 2060, a quantization unit 2070,
and a bitstream formatting unit 2080. The components may be integrated as at least
one module and implemented as at least one processor (not shown).
Referring to FIG. 20, the transform unit 2020 transforms a PCM input of the time domain
into spectral data of the frequency domain. At this time, a modified odd discrete Fourier
transform (MODFT) may be applied. Since an MDCT component is generated according to
MODFT = MDCT + jMDST, the existing inverse transform part and the existing analysis
filter bank part are not necessary. Furthermore, since the MODFT has complex values,
a level/phase/correlation may be obtained more accurately than with the MDCT.
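For illustration, a direct-form (O(N^2)) sketch of the MODFT of one 2N-sample frame follows, using the standard MDCT/MDST kernel definitions; windowing and fast transform algorithms are omitted, and the assumption is that the codec's MODFT matches this textbook form:

```python
import numpy as np

def modft(frame):
    """Direct-form sketch of the MODFT of one 2N-sample frame as
    MDCT + j*MDST (O(N^2); real codecs use fast algorithms)."""
    frame = np.asarray(frame, dtype=float)
    two_n = len(frame)
    n = two_n // 2
    k = np.arange(n)
    t = np.arange(two_n)
    # standard MDCT/MDST kernels: phase = (pi/N)(t + 1/2 + N/2)(k + 1/2)
    phase = (np.pi / n) * np.outer(k + 0.5, t + 0.5 + n / 2.0)
    mdct = np.cos(phase) @ frame
    mdst = np.sin(phase) @ frame
    return mdct + 1j * mdst  # complex spectrum: level and phase available
```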
[0129] The down-mixing unit 2030 extracts a spatial parameter from the spectral data provided
from the transform unit 2020 and generates a down-mixed spectrum by down-mixing the
spectral data. The extracted spatial parameter is provided to the bitstream formatting
unit 2080.
[0130] The envelope encoding unit 2050 acquires an envelope value in a predetermined frequency
band unit from MDCT transform coefficients of the down-mixed spectrum provided from
the down-mixing unit 2030 and lossless-encodes the envelope value. Herein, the envelope
may be formed from any one of the power, average amplitude, norm value, and average
energy obtained in the predetermined frequency band unit.
[0131] The bit allocation unit 2060 generates bit allocation information required to encode
a transform coefficient by using an envelope value obtained in each frequency band
unit and normalizes the MDCT transform coefficients. In this case, an envelope value
quantized and lossless-encoded in each frequency band unit may be included in a bitstream
and transmitted to a decoding apparatus (refer to 2100 of FIG. 21). In association
with bit allocation using an envelope value of each frequency band, a dequantized
envelope value may be used so that the encoding apparatus 2000 and the decoding apparatus
(refer to 2100 of FIG. 21) use the same process. When a norm value is used as an envelope
value, a masking threshold value may be calculated using a norm value in each frequency
band unit, and a required number of bits may be perceptually predicted using the masking
threshold value.
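By way of illustration, envelope-driven bit allocation may be sketched as follows; the proportional-to-log-energy rule used here is an assumption standing in for the perceptual prediction described above, not the apparatus's exact rule:

```python
import numpy as np

def allocate_bits(band_norms, total_bits):
    """Sketch of envelope-driven bit allocation: frequency bands with larger
    (dequantized) norm values are treated as perceptually more important
    and receive proportionally more bits."""
    norms = np.maximum(np.asarray(band_norms, dtype=float), 1e-12)
    importance = np.log2(norms)          # bit demand tracks log-energy
    importance -= importance.min()
    if importance.sum() == 0.0:          # flat envelope: split evenly
        share = np.full(len(norms), 1.0 / len(norms))
    else:
        share = importance / importance.sum()
    bits = np.floor(share * total_bits).astype(int)
    bits[int(np.argmax(share))] += total_bits - int(bits.sum())  # remainder
    return bits
```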
[0132] The quantization unit 2070 generates a quantization index by quantizing the MDCT
transform coefficients of the down-mixed spectrum based on the bit allocation information
provided from the bit allocation unit 2060.
[0133] The bitstream formatting unit 2080 generates a bitstream by formatting the encoded
spectral envelope, the quantization index of the down-mixed spectrum, and the spatial
parameter.
[0134] FIG. 21 is a block diagram of an audio decoding apparatus 2100 according to an exemplary
embodiment, which integrates a core decoder 2110 and a multi-channel decoder 2160.
[0135] The audio decoding apparatus 2100 shown in FIG. 21 includes the core decoder 2110
and the multi-channel decoder 2160, wherein the core decoder 2110 may include a bitstream
parsing unit 2120, an envelope decoding unit 2130, a bit allocation unit 2140, and
a dequantization unit 2150, and the multi-channel decoder 2160 may include an up-mixing
unit 2170 and an inverse transform unit 2180. The components may be integrated as
at least one module and implemented as at least one processor (not shown).
[0136] Referring to FIG. 21, the bitstream parsing unit 2120 extracts an encoded spectral
envelope, a quantization index of a down-mixed spectrum, and a spatial parameter by
parsing a bitstream transmitted via a network (not shown).
[0137] The envelope decoding unit 2130 lossless-decodes the encoded spectral envelope provided
from the bitstream parsing unit 2120.
[0138] The bit allocation unit 2140 allocates bits required to decode a transform coefficient
by using the encoded spectral envelope provided in each frequency band unit from the
bitstream parsing unit 2120. The bit allocation unit 2140 may operate in the same manner
as the bit allocation unit 2060 of the audio encoding apparatus 2000 of FIG. 20.
[0139] The dequantization unit 2150 generates spectral data of an MDCT component by dequantizing
the quantization index of the down-mixed spectrum provided from the bitstream parsing
unit 2120 on the basis of bit allocation information provided from the bit allocation
unit 2140.
[0140] The up-mixing unit 2170 up-mixes the spectral data of the MDCT component provided
from the dequantization unit 2150 by using the spatial parameter provided from the
bitstream parsing unit 2120 and inverse-normalizes the up-mixed spectrum by using
the decoded spectral envelope provided from the envelope decoding unit 2130.
[0141] The inverse transform unit 2180 generates a pulse-code modulation (PCM) output of
the time domain by inverse-transforming the up-mixed spectrum provided from the up-mixing
unit 2170. At this time, an inverse MODFT may be applied to correspond to the transform
unit (refer to 2020 of FIG. 20). To this end, spectral data of a modified discrete
sine transform (MDST) component may be generated or predicted from the spectral data
of the MDCT component. The inverse MODFT may be applied by generating spectral data
of an MODFT component using the spectral data of the MDCT component and the generated
or predicted spectral data of the MDST component. The inverse transform unit 2180
may apply an inverse MDCT to the spectral data of the MDCT component. To this end,
a parameter for compensating for an error generated during up-mixing in an MDCT domain
may be transmitted from the audio encoding apparatus (refer to 2000 of FIG. 20).
[0142] According to an exemplary embodiment, for a stationary signal duration, multi-channel
decoding may be performed in the MDCT domain. For a non-stationary duration, i.e., a
transient signal duration, an MODFT component may be generated by generating or predicting
an MDST component from the MDCT component, and multi-channel decoding may be performed
in the MODFT domain.
[0143] Whether a current signal corresponds to a stationary signal duration or a non-stationary
signal duration may be checked using flag information or window information added
in a predetermined frequency band or frame unit to a bitstream.
[0144] For example, when a short window is applied, the current signal may correspond to
a non-stationary signal duration, and when a long window is applied, the current signal
may correspond to a stationary signal duration.
[0145] In more detail, the characteristics of a current signal may be checked by using the
blksw and AHT flag information when an enhanced AC-3 algorithm is applied to the core
codec, and by using the blksw flag information when an AC-3 algorithm is applied to the
core codec.
[0146] According to FIGS. 20 and 21, by using the MODFT for time/frequency domain transform,
even when a multi-channel codec and a core codec using different transform schemes
are integrated, the complexity at the decoding end may be reduced. In addition, even
when a multi-channel codec and a core codec using different transform schemes are
integrated, the existing synthesis filter bank part and the existing transform part
are not necessary, and thus overlap-add may be omitted, thereby preventing additional
latency.
[0147] The methods according to the embodiments can be written as computer executable programs
and can be implemented in general-use digital computers that execute the programs
using a computer-readable recording medium. In addition, data structures, program
instructions, or data files usable in the embodiments of the present invention can
be recorded on the computer-readable recording medium in various manners. The computer-readable
recording medium may include all types of storage devices in which data readable by
a computer system is stored. Examples of the computer-readable recording medium include
magnetic media, such as hard disks, floppy disks, and magnetic tapes, optical recording
media, such as CD-ROMs and DVDs, magneto-optical media, such as floptical disks, and
hardware devices, such as read only memory (ROM), random access memory (RAM), and
flash memory, particularly configured to store and execute program instructions. In
addition, the computer-readable recording medium may be a transmission medium for
transmitting a signal designating program instructions, a data structure, or the like.
Examples of the program instructions may include machine language codes created by
a compiler and high-level language codes executable by a computer system using an
interpreter or the like.
[0148] Although exemplary embodiments of the present invention have been described in detail
with reference to the attached drawings, the present invention is not limited to these
embodiments. It is clear that various kinds of changes or modifications can be made
within the scope of the technical spirit disclosed in the claims by those of ordinary
skill in the art, and it is understood that such changes or modifications belong
to the technical scope of the present invention.