TECHNICAL FIELD
[0001] The present invention relates in general to encoding of audio signals, and in particular
to encoding of multi-channel audio signals.
BACKGROUND
[0002] There is a high market need to transmit and store audio signals at low bit rate while
maintaining high audio quality. Particularly, in cases where transmission resources
or storage are limited, low bit rate operation is an essential cost factor. This is
typically the case, e.g. in streaming and messaging applications in mobile communication
systems such as GSM, UMTS, or CDMA.
[0003] Today, there are no standardised codecs available providing high stereophonic audio
quality at bit rates that are economically interesting for use in mobile communication
systems. What is possible with available codecs is monophonic transmission of the
audio signals. To some extent also stereophonic transmission is available. However,
bit rate limitations usually require limiting the stereo representation quite drastically.
[0004] The simplest way of stereophonic or multi-channel coding of audio signals is to encode
the signals of the different channels separately as individual and independent signals.
Another basic way, which is used in stereo FM radio transmission and ensures compatibility
with legacy mono radio receivers, is to transmit a sum and a difference signal of the
two involved channels.
[0005] State-of-the-art audio codecs, such as MPEG-1/2 Layer III and MPEG-2/4 AAC make use
of so-called joint stereo coding. According to this technique, the signals of the
different channels are processed jointly, rather than separately and individually.
The two most commonly used joint stereo coding techniques are known as "Mid/Side"
(M/S) stereo coding and intensity stereo coding, which usually are applied on sub-bands
of the stereo or multi-channel signals to be encoded.
[0006] M/S stereo coding is similar to the described procedure in stereo FM radio, in a
sense that it encodes and transmits the sum and difference signals of the channel
sub-bands and thereby exploits redundancy between the channel sub-bands. The structure
and operation of an encoder based on M/S stereo coding is described, e.g. in
US patent 5,285,498 by J.D. Johnston.
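The sum and difference principle behind M/S coding can be sketched as follows. This is a minimal illustration on full-band sample lists, not the sub-band processing of an actual codec; the function names and the scaling by 1/2 are assumptions made for the example.

```python
def ms_encode(left, right):
    """Return (mid, side) with mid = (L + R) / 2 and side = (L - R) / 2."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid, side):
    """Invert ms_encode: L = mid + side, R = mid - side."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


left = [0.5, 0.25, -0.75, 1.0]
right = [0.5, 0.75, -0.25, 0.0]
mid, side = ms_encode(left, right)
rec_left, rec_right = ms_decode(mid, side)
```

When the two channels are similar, the side signal is small, which is the redundancy that M/S coding exploits.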
[0007] Intensity stereo on the other hand is able to make use of stereo irrelevancy. It
transmits the joint intensity of the channels (of the different sub-bands) along with
some location information indicating how the intensity is distributed among the channels.
Intensity stereo provides only spectral magnitude information of the channels.
Phase information is not conveyed. For this reason and since the temporal inter-channel
information (more specifically the inter-channel time difference) is of major psycho-acoustical
relevancy particularly at lower frequencies, intensity stereo can only be used at
high frequencies above e.g. 2 kHz. An intensity stereo coding method is described,
e.g. in the European patent 0497413 by R. Veldhuis et al.
[0008] A recently developed stereo coding method is described, e.g. in a conference paper
with the title "Binaural cue coding applied to stereo and multi-channel audio compression",
112th AES convention, May 2002, Munich, Germany by C. Faller et al. This method is a
parametric multi-channel audio coding method. The basic principle is that at the encoding
side, the input signals from N channels c1, c2, ..., cN are combined to one mono signal m.
The mono signal is audio encoded using any conventional monophonic audio codec. In parallel,
parameters are derived from the channel signals, which describe the multi-channel image.
The parameters are encoded and transmitted to the decoder, along with the audio bit stream.
The decoder first decodes the mono signal m' and then regenerates the channel signals
c1', c2', ..., cN', based on the parametric description of the multi-channel image.
[0009] The principle of the Binaural Cue Coding (BCC) method is that it transmits the encoded
mono signal and so-called BCC parameters. The BCC parameters comprise coded inter-channel
level differences and inter-channel time differences for sub-bands of the original
multi-channel input signal. The decoder regenerates the different channel signals
by applying sub-band-wise level and phase adjustments of the mono signal based on
the BCC parameters. The advantage over e.g. M/S or intensity stereo is that stereo
information comprising temporal inter-channel information is transmitted at much lower
bit rates.
[0010] A problem with the state-of-the-art multi-channel coding techniques described above
is that they require high bit rates in order to provide good quality. Intensity stereo,
if applied at bit rates as low as e.g. a few kbps, suffers from the fact that
it does not provide any temporal inter-channel information. As this information is
perceptually important for low frequencies below e.g. 2 kHz, intensity stereo is unable to
provide a stereo impression at such low frequencies.
[0011] BCC is able to reproduce the multi-channel image even at low frequencies at low bit
rates of e.g. 3 kbps since it also transmits temporal inter-channel information. However,
this technique requires computationally demanding time-frequency transforms on each
of the channels, both at the encoder and the decoder. Moreover, BCC optimises the
mapping in a purely mathematical manner. Characteristic artefacts inherent in the coding
method will, however, not disappear.
[0012] Another technique, described in US patent 5,434,948 by C.E. Holt et al., uses a similar
approach of encoding the mono signal and side information. In this case, the side information
consists of predictor filters and optionally a residual signal. The predictor filters,
estimated by a least-mean-square algorithm, when applied to the mono signal allow the
prediction of the multi-channel audio signals. With this technique one is able to reach
very low bit rate encoding of multi-channel audio sources, however at the expense of a
quality drop.
[0013] An approach similar to the above filtering approach is described in WO 03/090206 by
Breebaart and Groenendaal. However, this approach uses a fixed filter applied to the mono
signal and combined together with the unfiltered mono signal via a matrixing operation. The matrixing
operation is dependent upon a received correlation parameter and a received level
parameter. The objective of such signal synthesis is to restore the correlation and
the level difference of the original two channels. Because of the inherently fixed
filtering operation, the signal synthesis has a very limited potential for signal
reproduction and does not adapt to the signal characteristics. The approach can be
regarded as an extension of the intensity stereo coding method discussed above, in
which now a temporal component is conveyed to the decoder. Still, only the level and
the correlation parameters allow a certain degree of adaptivity through a matrixing
operation. This operation consists of a mere rotation and scaling of statically filtered
signals, thus limiting the polyphonic reproduction ability. Another drawback of the
approach is the fact that it is not based on a fidelity criterion, e.g. signal-to-noise
ratio, which limits its scalability to transparent quality.
[0014] Finally, for completeness, a technique is to be mentioned that is used in 3D audio.
This technique synthesises the right and left channel signals by filtering sound source
signals with so-called head-related filters. However, this technique requires the
different sound source signals to be separated and can thus not generally be applied
for stereo or multi-channel coding.
SUMMARY
[0015] Although the predictor filters are known to be optimal in the least-mean-square sense,
they do not always fully restore the perceptual characteristics of the original multi-channel
signals. In the case of stereo encoding, for example, stereo image instability may occur,
where the sound jumps randomly between left and right. Furthermore, spectral nulls
may cause instabilities and lead to a filter whose frequency response at these frequencies
is aberrant. This may cause the filter to perform unnecessary amplification in certain
regions and lead to very annoying audible artefacts, especially if the signals are
low-pass or high-pass filtered.
[0016] An object of the present invention is to provide a method and device for multi-channel
encoding that improves the perceptual quality of the audio signal. A further object
of the present invention is to provide such a method and device, which requires low
bit rate representation.
[0017] The above objects are achieved by methods and devices according to the enclosed patent
claims. In general, at the encoder side, the signals of the different channels are
combined into one main signal. A set of adaptive filters, preferably one for each
channel, is derived. When a filter is applied to the main signal it reconstructs the
signal of the respective channel under a perceptual constraint. The perceptual constraint
is a gain and/or shape constraint. The gain constraint allows the preservation of
the relative energy between the channels while the shape constraint allows stereo
image stability, e.g. by avoiding unnecessary filtering of spectral nulls. The transmitted
parameters are the main signal, in encoded form, and the parameters of the adaptive
filters, preferably also encoded. The receiver reconstructs the signal of the different
channels by applying the adaptive filters and possibly some additional post-processing.
[0018] An advantage with the present invention is that perceptual artefacts are reduced
when decoding audio signals. The required transmission bit rate is at the same time
also kept at a very low level.
BRIEF DESCRIPTION OF THE DRAWINGS
[0019] The invention, together with further objects and advantages thereof, may best be
understood by making reference to the following description taken together with the
accompanying drawings, in which:
FIG. 1 is a block scheme of a system for transmitting multi-channel signals;
FIG. 2a is a block diagram of an embodiment of an encoder in a transmitter according
to the present invention;
FIG. 2b is a block diagram of an embodiment of a decoder in a receiver according to
the present invention;
FIG. 3a is a block diagram of another embodiment of an encoder in a transmitter according
to the present invention;
FIG. 3b is a block diagram of another embodiment of a decoder in a receiver according
to the present invention;
FIG. 4 is a block diagram of an embodiment of a filter adaptation unit according to
the present invention;
FIG. 5 shows diagrams illustrating the effects of insufficient reproduction of side
signals in a prior-art system;
FIG. 6 is a diagram illustrating effects of spectral nulls in prior-art systems;
FIG. 7 is a block diagram illustrating combining possibilities in channel filter sections
according to the present invention;
FIG. 8 is a block diagram of an embodiment of an encoder employing partial combined
encoding of a stereo signal;
FIG. 9 is a block diagram illustrating the use of division in frequency sub-bands;
FIG. 10 is a composite diagram illustrating overlapping analysis for encoding and
decoding; and
FIG. 11 is a flow diagram of the basic steps of an embodiment of an encoding method
according to the present invention.
DETAILED DESCRIPTION
[0020] Fig. 1 illustrates a typical system 1, in which the present invention advantageously
can be utilised. A transmitter 10 comprises an antenna 12 including associated hardware
and software to be able to transmit radio signals 5 to a receiver 20. The transmitter
10 comprises among other parts a multi-channel encoder 14, which transforms signals
of a number of input channels 16 into output signals suitable for radio transmission.
Examples of suitable multi-channel encoders 14 are described in detail further below.
The signals of the input channels 16 can be provided from e.g. an audio signal storage
18, such as a data file of digital representation of audio recordings, magnetic tape
or vinyl disc recordings of audio etc. The signals of the input channels 16 can also
be provided "live", e.g. from a set of microphones 19. The audio signals are digitised,
if not already in digital form, before entering the multi-channel encoder 14.
[0021] At the receiver 20 side, an antenna 22 with associated hardware and software handles
the actual reception of radio signals 5 representing polyphonic audio signals. Here,
typical functionalities, such as e.g. error correction, are performed. A decoder 24
decodes the received radio signals 5 and transforms the audio data carried thereby
into signals of a number of output channels 26. The output signals can be provided
to e.g. loudspeakers 29 for immediate presentation, or can be stored in an audio signal
storage 28 of any kind.
[0022] The system 1 can for instance be a phone conference system, a system for supplying
audio services or other audio applications. In some systems, such as e.g. the phone
conference system, the communication has to be of a duplex type, while e.g. distribution
of music from a service provider to a subscriber can be essentially of a one-way type.
The transmission of signals from the transmitter 10 to the receiver 20 can also be
performed by any other means, e.g. by different kinds of electromagnetic waves, cables
or fibres as well as combinations thereof.
[0023] Fig. 2a illustrates one embodiment of a multi-channel encoder 14 according to the
present invention. A number of channel signals c1, c2, ..., cN are received at separate
inputs 16:1-16:N.
[0024] The channel signals are connected to a linear combination unit 34. In the present
embodiment, all channel signals are summed together to form a mono signal x. Alternatively,
any predetermined linear combination of one or more of the channel signals may be
used, including pure channel signals. A pure sum will, however, simplify
most mathematical operations. The mono signal x is provided as an input signal 42
to a channel filter section 130. Furthermore, the mono signal x is provided to, and
encoded in, a mono signal encoder 38 to provide encoding parameters px representing the
mono signal x. The mono signal encoder operates according to any
suitable mono signal encoding technique. Many such techniques are available in known
technology. The actual details of the encoding technique are not of importance for
enabling the present invention and are therefore not discussed further.
[0025] The channel signals are also connected to the channel filter section 130. In the
present embodiment, each channel signal is connected to a respective filter adaptation
unit 30:1-30:N. Each filter adaptation unit derives a filter that, when applied to the
mono signal x, reconstructs the respective channel signal. Coefficients of the filter adaptation
units 30:1-30:N are according to the present invention optimised under a perceptual
constraint. However, the optimised coefficients of the filter adaptation units 30:1-30:N
may also be obtained at least partly in a joint optimisation of two or more of the
channel signals.
[0026] The output of the channel filter section 130 comprises N sets of filter parameters
p1-pN. These filter parameters p1-pN are typically encoded separately or jointly to be
suitable for transmission. The filter parameters p1-pN and the mono signal x are sufficient
to enable reconstruction of all channel signals. The encoded filter parameters p1-pN and
the encoding parameters px representing the mono signal x are in the present embodiment
multiplexed in a multiplexor 40 into one output signal 52, ready for transmission.
[0027] Fig. 2b illustrates one embodiment of a multi-channel decoder 24 according to the
present invention. The decoder 24 in Fig. 2b is suitable for decoding multi-channel
signals encoded by the encoder of Fig. 2a. An input signal 54 is received and provided
to a demultiplexor 56, which divides the input signal 54 into encoding parameters
px representing the mono signal x and a number of sets of encoded filter parameters
p1-pN.
[0028] The encoding parameters px representing the mono signal x are provided to a mono
signal decoder 64, in which they are used to generate a decoded mono signal x" according
to any suitable decoding technique associated with the encoding technique used in Fig.
2a. Many such techniques are available in known technology. The actual details of
the decoding technique are not of importance for enabling the present invention and
are therefore not discussed further. The decoded mono signal x" is provided to a channel
filter section 160.
[0029] The encoded filter parameters are also provided to the channel filter section 160,
where they are decoded and used to define channel filters 60:1-60:N. The channel filters
60:1-60:N so defined are applied to the decoded mono signal x", whereby
respective channel signals c"1-c"N are reconstructed and provided at outputs 26:1-26:N.
[0030] In most embodiments of the present disclosure, a mono signal is used as a main signal
for regenerating the channel signals at the encoding or decoding. However, in a general
approach, any predetermined linear combination of signals selected among the channel
signals may be used as such a main signal. The optimum choice of predetermined linear
combination depends on the actual application and implementation. A single channel
signal can also constitute a possible such predetermined linear combination.
[0031] Another embodiment of a multi-channel encoder 14 according to the present invention
is illustrated in Fig. 3a. Similar parts are denoted by similar reference numbers
and only the differences are discussed below.
[0032] The linear combination unit 34 provides as earlier a predetermined linear combination
of the channel signals to the mono signal encoder 38. However, in this embodiment,
the signal associated with the mono signal x is instead a decoded version x" of the
encoding parameters px representing the mono signal x. Such an arrangement, referred to
as a closed loop
approach, will allow for certain compensations of mono signal encoding inaccuracies,
as described further below.
[0033] The linear combination unit 34 of the present embodiment also combines the channel
signals in N-1 predetermined linear combinations c*1 to c*N-1, which serve as actual input
signals to the channel filter section 130. The N-1 predetermined linear combinations
c*1 to c*N-1 should be mutually linearly independent. The linear combinations c*1 to c*N-1
do not necessarily comprise contributions from all channel signals. The term "linear
combination" should in this context be understood as also comprising the special cases where
a factor of a component is set to zero. In fact, in the simplest set-up, the
linear combinations c*1 to c*N-1 can be identical to the channel signals c1 to cN-1. By
utilising a decoded mono signal x" at the decoder side, the original channel
signals can be recovered.
[0034] The modified channel signals are also in this embodiment connected to the channel
filter section 130, in which N-1 sets of filter coefficients are deduced, now corresponding
to the modified channel signals. The coefficients of the filter adaptation units 30:1-30:N
are according to the present invention optimised under a perceptual constraint.
[0035] The output of the channel filter section 130 comprises N-1 sets of filter parameters
p*1 to p*N-1. These filter parameters are typically encoded separately or jointly to be
suitable for transmission. The encoded filter parameters p*1 to p*N-1 and the encoding
parameters px representing the mono signal x are in the present embodiment transmitted separately.
[0036] Fig. 3b illustrates another embodiment of a multi-channel decoder 24 according to
the present invention. The decoder 24 in Fig. 3b is suitable for decoding multi-channel
signals encoded by the encoder of Fig. 3a. Encoding parameters px representing the mono
signal x and a set of encoded filter parameters p*1 to p*N-1 are received. The encoding
parameters px are used to generate a decoded mono signal x" in a mono signal decoder 64
in analogy with the previous embodiment. The filter parameters p*1 to p*N-1 are likewise
provided to the channel filter section 160 for obtaining N-1 decoded modified channel
signals c*1 to c*N-1. A linear combination unit 74 is then used to provide reconstructed
channel signals c"1 to c"N from the modified channel signals c*1 to c*N-1 and the decoded
mono signal x".
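The role of the linear combination unit 74 can be illustrated for the simplest set-up, in which the modified channel signals equal the first N-1 channel signals. The sketch below assumes that the mono signal is the plain sum of all N channels and ignores coding errors; the function name is hypothetical.

```python
def recover_last_channel(mono, known_channels):
    """Recover c_N, assuming mono(n) = c_1(n) + ... + c_N(n)."""
    return [m - sum(ch[n] for ch in known_channels)
            for n, m in enumerate(mono)]


c1 = [1.0, 2.0]
c2 = [0.5, -1.0]
c3 = [0.25, 0.75]
mono = [a + b + c for a, b, c in zip(c1, c2, c3)]

# Only the mono signal and N-1 channels are available at the decoder.
c3_rec = recover_last_channel(mono, [c1, c2])
```

For other predetermined linear combinations, the same step amounts to inverting the known combination matrix.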
[0037] To appreciate the importance of the perceptual constraints, an example of prior art
filter encoding will be described in more detail, basically referring to US patent 5,434,948.
This multi-channel encoding allows low bit rates if the transmission of residual
signals is omitted. To derive the channel reconstruction filter, an error minimisation
procedure based on a least-mean-square or weighted least-mean-square concept calculates
the filter such that its output signal ĉ(n) best matches the target signal c(n).
[0038] In order to compute the filter, several error measures may be used. The mean square
error and the weighted mean square error are well known and are computationally cheap
to implement. According to the least mean square approach, the filter $h^{uc}$, where
"uc" refers to "unconstrained", is valid for one frame of data and chosen such that it
minimises the squared error between the target signal and the filter output, i.e.
the square of the difference $r^{uc}(n) = c(n) - \hat{c}^{uc}(n)$, n indexing the samples
of a data frame. This error is expressed as:
$$\xi^{uc} = \sum_{n} \left(r^{uc}(n)\right)^2 = \sum_{n} \left(c(n) - \hat{c}^{uc}(n)\right)^2$$
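[0039] This leads to the following linear equation system for the filter coefficient vector
$h^{uc}$:
$$R_{xx}\, h^{uc} = r_{xc}$$
where $R_{xx}$ is the symmetric covariance matrix of the mono signal x(n):
$$\left[R_{xx}\right]_{i,j} = \sum_{n} x(n-i)\, x(n-j)$$
and where $r_{xc}$ is a vector of cross-correlations of signals x(n) and c(n):
$$\left[r_{xc}\right]_{i} = \sum_{n} c(n)\, x(n-i)$$
with i and j running over the filter tap indices.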
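The least-squares filter computation described above can be sketched in a few lines. The sketch assumes filter lags 0 to taps-1, treats samples before the start of the frame as zero, and uses a small Gaussian elimination routine instead of a numerical library; all function names are hypothetical.

```python
def solve_linear(a, b):
    """Solve a * h = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[k]] for k, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    h = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][k] * h[k] for k in range(r + 1, n))
        h[r] = (m[r][n] - tail) / m[r][r]
    return h


def unconstrained_filter(x, c, taps):
    """Least-squares FIR filter (lags 0..taps-1) predicting c(n) from x(n)."""
    def delayed(n, i):
        return x[n - i] if n - i >= 0 else 0.0

    frame = range(len(x))
    # Covariance matrix of the mono signal and cross-correlation vector.
    r_xx = [[sum(delayed(n, i) * delayed(n, j) for n in frame)
             for j in range(taps)] for i in range(taps)]
    r_xc = [sum(c[n] * delayed(n, i) for n in frame) for i in range(taps)]
    return solve_linear(r_xx, r_xc)


x = [1.0, -2.0, 3.0, 0.5, -1.5, 2.0, 0.0, 1.0]
true_h = [0.75, -0.25, 0.5]
# Target channel generated by filtering x, so the optimum is known.
c = [sum(true_h[i] * (x[n - i] if n - i >= 0 else 0.0) for i in range(3))
     for n in range(len(x))]
h_uc = unconstrained_filter(x, c, taps=3)
```

Because the target here is an exact filtered version of the mono signal, the solution recovers the generating coefficients up to rounding; for real channel signals a non-zero residual remains.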
[0040] However, as mentioned further above, the perceptual characteristics may not completely
be determined by a pure mathematical minimisation.
[0041] One very important perceptual characteristic of multi-channel signals is their energy
and especially the relative levels between the multi-channel audio signals. In the
case of stereo encoding with prior-art methods, annoying stereo image instability
where the sound source jumps periodically from left to right may be the result. Moreover,
since only one filter is needed in stereo encoding, no direct control over the left
and right predictions is achieved. According to the present invention, a gain constraint
is therefore advantageously utilised during optimisation procedures. In that context,
it may be noted that one filter per channel basically is necessary, c.f. Fig. 2a and
Fig. 2b above.
[0042] In certain situations, the predicted channels may have no frequency content above
or below a certain frequency. This occurs if, for instance, the channel is high-pass
filtered, or results from a band-splitting procedure. Spectral nulls may cause instabilities
and lead to filter responses that produce unnecessary amplification and low frequency
audible artefacts. According to the present invention, a shape constraint is therefore
advantageously utilised during optimisation procedures.
[0043] Fig. 4 illustrates the basic ideas of the constrained minimisation procedure at the
encoder side according to the present invention in an embodiment having two channels
(the stereo case) and a linear filter 31. A filter 31 responsible for reconstruction
of channel c1, having filter coefficients hc1, is derived according to a constrained error
minimisation procedure in an optimising unit 32. The filter hc1 takes as input the combined
channel signal, i.e. the mono signal x(n), which in this embodiment is a linear combination
of the two channel signals c1 and c2:
$$x(n) = \gamma_{c1}\, c_1(n) + \gamma_{c2}\, c_2(n)$$
and derives from it the output signal ĉ1(n). The factors γc1 and γc2 determine how the
channel signals are combined. One possibility is to set γc1 to a factor 2γ and γc2 to
2(1-γ). In this case, the mono signal will be a weighted sum of the channels.
In particular, a suitable setting is γ = 0.5, in which case both channels are equally
weighted. Another suitable setting may be γc1 = -γc2, in which case the mono signal is
the difference of the channel signals.
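The weighted combination of two channel signals into a main signal can be sketched as follows. The function name and the default weights of 1/2 are assumptions chosen for the illustration.

```python
def downmix(c1, c2, gamma_c1=0.5, gamma_c2=0.5):
    """Main signal x(n) = gamma_c1 * c1(n) + gamma_c2 * c2(n)."""
    return [gamma_c1 * a + gamma_c2 * b for a, b in zip(c1, c2)]


c1 = [1.0, 0.5, -0.5]
c2 = [0.0, 0.5, 1.5]
mono = downmix(c1, c2)                  # equally weighted sum
difference = downmix(c1, c2, 1.0, -1.0)  # gamma_c1 = -gamma_c2
```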
[0044] The weighted combination of the individual channel signals to form the mono signal
can in general even be the combination of filtered versions of the respective channel
signals. Such an approach will be called pre-filtering. This can be useful if the
approach is implemented in the excitation domain or in general a weighted signal domain.
For instance, the channels can be pre-filtered by an LPC (Linear Predictive Coding)
residual filter of the mono signal.
[0045] In the following, the mono and left and right channel will be assumed to be in general
some pre-filtered versions of the real mono, left and right channels. When restoring
the channels, the step of post-filtering with the mono LPC synthesis filter would
be needed in order to get back to the signal domains.
[0046] In the following, the case γc1 = 1/2 and γc2 = 1/2 is discussed in more detail.
[0047] In case of hc1 being an FIR (Finite Impulse Response) filter, ĉ1(n) is a linear
combination of delayed versions of signal x(n):
$$\hat{c}_1(n) = \sum_{i \in I} h_{c1}(i)\, x(n-i)$$
the index set being I = [i_min ... i_max]. The filter parameters p1 comprise the filter
coefficients hc1 and any additional data needed to define the filter.
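Applying such an FIR channel filter over a general index set, including negative (non-causal) lags, can be sketched as follows. The function name is hypothetical, and samples outside the frame are assumed to be zero.

```python
def apply_fir(h, x, i_min=0):
    """c_hat(n) = sum of h(i) * x(n - i) for i = i_min .. i_min + len(h) - 1."""
    out = []
    for n in range(len(x)):
        acc = 0.0
        for k, coeff in enumerate(h):
            idx = n - (i_min + k)
            if 0 <= idx < len(x):  # samples outside the frame count as zero
                acc += coeff * x[idx]
        out.append(acc)
    return out


x = [1.0, 2.0, 3.0, 4.0]
causal = apply_fir([0.5, 0.25], x)         # lags 0 and 1
non_causal = apply_fir([0.5, 0.25], x, -1)  # lags -1 and 0
```

A negative i_min lets the filter look ahead in the frame, which can be useful when one channel leads the main signal in time.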
[0048] If applying e.g. the encoding method presented in US 5,434,948, the difference signal
of two channel signals is reproduced by a filter. In Fig. 5, the right and left signals are
illustrated by the curves 301 and 302, respectively.
Assume that the representation is not ideal, giving a slightly larger difference than
the target difference over the entire frame. This will lead to a reproduced right
signal 303 at the decoder side that is slightly lower than the original right signal,
and a reproduced left signal 304 that is slightly higher than the original left signal.
The perception of such an artefact is that the volume of the right channel is decreased
and the volume of the left channel is increased. If such artefacts moreover vary in
time, the sound will swing back and forth between the right and left channel. A gain
constraint may improve such a situation.
[0049] There are several ways of implementing the gain constraint. One possible approach
is to have a hard constraint, i.e. an exact energy match between the original channel
and the estimated channel. Another is to impose a loose gain constraint such that the output
channel has a prescribed energy Ec1, which is not necessarily equal to the original channel
signal energy.
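[0050] The constrained minimisation problem can easily be solved by the Lagrange method, i.e.
by means of the Lagrange functional:
$$L(h_{c1}, \lambda) = \sum_{n} \left(c_1(n) - \hat{c}_1(n)\right)^2 + \lambda \left(\sum_{n} \hat{c}_1(n)^2 - E_{c1}\right)$$
[0051] The optimal solution gives a filter hc1 that is proportional to the unconstrained filter
$h^{uc}_{c1}$. The proportionality factor is:
$$g_{c1} = \sqrt{\frac{E_{c1}}{\left(h^{uc}_{c1}\right)^T R_{xx}\, h^{uc}_{c1}}}$$
where $R_{xx}$ is the covariance matrix of the mono signal x(n).
[0052] The gain constrained filter thereby becomes
$$h_{c1} = g_{c1}\, h^{uc}_{c1}$$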
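A gain constraint of this kind can be sketched by scaling the unconstrained filter so that its output has the prescribed energy. The sketch measures the output energy directly on the filtered frame, assumes a causal filter with zero samples before the frame, and uses hypothetical function names.

```python
import math


def apply_fir(h, x):
    """Causal FIR filtering; samples before the frame are taken as zero."""
    return [sum(h[i] * x[n - i] for i in range(len(h)) if n - i >= 0)
            for n in range(len(x))]


def gain_constrained(h_uc, x, target_energy):
    """Scale h_uc so that the filter output energy equals target_energy."""
    c_hat = apply_fir(h_uc, x)
    energy = sum(v * v for v in c_hat)
    g = math.sqrt(target_energy / energy)
    return g, [g * coeff for coeff in h_uc]


x = [1.0, -1.0, 2.0, 0.0]
h_uc = [0.5, 0.5]
c1 = [1.0, 0.0, 1.0, 2.0]

e_c1 = sum(v * v for v in c1)  # hard constraint: match the channel energy
g_c1, h_c1 = gain_constrained(h_uc, x, e_c1)
out_energy = sum(v * v for v in apply_fir(h_c1, x))
```

Passing a different target energy than that of the original channel gives the loose variant of the constraint.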
[0053] If the present encoder principle is used in a limited frequency band, a channel signal
may look like curve 305 of Fig. 6. No intensity is present below frequency f1 or above
frequency f2. However, a purely mathematical optimisation gives rise to a curve 306, which
presents some limited power also below and above the frequencies f1 and f2, respectively.
Such artefacts are perceptible.
[0054] In order to impose a certain spectral shape on the filter, a set of linear constraints
has to be imposed on the filter. The number of constraints should in general be less than
the number of coefficients of the filter.
[0055] For instance, if one wants to set a constraint of a spectral null at 0 kHz, then
a suitable constraint is:
$$\sum_{i \in I} h_{c1}(i) = 0$$
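[0056] In general, the shape constraint can be formulated by a matrix $A$ and a vector $b$ such
that
$$A\, h_{c1} = b$$
[0057] From the theory of constrained least squares, the optimal filter satisfying these
constraints is:
$$h^{sc}_{c1} = h^{uc}_{c1} - R_{xx}^{-1} A^T \left(A R_{xx}^{-1} A^T\right)^{-1} \left(A\, h^{uc}_{c1} - b\right)$$
where $h^{uc}_{c1}$ is the unconstrained filter, $R_{xx}$ is the covariance matrix of the mono
signal x(n), and "sc" refers to "shape-constrained".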
[0058] This constraint is especially useful when it is known a priori that the channel has
no frequency content in a certain frequency range.
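For the single constraint of a spectral null at 0 kHz, i.e. filter coefficients summing to zero, the constrained least-squares correction reduces to a closed form that can be sketched as follows. The covariance matrix and unconstrained filter are made-up example values, and the function names are hypothetical.

```python
def solve_linear(a, b):
    """Solve a * h = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[k]] for k, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    h = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][k] * h[k] for k in range(r + 1, n))
        h[r] = (m[r][n] - tail) / m[r][r]
    return h


def dc_null_filter(r_xx, h_uc):
    """Least-squares optimal filter subject to sum(h) = 0."""
    ones = [1.0] * len(h_uc)
    w = solve_linear(r_xx, ones)   # w = R_xx^{-1} * 1
    scale = sum(h_uc) / sum(w)     # (1^T h_uc) / (1^T R_xx^{-1} 1)
    return [h - scale * wi for h, wi in zip(h_uc, w)]


r_xx = [[4.0, 1.0], [1.0, 3.0]]  # example covariance matrix (assumed values)
h_uc = [0.9, 0.3]                 # example unconstrained filter (assumed values)
h_sc = dc_null_filter(r_xx, h_uc)
```

The correction is weighted by the inverse covariance matrix, so the constrained filter stays as close as possible to the unconstrained optimum in the least-squares sense rather than simply subtracting the mean coefficient.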
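[0059] The gain and shape constraints can also be combined. In such a case, the shape constraint
is preferably first applied and the gain constraint is then added as a factor, according
to
$$h_{c1} = g_{c1}\, h^{sc}_{c1}$$
where $h^{sc}_{c1}$ is the shape-constrained filter and $g_{c1}$ is the gain factor.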
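[0060] The filters depend on the unconstrained filters, and since c1(n) + c2(n) = 2x(n), the
latter obey the relation:
$$h^{uc}_{c1} + h^{uc}_{c2} = 2\delta$$
where δ denotes the identity filter. Useful properties can therefore be derived for the
shape-constrained filters. If the constraint matrix $A$ and the constraint vector $b$ are
identical for the two channels,
$$A_{c1} = A_{c2} = A, \quad b_{c1} = b_{c2} = b$$
then
$$h^{sc}_{c1} + h^{sc}_{c2} = 2\left[\delta - R_{xx}^{-1} A^T \left(A R_{xx}^{-1} A^T\right)^{-1} \left(A\,\delta - b\right)\right]$$
where $R_{xx}$ is the covariance matrix of the mono signal x(n).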
[0061] This equation is useful for bit rate reduction when encoding the channel filters,
since it shows that the channel filters are related by quantities that are available
at the decoder side.
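The bit rate saving can be illustrated for the unconstrained case, where the two channel filters sum to twice the identity filter, so that only one of them needs to be transmitted. The sketch assumes causal filters whose first tap is the zero-lag tap; the function names are hypothetical, and the shape-constrained case would need the additional correction term.

```python
def complementary_filter(h_c1, zero_lag_index=0):
    """Derive h_c2 = 2 * delta - h_c1 from a received filter h_c1."""
    h_c2 = [-v for v in h_c1]
    h_c2[zero_lag_index] += 2.0
    return h_c2


def apply_fir(h, x):
    """Causal FIR filtering; samples before the frame are taken as zero."""
    return [sum(h[i] * x[n - i] for i in range(len(h)) if n - i >= 0)
            for n in range(len(x))]


h_c1 = [1.25, -0.5]
h_c2 = complementary_filter(h_c1)

x = [1.0, 2.0, -1.0]
c1_hat = apply_fir(h_c1, x)
c2_hat = apply_fir(h_c2, x)
# The two reconstructed channels average back to the mono signal.
mono_check = [(a + b) / 2 for a, b in zip(c1_hat, c2_hat)]
```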
[0062] The relations between the shape constrained filters also open up for a more economical
computation of the filters. Fig. 7 illustrates that one channel c1 out of two
channels c1, c2 is reproduced by applying the mono signal x to an unconstrained filter
131. The result of the unconstrained filter is modified depending on shape constraints
in a shape constraint section 132. From the shape constrained filter for the c1 channel,
also the shape constrained filter of channel c2 can be calculated and provided to
separate gain constraint sections 133 for each channel.
[0063] A more detailed block scheme of another embodiment using a side signal for applying
the shape constraint is illustrated in Fig. 8. Two channel signals c1 and c2 are combined
in addition means 55, 57 of a linear combination unit 34 to a mono signal x and a side
signal s. A channel filter section 130 comprises an unconstrained parametric filter 131,
which applied to the mono signal x reproduces an estimate ŝ of the side signal. In an
unconstrained optimising unit 33, the filter coefficients are adapted to give the minimum
difference between s and ŝ. The filter $h^{uc}_{s}$ obtained in this manner is provided to a
shape constraint section 132, basically according to the discussions further above. A
shape-constrained filter $h^{sc}_{s}$ for the side signal is created. From the relation (1)
between channel filters in a stereo application, a shape-constrained filter for each
channel signal is calculated, based on the shape-constrained filter $h^{sc}_{s}$ for the side
signal. These filters, or rather the coefficients thereof, are provided
to a respective gain constraint section 133:1, 133:2. A gain factor for each channel
signal is calculated, and the two filters are provided to a parameter encoding section
66, where the parameters of the two filters are jointly encoded.
[0064] After calculation of the constrained channel filters
hc1 and
hc2, they are quantized and encoded in a representation, which is suitable for transmission
to the receiver. Typically, the coefficients of the filters are quantized using scalar
or vector quantizers and the quantizer indexes are transmitted. The quantizers may
also implement prediction, which is very beneficial for bit rate reduction especially
in this scenario.
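Scalar quantization of the filter coefficients can be sketched with a uniform quantizer, where only the integer indexes would be transmitted. The step size of 0.125 and the function names are assumptions for the illustration; the text does not prescribe a particular quantizer design.

```python
def quantize(coeffs, step=0.125):
    """Map each coefficient to the index of the nearest quantizer level."""
    return [round(c / step) for c in coeffs]


def dequantize(indexes, step=0.125):
    """Reconstruct coefficient values from the received indexes."""
    return [i * step for i in indexes]


h_c1 = [0.8, -0.3, 0.05]
indexes = quantize(h_c1)        # transmitted to the receiver
h_c1_rec = dequantize(indexes)  # reconstructed at the receiver
```

The reconstruction error per coefficient is bounded by half the step size; a predictive or vector quantizer would replace these two functions but keep the same interface.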
[0065] Making use of the complementarities of the filters may further reduce the bit rate
since only one of the filters
hc1 or
hc2 or a linear combination of them is quantized and transmitted while the gains
gc1 and
gc2 are jointly vector quantized and transmitted separately. Such a transmission can
be carried out at bit rates as low as, e.g. 1 kbps.
[0066] The receiver first decodes the transmitted mono signal and channel filters. Then,
it regenerates the different channel signals by filtering the mono signal through
the respective channel filter. Preferably, in the stereo case, the completeness property
is used, and the coefficients are recombined to produce the filters
hc1 and
hc2.
[0067] Certain post-processing steps that further improve the quality of the reconstructed
multi-channel signal may follow the regeneration of the different channel signals.
[0068] It is sometimes beneficial to smooth the gain of the shape-constrained filters or
a linear combination of these filters, before computing the gain constrained channel
filters.
[0069] For instance, in the case of stereo, the equivalent side signal filter is (as used
in Fig. 8):

and, in order to reduce possible artefacts, the gain difference of this filter between
successive frames is smoothed, leading to a filter

The channel filters are then modified according to:

[0070] This type of modification does not conserve the shape constraints on the individual
channel filters. However, the shape constraints are still conserved on the side signal
filter, and this is sufficient in the case of stereo coding.
[0071] The gain constraint on the filters assumes previously computed channel energies,
i.e. Ec1, Ec2. It is important to control the gains of the filters, e.g. gc1, gc2,
and to avoid unnecessary amplification by limiting the gains. Depending on the properties
of the different channel signals, it may occur that the channels are anti-correlated
over the whole frequency range or in certain frequency bands. This leads to a certain
cancellation when the mono channel is formed. In this case, since the individual channel
information has been lost, at least partially and in some frequency bands, it is often
beneficial to limit the channel gains when these are greater than a certain amount,
e.g. 0 dB. One way to perform this gain limitation is to compute a certain gain factor:

which is the ratio of the effective mono channel energy and the energy of the mono
channel if the two channels were uncorrelated. When this factor is less than 0 dB,
signal cancellation has occurred. In this case, gF quantifies how severe this
cancellation is. The gain limitation can then be computed as:
gc1(dB) = max(gc1(dB) + gF(dB),0), when gF < 0 dB.
[0072] The same limitation holds for the gain of the other channels.
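The gain limitation can be sketched as follows. The sketch assumes, for illustration only, that the mono signal is formed as x = (c1 + c2)/2, so that the energy of the mono channel for uncorrelated channels is (Ec1 + Ec2)/4; a different down-mix would change this normalisation. The limitation is applied only to gains above 0 dB, in line with the example threshold given above.

```python
import math


def energy(signal):
    """Total energy of a frame."""
    return sum(s * s for s in signal)


def cancellation_factor_db(c1, c2):
    """gF in dB: effective mono energy over the mono energy expected
    for uncorrelated channels. ASSUMPTION: mono x = (c1 + c2) / 2."""
    mono = [(a + b) / 2.0 for a, b in zip(c1, c2)]
    e_uncorrelated = (energy(c1) + energy(c2)) / 4.0
    if e_uncorrelated == 0.0:
        return 0.0  # silent frame: nothing to limit
    e_mono = energy(mono)
    if e_mono == 0.0:
        return float("-inf")  # complete cancellation
    return 10.0 * math.log10(e_mono / e_uncorrelated)


def limit_gain_db(gain_db, g_f_db):
    """gc(dB) = max(gc(dB) + gF(dB), 0) when gF < 0 dB and gc > 0 dB."""
    if g_f_db < 0.0 and gain_db > 0.0:
        return max(gain_db + g_f_db, 0.0)
    return gain_db
```

With fully anti-correlated channels the factor tends to minus infinity and any gain above 0 dB is pulled back to 0 dB, so that lost channel information is not amplified.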
[0073] Not only the channel filter parameters but also the mono signal need to be encoded
and transmitted. There are two principal approaches to taking the mono signal audio
coding into account when deriving the channel filter coefficients.
[0074] In an open-loop fashion, the filters are derived based on the original mono signal.
This is e.g. the case in Fig. 2a, where the signal 42 is the original mono signal
x. The decoder, however, will use a quantized mono signal as input for the channel
filtering.
[0075] In a closed-loop fashion, the filter calculations are based on the coded and thus
already quantized mono signal. This is e.g. the case in Fig. 3a, where the signal
44 is a decoded mono signal x". This approach has the advantage that the channel filter
design does not only aim to match the respective channel signals in a best possible
way. It also aims to mitigate coding errors, which are the result of the mono signal
encoding.
[0076] The principles described hitherto are applicable to the complete spectrum, i.e. full-band
signals. However, they are equally well or even more beneficially applicable to sub-bands
of the signals. Fig. 9 illustrates the principles of sub-band processing. A number
of channels c1 - cN are each divided into K sub-bands SB1, SB2, ..., SBK. The channel
signals in each sub-band are provided to a respective multi-channel encoder unit
80:1-80:K, where the channel
signals are encoded. One or several of the multi-channel encoder units 80:1-80:K can
be multi-channel encoder units according to the present invention. A bit-stream combiner
82 combines the encoded signals into a common encoded signal 53, which is transmitted.
[0077] Advantages of the described sub-band processing are that the multi-channel encoding
for the different sub-bands can be carried out individually optimised with respect
to e.g. assigned bit rate, processing frame sizes and sampling rate.
[0078] One special kind of sub-band processing does not carry out multi-channel encoding
for very low frequencies, e.g. below 200 Hz. That means that for this very low frequency
band, a mere mono signal is transmitted. This principle makes use of the fact that
the human stereo perception is less sensitive for very low frequencies. It is known
from prior art and called sub-woofing.
[0079] In a further embodiment of the sub-band processing the band splitting is done using
a time-frequency transform such as, e.g. a short term Fourier transform (STFT), which
allows decomposing the signal into single frequency components. In this case, the
filtering reduces to a mere multiplication of the individual spectral coefficients
of the mono signal with a complex factor.
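The reduction of filtering to a per-coefficient multiplication can be illustrated with a small sketch. It uses a naive discrete Fourier transform on a single frame and omits the windowing and overlap of a real STFT; the function names are illustrative. The complex factors are assumed to be conjugate symmetric so that the result is real.

```python
import cmath


def dft(x):
    """Naive forward DFT of one frame."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Naive inverse DFT."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def apply_channel_factors(mono_frame, factors):
    """Multiply each spectral coefficient of the mono frame by its complex factor.

    The real part is returned; the imaginary part vanishes when the
    factors are conjugate symmetric (assumed here).
    """
    shaped = [h * x for h, x in zip(factors, dft(mono_frame))]
    return [v.real for v in idft(shaped)]
```

In this frame-wise form the multiplication corresponds to a circular convolution of the frame with the inverse transform of the factors.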
[0080] The parametric multi-channel coding method according to the invention will typically
involve fixed frame-wise processing of signal samples. In other words, parameters
describing the multi-channel image are derived and transmitted with a rate corresponding
to a coding frame length of, e.g. 20 ms. The parameters may, however, be obtained
from signal frames which are much larger than the coding frame length. A suitable
choice is to set the length of such analysis frames to values larger than the coding
frame length. This implies that the parameter calculation is performed with overlapping
analysis frames.
[0081] This is illustrated in Fig. 10. Analysis frames 83 at the encoder are slightly longer
than encoding frames 84, as shown in the top of the figure. A consequence of such
overlapping analysis frames is that the parameters evolve smoothly, which is essential
in order to provide a stable multi-channel audio signal impression. The same is performed
at the decoder side, shown in the middle of the figure. It is thus essential in the
decoder to take account of this and to window and overlap-add synthesis frames 85,
with an overlap 86, as shown at the bottom of the figure. This allows a smooth transition
between filters associated with each frame.
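A minimal sketch of the window and overlap-add operation at the decoder is given below. The periodic Hann window and the 50 percent overlap are assumptions chosen because overlapping windows then sum to one; the description does not prescribe a particular window or overlap length.

```python
import math


def hann_window(length):
    """Periodic Hann window; shifted copies sum to 1 at 50 percent overlap."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / length) for n in range(length)]


def overlap_add(frames, hop):
    """Window each synthesis frame and add it at its position in the output.

    frames: equally long synthesis frames, each filtered with the
            channel filter of its own frame.
    hop:    frame advance in samples (assumed to be half the frame length).
    """
    length = len(frames[0])
    window = hann_window(length)
    out = [0.0] * (hop * (len(frames) - 1) + length)
    for i, frame in enumerate(frames):
        for n, sample in enumerate(frame):
            out[i * hop + n] += window[n] * sample
    return out
```

In the overlap region the output is a cross-fade between the signals produced with the filters of two successive frames, which yields the smooth transition described above.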
[0082] Also at the encoder, smooth filter parameter evolution can be enforced. It is, e.g.
possible to apply low-pass or median filtering to the filter parameters.
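Median filtering of one filter parameter over successive frames can be sketched as follows. The window width of three frames and the shortened window at the edges are choices made for this sketch only.

```python
from statistics import median


def median_smooth(param_track, width=3):
    """Median-filter the trajectory of one filter parameter across frames.

    param_track: values of the parameter for successive frames.
    width:       odd window width in frames (hypothetical default).
    At the edges the window is truncated to the available frames.
    """
    half = width // 2
    smoothed = []
    for i in range(len(param_track)):
        lo = max(0, i - half)
        hi = min(len(param_track), i + half + 1)
        smoothed.append(median(param_track[lo:hi]))
    return smoothed
```

An isolated outlier in a single frame is removed entirely, whereas a low-pass filter would only attenuate it and spread it over neighbouring frames.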
[0083] State-of-the-art monophonic audio codecs as well as speech codecs perform so-called
noise shaping of the coding noise. The purpose of this operation is to move coding
noise to frequencies where the signal has high spectral density and thus render the
noise less audible. Noise shaping is usually done adaptively, i.e. in response to
the audio signal. This implies that, in general, the noise shaping performed on the
mono signal will be different from what is required for the various channel signals.
As a result, despite proper noise shaping in the mono audio codec, the subsequent
channel filtering according to the invention may lead to an audible coding noise increase
in the reconstructed multi-channel signal compared to the audible coding noise
in the mono signal.
[0084] In order to mitigate this problem, signal-adaptive post-filtering may be applied
to the reconstructed channel signals in a post-processing step of the receiver. Any
state-of-the-art post-filtering techniques can be deployed here, which essentially
emphasise spectral tops or deepen spectral valleys and thereby reduce the audible
noise. One example of such a technique is so-called high-resolution post-filtering,
which is described in the European Patent 0 965 123 B1 by E. Ekudden et al. Other
simple methods are so-called pitch and formant post-filters, which
are known from speech coding.
[0085] In Fig. 11, the main steps of an embodiment of an encoding method according to the
present invention are illustrated as a flow diagram. The procedure starts in step
200. In step 220, a main signal, preferably a mono signal, deduced from the multi-channel
signals is encoded. In step 222, filter coefficients are optimised to give as good a
representation as possible of a channel signal when applied to the main signal. The
optimising takes place under perceptual constraints. The optimal coefficients are
then encoded in step 224. The procedure ends in step 299.
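The optimisation of step 222 can be illustrated, without the perceptual constraints, by an ordinary least-squares fit of a short causal FIR filter. This is a sketch under stated assumptions: the filter is causal, the normal equations are non-singular, and the helper names are illustrative and not part of the described method.

```python
def delayed(x, k):
    """The signal x delayed by k samples, zero-padded at the start."""
    return [0.0] * k + x[:len(x) - k]


def solve(a, b):
    """Solve a * sol = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):
        rest = sum(m[r][c] * sol[c] for c in range(r + 1, n))
        sol[r] = (m[r][n] - rest) / m[r][r]
    return sol


def least_squares_filter(mono, channel, order):
    """Taps minimising the squared error between the channel signal and
    the filtered mono signal (normal equations R h = p)."""
    cols = [delayed(mono, k) for k in range(order)]
    r = [[sum(u * v for u, v in zip(cols[i], cols[j])) for j in range(order)]
         for i in range(order)]
    p = [sum(u * v for u, v in zip(cols[i], channel)) for i in range(order)]
    return solve(r, p)
```

The resulting unconstrained taps would then be subjected to the gain and shape constraints before being encoded in step 224.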
[0086] The embodiments described above are to be understood as a few illustrative examples
of the present invention. It will be understood by those skilled in the art that various
modifications, combinations and changes may be made to the embodiments without departing
from the scope of the present invention. In particular, different part solutions in
the different embodiments can be combined into other configurations, where technically
possible. The scope of the present invention is, however, defined by the appended
claims.
ANNEX
[0093] There is provided a method of coding multi-channel signals (c1-cN) comprising at
least a first and a second channel, comprising the steps of:
generating encoding parameters (px) representing a main signal (x) being a first predetermined linear combination of
signals of the multi-channel signals (c1-cN);
deriving optimal parameters (p1-pN) of a first adaptive filter (31; 131, 132, 133:1-2); and encoding the optimal parameters
(p1-pN),
characterised by the further step of:
deriving optimal parameters (p1-pN) of at least a second adaptive filter (31; 131, 132, 133:1-2);
said first adaptive filter (31; 131, 132, 133:1-2) being derived to give a minimum
difference between the signal of the first channel (c1-cN) and a filter output signal when the first adaptive filter (31; 131, 132, 133:1-2)
is applied on the first predetermined linear combination (x);
the minimum difference being defined according to a first criterion;
said second adaptive filter being derived to give a minimum difference between the
signal of the second channel (c1-cN) and a filter output signal when the second adaptive filter is applied on the first
predetermined linear combination (x);
the minimum difference being defined according to a second criterion; and
whereby the deriving steps of said first and said second adaptive filters (31; 131,
132, 133:1-2) are performed under at least one perceptual constraint selected from
the group of gain constraint and shape constraint.
[0094] At least one of the first criterion and the second criterion may be a least mean
square criterion. The perceptual constraint may be at least a gain constraint, striving
to give a total energy of the filter output signal equal to a total energy of the
signal of the first channel. The gain constraint may be an absolute constraint, demanding
that the total energy of the adaptive filter output signal is equal to the total energy
of the signal of the corresponding channel.
[0095] The gain constraint may be a soft constraint, favouring adaptive filters giving the
total energy of the adaptive filter output signal close to the total energy of the
signal of the corresponding channel. The gain constraint may be imposed as a gain
factor (gc1, gcN) times an adaptive filter derived without gain constraints. The gain
constrained filter $h_c$ may be given by:
$$h_c = \sqrt{\frac{E_c}{\sum_n \hat{c}_{uc}^2(n)}}\; h_{uc}$$
where $h_{uc}$ is the adaptive filter derived without gain constraints, $E_c$ is a
prescribed energy of the adaptive filter output signal and $\hat{c}_{uc}(n)$ is the
adaptive filter output for the main signal $x(n)$ without gain constraints.
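A minimal sketch of imposing the gain constraint as a gain factor is given below. It reads the gain factor as the square root of the ratio between the prescribed energy and the energy of the unconstrained filter output, which is an interpretation of the wording above; the causal FIR form and the function names are likewise assumptions.

```python
import math


def fir_output(signal, taps):
    """Causal FIR filtering of the main signal with the given taps."""
    return [sum(h * signal[n - k] for k, h in enumerate(taps) if n - k >= 0)
            for n in range(len(signal))]


def gain_constrain(h_uc, main_signal, prescribed_energy):
    """Scale the unconstrained filter so that its output on the main
    signal has the prescribed energy Ec."""
    c_uc = fir_output(main_signal, h_uc)       # unconstrained output
    e_uc = sum(v * v for v in c_uc)            # its total energy
    if e_uc == 0.0:
        return list(h_uc)                      # nothing to scale
    gain = math.sqrt(prescribed_energy / e_uc)
    return [gain * h for h in h_uc]
```

Because filtering is linear, scaling all taps by the gain factor scales the output energy by the square of that factor, so the constrained output energy equals the prescribed energy.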
[0096] The perceptual constraint may be at least a shape constraint, imposing a predefined
spectral shape on the adaptive filter (31; 131, 132, 133:1-2). The shape constraint
may impose null content in a predefined frequency range. The step of encoding the
optimal parameters (p1-pN) may comprise jointly coding of the optimal parameters of
the first and second adaptive
filters.
[0097] The step of deriving parameters in turn may comprise the steps of:
creating a second predetermined linear combination (s; c*1-c*N-1) of the signals of the multi-channel signals (c1-cN);
deriving parameters of a third filter to give a minimum difference between the second
predetermined linear combination and the filter output signal when the third filter
is applied on the first predetermined linear combination, under the shape constraint;
computing the optimal parameters of the first and second filters as a function of
the optimal parameters of the third filter.
[0098] The step of deriving may be performed based on the encoding parameters (px)
representing the main signal (x). The step of deriving may be performed based directly
on the first predetermined linear combination (x). The multi-channel signals may comprise
more than two channels, whereby the main signal is based on a first predetermined
linear combination (x) of all the more than two channels, and the signal of each channel
is represented by a separate adaptive filter, optimised under the perceptual constraint.
[0099] There is further provided a method of decoding polyphonic signals comprising encoding
parameters (px) representing a main signal and encoded optimal parameters of a first
adaptive filter (60:1-60:N), comprising the steps of:
decoding the encoding parameters (px) representing the main signal;
generating a signal of a first channel (c"1-c"N) by applying the first adaptive filter (60:1-60:N) to the decoded main signal (x"),
characterised in that
the encoding parameters (px) further represent encoded optimal parameters of a second adaptive filter (60:1-60:N);
the method further comprising the step of generating a signal of a second channel
(c"1-c"N) by applying the second adaptive filter (60:1-60:N) to the decoded main signal (x");
and
the first and second adaptive filters (30:1-30:N) being optimised under at least one
perceptual constraint selected from the group of gain constraint and shape constraint.
[0100] The method may further comprise the step of: generating a signal of a second channel
(c"1-c"N) as a predetermined linear combination of the decoded main signal (x") and
the signal of the first channel (c"1-c"N).
[0101] There is further provided an encoder apparatus (14), comprising:
input (16:1-16:N) for multi-channel signals (c1-cN) comprising at least a first and a second channel;
means (38) for generating encoding parameters (px) representing a main signal (x) being a first predetermined linear combination of
signals of the multi-channel signals (c1-cN), which means (38) for generating being connected to the input (16:1-16:N);
means (31; 131, 132, 133:1-2) for deriving optimal parameters of a first adaptive
filter;
means (66) for encoding the optimal parameters; and
output means (52);
characterised by:
means (31; 131, 132, 133:1-2) for deriving optimal parameters of a second adaptive
filter;
the first adaptive filter giving minimum difference between the signal of the first
channel (c1-cN) and the filter output signal when the first adaptive filter is applied on the first
predetermined linear combination (x);
the minimum difference being defined according to a first criterion;
the second adaptive filter giving minimum difference between the signal of the second
channel (c1-cN) and the filter output signal when the second adaptive filter is applied on the first
predetermined linear combination (x);
the minimum difference being defined according to a second criterion;
whereby the means (31; 131, 132, 133:1-2) for deriving optimal parameters of said
first and said second adaptive filters being arranged for deriving the optimal parameters
under at least one perceptual constraint selected from the group of gain constraint
and shape constraint.
[0102] There is further provided a decoder apparatus (24), comprising:
input (54) for encoding parameters (px) representing a main signal (x) and encoded optimal parameters of a first adaptive
filter;
means (64) for decoding the encoding parameters (px) representing a main signal (x);
means (60:1-60:N) for generating signals of a first channel by applying the first
adaptive filter to the decoded main signal (x"),
characterised in that
the encoding parameters (px) further represent encoded optimal parameters of a second adaptive filter (60:1-60:N);
the decoder apparatus further comprising means (60:1-60:N) for generating signals
of a second channel by applying the second adaptive filter to the decoded main signal
(x"); and
the first and second adaptive filters being optimised under at least one perceptual
constraint selected from the group of gain constraint and shape constraint.
1. A method of coding multi-channel signals (c1-cN) comprising at least a first and a
second channel, comprising the steps of:
generating encoding parameters (px) representing a main signal (x) being a first predetermined linear combination of
signals of the multi-channel signals (c1-cN);
deriving optimal parameters (p1-pN) of a first adaptive filter (31; 131, 132, 133:1-2); and encoding the optimal parameters
(p1-pN),
characterised by the further step of:
deriving optimal parameters (p1-pN) of at least a second adaptive filter (31; 131, 132, 133:1-2);
said first adaptive filter (31; 131, 132, 133:1-2) being derived to give a first minimum
difference between the signal of the first channel (c1-cN) and a filter output signal when the first adaptive filter (31; 131, 132, 133:1-2)
is applied on the first predetermined linear combination (x);
the first minimum difference being defined according to a first criterion;
said second adaptive filter being derived to give a second minimum difference between
the signal of the second channel (c1-cN) and a filter output signal when the second adaptive filter is applied on the first
predetermined linear combination (x);
the second minimum difference being defined according to a second criterion; and
whereby the deriving steps of said first and said second adaptive filters (31; 131,
132, 133:1-2) are performed under a gain constraint, whereby the gain constraint preserves
the relative energy between the channels.
2. A method according to claim 1, characterised in that at least one of the first criterion and the second criterion is a least mean square
criterion.
3. A method according to claim 1 or 2, characterised in that the gain constraint strives to give a total energy of the filter output signal equal
to a total energy of the signal of the first channel.
4. A method according to claim 3, characterised in that the gain constraint is an absolute constraint, demanding that the total energy of
the adaptive filter output signal is equal to the total energy of the signal of the
corresponding channel.
5. A method according to claim 3, characterised in that the gain constraint is a soft constraint, favouring adaptive filters giving the total
energy of the adaptive filter output signal close to the total energy of the signal
of the corresponding channel.
6. A method according to claim 3, characterised in that the gain constraint is imposed as a gain factor (gc1, gcN) times an adaptive filter derived without gain constraints.
7. A method according to claim 6,
characterised in that the gain constrained filter $h_c$ is given by:
$$h_c = \sqrt{\frac{E_c}{\sum_n \hat{c}_{uc}^2(n)}}\; h_{uc}$$
where $h_{uc}$ is the adaptive filter derived without gain constraints, $E_c$ is a
prescribed energy of the adaptive filter output signal and $\hat{c}_{uc}(n)$ is the
adaptive filter output for the main signal $x(n)$ without gain constraints.
8. A method according to any of the claims 1 to 7, characterised in that the step of encoding the optimal parameters (p1-pN) comprises jointly coding of the optimal parameters of the first and second adaptive
filters.
9. A method according to any of the claims 1 to 8, characterised in that the step of deriving is performed based on the encoding parameters (px) representing the main signal (x).
10. A method according to any of the claims 1 to 8, characterised in that the step of deriving is performed based directly on the first predetermined linear
combination (x).
11. A method according to any of the claims 1 to 10, characterised in that the multi-channel signals comprise more than two channels, whereby the main signal
is based on a first predetermined linear combination (x) of all the more than two
channels, and the signal of each channel is represented by a separate adaptive filter,
optimised under the perceptual constraint.
12. A method of decoding polyphonic signals comprising encoding parameters (px)
representing a main signal and encoded optimal parameters of a first adaptive filter
(60:1-60:N), comprising the steps of:
decoding the encoding parameters (px) representing the main signal;
generating a signal of a first channel (c"1-c"N) by applying the first adaptive filter (60:1-60:N) to the decoded main signal (x"),
characterised in that
the encoding parameters (px) further represent encoded optimal parameters of a second adaptive filter (60:1-60:N);
the method further comprising the step of generating a signal of a second channel
(c"1-c"N) by applying the second adaptive filter (60:1-60:N) to the decoded main signal (x");
and
the first and second adaptive filters (30:1-30:N) being optimised under a gain constraint,
whereby the gain constraint preserves the relative energy between the channels.
13. A method according to claim 12,
characterised by the step of:
generating a signal of a second channel (c"1-c"N) as a predetermined linear combination of the decoded main signal (x") and the signal
of the first channel (c"1-c"N).
14. Encoder apparatus (14), comprising:
input (16:1-16:N) for multi-channel signals (c1-cN) comprising at least a first and a second channel;
means (38) for generating encoding parameters (px) representing a main signal (x) being a first predetermined linear combination of
signals of the multi-channel signals (c1-cN), which means (38) for generating being connected to the input (16:1-16:N);
means (31; 131, 132, 133:1-2) for deriving optimal parameters of a first adaptive
filter;
means (66) for encoding the optimal parameters; and
output means (52);
characterised by:
means (31; 131, 132, 133:1-2) for deriving optimal parameters of a second adaptive
filter;
the first adaptive filter giving a first minimum difference between the signal of
the first channel (c1-cN) and the filter output signal when the first adaptive filter is applied on the first
predetermined linear combination (x);
the first minimum difference being defined according to a first criterion;
the second adaptive filter giving a second minimum difference between the signal of
the second channel (c1-cN) and the filter output signal when the second adaptive filter is applied on the first
predetermined linear combination (x);
the second minimum difference being defined according to a second criterion;
whereby the means (31; 131, 132, 133:1-2) for deriving optimal parameters of said
first and said second adaptive filters being arranged for deriving the optimal parameters
under a gain constraint, whereby the gain constraint preserves the relative energy
between the channels.
15. Decoder apparatus (24), comprising:
input (54) for encoding parameters (px) representing a main signal (x) and encoded optimal parameters of a first adaptive
filter;
means (64) for decoding the encoding parameters (px) representing a main signal (x);
means (60:1-60:N) for generating signals of a first channel by applying the first
adaptive filter to the decoded main signal (x"),
characterised in that
the encoding parameters (px) further represent encoded optimal parameters of a second adaptive filter (60:1-60:N);
the decoder apparatus further comprising means (60:1-60:N) for generating signals
of a second channel by applying the second adaptive filter to the decoded main signal
(x"); and
the first and second adaptive filters being optimised under a gain constraint, whereby
the gain constraint preserves the relative energy between the channels.