TECHNICAL FIELD OF THE INVENTION
[0001] The present invention generally relates to audio encoding and decoding techniques,
and more particularly to multi-channel audio encoding such as stereo coding.
BACKGROUND OF THE INVENTION
[0002] There is a high market need to transmit and store audio signals at low bit rates
while maintaining high audio quality. Particularly in cases where transmission resources
or storage are limited, low bit-rate operation is an essential cost factor. This is
typically the case, for example, in streaming and messaging applications in mobile
communication systems such as GSM, UMTS, or CDMA.
[0003] A general example of an audio transmission system using multi-channel coding and
decoding is schematically illustrated in Fig. 1. The overall system basically comprises
a multi-channel audio encoder 100 and a transmission module 10 on the transmitting
side, and a receiving module 20 and a multi-channel audio decoder 200 on the receiving
side.
[0004] The simplest way of stereophonic or multi-channel coding of audio signals is to encode
the signals of the different channels separately as individual and independent signals,
as illustrated in Fig. 2. However, this means that the redundancy among the plurality
of channels is not removed, and that the bit-rate requirement will be proportional
to the number of channels.
[0005] Another basic approach, used in stereo FM radio transmission, ensures compatibility
with legacy mono radio receivers by transmitting a sum and a difference signal of the
two involved channels.
[0006] State-of-the art audio codecs such as MPEG-1/2 Layer III and MPEG-2/4 AAC make use
of so-called joint stereo coding. According to this technique, the signals of the
different channels are processed jointly rather than separately and individually.
The two most commonly used joint stereo coding techniques are known as 'Mid/Side'
(M/S) Stereo and intensity stereo coding which usually are applied on sub-bands of
the stereo or multi-channel signals to be encoded.
[0007] M/S stereo coding is similar to the procedure described for stereo FM radio, in the
sense that it encodes and transmits the sum and difference signals of the channel
sub-bands and thereby exploits redundancy between the channel sub-bands. The structure
and operation of a coder based on M/S stereo coding is described, e.g. in reference
[1].
[0008] Intensity stereo, on the other hand, is able to exploit stereo irrelevancy. It
transmits the joint intensity of the channels (of the different sub-bands) along with
some location information indicating how the intensity is distributed among the channels.
Intensity stereo provides only spectral magnitude information of the channels,
while phase information is not conveyed. For this reason, and since temporal inter-channel
information (more specifically the inter-channel time difference) is of major psychoacoustic
relevance, particularly at lower frequencies, intensity stereo can only be used at
high frequencies above e.g. 2 kHz. An intensity stereo coding method is described,
e.g. in reference [2].
[0009] A recently developed stereo coding method called Binaural Cue Coding (BCC) is described
in reference [3]. This method is a parametric multi-channel audio coding method. The
basic principle of this kind of parametric coding technique is that, at the encoding
side, the input signals from N channels are combined into one mono signal. The mono signal
is audio encoded using any conventional monophonic audio codec. In parallel, parameters
are derived from the channel signals, which describe the multi-channel image. The
parameters are encoded and transmitted to the decoder, along with the audio bit stream.
The decoder first decodes the mono signal and then regenerates the channel signals
based on the parametric description of the multi-channel image.
[0010] The principle of the Binaural Cue Coding (BCC) method is that it transmits the encoded
mono signal and so-called BCC parameters. The BCC parameters comprise coded inter-channel
level differences and inter-channel time differences for sub-bands of the original
multi-channel input signal. The decoder regenerates the different channel signals
by applying sub-band-wise level and phase and/or delay adjustments of the mono signal
based on the BCC parameters. The advantage over e.g. M/S or intensity stereo is that
stereo information comprising temporal inter-channel information is transmitted at
much lower bit rates. However, BCC is computationally demanding and generally not
perceptually optimized.
[0011] Another technique, described in reference [4], uses the same principle of encoding
of the mono signal and so-called side information. In this case, the side information
consists of predictor filters and optionally a residual signal. The predictor filters,
estimated by an LMS algorithm, allow prediction of the multi-channel audio signals
when applied to the mono signal. With this technique one is able to reach very
low bit-rate encoding of multi-channel audio sources, however at the expense of a
drop in quality.
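The LMS-based prediction outlined above can be sketched as follows. This is a hypothetical illustration under simplifying assumptions (function names, the normalized LMS variant, and the parameter values are my own), not the exact method of reference [4]:

```python
import numpy as np

def lms_predict(mono, target, order=8, mu=0.5):
    """Adaptively estimate an FIR predictor mapping the mono signal to a
    target channel using a normalized LMS update (illustrative sketch)."""
    w = np.zeros(order)
    pred = np.zeros_like(target, dtype=float)
    for n in range(order - 1, len(mono)):
        x = mono[n - order + 1:n + 1][::-1]   # mono[n], mono[n-1], ...
        pred[n] = w @ x
        e = target[n] - pred[n]               # prediction error (residual)
        w += mu * e * x / (x @ x + 1e-9)      # normalized LMS step
    return w, pred
```

The residual e could optionally be encoded and transmitted as well, as the text notes, at the cost of extra bits.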
[0012] The basic principles of such parametric stereo coding are illustrated in Fig. 3,
which displays a layout of a stereo codec, comprising a down-mixing module 120, a
core mono codec 130, 230 and a parametric stereo side information encoder/decoder
140, 240. The down-mixing transforms the multi-channel (in this case stereo) signal
into a mono signal. The objective of the parametric stereo codec is to reproduce a
stereo signal at the decoder given the reconstructed mono signal and additional stereo
parameters.
[0013] Finally, for completeness, a technique used in 3D audio should be mentioned.
This technique synthesizes the right and left channel signals by filtering sound source
signals with so-called head-related filters. However, this technique requires the
different sound source signals to be separated and can thus not generally be applied
to stereo or multi-channel coding.
[0014] US 5 974 380 relates to a multi-channel audio encoder with global bit allocation over time, (lower
and higher) frequency and channels to encode/decode a data stream to generate high
fidelity reconstructed audio. The coder filters audio frames into baseband and high
frequency ranges, and employs a high frequency encoding stage for encoding the high
frequency part independently of the baseband part.
[0015] WO 02/23528 relates to multi-channel linear predictive analysis-by-synthesis encoding, in which
inter-channel correlation is detected, and one of several possible encoding modes
is selected based on the correlation, and bits are adaptively distributed between
channel-specific fixed codebooks and a shared fixed codebook depending on the selected
encoding mode. By way of example, for low correlation, channel-specific fixed codebooks
are used, and for high correlation the shared fixed codebook is used.
[0016] WO 03/090207 relates to encoding of multi-channel audio signals into a monaural audio signal and
additional information allowing recovery of the multi-channel audio signal. The additional
information is generated by determining a first portion for a first frequency region
and a second portion for a second frequency region, where the second region is a sub-range
of the first region. The information is multi-layered to enable a scaling of the decoding
quality versus bit rate. The first portion forms a base layer always present, and
the second portion forms an enhancement layer which is encoded only if the bit rate
of the encoded base layer and enhancement layer is not higher than a maximum allowable
bit rate.
SUMMARY OF THE INVENTION
[0017] The present invention overcomes these and other drawbacks of the prior art arrangements.
[0018] It is a general object of the present invention to provide high multi-channel audio
quality at low bit rates.
[0019] In particular, it is desirable to provide an efficient encoding process that is capable
of accurately representing stereophonic or multi-channel information using a relatively
low number of encoding bits. For stereo coding, for example, it is important that
the dynamics of the stereo image are well represented so that the quality of stereo
signal reconstruction is enhanced.
[0020] It is also an object of the invention to make efficient use of the available bit
budget for a multi-stage side signal encoder.
[0021] It is a particular object of the invention to provide a method and apparatus for
encoding a multi-channel audio signal as defined in claims 1 and 18.
[0022] Another particular object of the invention is to provide a method and apparatus for
decoding an encoded multi-channel audio signal as defined in claims 17 and 34.
[0023] Yet another object of the invention is to provide an improved audio transmission
system based on audio encoding and decoding techniques as defined in claim 35.
[0024] These and other objects are met by the invention as defined by the accompanying patent
claims.
[0025] Today, there are no standardized codecs available providing high stereophonic or
multi-channel audio quality at bit rates which are economically interesting for use
in e.g. mobile communication systems. What is possible with available codecs is monophonic
transmission and/or storage of the audio signals. To some extent also stereophonic
transmission or storage is available, but bit rate limitations usually require limiting
the stereo representation quite drastically.
[0026] The invention overcomes these problems by proposing a solution that allows stereophonic
or multi-channel information to be separated from the audio signal and accurately
represented at a low bit rate.
[0027] A basic idea of the invention is to provide a highly efficient technique for encoding
a multi-channel audio signal. The invention relies on the basic principle of encoding
a first signal representation of one or more of the multiple channels in a first signal
encoding process and encoding a second signal representation of one or more of the
multiple channels in a second, multi-stage, signal encoding process. This procedure
is significantly enhanced by adaptively allocating a number of encoding bits among
the different encoding stages of the second, multi-stage, signal encoding process
in dependence on multi-channel audio signal characteristics.
[0028] For example, if the performance of one of the stages in the multi-stage encoding
process is saturating, there is no point in increasing the number of bits allocated for
encoding/quantization at this particular encoding stage. Instead it may be better
to allocate more bits to another encoding stage in the multi-stage encoding process
so as to provide a greater overall improvement in performance. For this reason it
has turned out to be particularly beneficial to perform bit allocation based on estimated
performance of at least one encoding stage. The allocation of bits to a particular
encoding stage may for example be based on estimated performance of that encoding
stage. Alternatively, however, the encoding bits are jointly allocated among the different
encoding stages based on the overall performance of a combination of encoding stages.
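As a minimal sketch of such joint allocation (the function name and the per-stage quality curves are hypothetical; in practice the curves would be estimated per frame from the signal):

```python
def allocate_bits(total_bits, perf_stage1, perf_stage2):
    """Split a bit budget between two encoding stages so that the estimated
    combined performance is maximized. perf_stageN(b) returns the estimated
    quality contribution of stage N when given b bits."""
    best_b1 = max(range(total_bits + 1),
                  key=lambda b1: perf_stage1(b1) + perf_stage2(total_bits - b1))
    return best_b1, total_bits - best_b1
```

If stage 1 saturates (its quality curve flattens), the search automatically diverts the remaining bits to stage 2, which is exactly the behavior motivated above.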
[0029] For example, the first encoding process may be a main encoding process and the first
signal representation may be a main signal representation. The second encoding process,
which is a multi-stage process, may for example be a side signal process, and the
second signal representation may then be a side signal representation such as a stereo
side signal.
[0030] Preferably, the bit budget available for the second, multi-stage, signal encoding
process is adaptively allocated among the different encoding stages based on inter-channel
correlation characteristics of the multi-channel audio signal. This is particularly
useful when the second multi-stage signal encoding process includes a parametric encoding
stage such as an inter-channel prediction (ICP) stage. In the event of low inter-channel
correlation, the parametric (ICP) filter, as a means for multi-channel or stereo coding,
will normally produce a relatively poor estimate of the target signal. Therefore,
increasing the number of allocated bits for filter quantization does not lead to significantly
better performance. The effect of saturation of performance of the ICP filter and
in general of parametric coding makes these techniques quite inefficient in terms
of bit usage. In fact, the bits could be used for different encoding in another encoding
stage, such as e.g. non-parametric coding, which in turn could result in greater overall
improvement in performance.
[0031] In a particular embodiment, the invention involves a hybrid parametric and non-parametric
encoding process and overcomes the problem of parametric quality saturation by exploiting
the strengths of (inter-channel prediction) parametric representations and non-parametric
representations based on efficient allocation of available encoding bits among the
parametric and non-parametric encoding stages.
[0032] Preferably, the procedure of allocating bits to a particular encoding stage is based
on assessment of estimated performance of the encoding stage as a function of the
number of bits to be allocated to the encoding stage.
[0033] In general, the bit-allocation can also be made dependent on performance of an additional
stage or the overall performance of two or more stages. For example, the bit allocation
can be based on the overall performance of the combination of both parametric and
non-parametric representations.
[0034] For example, consider the case of a first adaptive inter-channel prediction (ICP)
stage for second-signal prediction. The estimated performance of the ICP encoding
stage is normally based on determining a relevant quality measure. Such a quality
measure could for example be estimated based on the so-called second-signal prediction
error, preferably together with an estimation of a quantization error as a function
of the number of bits allocated for quantization of second signal reconstruction data
generated by the inter-channel prediction. The second signal reconstruction data is
typically the inter-channel prediction (ICP) filter coefficients.
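A toy model of such a quality measure combines the (bit-independent) prediction error with a quantization error that decays with the number of bits. The exponential decay follows the usual high-rate quantization approximation; the constants and the function name are purely illustrative:

```python
def estimated_stage_quality(pred_error_energy, n_bits, dim, c=1.0):
    """Estimated quality (negative total error) of an ICP stage: the residual
    prediction error plus a quantization error modeled as c * 2^(-2b/dim),
    where dim is the number of quantized filter coefficients."""
    quant_error = c * 2.0 ** (-2.0 * n_bits / dim)
    return -(pred_error_energy + quant_error)
```

For low inter-channel correlation, pred_error_energy dominates, so adding bits barely helps; this is the saturation effect described above.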
[0035] In a particularly advantageous embodiment, the second, multi-stage, signal encoding
process further comprises an encoding process in a second encoding stage for encoding
a representation of the signal prediction error from the first stage.
[0036] The second signal encoding process normally generates output data representative
of the bit allocation, as this will be needed on the decoding side to correctly interpret
the encoded/quantized information in the form of second signal reconstruction data.
On the decoding side, a decoder receives bit allocation information representative
of how the bit budget has been allocated among the different signal encoding stages
during the second signal encoding process. This bit allocation information is used
for interpreting the second signal reconstruction data in a corresponding second,
multi-stage, signal decoding process for the purpose of correctly decoding the second
signal representation.
[0037] For further improvement of the multi-channel audio encoding mechanism, it is also
possible to use an efficient variable dimension/variable-rate bit allocation based
on the performance of the second encoding process or at least one of the encoding
stages thereof. In practice, this normally means that a combination of the number of bits
to be allocated to the first encoding stage and the filter dimension/length is selected
so as to optimize a measure representative of the performance of the first stage or
a combination of stages. The use of longer filters leads to better performance, but
the quantization of a longer filter yields a larger quantization error if the bit rate
is fixed. Increased filter length thus brings the possibility of increased performance,
but more bits are needed to reach it. There will be a trade-off between selected filter
dimension/length and the imposed quantization error, and the idea is to use a performance
measure and find an optimum value by varying the filter length and the required amount
of bits accordingly.
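The trade-off just described can be sketched as a joint search over candidate filter dimensions and bit counts, reusing a modeled total error (all names and the error model here are hypothetical illustrations, not the claimed procedure):

```python
import itertools

def best_rate_and_dimension(pred_error, bit_options, dim_options, c=1.0):
    """Select the (dimension, bits) pair minimizing a modeled total error:
    the residual prediction error for that filter length (longer filter,
    smaller residual) plus a quantization error c * 2^(-2b/dim) that, at a
    fixed rate, grows with the dimension."""
    return min(itertools.product(dim_options, bit_options),
               key=lambda p: pred_error[p[0]] + c * 2.0 ** (-2.0 * p[1] / p[0]))
```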
[0038] Although bit allocation and encoding/decoding is often performed on a frame-by-frame
basis, it is possible to perform bit allocation and encoding/decoding on variable
sized frames, allowing signal adaptive optimized frame processing.
[0039] In particular, variable filter dimension and bit-rate can be used on fixed frames
but also on variable frame lengths.
[0040] For variable frame lengths, an encoding frame can generally be divided into a number
of sub-frames according to various frame division configurations. The sub-frames may
have different sizes, but the sum of the lengths of the sub-frames of any given frame
division configuration is equal to the length of the overall encoding frame. In a
preferred exemplary embodiment of the invention, the idea is to select a combination
of frame division configuration, as well as bit allocation and filter length/dimension
for each sub-frame, so as to optimize a measure representative of the performance
of the considered second encoding process (i.e. at least one of the signal encoding
stages thereof) over an entire encoding frame. The second signal representation is
then encoded separately for each of the sub-frames of the selected frame division
configuration in accordance with the selected combination of bit allocation and filter
dimension. In addition to the general high-quality, low bit-rate performance offered
by the signal adaptive bit allocation of the present invention, a significant advantage
of the variable frame length processing scheme is that the dynamics of the stereo
or multi-channel image are very well represented.
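Selecting the frame division configuration can be sketched as a search over candidate divisions of the encoding frame, where the per-sub-frame cost is assumed to already reflect the best bit allocation and filter length for that sub-frame (the cost callable and the example configurations are hypothetical):

```python
def best_frame_division(cost, configs):
    """Among candidate divisions of an encoding frame into sub-frames
    (e.g. [40], [20, 20], [20, 10, 10]), return the one with the lowest
    summed cost. cost(start, length) gives the optimized encoding cost of
    one sub-frame; sub-frame lengths in each config sum to the frame length."""
    def total(config):
        t, pos = 0.0, 0
        for length in config:
            t += cost(pos, length)
            pos += length
        return t
    return min(configs, key=total)
```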
[0041] The second signal encoding process here preferably generates output data, for transfer
to the decoding side, representative of the selected frame division configuration,
and for each sub-frame of the selected frame division configuration, bit allocation
and filter length. However, to reduce the bit-rate requirements on signaling from
the encoding side to the decoding side in an audio transmission system, the filter
length, for each sub-frame, is preferably selected in dependence on the length of
the sub-frame. This means that an indication of frame division configuration of an
encoding frame into a set of sub-frames at the same time provides an indication of
selected filter dimension for each sub-frame, thereby reducing the required signaling.
[0042] The invention offers the following advantages:
➢ Improved multi-channel audio encoding/decoding.
➢ Improved audio transmission system.
➢ Increased multi-channel audio reconstruction quality.
➢ High multi-channel audio quality at relatively low bit rates.
➢ Efficient use of the available bit budget for a multi-stage encoder such as a multi-stage
side signal encoder.
➢ Good representation of the dynamics of the stereo image.
➢ Enhanced quality of stereo signal reconstruction.
[0043] Other advantages offered by the invention will be appreciated when reading the below
description of embodiments of the invention.
BRIEF DESCRIPTION OF THE DRAWINGS
[0044] The invention, together with further objects and advantages thereof, will be best
understood by reference to the following description taken together with the accompanying
drawings, in which:
Fig. 1 is a schematic block diagram illustrating a general example of an audio transmission
system using multi-channel coding and decoding.
Fig. 2 is a schematic diagram illustrating how signals of different channels are encoded
separately as individual and independent signals.
Fig. 3 is a schematic block diagram illustrating the basic principles of parametric
stereo coding.
Fig. 4 is a diagram illustrating the cross spectrum of mono and side signals.
Fig. 5 is a schematic block diagram of a multi-channel encoder according to an exemplary
preferred embodiment of the invention.
Fig. 6 is a schematic flow diagram setting forth a basic multi-channel encoding procedure
according to a preferred embodiment of the invention.
Fig. 7 is a schematic flow diagram setting forth a corresponding multi-channel decoding
procedure according to a preferred embodiment of the invention.
Fig. 8 is a schematic block diagram illustrating relevant parts of a (stereo) encoder
according to an exemplary preferred embodiment of the invention.
Fig. 9 is a schematic block diagram illustrating relevant parts of a (stereo) decoder
according to an exemplary preferred embodiment of the invention.
Fig. 10A illustrates side signal estimation using inter-channel prediction (FIR) filtering.
Fig. 10B illustrates an audio encoder with mono encoding and multi-stage hybrid side
signal encoding.
Fig. 11A is a frequency-domain diagram illustrating a mono signal and a side signal
and the inter-channel correlation, or cross-correlation, between the mono and side
signals.
Fig. 11B is a time-domain diagram illustrating the predicted side signal along with
the original side signal corresponding to the case of Fig. 11A.
Fig. 11C is a frequency-domain diagram illustrating another mono signal and side signal
and their cross-correlation.
Fig. 11D is a time-domain diagram illustrating the predicted side signal along with
the original side signal corresponding to the case of Fig. 11C.
Fig. 12 is a schematic diagram illustrating an adaptive bit allocation controller,
in association with a multi-stage side encoder, according to a particular exemplary
embodiment of the invention.
Fig. 13 is a schematic diagram illustrating the quality of a reconstructed side signal
as a function of bits used for quantization of the ICP filter coefficients.
Fig. 14 is a schematic diagram illustrating prediction feasibility.
Fig. 15 illustrates a stereo decoder according to a preferred exemplary embodiment of
the invention.
Fig. 16 illustrates an example of an obtained average quantization and prediction
error as a function of the filter dimension.
Fig. 17 illustrates the total quality achieved when quantizing different dimensions
with different numbers of bits.
Fig. 18 is a schematic diagram illustrating an example of multi-stage vector encoding.
Fig. 19 is a schematic timing chart of different frame divisions in a master frame.
Fig. 20 illustrates different frame configurations according to an exemplary embodiment
of the invention.
DETAILED DESCRIPTION OF EMBODIMENTS OF THE INVENTION
[0045] Throughout the drawings, the same reference characters will be used for corresponding
or similar elements.
[0046] The invention relates to multi-channel encoding/decoding techniques in audio applications,
and particularly to stereo encoding/decoding in audio transmission systems and/or
for audio storage. Examples of possible audio applications include phone conference
systems, stereophonic audio transmission in mobile communication systems, various
systems for supplying audio services, and multi-channel home cinema systems.
[0047] For a better understanding of the invention, it may be useful to begin with a brief
overview and analysis of problems with existing technology. Today, there are no standardized
codecs available providing high stereophonic or multi-channel audio quality at bit
rates which are economically interesting for use in e.g. mobile communication systems,
as mentioned previously. What is possible with available codecs is monophonic transmission
and/or storage of the audio signals. To some extent also stereophonic transmission
or storage is available, but bit rate limitations usually require limiting the stereo
representation quite drastically.
[0048] The problem with the state-of-the-art multi-channel coding techniques is that they
require high bit rates in order to provide good quality. Intensity stereo, if applied
at bit rates as low as e.g. only a few kbps, suffers from the fact that it does
not provide any temporal inter-channel information. As this information is perceptually
important for low frequencies below e.g. 2 kHz, it is unable to provide a stereo impression
at such low frequencies.
[0049] BCC on the other hand is able to reproduce the stereo or multi-channel image even
at low frequencies at low bit rates of e.g. 3 kbps since it also transmits temporal
inter-channel information. However, this technique requires computationally demanding
time-frequency transforms on each of the channels both at the encoder and the decoder.
Moreover, BCC does not attempt to find a mapping from the transmitted mono signal
to the channel signals such that their perceptual differences to the original
channel signals are minimized.
[0050] The LMS technique, also referred to as inter-channel prediction (ICP), for multi-channel
encoding, see [4], allows lower bit rates by omitting the transmission of the residual
signal. To derive the channel reconstruction filter, an unconstrained error minimization
procedure calculates the filter such that its output signal best matches the target
signal. Several error measures may be used to compute the filter. The mean
square error or the weighted mean square error are well known and are computationally
cheap to implement.
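Under the mean square error criterion, the unconstrained minimization has a closed-form solution via the normal equations. A sketch (the small regularization term is my addition, for numerical safety only):

```python
import numpy as np

def icp_filter(mono, target, order):
    """Compute the FIR inter-channel prediction filter h minimizing
    ||target - X h||^2, where column k of X is the mono signal delayed by
    k samples, i.e. solve the normal equations (X'X) h = X' target."""
    N = len(mono)
    X = np.column_stack([np.concatenate([np.zeros(k), mono[:N - k]])
                         for k in range(order)])   # delay-k copies of mono
    R = X.T @ X                                    # autocorrelation matrix
    r = X.T @ target                               # cross-correlation vector
    return np.linalg.solve(R + 1e-9 * np.eye(order), r)
```

A weighted mean square error would simply insert a diagonal weighting matrix into both R and r.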
[0051] One could say that in general, most of the state-of-the-art methods have been developed
for coding of high-fidelity audio signals or pure speech. In speech coding, where
the signal energy is concentrated in the lower frequency regions, sub-band coding
is rarely used. Although methods such as BCC allow for low bit-rate stereo speech, the
sub-band transform coding processing increases both complexity and delay.
[0052] There has been a long debate on whether linear inter-channel prediction (ICP) applied
to audio coding would increase the compression rate for multi-channel signals.
[0053] Research concludes that even though ICP coding techniques do not provide good results
for high-quality stereo signals, for stereo signals with energy concentrated in the
lower frequencies, redundancy reduction is possible [7]. The whitening effects of
the ICP filtering increase the energy in the upper frequency regions, resulting in
a net coding loss for perceptual transform coders. These results have been confirmed
in [9] and [10] where quality enhancements have been reported only for speech signals.
[0054] The accuracy of the ICP reconstructed signal is governed by the present inter-channel
correlations. Bauer
et al. [11] did not find any linear relationship between left and right channels in audio
signals. However, as can be seen from the cross spectrum of the mono and side signals
in Fig. 4, strong inter-channel correlation is found in the lower frequency regions
(0 - 2000 Hz) for speech signals.
[0055] In the event of low inter-channel correlations, the ICP filter, as a means for stereo
coding, will produce a poor estimate of the target signal. The produced estimate is
poor even before quantization of the filters. Therefore, increasing the number of allocated
bits for filter quantization does not lead to better performance, or the improvement
in performance is quite small.
[0056] This effect of saturation of performance of ICP and in general of parametric methods
makes these techniques quite inefficient in terms of bit usage. Some bits could be
used for e.g. non-parametric coding techniques instead, which in turn could result
in greater overall improvement in performance. Moreover, these parametric techniques
are not asymptotically optimal since even at a high bit rate, characteristic artifacts
inherent in the coding method will not disappear.
[0057] Fig. 5 is a schematic block diagram of a multi-channel encoder according to an exemplary
preferred embodiment of the invention. The multi-channel encoder basically comprises
an optional pre-processing unit 110, an optional (linear) combination unit 120, a
first encoder 130, at least one additional (second) encoder 140, a controller 150
and an optional multiplexor (MUX) unit 160.
[0058] The multi-channel or polyphonic signal may be provided to the optional pre-processing
unit 110, where different signal conditioning procedures may be performed. The signals
of the input channels can be provided from an audio signal storage (not shown) or
"live", e.g. from a set of microphones (not shown). The audio signals are normally
digitized, if not already in digital form, before entering the multi-channel encoder.
[0059] The (optionally pre-processed) signals may be provided to an optional signal combination
unit 120, which includes a number of combination modules for performing different
signal combination procedures, such as linear combinations of the input signals to
produce at least a first signal and a second signal. For example, the first encoding
process may be a main encoding process and the first signal representation may be
a main signal representation. The second encoding process, which is a multi-stage
process, may for example be an auxiliary (side) signal process, and the second signal
representation may then be an auxiliary (side) signal representation such as a stereo
side signal. In traditional stereo coding, for example, the L and R channels are summed,
and the sum signal is divided by a factor of two in order to provide a traditional
mono signal as the first (main) signal. The L and R channels may also be subtracted,
and the difference signal is divided by a factor of two to provide a traditional side
signal as the second signal. According to the invention, any type of linear combination,
or any other type of signal combination for that matter, may be performed in the signal
combination unit with weighted contributions from at least part of the various channels.
The signal combination used by the invention is not limited to two channels but may
of course involve multiple channels. It is also possible to generate more than one
additional (side) signal, as indicated in Fig. 5. It is even possible to use one of
the input channels directly as a first signal, and another one of the input channels
directly as a second signal. For stereo coding, for example, this means that the L
channel may be used as main signal and the R channel may be used as side signal, or
vice versa. A multitude of other variations also exist.
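The traditional main/side combination mentioned above is trivially invertible; a minimal sketch:

```python
import numpy as np

def downmix(left, right):
    """Traditional combination: mono = (L + R) / 2, side = (L - R) / 2."""
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    return (left + right) / 2.0, (left - right) / 2.0

def upmix(mono, side):
    """Exact inverse: L = mono + side, R = mono - side."""
    return mono + side, mono - side
```

Any other linear combination with weighted channel contributions, as the text allows, would replace the fixed 1/2 weights with a general mixing matrix.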
[0060] A first signal representation is provided to the first encoder 130, which encodes
the first (main) signal according to any suitable encoding principles. Such principles
are available in the prior art and will therefore not be further discussed here.
[0061] A second signal representation is provided to a second, multi-stage, coder 140 for
encoding the second (auxiliary/side) signal.
[0062] The overall encoder also comprises a controller 150, which includes at least a bit
allocation module for adaptively allocating the available bit budget for the second,
multi-stage, signal encoding among the encoding stages of the multi-stage signal encoder
140. The multi-stage encoder may also be referred to as a multi-unit encoder having
two or more encoding units.
[0063] For example, if the performance of one of the stages in the multi-stage encoder 140
is saturating, there is little point in increasing the number of bits allocated to
this particular encoding stage. Instead it may be better to allocate more bits to
another encoding stage in the multi-stage encoder to provide a greater overall improvement
in performance. For this reason it turns out to be particularly beneficial to perform
bit allocation based on estimated performance of at least one encoding stage. The
allocation of bits to a particular encoding stage may for example be based on estimated
performance of that encoding stage. Alternatively, however, the encoding bits are
jointly allocated among the different encoding stages based on the overall performance
of a combination of encoding stages.
[0064] Of course, there is an overall bit budget for the entire multi-channel encoder apparatus,
which overall bit budget is divided between the first encoder 130 and the multi-stage
encoder 140 and possible other encoder modules according to known principles. In the
following, we will mainly focus on how to allocate the bit budget available for the
multi-stage encoder among the different encoding stages thereof.
[0065] Preferably, the bit budget available for the second signal encoding process is adaptively
allocated among the different encoding stages of the multi-stage encoder based on
predetermined characteristics of the multi-channel audio signal such as inter-channel
correlation characteristics. This is particularly useful when the second multi-stage
encoder includes a parametric encoding stage such as an inter-channel prediction (ICP)
stage. In the event of low inter-channel correlation (e.g. between the first and second
signal representations of the input channels), the parametric filter, as a means for
multi-channel or stereo coding, will normally produce a relatively poor estimate of
the target signal. Therefore, increasing the number of allocated bits for filter quantization
does not lead to significantly better performance. The effect of saturation of the
performance of the (ICP) filter, and in general of parametric coding, makes these
techniques quite inefficient in terms of bit usage. In fact, the bits could be put
to better use in another encoding stage, for example a non-parametric coding stage,
which in turn could result in a greater overall improvement in performance.
[0066] In a particular embodiment, the invention involves a hybrid parametric and non-parametric
multi-stage signal encoding process and overcomes the problem of parametric quality
saturation by exploiting the strengths of parametric representations and non-parametric
coding based on efficient allocation of available encoding bits among the parametric
and non-parametric encoding stages.
[0067] For a particular encoding stage, bits may, as an example, be allocated based on the
following procedure:
■ estimating performance of the encoding stage as a function of the number of bits
assumed to be allocated to the encoding stage;
■ assessing estimated performance of the encoding stage; and
■ allocating a first amount of bits to the encoding stage based on the assessment
of estimated performance.
[0068] If only two stages are used, and a first amount of bits has been allocated to the
first stage based on estimated performance, bits may be allocated to the second stage
by simply assigning the remaining amount of encoding bits to that stage.
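For illustration only, the stage-wise allocation described above might be sketched as follows; the saturating performance curve, the marginal-gain threshold `eps` and the function name are hypothetical assumptions, not part of the described embodiment:

```python
import numpy as np

def allocate_two_stage(perf, total_bits, eps=0.01):
    """Allocate bits to stage 1 based on its estimated performance curve.

    perf[b] is the estimated stage-1 quality when b bits are used
    (a hypothetical, monotonically non-decreasing curve).  Bits are
    granted to stage 1 only while each extra bit still buys at least
    `eps` quality; the remainder of the budget goes to stage 2.
    """
    b1 = 0
    while (b1 + 1 < len(perf) and b1 + 1 <= total_bits
           and perf[b1 + 1] - perf[b1] >= eps):
        b1 += 1
    return b1, total_bits - b1  # (stage-1 bits, stage-2 bits)

# Saturating performance curve: quality gains flatten out with more bits.
curve = 1.0 - np.exp(-0.5 * np.arange(32))
b1, b2 = allocate_two_stage(curve, total_bits=20)
```

Stage 1 stops receiving bits as soon as the marginal quality gain per bit drops below the threshold, and the remainder of the budget falls through to stage 2.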
[0069] In general, the bit-allocation can also be made dependent on performance of an additional
stage or the overall performance of two or more stages. In the former case, bits can
be allocated to an additional encoding stage based on estimated performance of the
additional stage. In the latter case, the bit allocation can be based for example
on the overall performance of the combination of both parametric and non-parametric
representations.
[0070] For example, the bit allocation may be determined as the allocation of bits among
the different stages of the multi-stage encoder at which a change in bit allocation
no longer leads to significantly better performance according to a suitable criterion.
In particular, with respect to performance saturation, the number of bits to be allocated
to a certain stage may be determined as the number of bits at which a further increase
no longer leads to significantly better performance of that stage.
[0071] As discussed above, the second multi-stage encoder may include an adaptive inter-channel
prediction (ICP) stage for second-signal prediction based on the first signal representation
and the second signal representation, as indicated in Fig. 5. The first (main) signal
information may equivalently be deduced from the signal encoding parameters generated
by the first encoder 130, as indicated by the dashed line from the first encoder.
In this context, it may be suitable to use an error encoding stage in "sequence" with
the ICP stage. For example, a first adaptive ICP stage for signal prediction generates
signal reconstruction data based on the first and second signal representations, and
a second encoding stage generates further signal reconstruction data based on the
signal prediction error.
[0072] Preferably, the controller 150 is configured to perform bit allocation in response
to the first signal representation and the second signal representation and the performance
of one or more stages in the multi-stage (side) encoder 140.
[0073] As illustrated in Fig. 5, a plural number N of signal representations (including
also the case when respective input channels are provided directly as separate signals)
may be provided. Preferably, the first signal representation is a main signal, and
the remaining
N-1 signal representations are auxiliary signals such as side signals. Each auxiliary
signal is preferably encoded separately in a dedicated auxiliary (side) encoder, which
may or may not be a multi-stage encoder with adaptively controlled bit allocation.
[0074] The output signals of the various encoders 130, 140, including bit allocation information
from the controller 150, are preferably multiplexed into a single transmission (or
storage) signal in the multiplexer unit 160. However, alternatively, the output signals
may be transmitted (or stored) separately.
[0075] In an extension of the invention it may also be possible to select a combination
of bit allocation and filter dimension/length to be used (e.g. for inter-channel prediction)
so as to optimize a measure representative of the performance of the second signal
encoding process. There will be a trade-off between selected filter dimension/length
and the imposed quantization error, and the idea is to use a performance measure and
find an optimum value by varying the filter length and the required amount of bits
accordingly.
[0076] Although encoding/decoding and the associated bit allocation is often performed on
a frame-by-frame basis, it is envisaged that encoding/decoding and bit allocation
can be performed on variable sized frames, allowing signal adaptive optimized frame
processing. This also makes it possible to provide an even higher degree of
freedom to optimize the performance measure, as will be explained later on.
[0077] Fig. 6 is a schematic flow diagram setting forth a basic multi-channel encoding procedure
according to a preferred embodiment of the invention. In step S1, a first signal representation
of one or more audio channels is encoded in a first signal encoding process. In step
S2, the available bit budget for second signal encoding is allocated among the different
stages of a second, multi-stage, signal encoding process in dependence on multi-channel
input signal characteristics such as inter-channel correlation, as outlined above.
The allocation of bits among the different stages may generally vary on a frame-to-frame
basis. Further detailed embodiments of the bit allocation proposed by the invention
will be described later on. In step S3, the second signal representation is encoded
in the second, multi-stage, signal encoding process accordingly.
[0078] Fig. 7 is a schematic flow diagram setting forth a corresponding multi-channel decoding
procedure according to a preferred embodiment of the invention. In step S11, the encoded
first signal representation is decoded in a first signal decoding process in response
to first signal reconstruction data received from the encoding side. In step S12,
dedicated bit allocation information is received from the encoding side. The bit allocation
information is representative of how the bit budget for second-signal encoding has
been allocated among the different encoding stages on the encoding side. In step S13,
second signal reconstruction data received from the encoding side is interpreted based
on the received bit allocation information. In step S14, the encoded second signal
representation is decoded in a second, multi-stage, signal decoding process based
on the interpreted second signal reconstruction data.
[0079] The overall decoding process is generally quite straightforward and basically involves
reading the incoming data stream, interpreting the data, inverse quantization and final
reconstruction of the multi-channel audio signal. More details on the decoding procedure
will be given later on with reference to an exemplary embodiment of the invention.
[0080] Although the following description of exemplary embodiments mainly relates to stereophonic
(two-channel) encoding and decoding, it should be kept in mind that the invention
is generally applicable to multiple channels. Examples include but are not limited
to encoding/decoding 5.1 (front left, front centre, front right, rear left and rear
right and subwoofer) or 2.1 (left, right and center subwoofer) multi-channel sound.
[0081] Fig. 8 is a schematic block diagram illustrating relevant parts of a (stereo) encoder
according to an exemplary preferred embodiment of the invention. The (stereo) encoder
basically comprises a first (main) encoder 130 for encoding a first (main) signal
such as a typical mono signal, a second multi-stage (auxiliary/side) encoder 140 for
(auxiliary/side) signal encoding, a controller 150 and an optional multiplexer unit
160. In this particular example, the auxiliary/side encoder 140 comprises two (or
more) stages 142, 144. The first stage 142, stage A, generates side signal reconstruction
data such as quantized filter coefficients in response to the main signal and the
side signal. The second stage 144, stage B, is preferably a residual coder, which
encodes/quantizes the residual error from the first stage 142, and thereby generates
additional side signal reconstruction data for enhanced stereo reconstruction quality.
The controller 150 comprises a bit allocation module, an optional module for controlling
filter dimension and an optional module for controlling variable frame length processing.
The controller 150 provides at least bit allocation information representative of
how the bit budget available for side signal encoding is allocated among the two encoding
stages 142, 144 of the side encoder 140 as output data. The set of information comprising
quantized filter coefficients, quantized residual error and bit allocation information
is preferably multiplexed together with the main signal encoding parameters into a
single transmission or storage signal in the multiplexer unit 160.
[0082] Fig. 9 is a schematic block diagram illustrating relevant parts of a (stereo) decoder
according to an exemplary preferred embodiment of the invention. The (stereo) decoder
basically comprises an optional demultiplexer unit 210, a first (main) decoder 230,
a second (auxiliary/side) decoder 240, a controller 250, an optional signal combination
unit 260 and an optional post-processing unit 270. The demultiplexer 210 preferably
separates the incoming reconstruction information such as first (main) signal reconstruction
data, second (auxiliary/side) signal reconstruction data and control information such
as bit allocation information. The first (main) decoder 230 "reconstructs" the first
(main) signal in response to the first (main) signal reconstruction data, usually
provided in the form of first (main) signal representing encoding parameters. The
second (auxiliary/side) decoder 240 preferably comprises two (or more) decoding stages
242, 244. The decoding stage 244, stage B, "reconstructs" the residual error in response
to encoded/quantized residual error information. The decoding stage 242, stage A,
"reconstructs" the second signal in response to the quantized filter coefficients,
the reconstructed first signal representation and the reconstructed residual error.
The second decoder 240 is also controlled by the controller 250. The controller receives
information on bit allocation, and optionally also on filter dimension and frame length
from the encoding side, and controls the side decoder 240 accordingly.
[0083] For a more thorough understanding, the invention will now be described in more detail
with reference to various exemplary embodiments based on parametric coding principles
such as inter-channel prediction.
Parametric Stereo Coding Using Inter-channel Prediction
[0084] In general, inter-channel prediction (ICP) techniques utilize the inherent inter-channel
correlation between the channels. In stereo coding, the channels are usually represented
by the left and right signals l(n), r(n); an equivalent representation is the mono
signal m(n) (a special case of the main signal) and the side signal s(n). Both
representations are equivalent and are normally related by the traditional matrix
operation:

$$\begin{bmatrix} m(n) \\ s(n) \end{bmatrix} = \frac{1}{2}\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}\begin{bmatrix} l(n) \\ r(n) \end{bmatrix} \tag{1}$$
[0085] As illustrated in Fig. 10A, the ICP technique aims to represent the side signal s(n)
by an estimate ŝ(n), which is obtained by filtering the mono signal m(n) through a
time-varying FIR filter H(z) having N filter coefficients h_t(i):

$$\hat{s}(n) = \sum_{i=0}^{N-1} h_t(i)\, m(n-i) \tag{2}$$
[0086] It should be noted that the same approach could be applied directly on the left and
right channels.
[0087] The ICP filter derived at the encoder may for example be estimated by minimizing
the mean squared error (MSE), or a related performance measure such as a psycho-acoustically
weighted mean squared error, of the side signal prediction error e(n). The MSE is
typically given by:

$$\xi(h) = \sum_{n=0}^{L-1} e^{2}(n) = \sum_{n=0}^{L-1}\left(s(n) - \sum_{i=0}^{N-1} h(i)\, m(n-i)\right)^{2} \tag{3}$$

where L is the frame size and N is the length/order/dimension of the ICP filter. Simply
put, the performance of the ICP filter, and thus the magnitude of the MSE, is the
main factor determining the final stereo separation. Since the side signal describes
the differences between the left and right channels, accurate side signal reconstruction
is essential to ensure a sufficiently wide stereo image.
[0088] The optimal filter coefficients are found by minimizing the MSE of the prediction
error over all samples and are given by:

$$h_{opt} = R^{-1} r \tag{4}$$
[0089] In (4), the correlation vector r and the covariance matrix R are defined as:

$$r = \sum_{n=0}^{L-1} s(n)\,\mathbf{m}(n), \qquad R = \sum_{n=0}^{L-1} \mathbf{m}(n)\,\mathbf{m}^{T}(n) \tag{5}$$

where

$$\mathbf{m}(n) = \big[\,m(n),\; m(n-1),\; \ldots,\; m(n-N+1)\,\big]^{T} \tag{6}$$
[0090] Inserting (5) into (3), one gets a simplified algebraic expression for the minimum
MSE (MMSE) of the (unquantized) ICP filter:

$$\mathrm{MMSE} = P_{ss} - r^{T} R^{-1} r \tag{7}$$

where $P_{ss}$ is the power of the side signal, also expressed as $s^{T}s$.
[0091] Inserting r = Rh_opt into (7) yields:

$$\mathrm{MMSE} = P_{ss} - r^{T} h_{opt} \tag{8}$$
[0092] LDL^T factorization [12] of R gives us the equation system:

$$L D L^{T} h = r \tag{9}$$
[0093] We first solve for z in an iterative (forward-substitution) fashion:

$$L z = r \tag{10}$$
[0094] Now we introduce a new vector q = L^T h. Since the matrix D only has non-zero values
on the diagonal, finding q is straightforward:

$$D q = z \quad\Longrightarrow\quad q_i = \frac{z_i}{d_i}, \qquad i = 1, \ldots, N \tag{11}$$
[0095] The sought filter vector h can now be calculated iteratively (by backward substitution)
in the same way as (10):

$$L^{T} h = q \tag{12}$$
[0096] Besides the computational savings compared to regular matrix inversion, this solution
offers the possibility of efficiently calculating the filter coefficients corresponding
to different dimensions n (filter lengths):

$$h^{(n)} = \big(L^{(n)T}\big)^{-1} q^{(n)}, \qquad n = 1, \ldots, N \tag{13}$$

where L^{(n)} denotes the leading n × n submatrix of L and q^{(n)} the first n elements
of q.
[0097] The optimal ICP (FIR) filter coefficients h_opt may be estimated, quantized and sent
to the decoder on a frame-by-frame basis.
Multistage Hybrid Multi-channel Coding by Residual Coding
[0098] Fig. 10B illustrates an audio encoder with mono encoding and multi-stage hybrid side
signal encoding. The mono signal m(n) is encoded and quantized (Q_0) for transfer
to the decoding side as usual. The ICP module for side signal prediction provides
a FIR filter representation H(z), which is quantized (Q_1) for transfer to the decoding
side. Additional quality can be gained by encoding and/or quantizing (Q_2) the side
signal prediction error e(n). It should be noted that when the residual error is quantized,
the coding can no longer be referred to as purely parametric, and therefore the side
encoder is referred to as a hybrid encoder.
Adaptive bit allocation
[0099] The invention is based on the recognition that low inter-channel correlation may
lead to bad side signal prediction. On the other hand, high inter-channel correlation
usually leads to good side signal prediction.
[0100] Fig. 11A is a frequency-domain diagram illustrating a mono signal and a side signal
and the inter-channel correlation, simply referred to as cross-correlation, between
the mono and side signals. Fig. 11B is a corresponding time-domain diagram illustrating
the predicted side signal along with the original side signal.
[0101] Fig. 11C is a frequency-domain diagram illustrating another mono signal and side signal
and their cross-correlation. Fig. 11D is a corresponding time-domain diagram illustrating
the predicted side signal along with the original side signal.
[0102] It can be seen that high inter-channel correlation yields a good estimate of the
target signal, whereas low inter-channel correlation yields a quite poor estimate
of the target signal. If the produced estimate is poor even before quantization of
the filter, there is usually no sense in allocating a lot of bits for filter quantization.
Instead it may be more useful to use at least part of the bits for different encoding
such as non-parametric encoding of the side signal prediction error, which could lead
to better overall performance. In the case of higher correlation, it may sometimes
be possible to quantize the filter with relatively few bits and still get a quite
good result. In other instances a larger amount of bits will have to be used for quantization
even if the correlation is relatively high, and it has to be decided if it is "economical"
from a bit allocation perspective to use this amount of bits.
[0103] In a particular exemplary embodiment, the codec is preferably designed based on combining
the strengths of both parametric stereo representation as provided by the ICP filters
and non-parametric representation such as residual error coding in a way that is made
adaptive in dependence on the characteristics of the stereo input signal.
[0104] Fig. 12 is a schematic diagram illustrating an adaptive bit allocation controller,
in association with a multi-stage side encoder, according to a particular exemplary
embodiment of the invention.
[0105] As hinted above, to fully exploit the available bit budget and in order to further
enhance the quality of the stereo signal reconstruction, at least a second quantizer
will have to be used to prevent all bits from going to the quantization of the prediction
filter. The use of a second quantizer provides an additional degree of freedom that
is exploited by the present invention. The multi-stage encoder thus includes a first
parametric stage with a filter such as an ICP filter and an associated first quantizer
Q_1, and a second stage based on a second quantizer Q_2.
[0106] Preferably, the prediction error of the ICP filter, i.e. e(n) = s(n) − ŝ(n), is quantized
by using a non-parametric coder, typically a waveform coder or a transform coder or
a combination of both. It should though be understood that it is possible to use other
types of coding of the prediction error such as CELP (Code Excited Linear Prediction)
coding.
[0107] It is assumed that the total bit budget for the side signal encoding process is
B = b_ICP + b_2, where b_ICP is the number of bits for quantization of the ICP filter,
and b_2 is the number of bits for quantization of the residual error e(n).
[0108] Optimally, the bits are jointly allocated among the different encoding stages based
on the overall performance of the encoding stages, as schematically indicated by the
inputs of e(n) and e_2(n) into the bit allocation module of Fig. 12. It may be reasonable
to strive for minimization of the total error e_2(n) in a perceptually weighted sense.
[0109] In a simpler and more straightforward implementation, the bit allocation module allocates
bits to the first quantizer depending on the performance of the first parametric (ICP)
filtering procedure, and allocates the remaining bits to the second quantizer. Performance
of the parametric (ICP) filter is preferably based on a fidelity criterion such as
the MSE or perceptually weighted MSE of the prediction error
e(n).
[0110] The performance of the parametric (ICP) filter is typically varying with the characteristics
of the different signal frames as well as the available bit-rate.
[0111] For instance, in the event of low inter-channel correlations, the ICP filtering procedure
will produce a poor estimate of the target (side) signal even prior to filter quantization.
Thus, allocating more bits will not lead to any significant performance improvement.
Instead, it is better to allocate more bits to the second quantizer.
[0112] In other instances, the redundancy between the mono signal and the side signal is
fully removed by the sole use of the ICP filter quantized with a certain bit-rate,
and thus allocating more bits to the second quantizer would be inefficient.
[0113] The inherent limitations of the performance of ICP follow as a direct consequence
of the degree of correlation between the mono and the side signal. The performance
of the ICP is always limited by the maximum achievable performance provided by the
un-quantized filters.
[0114] Fig. 13 shows a typical case of how the performance of the quantized ICP filter varies
with the number of bits. Any general fidelity criterion may be used, for example a
quality measure Q. Such a quality measure may be based on a signal-to-noise ratio
(SNR), and is then denoted Q_snr. For example, a quality measure may be based on the
ratio between the power of the side signal and the MSE of the side signal prediction
error e(n):

$$Q_{snr} = \frac{P_{ss}}{\xi(\hat h)} = \frac{s^{T}s}{\mathrm{MSE}} \tag{14}$$
[0115] There is a minimum bit-rate b_min for which the use of ICP provides an improvement,
characterized by a value of Q_snr greater than 1, i.e. 0 dB. Obviously, as the bit-rate
increases, the performance approaches that of the unquantized filter, Q_max. On the
other hand, allocating more than b_max bits for quantization would lead to quality
saturation.
[0116] Typically, a lower bit-rate (b_opt in Fig. 13) is selected, beyond which the performance
increase is no longer significant according to a suitable criterion. The selection
criterion is normally designed in dependence on the particular application and its
specific requirements.
[0117] For some problematic signals, where the mono/side correlation is close to zero, it
is better not to use any ICP filtering at all, and instead allocate the whole bit
budget to the secondary quantizer. For the same type of signals, if the performance
of the secondary quantizer is insufficient, then the signal may be coded using pure
parametric ICP filtering.
[0118] In general, the filter coefficients are treated as vectors, which are efficiently
quantized using vector quantization (VQ). The quantization of the filter coefficients
is one of the most important aspects of the ICP coding procedure. As will be seen,
the quantization noise introduced on the filter coefficients can be directly related
to the loss in MSE.
[0119] The MMSE has previously been defined as:

$$\mathrm{MMSE} = P_{ss} - r^{T} h_{opt} \tag{15}$$
[0120] Quantizing h_opt introduces a quantization error e: ĥ = h_opt + e. The new MSE can
now be written as:

$$\xi(\hat h) = P_{ss} - 2 r^{T}\hat h + \hat h^{T} R\,\hat h = P_{ss} - r^{T} h_{opt} + e^{T} R\, e + 2 e^{T} R\, h_{opt} - 2 r^{T} e \tag{16}$$
[0121] Since Rh_opt = r, the last two terms in (16) cancel out and the MSE of the quantized
filter becomes:

$$\xi(\hat h) = P_{ss} - r^{T} h_{opt} + e^{T} R\, e \tag{17}$$
[0122] What this means is that, in order to have any prediction gain at all, the quantization
error term has to be smaller than the prediction term, i.e. r^T h_opt > e^T Re.
[0123] From Fig. 14 it can be seen that allocating less than b_min bits for the ICP filter
quantization does not reduce the side signal prediction error
energy. In fact, the energy of the prediction error is larger than that of the target
side signal, making it unreasonable to use ICP filtering at all. This of course sets
a lower limit for the usability of ICP as means for signal representation and encoding.
Therefore, a bit-allocation controller would in the preferred embodiment consider
this as a lower bound for ICP.
[0124] Direct quantization of the filter coefficients generally leads to poor results; instead,
one should quantize the filters so as to minimize the term e^T Re. An example of a
suitable distortion measure is given by:

$$d\big(h, \hat h\big) = \big(h - \hat h\big)^{T} R\,\big(h - \hat h\big) \tag{18}$$
[0125] This suggests the usage of a weighted vector quantization (VQ) procedure. Similar
weighted quantizers have been used in [8] for speech compression algorithms.
[0126] A clear benefit could also be gained in terms of bit-rate if one uses predictive
weighted vector quantization. In fact, prediction filters that result from the above-described
concepts are in general correlated in time.
[0127] Returning once again to Fig. 12, it can be understood that the bit allocation module
needs the main signal m(n) and the side signal s(n) as input in order to calculate
the correlation vector r and the covariance matrix R. Clearly, h_opt is also required
for the MSE calculation of the quantized filter. From the MSE, a corresponding quality
measure can be estimated and used as a basis for bit allocation. If variable-sized
frames are used, it is generally necessary to provide information on the frame size
to the bit allocation module.
[0128] With reference to Fig. 15, which illustrates a stereo decoder according to a preferred
exemplary embodiment of the invention, the decoding procedure will now be explained
in more detail. A demultiplexer may be used for separating the incoming stereo reconstruction
data into mono signal reconstruction data, side signal reconstruction data, and bit
allocation information. The mono signal is decoded in a mono decoder, which generates
a reconstructed main signal estimate m̂(n). The filter coefficients are decoded by
inverse quantization to reconstruct the quantized ICP filter Ĥ(z). The side signal
ŝ(n) is reconstructed by filtering the reconstructed mono signal m̂(n) through the
quantized ICP filter Ĥ(z). For improved quality, the prediction error ê_s(n) is reconstructed
by inverse quantization Q_2^{-1} and added to the side signal estimate ŝ(n). Finally,
the output stereo signal is obtained as:

$$\hat l(n) = \hat m(n) + \hat s(n) + \hat e_s(n), \qquad \hat r(n) = \hat m(n) - \hat s(n) - \hat e_s(n) \tag{19}$$
[0129] It is important to note that the side signal quality, and thus the stereo quality,
is affected by the accuracy of the mono reproduction as well as by the ICP filter
quantization and the residual error encoding.
Variable Rate - Variable Dimension Filtering
[0130] As previously mentioned, it is also possible to select a combination of bit allocation
and filter dimension/length to be used (e.g. for inter-channel prediction) so as to
optimize a given performance measure.
[0131] It may for example be convenient to select a combination of number of bits to be
allocated to the first encoding stage and filter length to be used in the first encoding
stage so as to optimize a measure representative of the performance of the first encoding
stage or a combination of encoding stages in a multi-stage (auxiliary/side) encoder.
[0132] For example, given that a non-parametric coder accompanies a parametric coder, the
target of the ICP filtering may be to minimize the MSE of the prediction error. Increasing
the filter dimension is known to decrease the MSE. However, for some signal frames
the mono and side signals only differ in amplitude and not in time alignment. Thus,
one filter coefficient would suffice for this case.
[0133] As discussed earlier, it is possible to calculate the filter coefficients for the
different dimensions iteratively. Since the filter is completely determined by the
symmetric matrix R and the vector r, it is also possible to calculate the MMSE for
the different dimensions iteratively. Inserting q = L^T h_opt into (8) yields:

$$\mathrm{MMSE}(n) = P_{ss} - \sum_{i=1}^{n} d_i\, q_i^{2} \tag{20}$$

where d_i ≥ 0, ∀i. Thus, increasing the filter order decreases the MMSE. Hence, it
is possible to compute the gain provided by an additional filter dimension without
having to re-calculate r^T h_opt for every dimension.
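Computing the MMSE for every filter dimension from a single factorization can be sketched like this (numpy-based synthetic example; the Cholesky factor is rescaled into the LDL^T form used in the text):

```python
import numpy as np

# Sketch: once R = L D L^T is factorised, the MMSE for every filter
# dimension n follows from a running sum of d_i * q_i**2 -- no need
# to re-solve the normal equations per dimension.
rng = np.random.default_rng(2)
M = rng.standard_normal((128, 6))     # delayed-mono matrix (hypothetical)
s = rng.standard_normal(128)          # target side-signal frame (hypothetical)
R, r, P_ss = M.T @ M, M.T @ s, s @ s

# Cholesky gives R = C C^T; with L = C / diag(C) and d = diag(C)**2
# this is exactly the LDL^T factorisation of the text.
C = np.linalg.cholesky(R)
dg = np.diag(C)
Lm, d = C / dg, dg ** 2
z = np.linalg.solve(Lm, r)            # forward step  L z = r
q = z / d                             # diagonal step D q = z
mmse = P_ss - np.cumsum(d * q ** 2)   # MMSE(n) for n = 1 .. 6
h_full = np.linalg.solve(R, r)        # direct full-dimension solution, for comparison
```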
[0134] For some frames, the gain of using long filters is noticeable, whereas for others
the performance increase by using long filters is nearly negligible. This is explained
by the fact that maximum de-correlation between the channels can be achieved without
using a long filter. This holds especially true for frames where the amount of inter-channel
correlation is low.
[0135] Fig. 16 illustrates the average quantization and prediction error as a function of
the filter dimension. Since the bit-rate is fixed, the quantization error increases
with dimension. In terms of prediction alone, the use of longer filters always leads
to better performance; however, quantization of a longer vector yields a larger quantization
error if the bit-rate is held fixed, as illustrated in Fig. 16. With increased filter
length thus comes the possibility of increased performance, but more bits are needed
to realize that gain.
[0136] The idea of the variable-rate/variable-dimension scheme is to utilize the varying
performance of the (ICP) filter so that accurate filter quantization is only performed
for those frames where more bits result in noticeably better performance.
[0137] Fig. 17 illustrates the total quality achieved when quantizing different dimensions
with different numbers of bits. For example, the objective may be defined such that
maximum quality is achieved when selecting the combination of dimension and bit-rate
that gives the minimum MSE. Recall that the MSE of the quantized ICP filter is defined
as:

$$\xi(\hat h) = P_{ss} - r^{T} h_{opt} + e^{T} R\, e \tag{17}$$
[0138] It can be seen that the performance is a trade-off between the selected filter dimension
n and the imposed quantization error. This is illustrated in Fig. 17, where different
bit-rate ranges give different performance for different dimensions.
[0139] Allocating the necessary bits for the (ICP) filter is efficiently performed based
on the Q_{N,max} curve. This optimal performance/rate curve Q_{N,max} shows the optimum
performance obtained by varying the filter dimension and the required amount of bits
accordingly. It is also interesting to notice that this curve exhibits regions where
an increase in bit-rate (and the associated dimension) leads to only a very small
improvement in the performance/quality measure Q_snr. Typically, in these plateau
regions, there is no noticeable gain to be achieved by increasing the amount of bits
for the quantization of the (ICP) filter.
[0140] A simpler but suboptimal approach consists in varying the total amount of bits in
proportion to the dimension, for instance to make the ratio between the total number
of bits and dimension constant. The variable-rate/variable-dimension coding then involves
selecting the dimension (or equivalently the bit-rate), which leads to the minimization
of the MSE.
[0141] In another embodiment, the dimension is held fixed and the bit-rate is varied. A set
of thresholds determines whether or not it is feasible to spend more bits on quantizing
the filter, e.g. by selecting additional stages in the MSVQ [13] scheme depicted in
Fig. 18.
[0142] Variable rate coding is well motivated by the varying characteristic of the correlation
between the main (mono) and the side signal. For low correlation cases, only a few
bits are allocated to encode a low dimensional filter while the rest of the bit budget
could be used for encoding the residual error with a non-parametric coder.
Improved parametric coding based on inter-channel prediction
[0143] As mentioned briefly, for cases where the main/side correlation is close to zero,
it may be better not to use any ICP filtering at all, and instead allocate the whole
bit budget to the secondary quantizer. For the same type of signals, if the performance
of the secondary quantizer is insufficient, the signal may be coded using pure parametric
ICP filtering. In the latter case, it may be advantageous to make some modifications
to the ICP filtering procedure to provide acceptable stereo or multi-channel reconstruction.
[0144] These modifications are intended to enable stereo or multi-channel coding based solely
on inter-channel prediction (ICP), thus allowing low bit-rate operation. In fact,
a scheme where the side signal reconstruction is based solely on ICP filtering will
normally suffer from quality degradation when the correlation between the mono and
side signal is weak. This holds especially true after quantization of the filter coefficients.
Covariance Matrix Modification
[0145] If only a parametric representation is used, then the target is no longer to minimize
the MSE alone, but to combine it with smoothing and regularization in order
to cope with the cases where there is no correlation between the mono and the side
signal.
[0146] Informal listening tests reveal that coding artifacts introduced by ICP filtering
are perceived as more annoying than temporary reduction in stereo width. Therefore,
the stereo width, i.e. the side signal energy, is intentionally reduced whenever a
problematic frame is encountered. In the worst-case scenario, i.e. no ICP filtering
at all, the resulting stereo signal is reduced to pure mono.
[0147] It is possible to calculate the expected prediction gain from the covariance matrix
R and the correlation vector r, without having to perform the actual filtering. It has
been found that coding artifacts are mainly present in the reconstructed side signal
when the anticipated prediction gain is low or, equivalently, when the correlation
between the mono and the side signal is low. Hence, a frame classification algorithm
has been constructed, which performs classification based on the estimated level of
prediction gain. When the prediction gain (or the correlation) falls below a certain
threshold, the covariance matrix used to derive the ICP filter is modified according to:

R* = R + ρI
[0148] The value of ρ can be made adaptive to facilitate different levels of modification.
The modified ICP filter is computed as h* = (R*)⁻¹r. Evidently, the energy of the ICP
filter is reduced, thus reducing the energy of the reconstructed side signal. Other
schemes for reducing the introduced estimation errors are also plausible.
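A minimal numerical sketch of this modification, assuming the regularization form R* = R + ρI (consistent with the reduced filter energy described above); the matrices, threshold and ρ value are illustrative:

```python
import numpy as np

def regularized_icp_filter(R, r, pred_gain, threshold=0.5, rho=0.1):
    """Compute the ICP filter h = R^{-1} r, adding rho*I to the
    covariance matrix when the anticipated prediction gain for the
    frame falls below the classification threshold."""
    if pred_gain < threshold:
        R = R + rho * np.eye(R.shape[0])   # shrinks the filter energy
    return np.linalg.solve(R, r)

# Illustrative covariance matrix and correlation vector.
R = np.array([[2.0, 0.5], [0.5, 1.0]])
r = np.array([0.3, 0.2])
h_plain = regularized_icp_filter(R, r, pred_gain=0.9)  # unmodified frame
h_mod = regularized_icp_filter(R, r, pred_gain=0.1)    # problematic frame
```

The regularized filter has lower energy than the plain Wiener solution, so the reconstructed side signal, and hence the stereo width, is reduced for weakly correlated frames.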
Filter Smoothing
[0149] Rapid changes in the ICP filter characteristics between consecutive frames create
disturbing aliasing artifacts and instability in the reconstructed stereo image. This
comes from the fact that the predictive approach introduces large spectral variations
as opposed to a fixed filtering scheme.
[0150] Similar effects are also present in BCC when spectral components of neighboring sub-bands
are modified differently [5]. To circumvent this problem, BCC uses overlapping windows
in both analysis and synthesis.
[0151] The use of overlapping windows solves the aliasing problem for ICP filtering as well.
However, this comes at the expense of a rather large increase in MSE, since the filter
coefficients are no longer optimal for the present frame. A modified cost function
is therefore suggested. It is defined as:

ξ(h_t) = h_tᵀ R h_t − 2 h_tᵀ r + μ (h_t − h_{t−1})ᵀ (h_t − h_{t−1})     (23)

where h_t and h_{t−1} are the ICP filters at frame t and (t−1), respectively. Calculating
the partial derivative of (23) with respect to h_t and setting it to zero yields the new
smoothed ICP filter:

h_t = (R + μI)⁻¹ (r + μ h_{t−1})
[0152] The smoothing factor μ determines the contribution of the previous ICP filter, thereby
controlling the level of smoothing. The proposed filter smoothing effectively removes
coding artifacts and stabilizes the stereo image. However, this comes at the expense
of a reduced stereo image.
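A minimal sketch of the smoothing, assuming the smoothed filter takes the regularized closed form h_t = (R + μI)⁻¹(r + μ·h_{t−1}); the numerical values are illustrative:

```python
import numpy as np

def smoothed_icp_filter(R, r, h_prev, mu):
    """Smoothed ICP filter h_t = (R + mu*I)^{-1} (r + mu*h_prev).
    mu = 0 recovers the plain per-frame Wiener solution h = R^{-1} r;
    larger mu pulls the filter toward the previous frame's filter."""
    n = R.shape[0]
    return np.linalg.solve(R + mu * np.eye(n), r + mu * h_prev)

# Illustrative covariance matrix, correlation vector and previous filter.
R = np.array([[2.0, 0.5], [0.5, 1.0]])
r = np.array([0.3, 0.2])
h_prev = np.zeros(2)
h_plain = smoothed_icp_filter(R, r, h_prev, mu=0.0)
h_smooth = smoothed_icp_filter(R, r, h_prev, mu=1.0)  # pulled toward h_prev
```

Making μ adaptive, as described below, trades stability against stereo width on a per-frame basis.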
[0153] The problem of stereo image width reduction due to smoothing can be overcome by making
the smoothing factor adaptive. A large smoothing factor is used when the prediction
gain of the previous filter applied to the current frame is high. However, if the
previous filter leads to deterioration in the prediction gain, then the smoothing
factor is gradually decreased.
Frequency Band Processing
[0154] The previously suggested algorithms benefit from frequency band processing. In fact,
spatial psychoacoustics teaches that the dominant cues for sound localization are
inter-channel time differences at lower frequencies [6] and inter-channel level
differences at higher frequencies. This suggests that the stereo or multi-channel
reconstruction can benefit from coding different regions of the spectrum using different
methods and different bit-rates. For example, hybrid parametric and non-parametric
coding with adaptively controlled bit allocation could be performed in the low-frequency
range, whereas some other coding scheme(s) could be used in higher frequency regions.
Variable-Length Optimized Frame Processing
[0155] For variable frame lengths, an encoding frame can generally be divided into a number
of sub-frames according to various frame division configurations. The sub-frames may
have different sizes, but the sum of the lengths of the sub-frames of any given frame
division configuration is normally equal to the length of the overall encoding frame.
As described in our co-pending
US Patent Application No. 11/011765, which is incorporated herein as an example by this reference, and the corresponding
International Application
PCT/SE2004/001867, a number of encoding schemes are provided, where each encoding scheme is characterized
by or associated with a respective set of sub-frames together constituting an overall
encoding frame (also referred to as a master frame). A particular encoding scheme
is selected, preferably at least to a part dependent on the signal content of the
signal to be encoded; and then the signal is encoded in each of the sub-frames of
the selected set of sub-frames separately.
[0156] In general, encoding is typically performed one frame at a time, and each frame
normally comprises audio samples within a pre-defined time period. The division of
the samples into frames will in any case introduce some discontinuities at the frame
borders. Shifting sounds will give shifting encoding parameters, changing basically
at each frame border. This will give rise to perceptible errors. One way to compensate
somewhat for this is to base the encoding not only on the samples that are to be
encoded, but also on samples in the immediate vicinity of the frame. In that way,
there will be a softer transition between the different frames. As an alternative, or
complement, interpolation techniques are sometimes also utilised for reducing perception
artefacts caused by frame borders. However, all such procedures require large additional
computational resources, and for certain specific encoding techniques it might also
be difficult to provide them at all.
[0157] In this view, it is beneficial to utilise as long frames as possible, since the number
of frame borders will be small. Also the coding efficiency typically becomes high
and the necessary transmission bit-rate will typically be minimised. However, long
frames give problems with pre-echo artefacts and ghost-like sounds.
[0158] By instead utilising shorter frames, anyone skilled in the art realises that the
coding efficiency may be decreased, the transmission bit-rate may have to be higher
and the problems with frame border artefacts will increase. However, shorter frames
suffer less from other perception artefacts, such as ghost-like sounds and pre-echoing.
In order to be able to minimise the coding error as much as possible, one should use
an as short frame length as possible.
[0159] Thus, there seems to be conflicting requirements on the length of the frames. Therefore,
it is beneficial for the audio perception to use a frame length that is dependent
on the present signal content of the signal to be encoded. Since the influence of
different frame lengths on the audio perception will differ depending on the nature
of the sound to be encoded, an improvement can be obtained by letting the nature of
the signal itself affect the frame length that is used. In particular, this procedure
has turned out to be advantageous for side signal encoding.
[0160] For signals with small temporal variations, it may in some cases be beneficial to encode
the side signal with use of relatively long frames. This may be the case with recordings
with a great amount of diffuse sound field such as concert recordings. In other cases,
such as stereo speech conversation, short frames are preferable.
[0161] For example, the lengths of the sub-frames used could be selected according to:

l_sf = l_f / 2^n

where l_sf are the lengths of the sub-frames, l_f is the length of the overall encoding
frame and n is an integer. However, it should be understood that this is merely an example.
Any frame lengths will be possible to use, as long as the total length of the set of
sub-frames is kept constant.
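The admissible frame divisions built from such dyadic sub-frame lengths can be enumerated directly; a sketch, in which the 40-ms master-frame and the limit on n are illustrative assumptions:

```python
def frame_divisions(l_f, max_n=2):
    """Enumerate frame division configurations whose sub-frame lengths
    are l_f / 2**n (n a non-negative integer, n <= max_n) and whose
    lengths sum to the overall encoding frame length l_f."""
    lengths = [l_f // 2 ** n for n in range(max_n + 1)]

    def extend(remaining):
        if remaining == 0:
            return [[]]
        configs = []
        for l in lengths:
            if l <= remaining:
                configs += [[l] + tail for tail in extend(remaining - l)]
        return configs

    return extend(l_f)

# For a 40-ms master-frame with 40-, 20- and 10-ms sub-frames the
# admissible configurations include [40], [20, 20], [20, 10, 10], ...
configs = frame_divisions(40, max_n=2)
```

Each enumerated configuration is a candidate set of sub-frames over which the signal can then be encoded separately.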
[0162] The decision on which frame length to use can typically be performed in two basic
ways: closed loop decision or open loop decision.
[0163] When a closed loop decision is used, the input signal is typically encoded by all
available encoding schemes. Preferably, all possible combinations of frame lengths
are tested and the encoding scheme with an associated set of sub-frames that gives
the best objective quality, e.g. signal-to-noise ratio or a weighted signal-to-noise
ratio, is selected.
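The closed-loop decision described above can be sketched as follows; the two quantization "schemes" are purely illustrative stand-ins for real encoding schemes with different frame divisions:

```python
import numpy as np

def closed_loop_select(signal, encoders):
    """Encode the frame with every available scheme and keep the one
    with the best objective quality (here: plain SNR in dB).
    `encoders` maps a scheme name to a function signal -> reconstruction."""
    def snr(x, x_hat):
        noise = np.sum((x - x_hat) ** 2)
        return np.inf if noise == 0 else 10 * np.log10(np.sum(x ** 2) / noise)

    best_name, best_rec, best_snr = None, None, -np.inf
    for name, encode in encoders.items():
        rec = encode(signal)
        q = snr(signal, rec)
        if q > best_snr:
            best_name, best_rec, best_snr = name, rec, q
    return best_name, best_rec, best_snr

# Hypothetical schemes: 'coarse' quantizes harder than 'fine'.
x = np.sin(np.linspace(0, 2 * np.pi, 160))
schemes = {"coarse": lambda s: np.round(s * 4) / 4,
           "fine":   lambda s: np.round(s * 64) / 64}
name, rec, q = closed_loop_select(x, schemes)
```

A weighted SNR would simply replace the `snr` helper; the selection loop itself is unchanged.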
[0164] Alternatively, the frame length decision is an open loop decision, based on the statistics
of the signal. In other words, the spectral characteristics of the (side) signal will
be used as a basis for deciding which encoding scheme is going to be used. As
before, different encoding schemes characterised by different sets of sub-frames are
available. However, in this embodiment, the input (side) signal is first analyzed
and then a suitable encoding scheme is selected and utilized.
[0165] The advantage with an open loop decision is that only one actual encoding has to
be performed. The disadvantage is, however, that the analysis of the signal characteristics
may be very complicated and it may be difficult to predict possible behaviours
in advance. A lot of statistical analysis of sound has to be performed, and any small
change in the encoding schemes may completely change the statistical behaviour.
[0166] By using closed loop selection, encoding schemes may be exchanged without making
any changes in the rest of the implementation. On the other hand, if many encoding
schemes are to be investigated, the computational requirements will be high.
[0167] The benefit with such a variable frame length coding for the input (side) signal
is that one can select between a fine temporal resolution and coarse frequency resolution
on one side and coarse temporal resolution and fine frequency resolution on the other.
The above embodiments will preserve the multi-channel or stereo image in the best
possible manner.
[0168] There are also some requirements on the actual encoding utilised in the different
encoding schemes. In particular when the closed loop selection is used, the computational
resources needed to perform a number of more or less simultaneous encodings are large.
The more complicated the encoding process is, the more computational power is needed.
Furthermore, a low bit rate at transmission is also preferable.
[0169] The Variable Length Optimized Frame Processing according to an exemplary embodiment
of the invention takes as input a large "master-frame" and given a certain number
of frame division configurations, selects the best frame division configuration with
respect to a given distortion measure, e.g. MSE or weighted MSE.
[0170] The sub-frames of a frame division may have different sizes, but together they cover
the whole length of the master-frame.
[0171] In order to illustrate an exemplary procedure, consider a master-frame of length
L ms with the possible frame divisions illustrated in Fig. 19; exemplary frame
configurations are illustrated in Fig. 20.
[0172] In a particular exemplary embodiment of the invention, the idea is to select a combination
of encoding scheme with associated frame division configuration, as well as filter length/dimension
for each sub-frame, so as to optimize a measure representative of the performance
of the considered encoding process or signal encoding stage(s) thereof over an entire
encoding frame (master-frame). The possibility to adjust the filter length for each
sub-frame provides an added degree of freedom, and generally results in improved performance.
[0173] However, to reduce the signalling requirements during transmission from the encoding
side to the decoding side, each sub-frame of a certain length is preferably associated
with a predefined filter length. Usually long filters are assigned to long frames
and short filters to short frames.
[0174] Possible frame configurations are listed in the following table:
| 0, 0, 0, 0 |
| 0, 0, 1, 1 |
| 1, 1, 0, 0 |
| 0, 1, 1, 0 |
| 1, 1, 1, 1 |
| 2, 2, 2, 2 |
in the form (m1, m2, m3, m4), where mk denotes the frame type selected for the kth
(sub)frame of length L/4 ms inside the master-frame, such that for example:
mk = 0 for an L/4-ms frame with filter length P,
mk = 1 for an L/2-ms frame with filter length 2×P,
mk = 2 for an L-ms super-frame with filter length 4×P.
[0175] For example, the configuration (0, 0, 1, 1) indicates that the L-ms master-frame is
divided into two L/4-ms (sub)frames with filter length P, followed by an L/2-ms (sub)frame
with filter length 2×P. Similarly, the configuration (2, 2, 2, 2) indicates that the full
L-ms frame is used with filter length 4×P. This means that the frame division configuration
as well as the filter length information are simultaneously indicated by the information
(m1, m2, m3, m4).
[0176] The optimal configuration is selected, for example, based on the MSE or, equivalently,
maximum SNR. For instance, if the configuration (0, 0, 1, 1) is used, then the total
number of filters is three: two filters of length P and one of length 2×P.
[0177] The frame configuration, with its corresponding filters and their respective lengths,
that leads to the best performance (measured by SNR or MSE) is usually selected.
[0178] The filter computation, prior to frame selection, may be either open-loop or closed-loop,
the latter by including the filter quantization stages.
[0179] The advantage of using this scheme is that with this procedure, the dynamics of the
stereo or multi-channel image are well represented. The transmitted parameters are
the frame configuration as well as the encoded filters.
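As an illustrative sketch (the value of P and the helper function are hypothetical, not taken from the invention), a configuration tuple (m1, m2, m3, m4) can be expanded into the sub-frames and predefined filter lengths it implies:

```python
def expand_configuration(config, P=8):
    """Expand a frame configuration (m1, m2, m3, m4) into the list of
    (sub-frame length in quarter-frames, filter length) pairs it implies:
    m = 0 -> L/4 frame, filter P; m = 1 -> L/2 frame, filter 2*P;
    m = 2 -> full L frame, filter 4*P."""
    spec = {0: (1, P), 1: (2, 2 * P), 2: (4, 4 * P)}
    frames, k = [], 0
    while k < 4:
        quarters, filt = spec[config[k]]
        # a length-L/2 or length-L frame occupies 2 or 4 consecutive slots,
        # so all slots it covers must carry the same frame-type marker
        assert all(config[k + j] == config[k] for j in range(quarters)), \
            "inconsistent configuration"
        frames.append((quarters, filt))
        k += quarters
    return frames

frames = expand_configuration((0, 0, 1, 1))
```

This illustrates how the single tuple simultaneously conveys the frame division and the filter length of every sub-frame, so no separate filter-length signaling is needed.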
[0180] Because of the variable frame length processing that is involved, the analysis window
overlaps in the encoder can be of different lengths. In the decoder, it is therefore
essential for the synthesis of the channel signals to apply windowing accordingly and
to overlap-add different signal lengths.
[0181] It is often the case that for stationary signals the stereo image is quite stable
and the estimated channel filters are quite stationary. In this case, one would benefit
from an FIR filter with longer impulse response, i.e. better modeling of the stereo
image.
[0182] It has turned out to be particularly beneficial to add yet another degree of freedom
by also incorporating the previously described bit allocation procedure into the variable
frame length and adjustable filter length processing. In a preferred exemplary embodiment
of the invention, the idea is to select a combination of frame division configuration,
as well as bit allocation and filter length/dimension for each sub-frame, so as to
optimize a measure representative of the performance of the considered encoding process
or signal encoding stage(s) over an entire encoding frame. The considered signal representation
is then encoded separately for each of the sub-frames of the selected frame division
configuration in accordance with the selected bit allocation and filter dimension.
[0183] Preferably, the considered signal is a side signal and the encoder is a multi-stage
encoder comprising a parametric (ICP) stage and an auxiliary stage such as a non-parametric
stage. The bit allocation information controls how many quantization bits should
go to the parametric stage and to the auxiliary stage, and the filter length information
preferably relates to the length of the parametric (ICP) filter.
[0184] The signal encoding process here preferably generates output data, for transfer to
the decoding side, representative of the selected frame division configuration, and
for each sub-frame of the selected frame division configuration, bit allocation and
filter length.
[0185] With a higher degree of freedom, it is possible to find a truly optimal selection.
However, the amount of control information to be transferred to the decoding side
increases. In order to reduce the bit-rate requirements on signaling from the encoding
side to the decoding side in an audio transmission system, the filter length, for
each sub frame, is preferably selected in dependence on the length of the sub-frame,
as described above. This means that an indication of frame division configuration
of an encoding frame or master frame into a set of sub-frames at the same time provides
an indication of selected filter dimension for each sub-frame, thereby reducing the
required signaling.
[0186] The embodiments described above are merely given as examples, and it should be understood
that the present invention is not limited thereto. Further modifications, changes
and improvements which retain the basic underlying principles disclosed and claimed
herein are within the scope of the invention.
REFERENCES
[0187]
- [1] U.S. Patent No. 5,285,498 by Johnston.
- [2] European Patent No. 0,497,413 by Veldhuis et al.
- [3] C. Faller et al., "Binaural cue coding applied to stereo and multi-channel audio compression",
112th AES Convention, May 2002, Munich, Germany.
- [4] U.S. Patent No. 5,434,948 by Holt et al.
- [5] C. Faller and F. Baumgarte, "Binaural cue coding - Part I: Psychoacoustic fundamentals
and design principles", IEEE Trans. Speech Audio Processing, vol. 11, pp. 509-519,
Nov. 2003.
- [6] J. Robert Stuart, "The psychoacoustics of multichannel audio", Meridian Audio Ltd,
June 1998.
- [7] S.-S. Kuo and J. D. Johnston, "A study of why cross channel prediction is not applicable
to perceptual audio coding", IEEE Signal Processing Lett., vol. 8, pp. 245-247.
- [8] Y. Linde, A. Buzo and R. M. Gray, "An algorithm for vector quantizer design", IEEE
Trans. on Commun., vol. COM-28, pp.84-95, Jan. 1980.
- [9] B. Edler, C. Faller and G. Schuller, "Perceptual audio coding using a time-varying
linear pre- and post-filter", in AES Convention, Los Angeles, CA, Sept. 2000.
- [10] Bernd Edler and Gerald Schuller, "Audio coding using a psychoacoustical pre- and post-filter",
ICASSP-2000 Conference Record, 2000.
- [11] Dieter Bauer and Dieter Seitzer, "Statistical properties of high-quality stereo signals
in the time domain", IEEE International Conf. on Acoustics, Speech, and Signal Processing,
vol. 3, pp. 2045-2048, May 1989.
- [12] Gene H. Golub and Charles F. van Loan, "Matrix Computations", second edition, chapter
4, pages 137-138, The Johns Hopkins University Press, 1989.
- [13] B.-H. Juang and A. H. Gray, Jr., "Multiple stage vector quantization for speech coding",
In International Conference on Acoustics, Speech, and Signal Processing, vol. 1, pp.
597-600, Paris, April 1982.
1. A method of encoding a multi-channel audio signal comprising the steps of:
- encoding (S1) a first signal representation of at least one of said multiple channels
in a first signal encoding process (130);
- encoding (S3) a second signal representation of at least one of said multiple channels
in a second signal encoding process (140), said second signal encoding process being
a multi-stage encoding process,
characterized in that said multi-stage signal encoding process (140) is a hybrid parametric and non-parametric
encoding process involving a parametric encoding stage (142) and a non-parametric
encoding stage (144), and said method further comprises the step (S2) of adaptively
allocating a number of encoding bits among said parametric encoding stage (142) and
said non-parametric encoding stage (144) in dependence on inter-channel correlation
characteristics of the multi-channel audio signal by considering estimated performance
of at least one of said encoding stages (142, 144), and allocating more bits to the
other stage (144, 142) of the multi-stage encoding process, if the estimated performance
of said at least one of said encoding stages (142, 144) is saturating.
2. The encoding method of claim 1, wherein said step (S2) of adaptively allocating a
number of bits among the different encoding stages is performed on a frame-by-frame
basis.
3. The encoding method of claim 1, wherein said step (S2) of adaptively allocating a
number of encoding bits among the different encoding stages is performed based on
estimated performance of at least one of the encoding stages by allocating more bits
to the non-parametric encoding stage if the performance of the parametric encoding
stage is saturating.
4. The encoding method of claim 1, wherein said step (S2) of adaptively allocating a
number of encoding bits among the different encoding stages comprises the steps of:
- assessing estimated performance of said parametric encoding stage as a function
of the number of bits assumed to be allocated to said parametric encoding stage; and
- allocating said first amount of encoding bits to said parametric encoding stage
based on said assessment.
5. The encoding method of claim 1 or 4, wherein said multi-stage signal encoding process
includes adaptive inter-channel prediction in said parametric encoding stage (142)
for prediction of said second signal based on the first signal representation and
the second signal representation, and said performance is estimated at least partly
based on a signal prediction error.
6. The encoding method of claim 5, wherein said performance is estimated also based on
estimation of a quantization error as a function of the number of bits allocated for
quantization of second-signal reconstruction data generated by said inter-channel
prediction.
7. The encoding method of claim 6, wherein said multi-stage signal encoding process further
comprises an encoding process in said non-parametric encoding stage (144) for encoding
a representation of the signal prediction error from said parametric encoding stage
(142).
8. The encoding method of claim 1, wherein said parametric encoding stage (142) has an
inter-channel prediction (ICP) filter and an associated first quantizer for quantization
of the ICP filter, and said non-parametric encoding stage (144) has a second quantizer
for quantization of the residual prediction error of the ICP filter.
9. The encoding method of claim 1, wherein said number of encoding bits is determined
by a bit budget for said multi-stage signal encoding process, and output data representative
of the bit allocation is also generated.
10. The encoding method of claim 1, comprising the step of selecting a combination of bit
allocation and filter length for encoding to minimize the Mean Squared Error (MSE)
of a prediction error of said parametric encoding stage (142).
11. The encoding method of claim 4, further comprising the step of selecting a combination
of number of bits to be allocated to said parametric encoding stage (142) and filter
length to be used in said parametric encoding stage to minimize the Mean Squared Error
(MSE) of a prediction error of said parametric encoding stage (142).
12. The encoding method of claim 10 or 11, wherein output data representative of the selected
bit allocation and filter length is generated.
13. The encoding method of claim 1, further comprising the step of selecting a combination
of:
frame division configuration of an encoding frame into a set of sub-frames,
bit allocation and filter length for encoding for each sub-frame,
to minimize the Mean Squared Error (MSE) of a prediction error of said parametric
encoding stage (142) over an entire encoding frame; and
encoding said second signal representation in each of the sub-frames of the selected
set of sub-frames separately in accordance with the selected combination.
14. The encoding method of claim 4, further comprising the step of selecting a combination
of:
frame division configuration of an encoding frame into a set of sub-frames,
number of bits to be allocated to said first encoding stage for each sub-frame,
filter length to be used in said first encoding stage for each sub-frame,
to minimize the Mean Squared Error (MSE) of a prediction error of said parametric
encoding stage (142) over an entire encoding frame; and
encoding said second signal representation in each of the sub-frames of the selected
set of sub-frames separately in accordance with the selected combination.
15. The encoding method of claim 13 or 14, wherein output data representative of the selected
frame division configuration, and for each sub-frame of the selected frame division
configuration, bit allocation and filter length is generated.
16. The encoding method of claim 15, wherein the filter length, for each sub frame, is
selected in dependence on the length of the sub-frame so that an indication of frame
division configuration of an encoding frame into a set of sub-frames at the same time
provides an indication of selected filter dimension for each sub-frame to thereby
reduce the required signaling.
17. A method of decoding an encoded multi-channel audio signal comprising the steps of:
- decoding (S11), in response to first signal reconstruction data, an encoded first
signal representation of at least one of said multiple channels in a first signal
decoding process (230);
- decoding (S14), in response to second signal reconstruction data, an encoded second
signal representation of at least one of said multiple channels in a second, multi-stage,
signal decoding process (240),
characterized by:
- receiving (S12) bit allocation information representative of how a number of bits
have been allocated among a parametric encoding stage and a non-parametric encoding
stage in a corresponding second, multi-stage, hybrid parametric and non-parametric
signal encoding process; and
- determining (S13), based on said bit allocation information, how to interpret said
second signal reconstruction data in said multi-stage signal decoding process (240).
18. An apparatus for encoding a multi-channel audio signal comprising:
- a first encoder (130) for encoding a first signal representation of at least one
of said multiple channels;
- a second, multi-stage, encoder (140) for encoding a second signal representation
of at least one of said multiple channels,
characterized in that said second multi-stage encoder (140) is a hybrid parametric and non-parametric encoder
involving a parametric encoding stage (142) and a non-parametric encoding stage (144),
and said apparatus further comprises means (150) for adaptively controlling allocation
of a number of encoding bits among said parametric encoding stage (142) and said non-parametric
encoding stage (144) of the second multi-stage encoder (140) in dependence on inter-channel
correlation characteristics of the multi-channel audio signal and based on estimated
performance of at least one of said encoding stages (142, 144), and allocating more
bits to the other stage (144, 142) of the multi-stage encoding process, if the estimated
performance of said at least one of said encoding stages (142, 144) is saturating.
19. The apparatus of claim 18, wherein said controlling means (150) is operable for adaptively
controlling allocation of bits among the different encoding stages on a frame-by-frame
basis.
20. The apparatus of claim 18, wherein said controlling means (150) is operable for adaptively
controlling allocation of a number of encoding bits among the different encoding stages
based on estimated performance of at least one of the encoding stages by allocating
more bits to said non-parametric encoding stage (144) if the performance of said parametric
encoding stage (142) is saturating.
21. The apparatus of claim 18, wherein said controlling means comprises:
- means for assessing estimated performance of said parametric encoding stage (142)
of said second multi-stage encoder (140) as a function of the number of bits assumed
to be allocated to said parametric encoding stage (142); and
- means for allocating said first amount of encoding bits to said parametric encoding
stage (142) based on said assessment.
22. The apparatus of claim 18 or 21, wherein said parametric encoding stage (142) includes
an adaptive inter-channel prediction filter for second-signal prediction based on
the first signal representation and the second signal representation, and said controlling
means (150) comprises means for assessing estimated performance of at least said parametric
encoding stage (142) at least partly based on a signal prediction error.
23. The apparatus of claim 22, wherein said assessing means is operable for assessing
estimated performance of at least said parametric encoding stage (142) based on assessment
of an estimated quantization error as a function of the number of bits allocated for
quantization of said inter-channel prediction filter.
24. The apparatus of claim 22, wherein said non-parametric encoding stage (144) is operable
for encoding a representation of the signal prediction error from said parametric
encoding stage (142).
25. The apparatus of claim 18, wherein said parametric encoding stage (142) has an inter-channel
prediction (ICP) filter and an associated first quantizer for quantization of the
ICP filter, and said non-parametric encoding stage (144) has a second quantizer for
quantization of the residual prediction error of the ICP filter.
26. The apparatus of claim 18, wherein said number of encoding bits are determined by
a bit budget for said second encoder (140), and said second encoder (140) is operable
for generating output data representative of the bit allocation.
27. The apparatus of claim 18, comprising means (150) for selecting a combination of bit
allocation and filter length for encoding to minimize the Mean Squared Error (MSE)
of a prediction error of said parametric encoding stage (142).
28. The apparatus of claim 21, comprising means (150) for selecting a combination of number
of bits to be allocated to said parametric encoding stage (142) and filter length
to be used in said parametric encoding stage (142) to minimize the Mean Squared Error
(MSE) of a prediction error of said parametric encoding stage (142).
29. The apparatus of claim 27 or 28, wherein said second encoder (140; 150) is operable
for generating output data representative of the selected bit allocation and filter
length.
30. The apparatus of claim 18, further comprising:
means for selecting a combination of frame division configuration of an encoding frame
into a set of sub-frames, and bit allocation and filter length for encoding for each
sub-frame, to minimize the Mean Squared Error (MSE) of a prediction error of said
parametric encoding stage over an entire encoding frame; and
means for encoding said second signal representation in each of the sub-frames of
the selected set of sub-frames separately in accordance with the selected combination.
31. The apparatus of claim 21, further comprising:
- means (150) for selecting a combination of i) frame division configuration of an encoding
frame into a set of sub-frames, ii) number of bits to be allocated to said first encoding
stage for each sub-frame, and iii) filter length to be used in said parametric encoding
stage (142) for each sub-frame, to minimize the Mean Squared Error (MSE) of a prediction
error of said parametric encoding stage (142) over an entire encoding frame; and
- means (140) for encoding said second signal representation in each of the sub-frames
of the selected set of sub-frames separately in accordance with the selected combination.
32. The apparatus of claim 30 or 31, wherein said second encoder (140; 150) is operable
for generating output data representative of the selected frame division configuration,
and for each sub-frame of the selected frame division configuration, bit allocation
and filter length.
33. The apparatus of claim 32, wherein said second encoder (140; 150) is operable for
selecting the filter length, for each sub frame, in dependence on the length of the
sub-frame so that an indication of frame division configuration of an encoding frame
into a set of sub-frames at the same time provides an indication of selected filter
dimension for each sub-frame to thereby reduce the required signaling.
34. An apparatus for decoding an encoded multi-channel audio signal comprising:
- a first decoder (230) for decoding, in response to first signal reconstruction data,
an encoded first signal representation of at least one of said multiple channels;
- a second, multi-stage, decoder (240) for decoding, in response to second signal
reconstruction data, an encoded second signal representation of at least one of said
multiple channels,
characterized by:
- means (210; 250) for receiving bit allocation information representative of how
a number of bits have been allocated among a parametric encoding stage and a non-parametric
encoding stage in a corresponding second, multi-stage, hybrid parametric and non-parametric
encoder; and
- means (250) for interpreting, based on said bit allocation information, said second
signal reconstruction data in said second multi-stage decoder (240; 250) for the purpose
of decoding the second signal representation.
35. An audio transmission system, characterized in that said system comprises an encoding apparatus of claim 18 and a decoding apparatus
of claim 34.
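The selection procedure recited in claims 30-33 amounts to a joint search over frame division, per-sub-frame bit allocation, and filter length, minimizing the parametric stage's prediction-error MSE over the whole encoding frame. The following sketch illustrates that search; all names, the candidate sets, and the `mse_estimate` model are hypothetical stand-ins, not the patented encoder's actual internals. Note how, per claim 33, tying filter length to sub-frame length lets the division indication double as the filter-dimension indication.

```python
# Illustrative sketch of the selection procedure of claims 30-33.
# All names and the mse_estimate() model are hypothetical; a real encoder
# would derive the MSE from its inter-channel prediction (ICP) analysis.

from itertools import product

# Candidate divisions of a 640-sample encoding frame (hypothetical values).
DIVISIONS = [
    [640],                # one full-length frame
    [320, 320],           # two half-length sub-frames
    [160, 160, 160, 160]  # four quarter-length sub-frames
]

# Claim 33: filter length is a fixed function of sub-frame length, so an
# indication of the frame division also indicates the filter dimension.
FILTER_LEN = {640: 8, 320: 4, 160: 2}

BIT_CHOICES = [8, 16, 24, 32]  # bits per sub-frame for the parametric stage


def mse_estimate(sub_len, bits, filt_len):
    """Toy stand-in for the encoder's prediction-error model:
    MSE assumed to fall with more bits and a longer filter."""
    return sub_len / (bits * filt_len)


def select_configuration(frame_len=640):
    """Exhaustively search (division, bit allocation, filter lengths)
    for the minimum total MSE over the entire encoding frame."""
    best = None
    for division in DIVISIONS:
        if sum(division) != frame_len:
            continue
        filt_lens = [FILTER_LEN[n] for n in division]
        # Search every per-sub-frame bit allocation for this division.
        for bits in product(BIT_CHOICES, repeat=len(division)):
            total = sum(mse_estimate(n, b, f)
                        for n, b, f in zip(division, bits, filt_lens))
            if best is None or total < best[0]:
                best = (total, division, bits, filt_lens)
    return best
```

The returned tuple corresponds to the output data of claim 32: the selected frame division configuration plus, for each sub-frame, its bit allocation and (implicitly, via claim 33) its filter length.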