TECHNICAL FIELD
[0001] The present invention relates in general to encoding of audio signals, and in particular
to encoding of multi-channel audio signals.
BACKGROUND
[0002] There is a high market need to transmit and store audio signals at low bit rate while
maintaining high audio quality. Particularly, in cases where transmission resources
or storage is limited, low bit rate operation is an essential cost factor. This is
typically the case, e.g. in streaming and messaging applications in mobile communication
systems such as GSM, UMTS, or CDMA.
[0003] Today, there are no standardised codecs available providing high stereophonic audio
quality at bit rates that are economically interesting for use in mobile communication
systems. What is possible with available codecs is monophonic transmission of the
audio signals. To some extent also stereophonic transmission is available. However,
bit rate limitations usually require limiting the stereo representation quite drastically.
[0004] The simplest way of stereophonic or multi-channel coding of audio signals is to encode
the signals of the different channels separately as individual and independent signals.
Another basic way used in stereo FM radio transmission and which ensures compatibility
with legacy mono radio receivers is to transmit a sum and a difference signal of the
two involved channels.
[0005] State-of-the-art audio codecs, such as MPEG-1/2 Layer III and MPEG-2/4 AAC make use
of so-called joint stereo coding. According to this technique, the signals of the
different channels are processed jointly, rather than separately and individually.
The two most commonly used joint stereo coding techniques are known as "Mid/Side"
(M/S) stereo coding and intensity stereo coding, which usually are applied on sub-bands
of the stereo or multi-channel signals to be encoded.
[0006] M/S stereo coding is similar to the procedure described for stereo FM radio, in the
sense that it encodes and transmits the sum and difference signals of the channel
sub-bands and thereby exploits redundancy between the channel sub-bands. The structure
and operation of an encoder based on M/S stereo coding is described, e.g. in
US patent 5,285,498 by J.D. Johnston.
[0007] Intensity stereo on the other hand is able to make use of stereo irrelevancy. It
transmits the joint intensity of the channels (of the different sub-bands) along with
some location information indicating how the intensity is distributed among the channels.
Intensity stereo only provides spectral magnitude information of the channels.
Phase information is not conveyed. For this reason, and since the temporal inter-channel
information (more specifically the inter-channel time difference) is of major psycho-acoustical
relevance, particularly at lower frequencies, intensity stereo can only be used at
high frequencies, e.g. above 2 kHz. An intensity stereo coding method is described,
e.g. in European patent 0497413 by R. Veldhuis et al.
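The intensity principle can be illustrated for a single sub-band with a minimal Python sketch. For illustration only, it assumes that the location information is sent as the fraction of energy in the left channel plus the total energy. The function names `intensity_encode` and `intensity_decode` are hypothetical, and real codecs quantise per-band scale factors rather than sending floats. Both decoded channels are scaled copies of the same joint signal, so any inter-channel time difference is lost. This is the limitation described above.

```python
import math

def energy(x):
    return sum(v * v for v in x)

def intensity_encode(left, right):
    """One sub-band: joint signal, left-channel energy share, total energy (hypothetical format)."""
    joint = [l + r for l, r in zip(left, right)]
    total = energy(left) + energy(right)
    position = energy(left) / total if total > 0 else 0.5
    return joint, position, total

def intensity_decode(joint, position, total):
    """Rebuild both channels as scaled copies of the joint signal (no phase information)."""
    e_joint = energy(joint)
    if e_joint == 0:
        # E.g. fully anti-phase channels: the joint signal vanishes.
        return [0.0] * len(joint), [0.0] * len(joint)
    g_left = math.sqrt(position * total / e_joint)
    g_right = math.sqrt((1.0 - position) * total / e_joint)
    return [g_left * s for s in joint], [g_right * s for s in joint]
```

Consider a right channel that is a one-sample-delayed copy of the left. The two decoded channels come out identical: the level split survives, but the delay does not.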
[0008] A recently developed stereo coding method is described, e.g. in the conference paper
"Binaural cue coding applied to stereo and multi-channel audio compression" by C. Faller et al.,
112th AES Convention, May 2002, Munich, Germany. This method is a parametric multi-channel
audio coding method. The basic principle is that at the encoding side, the input signals
from N channels $c_1, c_2, \ldots, c_N$ are combined into one mono signal $m$. The mono signal
is audio encoded using any conventional monophonic audio codec. In parallel, parameters
that describe the multi-channel image are derived from the channel signals. The parameters
are encoded and transmitted to the decoder, along with the audio bit stream. The decoder
first decodes the mono signal $m'$ and then regenerates the channel signals
$c_1', c_2', \ldots, c_N'$, based on the parametric description of the multi-channel image.
[0009] The principle of the Binaural Cue Coding (BCC) method is that it transmits the encoded
mono signal and so-called BCC parameters. The BCC parameters comprise coded inter-channel
level differences and inter-channel time differences for sub-bands of the original
multi-channel input signal. The decoder regenerates the different channel signals
by applying sub-band-wise level and phase adjustments of the mono signal based on
the BCC parameters. The advantage over e.g. M/S or intensity stereo is that stereo
information comprising temporal inter-channel information is transmitted at much lower
bit rates. However, this technique requires computationally demanding time-frequency
transforms on each of the channels, both at the encoder and the decoder.
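The following sketch makes the BCC parameters concrete. It computes a mono downmix and one inter-channel level difference (ICLD) per sub-band from a naive DFT. It is an illustration under stated assumptions:

- the band edges, the helper names and the small regularisation constant `eps` are hypothetical;
- the inter-channel time differences are omitted;
- a practical implementation would use an FFT-based filter bank rather than a direct DFT.

```python
import cmath
import math

def dft_half(x):
    """Naive DFT, bins 0..N/2 (sufficient for a real-valued input)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n // 2 + 1)]

def downmix(c1, c2):
    """Mono signal m transmitted alongside the parameters."""
    return [(a + b) / 2 for a, b in zip(c1, c2)]

def icld_per_band(c1, c2, band_edges, eps=1e-12):
    """Inter-channel level difference in dB for each sub-band [lo, hi) of DFT bins."""
    X1, X2 = dft_half(c1), dft_half(c2)
    icld = []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        p1 = sum(abs(v) ** 2 for v in X1[lo:hi])
        p2 = sum(abs(v) ** 2 for v in X2[lo:hi])
        icld.append(10 * math.log10((p1 + eps) / (p2 + eps)))
    return icld
```

A decoder following this idea would apply per-band gains derived from these level differences to the decoded mono signal.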
[0010] Moreover, BCC does not handle the fact that a lot of the stereo information, especially
at low frequencies, is diffuse, i.e. it does not come from any specific direction.
Diffuse sound fields exist in both channels of a stereo recording but they are to
a great extent out of phase with respect to each other. If an algorithm such as BCC
is applied to recordings with a large amount of diffuse sound field, the reproduced
stereo image will become confused, jumping from left to right as the BCC algorithm
can only pan the signal in specific frequency bands to the left or right.
[0011] A possible means to encode the stereo signal and ensure good reproduction of diffuse
sound fields is to use an encoding scheme very similar to the technique used in FM
stereo radio broadcast, namely to encode the mono (Left+Right) and the difference
(Left-Right) signals separately.
[0012] A technique described in US patent 5,434,948 by C.E. Holt et al. uses an approach
similar to BCC for encoding the mono signal and side information.
In this case, side information consists of predictor filters and optionally a residual
signal. The predictor filters, estimated by a least-mean-square algorithm, when applied
to the mono signal allow the prediction of the multi-channel audio signals. With this
technique one is able to reach very low bit rate encoding of multi-channel audio sources,
however, at the expense of a quality drop, discussed further below.
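The sketch below is a rough illustration of predicting the side signal from the mono signal with an FIR filter. It fits the filter taps by block least squares, solving the normal equations with Gaussian elimination. This is a generic least-squares stand-in, not the exact adaptive procedure of US 5,434,948. The names `fit_predictor`, `predict`, `delayed` and `solve`, and the filter order, are hypothetical choices.

```python
def solve(a, b):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def delayed(x, n, k):
    """x[n - k], with zeros before the start of the block."""
    return x[n - k] if n - k >= 0 else 0.0

def fit_predictor(mono, side, order):
    """Taps h minimising sum_n (side[n] - sum_k h[k] * mono[n - k])**2 over the block."""
    N = len(mono)
    r = [[sum(delayed(mono, n, i) * delayed(mono, n, j) for n in range(N))
          for j in range(order)] for i in range(order)]
    p = [sum(side[n] * delayed(mono, n, i) for n in range(N)) for i in range(order)]
    return solve(r, p)

def predict(mono, h):
    """Side signal estimate obtained by filtering the mono signal with h."""
    return [sum(h[k] * delayed(mono, n, k) for k in range(len(h))) for n in range(len(mono))]
```

The taps (and optionally the residual `side - predict(mono, h)`) are what would have to be quantised and transmitted. The cost of doing so grows with the filter order.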
[0013] Finally, for completeness, a technique is to be mentioned that is used in 3D audio.
This technique synthesises the right and left channel signals by filtering sound source
signals with so-called head-related filters. However, this technique requires the
different sound source signals to be separated and thus cannot generally be applied
for stereo or multi-channel coding.
SUMMARY
[0014] A problem with existing encoding schemes based on encoding of frames of signals,
in particular a main signal and one or more side signals, is the pre-echo effect.
Figs. 7a-b are diagrams illustrating such an artefact. Assume a signal component having
the time development shown by curve 100. In the beginning, starting from t0, the signal
component is not present in the audio sample. At a time t between t1 and t2, the signal
component suddenly appears. When the signal component is encoded using a frame length
of t2-t1, the occurrence of the signal component will be "smeared out" over the entire
frame, as indicated by curve 101. When curve 101 is decoded, the signal component appears
a time Δt before its intended appearance, and a "pre-echo" is perceived.
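The smearing in Figs. 7a-b can be mimicked with a deliberately crude toy codec. It keeps only one energy value per frame and spreads it evenly over the frame. The codec and helper names are hypothetical and far simpler than any real encoder. The point is only that the decoded onset moves back from the true onset to the start of the frame containing it, so Δt shrinks with the frame length.

```python
import math

def frame_energy_codec(x, frame_len):
    """Toy codec: each frame is represented only by its RMS value and decoded
    as a constant envelope over the whole frame."""
    out = []
    for start in range(0, len(x), frame_len):
        frame = x[start:start + frame_len]
        rms = math.sqrt(sum(v * v for v in frame) / len(frame))
        out.extend([rms] * len(frame))
    return out

def pre_echo_samples(original, decoded, tol=1e-12):
    """Delta t in samples: true onset index minus first non-zero decoded index."""
    onset = next(i for i, v in enumerate(original) if abs(v) > tol)
    first = next(i for i, v in enumerate(decoded) if abs(v) > tol)
    return onset - first
```

Take an onset at sample 13 of a 16-sample block. A single 16-sample frame gives a pre-echo of 13 samples, while 4-sample frames reduce it to 1 sample.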
[0015] An object of the present invention is therefore to provide an encoding method and
device improving the perception quality of multi-channel audio signals, in particular
to avoid artefacts such as pre-echoing. A further object of the present invention
is to provide an encoding method and device requiring less processing power and having
more constant transmission bit rate requirements.
[0016] The above objects are achieved by methods and devices according to the enclosed patent
claims. In general words, in a first aspect, a method of encoding multi-channel audio
signals comprises generating of a first output signal, being encoding parameters representing
a main signal. The main signal is a first linear combination of signals of at least
a first and a second channel. The method further comprises generating of a second
output signal, being encoding parameters representing a side signal. The side signal
is a second linear combination of signals of at least the first and the second channel
within an encoding frame. The method is characterised in that the generating of the
second output signal further comprises scaling of the side signal to an energy contour
of the main signal.
[0017] In a second aspect, a method of decoding multi-channel audio signals comprises generating
of a decoded main signal from encoding parameters representing a main signal. The
main signal is a first linear combination of signals of at least a first and a second
channel. The method further comprises generating of a decoded side signal from encoding
parameters representing a side signal. The side signal is a second linear combination
of signals of at least a first and a second channel, within an encoding frame
and is scaled to an energy contour of the main signal. The method further comprises combining of at least the decoded main signal and the
decoded side signal into signals of at least the first and the second channel. The
method is characterised in that the generating of a decoded side signal further comprises
inverse scaling of the decoded side signal to an energy contour of the decoded main signal.
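The summary does not say how the energy contour is obtained, so the following sketch is only one possible reading. It assumes the contour is the RMS of the main signal over fixed sub-blocks, held constant within each sub-block. The encoder divides the side signal by the contour of the main signal. The decoder multiplies by the contour of the decoded main signal. All names, the sub-block length and the floor `eps` are hypothetical.

```python
import math

def energy_contour(main, block_len, eps=1e-9):
    """Hypothetical energy contour: per-sub-block RMS of the main signal, held per sample."""
    contour = []
    for start in range(0, len(main), block_len):
        block = main[start:start + block_len]
        rms = math.sqrt(sum(v * v for v in block) / len(block))
        contour.extend([max(rms, eps)] * len(block))  # floor avoids division by zero
    return contour

def scale_side(side, main, block_len):
    """Encoder: scale the side signal to the energy contour of the main signal."""
    return [s / g for s, g in zip(side, energy_contour(main, block_len))]

def inverse_scale_side(scaled_side, decoded_main, block_len):
    """Decoder: undo the scaling using the contour of the decoded main signal."""
    return [s * g for s, g in zip(scaled_side, energy_contour(decoded_main, block_len))]
```

When the decoded main signal equals the original, the inverse scaling restores the side signal exactly. With a lossy main codec, the restored side signal follows the contour of the decoded main signal instead.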
[0018] In a third aspect, an encoder apparatus comprises input means for multi-channel audio
signals comprising at least a first and a second channel. The encoder apparatus comprises
means for generating a first output signal, being encoding parameters representing
a main signal. The main signal is a first linear combination of signals of at least
the first and the second channel. The encoder apparatus further comprises means for
generating a second output signal, being encoding parameters representing a side signal.
The side signal is a second linear combination of signals of at least the first and
the second channel, within an encoding frame. The encoder apparatus further comprises
output means. The encoder apparatus is characterised in that the means for generating
a second output signal further comprises means for scaling the side signal to an energy
contour of the main signal.
[0019] In a fourth aspect, a decoder apparatus comprises input means for encoding parameters
representing a main signal and encoding parameters representing a side signal. The
main signal is a first linear combination of a first and a second channel. The side
signal is a second linear combination of a first and a second channel
and is scaled to an energy contour of the main signal. The decoder apparatus further comprises means for generating a decoded main signal
from the encoding parameters representing the main signal and means for generating
a decoded side signal from the encoding parameters representing the side signal within
an encoding frame. The decoder apparatus further comprises means for combining at
least the decoded main signal and the decoded side signal into signals of at least
a first and a second channel, and output means. The decoder apparatus is characterised
in that the means for generating a decoded side signal in turn comprises means for
inverse scaling the decoded side signal to an energy contour of the decoded main signal.
[0020] In a fifth aspect, an audio system comprises at least one of an encoder apparatus
according to the third aspect and a decoder apparatus according to the fourth aspect.
[0021] The main advantage with the present invention is that the preservation of the perception
of the audio signals is improved. Furthermore, the present invention still allows
multi-channel signal transmission at very low bit rates.
BRIEF DESCRIPTION OF THE DRAWINGS
[0022] The invention, together with further objects and advantages thereof, may best be
understood by making reference to the following description taken together with the
accompanying drawings, in which:
FIG. 1 is a block scheme of a system for transmitting polyphonic signals;
FIG. 2a is a block diagram of an encoder in a transmitter;
FIG. 2b is a block diagram of a decoder in a receiver;
FIG. 3a is a diagram illustrating encoding frames of different lengths;
FIGS. 3b and 3c are block diagrams of embodiments of side signal encoder units;
FIG. 4 is a block diagram of an embodiment of an encoder using balance factor encoding
of side signal;
FIG. 5 is a block diagram of an embodiment of an encoder for multi-signal systems;
FIG. 6 is a block diagram of an embodiment of a decoder suitable for decoding signals
from the device of Fig. 5;
FIGS. 7a and 7b are diagrams illustrating a pre-echo artefact;
FIG. 8 is a block diagram of an embodiment of a side signal encoder unit, employing
different encoding principles in different sub-frames;
FIG. 9 illustrates the use of different encoding principles in different frequency
sub-bands;
FIG. 10 is a flow diagram of the basic steps of an embodiment of an encoding method;
and
FIG. 11 is a flow diagram of the basic steps of an embodiment of a decoding method.
DETAILED DESCRIPTION
[0023] Fig. 1 illustrates a typical system 1, in which the present invention advantageously
can be utilised. A transmitter 10 comprises an antenna 12 including associated hardware
and software to be able to transmit radio signals 5 to a receiver 20. The transmitter
10 comprises among other parts a multi-channel encoder 14, which transforms signals
of a number of input channels 16 into output signals suitable for radio transmission.
Examples of suitable multi-channel encoders 14 are described in detail further below.
The signals of the input channels 16 can be provided from e.g. an audio signal storage
18, such as a data file of digital representation of audio recordings, magnetic tape
or vinyl disc recordings of audio etc. The signals of the input channels 16 can also
be provided "live", e.g. from a set of microphones 19. The audio signals are digitised,
if not already in digital form, before entering the multi-channel encoder 14.
[0024] At the receiver 20 side, an antenna 22 with associated hardware and software handles
the actual reception of radio signals 5 representing polyphonic audio signals. Here,
typical functionalities, such as e.g. error correction, are performed. A decoder 24
decodes the received radio signals 5 and transforms the audio data carried thereby
into signals of a number of output channels 26. The output signals can be provided
to e.g. loudspeakers 29 for immediate presentation, or can be stored in an audio signal
storage 28 of any kind.
[0025] The system 1 can for instance be a phone conference system, a system for supplying
audio services or other audio applications. In some systems, such as e.g. the phone
conference system, the communication has to be of a duplex type, while e.g. distribution
of music from a service provider to a subscriber can be essentially of a one-way type.
The transmission of signals from the transmitter 10 to the receiver 20 can also be
performed by any other means, e.g. by different kinds of electromagnetic waves, cables
or fibres as well as combinations thereof.
[0026] Fig. 2a illustrates an embodiment of an encoder. In this embodiment, the polyphonic
signal is a stereo signal comprising two channels a and b, received at inputs 16A and
16B, respectively. The signals of channels a and b are provided to a pre-processing
unit 32, where different signal conditioning procedures may be performed. The (perhaps
modified) signals from the output of the pre-processing unit 32 are summed in an addition
unit 34. This addition unit 34 also divides the sum by a factor of two. The signal
$x_{\mathrm{mono}}$ produced in this way is a main signal of the stereo signals, since it
basically comprises all data from both channels. In this embodiment the main signal thus
represents a pure "mono" signal. The main signal $x_{\mathrm{mono}}$ is provided to a main
signal encoder unit 38, which encodes the main signal according to any suitable encoding
principles. Such principles are available within prior art and are thus not further
discussed here. The main signal encoder unit 38 gives an output signal $p_{\mathrm{mono}}$,
being encoding parameters representing a main signal.
[0027] In a subtraction unit 36, a difference (divided by a factor of two) of the channel
signals is provided as a side signal $x_{\mathrm{side}}$. In this embodiment, the side signal
represents the difference between the two channels in the stereo signal. The side signal
$x_{\mathrm{side}}$ is provided to a side signal encoding unit 30. Preferred embodiments of
the side signal encoding unit 30 will be discussed further below. According to a side signal
encoding procedure, which will be described in more detail further below, the side
signal $x_{\mathrm{side}}$ is transformed into encoding parameters $p_{\mathrm{side}}$
representing the side signal $x_{\mathrm{side}}$. In certain embodiments, this encoding also
utilises information of the main signal $x_{\mathrm{mono}}$. The arrow 42 indicates such a
provision, where the original uncoded main signal $x_{\mathrm{mono}}$ is utilised. In yet
other embodiments, the main signal information that is used in the side signal encoding
unit 30 can be deduced from the encoding parameters $p_{\mathrm{mono}}$ representing the
main signal, as indicated by the broken line 44.
[0028] The encoding parameters $p_{\mathrm{mono}}$ representing the main signal
$x_{\mathrm{mono}}$ are a first output signal, and the encoding parameters $p_{\mathrm{side}}$
representing the side signal $x_{\mathrm{side}}$ are a second output signal. In a typical
case, these two output signals $p_{\mathrm{mono}}$, $p_{\mathrm{side}}$, together representing
the full stereo sound, are multiplexed into one transmission signal 52 in a multiplexor
unit 40. However, in other embodiments, the first and second output signals
$p_{\mathrm{mono}}$, $p_{\mathrm{side}}$ may be transmitted separately.
[0029] In Fig. 2b, an embodiment of a decoder 24 is illustrated as a block scheme. The received
signal 54, comprising encoding parameters representing the main and side signal information,
is provided to a demultiplexor unit 56, which separates it into a first and a second input
signal. The first input signal, corresponding to the encoding parameters $p_{\mathrm{mono}}$
of a main signal, is provided to a main signal decoder unit 64. In a conventional
manner, the encoding parameters $p_{\mathrm{mono}}$ representing the main signal are used to
generate a decoded main signal $x''_{\mathrm{mono}}$, which is as similar as possible to the
main signal $x_{\mathrm{mono}}$ of the encoder 14 (Fig. 2a).
[0030] Similarly, the second input signal, corresponding to a side signal, is provided to
a side signal decoder unit 60. Here, the encoding parameters $p_{\mathrm{side}}$ representing
the side signal are used to recover a decoded side signal $x''_{\mathrm{side}}$. In some
embodiments, the decoding procedure utilises information about the main signal
$x''_{\mathrm{mono}}$, as indicated by an arrow.
[0031] The decoded main and side signals $x''_{\mathrm{mono}}$, $x''_{\mathrm{side}}$ are
provided to an addition unit 70, which provides an output signal that is a representation
of the original signal of channel a. Similarly, a difference provided by a subtraction
unit 68 provides an output signal that is a representation of the original signal
of channel b. These channel signals may be post-processed in a post-processor unit
74 according to prior-art signal processing procedures. Finally, the channel signals
a and b are provided at the outputs 26A and 26B of the decoder.
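The sum/difference structure of Figs. 2a and 2b can be sketched in a few lines of Python, leaving the actual coding of the main and side signals aside. The functions are hypothetical helpers. The encoder halves the sum and the difference, as units 34 and 36 do. The decoder adds and subtracts, as units 70 and 68 do. Without coding losses, the channels are therefore recovered exactly.

```python
def encode_mid_side(a, b):
    """Main signal (a + b) / 2 and side signal (a - b) / 2, as in Fig. 2a."""
    mono = [(x + y) / 2 for x, y in zip(a, b)]
    side = [(x - y) / 2 for x, y in zip(a, b)]
    return mono, side

def decode_mid_side(mono, side):
    """Channel a = mono + side (unit 70), channel b = mono - side (unit 68)."""
    a = [m + s for m, s in zip(mono, side)]
    b = [m - s for m, s in zip(mono, side)]
    return a, b
```

In a real system the decoder receives the decoded signals $x''_{\mathrm{mono}}$ and $x''_{\mathrm{side}}$, so the reconstruction is only as good as the main and side codecs.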
[0032] As mentioned in the summary, encoding is typically performed one frame at a time.
A frame comprises audio samples within a pre-defined time period. In the bottom part
of Fig. 3a, a frame SF2 of time duration L is illustrated. The audio samples within
the unhatched portion are to be encoded together. The preceding samples and the subsequent
samples are encoded in other frames. The division of the samples into frames will
in any case introduce some discontinuities at the frame borders. Shifting sounds will
give shifting encoding parameters, changing basically at each frame border. This will
give rise to perceptible errors. One way to compensate somewhat for this is to base
the encoding not only on the samples that are to be encoded, but also on samples
in the immediate vicinity of the frame, as indicated by the hatched portions. In this
way, there will be a softer transition between the different frames. As an alternative,
or complement, interpolation techniques are sometimes also utilised for reducing perceptual
artefacts caused by frame borders. However, all such procedures require large additional
computational resources, and for certain specific encoding techniques it might also
be difficult to apply them at all.
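Using neighbouring samples (the hatched portions in Fig. 3a) amounts to analysing a window that extends beyond the frame on both sides, while only the frame's own samples are encoded. The following is a minimal sketch. It assumes a symmetric `margin` and zero-padding at the signal edges; both are hypothetical choices, and the function name is illustrative only.

```python
def frame_with_context(x, start, length, margin):
    """Return (analysis_window, frame): the frame's own samples plus `margin`
    neighbouring samples on each side, zero-padded at the signal edges."""
    window = [x[i] if 0 <= i < len(x) else 0.0
              for i in range(start - margin, start + length + margin)]
    frame = x[start:start + length]
    return window, frame
```

Encoding parameters would be estimated on `window` but applied only to `frame`. Consecutive windows overlap, so the parameters change less abruptly at frame borders.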
[0033] In this view, it is beneficial to utilise frames that are as long as possible, since
the number of frame borders will then be small. Also, the coding efficiency typically becomes high
and the necessary transmission bit-rate will typically be minimised. However, long
frames give problems with pre-echo artefacts and ghost-like sounds.
[0034] By instead utilising shorter frames, such as SF1 or even SF0, having the durations
of L/2 and L/4, respectively, anyone skilled in the art realises that the coding efficiency
may be decreased, the transmission bit-rate may have to be higher and the problems
with frame border artefacts will increase. However, shorter frames suffer less from
e.g. other perception artefacts, such as ghost-like sounds and pre-echoing. In order
to minimise the coding error as much as possible, one should use as
short a frame length as possible.
[0035] The audio perception will be improved by using a frame length for encoding of the
side signal that is dependent on the present signal content. Since the influence of
different frame lengths on the audio perception will differ depending on the nature
of the sound to be encoded, an improvement can be obtained by letting the nature of
the signal itself affect the frame length that is used. The encoding of the main signal
is not the object of the present invention and is therefore not described in detail.
However, the frame lengths used for the main signal may or may not be equal to the
frame lengths used for the side signal.
[0036] Due to small temporal variations, it may in some cases be beneficial to encode
the side signal using relatively long frames. This may be the case with recordings
with a great amount of diffuse sound field, such as concert recordings. In other cases,
such as stereo speech conversation, short frames are probably preferable. The decision
on which frame length to prefer can be made in two basic ways.
[0037] One embodiment of a side signal encoder unit 30 is illustrated in Fig. 3b, in which
a closed loop decision is utilised. A basic encoding frame of length L is used here.
A number of encoding schemes 81, characterised by a separate set 80 of sub-frames,
are created. Each set 80 of sub-frames comprises one or more sub-frames of equal or
differing lengths. The total length of the set 80 of sub-frames is, however, always
equal to the basic encoding frame length L. With reference to Fig. 3b, the top encoding
scheme is characterised by a set of sub-frames comprising only one sub-frame of length
L. The next set of sub-frames comprises two sub-frames of length L/2. The third set comprises
two sub-frames of length L/4 followed by an L/2 sub-frame.
[0038] The signal $x_{\mathrm{side}}$ provided to the side signal encoder unit 30 is encoded
by all encoding schemes 81. In the top encoding scheme, the entire basic encoding frame is
encoded in one piece. However, in the other encoding schemes, the signal $x_{\mathrm{side}}$
is encoded in each sub-frame separately. The result from each encoding
scheme is provided to a selector 85. A fidelity measurement means 83 determines a
fidelity measure for each of the encoded signals. The fidelity measure is an objective
quality value, preferably a signal-to-noise measure or a weighted signal-to-noise
ratio. The fidelity measures associated with each encoding scheme are compared and
the result controls a switching means 87 to select the encoding parameters representing
the side signal from the encoding scheme giving the best fidelity measure as the output
signal $p_{\mathrm{side}}$ from the side signal encoder unit 30.
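The closed loop decision of Fig. 3b can be sketched as "encode with every scheme, measure, keep the best". The sketch rests on the following assumptions:

- the three candidate sets follow Fig. 3b for L = 16 samples;
- the fidelity measure is a plain signal-to-noise ratio;
- `toy_encode_decode` is a hypothetical stand-in encoder, a uniform quantiser whose step is set by the peak of each sub-frame;
- a real side signal encoder and the bit cost of side information are not modelled.

```python
import math

def toy_encode_decode(segment, levels=4):
    """Hypothetical stand-in encoder: uniform quantiser whose step size is
    derived from the peak of the segment (one gain per sub-frame)."""
    peak = max(abs(v) for v in segment)
    if peak == 0:
        return [0.0] * len(segment)
    step = peak / levels
    return [round(v / step) * step for v in segment]

def snr_db(reference, decoded):
    """Fidelity measure: plain signal-to-noise ratio in dB."""
    signal = sum(v * v for v in reference)
    noise = sum((r - d) ** 2 for r, d in zip(reference, decoded))
    return math.inf if noise == 0 else 10 * math.log10(signal / noise)

def closed_loop_select(frame, schemes):
    """Encode the frame with every scheme; return (best_scheme, best_snr)."""
    best_scheme, best_snr = None, -math.inf
    for scheme in schemes:
        if sum(scheme) != len(frame):
            raise ValueError("sub-frames must tile the frame exactly")
        decoded, pos = [], 0
        for length in scheme:  # each sub-frame is encoded separately
            decoded += toy_encode_decode(frame[pos:pos + length])
            pos += length
        score = snr_db(frame, decoded)
        if score > best_snr:
            best_scheme, best_snr = scheme, score
    return best_scheme, best_snr

L = 16
FIG_3B_SCHEMES = [[L], [L // 2, L // 2], [L // 4, L // 4, L // 2]]
```

Consider a signal with a short burst in the second quarter of the frame. The scheme with two L/4 sub-frames followed by an L/2 sub-frame isolates the burst from the quiet parts, and it is selected.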
[0039] Preferably, all possible combinations of frame lengths are tested and the set of
sub-frames that gives the best objective quality, e.g. signal-to-noise ratio is selected.
[0040] In the present embodiment, the lengths of the sub-frames used are selected according
to:
$$l_{sf} = \frac{l_f}{2^n}$$
where $l_{sf}$ are the lengths of the sub-frames, $l_f$ is the length of the encoding frame
and $n$ is an integer. In the present embodiment, $n$ is selected between 0 and 3. However,
any frame lengths are possible to use as long as the total length of the set is
kept constant.
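The candidate sets of [0039] can be enumerated recursively. The sketch assumes the sub-frame lengths are of the form $l_f / 2^n$ with $0 \le n \le 3$, and that sub-frames of different lengths may appear in any order within the frame. The function name is hypothetical. Even this small case yields 56 sets for one frame, which shows how quickly exhaustive closed loop testing grows.

```python
def subframe_sets(frame_len, max_n=3):
    """All ordered sets of sub-frame lengths frame_len / 2**n (0 <= n <= max_n)
    whose total length equals frame_len."""
    if frame_len % (2 ** max_n):
        raise ValueError("frame_len must be divisible by 2**max_n")
    lengths = [frame_len // 2 ** n for n in range(max_n + 1)]

    def build(remaining):
        if remaining == 0:
            yield []
            return
        for length in lengths:
            if length <= remaining:
                for rest in build(remaining - length):
                    yield [length] + rest

    return list(build(frame_len))
```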
[0041] In Fig. 3c, another embodiment of a side signal encoder unit 30 is illustrated. Here,
the frame length decision is an open loop decision, based on the statistics of the
signal. In other words, the spectral characteristics of the side signal are used
as a basis for deciding which encoding scheme is to be used. As before,
different encoding schemes characterised by different sets of sub-frames are available.
However, in this embodiment, the selector 85 is placed before the actual encoding.
The input side signal $x_{\mathrm{side}}$ enters the selector 85 and a signal analysing unit 84.
The result of the analysis becomes the input of a switch 86, by which only one of the
encoding schemes 81 is utilised. The output from that encoding scheme is also the output
signal $p_{\mathrm{side}}$ from the side signal encoder unit 30.
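An open loop decision can be sketched as a single analysis step followed by one encoding. The text bases the decision on spectral characteristics. The sketch swaps in a much simpler, hypothetical time-domain transient measure: the peak-to-mean ratio of sub-block energies, compared against an arbitrary threshold. It also assumes the candidate schemes are listed from longest to shortest sub-frames.

```python
def open_loop_select(frame, schemes, block_len=4, threshold=2.0):
    """Choose one scheme before any encoding, from a hypothetical transient
    measure. `schemes` is assumed ordered from longest to shortest sub-frames."""
    energies = [sum(v * v for v in frame[i:i + block_len])
                for i in range(0, len(frame), block_len)]
    mean = sum(energies) / len(energies)
    ratio = max(energies) / mean if mean > 0 else 1.0
    # Strongly fluctuating energy suggests a transient: use the shortest sub-frames.
    return schemes[-1] if ratio > threshold else schemes[0]

SCHEMES = [[16], [8, 8], [4, 4, 4, 4]]  # hypothetical candidate list
```

Only the chosen scheme is then run. This saves computation, but the quality of the choice depends entirely on how well the analysis anticipates the encoder's behaviour.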
[0042] The advantage with an open loop decision is that only one actual encoding has to
be performed. The disadvantage is, however, that the analysis of the signal characteristics
may be very complicated indeed and it may be difficult to predict possible behaviours
in advance to be able to give an appropriate choice in the switch 86. A great deal of statistical
analysis of sound has to be performed and included in the signal analysing unit 84.
Any small change in the encoding schemes may completely upset the statistical behaviour.
[0043] By using closed loop selection (Fig. 3b), encoding schemes may be exchanged without
making any changes in the rest of the unit. On the other hand, if many encoding schemes
are to be investigated, the computational requirements will be high.
[0044] The benefit with such a variable frame length coding for the side signal is that
one can select between a fine temporal resolution and coarse frequency resolution
on one side and coarse temporal resolution and fine frequency resolution on the other.
The above embodiments will preserve the stereo image in the best possible manner.
[0045] There are also some requirements on the actual encoding utilised in the different
encoding schemes. In particular when the closed loop selection is used, the computational
resources needed to perform a number of more or less simultaneous encodings have to be large.
The more complicated the encoding process is, the more computational power is needed.
Furthermore, a low transmission bit rate is also preferable.
[0046] The method presented in US 5,434,948 uses a filtered version of the mono (main) signal to resemble the side or difference
signal. The filter parameters are optimised and allowed to vary in time. The filter
parameters are then transmitted representing an encoding of the side signal. In one
embodiment, also a residual side signal is transmitted. In many cases, such an approach
would be possible to use as side signal encoding method. This approach has, however,
some disadvantages. The quantisation of the filter coefficients and any residual side
signal often requires relatively high bit rates for transmission, since the filter
order has to be high to provide an accurate side signal estimate. The estimation of
the filter itself may be problematic, especially in cases of transient rich music.
Estimation errors will give a modified side signal that is sometimes larger in magnitude
than the unmodified signal. This will lead to higher bit rate demands. Moreover, if
a new set of filter coefficients is computed every N samples, the filter coefficients
need to be interpolated to yield a smooth transition from one set of filter coefficients
to another, as discussed above. Interpolation of filter coefficients is a complex
task, and errors in the interpolation will manifest themselves in large side error signals,
leading to higher bit rates needed for the difference error signal encoder.
[0047] A means to avoid the need for interpolation is to update the filter coefficients
on a sample-by-sample basis and rely on backwards-adaptive analysis. For this to work
well it is needed that the bit rate of the residual encoder is fairly high. This is
therefore not a good alternative for low bit rate stereo coding.
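A sample-by-sample, backward-adaptive alternative can be sketched with a normalised LMS (NLMS) predictor. The coefficients are updated only from past samples and the prediction error. A decoder holding the same mono signal and the decoded residual could therefore track them without receiving any coefficients. The step size `mu`, the order and the function name are hypothetical. The residual is assumed to be available losslessly, which is exactly the high-rate condition noted above.

```python
def nlms_side_prediction(mono, side, order=2, mu=0.5, eps=1e-9):
    """Backward-adaptive NLMS prediction of the side signal from the mono signal.
    Returns the prediction residual and the final filter coefficients."""
    h = [0.0] * order
    residual = []
    for n in range(len(mono)):
        # Current and past mono samples (zero before the start of the signal).
        u = [mono[n - k] if n - k >= 0 else 0.0 for k in range(order)]
        prediction = sum(hk * uk for hk, uk in zip(h, u))
        error = side[n] - prediction  # the residual that would be encoded
        residual.append(error)
        # Update using only data the decoder also has once `error` is decoded.
        norm = eps + sum(uk * uk for uk in u)
        h = [hk + mu * error * uk / norm for hk, uk in zip(h, u)]
    return residual, h
```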
[0048] There exist cases, e.g. quite common with music, where the mono and the difference
signals are almost un-correlated. The filter estimation then becomes very troublesome
with the added risk of just making things worse for the difference error signal encoder.
[0049] The solution according to US 5,434,948 can work fairly well in cases where the filter
coefficients vary very slowly in time, e.g. conference telephony systems. In the case of
music signals, this approach does not work very well, as the filters need to change very
fast to track the stereo image. This means that sub-frame lengths of widely differing
magnitudes have to be utilised, which means that the number of combinations to test
increases rapidly. This in turn means that the requirements for computing all possible
encoding schemes become impracticably high.
[0050] Therefore, in a preferred embodiment, the encoding of the side signal is based on
the idea to reduce the redundancy between the mono and side signal by using a simple
balance factor instead of a complex bit rate consuming predictor filter. The residual
of this operation is then encoded. The magnitude of such a residual is relatively
small and does not call for a very high transmission bit rate. This idea is very
suitable indeed to combine with the variable frame set approach described earlier,
since the computational complexity is low.
[0051] The use of a balance factor combined with the variable frame length approach removes
the need for complex interpolation and the associated problems that interpolation
may cause. Moreover, the use of a simple balance factor instead of a complex filter
gives fewer estimation problems, as possible estimation errors for the balance
factor have less impact. The preferred solution will be able to reproduce both panned
signals and diffuse sound fields with good quality and with limited bit rate requirements
and computational resources.
[0052] Fig. 4 illustrates a preferred embodiment of a stereo encoder. This embodiment is
very similar to the one shown in Fig. 2a, however, with the details of the side signal
encoder unit 30 revealed. The encoder 14 of this embodiment does not have any pre-processing
unit, and the input signals are provided directly to the addition and subtraction
units 34, 36. The mono signal $x_{\mathrm{mono}}$ is multiplied by a certain balance factor
$g_{\mathrm{sm}}$ in a multiplier 33. In a subtraction unit 35, the multiplied mono signal is
subtracted from the side signal $x_{\mathrm{side}}$, i.e. essentially the difference between
the two channels, to produce a side residual signal. The balance factor $g_{\mathrm{sm}}$ is
determined by the optimiser 37, based on the content of the mono and side signals, in order
to minimise the side residual signal according to a quality criterion.
The quality criterion is preferably a least mean square criterion. The side residual
signal is encoded in a side residual encoder 39 according to any encoder procedure.
Preferably, the side residual encoder 39 is a low bit rate transform encoder or a
CELP (Codebook Excited Linear Prediction) encoder. The encoding parameters
$p_{\mathrm{side}}$ representing the side signal then comprise the encoding parameters
$p_{\mathrm{side\,residual}}$ representing the side residual signal and the optimised
balance factor 49.
[0053] In the embodiment of Fig. 4, the mono signal 42 used for synthesising the side signal
is the target signal $x_{mono}$ of the mono encoder 38. As mentioned above in connection with
Fig. 2a, the local synthesis signal of the mono encoder 38 can be used instead. In that case,
the total encoder delay and the computational complexity for the side signal may increase.
On the other hand, the quality may be better, since it is then possible to repair coding
errors made in the mono encoder.
[0054] More formally, the basic encoding scheme can be described as follows.
Denote the two channel signals by a and b, which may be the left and right channels
of a stereo pair. The channel signals are combined into a mono signal by addition
and into a side signal by subtraction:
$$x_{mono}(n) = a(n) + b(n), \qquad x_{side}(n) = a(n) - b(n)$$
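[0055] It is beneficial to scale the $x_{mono}$ and $x_{side}$ signals down by a factor of two.
Other ways of creating $x_{mono}$ and $x_{side}$ also exist. One can for instance use:
$$x_{mono}(n) = \gamma\, a(n) + (1-\gamma)\, b(n), \qquad x_{side}(n) = \gamma\, a(n) - (1-\gamma)\, b(n), \qquad 0 \le \gamma \le 1$$
where $\gamma = 0.5$ corresponds to the scaling by a factor of two.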
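The down-mix above and its inverse can be sketched as follows. This is a minimal illustration in plain Python, assuming blocks are lists of equal length and that 0 < γ < 1 (the inverse divides by γ and by 1 − γ); the function names `downmix` and `upmix` are hypothetical.

```python
from typing import List, Tuple


def downmix(a: List[float], b: List[float], gamma: float = 0.5) -> Tuple[List[float], List[float]]:
    """x_mono = gamma*a + (1-gamma)*b and x_side = gamma*a - (1-gamma)*b, sample by sample."""
    mono = [gamma * ai + (1.0 - gamma) * bi for ai, bi in zip(a, b)]
    side = [gamma * ai - (1.0 - gamma) * bi for ai, bi in zip(a, b)]
    return mono, side


def upmix(mono: List[float], side: List[float], gamma: float = 0.5) -> Tuple[List[float], List[float]]:
    """Invert downmix(): mono + side = 2*gamma*a and mono - side = 2*(1-gamma)*b."""
    a = [(m + s) / (2.0 * gamma) for m, s in zip(mono, side)]
    b = [(m - s) / (2.0 * (1.0 - gamma)) for m, s in zip(mono, side)]
    return a, b


if __name__ == "__main__":
    left, right = [1.0, 2.0, 3.0], [0.5, -1.0, 4.0]
    mono, side = downmix(left, right)  # gamma = 0.5: sum and difference, each halved
    print(mono, side)                  # [0.75, 0.5, 3.5] [0.25, 1.5, -0.5]
```

With γ = 0.5 the pair is simply the halved sum and difference, and `upmix` recovers the channels exactly up to floating-point rounding.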
[0056] On blocks of the input signals, a modified or residual side signal is computed according
to:
$$x_{side\,residual}(n) = x_{side}(n) - f(x_{mono}, x_{side})\, x_{mono}(n)$$
where $f(x_{mono}, x_{side})$ is a balance factor function that, based on a block of N samples
(i.e. a sub-frame) of the side and mono signals, strives to remove as much as possible of the
side signal. In other words, the balance factor is used to minimise the residual side signal.
In the special case where it is minimised in a mean square sense, this is equivalent
to minimising the energy of the residual side signal $x_{side\,residual}$.
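[0057] In the above-mentioned special case, $f(x_{mono}, x_{side})$ is given by:
$$f(x_{mono}, x_{side}) = \frac{\sum_{n=\text{frame start}}^{\text{frame end}} x_{side}(n)\, x_{mono}(n)}{\sum_{n=\text{frame start}}^{\text{frame end}} x_{mono}(n)\, x_{mono}(n)}$$
where $x_{side}$ is the side signal and $x_{mono}$ is the mono signal. Note that the function
is based on a block starting at "frame start" and ending at "frame end".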
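The least-squares balance factor of [0057] and the residual of [0056] can be sketched as follows. This is an illustration only: the names `balance_factor` and `side_residual` are hypothetical, the block end is exclusive (Python `range` convention, whereas the sum in the text runs up to and including "frame end"), and returning 0 for a silent mono block is an assumption the text does not address.

```python
from typing import List


def balance_factor(mono: List[float], side: List[float], start: int, end: int) -> float:
    """Least-squares balance factor over samples start..end-1 (end exclusive)."""
    num = sum(side[n] * mono[n] for n in range(start, end))
    den = sum(mono[n] * mono[n] for n in range(start, end))
    # Assumption: a silent mono block yields factor 0 (not covered by the text).
    return num / den if den > 0.0 else 0.0


def side_residual(mono: List[float], side: List[float], g: float, start: int, end: int) -> List[float]:
    """x_side_residual(n) = x_side(n) - g * x_mono(n) for n in the block."""
    return [side[n] - g * mono[n] for n in range(start, end)]


if __name__ == "__main__":
    mono = [0.5, -0.25, 0.75, 0.1]
    side = [0.4, -0.1, 0.5, 0.3]
    g = balance_factor(mono, side, 0, len(mono))
    res = side_residual(mono, side, g, 0, len(mono))
    # The least-squares residual is orthogonal to the mono block (dot product ~ 0).
    print(g, sum(r * m for r, m in zip(res, mono)))
```

For a signal panned fully into one channel, the side block equals the mono block, the factor is exactly 1.0 and the residual vanishes, which is why such signals cost almost no side bit rate.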
[0058] Frequency-domain weighting can be added to the computation of the balance factor.
This is done by convolving the $x_{side}$ and $x_{mono}$ signals with the impulse response of a
weighting filter. The estimation error can then be moved to a frequency range where it is less
audible. This is referred to as perceptual weighting.
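A minimal sketch of where the weighting enters the computation, assuming a causal FIR weighting filter. The text does not specify the filter, so `h_weight = [1.0, -0.9]` below is a hypothetical placeholder, and `fir_filter` and `weighted_balance_factor` are illustrative names. Both signals are filtered before the least-squares factor is formed.

```python
from typing import List, Sequence


def fir_filter(x: Sequence[float], h: Sequence[float]) -> List[float]:
    """Causal convolution of x with impulse response h, truncated to len(x) samples."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, hk in enumerate(h):
            if n - k >= 0:
                acc += hk * x[n - k]
        y.append(acc)
    return y


def weighted_balance_factor(mono: Sequence[float], side: Sequence[float], h: Sequence[float]) -> float:
    """Least-squares balance factor computed on the weighted (filtered) signals."""
    wm = fir_filter(mono, h)
    ws = fir_filter(side, h)
    num = sum(s * m for s, m in zip(ws, wm))
    den = sum(m * m for m in wm)
    return num / den if den > 0.0 else 0.0


if __name__ == "__main__":
    h_weight = [1.0, -0.9]  # hypothetical weighting filter, not taken from the text
    mono = [1.0, 0.5, -0.25, 0.8]
    side = [0.3, 0.6, -0.2, 0.1]
    print(weighted_balance_factor(mono, side, h_weight))
```

Because filtering is linear, a side block that is an exact multiple of the mono block gives the same factor with or without weighting. The weighting only changes which frequencies dominate the fit when the relation is not exact.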
[0059] A quantized version of the balance factor value given by $f(x_{mono}, x_{side})$ is
transmitted to the decoder. It is preferable to account for the quantization already when the
modified side signal is generated, which gives:
$$x_{side\,residual}(n) = x_{side}(n) - Q_g\big(f(x_{mono}, x_{side})\big)\, x_{mono}(n)$$
[0060] $Q_g(\cdot)$ is a quantization function that is applied to the balance factor given by
$f(x_{mono}, x_{side})$. The quantized balance factor is transmitted on the transmission channel.
For normal left-right panned signals, the balance factor is limited to the interval [-1.0, 1.0]:
with $x_{mono} = a + b$ and $x_{side} = a - b$, a signal panned fully into channel a gives 1.0 and
one panned fully into channel b gives -1.0. If, on the other hand, the channels are out of phase
with regard to one another, the balance factor may extend beyond these limits.
[0061] As an optional means to stabilise the stereo image, one can limit the balance factor
if the normalised cross-correlation between the mono and side signals is poor, as
given by the equation below:

where

[0062] Such situations occur quite frequently with, e.g., classical music or studio music
containing a large amount of diffuse sound, where the a and b channels may at times almost
cancel each other out when the mono signal is created. The balance factor can then jump
rapidly, causing a confused stereo image. The limitation above alleviates this problem.
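The limiting rule of [0061] is contained in an equation that is not reproduced here, so the sketch below only computes a normalised cross-correlation of the conventional form (our assumption about the quantity meant). A value near zero flags the poorly correlated, diffuse blocks described in [0062]; how the balance factor is then limited is deliberately left out. The function name is hypothetical.

```python
import math
from typing import Sequence


def normalised_cross_correlation(mono: Sequence[float], side: Sequence[float]) -> float:
    """rho = sum(mono*side) / sqrt(sum(mono^2) * sum(side^2)); lies in [-1, 1]."""
    num = sum(m * s for m, s in zip(mono, side))
    den = math.sqrt(sum(m * m for m in mono) * sum(s * s for s in side))
    return num / den if den > 0.0 else 0.0


if __name__ == "__main__":
    # Panned block: side is a scaled copy of mono, so rho = 1.
    print(normalised_cross_correlation([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # 1.0
    # No linear relation between the blocks: rho = 0.
    print(normalised_cross_correlation([1.0, 0.0], [0.0, 1.0]))            # 0.0
```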
[0063] The filter-based approach in
US 5,434,948 suffers from similar problems, but there the solution is not as simple.
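[0064] If $E_s$ is the encoding function (e.g. a transform encoder) of the residual side signal
and $E_m$ is the encoding function of the mono signal, then the decoded signals $a''$ and $b''$
at the decoder end can be described as follows (it is assumed here that $\gamma = 0.5$):
$$a''(n) = \big(1 + Q_g(f(x_{mono}, x_{side}))\big)\, E_m(x_{mono}(n)) + E_s(x_{side\,residual}(n))$$
$$b''(n) = \big(1 - Q_g(f(x_{mono}, x_{side}))\big)\, E_m(x_{mono}(n)) - E_s(x_{side\,residual}(n))$$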
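An encode/decode round trip for one block can be sketched as follows, with γ = 0.5. Two assumptions apply. First, the coders E_m and E_s are replaced by the identity (no coding loss), so that only the algebra is exercised. Second, the quantiser Q_g is a hypothetical uniform quantiser with step 0.125 and range [-2, 2], since the text does not specify one. Because the residual is formed with the already quantised factor, as recommended in [0059], quantising the factor does not prevent exact reconstruction.

```python
from typing import List, Tuple


def quantise(g: float, step: float = 0.125, limit: float = 2.0) -> float:
    """Hypothetical uniform scalar quantiser Q_g; step and range are assumptions."""
    g = max(-limit, min(limit, g))
    return step * round(g / step)


def encode_block(a: List[float], b: List[float], step: float = 0.125) -> Tuple[List[float], float, List[float]]:
    """Return (x_mono, Q_g(f), x_side_residual) for one block with gamma = 0.5."""
    mono = [0.5 * (x + y) for x, y in zip(a, b)]
    side = [0.5 * (x - y) for x, y in zip(a, b)]
    den = sum(m * m for m in mono)
    g = sum(s * m for s, m in zip(side, mono)) / den if den > 0.0 else 0.0
    gq = quantise(g, step)
    # The residual uses the quantised factor, so the decoder can undo it exactly.
    residual = [s - gq * m for s, m in zip(side, mono)]
    return mono, gq, residual


def decode_block(mono_hat: List[float], gq: float, residual_hat: List[float]) -> Tuple[List[float], List[float]]:
    """a'' = (1 + gq) * mono + residual and b'' = (1 - gq) * mono - residual."""
    a = [(1.0 + gq) * m + r for m, r in zip(mono_hat, residual_hat)]
    b = [(1.0 - gq) * m - r for m, r in zip(mono_hat, residual_hat)]
    return a, b


if __name__ == "__main__":
    left = [0.9, -0.4, 0.2, 0.7]
    right = [0.3, -0.1, 0.05, 0.2]
    mono, gq, residual = encode_block(left, right)
    # E_m and E_s are the identity here, an assumption of this sketch.
    print(gq, decode_block(mono, gq, residual))
```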
[0065] One important benefit of computing the balance factor for each frame is that
interpolation is avoided. Instead, as described above, the frame processing is normally
performed with overlapping frames.
[0066] The encoding principle using balance factors operates particularly well in the case
of music signals, where fast changes typically are needed to track the stereo image.
[0067] Multi-channel coding has recently become popular. One example is 5.1-channel surround
sound in DVD movies, where the channels are arranged as front left, front centre,
front right, rear left, rear right and subwoofer. Fig. 5 shows an embodiment of an encoder
that encodes the three front channels of such an arrangement, exploiting inter-channel
redundancies.
[0068] Three channel signals L, C, R are provided on three inputs 16A-C, and the mono signal
$x_{mono}$ is created as the sum of all three signals. A centre signal encoder unit 130 is added,
which receives the centre signal $x_{centre}$. In this embodiment, the mono signal 42 is the
encoded and decoded mono signal $x''_{mono}$, which is multiplied by a balance factor $g_Q$ in
a multiplier 133. In a subtraction unit 135, the multiplied mono signal is subtracted
from the centre signal $x_{centre}$ to produce a centre residual signal. The balance factor
$g_Q$ is determined by an optimiser 137 based on the content of the mono and centre signals,
so as to minimise the centre residual signal according to the quality criterion.
The centre residual signal is encoded in a centre residual encoder 139 using any suitable
encoding procedure. Preferably, the centre residual encoder 139 is a low bit rate
transform encoder or a CELP encoder. The encoding parameters $p_{centre}$ representing the
centre signal then comprise the encoding parameters $p_{centre\,residual}$ representing the
centre residual signal and the optimised balance factor 149. The centre residual signal and
the scaled mono signal are added in an addition unit 235, creating a modified centre signal
142 that is compensated for encoding errors.
[0069] The side signal $x_{side}$, i.e. the difference between the left L and right R channels,
is provided to the side signal encoder unit 30 as in the earlier embodiments. Here, however,
the optimiser 37 also depends on the modified centre signal 142 provided by the centre signal
encoder unit 130. The side residual signal is therefore created in the subtraction unit 35 as
an optimum linear combination of the mono signal 42, the modified centre signal 142 and the
side signal.
[0070] The variable frame length concept described above can be applied to the side signal,
the centre signal, or both.
[0071] Fig. 6 illustrates a decoder unit suitable for receiving encoded audio signals from
the encoder unit of Fig. 5. The received signal 54 is divided into encoding parameters
$p_{mono}$ representing the main signal, encoding parameters $p_{centre}$ representing the
centre signal, and encoding parameters $p_{side}$ representing the side signal. In the
decoder 64, the encoding parameters $p_{mono}$ representing the main signal are used to
generate a main signal $x''_{mono}$. In the decoder 160, the encoding parameters $p_{centre}$
representing the centre signal are used to generate a centre signal $x''_{centre}$, based on
the main signal $x''_{mono}$. In the decoder 60, the encoding parameters $p_{side}$
representing the side signal are decoded, generating a side signal $x''_{side}$ based on the
main signal $x''_{mono}$ and the centre signal $x''_{centre}$.
[0072] The procedure can be mathematically expressed as follows:
[0073] The input signals $x_{left}$, $x_{right}$ and $x_{centre}$ are combined into a mono
channel according to:
$$x_{mono}(n) = \alpha\, x_{left}(n) + \beta\, x_{right}(n) + \chi\, x_{centre}(n)$$
[0074] In the remainder of this section, α, β and χ are set to 1.0 for simplicity, but they
can take arbitrary values. The values of α, β and χ can be either constant or dependent
on the signal content, so as to emphasise one or two channels in order to achieve
optimal quality.
[0076] $x_{centre}$ is the centre signal and $x_{mono}$ is the mono signal. The mono signal is
taken from the mono target signal, but the local synthesis of the mono encoder can be used
as well.
[0077] The centre residual signal to be encoded is:
$$x_{centre\,residual}(n) = x_{centre}(n) - Q_g\big(f(x_{mono}, x_{centre})\big)\, x_{mono}(n)$$
[0078] $Q_g(\cdot)$ is a quantization function that is applied to the balance factor. The
quantized balance factor is transmitted on the transmission channel.
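The centre path of Fig. 5 can be sketched as follows, with α = β = χ = 1. Several assumptions apply. The balance-factor equation for the centre channel is not reproduced in this text, so the sketch assumes a least-squares factor of the same form as in [0057], with the centre signal in place of the side signal. It uses the mono target signal, which [0076] permits, replaces the coders E_m and E_c by the identity, and uses a hypothetical uniform quantiser with step 0.125. The function names are illustrative.

```python
from typing import List, Tuple


def centre_encode(left: List[float], centre: List[float], right: List[float],
                  step: float = 0.125) -> Tuple[List[float], float, List[float]]:
    """Mono down-mix (alpha = beta = chi = 1), quantised centre balance factor, centre residual."""
    mono = [l + c + r for l, c, r in zip(left, centre, right)]
    den = sum(m * m for m in mono)
    # Assumption: least-squares factor as in [0057], with x_centre in place of x_side.
    g = sum(c * m for c, m in zip(centre, mono)) / den if den > 0.0 else 0.0
    gq = step * round(g / step)  # hypothetical uniform quantiser Q_g
    residual = [c - gq * m for c, m in zip(centre, mono)]
    return mono, gq, residual


def centre_decode(mono_hat: List[float], gq: float, residual_hat: List[float]) -> List[float]:
    """x''_centre(n) = gq * x''_mono(n) + decoded centre residual(n)."""
    return [gq * m + r for m, r in zip(mono_hat, residual_hat)]


if __name__ == "__main__":
    L, C, R = [0.5, -0.2, 0.1], [0.3, 0.3, -0.4], [0.2, 0.0, 0.6]
    mono, gq, res = centre_encode(L, C, R)
    print(gq, centre_decode(mono, gq, res))  # identity coders: the centre block comes back
```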
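[0079] If $E_c$ is the encoding function (e.g. a transform encoder) of the centre residual
signal and $E_m$ is the encoding function of the mono signal, then the decoded signal
$x''_{centre}$ at the decoder end can be described as:
$$x''_{centre}(n) = Q_g\big(f(x_{mono}, x_{centre})\big)\, E_m(x_{mono}(n)) + E_c(x_{centre\,residual}(n))$$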
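[0080] The side residual signal to be encoded is:
$$x_{side\,residual}(n) = x_{side}(n) - g^{Q}_{sm}\, x_{mono}(n) - g^{Q}_{sc}\, x''_{centre}(n)$$
where $g^{Q}_{sm}$ and $g^{Q}_{sc}$ are quantized values of the parameters $g_{sm}$ and $g_{sc}$
that minimise the expression:
$$\sum_{n=\text{frame start}}^{\text{frame end}} \big| x_{side}(n) - g_{sm}\, x_{mono}(n) - g_{sc}\, x''_{centre}(n) \big|^{\eta}$$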
[0081] η can, for instance, be equal to 2 for a least-squares minimisation of the error. The
$g_{sm}$ and $g_{sc}$ parameters can be quantized jointly or separately.
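For η = 2, the two balance factors can be found jointly by solving the 2x2 normal equations of the least-squares problem. The sketch below does this with Cramer's rule on one block. Its fallback for (near-)collinear mono and centre blocks, where it uses a single mono factor and sets g_sc to 0, is an assumption, as is the name `joint_balance_factors`. Quantisation of the two factors is not shown.

```python
from typing import Sequence, Tuple


def joint_balance_factors(side: Sequence[float], mono: Sequence[float],
                          centre: Sequence[float]) -> Tuple[float, float]:
    """Minimise sum (side - g_sm*mono - g_sc*centre)^2 via the 2x2 normal equations."""
    smm = sum(m * m for m in mono)
    scc = sum(c * c for c in centre)
    smc = sum(m * c for m, c in zip(mono, centre))
    ssm = sum(s * m for s, m in zip(side, mono))
    ssc = sum(s * c for s, c in zip(side, centre))
    det = smm * scc - smc * smc
    if abs(det) < 1e-12 * max(smm * scc, 1e-300):
        # Assumption: (near-)collinear blocks fall back to a single mono factor, g_sc = 0.
        return (ssm / smm if smm > 0.0 else 0.0), 0.0
    # Cramer's rule for [[smm, smc], [smc, scc]] @ [g_sm, g_sc] = [ssm, ssc].
    g_sm = (ssm * scc - ssc * smc) / det
    g_sc = (smm * ssc - smc * ssm) / det
    return g_sm, g_sc


if __name__ == "__main__":
    m = [1.0, 2.0, 0.0, -1.0]
    c = [0.0, 1.0, 3.0, 1.0]
    s = [0.3 * mi - 0.7 * ci for mi, ci in zip(m, c)]
    print(joint_balance_factors(s, m, c))  # approximately (0.3, -0.7)
```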
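[0082] If $E_s$ is the encoding function of the side residual signal, then the decoded
channel signals $x''_{left}$ and $x''_{right}$ are given by (with α = β = χ = 1 and
$x''_{mono} = E_m(x_{mono})$):
$$x''_{side}(n) = E_s(x_{side\,residual}(n)) + g^{Q}_{sm}\, x''_{mono}(n) + g^{Q}_{sc}\, x''_{centre}(n)$$
$$x''_{left}(n) = \tfrac{1}{2}\big(x''_{mono}(n) - x''_{centre}(n) + x''_{side}(n)\big)$$
$$x''_{right}(n) = \tfrac{1}{2}\big(x''_{mono}(n) - x''_{centre}(n) - x''_{side}(n)\big)$$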
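The left/right reconstruction at the decoder can be sketched from the definitions in [0069] and [0073], assuming α = β = χ = 1 and $x_{side} = x_{left} - x_{right}$. Then the mono signal minus the centre signal equals L + R, and the side signal equals L − R, so each channel is half their sum or difference. The coders are taken as the identity, and the name `reconstruct_left_right` is hypothetical.

```python
from typing import List, Tuple


def reconstruct_left_right(mono_hat: List[float], centre_hat: List[float], residual_hat: List[float],
                           g_sm: float, g_sc: float) -> Tuple[List[float], List[float]]:
    """Rebuild x''_side from the residual, then split L + R and L - R into the two channels."""
    side_hat = [r + g_sm * m + g_sc * c for r, m, c in zip(residual_hat, mono_hat, centre_hat)]
    # mono - centre = L + R and side = L - R (alpha = beta = chi = 1).
    left = [0.5 * (m - c + s) for m, c, s in zip(mono_hat, centre_hat, side_hat)]
    right = [0.5 * (m - c - s) for m, c, s in zip(mono_hat, centre_hat, side_hat)]
    return left, right


if __name__ == "__main__":
    L, C, R = [0.5, -0.2, 0.1], [0.3, 0.3, -0.4], [0.2, 0.0, 0.6]
    mono = [l + c + r for l, c, r in zip(L, C, R)]
    side = [l - r for l, r in zip(L, R)]
    g_sm, g_sc = 0.25, -0.5  # arbitrary example factors
    res = [s - g_sm * m - g_sc * c for s, m, c in zip(side, mono, C)]
    print(reconstruct_left_right(mono, C, res, g_sm, g_sc))
```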
[0083] One of the most annoying perceptual artefacts is the pre-echo effect. Figs. 7a-b are
diagrams illustrating such an artefact. Assume a signal component with the time development
shown by curve 100. Initially, from t0, the signal component is not present in the audio
sample. At a time t between t1 and t2, the signal component suddenly appears. When the signal
component is encoded using a frame length of t2-t1, its occurrence is "smeared out" over the
entire frame, as indicated by curve 101. When curve 101 is decoded, the signal component
appears a time Δt before its intended appearance, and a "pre-echo" is perceived.
[0084] Pre-echo artefacts become more pronounced when long encoding frames are used.
Using shorter frames suppresses the artefact somewhat. Another way to deal with the pre-echo
problem described above is to exploit the fact that the mono signal is available at both the
encoder and the decoder end. This makes it possible to scale the side signal according to the
energy contour of the mono signal. At the decoder end, the inverse scaling is performed, and
thus some of the pre-echo problems may be alleviated.
[0085] An energy contour of the mono signal is computed over the frame as:

where w(n) is a windowing function. The simplest windowing function is a rectangular
window, but other window types, such as a Hamming window, may be more desirable.
[0086] The side residual signal is then scaled as:

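[0087] In a more general form, the equation above can be written as:
$$\hat{x}_{side\,residual}(n) = \frac{x_{side\,residual}(n)}{f\big(E_{mono}(n)\big)}$$
where $\hat{x}_{side\,residual}$ denotes the scaled side residual signal, $E_{mono}(n)$ the energy
contour of the mono signal, and $f(\cdot)$ a monotonic continuous function. In the decoder, the
energy contour $E''_{mono}(n)$ is computed on the decoded mono signal and applied to the decoded
side signal as:
$$x''_{side\,residual}(n) = \hat{x}''_{side\,residual}(n)\, f\big(E''_{mono}(n)\big)$$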
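Energy contour scaling and its inverse can be sketched as follows, consistent with the claims: the encoder divides the side residual by a monotonic continuous function of the mono energy contour, and the decoder multiplies by the same function of the decoded mono contour. The concrete choices here are assumptions not taken from the text: a centred Hamming-windowed short-time energy (zero outside the block), f(E) = sqrt(E + eps), and all function names. In this sketch, coding noise spread evenly over the scaled frame is multiplied by a small gain in the quiet part before an onset, which is how the scheme can reduce pre-echo.

```python
import math
from typing import List, Sequence


def hamming(length: int) -> List[float]:
    """Symmetric Hamming window of the given length."""
    if length == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * k / (length - 1)) for k in range(length)]


def energy_contour(mono: Sequence[float], window: Sequence[float]) -> List[float]:
    """Windowed short-time energy centred on each sample (samples outside the block count as zero)."""
    half = len(window) // 2
    contour = []
    for n in range(len(mono)):
        acc = 0.0
        for k, wk in enumerate(window):
            idx = n + k - half
            if 0 <= idx < len(mono):
                acc += wk * mono[idx] ** 2
        contour.append(acc)
    return contour


def contour_gain(energy: float, eps: float = 1e-9) -> float:
    """Hypothetical monotonic continuous f(E): square root, offset to avoid division by zero."""
    return math.sqrt(energy + eps)


def scale_residual(residual: Sequence[float], mono: Sequence[float], window: Sequence[float]) -> List[float]:
    """Encoder side: divide the side residual by f(E_mono(n))."""
    return [r / contour_gain(e) for r, e in zip(residual, energy_contour(mono, window))]


def unscale_residual(scaled: Sequence[float], mono_hat: Sequence[float], window: Sequence[float]) -> List[float]:
    """Decoder side: multiply by f(E''_mono(n)) computed on the decoded mono signal."""
    return [s * contour_gain(e) for s, e in zip(scaled, energy_contour(mono_hat, window))]


if __name__ == "__main__":
    mono = [0.0, 0.0, 0.0, 0.0, 1.0, -0.8, 0.9, -0.7]  # onset half-way through the block
    residual = [0.001, -0.002, 0.001, 0.0, 0.3, -0.2, 0.25, -0.1]
    w = hamming(5)
    restored = unscale_residual(scale_residual(residual, mono, w), mono, w)
    print(max(abs(a - b) for a, b in zip(restored, residual)))  # ~0 when both ends see the same mono
```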
[0088] Since energy contour scaling is in some sense an alternative to the use of shorter
frame lengths, this concept is particularly well suited to being combined with the variable
frame length concept described further above. By having some encoding schemes that apply
energy contour scaling, some that do not, and some that apply energy contour scaling only
during certain sub-frames, a more flexible set of encoding schemes can be provided. Fig. 8
illustrates an embodiment of a signal encoder unit 30 according to the present invention.
Here, the different encoding schemes 81 comprise hatched sub-frames, representing encoding
with energy contour scaling, and un-hatched sub-frames, representing encoding without energy
contour scaling. In this manner, combinations not only of sub-frames of differing lengths,
but also of sub-frames with differing encoding principles, become available. In the present
explanatory example, the application of energy contour scaling differs between the encoding
schemes. More generally, any encoding principles can be combined with the variable length
concept in an analogous manner.
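A set of encoding schemes like those of Fig. 8 can be represented as tuples of sub-frames, each carrying its length and a flag for whether energy contour scaling is applied (hatched) or not (un-hatched). In the sketch below, the specific schemes, the frame length of 4 basic units and the `cost` function used for selection are hypothetical. The actual selection criterion, which depends on the signal content, is not reproduced here.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass(frozen=True)
class SubFrame:
    length: int           # sub-frame length in basic units (assumed unit)
    energy_scaling: bool  # True: hatched in Fig. 8, energy contour scaling applied


Scheme = Tuple[SubFrame, ...]


def is_valid(scheme: Scheme, frame_length: int) -> bool:
    """A scheme must tile the whole encoding frame exactly."""
    return sum(sf.length for sf in scheme) == frame_length


def select_scheme(schemes: Sequence[Scheme], cost: Callable[[Scheme], float]) -> Scheme:
    """Pick the scheme with the lowest cost; the cost function is a placeholder assumption."""
    return min(schemes, key=cost)


# Hypothetical scheme set for a frame of 4 basic units.
SCHEMES: Tuple[Scheme, ...] = (
    (SubFrame(4, False),),
    (SubFrame(4, True),),
    (SubFrame(2, False), SubFrame(2, False)),
    (SubFrame(2, True), SubFrame(2, False)),
    (SubFrame(2, True), SubFrame(2, True)),
    (SubFrame(1, True),) * 4,
)
```

A real encoder would evaluate each candidate on the actual signal content, for example by trial-encoding the side signal and measuring the distortion.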
[0089] The set of encoding schemes of Fig. 8 comprises schemes that handle e.g. pre-echoing
artefacts in different ways. In some schemes, longer sub-frames with pre-echoing minimisation
according to the energy contour principle are used. In other schemes, shorter sub-frames
without energy contour scaling are utilised. Depending on the signal content, one
of the alternatives may be more advantageous. For very severe pre-echoing cases, encoding
schemes utilising short sub-frames with energy contour scaling may be necessary.
[0090] The proposed solution can be used in the full frequency band or in one or more distinct
sub-bands. Sub-band processing can be applied either to both the main and side signals,
or to one of them separately. A preferred embodiment splits the side signal into several
frequency bands, simply because it is easier to remove possible redundancy in an isolated
frequency band than in the entire frequency band. This is particularly important when
encoding music signals with rich spectral content.
[0091] One possible use is to encode the frequency band below a predetermined threshold
with the above method. The threshold is preferably 2 kHz, or more preferably 1 kHz.
For the remaining part of the frequency range of interest, one can either encode an
additional frequency band with the above method or use a completely different method.
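A split of one side-signal block at the 1 kHz threshold can be sketched with a brick-wall mask on a naive DFT. The text does not specify the band-splitting method, so this mask is a swapped-in illustration, not the patent's filter bank; practical codecs would typically use overlapped filter banks. The O(N²) DFT keeps the block self-contained, and the function names are hypothetical.

```python
import cmath
import math
from typing import List, Sequence, Tuple


def dft(x: Sequence[float]) -> List[complex]:
    n_len = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_len) for n in range(n_len))
            for k in range(n_len)]


def idft(spec: Sequence[complex]) -> List[float]:
    n_len = len(spec)
    return [(sum(spec[k] * cmath.exp(2j * math.pi * k * n / n_len) for k in range(n_len)) / n_len).real
            for n in range(n_len)]


def split_at(x: Sequence[float], fs: float, cutoff_hz: float = 1000.0) -> Tuple[List[float], List[float]]:
    """Brick-wall split of one block into the band below cutoff_hz and the remainder."""
    n_len = len(x)
    spec = dft(x)
    low_spec = []
    for k, val in enumerate(spec):
        freq = min(k, n_len - k) * fs / n_len  # bin k and its mirror N-k share one frequency
        low_spec.append(val if freq < cutoff_hz else 0j)
    low = idft(low_spec)
    high = [a - b for a, b in zip(x, low)]
    return low, high


if __name__ == "__main__":
    fs = 8000.0
    x = [math.sin(2 * math.pi * 250 * n / fs) + 0.5 * math.sin(2 * math.pi * 2000 * n / fs)
         for n in range(64)]
    low, high = split_at(x, fs)  # low keeps the 250 Hz tone, high the 2 kHz tone
    print(max(abs(v) for v in low), max(abs(v) for v in high))
```

Only the low band would then be passed to the balance-factor side coding, while the high band would be handled by another method.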
[0092] One motivation for applying the above method preferably at low frequencies is that
diffuse sound fields generally have little energy content at high frequencies, the natural
reason being that sound absorption typically increases with frequency. Also, diffuse sound
field components seem to play a less important role for the human auditory system at
higher frequencies.
[0093] Therefore, it is beneficial to employ this solution at low frequencies (below 1 or
2 kHz) and to rely on other, even more bit-efficient coding schemes at higher frequencies.
Applying the scheme only at low frequencies gives a large saving in bit rate, since the bit
rate required by the proposed method is proportional to the required bandwidth. In most
cases, the mono encoder can encode the entire frequency band, while the proposed side signal
encoding is suggested to be performed only in the lower part of the frequency band, as
schematically illustrated by Fig. 9. Reference number 301 refers to an encoding scheme of the
side signal, reference number 302 refers to any other encoding scheme of the side signal and
reference number 303 refers to an encoding scheme of the side signal.
[0094] The proposed method can also be used for several distinct frequency bands.
[0095] Fig. 10 illustrates the main steps of an embodiment of an encoding method as a flow
diagram. The procedure starts in step 200. In step 210, a main signal deduced from the
polyphonic signals is encoded. In step 212, encoding schemes are provided, which comprise
sub-frames with differing lengths and/or order. In step 214, a side signal deduced from the
polyphonic signals is encoded by an encoding scheme selected at least partly in dependence on
the actual signal content of the present polyphonic signals. The procedure ends in step 299.
[0096] Fig. 11 illustrates the main steps of an embodiment of a decoding method as a flow
diagram. The procedure starts in step 200. In step 220, a received encoded main signal is
decoded. In step 222, encoding schemes are provided, which comprise sub-frames with differing
lengths and/or order. In step 224, a received side signal is decoded by a selected encoding
scheme. In step 226, the decoded main and side signals are combined into a polyphonic signal.
The procedure ends in step 299.
[0097] The embodiments described above are to be understood as a few illustrative examples
of the present invention. It will be understood by those skilled in the art that various
modifications, combinations and changes may be made to the embodiments without departing
from the scope of the present invention. In particular, different part solutions in
the different embodiments can be combined in other configurations, where technically
possible. The scope of the present invention is, however, defined by the appended
claims.
CLAIMS
1. A method of encoding multi-channel audio signals, comprising the steps of:
generating (210) a first output signal (pmono) being encoding parameters representing a main signal (xmono);
said main signal (xmono) being a first linear combination of signals of at least a first and a second channel
(a, b; L, R); and
generating (214) a second output signal (pside) being encoding parameters representing a side signal (xside);
said side signal (xside) being a second linear combination of signals of at least the first and the
second channel (a, b; L, R) within an encoding frame (80),
characterised in that the step of generating the second output signal (pside) further comprises the step of:
scaling the side signal (xside) to an energy contour of the main signal (xmono).
2. A method according to claim 1, characterised in that the side signal (xside) is scaled by a factor being a monotonic continuous function of the energy contour
of the main signal (xmono).
3. A method according to claim 1, characterised in that the step of generating a second output signal (pside) comprises the step of creating a side residual signal (xside residual) based on a balanced difference between the side signal (xside) and the main signal (xmono), whereby the residual side signal (xside residual) is scaled to an energy contour of the main signal (xmono).
4. A method according to claim 3, characterised in that the side residual signal (xside residual) is divided by a factor being a monotonic continuous function of the energy contour
of the main signal (xmono).
5. A method of decoding multi-channel audio signals, comprising the steps of:
generating (220) a decoded main signal (x"mono) from encoding parameters (pmono) representing a main signal (xmono);
said main signal (xmono) being a first linear combination of signals of at least a first and a second channel
(a, b; L, R);
generating (224) a decoded side signal (x"side) from encoding parameters (pside) representing a side signal (xside);
said side signal (xside) being a second linear combination of signals of at least the first and the second
channel (a, b; L, R), within an encoding frame (80); and
combining (226) at least the decoded main signal (x"mono) and the decoded side signal (x"side) into signals of at least said first and said second channel (a, b; L, R),
characterised in that
said side signal (xside) is scaled to an energy contour of the main signal (xmono); and
the step of generating the decoded side signal (x"side) further comprises the step of:
inverse scaling the decoded side signal (x"side) to an energy contour of the decoded main signal (x"mono).
6. A method according to claim 5, characterised in that the decoded side signal (x"side) is inverse scaled by a factor being a monotonic continuous function of the energy
contour of the decoded main signal (x"mono).
7. A method according to claim 5, characterised in that the step of generating (224) the decoded side signal (x"side) comprises the step of generating a decoded side residual signal (x"side residual) and generating the decoded side signal (x"side) based on the decoded side residual signal (x"side residual), whereby the decoded side residual signal (x"side residual) is inverse scaled to an energy contour of the decoded main signal (x"mono).
8. A method according to claim 7, characterised in that the decoded side residual signal (x"side residual) is multiplied by a factor being a monotonic continuous function of the energy contour
of the decoded main signal (x"mono).
9. Encoder apparatus (14), comprising:
input means (16; 16A-C) for multi-channel audio signals (a, b; L, R, C) comprising
at least a first and a second channel (a, b; L, R),
means (38) for generating a first output signal (pmono) being encoding parameters representing a main signal (xmono);
said main signal (xmono) being a first linear combination of signals of at least the first and the second
channel (a, b; L, R);
means (30) for generating a second output signal (pside) being encoding parameters representing a side signal (xside);
said side signal (xside) being a second linear combination of signals of at least the first and the second
channel (a, b; L, R), within an encoding frame (80); and
output means (52);
characterised in that the means for generating a second output signal (pside) further comprises:
means for scaling the side signal (xside) to an energy contour of the main signal (xmono).
10. Encoder apparatus according to claim 9, characterised in that the means for scaling the side signal (xside) are adapted to scale the side signal (xside) by a factor being a monotonic continuous function of the energy contour of the main
signal (xmono).
11. Encoder apparatus according to claim 9, characterised in that the means for generating the second output signal (pside) further comprises means for creating a side residual signal (xside residual) based on a balanced difference between the side signal (xside) and the main signal (xmono), whereby the means for scaling the side signal (xside) are adapted to scale the side residual signal (xside residual) to an energy contour of the main signal (xmono).
12. Encoder apparatus according to claim 11, characterised in that the means for scaling the side signal (xside) are adapted to divide the side residual signal (xside residual) by a factor being a monotonic continuous function of the energy contour of the main
signal (xmono).
13. Decoder apparatus (24), comprising:
input means (54) for encoding parameters (pmono) representing a main signal and encoding parameters (pside) representing a side signal;
said main signal (xmono) being a first linear combination of a first and a second channel (a, b; L, R);
said side signal (xside) being a second linear combination of the first and the second channel (a, b; L,
R);
means (64) for generating a decoded main signal (x"mono) from the encoding parameters (pmono) representing the main signal;
means (60) for generating a decoded side signal (x"side) from the encoding parameters (pside) representing the side signal within an encoding frame (80);
means (68, 70) for combining at least the decoded main signal (x"mono) and the decoded side signal (x"side) into signals of at least the first and the second channel (a, b; L, R); and
output means (26; 26A-C),
characterised in that
said side signal (xside) is scaled to an energy contour of the main signal (xmono); and
the means (60) for generating the decoded side signal (x"side) in turn comprises:
means for inverse scaling the decoded side signal (x"side) to an energy contour of the decoded main signal (x"mono).
14. Decoder apparatus according to claim 13, characterised in that the means for inverse scaling the decoded side signal (x"side) are adapted to inverse scale the decoded side signal (x"side) by a factor being a monotonic continuous function of the energy contour of the decoded
main signal (x"mono).
15. Decoder apparatus according to claim 13, characterised in that the means (60) for generating the decoded side signal (x"side) further comprises means for generating a decoded side residual signal (x"side residual) and for generating the decoded side signal (x"side) based on the decoded side residual signal (x"side residual), whereby the means for inverse scaling the decoded side signal (x"side) are adapted to inverse scale the decoded side residual signal (x"side residual) to an energy contour of the decoded main signal (x"mono).
16. Decoder apparatus according to claim 15, characterised in that the means for inverse scaling the decoded side signal (x"side) are adapted to multiply the decoded side residual signal (x"side residual) by a factor being a monotonic continuous function of the energy contour of the decoded
main signal (x"mono).
17. Audio system (1) comprising at least one of:
an encoder apparatus (14) according to any of the claims 9 to 12, and
a decoder apparatus (24) according to any of the claims 13 to 16.
1. Verfahren eines Kodierens von Mehrkanalaudiosignalen, mit den Schritten:
Erzeugen (210) eines ersten Ausgabesignals (pmono), das Kodierparameter ist, die ein Hauptsignal (xmono) darstellen;
wobei das Hauptsignal (xmono) eine erste Linearkombination von Signalen zumindest eines ersten und eines zweiten
Kanals (a, b; L, R) ist; und
Erzeugen (214) eines zweiten Ausgabesignals (pside), das Kodierparameter ist, die ein Seitensignal (xside) darstellen;
wobei das Seitensignal (xside) eine zweite Linearkombination von Signalen von zumindest dem erstem und dem zweiten
Kanal (a, b; L, R) innerhalb eines Kodierrahmens (80) ist,
dadurch gekennzeichnet, dass der Schritt eines Erzeugens des zweiten Ausgabesignals (pside) weiter den Schritt umfasst:
Skalieren des Seitensignals (xside) auf eine Energiekontur des Hauptsignals (xmono)
2. Verfahren nach Anspruch 1, dadurch gekennzeichnet, dass das Seitensignal (xside) um einen Faktor skaliert wird, der eine monotone, kontinuierliche Funktion der Energiekontur
des Hauptsignals (xmono) ist.
3. Verfahren nach Anspruch 1, dadurch gekennzeichnet, dass der Schritt eines Erzeugens eines zweiten Ausgabesignals (pside) den Schritt eines Erzeugens eines Seitenrestsignals (xside residual) basierend auf einem ausgeglichenen Unterschied zwischen dem Seitensignal (xside) und dem Hauptsignal (xmono) umfasst, wobei das Seitenrestsignal (xside residual) auf eine Energiekontur des Hauptsignals (xmono) skaliert wird.
4. Verfahren nach Anspruch 3, dadurch gekennzeichnet, dass das Seitenrestsignal (xside residual) durch einen Faktor geteilt wird, der eine monotone, kontinuierliche Funktion der
Energiekontur des Hauptsignals (xmono) ist.
5. Verfahren eines Dekodierens von Mehrkanalaudiosignalen, mit den Schritten:
Erzeugen (220) eines dekodierten Hauptsignals (x"mono) aus den Kodierparametern (pmono), die ein Hauptsignal (xmono) darstellen;
wobei das Hauptsignal (xmono) eine erste Linearkombination von Signalen von zumindest einem ersten und einem zweiten
Kanal (a, b; L, R) ist;
Erzeugen (224) eines dekodierten Seitensignals (x"side) aus den Kodierparametern (pside) , die ein Seitensignal (xside) darstellen;
wobei das Seitensignal (xside) eine zweite Linearkombination von Signalen von zumindest dem ersten und dem zweiten
Kanal (a, b; L, R) innerhalb eines Kodierrahmens (80) ist; und
Kombinieren (226) zumindest des dekodierten Hauptsignals (x"mono) und des dekodierten
Seitensignals (x"side) in Signale zumindest des ersten und des zweiten Kanals (a, b; L, R),
dadurch gekennzeichnet, dass
das Seitensignal (xside) auf eine Energiekontur des Hauptsignals (xmono) skaliert ist;
der Schritt eines Erzeugens des dekodierten Seitensignals (x"side) weiter den Schritt umfasst:
inverses Skalieren des dekodierten Seitensignals (x"side) an eine Energiekontur des dekodierten Hauptsignals (x"mono).
6. Verfahren nach Anspruch 5, dadurch gekennzeichnet, dass das dekodierte Seitensignal (x"side) invers durch einen Faktor skaliert wird, der eine monotone, kontinuierliche Funktion
der Energiekontur des dekodierten Hauptsignals (x"mono) ist.
7. Verfahren nach Anspruch 5, dadurch gekennzeichnet, dass der Schritt eines Erzeugens (224) des dekodierten Seitensignals (x"side) den Schritt eines Erzeugens eines dekodierten Seitenrestsignals (x"side residual) und eines Erzeugens des dekodierten Seitensignals (x"side) basierend auf dem dekodierten Seitenrestsignal (x"side residual) umfasst, wobei das dekodierte Seitenrestsignal (x"side residual) auf eine Energiekontur des dekodierten Hauptsignals (x"mono) invers skaliert wird.
8. Verfahren nach Anspruch 7, dadurch gekennzeichnet, dass das dekodierte Seitenrestsignal (x"side residual) mit einem Faktor multipliziert wird, der eine monotone, kontinuierliche Funktion
der Energiekontur des dekodierten Hauptsignals (x"mono) ist.
9. Kodiergerät (14) mit:
einer Eingabevorrichtung (16; 16A-C) für Mehrkanalaudiosignale (a, b; L, R, C), die
zumindest einen ersten und einen zweiten Kanal (a, b; L, R) umfassen,
einer Vorrichtung (38) zum Erzeugen eines ersten Ausgabesignals (pmono) , das Kodierparameter ist, die ein Hauptsignal (xmono) darstellen;
wobei das Hauptsignal (xmono) eine erste Linearkombination von Signalen von zumindest dem ersten und dem zweiten
Kanal (a, b; L, R) ist;
einer Vorrichtung (30) zum Erzeugen eines zweiten Ausgabesignals (pside), das Kodierparameter ist, die ein Seitensignal (xside) darstellen;
wobei das Seitensignal (xside) eine zweite Linearkombination von Signalen von zumindest dem ersten und dem zweiten
Kanal (a, b; L, R) innerhalb eines Kodierrahmens (80) ist; und
einer Ausgabevorrichtung (52);
dadurch gekennzeichnet, dass die Vorrichtung zum Erzeugen eines zweiten Ausgabesignals (pside) weiter umfasst:
eine Vorrichtung zum Skalieren des Seitensignals (xside) auf eine Energiekontur des Hauptsignals (xmono) .
10. Kodiergerät nach Anspruch 9, dadurch gekennzeichnet, dass die Vorrichtung zum Skalieren des Seitensignals (xside) angepasst ist, das Seitensignal (xside) mit einem Faktor zu skalieren, der eine monotone, kontinuierliche Funktion der Energiekontur
des Hauptsignals (xmono) ist.
11. Kodiergerät nach Anspruch 9, dadurch gekennzeichnet, dass die Vorrichtung zum Erzeugen des zweiten Ausgabesignals (pside) weiter eine Vorrichtung zum Erzeugen eines Seitenrestsignals (xside residual) basierend auf einem ausgeglichenen Unterschied zwischen dem Seitensignal (xside) und dem Hauptsignal (xmono) umfasst, wobei die Vorrichtung zum Skalieren des Seitensignals (xside) angepasst ist, das Seitenrestsignal (xside residual) auf eine Energiekontur des Hauptsignals (xmono) zu skalieren.
12. Kodiergerät nach Anspruch 11, dadurch gekennzeichnet, dass die Vorrichtung zum Skalieren des Seitensignals (xside) angepasst ist, das Seitenrestsignal (xside residual) durch einen Faktor zu teilen, der eine monotone, kontinuierliche Funktion der Energiekontur
des Hauptsignals (xmono) ist.
13. Dekodiergerät (24), mit:
einer Eingabevorrichtung (54) für Kodierparameter (pmono), die ein Hauptsignal darstellen und Kodierparameter (pside), die ein Seitensignal darstellen;
wobei das Hauptsignal (xmono) eine erste Linearkombination eines ersten und eines zweiten Kanals (a, b; L, R)
ist;
wobei das Seitensignal (xside) eine zweite Linearkombination des ersten und des zweiten Kanals (a, b; L, R) ist;
einer Vorrichtung (64) zum Erzeugen eines dekodierten Hauptsignals (x"mono) aus den Kodierparametern (pmono), die das Hauptsignal darstellen;
einer Vorrichtung (60) zum Erzeugen eines dekodierten Seitensignals (x"side) aus den Kodierparametern (pside), die das Seitensignal innerhalb eines Kodierrahmens (80) darstellen;
einer Vorrichtung (68, 70) zum Kombinieren zumindest des dekodierten Hauptsignals
(x'mono) und des dekodierten Seitensignals (x"side) in Signale zumindest des ersten und des zweiten Kanals (a, b; L, R); und
einer Ausgabevorrichtung (26; 26A-C),
dadurch gekennzeichnet, dass
das Seitensignal (xside) auf eine Energiekontur des Hauptsignals (xmono) skaliert ist;
die Vorrichtung (60) zum Erzeugen des dekodierten Seitensignals (x"side) wiederum umfasst:
eine Vorrichtung zum inversen Skalieren des dekodierten Seitensignals (x"side) auf eine Energiekontur des dekodierten Hauptsignals (x"mono).
14. Dekodiergerät nach Anspruch 13, dadurch gekennzeichnet, dass die Vorrichtung zum inversen Skalieren des dekodierten Seitensignals (x"side) angepasst ist, das dekodierte Seitensignal (x"side) durch einen Faktor invers zu skalieren, der eine monotone, kontinuierliche Funktion
der Energiekontur des dekodierten Hauptsignals (x"mono) ist.
15. Decoding apparatus according to claim 13, characterized in that the device (60) for generating the decoded side signal (x"side) further comprises a device for generating a decoded side residual signal (x"side residual) and for generating the decoded side signal (x"side) based on the decoded side residual signal (x"side residual), wherein the device for inversely scaling the decoded side signal (x"side) is adapted to inversely scale the decoded side residual signal (x"side residual) to an energy contour of the decoded main signal (x"mono).
16. Decoding apparatus according to claim 15, characterized in that the device for inversely scaling the decoded side signal (x"side) is adapted to multiply the decoded side residual signal (x"side residual) by a factor that is a monotonic, continuous function of the energy contour of the decoded main signal (x"mono).
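Claims 15 and 16 apply the inverse scaling to a decoded side residual rather than to the side signal itself. The sketch below rests on several assumptions that the claims do not state: the side signal is rebuilt as w · x"mono + g(E) · x"side residual, where `w` is a hypothetical balance weight assumed to be available at the decoder; g(E) = sqrt(E + EPS) with E the mean-square frame energy; and the channels were matrixed as main = (L + R) / 2 and side = (L - R) / 2, so that the combining step is L = main + side and R = main - side.

```python
import math
from typing import List, Tuple

EPS = 1e-9  # hypothetical floor that keeps the factor strictly positive


def energy_factor(x: List[float]) -> float:
    """Assumed factor g(E) = sqrt(E + EPS), with E the mean-square frame energy."""
    return math.sqrt(sum(v * v for v in x) / len(x) + EPS)


def decode_side_from_residual(
    res_dec: List[float], mono_dec: List[float], w: float
) -> List[float]:
    """Multiply the decoded residual by g(E) of the decoded main signal, then add
    back the balanced main-signal contribution w * mono (w is hypothetical)."""
    g = energy_factor(mono_dec)
    return [w * m + g * r for m, r in zip(mono_dec, res_dec)]


def combine_channels(
    mono: List[float], side: List[float]
) -> Tuple[List[float], List[float]]:
    """Invert the assumed matrixing mono = (L + R) / 2, side = (L - R) / 2."""
    left = [m + s for m, s in zip(mono, side)]
    right = [m - s for m, s in zip(mono, side)]
    return left, right


if __name__ == "__main__":
    mono = [1.0, 0.0, -1.0, 0.0]
    residual = [0.0, 0.5, 0.0, -0.5]
    side = decode_side_from_residual(residual, mono, w=0.25)
    left, right = combine_channels(mono, side)
    print(left, right)
```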
17. Audio system (1) comprising at least one of:
an encoding apparatus (14) according to any one of claims 9 to 12 and
a decoding apparatus (24) according to any one of claims 13 to 16.
1. A method for encoding multi-channel audio signals, comprising the steps of:
generating (210) a first output signal (pmono) comprising coding parameters representing a main signal (xmono);
said main signal (xmono) being a first linear combination of signals of at least a first and a second channel (a, b; L, R); and
generating (214) a second output signal (pside) comprising coding parameters representing a side signal (xside);
said side signal (xside) being a second linear combination of signals of at least said first and second channels (a, b; L, R) within a coding frame (80);
characterized in that the step of generating the second output signal (pside) further comprises the step of:
scaling the side signal (xside) to an energy contour of the main signal (xmono).
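The encoding method of claim 1 can be sketched for a single coding frame as follows. The sketch assumes main = (L + R) / 2 and side = (L - R) / 2 as the two linear combinations, and a factor g(E) = sqrt(E + EPS) of the main signal's mean-square frame energy; these choices and all names are illustrative assumptions, not taken from the claims. With this particular factor the scaled side signal becomes independent of the overall loudness of the frame, which the usage check demonstrates.

```python
import math
from typing import List, Tuple

EPS = 1e-12  # hypothetical floor that keeps the factor strictly positive


def downmix(left: List[float], right: List[float]) -> Tuple[List[float], List[float]]:
    """Assumed linear combinations: main = (L + R) / 2, side = (L - R) / 2."""
    mono = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mono, side


def encode_frame(left: List[float], right: List[float]) -> Tuple[List[float], List[float]]:
    """Form main and side signals for one coding frame and scale the side signal
    to the main signal's energy with the assumed factor g(E) = sqrt(E + EPS)."""
    mono, side = downmix(left, right)
    g = math.sqrt(sum(m * m for m in mono) / len(mono) + EPS)
    return mono, [s / g for s in side]


if __name__ == "__main__":
    left = [0.4, 0.2, -0.3, 0.1]
    right = [0.2, 0.1, -0.1, 0.0]
    _, quiet = encode_frame(left, right)
    _, loud = encode_frame([10 * v for v in left], [10 * v for v in right])
    # Same stereo image at 20 dB higher level gives (almost) the same scaled side.
    print(all(math.isclose(a, b, rel_tol=1e-6) for a, b in zip(quiet, loud)))
```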
2. The method according to claim 1, characterized in that the side signal (xside) is scaled by a factor that is a monotonic, continuous function of the energy contour of the main signal (xmono).
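Claim 2 does not say whether the energy contour is one value per frame or varies within the frame. As a hypothetical illustration of the second reading, the sketch below computes the mean-square energy of `n_sub` equal subframes of the main signal, interpolates linearly between subframe centres to obtain a per-sample contour, and divides each side sample by sqrt(contour + EPS), which is a monotonic, continuous function of the contour. The subframe scheme, the interpolation and all names are assumptions.

```python
import math
from typing import List

EPS = 1e-9  # hypothetical floor that keeps the factor strictly positive


def energy_contour(x: List[float], n_sub: int) -> List[float]:
    """Hypothetical per-sample contour: mean-square energy of each of n_sub equal
    subframes, linearly interpolated between subframe centres, held flat at the edges."""
    sub_len = len(x) // n_sub
    energies = [
        sum(v * v for v in x[k * sub_len:(k + 1) * sub_len]) / sub_len
        for k in range(n_sub)
    ]
    centres = [(k + 0.5) * sub_len - 0.5 for k in range(n_sub)]
    contour = []
    for i in range(len(x)):
        if i <= centres[0]:
            contour.append(energies[0])
        elif i >= centres[-1]:
            contour.append(energies[-1])
        else:
            k = int((i - centres[0]) // sub_len)  # index of the centre to the left
            t = (i - centres[k]) / sub_len  # position between centres k and k + 1
            contour.append((1 - t) * energies[k] + t * energies[k + 1])
    return contour


def scale_side_to_contour(
    x_side: List[float], x_mono: List[float], n_sub: int = 4
) -> List[float]:
    """Divide each side sample by sqrt(contour + EPS) of the main signal."""
    contour = energy_contour(x_mono, n_sub)
    return [s / math.sqrt(e + EPS) for s, e in zip(x_side, contour)]


if __name__ == "__main__":
    # A main signal that steps from silence to a loud level mid-frame.
    print(energy_contour([0.0] * 8 + [2.0] * 8, 2))
```

Interpolating the contour avoids abrupt gain jumps at subframe boundaries; whether such smoothing is used in practice is not stated in the claims.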
3. The method according to claim 1, characterized in that the step of generating a second output signal (pside) comprises the step of creating a side residual signal (xside residual) based on a balanced difference between the side signal (xside) and the main signal (xmono), whereby the side residual signal (xside residual) is scaled to the energy contour of the main signal (xmono).
4. The method according to claim 3, characterized in that the side residual signal (xside residual) is divided by a factor that is a monotonic, continuous function of the energy contour of the main signal (xmono).
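Claims 3 and 4 leave the "balanced difference" unspecified. One possible reading, used here purely as an assumption, is a least-squares weighted difference x_side - w · x_mono with w = ⟨x_side, x_mono⟩ / ⟨x_mono, x_mono⟩, so that the residual keeps only the part of the side signal not predictable from the main signal. The residual is then divided by the assumed factor g(E) = sqrt(E + EPS). The weight formula and all names are hypothetical.

```python
import math
from typing import List, Tuple

EPS = 1e-9  # hypothetical floor that keeps divisions well defined


def balance_weight(x_side: List[float], x_mono: List[float]) -> float:
    """Hypothetical balance weight: least-squares projection of side onto main."""
    num = sum(s * m for s, m in zip(x_side, x_mono))
    den = sum(m * m for m in x_mono) + EPS
    return num / den


def side_residual(x_side: List[float], x_mono: List[float]) -> Tuple[float, List[float]]:
    """Form the balanced difference side - w * mono and divide it by the assumed
    factor g(E) = sqrt(E + EPS) of the main signal's mean-square frame energy."""
    w = balance_weight(x_side, x_mono)
    g = math.sqrt(sum(m * m for m in x_mono) / len(x_mono) + EPS)
    return w, [(s - w * m) / g for s, m in zip(x_side, x_mono)]


if __name__ == "__main__":
    mono = [1.0, -2.0, 3.0, -1.0]
    # A side signal fully predictable from the main signal leaves almost no residual.
    w, res = side_residual([0.5 * m for m in mono], mono)
    print(round(w, 6), max(abs(v) for v in res) < 1e-6)
```

Under this reading, a decoder that knows `w` recovers the side signal as w · x_mono + g(E) · residual, which is the multiplication by the factor described in claim 8.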
5. A method for decoding multi-channel audio signals, comprising the steps of:
generating (220) a decoded main signal (x"mono) from coding parameters (pmono) representing a main signal (xmono);
said main signal (xmono) being a first linear combination of signals of at least a first and a second channel (a, b; L, R);
generating (224) a decoded side signal (x"side) from coding parameters (pside) representing a side signal (xside);
said side signal (xside) being a second linear combination of signals of at least the first and second channels (a, b; L, R) within a coding frame (80); and
combining (226) at least the decoded main signal (x"mono) and the decoded side signal (x"side) into signals of at least said first and second channels (a, b; L, R),
characterized in that:
said side signal (xside) is scaled to an energy contour of the main signal (xmono);
the step of generating the decoded side signal (x"side) further comprises the step of:
inversely scaling the decoded side signal (x"side) to an energy contour of the decoded main signal (x"mono).
6. The method according to claim 5, characterized in that the decoded side signal (x"side) is inversely scaled by a factor that is a monotonic, continuous function of the energy contour of the decoded main signal (x"mono).
7. The method according to claim 5, characterized in that the step of generating (224) the decoded side signal (x"side) comprises the steps of generating a decoded side residual signal (x"side residual) and generating the decoded side signal (x"side) on the basis of the decoded side residual signal (x"side residual), whereby the decoded side residual signal (x"side residual) is inversely scaled to an energy contour of the decoded main signal (x"mono).
8. The method according to claim 7, characterized in that the decoded side residual signal (x"side residual) is multiplied by a factor that is a monotonic, continuous function of the energy contour of the decoded main signal (x"mono).
9. An encoding apparatus (14) comprising:
input means (16; 16A to 16C) for multi-channel audio signals comprising at least a first and a second channel (a, b; L, R);
means (38) for generating a first output signal (pmono) comprising coding parameters representing a main signal (xmono);
said main signal (xmono) being a first linear combination of signals of at least the first and the second channel (a, b; L, R);
means (30) for generating a second output signal (pside) comprising coding parameters representing a side signal (xside);
said side signal (xside) being a second linear combination of signals of at least said first and second channels (a, b; L, R) within a coding frame (80); and
output means (52);
characterized in that the means for generating the second output signal (pside) further comprises:
means for scaling the side signal (xside) to an energy contour of the main signal (xmono).
10. The encoding apparatus according to claim 9, characterized in that the means for scaling the side signal (xside) is arranged to scale the side signal (xside) by a factor that is a monotonic, continuous function of the energy contour of the main signal (xmono).
11. The encoding apparatus according to claim 9, characterized in that the means for generating the second output signal (pside) further comprises means for creating a side residual signal (xside residual) based on a balanced difference between the side signal (xside) and the main signal (xmono), whereby the scaling means is arranged to scale the side residual signal (xside residual) to an energy contour of the main signal (xmono).
12. The encoding apparatus according to claim 11, characterized in that the means for scaling the side signal (xside) is arranged to divide the side residual signal (xside residual) by a factor that is a monotonic, continuous function of the energy contour of the main signal (xmono).
13. A decoding apparatus (24) comprising:
input means (54) for coding parameters (pmono) representing a main signal and coding parameters (pside) representing a side signal;
said main signal (xmono) being a first linear combination of signals of at least a first and a second channel (a, b; L, R);
said side signal (xside) being a second linear combination of signals of at least said first and second channels (a, b; L, R);
means (64) for generating a decoded main signal (x"mono) from the coding parameters (pmono) representing the main signal;
means (60) for generating a decoded side signal (x"side) from the coding parameters (pside) representing the side signal within a coding frame (80);
means (68, 70) for combining at least the decoded main signal (x"mono) and the decoded side signal (x"side) into signals of at least said first and second channels (a, b; L, R); and
output means (26; 26A to 26C);
characterized in that:
said side signal (xside) is scaled to an energy contour of the main signal (xmono);
the means (60) for generating the decoded side signal (x"side) further comprises:
means for inversely scaling the decoded side signal (x"side) to an energy contour of the decoded main signal (x"mono).
14. The decoding apparatus according to claim 13, characterized in that the means for inversely scaling the decoded side signal (x"side) is arranged to inversely scale the decoded side signal (x"side) by a factor that is a monotonic, continuous function of the energy contour of the decoded main signal (x"mono).
15. The decoding apparatus according to claim 13, characterized in that the means (60) for generating the decoded side signal (x"side) further comprises means for generating a decoded side residual signal (x"side residual) and for generating the decoded side signal (x"side) on the basis of the decoded side residual signal (x"side residual), whereby the inverse scaling means is arranged to inversely scale the decoded side residual signal (x"side residual) to an energy contour of the decoded main signal (x"mono).
16. The decoding apparatus according to claim 15, characterized in that the means for inversely scaling the decoded side signal (x"side) is arranged to multiply the decoded side residual signal (x"side residual) by a factor that is a monotonic, continuous function of the energy contour of the decoded main signal (x"mono).
17. An audio system (1) comprising at least one of the following:
an encoding apparatus (14) according to any one of claims 9 to 12; and
a decoding apparatus (24) according to any one of claims 13 to 16.