TECHNICAL FIELD
[0001] The present invention relates in general to audio coding, and in particular to code
excited linear prediction coding.
BACKGROUND
[0002] Existing stereo, or in general multi-channel, coding techniques require a rather
high bit-rate. Parametric stereo is often used at very low bit-rates. However, these
techniques are designed for a wide class of generic audio material, i.e. music, speech
and mixed content.
[0003] In multi-channel speech coding, very little has been done. Most work has focused
on an inter-channel prediction (ICP) approach. ICP techniques utilize the fact that
there is correlation between a left and a right channel. Many different methods that
reduce this redundancy in the stereo signal are described in the literature, e.g.
in [1][2][3].
[0004] The ICP approach models the case of a single speaker quite well; however,
it fails to model multiple speakers and diffuse sound sources (e.g. diffuse background
noises). Encoding a residual of the ICP is therefore necessary in several cases, which
puts quite high demands on the required bit-rate.
[0005] Most existing speech codecs are monophonic and are based on the code-excited linear
predictive (CELP) coding model. Examples include AMR-NB and AMR-WB (Adaptive Multi-Rate
Narrow Band and Adaptive Multi-Rate Wide Band). In this model, i.e. CELP, an excitation
signal at an input of a short-term LP synthesis filter is constructed by adding two
excitation vectors from adaptive and fixed (innovative) codebooks, respectively. The
speech is synthesized by feeding the two properly chosen vectors from these codebooks
through the short-term synthesis filter. The optimum excitation sequence in a codebook
is chosen using an analysis-by-synthesis search procedure in which the error between
the original and synthesized speech is minimized according to a perceptually weighted
distortion measure.
[0006] There are two types of fixed codebooks. The first type is the so-called
stochastic codebook. Such a codebook often involves substantial physical storage.
Given the index in a codebook, the excitation vector is obtained by conventional table
lookup. The size of the codebook is therefore limited by the bit-rate and the complexity.
[0007] The second type is the algebraic codebook. In contrast to stochastic
codebooks, algebraic codebooks are not random and require virtually no storage. An
algebraic codebook is a set of indexed code vectors in which the amplitudes and positions
of the pulses constituting the k-th code vector are derived directly from the
corresponding index k. The size of an algebraic codebook is therefore not limited
by memory requirements. Additionally, algebraic codebooks are well suited for
efficient search procedures.
[0008] It is important to note that a substantial, and often the major, part of the bits
available to the speech codec is allocated to the fixed codebook excitation encoding. For
instance, in the AMR-WB standard, the share of bits allocated to the fixed codebook
procedures ranges from 36% up to 76%. Additionally, it is the fixed codebook excitation
search that represents most of the encoder complexity.
[0009] In [7], a multi-part fixed codebook including an individual fixed codebook for each
channel and a shared codebook common to all channels is used. With this strategy it
is possible to have a good representation of the inter-channel correlations. However,
this comes at the expense of increased complexity as well as storage. Additionally,
the required bit rate to encode the fixed codebook excitations is quite large because
in addition to each channel codebook index one needs also to transmit the shared codebook
index. In [8] and [9], similar methods for encoding multi-channel signals are described
where the encoding mode is made dependent on the degree of correlation of the different
channels. These techniques are already well known from Left/Right and Mid/Side encoding,
where switching between the two encoding modes is dependent on a residual, thus dependent
on correlation.
[0010] In [10], a method for encoding multichannel signals is described which generalizes
different elements of a single channel linear predictive codec. The method has the
disadvantage of requiring an enormous amount of computations rendering it unusable
in real-time applications such as conversational applications. Another disadvantage
of this technology is the amount of bits needed in order to encode the various decorrelation
filters used for encoding.
[0011] Another disadvantage of the solutions cited above is their incompatibility
with existing standardized monophonic conversational codecs, in the sense that no
monophonic signal is separately encoded, which prevents direct decoding of a
monophonic-only signal.
[0012] In [11], multiple stage audio encoding/decoding of a multi-pulse signal is disclosed.
An auxiliary multi-pulse setting circuit sets candidates of pulse positions so that
the pulse positions to which no pulse is located are selected in an auxiliary multi-phase
searching circuit prior to the pulse positions at which pulses have already been encoded
in a multi-pulse searching circuit. The auxiliary multi-phase searching circuit generates
an auxiliary multi-pulse signal according to the candidates and encodes the auxiliary
multi-pulse signal so that the difference between the reproduced audio signal which
is obtained by driving a linear predictive filter synthesis filter with the auxiliary
multi-pulse signal and an input audio signal is minimized similarly to the multi-pulse
searching circuit.
SUMMARY
[0013] A general problem with prior art speech coding is that it requires high bit rates
and complex encoders.
[0014] A general object of the present invention is thus to provide improved methods and
devices for speech coding. A subsidiary object of the present invention is to provide
CELP methods and devices having reduced requirement in terms of bit rates and encoder
complexity.
[0015] The above objects are achieved by methods and devices according to the enclosed patent
claims wherein the invention is defined by methods for encoding and decoding audio
signals according to claims 1 and 12, respectively, and by an encoder according to
claim 18 and a decoder according to claim 29. In general words, excitation signals
of a first signal encoded by CELP are used to derive a limited set of candidate excitation
signals for a second signal, different from the first signal. Preferably, the second
signal is correlated with the first signal. In a particular embodiment, the limited
set of candidate excitation signals is derived by a rule, which was selected from
a predetermined set of rules based on the encoded first signal and/or the second signal.
Preferably, pulse locations of the excitation signals of the first encoded signal
are used for determining the set of candidate excitation signals. More preferably,
the pulse locations of the set of candidate excitation signals are positioned in the
vicinity of the pulse locations of the excitation signals of the first encoded signal.
The first and second signals may be multi-channel signals of a common speech or audio
signal.
[0016] One advantage with the present invention is that the coding complexity is reduced.
Furthermore, in the case of multi-channel signals, the required bit rate for transmitting
coded signals is reduced.
[0017] Another advantage of the invention is the possibility to be implemented as an extension
to existing speech codecs with very few modifications.
BRIEF DESCRIPTION OF THE DRAWINGS
[0018] The invention, together with further objects and advantages thereof, may best be
understood by making reference to the following description taken together with the
accompanying drawings, in which:
FIG. 1A is a schematic illustration of a code excited linear prediction model;
FIG. 1B is a schematic illustration of a process of deriving an excitation signal;
FIG. 1C is a schematic illustration of an embodiment of an excitation signal for use
in a code excited linear prediction model;
FIG. 2 is a block scheme of an encoder and decoder according to the code excited linear
prediction model;
FIG. 3A is a diagram illustrating one embodiment of a principle of selecting candidate
excitation signals according to the present invention;
FIG. 3B is a diagram illustrating another embodiment of a principle of selecting candidate
excitation signals according to the present invention;
FIG. 4 illustrates a possibility to reduce required data entities according to an
embodiment of the present invention;
FIG. 5A is a block scheme of an embodiment of encoders and decoders for two signals
according to the present invention;
FIG. 5B is a block scheme of another embodiment of encoders and decoders for two signals
according to the present invention;
FIG. 6 is a block scheme of an example of encoders and decoders for re-encoding of
a signal;
FIG. 7 is a block scheme of an example of encoders and decoders for parallel encoding
of a signal for different bit rates;
FIG. 8 is a diagram illustrating the perceptual quality achieved by embodiments of
the present invention;
FIG. 9 is a flow diagram of the main steps of an embodiment of an encoding method
according to the present invention;
FIG. 10 is a flow diagram of the main steps of a re-encoding method; and
FIG. 11 is a flow diagram of the main steps of an embodiment of a decoding method
according to the present invention.
DETAILED DESCRIPTION
[0019] A general CELP speech synthesis model is depicted in Fig. 1A. A fixed codebook 10
comprises a number of candidate excitation signals 30, characterized by a respective
index k. In the case of an algebraic codebook, the index k alone characterizes the
corresponding candidate excitation signal 30 completely. Each candidate excitation
signal 30 comprises a number of pulses 32 having a certain position and amplitude.
An index k determines a candidate excitation signal 30 that is amplified in an amplifier
11 giving rise to an output excitation signal c_k(n) 12. An adaptive codebook 14, which
is not the primary subject of the present invention, provides an adaptive signal v(n),
via an amplifier 15. The excitation signal c_k(n) and the adaptive signal v(n) are summed
in an adder 17, giving a composite excitation signal u(n). The composite excitation
signal u(n) influences the adaptive codebook for subsequent signals, as indicated by
the dashed line 13.
[0020] The composite excitation signal u(n) is used as input signal to a transform 1/A(z)
in a linear prediction synthesis section 20, resulting in a "predicted" signal ŝ(n)
21, which, typically after post-processing 22, is provided as the output from the
CELP synthesis procedure.
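The synthesis path of Fig. 1A can be sketched in a few lines of Python. This is a minimal illustration only, not the implementation of any particular codec: the function name `celp_synthesize`, the gain names `g_c` and `g_p`, the zero initial filter state and the sign convention A(z) = 1 + a_1 z^-1 + ... + a_M z^-M are assumptions made for the example, and post-processing is omitted.

```python
def celp_synthesize(fixed_vec, adaptive_vec, g_c, g_p, lpc):
    """Return (u, s_hat) for one frame.

    fixed_vec    -- candidate excitation from the fixed codebook (before gain)
    adaptive_vec -- adaptive codebook contribution (before gain)
    g_c, g_p     -- hypothetical names for the gains of amplifiers 11 and 15
    lpc          -- coefficients a_1..a_M of A(z) = 1 + a_1 z^-1 + ... + a_M z^-M
    """
    # Composite excitation: sum of the two scaled codebook contributions.
    u = [g_p * v + g_c * c for v, c in zip(adaptive_vec, fixed_vec)]

    # All-pole synthesis filter 1/A(z), assuming zero state at frame start:
    # s_hat(n) = u(n) - sum_j a_j * s_hat(n - j)
    s_hat = []
    for n, u_n in enumerate(u):
        acc = u_n
        for j, a_j in enumerate(lpc, start=1):
            if n - j >= 0:
                acc -= a_j * s_hat[n - j]
        s_hat.append(acc)
    return u, s_hat
```

In a real codec the filter memory would be carried over from the previous frame rather than reset to zero.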
[0021] The CELP speech synthesis model is used for analysis-by-synthesis coding of the speech
signal of interest. A target signal s(n), i.e. the signal that is to be approximated,
is provided. A long-term prediction is made by use of the adaptive codebook, adjusting
a previous coding to the present target signal, giving an adaptive signal
v(n) = g_p u(n-δ). The remaining difference is the target for the fixed codebook excitation
signal, whereby a codebook index k corresponding to an entry c_k should minimize the
difference according to an objective function, typically a mean square measure.
In general, the algebraic codebook is searched by minimizing
the mean square error between the weighted input speech and the weighted synthesis
speech. The fixed codebook search aims to find the algebraic codebook entry
c_k corresponding to index k, such that
$$Q_k = \frac{\left(\mathbf{y}_2^T \mathbf{H} \mathbf{c}_k\right)^2}{\mathbf{c}_k^T \mathbf{H}^T \mathbf{H} \mathbf{c}_k}$$
is maximized. The matrix H is a filtering matrix whose elements are derived from the
impulse response of a weighting filter, and y_2 is a vector whose components depend
on the signal to be encoded.
[0022] This fixed codebook procedure can be illustrated as in Fig. 1B, where an index k
selects an entry c_k from the fixed codebook 10 as excitation signal 12. In a stochastic
fixed codebook, the index k typically serves as an input to a table look-up, while in
an algebraic fixed codebook, the excitation signal 12 is derived directly from the
index k. In general, the multi-pulse excitation can be written as:
$$c_k(n) = \sum_{i=1}^{P} b_{i,k}\,\delta(n - p_{i,k})$$
[0023] where p_{i,k} are the pulse positions for index k, b_{i,k} are the individual
pulse amplitudes, P is the number of pulses and δ is the Dirac pulse function:
$$\delta(n) = \begin{cases} 1, & n = 0 \\ 0, & n \neq 0 \end{cases}$$
[0024] Fig. 1C illustrates an example of a candidate excitation signal 30 of the fixed codebook
10. The candidate excitation signal 30 is characterized by a number of pulses 32,
in this example 8 pulses. The pulses 32 are characterized by their positions P(1)-P(8)
and their amplitudes, which in a typical algebraic fixed codebook are either +1 or -1.
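As an illustration of how such a candidate excitation signal is assembled from its pulses, the following sketch places signed unit pulses in an otherwise empty frame. The function name `build_excitation` and the choice to let coinciding pulses add up are assumptions of the example, not features stated in this description.

```python
def build_excitation(positions, signs, frame_len):
    """Build a multi-pulse excitation vector of length frame_len.

    positions -- pulse positions (0-based sample indices)
    signs     -- pulse amplitudes, each +1 or -1
    """
    if len(positions) != len(signs):
        raise ValueError("one sign is needed per pulse position")
    c = [0] * frame_len
    for p, b in zip(positions, signs):
        if b not in (1, -1):
            raise ValueError("amplitudes are restricted to +1 or -1")
        # Assumption: pulses falling on the same sample are summed.
        c[p] += b
    return c
```

For example, `build_excitation([0, 3], [1, -1], 5)` yields a frame with a positive pulse at sample 0 and a negative pulse at sample 3.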
[0025] In an exemplary encoder/decoder system for a single channel, the CELP model is typically
implemented as illustrated in Fig. 2. The different parts corresponding to the different
functions of the CELP synthesis model of Fig. 1A are given the same reference numbers,
since the parts mainly are characterized by their function and typically not in the
same degree by their actual implementation. For instance, error weighting filters,
usually present in an actual implementation of a linear prediction analysis by synthesis
are not represented.
[0026] A signal to be encoded s(n) 33 is provided to an encoder unit 40. The encoder unit
comprises a CELP synthesis block 25 according to the above discussed principles. (Post-processing
is omitted in order to facilitate the reading of the figure.) The output from the
CELP synthesis block 25 is compared with the signal s(n) in a comparator block 31.
A difference 37, which may be weighted by a weighting filter, is provided to a codebook
optimization block 35, which is arranged according to any prior-art principles to
find an optimum or at least reasonably good excitation signal c_k(n) 12. The codebook
optimization block 35 provides the fixed codebook 10 with the corresponding index k.
When the final excitation signal is found, the index k and the delay δ of the adaptive
codebook 14 are encoded in an index encoder 38 to provide an output signal 45
representing the index k and the delay δ.
[0027] The representation of the index k and the delay δ is provided to a decoder unit 50.
The decoder unit comprises a CELP synthesis block 25 according to the above discussed
principles. (Post-processing is also here omitted in order to facilitate the reading
of the figure.) The representations of index k and delay δ are decoded in an index
decoder 53, and index k and delay δ are provided as input parameters to the fixed
codebook and the adaptive codebook, respectively, resulting in a synthesized signal ŝ(n)
21, which is supposed to resemble the original signal s(n).
[0028] The representation of the index k and the delay δ can be stored for a shorter or
longer time anywhere between the encoder and decoder, enabling e.g. storage of audio
recordings with relatively small storage capacity.
[0029] The present invention is related to speech and in general audio coding. In a typical
case, it deals with cases where a main signal s_M(n) has been encoded according to the
CELP technique and the desire is to encode another signal s_S(n). The other signal could
be a signal corresponding to another channel, e.g. stereo, multi-channel 5.1, etc.
[0030] This invention is thus directly applicable to stereo and in general multi-channel
coding for speech in teleconferencing applications. The application of this invention
can also include audio coding as part of an open-loop or closed-loop content dependent
encoding.
[0031] There should preferably exist a correlation between the main signal and the other
signal, in order for the present invention to operate in optimal conditions. However,
the existence of such correlation is not a mandatory requirement for the proper operation
of the invention. In fact, the invention can be operated adaptively and made dependent
on the degree of correlation between the main signal and the other signal. Since there
exists no causal relationship between a left and right channel in stereo applications,
the main signal s_M(n) is often chosen as the sum signal and s_S(n) as the difference
signal of the left and right channels.
[0032] The presumption of the present invention is that the main signal s_M(n) is
available in a CELP encoded representation. One basic idea of the present invention
is to limit the search in the fixed codebook during the encoding of the other signal
s_S(n) to a subset of candidate excitation signals. This subset is selected dependent on
the CELP encoding of the main signal. In a preferred embodiment, the pulses of the
candidate excitation signals of the subset are restricted to a set of pulse positions
that are dependent on the pulse positions of the main signal. This is equivalent to
defining constrained candidate pulse locations. The set of available pulse positions
can typically be set to the pulse positions of the main signal plus neighboring pulse
positions.
[0033] This reduction of the number of candidate pulses reduces dramatically the computational
complexity of the encoder.
[0034] Below, an illustrative example is given for the general case of two channel signals.
This is easily extended to multiple channels. In the case of multiple channels, however,
the targets may differ because of different weighting filters on each channel, and the
targets on the channels may also be delayed with respect to each other.
[0035] A main channel and a side channel can be constructed by
$$s_M(n) = \frac{s_L(n) + s_R(n)}{2}, \qquad s_S(n) = \frac{s_L(n) - s_R(n)}{2}$$
where s_L(n) and s_R(n) are the inputs of the left and right channels, respectively.
One can clearly see that even if the left and right channels were delayed versions of
each other, this would not be the case for the main and the side channel, since in
general these would contain information from both channels.
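A small sketch of the sum/difference construction may help to make this concrete. The function names `to_main_side` and `to_left_right` are hypothetical, and the scaling by 1/2 is an assumption of the example; any fixed scaling of the sum and difference signals would serve the same purpose.

```python
def to_main_side(left, right):
    """Form main (sum) and side (difference) channels from left and right."""
    # Assumption: both signals are scaled by 1/2 so that the main channel
    # keeps the same amplitude range as the inputs.
    main = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return main, side


def to_left_right(main, side):
    """Invert to_main_side: left = main + side, right = main - side."""
    left = [m + s for m, s in zip(main, side)]
    right = [m - s for m, s in zip(main, side)]
    return left, right
```

Each output sample depends on both inputs, which is why a pure delay between left and right does not carry over as a pure delay between main and side.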
[0036] In the following, it is assumed that the main channel is the first encoded channel
and that the pulse locations for the fixed codebook excitation for that encoding
are available.
[0037] The target for the side signal fixed codebook excitation encoding is computed as
the difference between the side signal and the adaptive codebook excitation:
$$s_C(n) - g_P\,\nu(n)$$
where g_P ν(n) is the adaptive codebook excitation and s_C(n) is the target signal for
the adaptive codebook search.
[0038] In the present embodiment, the potential pulse positions of the candidate
excitation signals are defined relative to the main signal pulse positions. Since
they are only a fraction of all possible positions, the number of bits required for
encoding the side signal with an excitation signal within this limited set of candidate
excitation signals is largely reduced, compared with the case where all
pulse positions may occur.
[0039] The selection of the candidate pulse positions relative to the main pulse positions
is fundamental in determining the complexity as well as the required bit-rate.
[0040] For example, if the frame length is L and if the number of pulses in the main signal
encoding is N, then one would need roughly N*log2(L) bits to encode the pulse positions.
However for encoding the side signal, if one retains only the main signal pulse positions
as candidates, and the number of pulses in candidate excitation signals for the side
signal is P, then one needs roughly P*log2(N) bits. For reasonable numbers for N,
P and L, this corresponds to quite a reduction in bit rate requirements.
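The rough bit counts above can be checked numerically. The sketch below simply evaluates the two expressions; the helper name `position_bits` and the example figures L = 64, N = 12 and P = 4 are assumptions chosen for illustration, not values prescribed by this description.

```python
import math


def position_bits(num_pulses, num_candidates):
    """Rough bit count: each pulse selects one of num_candidates positions."""
    return num_pulses * math.log2(num_candidates)


# Hypothetical example figures.
L, N, P = 64, 12, 4
main_bits = position_bits(N, L)   # N * log2(L)
side_bits = position_bits(P, N)   # P * log2(N)
print(f"main: {main_bits:.1f} bits, side: {side_bits:.1f} bits")
```

With these figures the main signal needs about 72 bits for its pulse positions, while the side signal needs fewer than 15.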
[0041] One interesting aspect is when the pulse positions for the side signal are set equal
to the pulse positions of the main signal. Then there is no encoding of the pulse
positions needed and only encoding of the pulse amplitudes is needed. In the case
of algebraic code books with pulses having +1/-1 amplitudes, then only the signs (N
bits) need to be encoded.
[0042] Let P_M(i), i = 1, ..., n, denote the main signal pulse positions. The pulse
positions of candidate excitation signals for the side signal are selected based on
the main signal pulse positions and possible additional parameters. The additional
parameters may consist of the time delay between the two channels and/or the difference
of adaptive codebook index.
[0043] In this embodiment, the set of pulse positions for the side signal candidate excitation
signal is constructed as
$$\{P_S\} = \bigcup_{i=1}^{n} \bigcup_{k=1}^{k_{max}} \{P_M(i) + J(i,k)\}$$
where J(i,k) denotes some delay index. This means that each mono pulse position generates
a set of pulse positions used for constructing the candidate excitation signals for the
side signal pulse search procedure. This is illustrated in Fig. 3A. Here,
P_M denotes the pulse positions of the excitation signal for the main signal, and
P_S denotes possible pulse positions of the candidate excitation signals for the side
signal analysis.
[0044] This of course is optimal with highly correlated signals. For weakly correlated or
uncorrelated signals the inverse strategy would be adopted. This consists in taking as
pulse candidates all pulse positions not belonging to the set constructed above.
[0045] Since this is a complementary case, it is easily understood by those skilled in the
art that both strategies are similar and only the correlated case will be described
in more detail.
[0046] It is easily seen that the position and number of pulse candidates depend on
the delay index J(i,k). The delay index may be made dependent on the effective delay
between the two channels and/or the adaptive codebook index. In Fig. 3A, k_max = 3, and
J(i,k) = J(k) ∈ {-1, 0, +1}.
[0047] In Fig. 3B, another slightly different selection of pulse positions is made. Here
k_max = 3, but J(i,k) = J(k) ∈ {0, +1, +2}.
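The construction of the candidate positions from the main signal pulse positions, as in Figs. 3A and 3B, can be sketched as follows. The function names are hypothetical, the delay index is assumed to be independent of i (i.e. J(i,k) = J(k)), and the discarding of positions that fall outside the frame is an assumption of the example; the description does not state how frame edges are treated.

```python
def candidate_positions(main_positions, offsets, frame_len):
    """Candidate side-signal pulse positions: each main pulse position
    shifted by every offset J(k), kept only if inside the frame."""
    cands = set()
    for p in main_positions:
        for j in offsets:
            q = p + j
            if 0 <= q < frame_len:   # assumption: drop out-of-frame positions
                cands.add(q)
    return sorted(cands)


def complement_positions(cands, frame_len):
    """Inverse strategy for weakly correlated signals: every frame
    position that is not in the candidate set."""
    taken = set(cands)
    return [n for n in range(frame_len) if n not in taken]


FIG_3A_OFFSETS = (-1, 0, 1)
FIG_3B_OFFSETS = (0, 1, 2)
```

For main pulses at positions 0 and 5 in a frame of 8 samples, the Fig. 3A offsets give the candidates 0, 1, 4, 5, 6 and the Fig. 3B offsets give 0, 1, 2, 5, 6, 7.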
[0048] Anyone skilled in the art realizes that the rules for selecting the pulse positions
can be constructed in many different ways. The actual rule to use may be adapted
to the actual implementation. The important characteristic is, however, that the
candidate pulse positions are selected dependent on the pulse positions resulting
from the main signal analysis following a certain rule. This rule may be unique and
fixed or may be selected from a set of predetermined rules dependent on e.g. the degree
of correlation between the two channels and/or the delay between the two channels.
[0049] Dependent on the rule used, the set of pulse candidates of the side signal is constructed.
The set of the side signal pulse candidates is in general very small compared to the
entire frame length. This allows reformulating the objective maximization problem
based on a decimated frame.
[0050] In the general case, the pulses are searched by using, for example, the depth-first
algorithm described in [5] or by using an exhaustive search if the number of candidate
pulses is really small. However, even with a small number of candidates it is recommended
to use a fast search procedure.
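[0051] A backward filtered signal is in general pre-computed using
$$\mathbf{d} = \mathbf{H}^T \mathbf{y}_2$$
[0052] The matrix Φ = H^T H is the matrix of correlations of h(n) (the impulse response
of a weighting filter), elements of which are computed by
$$\phi(i,j) = \sum_{n=j}^{L-1} h(n-i)\,h(n-j), \quad i = 0, \ldots, L-1, \quad j = i, \ldots, L-1$$
[0053] The objective function can therefore be written as
$$Q_k = \frac{\left(\mathbf{d}^T \mathbf{c}_k\right)^2}{\mathbf{c}_k^T \mathbf{\Phi}\, \mathbf{c}_k}$$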
[0054] Given the set of possible candidate pulse positions on the side signal, only a subset
of indices of the backward filtered vector d and the matrix Φ are needed. The set of
candidate pulses can be sorted in ascending order
$$P_S(1) < P_S(2) < \cdots < P_S(p)$$
where P_S(i) are the candidate pulse positions and p is their number. It should be noted
that p is always less than, and typically much less than, the frame length L.
[0055] If we denote the decimated signal
$$d_2(i) = d(P_S(i)), \quad i = 1, \ldots, p$$
[0056] and the decimated correlation matrix Φ_2
$$\phi_2(i,j) = \phi(P_S(i), P_S(j)), \quad i, j = 1, \ldots, p$$
[0057] then Φ_2 is symmetric and positive definite, and we can directly write
$$Q_{k'} = \frac{\left(\mathbf{d}_2^T \mathbf{c}'_{k'}\right)^2}{\mathbf{c}'^{T}_{k'} \mathbf{\Phi}_2\, \mathbf{c}'_{k'}}$$
where c'_{k'} is the new algebraic code vector. The index becomes k', which is a new
entry in a reduced size codebook.
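The decimation can be illustrated with a short sketch that computes the backward filtered vector and the correlation matrix in the usual way for algebraic codebook searches, keeps only the entries at the candidate positions, and evaluates the objective function on both the full and the decimated quantities. All function names are hypothetical, and the assumption that H is the lower triangular convolution matrix built from h(n) is made for the example.

```python
def backward_filter(y2, h):
    """d(n) = sum_{i=n}^{L-1} y2(i) h(i-n), i.e. d = H^T y2 for a lower
    triangular convolution matrix H built from h (an assumption)."""
    L = len(y2)
    return [sum(y2[i] * h[i - n] for i in range(n, L)) for n in range(L)]


def correlation_matrix(h):
    """phi(i,j) = sum_{n=j}^{L-1} h(n-i) h(n-j) for j >= i, symmetric."""
    L = len(h)
    phi = [[0.0] * L for _ in range(L)]
    for i in range(L):
        for j in range(i, L):
            phi[i][j] = phi[j][i] = sum(h[n - i] * h[n - j] for n in range(j, L))
    return phi


def decimate(d, phi, cands):
    """Keep only the entries at the sorted candidate positions."""
    d2 = [d[q] for q in cands]
    phi2 = [[phi[a][b] for b in cands] for a in cands]
    return d2, phi2


def objective(d, phi, c):
    """Q = (d^T c)^2 / (c^T Phi c) for a code vector c."""
    size = len(c)
    num = sum(d[n] * c[n] for n in range(size)) ** 2
    den = sum(c[m] * phi[m][n] * c[n] for m in range(size) for n in range(size))
    return num / den
```

A code vector with pulses only at candidate positions gives the same objective value whether it is evaluated on the full frame or on the decimated quantities, while the decimated evaluation touches p instead of L entries.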
[0058] The summary of these decimation operations is illustrated in Fig. 4. In the top of
the figure, a reduction of an algebraic codebook 10 of ordinary size to a reduced
size codebook 10' is illustrated. In the middle, a reduction of a weighting filter
covariance matrix 60 of ordinary size to a reduced weighting filter covariance matrix
60' is illustrated. Finally, in the bottom part, a reduction of a backward filtered
target 62 of ordinary size to a reduced size backward filtered target 62' is illustrated.
Anyone skilled in the art realizes the reduction in complexity that is the result
of such a reduction.
[0059] Maximizing the objective function on the decimated signals has several advantages.
One of them is the reduction of memory requirements; for instance, the matrix Φ_2
needs less memory. Another advantage is that, because the main signal pulse
locations are in all cases transmitted to the receiver, the indices of the decimated
signals are always available to the decoder. This in turn allows the encoding of the
other (side) signal pulse positions relative to the main signal pulse positions,
which consumes far fewer bits. A further advantage is the reduction in computational
complexity, since the maximization is performed on decimated signals.
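When the number of candidates is small, the maximization on the decimated signals can even be carried out exhaustively. The sketch below is an assumption-laden illustration, not the depth-first procedure of [5]: the function name `exhaustive_search` is hypothetical, at most one pulse per candidate position is allowed, and amplitudes are restricted to +1 or -1.

```python
from itertools import combinations, product


def exhaustive_search(d2, phi2, num_pulses):
    """Try every placement of num_pulses signed unit pulses on the
    decimated grid and return (best_q, slots, signs)."""
    best_q, best_slots, best_signs = -1.0, None, None
    for slots in combinations(range(len(d2)), num_pulses):
        for signs in product((1, -1), repeat=num_pulses):
            num = sum(s * d2[i] for i, s in zip(slots, signs)) ** 2
            den = sum(sa * sb * phi2[a][b]
                      for a, sa in zip(slots, signs)
                      for b, sb in zip(slots, signs))
            q = num / den
            if q > best_q:   # ties keep the first placement found
                best_q, best_slots, best_signs = q, slots, signs
    return best_q, best_slots, best_signs
```

The returned slots index the decimated grid; the corresponding frame positions are obtained by looking them up in the sorted candidate list. Negating all signs leaves the objective unchanged, so the sign pattern is only determined up to a global sign by this criterion.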
[0060] In Fig. 5A, an embodiment of a system of encoders 40A, 40B and decoders 50A, 50B
according to the present invention is illustrated. Many details are similar to those
illustrated in Fig. 2 and will therefore not be discussed in detail again if their
functions are essentially unaltered. A main signal 33A s_m(n) is provided to a first
encoder 40A. The first encoder 40A operates according to any prior art CELP encoding
model, producing an index k_m for the fixed codebook and a delay measure δ_m for the
adaptive codebook. The details of this encoding are not of any importance
for the present invention and are omitted in order to facilitate the understanding
of Fig. 5A. The parameters k_m and δ_m are encoded in a first index encoder 38A, giving
representations k*_m and δ*_m of the parameters that are sent to a first decoder 50A.
In the first decoder, the representations k*_m and δ*_m are decoded into parameters k_m
and δ_m in a first index decoder 53A. From these parameters, the original signal is
reproduced according to any CELP decoding model according to prior art. The details of
this decoding are not of any importance for the present invention and are omitted in
order to facilitate the understanding of Fig. 5A. A reproduced first output signal 21A
ŝ_m(n) is provided.
[0061] A side signal 33B s_S(n) is provided as an input signal to a second encoder 40B.
The second encoder 40B is for the most part similar to the encoder of Fig. 2. The signals
are now given an index "s" to distinguish them from any signals used for encoding the
main signal. The second encoder 40B comprises a CELP synthesis block 25'. According to
the present invention, the index k_m or a representation thereof is provided from the
first encoder 40A to an input 45 of the fixed codebook 10 of the second encoder 40B.
The index k_m is used by a candidate deriving means 47 to extract a reduced fixed
codebook 10' according to the above presented principles. The synthesis of the CELP
synthesis block 25' of the second encoder 40B is thus based on indices k'_s representing
excitation signals c'_{k'_s}(n) from the reduced fixed codebook 10'. An index k'_s is
thus found to represent a best choice of the CELP synthesis. The parameters k'_s and δ_s
are encoded in a second index encoder 38B, giving representations k'*_s and δ*_s of the
parameters that are sent to a second decoder 50B.
[0062] In the second decoder 50B, the representations k'*_s and δ*_s are decoded into
parameters k'_s and δ_s in a second index decoder 53B. Furthermore, the index parameter
k_m is available from the first decoder 50A and is provided to the input 55 of the fixed
codebook 10 of the second decoder 50B, in order to enable an extraction by a candidate
deriving means 57 of a reduced fixed codebook 10' equal to what was used in the second
encoder 40B. From the parameters k'_s and δ_s and the reduced fixed codebook 10', the
original side signal is reproduced according to ordinary CELP decoding models 25".
This decoding is performed essentially in analogy with Fig. 2, but using the reduced
fixed codebook 10' instead. A reproduced side output signal 21B ŝ_s(n) is thus provided.
[0063] Selection of the rule to construct the set of candidate pulses, e.g. the indexing
function J(i,k), can advantageously be made adaptive and dependent on additional
inter-channel characteristics, such as delay parameters, degree of correlation, etc.
In this case, i.e. adaptive rule selection, the encoder preferably transmits to the
decoder which rule has been selected for deriving the set of candidate pulses for
encoding the other signal. The rule selection could for instance be performed by a
closed-loop procedure, where a number of rules are tested and the one giving the best
result is finally selected.
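A closed-loop rule selection of this kind can be sketched as a loop over a table of rules. Everything named here is hypothetical: the rule table `RULES`, the function `select_rule` and in particular the scoring callback, which in a real encoder would run the side signal encoding with the candidate set and return a quality measure. The toy scorer used in the usage note below merely sums the target energy at the candidate positions.

```python
# Hypothetical rule table: rule index r -> offsets J(k).
RULES = {0: (-1, 0, 1), 1: (0, 1, 2)}


def select_rule(main_positions, frame_len, score_fn, rules=RULES):
    """Test every rule and return (best_rule_index, best_candidates).

    score_fn(candidates) must return a quality measure, larger is better.
    """
    best_rule, best_cands, best_score = None, None, None
    for r in sorted(rules):
        cands = sorted({p + j
                        for p in main_positions
                        for j in rules[r]
                        if 0 <= p + j < frame_len})
        score = score_fn(cands)
        if best_score is None or score > best_score:
            best_rule, best_cands, best_score = r, cands, score
    return best_rule, best_cands
```

As a usage illustration, with a side target whose energy sits at sample 4 and a single main pulse at sample 2, a scorer such as `lambda c: sum(target[q] ** 2 for q in c)` would select rule 1, since only its candidate set reaches sample 4. Only the selected rule index then needs to be transmitted.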
[0064] Fig. 5B illustrates an embodiment using the rule selection approach. The mono signal
s_m(n) and preferably also the side signal s_s(n) are here additionally provided to a
rule selecting unit 39. As an alternative to the mono signal, the parameter k_m
representing the mono signal can be used. In the rule selecting unit 39, the signals
are analysed, e.g. with respect to delay parameters or degree of correlation. Depending
on the results, a rule, e.g. represented by an index r is selected from a set of predefined
rules. The index of the selected rule is provided to the candidate deriving means
47 for determining how the candidate sets should be derived. The rule index r is also
provided to the second index encoder 38B giving a representation r* of the index,
which subsequently is sent to the second decoder 50B. The second index decoder 53B
decodes the rule index r, which then is used to govern the operation of the candidate
deriving means 57.
[0065] In this manner, a set of rules can be provided, which will be suitable for different
types of signals. A further flexibility is thus achieved, just by adding a single
rule index in the transfer of data.
[0066] The specific rule used as well as the resulting number of candidate side signal pulses
are the main parameters governing the bit rate and the complexity of the algorithm.
[0067] The same principles as described above could equally well be applied for re-encoding
of one and the same channel. Fig. 6 illustrates an exemplary encoder and decoder,
where different parts of a transmission path allow for different bit rates. It is
thus applicable as part of a rate transcoding solution. A signal s(n) is provided
as an input signal 33A to a first encoder 40A, which produces representations k* and
δ* of parameters that are transmitted according to a first bit rate. At a certain
place, the available bit rate is reduced, and a re-encoding for lower bit-rates has
to be performed. A first decoder 50A uses the representations k* and δ* of parameters
for producing a reproduced signal 21A ŝ(n). This reproduced signal 21A ŝ(n) is provided
to a second encoder 40B as an input signal 33B. The index k from the first decoder 50A
is also provided to the second encoder 40B. The index k is, in analogy with Fig. 5A,
used for extracting a reduced fixed codebook 10'. The second encoder 40B encodes the
signal ŝ(n) for a lower bit rate, giving an index k̂' representing the selected
excitation signal c'_{k̂'}(n). However, this index k̂' is of little use in a distant
decoder, since the decoder does not have the information necessary to construct a
corresponding reduced fixed codebook. The index k̂' thus has to be associated with an
index k̂, referring to the original codebook 10. This is preferably performed in
connection with the fixed codebook 10 and is represented in Fig. 6 by the arrows 41 and
43 illustrating the input of k̂' and the output of k̂. The encoding of the index
k̂ is then performed with reference to a full set of candidate excitation signals.
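The association between an entry of the reduced codebook and an entry of the original codebook can be pictured as scattering the reduced code vector back onto the full frame. The sketch below shows only this step; the function name `expand_code_vector` is hypothetical, and the subsequent computation of the index in the original codebook depends on that codebook's own indexing scheme, which is not reproduced here.

```python
def expand_code_vector(reduced_vec, cands, frame_len):
    """Map a code vector defined on the candidate positions back to a
    full-length code vector.

    reduced_vec -- amplitudes on the decimated grid (0, +1 or -1)
    cands       -- sorted candidate positions in the full frame
    """
    if len(reduced_vec) != len(cands):
        raise ValueError("one amplitude is needed per candidate position")
    full = [0] * frame_len
    for slot, amp in enumerate(reduced_vec):
        full[cands[slot]] = amp
    return full
```

The resulting full-length vector contains the same pulses at their frame positions and can therefore be indexed with reference to the full set of candidate excitation signals.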
[0068] In a typical case, a first encoding is made with a bit rate n and the second encoding
is made with a bit rate m, where n>m.
[0069] In certain applications, for instance real-time transmission of live content through
different types of networks with different capacities (for example teleconferencing),
it may also be of interest to provide parallel encodings with differing bit rates,
e.g. in situations where real-time encoding of the same signal at several different
bit rates is needed in order to accommodate the different types of networks. This is
known as parallel multirate encoding. Fig. 7 illustrates an exemplary system, where a
signal s(n) is provided to both a first encoder 40A and a second encoder 40B. In analogy
with the previous exemplary encoder and decoder, the second encoder provides a reduced
fixed codebook 10' based on an index k_a representing the first encoding. The second
encoding is here denoted by the index "b". The second encoder 40B thus becomes independent
of the first decoder 50B. Most other parts are in analogy with Fig. 6, however, with
adapted indexing.
[0070] For these two applications, both of which involve encoding the same signal at a lower
rate, a substantial reduction in complexity is offered, thus allowing these applications
to be implemented with low-cost hardware.
[0071] An embodiment of the above-described algorithm has been implemented in association
with an AMR-WB speech codec. For encoding a side signal, the same adaptive codebook
index is used as is used for encoding the mono excitation. The LTP gain and the
innovation vector gain were not quantized.
[0072] The algorithm for the algebraic codebook was based on the mono pulse positions. As
described in e.g. [6], the codebook may be structured in tracks. Except for the lowest
mode, the number of tracks is equal to 4. For each mode a certain number of pulse
positions is used. For example, for mode 5, i.e. 15.85 kbps, the candidate pulse positions
are as follows:
Table 1. Candidate pulse positions.
Track | Pulse | Positions
1 | i0, i4, i8 | 0, 4, 8, 12, 16, 20, 24, 28, 32, 36, 40, 44, 48, 52, 56, 60
2 | i1, i5, i9 | 1, 5, 9, 13, 17, 21, 25, 29, 33, 37, 41, 45, 49, 53, 57, 61
3 | i2, i6, i10 | 2, 6, 10, 14, 18, 22, 26, 30, 34, 38, 42, 46, 50, 54, 58, 62
4 | i3, i7, i11 | 3, 7, 11, 15, 19, 23, 27, 31, 35, 39, 43, 47, 51, 55, 59, 63
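The interleaved track layout of Table 1 follows a regular pattern: track t holds every fourth position starting at t-1. The following is a minimal sketch that regenerates the table, assuming 64 positions per sub-frame and 4 tracks as listed above; the function name `track_positions` is hypothetical.

```python
def track_positions(track, n_tracks=4, subframe_len=64):
    """Return the candidate pulse positions of a track (tracks are numbered from 1)."""
    if not 1 <= track <= n_tracks:
        raise ValueError("track number out of range")
    # Positions are interleaved: track 1 starts at 0, track 2 at 1, and so on.
    return list(range(track - 1, subframe_len, n_tracks))


for t in range(1, 5):
    print(t, track_positions(t))
```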
[0073] The implemented algorithm retains all the mono pulses as the pulse positions of the
side signal, i.e. the pulse positions are not encoded. Only the signs of the pulses
are encoded.
Table 2. Side and mono signal pulses.
Track | Side signal pulse | Mono signal pulse
1 | p0, p4, p8 | i0, i4, i8
2 | p1, p5, p9 | i1, i5, i9
3 | p2, p6, p10 | i2, i6, i10
4 | p3, p7, p11 | i3, i7, i11
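Because the side signal pulses reuse the mono pulse positions, only one sign per pulse needs to be carried. The following is a minimal sketch of that idea, assuming unit-amplitude pulses and a bit convention of 0 for a positive and 1 for a negative sign; the convention, the example values and the names `encode_signs` and `side_innovation` are hypothetical.

```python
def encode_signs(signs):
    """Pack pulse signs (+1 or -1) into one bit each: 0 for +1, 1 for -1."""
    bits = []
    for s in signs:
        if s not in (1, -1):
            raise ValueError("each sign must be +1 or -1")
        bits.append(0 if s == 1 else 1)
    return bits


def side_innovation(mono_positions, sign_bits, subframe_len=64):
    """Rebuild the side innovation vector from mono positions and sign bits."""
    if len(mono_positions) != len(sign_bits):
        raise ValueError("one sign bit is needed per mono pulse")
    vector = [0] * subframe_len
    for pos, bit in zip(mono_positions, sign_bits):
        # Pulses sharing a position simply add up.
        vector[pos] += 1 if bit == 0 else -1
    return vector


# Example mono pulse positions i0..i11 (made-up values, each on its own track).
mono_positions = [0, 1, 2, 3, 20, 21, 22, 23, 40, 41, 42, 43]
side_signs = [1, -1, 1, 1, -1, -1, 1, -1, 1, 1, -1, 1]

bits = encode_signs(side_signs)
vector = side_innovation(mono_positions, bits)
print(len(bits), vector[:4])
```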
[0074] Thus, each pulse consumes only 1 bit, for encoding the sign, which leads to a
number of bits per sub-frame equal to the number of mono pulses. In the above example,
there are 12 pulses per sub-frame, which leads to a total bit rate of 12 bits x 4
sub-frames x 50 frames/s = 2.4 kbps for encoding the innovation vector. This is the same
number of bits as required for the very lowest AMR-WB mode (2 pulses for the 6.6 kbps
mode), but in this case the pulse density is higher.
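The bit-rate figure can be checked with a short calculation. This is a minimal sketch of the arithmetic only, assuming 4 sub-frames per frame and 50 frames per second as in the product above; the function name `innovation_bit_rate` is hypothetical.

```python
def innovation_bit_rate(pulses_per_subframe, bits_per_pulse=1,
                        subframes_per_frame=4, frames_per_second=50):
    """Return the bit rate in bits per second spent on the innovation vector."""
    bits_per_subframe = pulses_per_subframe * bits_per_pulse
    return bits_per_subframe * subframes_per_frame * frames_per_second


rate = innovation_bit_rate(12)
print(rate / 1000, "kbps")
```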
[0075] It should be noted that no additional algorithmic delay is needed for encoding the
stereo signal.
[0076] Fig. 8 shows the results obtained with PEAQ [4] for evaluating the perceptual quality.
PEAQ was chosen since, to the best of our knowledge, it is the only tool that provides
objective quality measures for stereo signals. The results clearly show that the stereo
100 does in fact provide a quality lift with respect to the mono signal 102. The sound
items used were quite varied: sound 1, S1, is an extract from a movie with background
noise; sound 2, S2, is a 1 min radio recording; sound 3, S3, is a cart racing sport
event; and sound 4, S4, is a real two-microphone recording.
[0077] Fig. 9 illustrates an embodiment of an encoding method according to the present invention.
The procedure starts in step 200. In step 210, a representation of a CELP excitation
signal for a first audio signal is provided. Note that it is not absolutely necessary
to provide the entire first audio signal, just the representation of the CELP excitation
signal. In step 212, a second audio signal is provided, which is correlated with the
first audio signal. A set of candidate excitation signals is derived in step 214 depending
on the first CELP excitation signal. Preferably, the pulse positions of the candidate
excitation signals are related to the pulse positions of the CELP excitation signal
of the first audio signal. In step 216, a CELP encoding is performed on the second
audio signal, using the reduced set of candidate excitation signals derived in step
214. Finally, the representation, i.e. typically an index, of the CELP excitation
signal for the second audio signal is encoded, using references to the reduced candidate
set. The procedure ends in step 299.
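Steps 214 and 216 can be illustrated with a small sketch. It assumes the candidate positions are taken from a window of L to K positions around each pulse of the first excitation signal (here L=-1 and K=1), and it replaces the analysis-by-synthesis search with a plain greedy pick of the largest target magnitudes, which is a stand-in and not the CELP search itself. The names `candidate_positions` and `greedy_select` are hypothetical.

```python
def candidate_positions(first_positions, L=-1, K=1, subframe_len=64):
    """Step 214 (sketch): positions within [i+L, i+K] of each first-signal pulse i."""
    if K <= L:
        raise ValueError("K must be greater than L")
    candidates = set()
    for i in first_positions:
        for j in range(i + L, i + K + 1):
            if 0 <= j < subframe_len:  # discard positions outside the sub-frame
                candidates.add(j)
    return sorted(candidates)


def greedy_select(target, candidates, n_pulses):
    """Step 216 (stand-in): keep the n_pulses candidates with largest |target|."""
    ranked = sorted(candidates, key=lambda p: abs(target[p]), reverse=True)
    chosen = sorted(ranked[:n_pulses])
    return [(p, 1 if target[p] >= 0 else -1) for p in chosen]


first_positions = [0, 10, 63]
candidates = candidate_positions(first_positions)

# Made-up target vector for the second audio signal.
target = [0.0] * 64
target[1], target[9], target[11] = 1.0, -2.0, 0.5

pulses = greedy_select(target, candidates, n_pulses=2)
print(candidates, pulses)
```

The search only ever visits the reduced set, which is where the reduction in complexity comes from.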
[0078] Fig. 10 illustrates an exemplary re-encoding method. The procedure starts in step
200. In step 211, an audio signal is provided. In step 213, a representation of a
first CELP excitation signal for the same audio signal is provided. A set of candidate
excitation signals is derived in step 215 depending on the first CELP excitation signal.
Preferably, the pulse positions of the candidate excitation signals are related to
the pulse positions of the first CELP excitation signal. In step
217, a CELP re-encoding is performed on the audio signal, using the reduced set of
candidate excitation signals derived in step 215. Finally, the representation, i.e.
typically an index, of the second CELP excitation signal for the audio signal is encoded,
using references to the non-reduced candidate set, i.e. the set used for the first
CELP encoding. The procedure ends in step 299.
[0079] Fig. 11 illustrates an embodiment of a decoding method according to the present invention.
The procedure starts in step 200. In step 210, a representation of a first CELP excitation
signal for a first audio signal is provided. In step 252, a representation of a second
CELP excitation signal for a second audio signal is provided. In step 254, a second
excitation signal is derived from the representation of the second CELP excitation
signal and with knowledge of the first excitation signal. Preferably, a reduced set of
candidate excitation
signals is derived depending on the first CELP excitation signal, from which a second
excitation signal is selected by use of an index for the second CELP excitation signal.
In step 256, the second audio signal is reconstructed using the second excitation
signal. The procedure ends in step 299.
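The decoding side of step 254 can be sketched in the same spirit. The example assumes that the decoder rebuilds the reduced candidate set from the first excitation's pulse positions with the same window rule as the encoder, and that the second excitation is represented as one index into that set plus one sign per pulse. That representation and the names `candidate_positions` and `decode_second_excitation` are hypothetical; the text only states that an index is used.

```python
def candidate_positions(first_positions, L=-1, K=1, subframe_len=64):
    """Rebuild the reduced set: positions within [i+L, i+K] of each first pulse i."""
    candidates = set()
    for i in first_positions:
        for j in range(i + L, i + K + 1):
            if 0 <= j < subframe_len:
                candidates.add(j)
    return sorted(candidates)


def decode_second_excitation(first_positions, indices, signs, subframe_len=64):
    """Step 254 (sketch): map indices into the reduced set back to pulse positions."""
    candidates = candidate_positions(first_positions, subframe_len=subframe_len)
    excitation = [0] * subframe_len
    for idx, sign in zip(indices, signs):
        excitation[candidates[idx]] += sign
    return excitation


first_positions = [10, 30]  # known from the first excitation signal
excitation = decode_second_excitation(first_positions, indices=[0, 4], signs=[1, -1])
print([(n, v) for n, v in enumerate(excitation) if v != 0])
```

The resulting vector would then be passed through the synthesis filter in step 256, which is outside the scope of this sketch.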
[0080] The embodiments described above are to be understood as a few illustrative examples
of the present invention. It will be understood by those skilled in the art that various
modifications, combinations and changes may be made to the embodiments without departing
from the scope of the present invention. In particular, different part solutions in
the different embodiments can be combined in other configurations, where technically
possible. The scope of the present invention is, however, defined by the appended
claims.
[0081] The invention allows a dramatic reduction of complexity (both memory and arithmetic
operations) as well as bit-rate when encoding multiple audio channels by using algebraic
codebooks and CELP.
REFERENCES
[0082]
- [1] H. Fuchs, "Improving joint stereo audio coding by adaptive inter-channel prediction",
in Proc. IEEE WASPAA, Mohonk, NY, Oct. 1993.
- [2] S.A. Ramprashad, "Stereophonic CELP coding using cross channel prediction", in Proc.
IEEE Workshop Speech Coding, pp. 136-138, Sept. 2000.
- [3] T. Liebchen, "Lossless audio coding using adaptive multichannel prediction", in
Proc. AES 113th Conv., Los Angeles, CA, Oct. 2002.
- [4] ITU-R BS.1387.
- [5] WO 96/28810.
- [6] 3GPP TS 26.190, p. 28, table 7.
- [7] US 2004/0044524 A1.
- [8] US 2004/0109471 A1.
- [9] US 2003/0191635 A1.
- [10] US 6,393,392 B1.
- [11] US 6,192,334 B1.
1. Method for encoding audio signals, comprising the steps of:
providing a representation (k, km, ka) of a first excitation signal of a code excited linear prediction of a first audio
signal (33A),
deriving a set (10') of candidate excitation signals (c'(n)) based on said first excitation
signal,
providing a second audio signal (33B), said second audio signal (33B) being different
from said first audio signal (33A) characterised by
performing a code excited linear prediction encoding of said second audio signal (33B)
using said set (10') of candidate excitation signals (c'(n)).
2. Method according to claim 1, wherein said second audio signal (33B) is correlated
to said first audio signal (33A).
3. Method according to claim 1 or 2, wherein said step of deriving said set (10') of
candidate excitation signals (c'(n)) comprises selecting a rule out of a predetermined
set of rules based on said first excitation signal and/or said second audio signal,
whereby said set (10') of candidate excitation signals (c'(n)) is derived according
to said selected rule.
4. Method according to any of the claims 1 to 3, wherein
said first excitation signal has n pulse locations (PM) out of a set of N possible pulse locations;
said candidate excitation signals (c'(n)) have pulse locations (P*s) only at a subset
of said N possible pulse locations; and
said subset of pulse locations (P*s) being selected based on the n pulse locations
(PM) of said first excitation signal.
5. Method according to claim 4, wherein pulse locations (P*s) of said subset of pulse
locations are positioned at positions pj, where index j is within intervals {i+L, i+K}, where i is an index of said n pulse
locations, K and L are integers and K>L.
6. Method according to claim 5, wherein K=1 and L=-1.
7. Method according to any of the claims 1 to 6, wherein said code excited linear prediction
of said second audio signal (33B) is performed with a global search within said set
(10') of candidate excitation signals.
8. Method according to any of the claims 1 to 7, comprising the further steps of:
encoding a second excitation signal of said code excited linear prediction of said
second audio signal (33B) with reference to said set (10') of candidate excitation
signals; and
providing said encoded second excitation signal together with said representation
(k, km, ka) of said first excitation signal.
9. Method according to claim 3 and claim 8, comprising the further step of providing
data representing an identification of said selected rule together with said representation
(k, km, ka) of said first excitation signal.
10. Method according to any of the claims 1 to 7, comprising the further step of:
encoding a second excitation signal of said code excited linear prediction of said
second audio signal (33B) with reference to a set (10) of candidate excitation signals
having N possible pulse locations.
11. Method according to any of the claims 4 to 10, wherein the second excitation signal
has m pulse locations, where m<n.
12. Method for decoding of audio signals (33A, 33B), comprising the steps of:
providing a representation (k, km, ka) of a first excitation signal of a code excited linear prediction of a first audio
signal (33A),
providing a representation (k's) of a second excitation signal of a code excited linear prediction of a second audio
signal (33B), said second audio signal (33B) being different from said first audio
signal (33A);
said second excitation signal being one of a set (10') of candidate excitation signals;
said set (10') of candidate excitation signals being based on said first excitation
signal;
deriving said second excitation signal (c'k's (n)) from said representation (k's) of said second excitation signal and based on information related to said set (10')
of candidate excitation signals; and
reconstructing said second audio signal (ŝs(n)) by prediction filtering said second excitation signal (c'k's (n)).
13. Method according to claim 12, wherein said second audio signal (33B) is correlated
to said first audio signal (33A).
14. Method according to claim 12 or 13, wherein said information related to said set (10')
of candidate excitation signals comprises identification of a rule out of a pre-determined
set of rules, said rule determining derivation of said set (10') of candidate excitation
signals.
15. Method according to any of the claims 12 to 14, wherein
said first excitation signal has n pulse locations (PM) out of a set of N possible pulse locations;
said candidate excitation signals have pulse locations (P*s) only at a subset of said
N possible pulse locations; and
said subset of pulse locations (P*s) being selected based on the n pulse locations
(PM) of said first excitation signal.
16. Method according to claim 15, wherein pulse locations (P*s) of said subset of pulse
locations are positioned at positions pj, where index j is within intervals {i+L, i+K}, where i is an index of said n pulse
locations, K and L are integers and K>L.
17. Method according to claim 16, wherein K=1 and L=-1.
18. Encoder (40B) for audio signals, comprising:
means (45) for providing a representation (k, km, ka) of a first excitation signal of a code excited linear prediction of a first audio
signal (33A),
means (47) for deriving a set (10') of candidate excitation signals, connected to
receive said representation (k, km, ka) of said first excitation signal, said set (10') of candidate excitation signals
being based on said first excitation signal,
means for providing a second audio signal (33B), said second audio signal (33B) being
different from said first audio signal (33A), characterised by
means (25') for performing a code excited linear prediction connected to receive said
second audio signal (33B) and a representation of said set (10') of candidate excitation
signals, said means (25') for performing a code excited linear prediction being arranged
for performing a code excited linear prediction of said second audio signal (33B)
using said set (10') of candidate excitation signals.
19. Encoder according to claim 18, wherein said second audio signal (33B) is correlated
to said first audio signal (33A).
20. Encoder according to claim 18 or 19, wherein said means (47) for deriving a set (10')
of candidate excitation signals is arranged to select a rule out of a predetermined
set of rules based on said first excitation signal and/or said second audio signal
and to derive said set (10') of candidate excitation signals (c'(n)) according to
said selected rule.
21. Encoder according to any of the claims 18 to 20, wherein
said first excitation signal has n pulse locations (PM) out of a set of N possible pulse locations;
said candidate excitation signals have pulse locations (P*s) only at a subset of said
N possible pulse locations; and
said subset of pulse locations (P*s) being selected based on the n pulse locations
(PM) of said first excitation signal.
22. Encoder according to claim 21, wherein pulse locations (P*s) of said subset of pulse
locations are positioned at positions pj, where index j is within intervals {i+L, i+K}, where i is an index of said n pulse
locations, K and L are integers and K>L.
23. Encoder according to claim 22, wherein K=1 and L=-1.
24. Encoder according to any of the claims 18 to 23, wherein said means (25') for performing
code excited linear prediction of said second audio signal (33B) is arranged to perform
a global search within said set (10') of candidate excitation signals.
25. Encoder according to any of the claims 18 to 24, further comprising:
means (38B) for encoding a second excitation signal of said code excited linear prediction
of said second audio signal (33B) with reference to said set (10') of candidate excitation
signals; and
means for providing said encoded second excitation signal together with said representation
(k, km, ka) of said first excitation signal.
26. Encoder according to claim 25 and 20, further comprising:
means for providing data representing an identification of said selected rule together
with said representation (k, km, ka) of said first excitation signal.
27. Encoder according to any of the claims 18 to 24, further comprising:
means (38B) for encoding a second excitation signal of said code excited linear prediction
of said second audio signal (33B) with reference to a set (10) of candidate excitation
signals having N possible pulse locations.
28. Encoder according to any of the claims 21 to 27, wherein the second excitation signal
has m pulse locations, where m<n.
29. Decoder (50B) for audio signals, comprising:
means (55) for providing a representation (km) of a first excitation signal of a code excited linear prediction of a first audio
signal (33A),
means (53B) for providing a representation (k's) of a second excitation signal of a code excited linear prediction of a second audio
signal (33B), said second audio signal (33B) being different from said first audio
signal (33A);
said second excitation signal being one of a set (10') of candidate excitation signals;
said set (10') of candidate excitation signals being based on said first excitation
signal;
means (57) for deriving said second excitation signal, connected to receive information
associated with said representation (km) of a first excitation signal and said representation (k's) of said second excitation signal, said means (57) for deriving being arranged to
derive said second excitation signal (c'k's (n)) from said representation (k's) of said second excitation signal and based on information related to said set (10')
of candidate excitation signals; and
means (25") for reconstructing said second audio signal (ŝs(n)) by prediction filtering said second excitation signal (c'k's (n)).
30. Decoder according to claim 29, wherein said second audio signal (33B) is correlated
to said first audio signal (33A).
31. Decoder according to claim 29 or 30, wherein said information related to said set
(10') of candidate excitation signals comprises identification of a rule out of a
pre-determined set of rules, said rule determining derivation of said set (10') of
candidate excitation signals.
32. Decoder according to any of the claims 29 to 31, wherein
said first excitation signal has n pulse locations (PM) out of a set of N possible pulse locations;
said candidate excitation signals have pulse locations (P*s) only at a subset of said
N possible pulse locations; and
said subset of pulse locations (P*s) being selected based on the n pulse locations
(PM) of said first excitation signal.
33. Decoder according to claim 32, wherein pulse locations (P*s) of said subset of pulse
locations are positioned at positions pj, where index j is within intervals {i+L, i+K}, where i is an index of said n pulse
locations, K and L are integers and K>L.
34. Decoder according to claim 33, wherein K=1 and L=-1.
1. Procédé destiné à coder des signaux audio, comportant les étapes ci-dessous consistant
à :
fournir une représentation (k, km, ka) d'un premier signal d'excitation d'une prédiction linéaire à excitation par code
d'un premier signal audio (33A) ;
calculer un ensemble (10') de signaux d'excitation candidats (c'(n)) sur la base dudit
premier signal d' excitation ;
fournir un second signal audio (33B), ledit second signal audio (33B) étant distinct
dudit premier signal audio (33A),
caractérisé par l'étape ci-dessous consistant à :
mettre en oeuvre un codage par prédiction linéaire à excitation par code dudit second
signal audio (33B) en faisant appel audit ensemble (10') de signaux d'excitation candidats
(c'(n)).
2. Procédé selon la revendication 1, dans lequel ledit second signal audio (33B) est
corrélé avec ledit premier signal audio (33A).
3. Procédé selon la revendication 1 ou 2, dans lequel ladite étape consistant à calculer
ledit ensemble (10') de signaux d'excitation candidats (c'(n)) comporte l'étape consistant
à sélectionner une règle parmi un ensemble de règles prédéterminé sur la base dudit
premier signal d'excitation et/ou dudit second signal audio, moyennant quoi ledit
ensemble (10') de signaux d'excitation candidats (c'(n)) est obtenu selon ladite règle
sélectionnée.
4. Procédé selon l'une quelconque des revendications 1 à 3, dans lequel
ledit premier signal d'excitation présente n emplacements d'impulsion (PM) sur un ensemble de N emplacements d'impulsion possibles ;
lesdits signaux d'excitation candidats (c' (n)) présentent des emplacements d'impulsion
(P*s) uniquement au niveau d'un sous-ensemble desdits N emplacements d'impulsion possibles
; et
ledit sous-ensemble d'emplacements d'impulsion (P*s) étant sélectionné sur la base des n emplacements d'impulsion (PM) dudit premier signal d'excitation.
5. The method according to claim 4, wherein the pulse locations (P*s) of said subset of pulse locations are positioned at positions
pj, where the index j is within the intervals {i + L, i + K}, where i is an index
of said n pulse locations, K and L are integers and K > L.
6. The method according to claim 5, wherein K = 1 and L = -1.
7. The method according to any one of claims 1 to 6, wherein said code excited
linear prediction of said second audio signal (33B) is performed
with a global search within said set (10') of candidate excitation signals.
8. The method according to any one of claims 1 to 7, further comprising the steps
of:
encoding a second excitation signal of said code excited linear prediction
of said second audio signal (33B) with reference to said set (10') of candidate
excitation signals; and
providing said encoded second excitation signal together with said representation (k, km, ka) of said first excitation signal.
9. The method according to claims 3 and 8, further comprising the step of
providing data representing an identification of said selected rule together with
said representation (k, km, ka) of said first excitation signal.
10. The method according to any one of claims 1 to 7, further comprising the step
of:
encoding a second excitation signal of said code excited linear prediction
of said second audio signal (33B) with reference to a set (10) of candidate excitation
signals having N possible pulse locations.
11. The method according to any one of claims 4 to 10, wherein the second excitation
signal has m pulse locations, where m < n.
12. A method for decoding audio signals (33A, 33B), comprising the steps
of:
providing a representation (k, km, ka) of a first excitation signal of a code excited linear prediction
of a first audio signal (33A);
providing a representation (k's) of a second excitation signal of a code excited linear prediction
of a second audio signal (33B), said second audio signal (33B) being distinct from said
first audio signal (33A);
said second excitation signal being one of a set (10') of candidate excitation
signals;
said set (10') of candidate excitation signals being based on said
first excitation signal;
deriving said second excitation signal (c'k's(n)) from said representation (k's) of said second excitation signal and based on information related to said set
(10') of candidate excitation signals; and
reconstructing said second audio signal (Ŝs(n)) by prediction filtering of said second excitation signal (c'k's(n)).
13. The method according to claim 12, wherein said second audio signal (33B) is
correlated with said first audio signal (33A).
14. The method according to claim 12 or 13, wherein said information related
to said set (10') of candidate excitation signals comprises an identification
of a rule out of a predetermined set of rules, said rule determining the
derivation of said set (10') of candidate excitation signals.
15. The method according to any one of claims 12 to 14, wherein
said first excitation signal has n pulse locations (PM) out of a set of N possible pulse locations;
said candidate excitation signals have pulse locations (P*s) only at a subset of said N possible pulse locations; and
said subset of pulse locations (P*s) being selected based on the n pulse locations (PM) of said first excitation signal.
16. The method according to claim 15, wherein the pulse locations (P*s) of said subset of pulse locations are positioned at positions
pj, where the index j is within the intervals {i + L, i + K}, where i is an index
of said n pulse locations, K and L are integers and K > L.
17. The method according to claim 16, wherein K = 1 and L = -1.
18. An encoder (40B) for audio signals, comprising:
means (45) for providing a representation (k, km, ka) of a first excitation signal of a code excited linear prediction
of a first audio signal (33A);
means (47) for deriving a set (10') of candidate excitation signals, connected
to receive said representation (k, km, ka) of said first excitation signal, said set (10') of candidate excitation
signals being based on said first excitation signal,
means for providing a second audio signal (33B), said second audio signal (33B)
being distinct from said first audio signal (33A);
characterized by:
means (25') for performing a code excited linear prediction,
connected to receive said second audio signal (33B) and a representation
of said set (10') of candidate excitation signals, said means (25') for performing
a code excited linear prediction being arranged to
perform a code excited linear prediction of said second audio
signal (33B) using said set (10') of candidate excitation signals.
19. The encoder according to claim 18, wherein said second audio signal (33B) is
correlated with said first audio signal (33A).
20. The encoder according to claim 18 or 19, wherein said means (47) for deriving
a set (10') of candidate excitation signals is arranged to select
a rule out of a predetermined set of rules based on said first excitation
signal and/or said second audio signal and to derive said set (10') of
candidate excitation signals (c'(n)) according to said selected rule.
21. The encoder according to any one of claims 18 to 20, wherein
said first excitation signal has n pulse locations (PM) out of a set of N possible pulse locations;
said candidate excitation signals have pulse locations (P*s) only at a subset of said N possible pulse locations; and
said subset of pulse locations (P*s) being selected based on the n pulse locations (PM) of said first excitation signal.
22. The encoder according to claim 21, wherein the pulse locations (P*s) of said subset of pulse locations are positioned at positions
pj, where the index j is within the intervals {i + L, i + K}, where i is an index
of said n pulse locations, K and L are integers and K > L.
23. The encoder according to claim 22, wherein K = 1 and L = -1.
24. The encoder according to any one of claims 18 to 23, wherein said means
(25') for performing a code excited linear prediction of said second
audio signal (33B) is arranged to perform a global search
within said set (10') of candidate excitation signals.
25. The encoder according to any one of claims 18 to 24, further comprising:
means (38B) for encoding a second excitation signal of said code excited linear
prediction of said second audio signal (33B) with reference to said set
(10') of candidate excitation signals; and
means for providing said encoded second excitation signal together with said representation
(k, km, ka) of said first excitation signal.
26. The encoder according to claims 20 and 25, further comprising:
means for providing data representing an identification of said selected rule
together with said representation (k, km, ka) of said first excitation signal.
27. The encoder according to any one of claims 18 to 24, further comprising:
means (38B) for encoding a second excitation signal of said code excited linear
prediction of said second audio signal (33B) with reference to a set (10)
of candidate excitation signals having N possible pulse locations.
28. The encoder according to any one of claims 21 to 27, wherein the second excitation
signal has m pulse locations, where m < n.
29. A decoder (50B) for audio signals, comprising:
means (55) for providing a representation (km) of a first excitation signal of a code excited linear prediction
of a first audio signal (33A);
means (53B) for providing a representation (k's) of a second excitation signal of a code excited linear prediction
of a second audio signal (33B), said second audio signal (33B) being distinct from said
first audio signal (33A);
said second excitation signal being one of a set (10') of candidate excitation
signals;
said set (10') of candidate excitation signals being based on said
first excitation signal;
means (57) for deriving said second excitation signal, connected
to receive information associated with said representation (km) of a first excitation signal and with said representation (k's) of said second excitation signal, said means (57) for deriving being arranged to
derive said second excitation signal (c'k's(n)) from said representation (k's) of said second excitation signal and based on information related to said set
(10') of candidate excitation signals; and
means (25") for reconstructing said second audio signal (Ŝs(n)) by prediction filtering of said second excitation signal (c'k's(n)).
30. The decoder according to claim 29, wherein said second audio signal (33B) is
correlated with said first audio signal (33A).
31. The decoder according to claim 29 or 30, wherein said information related
to said set (10') of candidate excitation signals comprises an identification
of a rule out of a predetermined set of rules, said rule determining the
derivation of said set (10') of candidate excitation signals.
32. The decoder according to any one of claims 29 to 31, wherein
said first excitation signal has n pulse locations (PM) out of a set of N possible pulse locations;
said candidate excitation signals have pulse locations (P*s) only at a subset of said N possible pulse locations; and
said subset of pulse locations (P*s) being selected based on the n pulse locations (PM) of said first excitation signal.
33. The decoder according to claim 32, wherein the pulse locations (P*s) of said subset of pulse locations are positioned at positions
pj, where the index j is within the intervals {i + L, i + K}, where i is an index
of said n pulse locations, K and L are integers and K > L.
34. The decoder according to claim 33, wherein K = 1 and L = -1.
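
By way of non-limiting illustration only, the pulse-location subset of claims 4 to 6, the global search of claim 7 and the decoder-side derivation of claim 12 may be sketched as follows. All function and parameter names are hypothetical and do not appear in the claims. The sketch assumes that the interval {i + L, i + K} denotes every index from i + L to i + K inclusive, that each pulse is a signed unit pulse, that a short impulse response stands in for the synthesis filter, and that a plain squared error replaces the perceptually weighted distortion measure.

```python
from itertools import combinations, product


def candidate_positions(main_pulses, n_positions, low=-1, high=1):
    """Allowed positions p_j, with j in [i + low, i + high] for each main pulse index i."""
    allowed = set()
    for i in main_pulses:
        for j in range(i + low, i + high + 1):
            if 0 <= j < n_positions:  # stay inside the N possible locations
                allowed.add(j)
    return sorted(allowed)


def candidate_excitations(positions, n_positions, m):
    """Yield every excitation having m signed unit pulses on the allowed positions."""
    for locations in combinations(positions, m):
        for signs in product((1.0, -1.0), repeat=m):
            excitation = [0.0] * n_positions
            for location, sign in zip(locations, signs):
                excitation[location] = sign
            yield excitation


def synthesize(excitation, impulse_response):
    """Truncated convolution; an assumed stand-in for the synthesis filter."""
    n = len(excitation)
    output = [0.0] * n
    for i, e in enumerate(excitation):
        if e == 0.0:
            continue
        for k, h in enumerate(impulse_response):
            if i + k < n:
                output[i + k] += e * h
    return output


def global_search(target, positions, impulse_response, m):
    """Try every candidate and keep the one with the smallest squared error."""
    best_index, best_excitation, best_error = None, None, float("inf")
    candidates = candidate_excitations(positions, len(target), m)
    for index, excitation in enumerate(candidates):
        y = synthesize(excitation, impulse_response)
        energy = sum(v * v for v in y)
        correlation = sum(t * v for t, v in zip(target, y))
        gain = correlation / energy if energy > 0.0 else 0.0  # least-squares gain
        error = sum((t - gain * v) ** 2 for t, v in zip(target, y))
        if error < best_error:
            best_index, best_excitation, best_error = index, excitation, error
    return best_index, best_excitation, best_error


def excitation_from_index(index, positions, n_positions, m):
    """Decoder side: rebuild the excitation from its index in the same candidate set."""
    for k, excitation in enumerate(candidate_excitations(positions, n_positions, m)):
        if k == index:
            return excitation
    raise IndexError("index outside the candidate set")
```

In this sketch the encoder and the decoder both rebuild the same reduced candidate set from the pulse locations of the first excitation signal, so only the index into that set needs to be conveyed for the second excitation signal. With n = 2 main pulses, K = 1 and L = -1, at most six of the N positions remain to be searched.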