[0001] The invention relates to a method and to an apparatus for encoding or decoding a
speech and/or non-speech audio input signal.
Background
[0003] This wideband speech coder includes an embedded G.729 speech coder, which is used
permanently. Therefore the quality for music-like (non-speech) signals is not very
good. Although this coder uses transform coding techniques, it is still a speech coder.
[0005] This coder uses a structure similar in principle to that of the above-mentioned coder.
The processing is based on time domain signals, which implies a difficult handling
of the delay in the core encoder/decoder (speech coder). Therefore the processing
is based on a common transform in order to reduce this problem. Again, the core coder
(i.e. the speech coder) is used permanently, which results in a non-optimal quality
for music-like (non-speech) signals.
M. Purat, P. Noll, "A new orthonormal wavelet packet decomposition for audio coding
using frequency-varying modulated lapped transforms", IEEE ASSP Workshop on Applications
of Signal Processing to Audio and Acoustics, 1995, pp.183-186.
M. Purat, P. Noll, "Audio coding with a dynamic wavelet packet decomposition based
on frequency-varying modulated lapped transforms", IEEE International Conference on
Acoustics, Speech, and Signal Processing 1996, ICASSP 1996, vol.2, pp.1021-1024.
Invention
[0006] A disadvantage of the known audio/speech codecs is a clear dependency of the coding
quality on the type of content, i.e. music-like audio signals are best coded by audio
codecs and speech-like audio signals are best coded by speech codecs. No known codec
holds a dominant position for mixed speech/music content.
[0007] A problem to be solved by the invention is to provide a good codec performance for
both speech and music, and to further improve the codec performance for such mixed
signals. This problem is solved by the methods disclosed in claims 1 and 3. Apparatuses
that utilise these methods are disclosed in claims 2 and 4.
[0008] The inventive joint speech/audio codec uses speech coding techniques as well as
audio transform coding techniques.
[0009] Known transform-based audio coding processing is combined in an advantageous way
with linear prediction-based speech coding processing, using one or more Modulated
Lapped Transforms (MLT) at the codec input and one or more inverse Modulated Lapped
Transforms (IMLT) at the codec output. The MLT output spectrum is separated into frequency
bins (low frequencies) that are assigned to the speech coding section of the codec, and the
remaining frequency bins (high frequencies) that are assigned to the transform-based coding
section of the codec, wherein the transform length at the codec input and output can
be switched signal-adaptively.
[0010] As an alternative, in the transform-based coding/decoding sections the transform
length can be switched input signal adaptively.
[0011] The invention achieves a uniformly good codec quality for both speech-like and music-like
audio signals, especially at very low bit rates but also at higher bit rates.
[0012] In principle, the inventive method is suited for encoding a speech and/or non-speech
audio input signal, including the steps:
- transforming successive and possibly overlapping sections of said input signal by
at least one initial MLT transform and splitting the resulting output frequency bins
into a low band signal and a remaining band signal;
- passing said low band signal to a speech/audio switching and through a speech coding/decoding
loop including at least one short first-type MLT transform, a speech encoding, a corresponding
speech decoding, and at least one short second-type MLT transform having a type opposite
to that of said first-type short MLT transform;
- quantising and encoding said remaining band signal, controlled by a psycho-acoustic
model that receives as its input said audio input signal;
- combining the output signal of said quantising and encoding, a switching information
signal of said switching, possibly the output signal of said speech encoding, and
optionally other encoding side information, in order to form for said current section
of said input signal an output bit stream,
wherein said speech/audio switching receives said low band signal and a second input
signal derived from the output of said short second-type MLT transform and decides
whether said second input signal bypasses said quantising and encoding step or said
low band signal is coded together with said remaining band signal in said quantising
and encoding step, and wherein in the latter case said output signal of said speech
encoding is not included in the current section of said output bit stream.
[0013] In principle the inventive apparatus is suited for encoding a speech and/or non-speech
audio input signal, said apparatus including means being adapted for:
- transforming successive and possibly overlapping sections of said input signal by
at least one initial MLT transform and splitting the resulting output frequency bins
into a low band signal and a remaining band signal;
- passing said low band signal to a speech/audio switching and through a speech coding/decoding
loop including at least one short first-type MLT transform, a speech encoding, a corresponding
speech decoding, and at least one short second-type MLT transform having a type opposite
to that of said first-type short MLT transform;
- quantising and encoding said remaining band signal, controlled by a psycho-acoustic
model that receives as its input said audio input signal;
- combining the output signal of said quantising and encoding, a switching information
signal of said switching, possibly the output signal of said speech encoding, and
optionally other encoding side information, in order to form for said current section
of said input signal an output bit stream,
wherein said speech/audio switching receives said low band signal and a second input
signal derived from the output of said short second-type MLT transform and decides
whether said second input signal bypasses said quantising and encoding step or said
low band signal is coded together with said remaining band signal in said quantising
and encoding step, and wherein in the latter case said output signal of said speech
encoding is not included in the current section of said output bit stream.
[0014] In principle, the inventive method is suited for decoding a bit stream representing
an encoded speech and/or non-speech audio input signal that was encoded according
to the above method, said decoding method including the steps:
- demultiplexing successive sections of said bitstream to regain the output signal of
said quantising and encoding, said switching information signal, possibly the output
signal of said speech encoding, and said encoding side information if present;
- if present in a current section of said bitstream, passing said output signal of said
speech encoding through a speech decoding and said short second-type MLT transform;
- decoding said output signal of said quantising and encoding, controlled by said encoding
side information if present, in order to provide for said current section a reconstructed
remaining band signal and a reconstructed low band signal;
- providing a speech/audio switching with said reconstructed low band signal and a second
input signal derived from the output of said second-type MLT transform, and passing
according to said switching information signal either said reconstructed low band
signal or said second input signal;
- inversely MLT transforming the output signal of said switching combined with said
reconstructed remaining band signal, and possibly overlapping successive sections,
in order to form a current section of the reconstructed output signal.
[0015] In principle the inventive apparatus is suited for decoding a bit stream representing
an encoded speech and/or non-speech audio input signal that was encoded according
to the above encoding method, said apparatus including means being adapted for:
- demultiplexing successive sections of said bitstream to regain the output signal of
said quantising and encoding, said switching information signal, possibly the output
signal of said speech encoding, and said encoding side information if present;
- if present in a current section of said bitstream, passing said output signal of said
speech encoding through a speech decoding and said short second-type MLT transform;
- decoding said output signal of said quantising and encoding, controlled by said encoding
side information if present, in order to provide for said current section a reconstructed
remaining band signal and a reconstructed low band signal;
- providing a speech/audio switching with said reconstructed low band signal and a second
input signal derived from the output of said second-type MLT transform, and passing
according to said switching information signal either said reconstructed low band
signal or said second input signal;
- inversely MLT transforming the output signal of said switching combined with said
reconstructed remaining band signal, and possibly overlapping successive sections,
in order to form a current section of the reconstructed output signal.
[0016] Advantageous additional embodiments of the invention are disclosed in the respective
dependent claims.
Drawings
[0017] Exemplary embodiments of the invention are described with reference to the accompanying
drawings, which show in:
- Fig. 1
- Block diagram of the inventive joint speech and audio coder;
- Fig. 2
- Higher time resolution processing in the 'quantisation&coding' step/stage (short
block coding);
- Fig. 3
- Block diagram of the inventive joint speech and audio decoder;
- Fig. 4
- Higher time resolution processing in the 'decoding' step/stage (short block decoding);
- Fig. 5
- Block diagram of another embodiment of the inventive joint speech and audio coder;
- Fig. 6
- Higher time resolution processing in the 'quantisation&coding' step/stage (short
block coding) of the other embodiment;
- Fig. 7
- Block diagram of the inventive joint speech and audio decoder of the other embodiment;
- Fig. 8
- Higher time resolution processing in the 'decoding' step/stage (short block decoding)
of the other embodiment;
- Fig. 9
- Block diagram of a further embodiment of the inventive joint speech and audio coder
(short block coding).
Exemplary embodiments
[0018] In the inventive joint speech and audio codec according to Fig. 1, known coding processing
for speech-like signals (linear prediction based speech coding processing, e.g. CELP,
ACELP, cf. ISO/IEC 14496-3, Subparts 2 and 3, and MPEG4-CELP) is combined with state-of-the-art
coding processing for general audio or music-like signals based on a time-frequency
transform, e.g. MDCT. The PCM audio input signal IS is transformed by a Modulated
Lapped Transform MLT having a pre-determined length in step/stage 10. A special
form of the MLT, e.g. the Modified Discrete Cosine Transform MDCT, is appropriate
for audio coding applications. The MDCT was first called "Oddly-stacked Time Domain
Alias Cancellation Transform" by Princen and Bradley and was published in
John P. Princen and Alan B. Bradley, "Analysis/synthesis filter bank design based
on time domain aliasing cancellation", IEEE Transactions on Acoustics, Speech, and
Signal Processing, ASSP-34 (5), pp.1153-1161, 1986.
H.S. Malvar, "Signal processing with lapped transforms", Artech House Inc., Norwood,
1992, and
M. Temerinac, B. Edler, "A unified approach to lapped orthogonal transforms", IEEE
Transactions on Image Processing, Vol.1, No.1, pp.111-116, January 1992,
called it Modulated Lapped Transform (MLT), showed its relations to lapped orthogonal
transforms in general, and also proved it to be a special case of a QMF filter bank.
The Modified Discrete Cosine Transformation (MDCT) and the inverse MDCT (iMDCT) can
be regarded as a critically sampled filter-bank with perfect reconstruction properties.
The MDCT is calculated by:

X(k) = Σ_{n=0..N-1} x(n)·cos( (2π/N)·(n + 1/2 + N/4)·(k + 1/2) ),  k = 0, ..., N/2 - 1,

wherein N is the transform (window) length, x(n) are the windowed time domain samples
of the current block and X(k) are the resulting frequency bins.
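As an illustration only, the MDCT/iMDCT pair and the time domain alias cancellation can be sketched in pure Python. The transform length N = 16, the test signal and the use of the sine window are arbitrary choices for this sketch, not values prescribed by the codec:

```python
import math

def mdct(block, window):
    # analysis window, then forward MDCT: N time samples -> N/2 bins
    N = len(block)
    x = [s * w for s, w in zip(block, window)]
    return [sum(x[n] * math.cos(2 * math.pi / N * (n + 0.5 + N / 4) * (k + 0.5))
                for n in range(N))
            for k in range(N // 2)]

def imdct(bins, window):
    # inverse MDCT (N/2 bins -> N aliased time samples), then synthesis window
    N = 2 * len(bins)
    y = [4.0 / N * sum(X * math.cos(2 * math.pi / N * (n + 0.5 + N / 4) * (k + 0.5))
                       for k, X in enumerate(bins))
         for n in range(N)]
    return [s * w for s, w in zip(y, window)]

N = 16
win = [math.sin(math.pi / N * (n + 0.5)) for n in range(N)]   # sine window
signal = [math.sin(0.3 * t) for t in range(3 * N // 2)]       # arbitrary test input
b0, b1 = signal[:N], signal[N // 2:]                          # 50% overlapping blocks
y0 = imdct(mdct(b0, win), win)
y1 = imdct(mdct(b1, win), win)
# overlap-add: the time domain aliasing of the two blocks cancels exactly
mid = [y0[N // 2 + n] + y1[n] for n in range(N // 2)]
assert all(abs(m - s) < 1e-9 for m, s in zip(mid, signal[N // 2:N]))
```

The final assertion verifies the perfect reconstruction property in the overlap region of two adjacent blocks.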
[0019] At the MLT output the obtained spectrum is separated into frequency bins belonging
to the speech band (representing a low band signal) and the remaining bins (high frequencies)
representing a remaining band signal RBS. In step/stage 11 the speech band bins are
transformed back into time domain using the inverse MLT, e.g. an inverse MDCT, with
a short transform length with respect to the pre-determined length used in step/stage
10. The resulting time signal has a lower sampling frequency than the input time signal
and contains only the corresponding frequencies of the speech band bins. The theory
behind using only a subset of the MLT bins in an inverse MLT is described in the above-cited
1995 and 1996 Purat articles.
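The splitting of the long MLT spectrum and the short inverse MLT of the speech band bins can be sketched as follows; the sizes K = 512 and L = 64 are hypothetical example values, not taken from the text:

```python
import math

def small_imdct(bins):
    # inverse MDCT without windowing: L bins -> 2L time samples
    N = 2 * len(bins)
    return [4.0 / N * sum(X * math.cos(2 * math.pi / N * (n + 0.5 + N / 4) * (k + 0.5))
                          for k, X in enumerate(bins))
            for n in range(N)]

K = 512   # bins of the initial long MLT (hypothetical size)
L = 64    # number of low-frequency 'speech band' bins (hypothetical size)
spectrum = [math.exp(-k / 40.0) for k in range(K)]   # toy long MLT spectrum
low_band, remaining_band = spectrum[:L], spectrum[L:]
# the short inverse MLT yields a time signal at a K/L-times lower sampling rate
speech_time = small_imdct(low_band)
assert len(speech_time) == 2 * L
assert len(low_band) + len(remaining_band) == K
```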
[0020] The generated time domain signal is then used as input signal for a speech encoding
step/stage 12. The output of the speech encoding can be transmitted in the output
bit stream OBS, depending on a decision made by a below-described speech/audio switch
15. The encoded 'speech' signal is decoded in a related speech decoding step/stage
13, and the decoded 'speech' signal is transformed back into frequency domain in step/stage
14 using the MLT corresponding to the inverse MLT of step/stage 11 (i.e. an 'opposite
type' MLT having the short length) in order to re-generate the speech band signal,
i.e. a reconstructed speech signal RSS. The difference signal DS between these frequency
bins and the original low frequency bins, as well as the original low frequency bins
signal, serve as input to the speech/audio switch 15. This switch decides
whether the original low frequency bins are coded together with the remaining high
frequency bins (this indicates that the coded 'speech' signal is not transmitted in
bit stream OBS), or the difference signal DS is coded together with the remaining
high frequency bins in a following quantisation&coding step/stage 16 (this indicates
that the coded 'speech' signal is transmitted in bit stream OBS). That switch may
be operated by using a rate-distortion optimisation. An information item SWI about
the decision of switch 15 is included in bit stream OBS for use in the decoding. In
this switch, but also in the other steps/stages, the different delays introduced by
the cascaded transforms are to be taken into account. The different delays can be
balanced using corresponding buffering for these steps/stages.
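The switch decision can be illustrated by a toy sketch; the energy comparison below is merely a crude stand-in for the rate-distortion optimisation mentioned above, and all signal values are invented:

```python
def choose_path(low_bins, speech_bins):
    # toy stand-in for speech/audio switch 15: pass on whichever signal is
    # 'cheaper' to code, using signal energy as a crude rate proxy (an assumption)
    diff = [a - b for a, b in zip(low_bins, speech_bins)]
    energy = lambda v: sum(x * x for x in v)
    use_speech_coder = energy(diff) < energy(low_bins)
    # returns the SWI flag and the bins passed on to quantisation & coding
    return use_speech_coder, (diff if use_speech_coder else low_bins)

low = [1.0, 0.8, 0.6, 0.4]              # invented low band bins
good_model = [0.95, 0.78, 0.61, 0.39]   # speech coder fits well: small residual
swi, to_code = choose_path(low, good_model)
assert swi is True                       # speech code goes into the bit stream
bad_model = [-0.9, 0.1, -0.5, 1.2]       # music-like input: speech model fails
swi2, to_code2 = choose_path(low, bad_model)
assert swi2 is False                     # low band coded with the remaining band
```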
[0021] It is possible to use a mixture of original frequency bins and difference signal
frequency bins in the low frequency band as input to step/stage 16. In such case,
information about how that mixture is composed is conveyed to the decoding side.
[0022] In any case, the remaining frequency bins output by step/stage 10 (i.e. the high
frequencies) are processed in quantisation&coding step/stage 16.
[0023] In step/stage 16 an appropriate quantisation is used (e.g. like the quantisation
techniques used in AAC), and subsequently the quantised frequency bins are coded using
e.g. Huffman coding or arithmetic coding.
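A minimal sketch of such a non-uniform quantisation, assuming the power-law exponent 3/4 and the rounding offset 0.4054 commonly quoted for AAC; the step size and bin values are invented, and the subsequent Huffman/arithmetic coding is omitted:

```python
import math

def quantise(x, step):
    # AAC-style power-law quantiser (exponent 3/4, rounding offset 0.4054 assumed)
    return int(math.copysign(math.floor((abs(x) / step) ** 0.75 + 0.4054), x))

def dequantise(q, step):
    # inverse of the power law used at the decoding side
    return math.copysign(abs(q) ** (4.0 / 3.0) * step, q)

step = 1.0                      # hypothetical scalefactor-derived step size
bins = [0.0, 2.5, -7.0, 40.0]   # invented spectral values
codes = [quantise(x, step) for x in bins]
recon = [dequantise(q, step) for q in codes]
# the power law keeps the error roughly proportional to the bin magnitude
assert codes[0] == 0
assert all(abs(r - x) <= 0.25 * abs(x) + step for x, r in zip(bins, recon))
```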
[0024] In case the speech/audio switch 15 decides that a music-like input signal is present
and therefore the speech coder/decoder or its output is not used at all, the original
frequency bins corresponding to the speech band are to be encoded (together with the
remaining frequency bins) in the quantisation&coding step/stage 16.
[0025] The quantisation&coding step/stage 16 is controlled by a psycho-acoustic model calculation
18 that exploits masking properties of the input signal IS for the quantisation. Corresponding
side information SI can be transmitted in the bit stream multiplex to the decoder.
[0026] Switch 15 can also receive suitable control information (e.g. degree of tonality
or spectral flatness, or how noise-like the signal is) from psycho-acoustic model
step/stage 18.
[0027] A bit stream multiplexer step/stage 17 combines the output code (if present) of the
speech encoder 12, the switch information of switch 15, the output code of the quantisation&coding
step/stage 16, and optionally side information code SI, and provides the output bit
stream OBS.
[0028] As shown in Fig. 2, to achieve a higher time resolution in the transform-based coding,
at the input of the quantisation&coding step/stage 16 several small inverse MLTs (matching
the type of MLT 10) can be used (e.g. inverse MDCT, iMDCT) for transforming 22 the
long output spectrum of the initial MLT 10 having high frequency resolution into several
shorter spectra with lower frequency resolution but higher time resolution. The inverse
MLT steps/stages 22 are arranged between a first grouping step/stage 21 and a second
grouping step/stage 23 and provide a doubled number of output values. Again the theory
behind this processing is described in the above-cited 1995 and 1996 Purat articles.
[0029] In the first grouping 21 several neighbouring MLT bins are combined and used as input
for the inverse MLTs 22. The number of combined MLT bins, i.e. the transform
length of the inverse MLT, defines the resulting time and frequency resolution, wherein
a longer inverse MLT delivers a higher time resolution. In the following grouping
23, overlap/add is performed (optionally involving application of window functions)
and the output of the inverse MLTs applied on the same input spectrum is sorted such
that it results in several (the quantity depends on the size of the inverse MLTs)
temporally successive 'short block' spectra which are quantised and coded in step/stage
16.
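The groupings 21 and 23 around the small inverse MLTs 22 can be sketched structurally as follows; the sizes are toy values, and windowing as well as the final re-sorting into short block spectra are omitted for brevity:

```python
import math

def small_imdct(bins):
    # inverse MDCT: G bins -> 2G time-domain values
    N = 2 * len(bins)
    return [4.0 / N * sum(X * math.cos(2 * math.pi / N * (n + 0.5 + N / 4) * (k + 0.5))
                          for k, X in enumerate(bins))
            for n in range(N)]

def group_bins(spectrum, G):
    # grouping 21: runs of G neighbouring bins of the long spectrum
    return [spectrum[i:i + G] for i in range(0, len(spectrum), G)]

K, G = 32, 4                             # toy sizes, far smaller than in practice
spectrum = [float(k) for k in range(K)]  # stand-in for the long MLT spectrum
outputs = [small_imdct(grp) for grp in group_bins(spectrum, G)]
# each inverse MLT doubles its input count, as noted above
assert len(outputs) == K // G
assert sum(len(o) for o in outputs) == 2 * K
```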
[0030] The information about this 'short block coding' mode being used is included in the
side information SI. Optionally, multiple 'short block coding' modes with different
inverse MLT transform lengths can be used and signalled in SI. Thereby a non-uniform
time-frequency resolution over the short block spectra is facilitated, e.g. a higher
time resolution for high frequencies and a higher frequency resolution for low frequencies.
For instance, for the lowest frequencies the inverse MLT can have a length of 2 successive
frequency bins and for the highest frequencies the inverse MLT can have a length of
16 successive frequency bins. In case a non-uniform frequency resolution is chosen,
it is not possible to group e.g. 8 short block spectra. A different order of coding
the resulting frequency bins can be used; for example, one 'spectrum' may contain
not only different frequency bins at one point in time, but may also include the same
frequency bins at different points in time.
[0031] The switching between the processing according to Fig. 1 and the processing according
to Fig. 2, which is adaptive to input signal IS, is controlled by psycho-acoustic model step/stage
18. For example, if from one frame to the following frame the signal energy in input
signal IS rises above a threshold (i.e. there is a transient in the input signal),
the processing according to Fig. 2 is carried out. In case the signal energy is below
that threshold, the processing according to Fig. 1 is carried out. This switching
information, too, is included in output bitstream OBS for a corresponding switching
in the decoding. The transform block sections can be weighted by a window function,
in particular in an overlapping manner, wherein the length of a window function corresponds
to the current transform length.
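The energy-threshold switching can be sketched as follows; the frame length, the threshold ratio of 4 and the signal values are assumptions for illustration:

```python
def needs_short_blocks(prev_frame, cur_frame, ratio=4.0):
    # toy transient detector: request short blocks when the frame energy
    # jumps by more than `ratio` (the threshold value is an assumption)
    energy = lambda f: sum(x * x for x in f) + 1e-12
    return energy(cur_frame) / energy(prev_frame) > ratio

quiet = [0.01] * 256
attack = [0.0] * 128 + [0.9] * 128   # sudden onset inside the frame
assert needs_short_blocks(quiet, attack) is True    # -> Fig. 2 processing
assert needs_short_blocks(quiet, quiet) is False    # -> Fig. 1 processing
```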
[0032] Analysis and synthesis windows can be identical, but need not be. The analysis
and synthesis window functions h_A(n) and h_S(n) must fulfil some constraints in the
overlapping regions of successive blocks i and i+1 in order to enable a perfect
reconstruction, for n = 0, ..., N/2 - 1:

h_A^(i)(N/2 + n)·h_S^(i)(N/2 + n) + h_A^(i+1)(n)·h_S^(i+1)(n) = 1

h_A^(i)(N - 1 - n)·h_S^(i)(N/2 + n) = h_A^(i+1)(N/2 - 1 - n)·h_S^(i+1)(n)
[0033] A known window function type is the sine window:

h(n) = sin( (π/N)·(n + 1/2) ),  n = 0, ..., N - 1
[0034] A window with an improved far-away rejection, but a broader main lobe, is the
OGG window, which is very similar to the Kaiser-Bessel derived window:

h(n) = sin( (π/2)·sin²( (π/N)·(n + 1/2) ) ),  n = 0, ..., N - 1
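Both window types, and the Princen-Bradley condition they satisfy in the identical analysis/synthesis window case, can be checked with a short sketch (the length N = 64 is an arbitrary example):

```python
import math

def sine_window(N):
    return [math.sin(math.pi / N * (n + 0.5)) for n in range(N)]

def vorbis_window(N):
    # 'OGG' window as used by Vorbis, similar to the Kaiser-Bessel derived window
    return [math.sin(math.pi / 2 * math.sin(math.pi / N * (n + 0.5)) ** 2)
            for n in range(N)]

N = 64   # window length (arbitrary even example value)
for w in (sine_window(N), vorbis_window(N)):
    # Princen-Bradley condition w(n)^2 + w(n + N/2)^2 = 1, the special case of the
    # perfect reconstruction constraints for identical analysis/synthesis windows
    assert all(abs(w[n] ** 2 + w[n + N // 2] ** 2 - 1.0) < 1e-12
               for n in range(N // 2))
```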
[0035] A further window function is disclosed in table 7.33 of the AC-3 audio coding standard.
[0036] In case of switching the transform length, transition window functions are used,
e.g. as described in
B. Edler, "Codierung von Audiosignalen mit überlappender Transformation und adaptiven
Fensterfunktionen", FREQUENZ, vol.43, pp.252-256, 1989,
or as used in mp3 and described in the MPEG-1 standard ISO/IEC 11172-3, in particular
section 2.4.3.4.10.3, or as in AAC (e.g. as described in the MPEG-4 standard ISO/IEC
14496-3, Subpart 4).
[0037] In the inventive decoder in Fig. 3, the received or replayed bit stream OBS is demultiplexed
in a corresponding step/stage 37, thereby providing code (if present) for the speech
decoder 33, the switch information SWI for switch 35, the code and the switching information
for the decoding step/stage 36, and optionally side information code SI. In case the
speech subcoder 11,12,13,14 was used at encoding side for a current data frame, in
that current frame the corresponding encoded speech band frequency bins are correspondingly
reconstructed by the speech decoding step/stage 33 and the downstream MLT step/stage
34, thereby providing the reconstructed speech signal RSS. The remaining encoded frequency
bins are correspondingly decoded in decoding step/stage 36, whereby the encoder-side
quantisation operation is reversed correspondingly. The speech/audio switch 35 operates
corresponding to its operation at encoding side, controlled by switch information
SWI. In case the switch signal SWI indicates that a music-like input signal is present
in the current frame and therefore the speech coding/decoding was not used, the frequency
bins corresponding to the low band are decoded together with the remaining frequency
bins in the decoding step/stage 36, thereby providing the reconstructed remaining
band signal RRBS and the reconstructed low band signal RLBS.
[0038] The output signal or signals of step/stage 36 and of switch 35 are correspondingly
combined in inverse MLT (e.g. iMDCT) step/stage 30 and are synthesised in order to
provide the decoded output signal OS. In switch 35 and in the other steps/stages,
the different delays introduced by the cascaded transforms are to be taken into account.
The different delays can be balanced using corresponding buffering for these steps/stages.
[0039] In case the corresponding option was used at encoding side, not the frequency bins
of the combined signal CS but the frequency bins of the reconstructed speech signal
RSS are used for the corresponding processing in switch 35 and in step/stage 30, i.e.
in steps/stages 16 and 36, respectively, there is no coding/decoding of the low band
spectrum at all. In case at encoding side the 'short block mode' encoding was used
to achieve a higher time resolution in the transform-based coding, the decoding in
step/stage 36 of the 'short block mode' is illustrated in Fig. 4. According to the
encoding process, several temporally successive 'short block' spectra are to be decoded
in step/stage 36 and collected in a first grouping step/stage 43. Overlap/add is performed
(optionally involving application of window functions). Thereafter each set of temporally
successive spectral coefficients is transformed using the corresponding MLT steps/stages
42, and provides a halved number of output values. The generated spectral coefficients
are then grouped in a second grouping step/stage 41 to one MLT spectrum with the initial
high frequency resolution and transform length. Optionally, multiple 'short block
decoding' modes with different MLT transform lengths can be used as signalled in SI,
whereby a non-uniform time-frequency resolution over the short block spectra is facilitated,
e.g. a higher time resolution for high frequencies and a higher frequency resolution
for low frequencies.
[0040] As an alternative embodiment, a different cascading of the MLTs can be used, wherein
the order of the inner MLT/inverse MLT pair in the speech encoder is switched. In
Fig. 5 a block diagram of a corresponding encoding is depicted, wherein the Fig. 1
reference signs denote the same operations as in Fig. 1.
[0041] The inverse MLT 11 is replaced by an MLT step/stage 51, and the MLT 14 is replaced
by an inverse MLT step/stage 54 (i.e. an 'opposite type' MLT). Due to the exchanged
order of these MLTs the speech encoder input signal has different properties compared
to those in Fig. 1. Therefore the speech coder 52 and the speech decoder 53 are adapted
to these different properties (e.g. such that aliasing components are cancelled out).
[0042] Like in Fig. 2 for the Fig. 1 embodiment, in the quantisation&coding step/stage 16
for the Fig. 5 embodiment a 'short block mode' processing can be used as shown in Fig. 6,
wherein MLT steps/stages 62 corresponding to those in Fig. 4 replace the inverse MLT
steps/stages 22 in Fig. 2.
[0043] In the alternative embodiment decoder shown in Fig. 7, the speech decoding step/stage
33 in Fig. 3 is replaced by a correspondingly adapted speech decoding step/stage 73
and the MLT step/stage 34 in Fig. 3 is replaced by a corresponding inverse MLT step/stage
74.
[0044] Like in Fig. 4 for the Fig. 3 embodiment, for the Fig. 7 embodiment a 'short block
mode' processing can be used as shown in Fig. 8, wherein inverse MLT steps/stages 82
corresponding to those in Fig. 2 replace the MLT steps/stages 42 in Fig. 4.
[0045] Instead of achieving a higher time resolution by the processing described in connection
with Fig. 2 and Fig. 6 (block switching in the quantisation&coding step/stage 16 and
in the decoding step/stage 36), in the further embodiment of Fig. 9 a different way
of block switching is carried out. Instead of using a fixed large MLT 10 (e.g. an
MDCT) before the separation into speech and audio bands, several short MLTs (or MDCTs)
90 can be switched on. For example, instead of using one MDCT with a transform length
of 2048 samples, 8 short MDCTs with a transform length of 256 samples can be used.
However, it is not mandatory that the sum of the lengths of the short transforms is
equal to the long transform length (although it makes buffer handling even easier).
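The bin budget of this block switching example can be verified by a trivial sketch:

```python
LONG, SHORT = 2048, 256   # transform lengths from the example above
n_short = LONG // SHORT   # 8 short MDCTs replace one long MDCT
# an MDCT of length T yields T // 2 spectral bins per block, so the total
# bin count per frame is unchanged by the block switch:
assert n_short == 8
assert n_short * (SHORT // 2) == LONG // 2 == 1024
```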
[0046] Correspondingly, several short inverse MLTs 91 are used in front of speech encoder
12 and several short MLTs 94 are used following speech decoder 13. Advantageously,
for this Fig. 9 long/short block mode switching the internal buffer handling is easier
than for the long/short block mode switching according to figures 1 to 8, at the cost
of a less sharp band separation between the speech frequency band and the remaining
frequency band. The reason for the internal buffer handling being easier is as follows:
at least for each inverse MLT operation an additional buffer is required, which leads
in case of an inner transform to the necessity of an additional buffer also in the
parallel high frequency path. Therefore the switching at the outmost transform has
the least side effects concerning buffers.
[0047] On the other hand, because the short blocks are used only for encoding transient
input signals, the sharp separation in time domain is more important.
[0048] In Fig. 9, the Fig. 1 reference signs denote the same operations as in Fig. 1. The
MLT 10 is, adaptively to input signal IS, replaced by short MLT steps/stages 90, the inverse
MLT 11 is replaced by shorter inverse MLT steps/stages 91, and the MLT 14 is replaced
by shorter MLT steps/stages 94. Due to this kind of block switching, the lengths
of the first transform 90, 30 and the second transform 11, 34, 51, 74 (iMDCT to reconstruct
the speech band) and the third transform 14, 54 are coordinated. Furthermore, several
short blocks of the speech band signal can be buffered after the iMDCT 91 in Fig.
9 in order to collect enough samples for a complete input frame for the speech coder.
[0049] The encoding of Fig. 9 can also be adapted correspondingly to the encoding described
for Fig. 5.
[0050] Based on the Fig. 9 embodiment, the decoding according to Fig. 3, or the decoding
according to Fig. 7, is adapted correspondingly, i.e. the inverse MLTs 34 and 30 are
each replaced by corresponding adaptively switched shorter inverse MLTs.
[0051] Based on the Fig. 9 embodiment, the transform block sections are weighted at encoding
side in MLT 90 and at decoding side in inverse MLT 30 by window functions, in particular
in an overlapping manner, wherein the length of a window function corresponds to the
current transform length. In case of switching the transform length, to achieve a
smooth transition between long and short blocks, especially shaped long windows (the
start and stop windows, or transition windows) are used.
1. Method for encoding a speech and/or non-speech audio input signal (IS), said method
including the steps:
- transforming (10, 90) successive and possibly overlapping sections of said input
signal (IS) by at least one initial MLT transform and splitting the resulting output
frequency bins into a low band signal and a remaining band signal (RBS);
- passing said low band signal to a speech/audio switching (15) and through a speech
coding/decoding loop including at least one short first-type MLT transform (11, 51,
91), a speech encoding (12, 52), a corresponding speech decoding (13, 53), and at
least one short second-type MLT transform (14, 54, 94) having a type opposite
to that of said first-type short MLT transform;
- quantising and encoding (16) said remaining band signal (RBS), controlled by a psycho-acoustic
model that receives as its input said audio input signal (IS);
- combining (17) the output signal of said quantising and encoding (16), a switching
information signal (SWI) of said switching (15), possibly the output signal of said
speech encoding (12, 52), and optionally other encoding side information (SI), in
order to form for said current section of said input signal (IS) an output bit stream
(OBS),
wherein said speech/audio switching (15) receives said low band signal and a second
input signal (DS) derived from the output of said short second-type MLT transform
(14, 54, 94) and decides whether said second input signal bypasses said quantising
and encoding (16) step or said low band signal is coded together with said remaining
band signal (RBS) in said quantising and encoding (16) step,
and wherein in the latter case said output signal of said speech encoding (12, 52)
is not included in the current section of said output bit stream (OBS).
2. Apparatus for encoding a speech and/or non-speech audio input signal (IS), said apparatus
including means being adapted for:
- transforming (10, 90) successive and possibly overlapping sections of said input
signal (IS) by at least one initial MLT transform and splitting the resulting output
frequency bins into a low band signal and a remaining band signal (RBS);
- passing said low band signal to a speech/audio switching (15) and through a speech
coding/decoding loop including at least one short first-type MLT transform (11, 51,
91), a speech encoding (12, 52), a corresponding speech decoding (13, 53), and at
least one short second-type MLT transform (14, 54, 94) having a type opposite
to that of said first-type short MLT transform;
- quantising and encoding (16) said remaining band signal (RBS), controlled by a psycho-acoustic
model that receives as its input said audio input signal (IS);
- combining (17) the output signal of said quantising and encoding (16), a switching
information signal (SWI) of said switching (15), possibly the output signal of said
speech encoding (12, 52), and optionally other encoding side information (SI), in
order to form for said current section of said input signal (IS) an output bit stream
(OBS),
wherein said speech/audio switching (15) receives said low band signal and a second
input signal (DS) derived from the output of said short second-type MLT transform
(14, 54, 94) and decides whether said second input signal bypasses said quantising
and encoding (16) step or said low band signal is coded together with said remaining
band signal (RBS) in said quantising and encoding (16) step,
and wherein in the latter case said output signal of said speech encoding (12, 52)
is not included in the current section of said output bit stream (OBS).
3. Method for decoding a bit stream (OBS) representing an encoded speech and/or non-speech
audio input signal (IS) that was encoded according to the method of claim 1, said
decoding method including the steps:
- demultiplexing (37) successive sections of said bit stream (OBS) to regain the output
signal of said quantising and encoding (16), said switching information signal (SWI),
possibly the output signal of said speech encoding (12, 52), and said encoding side
information (SI) if present;
- if present in a current section of said bit stream (OBS), passing said output signal
of said speech encoding through a speech decoding (33, 73) and said short second-type
MLT transform (34, 74);
- decoding (36) said output signal of said quantising and encoding (16), controlled
by said encoding side information (SI) if present, in order to provide for said current
section a reconstructed remaining band signal (RRBS) and a reconstructed low band
signal (RLBS);
- providing a speech/audio switching (15) with said reconstructed low band signal
and a second input signal (CS) derived from the output of said second-type MLT transform
(34, 74), and passing according to said switching information signal (SWI) either
said reconstructed low band signal (RLBS) or said second input signal (CS);
- inversely MLT transforming (30) the output signal of said switching (15) combined
with said reconstructed remaining band signal (RRBS), and possibly overlapping successive
sections, in order to form a current section of the reconstructed output signal (OS).
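Again purely as an illustration (hypothetical function names, the IMDCT form of the inverse MLT assumed), the decoder-side switching and recombination of the last two steps of the decoding method might be sketched as:

```python
import numpy as np

def imlt(coeffs):
    # Inverse MLT in its IMDCT form: N frequency bins -> 2N time-aliased
    # samples, to be windowed and overlap-added with adjacent sections.
    N = len(coeffs)
    n, k = np.arange(2 * N), np.arange(N)
    C = np.cos(np.pi / N * (n[None, :] + 0.5 + N / 2) * (k[:, None] + 0.5))
    return (2.0 / N) * (C.T @ coeffs)

def select_low_band(swi, rlbs, cs):
    # The switching (15) passes, according to the switching information
    # signal SWI, either the reconstructed low band signal RLBS or the
    # speech-path signal CS.
    return cs if swi == "speech" else rlbs

def reconstruct_section(swi, rlbs, cs, rrbs, window):
    # Combine the selected low band with the reconstructed remaining band
    # signal RRBS and transform the result back into the time domain.
    bins = np.concatenate([select_low_band(swi, rlbs, cs), rrbs])
    return window * imlt(bins)
```

Successive output sections would then be overlap-added to form the reconstructed output signal (OS).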
4. Apparatus for decoding a bit stream (OBS) representing an encoded speech and/or non-speech
audio input signal (IS) that was encoded according to the method of claim 1, said
apparatus including means being adapted for:
- demultiplexing (37) successive sections of said bit stream (OBS) to regain the output
signal of said quantising and encoding (16), said switching information signal (SWI),
possibly the output signal of said speech encoding (12, 52), and said encoding side
information (SI) if present;
- if present in a current section of said bit stream (OBS), passing said output signal
of said speech encoding through a speech decoding (33, 73) and said short second-type
MLT transform (34, 74);
- decoding (36) said output signal of said quantising and encoding (16), controlled
by said encoding side information (SI) if present, in order to provide for said current
section a reconstructed remaining band signal (RRBS) and a reconstructed low band
signal (RLBS);
- providing a speech/audio switching (15) with said reconstructed low band signal
and a second input signal (CS) derived from the output of said second-type MLT transform
(34, 74), and passing according to said switching information signal (SWI) either
said reconstructed low band signal (RLBS) or said second input signal (CS);
- inversely MLT transforming (30) the output signal of said switching (15) combined
with said reconstructed remaining band signal (RRBS), and possibly overlapping successive
sections, in order to form a current section of the reconstructed output signal (OS).
5. Method according to claim 1 or 3, or apparatus according to claim 2 or 4, wherein,
in case a single MLT transform (10) is used at the input of the encoding and a single
inverse MLT transform (30) is used at the output of the decoding, several short MLT
transforms, each having a length smaller than the length of said single MLT transform
(10) and said single inverse MLT transform (30), respectively, are carried out adaptively
to the input signal (IS) at the input of said quantisation&coding (16) and at the
output of said decoding (36):
either short inverse MLT transforms (22) at the input of said quantisation&coding
(16) and short MLT transforms (22) at the output of said decoding (36),
or short MLT transforms (62) at the input of said quantisation&coding (16) and short
inverse MLT transforms (82) at the output of said decoding (36).
6. Method or apparatus according to claim 5, wherein said short MLT transforms and said
short inverse MLT transforms, respectively, are carried out if the signal energy in
a current section of said input signal (IS) exceeds a threshold level.
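The energy criterion of claims 6 and 8 admits a very direct sketch; the threshold value and the section contents below are arbitrary illustrative choices, not values taken from the claims:

```python
import numpy as np

def use_short_transforms(section, threshold):
    # Claims 6/8: the short (inverse) MLT transforms are carried out if the
    # signal energy in the current section of the input signal IS exceeds
    # a threshold level.
    return float(np.sum(section ** 2)) > threshold

# Hypothetical sections: a quiet passage and a loud transient.
quiet = 0.01 * np.ones(64)                                     # energy 0.0064
transient = np.concatenate([np.zeros(32), 5.0 * np.ones(32)])  # energy 800.0
```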
7. Method according to claim 1 or 3, or apparatus according to claim 2 or 4, wherein
at the input of the encoding it is switched input signal (IS) adaptively from a single
MLT transform (10) to multiple shorter MLT transforms (90), and at the output of said
decoding (36) correspondingly from a single inverse MLT transform (30) to multiple
shorter inverse MLT transforms.
8. Method or apparatus according to claim 7, wherein said multiple shorter MLT transforms
and said multiple shorter inverse MLT transforms, respectively, are carried out if
the signal energy in a current section of said input signal (IS) exceeds a threshold
level.
9. Method according to one of claims 1, 3 and 5 to 8, or apparatus according to one of
claims 2 and 4 to 8, wherein said second input signal (DS) is the difference signal
between said low band signal and the output signal (RSS) of said second-type MLT transform
(14, 54, 94).
10. Method according to one of claims 1, 3 and 5 to 8, or apparatus according to one of
claims 2 and 4 to 8, wherein said second input signal (DS) said output signal (RSS)
of said second-type MLT transform (14, 54, 94).
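The two alternative definitions of the second input signal (DS) in claims 9 and 10 can be written out directly (a sketch with hypothetical helper names and example vectors):

```python
import numpy as np

def second_input_claim9(low_band, rss):
    # Claim 9: DS is the difference signal between the low band signal and
    # the output signal RSS of the second-type MLT transform.
    return low_band - rss

def second_input_claim10(low_band, rss):
    # Claim 10: DS is the output signal RSS itself.
    return rss

# Illustrative low band and RSS vectors.
lb = np.array([3.0, 4.0])
rss = np.array([1.0, 1.5])
```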
11. Method according to one of claims 1, 3 and 5 to 10, or apparatus according to one
of claims 2 and 4 to 10, wherein said switching (15) is controlled by information
received from said psycho-acoustic model (18).
12. Method according to one of claims 1, 3 and 5 to 11, or apparatus according to one
of claims 2 and 4 to 11, wherein said switching (15) is operated by using a rate-distortion
optimisation.
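Rate-distortion optimised switching, as in claim 12, is conventionally realised by minimising a Lagrangian cost over the candidate decisions; the following sketch (the mode list and the λ values are invented for illustration) shows the principle:

```python
def rd_select(modes, lam):
    # Rate-distortion optimisation: choose the switching decision that
    # minimises the Lagrangian cost  distortion + lam * rate.
    return min(modes, key=lambda m: m["distortion"] + lam * m["rate"])

# Hypothetical costs for coding the low band via the speech path versus
# together with the remaining band signal in the audio path.
modes = [{"name": "speech path", "rate": 40, "distortion": 2.0},
         {"name": "audio path",  "rate": 60, "distortion": 0.5}]
```

A large λ favours the cheaper speech path, a small λ the lower-distortion audio path.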
13. Method according to one of claims 1, 3 and 5 to 12, or apparatus according to one
of claims 2 and 4 to 12, wherein successive sections of said input signal (IS) and
successive sections for said output signal (OS) are weighted by a window function
having a length corresponding to the related transform length, in particular in an
overlapping manner, and wherein, if the transform length is switched, corresponding
transition window functions are used.
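The window handling of claim 13, sine windows matched to the transform length plus transition windows at a length switch, resembles the start-window construction known from transform audio coders such as AAC. The layout below is one possible sketch of such a transition window, not the claimed implementation:

```python
import numpy as np

def sine_window(length):
    # Sine window of the given transform length.
    return np.sin(np.pi / length * (np.arange(length) + 0.5))

def start_window(long_len, short_len):
    # Transition window for switching from the long to the short transform
    # length: long rising half, flat top, short falling half, zero tail.
    N, M = long_len // 2, short_len // 2
    rise = sine_window(long_len)[:N]       # rising half of the long window
    fall = sine_window(short_len)[M:]      # falling half of the short window
    flat = (N - M) // 2
    return np.concatenate([rise, np.ones(flat), fall, np.zeros(flat)])
```

The falling short half lines up with the first short window of the following section, so overlapping sections remain consistent across the length switch.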