[0001] This application claims priority from provisional application Serial No. 60/167,197,
filed 23 November 1999.
[0002] The invention relates to electronic devices, and more particularly, but not exclusively,
to speech coding, transmission, storage, and decoding/synthesis methods and circuitry.
[0003] The performance of digital speech systems using low bit rates has become increasingly
important with current and foreseeable digital communications. Both dedicated channel
and packetized-over-network (e.g., Voice over IP or Voice over Packet) transmissions
benefit from compression of speech signals. The widely-used linear prediction (LP)
digital speech coding compression method models the vocal tract as a time-varying
filter and a time-varying excitation of the filter to mimic human speech. Linear prediction
analysis determines LP coefficients a_i, i = 1, 2, ..., M, for an input frame of digital speech samples {s(n)} by setting

r(n) = s(n) + Σ_{i=1}^{M} a_i s(n-i)     (1)

(equivalently, in z-transform terms, R(z) = A(z)S(z) with transfer function A(z) = 1 + Σ_{i=1}^{M} a_i z^(-i)) and minimizing the energy Σ_n r(n)^2 of the residual r(n) in the frame. Typically, M, the order of the linear prediction filter, is taken to be about 10-12; the sampling rate to form the samples s(n) is typically taken to be 8 kHz (the same as the public switched telephone network sampling for digital transmission); and the number of samples {s(n)} in a frame is typically 80 or 160 (10 or 20 ms frames). A frame of samples may be generated by various windowing operations applied to the input speech samples. The name "linear prediction" arises from the interpretation of r(n) = s(n) + Σ_{i=1}^{M} a_i s(n-i) as the error in predicting s(n) by the linear combination of preceding speech samples -Σ_{i=1}^{M} a_i s(n-i). Thus minimizing Σ_n r(n)^2 yields the {a_i} which furnish the best linear prediction for the frame. The coefficients {a_i} may be converted to line spectral frequencies (LSFs) for quantization and transmission or storage and converted to line spectral pairs (LSPs) for interpolation between subframes.
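As an illustration of this analysis, the following is a minimal sketch in Python with NumPy of the autocorrelation method with the Levinson-Durbin recursion; the framing, windowing, and function names are assumptions of the sketch rather than details of any particular coder:

    import numpy as np

    def lp_coefficients(s, M=10):
        # Autocorrelation of the (windowed) frame: ac[k] = sum_n s(n) s(n-k).
        ac = np.array([np.dot(s[: len(s) - k], s[k:]) for k in range(M + 1)])
        # Levinson-Durbin recursion for the a_i of equation (1) that
        # minimize the residual energy sum_n r(n)^2; assumes ac[0] > 0.
        a = np.zeros(M + 1)
        a[0] = 1.0                                  # a_0 = 1 by convention
        err = ac[0]
        for i in range(1, M + 1):
            k = -np.dot(a[:i], ac[i:0:-1]) / err    # reflection coefficient
            a[1 : i + 1] += k * a[:i][::-1]         # a_j += k * a_{i-j}
            err *= 1.0 - k * k                      # remaining error energy
        return a[1:], err                           # a_1..a_M and energy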
[0004] The {r(n)} is the LP residual for the frame, and ideally the LP residual would be
the excitation for the synthesis filter 1/A(z) where A(z) is the transfer function
of equation (1). Of course, the LP residual is not available at the decoder; thus
the task of the encoder is to represent the LP residual so that the decoder can generate
an excitation which emulates the LP residual from the encoded parameters. Physiologically,
for voiced frames the excitation roughly has the form of a series of pulses at the
pitch frequency, and for unvoiced frames the excitation roughly has the form of white
noise.
[0005] The LP compression approach basically only transmits/stores updates for the (quantized)
filter coefficients, the (quantized) residual (waveform or parameters such as pitch),
and (quantized) gain(s). A receiver decodes the transmitted/stored items and regenerates
the input speech with the same perceptual characteristics. Figures 5-6 illustrate
high level blocks of an LP system. Periodic updating of the quantized items requires
fewer bits than direct representation of the speech signal, so a reasonable LP coder
can operate at bit rates as low as 2-3 kb/s (kilobits per second).
[0006] However, high error rates in wireless transmission and large packet losses/delays
for network transmissions demand that an LP decoder handle frames in which so many
bits are corrupted that the frame is ignored (erased). To maintain speech quality
and intelligibility for wireless or voice-over-packet applications in the case of
erased frames, the decoder typically has methods to conceal such frame erasures, and
such methods may be categorized as either interpolation-based or repetition-based.
An interpolation-based concealment method exploits both future and past frame parameters
to interpolate missing parameters. In general, interpolation-based methods provide
better approximation of speech signals in missing frames than repetition-based methods
which exploit only past frame parameters. In applications like wireless communications, the interpolation-based method incurs the cost of an additional delay to acquire the future frame. In Voice over Packet communications future frames are available from a playout
buffer which compensates for arrival jitter of packets, and interpolation-based methods
mainly increase the size of the playout buffer. Repetition-based concealment, which
simply repeats or modifies the past frame parameters, finds use in several CELP-based
speech coders including G.729, G.723.1 and GSM-EFR. The repetition-based concealment
method in these coders does not introduce any additional delay or playout buffer size,
but the performance of reconstructed speech with erased frames is poorer than that
of the interpolation-based approach, especially in a high erased-frame ratio or bursty
frame erasure environment.
[0007] In more detail, the ITU standard G.729 uses frames of 10 ms length (80 samples) divided
into two 5-ms 40-sample subframes for better tracking of pitch and gain parameters
plus reduced codebook search complexity. Each subframe has an excitation represented
by an adaptive-codebook contribution and a fixed (algebraic) codebook contribution.
The adaptive-codebook contribution provides periodicity in the excitation and is the
product of v(n), the prior frame's excitation translated by the current frame's pitch
lag in time and interpolated, multiplied by a gain, g_p. The algebraic codebook contribution approximates the difference between the actual residual and the adaptive codebook contribution with a four-pulse vector, c(n), multiplied by a gain, g_c. Thus the excitation is u(n) = g_p v(n) + g_c c(n), where v(n) comes from the prior (decoded) frame and g_p, g_c, and c(n) come from the transmitted parameters for the current frame. Figures 3-4
illustrate the encoding and decoding in block format; the postfilter essentially emphasizes
any periodicity (e.g., vowels).
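As a rough illustration of this excitation model, a short Python sketch under the simplifying assumption of an integer pitch lag no shorter than the subframe (G.729 itself also handles fractional lags by interpolation), and with illustrative function names:

    import numpy as np

    def adaptive_codebook_vector(past_excitation, pitch_lag, subframe_len):
        # v(n): the prior excitation translated back by the pitch lag.
        # Assumes pitch_lag >= subframe_len so every sample lies in the past.
        start = len(past_excitation) - pitch_lag
        return np.asarray(past_excitation[start : start + subframe_len])

    def excitation(past_excitation, pitch_lag, g_p, g_c, c):
        # u(n) = g_p v(n) + g_c c(n)
        v = adaptive_codebook_vector(past_excitation, pitch_lag, len(c))
        return g_p * v + g_c * np.asarray(c)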
[0008] G.729 handles frame erasures by reconstruction based on previously received information;
that is, repetition-based concealment. Namely, replace the missing excitation signal
with one of similar characteristics, while gradually decaying its energy by using
a voicing classifier based on the long-term prediction gain (which is computed as
part of the long-term postfilter analysis). The long-term postfilter finds the long-term
predictor for which the prediction gain is more than 3 dB by using a normalized correlation
greater than 0.5 in the optimal delay determination. For the error concealment process,
a 10 ms frame is declared periodic if at least one 5 ms subframe has a long-term prediction
gain of more than 3 dB. Otherwise the frame is declared nonperiodic. An erased frame
inherits its class from the preceding (reconstructed) speech frame. Note that the
voicing classification is continuously updated based on this reconstructed speech
signal. The specific steps taken for an erased frame are as follows:
1) repetition of the synthesis filter parameters. The LP parameters of the last good
frame are used.
2) attenuation of adaptive and fixed-codebook gains. The adaptive-codebook gain is based on an attenuated version of the previous adaptive-codebook gain: if the (m+1)st frame is erased, use g_p^(m+1) = 0.9 g_p^(m). Similarly, the fixed-codebook gain is based on an attenuated version of the previous fixed-codebook gain: g_c^(m+1) = 0.98 g_c^(m) (see the sketch following these steps).
3) attenuation of the memory of the gain predictor. The gain predictor for the fixed-codebook
gain uses the energy of the previously selected algebraic codebook vectors c(n), so
to avoid transitional effects once good frames are received, the memory of the gain
predictor is updated with an attenuated version of the average codebook energy over
four prior frames.
4) generation of the replacement excitation. The excitation used depends upon the
periodicity classification. If the last reconstructed frame was classified as periodic,
the current frame is considered to be periodic as well. In that case only the adaptive
codebook contribution is used, and the fixed-codebook contribution is set to zero.
The pitch delay is based on the integer part of the pitch delay in the previous frame,
and is repeated for each successive frame. To avoid excessive periodicity the pitch
delay value is increased by one for each next subframe but bounded by 143. In contrast,
if the last reconstructed frame was classified as nonperiodic, the current frame is
considered to be nonperiodic as well, and the adaptive codebook contribution is set
to zero. The fixed-codebook contribution is generated by randomly selecting a codebook
index and sign index. The use of a classification allows the use of different decay
factors for either type of excitation (e.g., 0.9 for periodic and 0.98 for nonperiodic
gains). Figure 2 illustrates the decoder with concealment parameters.
[0009] Leung et al., Voice Frame Reconstruction Methods for CELP Speech Coders in Digital Cellular and Wireless Communications, Proc. Wireless 93 (July 1993), describes missing-frame reconstruction using parametric extrapolation and interpolation for a low-complexity CELP coder using 4 subframes per frame.
[0010] However, such repetition-based concealment methods yield relatively poor reconstructed speech quality, especially for bursts of erased frames.
[0011] An aspect of the present invention provides concealment of erased frames
by frame repetition together with one or more of: excitation signal muting, LP coefficient
bandwidth expansion with cutoff frequency, and pitch delay jittering.
[0012] This has advantages including improved performance for repetition-based concealment.
[0013] Preferred embodiments of the invention will now be described, by way of example only,
and with reference to the accompanying drawings, in which:
[0014] Figure 1 shows a preferred embodiment decoder in block format.
[0015] Figure 2 shows known decoder concealment.
[0016] Figure 3 is a block diagram of a known encoder.
[0017] Figure 4 is a block diagram of a known decoder.
[0018] Figures 5-6 illustrate systems.
1. Overview
[0019] Preferred embodiment decoders and methods for concealment of frame erasures in CELP-encoded
speech or other signal transmissions have one or more of three features: (1) muting
the excitation outside of the feedback loop, which replaces the attenuation of the
adaptive and fixed codebook gains; (2) expanding the bandwidth of the LP synthesis
filter with a threshold frequency for differing expansion factors; and (3) jittering
the pitch delay to avoid overly periodic repetition frames. Features (2) and (3) especially
apply to bursty noise leading to frame erasures. Figure 1 illustrates a preferred
embodiment decoder using all three concealment features; this contrasts with the G.729
standard decoder concealment illustrated in Figure 2.
[0020] Preferred embodiment systems (e.g., Voice over IP or Voice over Packet) incorporate
preferred embodiment concealment methods in decoders.
2. Encoder details
[0021] Some details of coding methods similar to G.729 are needed to explain the preferred
embodiments. In particular, Figure 3 illustrates a speech encoder using LP encoding
with excitation contributions from both adaptive and algebraic codebooks, and preferred
embodiment concealment features affect the pitch delay, the codebook gains, and the
LP synthesis filter. Encoding proceeds as follows:
(1) Sample an input speech signal (which may be preprocessed to filter out dc and
low frequencies, etc.) at 8 kHz or 16 kHz to obtain a sequence of digital samples,
s(n). Partition the sample stream into frames, such as 80 samples or 160 samples (e.g.,
10 ms frames) or other convenient size. The analysis and encoding may use various
size subframes of the frames or other intervals.
(2) For each frame (or subframes) apply linear prediction (LP) analysis to find LP
(and thus LSF/LSP) coefficients and quantize the coefficients. In more detail, the LSFs are frequencies {f_1, f_2, ..., f_M} monotonically increasing between 0 and the Nyquist frequency (4 kHz or 8 kHz for sampling rates of 8 kHz or 16 kHz); that is, 0 < f_1 < f_2 < ... < f_M < f_samp/2, where M is the order of the linear prediction filter, typically in the range 10-12.
Quantize the LSFs for transmission/storage by vector quantizing the differences between
the frequencies and fourth-order moving average predictions of the frequencies.
(3) For each subframe find a pitch delay, T_j, by searching correlations of s(n) with s(n+k) in a windowed range; s(n) may be perceptually
filtered prior to the search. The search may be in two stages: an open loop search
using correlations of s(n) to find a pitch delay followed by a closed loop search
to refine the pitch delay by interpolation from maximizations of the normalized inner
product <x|y> of the target speech x(n) in the (sub)frame with the speech y(n) generated
by the (sub)frame's quantized LP synthesis filter applied to the prior (sub)frame's
excitation. The pitch delay resolution may be a fraction of a sample, especially for
smaller pitch delays. The adaptive codebook vector v(n) is then the prior (sub)frame's
excitation translated by the refined pitch delay and interpolated.
(4) Determine the adaptive codebook gain, g_p, as the ratio <x|y>/<y|y>, where x(n) is the target speech in the (sub)frame and y(n) is the (perceptually weighted) speech in the (sub)frame generated by the quantized LP synthesis filter applied to the adaptive codebook vector v(n) from step (3). Thus g_p v(n) is the adaptive codebook contribution to the excitation and g_p y(n) is the adaptive codebook contribution to the speech in the (sub)frame. (See the sketch following these steps.)
(5) For each (sub)frame find the algebraic codebook vector c(n) by essentially maximizing the normalized correlation of quantized-LP-synthesis-filtered c(n) with x(n) - g_p y(n) as the target speech in the (sub)frame; that is, remove the adaptive codebook contribution to form a new target. In particular, search over possible algebraic codebook vectors c(n) to maximize the ratio of the square of the correlation <x - g_p y|H|c> divided by the energy <c|H^T H|c>, where h(n) is the impulse response of the quantized LP synthesis filter (with perceptual filtering) and H is the lower triangular Toeplitz convolution matrix with diagonals h(0), h(1), .... The vectors c(n) have 40 positions in the case of 40-sample (5 ms) (sub)frames being used as the encoding granularity, and the 40 samples are partitioned into four interleaved tracks with one pulse positioned within each track. Three of the tracks have 8 samples each and one track has 16 samples.
(6) Determine the algebraic codebook gain, g_c, by minimizing ||x - g_p y - g_c z||, where, as in the foregoing description, x(n) is the target speech in the (sub)frame, g_p is the adaptive codebook gain, y(n) is the quantized LP synthesis filter applied to v(n), and z(n) is the signal in the frame generated by applying the quantized LP synthesis filter to the algebraic codebook vector c(n).
(7) Quantize the gains g_p and g_c for insertion as part of the codeword; the algebraic codebook gain may be factored and predicted, and the gains may be jointly quantized with a vector quantization codebook. The excitation for the (sub)frame is then, with quantized gains, u(n) = g_p v(n) + g_c c(n), and the excitation memory is updated for use with the next (sub)frame.
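As an illustration of the gain computations in steps (4) and (6), a sketch in Python with NumPy; the closed-form least-squares expressions follow from the definitions above, and the names are illustrative:

    import numpy as np

    def adaptive_gain(x, y):
        # Step (4): g_p = <x|y> / <y|y>.
        return np.dot(x, y) / np.dot(y, y)

    def algebraic_gain(x, y, z, g_p):
        # Step (6): the g_c minimizing ||x - g_p y - g_c z|| is the
        # projection of the new target x - g_p y onto z.
        target = x - g_p * y
        return np.dot(target, z) / np.dot(z, z)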
[0022] Note that all of the quantized items would typically be differential values, with moving averages of the preceding frames' values used as predictors. That is, only the differences between the actual and the predicted values would be encoded.
[0023] The final codeword encoding the (sub)frame would include bits for: the quantized
LSF coefficients, adaptive codebook pitch delay, algebraic codebook vector, and the
quantized adaptive codebook and algebraic codebook gains.
3. Decoder details
[0024] Figure 1 illustrates preferred embodiment decoders and decoding methods which essentially reverse the steps of the foregoing encoding method and additionally provide repetition-based concealment features for erased-frame reconstruction as described in the next section. Figure 4 shows a decoder without concealment features; for the mth (sub)frame it proceeds as follows:
(1) Decode the quantized LP coefficients a_j^(m). The coefficients may be in differential LSP form, so a moving average of prior frames' decoded coefficients may be used. The LP coefficients may be interpolated every 20 samples (subframe) in the LSP domain to reduce switching artifacts.
(2) Decode the adaptive codebook quantized pitch delay T^(m), and apply (time translate plus interpolate) this pitch delay to the prior decoded (sub)frame's excitation u^(m-1)(n) to form the vector v^(m)(n); this is the feedback loop in Figure 4.
(3) Decode the algebraic codebook vector c^(m)(n).
(4) Decode the quantized adaptive codebook and algebraic codebook gains, g_p^(m) and g_c^(m).
(5) Form the excitation for the mth (sub)frame as u^(m)(n) = g_p^(m) v^(m)(n) + g_c^(m) c^(m)(n) using the items from steps (2)-(4).
(6) Synthesize speech by applying the LP synthesis filter from step (1) to the excitation from step (5) (a sketch of this filtering follows these steps).
(7) Apply any post filtering and other shaping actions.
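A sketch of the synthesis in step (6), assuming LP coefficients a_1, ..., a_M with the sign convention of equation (1), so that s(n) = u(n) - a_1 s(n-1) - ... - a_M s(n-M); the state handling is illustrative:

    def lp_synthesize(u, a, memory):
        # All-pole filtering of the excitation u(n) through 1/A(z).
        # `memory` holds the last M synthesized samples, most recent last.
        M = len(a)
        state = list(memory)
        s = []
        for u_n in u:
            s_n = u_n - sum(a[i] * state[-1 - i] for i in range(M))
            s.append(s_n)
            state = (state + [s_n])[-M:]
        return s, state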
4. Preferred embodiment concealments
[0025] Figure 1 shows preferred embodiment concealment features in a preferred embodiment
decoder and contrasts with Figure 2. In particular, presume that the mth frame was decoded but the (m+1)st frame was erased, as were the (m+2)nd, ..., (m+j)th, ... frames. Then the preferred embodiment concealment features construct an (m+j)th frame with one or more of the following modified decoder steps:
(1) Define the LP synthesis filter 1/Â(z) by taking the (quantized) filter coefficients a_k^(m+j) to be bandwidth-expanded versions of the prior good frame's (quantized) coefficients a_k^(m):

a_k^(m+j) = γ(m+j)^k a_k^(m),   k = 1, 2, ..., M

for j = 1, 2, ... successive erased frames, where the bandwidth expansion factor γ(n) is confined to the range [0.8, 1.0]. Figure 1 illustrates this bandwidth expansion
applied to the synthesis filter. The decoder updates the bandwidth expansion factor
every frame by:
γ(n+1) = max(0.95 γ(n), 0.8)   if C_B > 1 and LSFBW_min < 100 Hz
γ(n+1) = min(1.05 γ(n), 1.0)   otherwise

where C_B is a bursty frame erasure counter which counts the number of consecutive erased frames, and LSFBW_min is the minimum LSF bandwidth in the last good frame. The ith LSF bandwidth (LSFBW_i) is defined as |f_{i+1} - f_i|; the smaller an LSF bandwidth, the sharper the corresponding LPC spectrum peak (formant). That is, LSFBW_min is the minimum LSFBW_i, and so the bandwidth expansion factor may decrease only if at least one pair of LSF frequencies is close together (a sharp formant). Note that as γ(n) decreases, the poles of the synthesis filter 1/Â(z/γ(n)) move radially towards the origin and thereby broaden the formant peaks.
Thus with the mth frame a good frame and the (m+1)st frame erased, the counter C_B = 1 and the updated expansion factor is γ(m+1) = min(1.05 γ(m), 1.0). (For γ(m+1) = 1.05 γ(m) ≤ 1, γ(m) must have been at most about 0.952; this means that at least one of the preceding four frames had a γ(n) decrease, which implies at least two successive erased frames.) But with the (m+2)nd or more erased frames and an LSFBW_min of the mth frame less than 100 Hz, the factors γ(m+j) progressively decrease to the limit of 0.8. This prevents any sharp formant (LSFBW_min < 100 Hz) in the mth frame from lending a synthetic quality to the concealment reconstructions for the (m+2)nd and later successive erased frames. That is, the synthesis filter is 1/Â(z/γ(m+j)) for concealing the erased (m+j)th frame, where the filter coefficients a_k^(m) are from the last good frame.
Also, for good frames following bursty frame erasures, γ(m+j) is still applied to the decoded filter coefficients and progressively increased up to 1.0 through γ(m+j+1) = min(1.05 γ(m+j), 1.0), for a smooth recovery from frame erasures.
(2) Define the adaptive codebook quantized pitch delay T^(m+1) for concealing the erased (m+1)st frame as equal to T^(m) from the good prior mth frame. However, for two or more consecutive erased frames, add a random 3% jitter to T^(m) to define T^(m+j) for j = 2, 3, ... erased frames. This avoids reconstructing an excessively periodic concealment signal without accumulating the estimation errors which may occur if T^(m+j+1) is simply taken to be T^(m+j) + 1 as in G.729. Apply this concealing pitch delay to the prior (sub)frame's excitation u^(m)(n) to form the adaptive codebook vector v^(m+j)(n). In short, add a random number in the range [-0.03 T^(m), 0.03 T^(m)] to T^(m) and round off to the nearest 1/3 or integer, depending upon range, to obtain T^(m+j) for a consecutive erased frame. Figure 1 shows the jitter, and the feedback loop shows the use of the prior frame's excitation.
(3) Define the algebraic codebook vector c^(m+j)(n) as a random vector of the type of c^(m)(n); that is, for G.729-type coding the vector has four ±1 pulses out of 40 otherwise-zero components.
(4) Define the quantized adaptive codebook gain, g_p^(m+j), and algebraic codebook gain, g_c^(m+j), simply as equal to g_p^(m) and g_c^(m), except that g_p^(m+j) has an upper bound of max(1.2 - 0.1(C_B - 1), 0.8). Again, C_B is a count of the number of consecutive erased frames, i.e., a burst. The upper bound prevents an unpredicted surge of excitation signal energy. This use of the unattenuated gains maintains the excitation energy; however, the excitation is muted prior to synthesis by applying the factor g_E^(m+j) as described in step (5).
(5) Form the excitation for the erased (m+1)st (sub)frame as u^(m+1)(n) = g_p^(m+1) v^(m+1)(n) + g_c^(m+1) c^(m+1)(n) using the items from steps (2)-(4). Then apply the excitation muting factor g_E^(m+1) outside of the adaptive codebook feedback loop as illustrated in Figure 1. This eliminates excessive decay of the excitation but still avoids the surge of speech energy which occurs if erased frames follow a frame containing the onset of a vowel. The excitation muting factor g_E(n) is updated every subframe (5 ms) and lies in the range [0.0, 1.0]; the updating depends upon the muting counter C_M, which is updated every frame (10 ms) as follows:

if C_B > 1, then C_M = 4
else if g_p^(m+1) < 1.0 and C_M > 0, then decrement C_M by 1
else, no change in C_M

where C_B again is the bursty counter which counts the number of consecutive erased frames and g_p^(m+1) is the adaptive codebook gain from step (4). Then the g_E(n) updating is:

g_E(n+1) = 0.95499 g_E(n)            if C_M(n+1) > 0
g_E(n+1) = min(1.09648 g_E(n), 1.0)  otherwise

Thus the excitation to the synthesis filter becomes g_E^(m+1) u^(m+1)(n). Similarly, for the (m+j)th consecutive erased frame, use the corresponding g_p^(m+j) v^(m+j)(n) + g_c^(m+j) c^(m+j)(n) and mute with g_E^(m+j). (A code sketch of these concealment updates follows these steps.)
(6) Synthesize speech by applying the LP synthesis filter from step (1) to the excitation
from step (5).
(7) Apply any post filtering and other shaping actions.
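To make these steps concrete, the following Python sketch gathers the three concealment features (bandwidth expansion factor update, pitch delay jitter, and excitation muting). The constants are those given above; the function names, the flat coefficient list a_1, ..., a_M, and the counter handling are illustrative assumptions:

    import random

    def update_gamma(gamma, c_b, lsfbw_min_hz):
        # Step (1): contract gamma toward 0.8 during a burst with a sharp
        # formant in the last good frame, otherwise relax it toward 1.0.
        if c_b > 1 and lsfbw_min_hz < 100.0:
            return max(0.95 * gamma, 0.8)
        return min(1.05 * gamma, 1.0)

    def expand_coefficients(a, gamma):
        # a_k -> gamma^k a_k, i.e., replace A(z) by A(z/gamma).
        return [gamma ** (k + 1) * a_k for k, a_k in enumerate(a)]

    def jitter_pitch(t_prev, j):
        # Step (2): repeat T for the first erased frame; add up to +/-3%
        # random jitter from the second consecutive erased frame onward.
        # (Rounding to 1/3-sample resolution is omitted here.)
        if j < 2:
            return t_prev
        return t_prev + random.uniform(-0.03 * t_prev, 0.03 * t_prev)

    def update_mute_counter(c_m, c_b, g_p):
        # Step (5), per frame (10 ms): reload the muting counter during a
        # burst, else let it run down while the adaptive gain is modest.
        if c_b > 1:
            return 4
        if g_p < 1.0 and c_m > 0:
            return c_m - 1
        return c_m

    def update_mute_gain(g_e, c_m):
        # Step (5), per subframe (5 ms): decay or recover the muting gain,
        # which stays in [0.0, 1.0].
        if c_m > 0:
            return 0.95499 * g_e
        return min(1.09648 * g_e, 1.0)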
5. Alternative preferred embodiments
[0026] Alternative preferred embodiments perform only one or two of the three concealment
features of the preceding preferred embodiments. Indeed, the bandwidth expansion of
the LP coefficients for the erased frames and for the good frames after a burst of
erased frames could be omitted. This just changes the synthesis filter and does not
affect the excitation muting or pitch delay jittering.
[0027] Another alternative preferred embodiment omits the pitch delay jittering but may
use the incrementing as in G.729 for erased frames together with excitation muting
and LP coefficient bandwidth expansion.
[0028] Further, an alternative preferred embodiment omits the excitation muting and uses
the G.729 construction together with the pitch delay jittering and synthesis filter
coefficient bandwidth expansion.
[0029] Lastly, preferred embodiments may use just one of the three features (excitation
muting, pitch delay jittering, and synthesis filter coefficient bandwidth expansion)
and follow G.729 in other aspects.
6. System preferred embodiments
[0030] Figures 5-6 show in functional block form preferred embodiment systems which use
the preferred embodiment encoding and decoding. This applies to speech and also other
signals which can be effectively CELP coded. The encoding and decoding can be performed
with digital signal processors (DSPs) or general purpose programmable processors or
application specific circuitry or systems on a chip such as both a DSP and RISC processor
on the same chip with the RISC processor controlling. Codebooks would be stored in
memory at both the encoder and decoder, and a stored program in an onboard or external
ROM, flash EEPROM, or ferroelectric memory for a DSP or programmable processor could
perform the signal processing. Analog-to-digital converters and digital-to-analog
converters provide coupling to the real world, and modulators and demodulators (plus
antennas for air interfaces) provide coupling for transmission waveforms. The encoded
speech can be packetized and transmitted over networks such as the Internet.
7. Modifications
[0031] The preferred embodiments may be modified in various ways while retaining one or
more of the features of erased frame concealment by synthesis filter coefficient bandwidth
expansion, pitch delay jittering, and excitation muting.
[0032] For example, interval (frame and subframe) size and sampling rate could differ; the bandwidth expansion factor could apply for C_B > 0 or C_B > 2, the multipliers 0.95 and 1.05 and limits 0.8 and 1.0 could vary, and the 100 Hz threshold could vary; the pitch delay jitter could be a larger or smaller percentage of the pitch delay and could also apply to the first erased frame, and the jitter size could vary with the number of consecutive erased frames or erasure density; the excitation muting could vary nonlinearly with the number of consecutive erased frames or erasure density, and the multipliers 0.95499 and 1.09648 could vary.
[0033] Insofar as embodiments of the invention described above are implementable, at least
in part, using a software-controlled programmable processing device such as a Digital
Signal Processor, microprocessor, other processing devices, data processing apparatus
or computer system, it will be appreciated that a computer program for configuring
a programmable device, apparatus or system to implement the foregoing described methods
is envisaged as an aspect of the present invention. The computer program may be embodied
as source code and undergo compilation for implementation on a processing device,
apparatus or system, or may be embodied as object code, for example. The skilled person
would readily understand that the term computer in its most general sense encompasses
programmable devices such as those referred to above, and data processing apparatus and
computer systems.
[0034] Suitably, the computer program is stored on a carrier medium in machine or device
readable form, for example in solid-state memory or magnetic memory such as disc or
tape and the processing device utilises the program or a part thereof to configure
it for operation. The computer program may be supplied from a remote source embodied
in a communications medium such as an electronic signal, radio frequency carrier wave
or optical carrier wave. Such carrier media are also envisaged as aspects of the present
invention.
[0035] The scope of the present disclosure includes any novel feature or combination of
features disclosed herein either explicitly or implicitly or any generalisation thereof
irrespective of whether or not it relates to the claimed invention or mitigates any
or all of the problems addressed by the present invention. The applicant hereby gives
notice that new claims may be formulated to such features during the prosecution of
this application or of any such further application derived therefrom. In particular,
with reference to the appended claims, features from dependent claims may be combined
with those of the independent claims and features from respective independent claims
may be combined in any appropriate manner and not merely in the specific combinations
enumerated in the claims.
1. A method for decoding digital speech, comprising:
(a) forming an excitation for an erased interval of encoded digital speech by a sum
of an adaptive codebook contribution and a fixed codebook contribution where said
adaptive codebook contribution derives from an excitation and pitch and first gain
of intervals prior in time of said encoded digital speech and said fixed codebook
contribution derives from a second gain of said intervals prior in time;
(b) muting said excitation; and
(c) filtering said muted excitation.
2. The method of claim 1, wherein:
(a) said filtering includes a synthesis with synthesis filter coefficients derived
from filter coefficients of said intervals prior in time.
3. A method for decoding digital speech, comprising:
(a) forming a synthesis filter for an erased interval of encoded digital speech by
determining filter coefficients from bandwidth expanded versions of filter coefficients
of intervals prior in time of said encoded digital speech; and
(b) filtering an excitation for said erased interval with said synthesis filter for
said erased interval.
4. The method of claim 3, wherein:
(a) said filter coefficients a_1, a_2, ..., a_M for said synthesis filter for said erased interval are related to said filter coefficients b_1, b_2, ..., b_M for said synthesis filter for an interval prior in time by a_1 = f b_1, a_2 = f^2 b_2, ..., a_M = f^M b_M, where f is a bandwidth expansion factor.
5. A method for decoding digital speech, comprising:
(a) forming an excitation for an erased interval of encoded digital speech by a sum
of an adaptive codebook contribution and a fixed codebook contribution where said
adaptive codebook contribution derives from an excitation and pitch and first gain
of intervals prior in time of said encoded digital speech with said pitch jittered
randomly, and said fixed codebook contribution derives from a second gain of said
intervals prior in time; and
(b) filtering said excitation.
6. The method of claim 5, wherein:
(a) said filtering includes a muting followed by a synthesis with synthesis filter
coefficients derived from synthesis filter coefficients of said intervals prior in
time.
7. The method of claim 6, further comprising:
(a) determining synthesis filter coefficients for said erased interval from bandwidth expanded versions of synthesis filter coefficients of intervals prior in time of said encoded digital speech.
8. A decoder for CELP encoded signals, comprising:
(a) a fixed codebook vector decoder;
(b) a fixed codebook gain decoder;
(c) an adaptive codebook gain decoder;
(d) an adaptive codebook pitch delay decoder;
(e) an excitation generator coupled to said decoders;
(f) a synthesis filter;
(g) a muting gain coupled between an output of said excitation generator and an input
to said synthesis filter;
(h) wherein when a received frame is erased, said decoders generate substitute outputs,
said excitation generator generates a substitute excitation, said synthesis filter
generates substitute filter coefficients, and said muting gain mutes said substitute
excitation.
9. The decoder of claim 8, wherein:
(a) said fixed codebook decoder and said adaptive codebook decoder both generate
said substitute outputs by repeating the outputs for the prior frame.
10. A computer program comprising computer- or machine-readable computer program elements
for configuring a computer to implement the method of any one of claims 1 to 7.
11. A computer program comprising computer- or machine-readable computer program elements
translatable for configuring a computer to implement the method of any one of claims
1 to 7.
12. A carrier medium carrying a computer program according to claim 10 or 11.