Background of the Invention
[0001] The present invention generally relates to digital speech coding at low bit rates,
and more particularly, is directed to an improved method for determining long-term
predictor output responses for code-excited linear prediction speech coders.
[0002] Code-excited linear prediction (CELP) is a speech coding technique which has the
potential of producing high quality synthesized speech at low bit rates, i.e., 4.8
to 9.6 kilobits-per-second (kbps). This class of speech coding, also known as vector-excited
linear prediction or stochastic coding, will most likely be used in numerous speech
communications and speech synthesis applications. CELP may prove to be particularly
applicable to digital speech encryption and digital radiotelephone communication systems
wherein speech quality, data rate, size, and cost are significant issues.
[0003] The term "code-excited" or "vector-excited" is derived from the fact that the excitation
sequence for the speech coder is vector quantized, i.e., a single codeword is used
to represent a sequence, or vector, of excitation samples. In this way, data rates
of less than one bit per sample are possible for coding the excitation sequence. The
stored excitation code vectors generally consist of independent random white Gaussian
sequences. One code vector from the codebook is chosen to represent each block of
N excitation samples. Each stored code vector is represented by a codeword, i.e.,
the address of the code vector memory location. It is this codeword that is subsequently
sent over a communications channel to the speech synthesizer to reconstruct the speech
frame at the receiver. See M.R. Schroeder and B.S. Atal, "Code-Excited Linear Prediction
(CELP): High-Quality Speech at Very Low Bit Rates", Proceedings of the IEEE International
Conference on Acoustics, Speech and Signal Processing (ICASSP), Vol. 3, pp. 937-40,
March 1985, for a more detailed explanation of CELP.
[0004] In a CELP speech coder, the excitation code vector from the codebook is applied to
two time-varying linear filters which model the characteristics of the input speech
signal. The first filter includes a long-term predictor in its feedback loop, which
has a long delay, i.e., 2 to 15 milliseconds, used to introduce the pitch periodicity
of voiced speech. The second filter includes a short-term predictor in its feedback
loop, which has a short delay, i.e., less than 2 msec, used to introduce a spectral
envelope or formant structure. For each frame of speech, the speech coder applies each
individual code vector to the filters to generate a reconstructed speech signal, and
compares the original input speech signal to the reconstructed signal to create an
error signal. The error signal is then weighted by passing it through a weighting
filter having a response based on human auditory perception. The optimum excitation
signal is determined by selecting the code vector which produces the weighted error
signal having the minimum energy for the current frame. The codeword for the optimum
code vector is then transmitted over a communications channel.
[0005] In a CELP speech synthesizer, the codeword received from the channel is used to address
the codebook of excitation vectors. The single code vector is then multiplied by a
gain factor, and filtered by the long-term and short-term filters to obtain a reconstructed
speech vector. The gain factor and the predictor parameters are also obtained from
the channel. It has been found that a better quality synthesized signal can be generated
if the actual parameters used by the synthesizer are used in the analysis stage, thus
minimizing the quantization errors. Hence, the use of these synthesis parameters in
the CELP speech analysis stage to produce higher quality speech is referred to as
analysis-by-synthesis speech coding.
[0006] The short-term predictor attempts to predict the current output sample s(n) by a
linear combination of the immediately preceding output samples s(n-i), according to
the equation:

s(n) = α₁s(n-1) + α₂s(n-2) + ... + αₚs(n-p) + e(n)

where p is the order of the short-term predictor, and e(n) is the prediction residual,
i.e., that part of s(n) that cannot be represented by the weighted sum of p previous
samples. The predictor order p typically ranges from 8 to 12, assuming an 8 kiloHertz
(kHz) sampling rate. The weights α₁, α₂, ..., αₚ in this equation are called the
predictor coefficients. The short-term predictor coefficients are determined from the
speech signal using conventional linear predictive coding (LPC) techniques. The output
response of the short-term filter may be expressed in Z-transform notation as:

1/A(z) = 1/(1 - α₁z⁻¹ - α₂z⁻² - ... - αₚz⁻ᵖ)
[0007] Refer to the article entitled "Predictive Coding of Speech at Low Bit Rates", IEEE
Trans. Commun., Vol. COM-30, pp. 600-14, April 1982, by B.S. Atal, for further discussion
of the short-term filter parameters.
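For illustration, the analysis and synthesis relations of paragraph [0006] can be sketched in Python. This is a hedged sketch, not the coder's implementation: the function names are invented here, and the coefficients in the usage example are arbitrary rather than LPC-derived.

```python
# Sketch of the short-term predictor relations of paragraph [0006].
# Analysis forms the residual e(n) = s(n) - sum_i alpha_i * s(n-i);
# synthesis (the filter 1/A(z)) inverts it exactly.

def short_term_residual(s, alphas):
    """Residual e(n); samples before the start of s are taken as zero."""
    p = len(alphas)
    e = []
    for n in range(len(s)):
        pred = sum(alphas[i] * s[n - 1 - i]
                   for i in range(p) if n - 1 - i >= 0)
        e.append(s[n] - pred)
    return e

def short_term_synthesis(e, alphas):
    """Reconstruct s(n) from the residual by running 1/A(z)."""
    p = len(alphas)
    s = []
    for n in range(len(e)):
        pred = sum(alphas[i] * s[n - 1 - i]
                   for i in range(p) if n - 1 - i >= 0)
        s.append(e[n] + pred)
    return s
```

Round-tripping a short frame through both functions returns the original samples, confirming that the synthesis filter is the exact inverse of the analysis filter.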
[0008] The long-term filter, on the other hand, must predict the next output sample from
preceding samples that extend over a much longer time period. If only a single past
sample is used in the predictor, then the predictor is a single-tap predictor. Typically,
one to three taps are used. The output response for a long-term filter incorporating
a single-tap, long-term predictor is given in z-transform notation as:

1/B(z) = 1/(1 - βz⁻ᴸ)
[0009] Note that this response is a function of only the delay or lag L of the filter and
the filter coefficient β. For voiced speech, the lag L would typically be the pitch
period of the speech, or a multiple of it. At a sampling rate of 8 kHz, a suitable
range for the lag L would be between 16 and 143, which corresponds to a pitch range
between 56 and 500 Hz.
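The lag range quoted above follows directly from the sampling rate; a minimal sketch (the helper names are illustrative, not from the patent):

```python
# Relation between pitch frequency and long-term predictor lag at 8 kHz:
# lag L = sampling_rate / pitch_frequency, rounded to the nearest integer.

FS = 8000  # sampling rate in Hz, as assumed in the text

def pitch_to_lag(pitch_hz):
    return round(FS / pitch_hz)

def lag_to_pitch(lag):
    return FS / lag
```

A 500 Hz pitch maps to lag 16 and a 56 Hz pitch to lag 143, matching the range given in paragraph [0009].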
[0010] The long-term predictor lag L and long-term predictor coefficient β can be determined
from either an open-loop or a closed loop configuration. Using the open-loop configuration,
the lag L and coefficient β are computed from the input signal (or its residual) directly.
In the closed loop configuration, the lag-L, and the coefficient β are computed at
the frame rate from coded data representing the past output of the long-term filter
and the input speech signal. In using the coded data, the long-term predictor lag
determination is based on the actual long-term filter state that will exist at the
synthesizer. Hence, the closed-loop configuration gives better performance than the
open-loop method, since the pitch filter itself would be contributing to the optimization
of the error signal. Moreover, a single-tap predictor works very well in the closed-loop
configuration.
[0011] Using the closed-loop configuration, the long-term filter output response b(n) is
determined from only past output samples from the long-term filter, and from the current
input speech samples s(n) according to the equation:

b(n) = s(n) + βb(n-L),  0 ≤ n ≤ N-1
[0012] This technique is straightforward for pitch lags L which are at least as large
as the frame length N, i.e., when L ≥ N, since the term b(n-L) will always represent
a past sample for all sample numbers n, 0 ≤ n ≤ N-1. Furthermore, in the case L ≥ N, the excitation
gain factor γ and the long-term predictor coefficient β can be simultaneously optimized
for given values of lag L and codeword i. It has been found that this joint optimization
technique yields a noticeable improvement in speech quality.
[0013] If, however, long-term predictor lags L of less than the frame length N must be accommodated,
the closed-loop approach fails. This problem can readily occur in the case of high-pitched
female speech. For example, a female voice corresponding to a pitch frequency of 250
Hz may require a long-term predictor lag L equal to 4 milliseconds (msec). A pitch
of 250 Hz at an 8 kHz sampling rate corresponds to a long-term predictor lag L of
32 samples. It is not desirable, however, to employ frame length N of less than 4
msec, since the CELP excitation vector can be coded more efficiently when longer frame
lengths are used. Accordingly, utilizing a frame length time of 7.5 msec at a sampling
rate of 8 kHz, the frame length N would be equal to 60 samples. This means only 32
past samples would be available to predict the next 60 samples of the frame. Hence,
if the long-term predictor lag L is less than the frame length N, only L past samples
of the required N samples are defined.
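The sample count in this example can be verified with a short script (a numerical illustration only, using the N=60, L=32 figures from the text):

```python
# With frame length N = 60 and lag L = 32, the single-tap term b(n - L)
# refers to a pre-frame (past) sample only while n - L < 0.

N = 60   # 7.5 msec frame at 8 kHz
L = 32   # lag for a 250 Hz pitch at 8 kHz

past = [n for n in range(N) if n - L < 0]        # b(n-L) is defined
undefined = [n for n in range(N) if n - L >= 0]  # b(n-L) not yet available

print(len(past), len(undefined))  # 32 samples covered, 28 undefined
```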
[0014] Several alternative approaches have been taken in the prior art to address the problem
of pitch lags L being less than frame length N. In attempting to jointly optimize
the long-term predictor lag L and coefficient β, the first approach would be to attempt
to solve the equations directly, assuming no excitation signal is present. This approach
is explained in the article entitled "Regular-Pulse Excitation - A Novel Approach
to Effective and Efficient Multipulse Coding of Speech" by Kroon, et al., IEEE Transactions
on Acoustics, Speech, and Signal Processing, Vol. ASSP - 34, No. 5, October 1986,
pp. 1054-1063. However, in following this approach, a nonlinear equation in the single
parameter β must be solved. The solution of the quadratic or cubic in β is computationally
impractical. Moreover,
jointly optimizing the coefficient β with the gain factor γ is still not possible
with this approach.
[0015] A second solution, that of limiting the long-term predictor delay L to be greater
than the frame length N, is proposed by Singhal and Atal in the article "Improving
Performance of Multi-Pulse LPC Coders at Low Bit Rates", Proceedings of the IEEE International
Conference on Acoustics, Speech, and Signal Processing, Vol. 1, March 19-21, 1984,
pp. 1.3.1-1.3.4. This artificial constraint on the pitch lag L often does not accurately
represent the pitch information. Accordingly, using this approach, the voice quality
is degraded for high-pitched speech.
[0016] A third solution is to reduce the size of the frame length N. With a shorter frame
length, the long-term predictor lag L can always be determined from past samples.
This approach, however, suffers from a severe bit rate penalty. With a shorter frame
length, a greater number of long-term predictor parameters and excitation vectors
must be coded, and accordingly, the bit rate of the channel must be greater to accommodate
the extra coding.
[0017] A second problem exists for high pitch speakers. The sampling rate used in the coder
places an upper limit on the performance of a single-tap pitch predictor. For example,
if the pitch frequency is actually 485 Hz, the closest lag value would be 16 which
corresponds to 500 Hz. This results in an error of 15 Hz for the fundamental pitch
frequency which degrades voice quality. This error is multiplied for the harmonics
of the pitch frequency causing further degradation.
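The 15 Hz figure can be reproduced with a couple of lines (an arithmetic sketch; the helper name is illustrative):

```python
# Integer-lag quantization of pitch at an 8 kHz sampling rate:
# the closest lag to a 485 Hz pitch is 16, which represents 500 Hz.

FS = 8000

def quantized_pitch(pitch_hz):
    lag = round(FS / pitch_hz)   # nearest integer lag
    return lag, FS / lag         # lag and the pitch it actually encodes

lag, coded = quantized_pitch(485.0)
print(lag, coded - 485.0)  # 16, an error of 15 Hz at the fundamental
```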
[0018] A need, therefore, exists to provide an improved method for determining the long-term
predictor lag L. The optimum solution must address both the problems of computational
complexity and voice quality for the coding of high-pitched speech.
[0020] IEEE Journal on Selected Areas In Communications, February 1988, Volume 6, pages
353-363, "A Class of Analysis-by-Synthesis Predictive Coders For High Quality Speech
Coding At Rates Between 4.8 and 16 Kbits/s" discloses methods of encoding and reconstructing
speech utilising integer delay values. U.S. patent number 4,569,030, granted February
4, 1986, discloses a recursive filter having a loop using both a delay element with
delay T and a delay element with delay 2T.
Summary Of The Invention
[0021] Accordingly, a general object of the present invention is to provide an improved
digital speech coding technique that produces high quality speech at low bit rates.
[0022] A more specific object of the present invention is to provide a method to determine
long-term predictor parameters using the closed-loop approach.
[0023] Another object of the present invention is to provide an improved method for determining
the output response of a long-term predictor in the case when the long-term predictor
lag parameter L is a non-integer number.
[0024] A further object of the present invention is to provide an improved CELP speech coder
which permits joint optimization of the gain factor gamma and the long-term predictor
coefficient beta during the codebook search for the optimum excitation code vector.
[0025] According to aspects of the invention, there are provided methods of reconstructing
speech as set forth in claims 1 and 9 and apparatuses for reconstructing speech as
set forth in claims 5 and 11. According to further aspects of the invention, there
is provided a method of encoding speech as set forth in claim 13 and an apparatus
for encoding speech as set forth in claim 16.
Brief Description of the Drawings
[0026] The features of the present invention which are believed to be novel are set forth
with particularity in the appended claims. The invention, together with further objects
and advantages thereof, may best be understood by reference to the following description
taken in conjunction with the accompanying drawings, in the several figures of which
like-referenced numerals identify like elements, and in which:
FIG. 1 is a general block diagram of a code-excited linear predictive speech coder,
illustrating the location of a long-term filter for use with the present invention;
FIG. 2A is a detailed block diagram of an embodiment of the long-term filter of FIG.
1, illustrating the long-term predictor response where filter lag L is an integer;
FIG. 2B is a simplified diagram of a shift register which can be used to illustrate
the operation of the long-term predictor in FIG. 2A;
FIG. 2C is a detailed block diagram of another embodiment of the long-term filter
of FIG. 1, illustrating the long-term predictor response where filter lag L is an
integer;
FIG. 3 is a detailed flowchart diagram illustrating the operations performed by the
long-term filter of FIG. 2A;
Figure 4 is a general block diagram of a speech synthesizer for use in accordance
with the present invention;
Figure 5 is a detailed block diagram of the long-term filter of Figure 1, illustrating
the sub-sample resolution long-term predictor response in accordance with the present
invention;
Figures 6A and 6B are detailed flowchart diagrams illustrating the operations performed
by the long-term filter of Figure 5; and
Figure 7 is a detailed block diagram of a pitch post filter for intercoupling the
short-term filter and D/A converter of the speech synthesizer in Figure 4.
Detailed Description of the Preferred Embodiment
[0027] Referring now to Figure 1, there is shown a general block diagram of code excited
linear predictive speech coder 100 utilizing the long-term filter in accordance with
the present invention. An acoustic input signal to be analyzed is applied to speech
coder 100 at microphone 102. The input signal, typically a speech signal, is then
applied to filter 104. Filter 104 generally will exhibit bandpass filter characteristics.
However, if the speech bandwidth is already adequate, filter 104 may comprise a direct
wire connection.
[0028] The analog speech signal from filter 104 is then converted into a sequence of N pulse
samples, and the amplitude of each pulse sample is then represented by a digital code
in analog-to-digital (A/D) converter 108, as known in the art. The sampling rate is
determined by sample clock SC, which represents an 8.0 kHz rate in the preferred embodiment.
The sample clock SC is generated along with the frame clock FC via clock 112.
[0029] The digital output of A/D 108, which may be represented as input speech vector s(n),
is then applied to coefficient analyzer 110. This input speech vector s(n) is repetitively
obtained in separate frames, i.e., blocks of time, the length of which is determined
by the frame clock FC. In the preferred embodiment, input speech vector s(n), 0 ≤
n ≤ N-1 , represents a 7.5 msec frame containing N=60 samples, wherein each sample
is represented by 12 to 16 bits of a digital code. In this embodiment, for each block
of speech, a set of linear predictive coding (LPC) parameters are produced by coefficient
analyzer 110 in an open-loop configuration. The short-term predictor parameters αᵢ,
long-term predictor coefficient β, nominal long-term predictor lag parameter L,
weighting filter parameters WFP, and excitation gain factor γ (along with the best
excitation codeword I as described later) are applied to multiplexer 150 and sent
over the channel for use by the speech synthesizer. Refer to the article entitled
"Predictive Coding of Speech at Low Bit Rates,"
IEEE Trans. Commun., Vol. COM-30, pp. 600-14, April 1982, by B.S. Atal, for representative methods of
generating these parameters for this embodiment. The input speech vector s(n) is also
applied to subtractor 130 the function of which will subsequently be described.
[0030] Codebook ROM 120 contains a set of M excitation vectors uᵢ(n), wherein 1 ≤ i ≤ M,
each comprised of N samples, wherein 0 ≤ n ≤ N-1. Codebook
ROM 120 generates these pseudorandom excitation vectors in response to a particular
one of a set of excitation codewords i. Each of the M excitation vectors is comprised
of a series of random white Gaussian samples, although other types of excitation vectors
may be used with the present invention. If the excitation signal were coded at a rate
of 0.2 bits per sample for each of the 60 samples, then there would be 4096 codewords
i corresponding to the possible excitation vectors.
[0031] For each individual excitation vector uᵢ(n), a reconstructed speech vector s'ᵢ(n)
is generated for comparison to the input speech vector s(n). Gain block 122 scales
the excitation vector uᵢ(n) by the excitation gain factor γ, which is constant for
the frame. The excitation
gain factor γ may be pre-computed by coefficient analyzer 110 and used to analyze
all excitation vectors as shown in Figure 1, or may be optimized jointly with the
search for the best excitation codeword I and generated by codebook search controller
140.
[0032] The scaled excitation signal γuᵢ(n) is then filtered by long-term filter 124
and short-term filter 126 to generate the reconstructed speech vector s'ᵢ(n). Filter
124 utilizes the long-term predictor parameters β and L to introduce voice periodicity,
and filter 126 utilizes the short-term predictor parameters αᵢ to introduce the
spectral envelope, as described above. Long-term filter 124 will
be described in detail in the following figures. Note that blocks 124 and 126 are
actually recursive filters which contain the long-term predictor and short-term predictor
in their respective feedback paths.
[0033] The reconstructed speech vector s'ᵢ(n) for the i-th excitation code vector is
compared to the same block of the input speech vector s(n) by subtracting these two
signals in subtractor 130. The difference vector eᵢ(n) represents the difference
between the original and the reconstructed blocks of speech. The difference vector
is perceptually weighted by weighting filter 132, utilizing the weighting filter
parameters WFP generated by coefficient analyzer 110. Refer to
the preceding reference for a representative weighting filter transfer function. Perceptual
weighting accentuates those frequencies where the error is perceptually more important
to the human ear, and attenuates other frequencies.
[0034] Energy calculator 134 computes the energy of the weighted difference vector
e'ᵢ(n), and applies this error signal Eᵢ to codebook search controller 140. The search
controller compares the i-th error signal for the present excitation vector uᵢ(n)
against previous error signals to determine the excitation vector producing
minimum error. The code of the i-th excitation vector having a minimum error is then
output over the channel as the best excitation code I. In the alternative, search
controller 140 may determine a particular codeword which provides an error signal
having some predetermined criteria, such as meeting a predefined error threshold.
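The search loop of paragraphs [0031]-[0034] can be sketched as follows. This is a toy sketch under stated assumptions: the synthesis filters and perceptual weighting are replaced by a bare gain (identity filters), and the codebook is far smaller than the 4096 entries discussed above.

```python
import random

# Toy analysis-by-synthesis codebook search: scale each excitation
# vector, "filter" it (identity placeholder), and keep the codeword
# whose reconstruction has minimum error energy against the input.

random.seed(0)
N, M = 60, 8  # frame length and toy codebook size
codebook = [[random.gauss(0.0, 1.0) for _ in range(N)] for _ in range(M)]

def search(s, codebook, gain):
    best_i, best_err = -1, float("inf")
    for i, u in enumerate(codebook):
        s_rec = [gain * x for x in u]   # placeholder for the gamma/filter chain
        err = sum((a - b) ** 2 for a, b in zip(s, s_rec))
        if err < best_err:
            best_i, best_err = i, err
    return best_i, best_err

# A frame that exactly matches codeword 3 at gain 0.5 is found again.
target = [0.5 * x for x in codebook[3]]
i, err = search(target, codebook, 0.5)
print(i)  # 3
```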
[0035] Figure 1 illustrates one embodiment of the invention for a code-excited linear predictive
speech coder. In this embodiment, the long-term filter parameters L and β are determined
in an open-loop configuration by coefficient analyzer 110. Alternatively the long-term
filter parameters can be determined in a closed-loop configuration as described in
the aforementioned Singhal and Atal reference. Generally, performance of the speech
coder is improved using long-term filter parameters determined in the closed-loop
configuration. The novel structure of the long-term predictor according to the present
invention greatly facilitates the use of the closed-loop determination of these parameters
for lags L less than the frame length N.
[0036] Figure 2A illustrates an embodiment of long-term filter 124 of Figure 1, where L
is constrained to be an integer. Although Figure 1 shows the scaled excitation vector
γuᵢ(n) from gain block 122 as being input to long-term filter 124, a representative input
speech vector s(n) has been used in Figure 2A for purposes of explanation. Hence,
a frame of N samples of input speech vector s(n) is applied to adder 210. The output
of adder 210 produces the output vector b(n) for the long-term filter 124. The output
vector b(n) is fed back to delay block 230 of the long-term predictor. The nominal
long-term predictor lag parameter L is also input to delay block 230. The long-term
predictor delay block provides output vector q(n) to long-term predictor multiplier
block 220, which scales the long-term predictor response by the long-term predictor
coefficient β. The scaled output βq(n) is then applied to adder 210 to complete the
feedback loop of the recursive filter.
[0037] The output response Hₙ(z) of long-term filter 124 is defined in Z-transform
notation as:

Hₙ(z) = 1/(1 - βz^(-⌊(n+L)/L⌋L))

wherein n represents a sample number of a frame containing N samples, 0 ≤ n ≤ N-1,
wherein β represents a filter coefficient, wherein L represents the nominal lag
or delay of the long-term predictor, and wherein ⌊(n+L)/L⌋ represents the closest
integer less than or equal to (n+L)/L. The long-term predictor delay ⌊(n+L)/L⌋L
varies as a function of the sample number n. Thus, according to the present invention,
the actual long-term predictor delay becomes kL, wherein L is the basic or nominal
long-term predictor lag, and wherein k is an integer chosen from the set {1, 2, 3,
4, .....} as a function of the sample number n. Accordingly, the long-term filter
output response b(n) is a function of the nominal long-term predictor lag parameter
L and the filter state FS which exists at the beginning of the frame. This statement
holds true for all values of L -- even for the problematic case when the pitch
lag L is less than the frame length N.
[0038] The function of the long-term predictor delay block 230 is to store the current input
samples in order to predict future samples. Figure 2B represents a simplified diagram
of a shift register, which may be helpful in understanding the operation of long-term
predictor delay block 230 of Figure 2A. For sample number ℓ such that n=ℓ, the current
output sample b(n) is applied to the input of the shift register, which is shown on
the right on Figure 2B. For the next sample n=ℓ+1 , the previous sample b(n) is shifted
left into the shift register. This sample now becomes the first past sample b(n-1).
For the next sample n=ℓ+2 , another sample of b(n) is shifted into the register, and
the original sample is again shifted left to become the second past sample b(n-2).
After L samples have been shifted in, the original sample has been shifted left L
number of times such that it may be represented as b(n-L).
[0039] As mentioned above, the lag L would typically be the pitch period of voiced speech
or a multiple of it. If the lag L is at least as long as the frame length N, a sufficient
number of past samples have been shifted in and stored to predict the next frame of
speech. Even in the extreme case where L=N and n=N-1, b(n-L) will be b(-1),
which is indeed a past sample. Hence, the sample b(n-L) would be output from the shift
register as the output sample q(n).
[0040] If, however, the long-term predictor lag parameter L is shorter than the frame length
N, then an insufficient number of samples would have been shifted into the shift register
by the beginning of the next frame. Using the above example of a 250 Hz pitch period,
the pitch lag L would be equal to 32. Thus, where L=32 and N=60, and where n=N-1=59,
b(n-L) would normally be b(27), which represents a future sample with respect to the
beginning of the frame of 60 samples. In other words, not enough past samples have
been stored to provide a complete long-term predictor response. The complete long-term
predictor response is needed at the beginning of the frame such that closed-loop analysis
of the predictor parameters can be performed. According to the invention in that case,
the same stored samples b(n-L), 0 ≤ n ≤ L, are repeated such that the output response
of the long-term predictor is always a function of samples which have been input into
the long-term predictor delay block prior to the start of the current frame. In terms
of Figure 2B, the shift register has thus been extended to store another kL samples,
which represent modifying the structure of the long-term predictor delay block 230.
Hence, as the shift register fills with new samples b(n), k must be chosen such that
b(n-kL) represents a sample which existed in the shift register prior to the start
of the frame. Using the previous example of L=32 and N=60, output sample q(32) would
be a repeat of sample q(0), which is b(0-L) = b(32-2L) = b(-32).
[0041] Hence, the output response q(n) of the long-term predictor delay block 230 would
correspond to:

q(n) = b(n - kL)

wherein 0 ≤ n ≤ N-1, and wherein k is chosen as the smallest integer such that (n-kL)
is negative. More specifically, if a frame of N samples of s(n) is input into long-term
predictor filter 124, each sample number n satisfies j ≤ n ≤ N+j-1,
where j is the index for the first sample of the frame of N samples. Hence, the variable
k would vary such that (n-kL) is always less than j. This ensures that the long-term
predictor utilizes only samples available prior to the beginning of the frame to predict
the output response.
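The choice of k can be expressed compactly; a small sketch (function names invented for illustration) showing that the closed form ⌊(n+L)/L⌋ is the smallest k making (n - kL) negative:

```python
import math

# k = floor((n+L)/L) is the smallest integer for which n - k*L < 0,
# so q(n) = b(n - k*L) always reads a pre-frame sample.

def k_of(n, L):
    return math.floor((n + L) / L)

def delayed_index(n, L):
    return n - k_of(n, L) * L   # the index used by q(n) = b(n - k*L)
```

For L=32 and N=60 every index in the frame maps below zero; e.g. n=56 gives k=2 and index -8, while using k-1 instead would give a non-negative (undefined) index.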
[0042] The operation of long-term filter 124 of Figure 2A will now be described in accordance
with the flowchart of Figure 3. Starting at step 350, the sample number n is initialized
to zero at step 351. The nominal long-term predictor lag parameter L and the long-term
predictor coefficient β are input from coefficient analyzer 110 in step 352. In step
353, the sample number n is tested to see if an entire frame has been output. If n
≥ N, operation ends at step 361. If all samples have not yet been computed, a signal
sample s(n) is input in step 354. In step 355, the output response of long-term predictor
delay block 230 is calculated according to the equation:

q(n) = b(n - ⌊(n+L)/L⌋L)

wherein ⌊(n+L)/L⌋ represents the closest integer less than or equal to (n+L)/L. For
example, if n=56 and L=32, then ⌊(n+L)/L⌋L becomes ⌊(56+32)/32⌋L, which is ⌊2.75⌋L,
or 2L. In step 356, the output response b(n) of the long-term filter is computed
according to the equation:

b(n) = s(n) + βq(n)
[0043] This represents the function of multiplier 220 and adder 210. In step 357, the samples
in the shift register are shifted left one position, for all register locations between
b(n-2) and b(n-Lₘₐₓ), where Lₘₐₓ represents the maximum long-term predictor lag that
can be assigned. In the preferred embodiment, Lₘₐₓ would be equal to 143. In step 358,
the output sample b(n) is input into the first
location b(n-1) of the shift register. Step 359 outputs the filtered sample b(n).
The sample number n is then incremented in step 360, and then tested in step 353.
When all N samples have been computed, the process ends at step 361.
[0044] Figure 2C is an alternative embodiment of a long-term filter incorporating the present
invention. Filter 124' is the feedforward inverse version of the recursive filter
configuration of Figure 2A. Input vector s(n) is applied to both subtracter 240 and
long-term predictor delay block 260. Delayed vector q(n) is output to multiplier 250,
which scales the vector by the long-term predictor coefficient β. The output response
Hₙ(z) of digital filter 124' is given in z-transform notation as:

Hₙ(z) = 1 - βz^(-⌊(n+L)/L⌋L)

wherein n represents the sample number of a frame containing N samples, 0 ≤ n ≤ N-1,
wherein β represents the long-term filter coefficient, wherein L represents the
nominal lag or delay of the long-term predictor, and wherein ⌊(n+L)/L⌋ represents
the closest integer less than or equal to (n+L)/L. The output signal b(n) of filter
124' may also be defined in terms of the input signal s(n) as:

b(n) = s(n) - βs(n - ⌊(n+L)/L⌋L)
for 0 ≤ n ≤ N-1. As can be appreciated by those skilled in the art, the structure
of the long-term predictor has again been modified so as to repeatedly output the
same stored samples of the long-term predictor in the case when the long-term predictor
lag L is less than the frame length N.
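A sketch of the inverse (feedforward) form of Figure 2C, under the same indexing rule; as before, the dict-based history and function name are illustrative, not the patent's implementation:

```python
import math

# Feedforward inverse filter: b(n) = s(n) - beta * s(n - k*L),
# with k = floor((n+L)/L), so only pre-frame *input* samples are delayed.

def inverse_long_term_filter(s, L, beta, past):
    x = dict(past)                  # pre-frame input samples, negative keys
    for n, v in enumerate(s):
        x[n] = v
    out = []
    for n in range(len(s)):
        k = math.floor((n + L) / L)
        out.append(x[n] - beta * x[n - k * L])
    return out
```

With an all-zero history the filter passes the input through unchanged, since every delayed term reads a pre-frame zero.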
[0045] Referring next to Figure 5, there is illustrated the preferred embodiment of the
long-term filter 124 of Figure 1, which allows for sub-sample resolution for the lag
parameter L. A frame of N samples of input speech vector s(n) is applied to adder 510.
The output of adder 510 produces the output vector b(n) for the long-term filter 124.
The output vector b(n) is fed back to delayed vector generator block 530 of the long-term
predictor. The nominal long-term predictor lag parameter L is also input to delayed
vector generator block 530. The long-term predictor lag parameter L can take on non-integer
values. The preferred embodiment allows L to take on values which are a multiple of
one half. Alternate implementations of the sub-sample resolution long-term predictor
of the present invention could allow values which are multiples of one third or one
fourth or any other rational fraction.
[0046] In the preferred embodiment, the delayed vector generator 530 includes a memory which
holds past samples of b(n). In addition, interpolated samples of b(n) are also calculated
by delayed vector generator 530 and stored in its memory. In the preferred embodiment,
the state of the long-term predictor which is contained in delayed vector generator
530 has two samples for every stored sample of b(n). One sample is for b(n) and the
other sample represents an interpolated sample between two consecutive b(n) samples.
In this way, samples of b(n) can be obtained from delayed vector generator 530 which
correspond to integer delays or multiples of half sample delays. The interpolation
is done using interpolating finite impulse response filters as described in the book
by R. Crochiere and L. Rabiner entitled
Multirate Digital Signal Processing, published by Prentice-Hall in 1983. The operation of delayed vector generator 530 is
described in further detail hereinbelow in conjunction with the flowcharts in Figures
6A and 6B.
[0047] Delayed vector generator 530 provides output vector q(n) to long-term multiplier
block 520, which scales the long-term predictor response by the long-term predictor
coefficient β. The scaled output βq(n) is then applied to adder 510 to complete the
feedback loop of the recursive filter 124 in Figure 5.
[0048] Referring to Figures 6A and 6B, there are illustrated detailed flowchart diagrams
of the operations performed by the long-term filter of Figure 5. According
to the preferred embodiment of the present invention, the resolution of the long-term
predictor memory is extended by mapping an N point sequence b(n) onto a 2N point
vector ex(i). The negative indexed samples of ex(i) contain the extended resolution
past values of long-term filter output b(n), or the extended resolution long term
history. The mapping process doubles the temporal resolution of the long-term predictor
memory each time it is applied. Here, for simplicity, a single-stage mapping is described,
although additional stages may be implemented in other embodiments of the present
invention.
[0049] Entering at START step 602 in Figure 6A, the flowchart proceeds to step 604, where
L, β and s(n) are input. At step 608, vector q(n) is constructed according to the
equation:

q(n) = ex(2(n - ⌊(n+L)/L⌋·L))

for 0 ≤ n ≤ N-1
wherein ⌊(n+L)/L⌋ represents the closest integer less than or equal to (n+L)/L and wherein L is the
long term predictor lag. For voiced speech, long term predictor lag L may be the pitch
period or a multiple of the pitch period. L may be an integer or a real number whose
fractional part is 0.5 in the preferred embodiment. When the fractional part of L
is 0.5, L has an effective resolution of half a sample.
[0050] In step 610, vector b(n) of the long-term filter is computed according to the equation:

b(n) = s(n) + βq(n)

for 0 ≤ n ≤ N-1
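Steps 608 and 610 above can be sketched as follows. This is a minimal illustration, not the claimed implementation: the extended-resolution state ex(i) is assumed to be held as a dictionary keyed by integer index (negative indices holding past samples, two per original sample), and the function and variable names are hypothetical.

```python
import math

def long_term_filter(s, ex, L, beta):
    # Sketch of steps 608-610: build the delayed vector q(n) from the
    # extended-resolution state ex, then form b(n) = s(n) + beta*q(n).
    # L may be an integer or a multiple of one half, so 2L is an integer.
    two_L = int(round(2 * L))
    q, b = [], []
    for n, x in enumerate(s):
        k = math.floor((n + L) / L)     # closest integer <= (n+L)/L
        q_n = ex[2 * n - k * two_L]     # q(n) = ex(2(n - kL)), always a past sample
        q.append(q_n)
        b.append(x + beta * q_n)        # b(n) = s(n) + beta*q(n)
    return q, b
```

For a lag such as L = 2.5, the index 2(n - ⌊(n+L)/L⌋·L) remains negative for every n in the frame, so only stored history is ever referenced, which is the property that keeps the joint optimization linear.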
[0051] In step 612, vector b(n) of the long-term filter is output. In step 614, the extended
resolution state ex(n) is updated to generate and store the interpolated values of
b(n) in the memory of delayed vector generator 530. Step 614 is illustrated in more
detail in Figure 6B. Next, at step 616 the process has been completed and stops.
[0052] Entering at START step 622 in Figure 6B, the flowchart proceeds to step 624, where
the samples in ex(i) to be calculated in this subframe are zeroed out: ex(i) = 0 for
i = -M, -M+2, ..., 2N-1, where M is chosen to be odd for an interpolating filter of
order 2M+1. For example, if the order of the filter is 39, M is 19. Although M has
been chosen to be odd for simplicity, M may also be even. At step 626, every other
sample of ex(i) for i = 0, 2, ..., 2(N-1) is initialized with samples of b(n) according
to the equation:

ex(2i) = b(i)

for i = 0, 1, ..., N-1.
[0053] Thus ex(i) for i = 0, 2, ..., 2(N-1) now holds the output vector b(n) for the current
frame mapped onto its even indices, while the odd indices of ex(i) for i = 1, 3, ...,
2(N-1)+1 are initialized with zeros.
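The zeroing and mapping of steps 624 and 626 can be sketched as below; a dictionary stands in for the extended-resolution memory, and the function name is illustrative only.

```python
def init_extended(b, M):
    # Step 624: zero the samples to be calculated this subframe,
    # ex(i) = 0 for i = -M, -M+2, ..., 2N-1 (odd indices, with M odd).
    # Step 626: map b(n) onto the even indices, ex(2i) = b(i).
    N = len(b)
    ex = {i: 0.0 for i in range(-M, 2 * N, 2)}
    for i in range(N):
        ex[2 * i] = b[i]
    return ex
```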
[0054] At step 628, the interpolated samples of ex(i) initialized to zero are reconstructed
through FIR interpolation, using a symmetric, zero-phase shift filter, assuming that
the order of such FIR filter is 2M+1 as explained hereinabove. The FIR filter coefficients
are a(j), where j = -M, -M+1, ..., M-1, M and where a(j) = a(-j). Only even samples
pointed to by the FIR filter taps are used in sample reconstruction, since odd samples
have been set to zero. As a result, M+1 samples instead of 2M+1 samples are actually
weighted and summed for each reconstructed sample. The FIR interpolation is performed
according to the equation:

ex(i) = Σ a(j)·ex(i+j), summed over j = -M, -M+2, ..., M

for i = -M, -M+2, ..., 2(N-1)-M-2, 2(N-1)-M
[0055] Note that the first sample to be reconstructed is ex(-M), not ex(1) as one might
expect. This is because interpolated samples at indices -M, -M+2, ..., -1 were reconstructed
at the previous frame using an estimate of the excitation in the current frame, since
the actual excitation samples were then undefined. At the current frame those samples
are known (we have b(n)), and thus the samples of ex(i), for i = -M, -M+2, ..., -1, are now
reconstructed again, with the filter taps pointing to the actual and not estimated
values of b(n).
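The FIR reconstruction of step 628 can be sketched as follows; it assumes the taps a(j) are supplied as a dictionary and that the buffer already holds the even-indexed samples (including those of the previous frame) reached by the lowest taps.

```python
def interpolate_odd(ex, a, M, N):
    # Step 628: reconstruct each zeroed odd sample of ex with a symmetric,
    # zero-phase FIR filter of order 2M+1.  Only the M+1 odd-offset taps
    # are summed, since those are the ones landing on even (actual) samples.
    for i in range(-M, 2 * (N - 1) - M + 1, 2):   # i = -M, -M+2, ..., 2(N-1)-M
        ex[i] = sum(a[j] * ex[i + j] for j in range(-M, M + 1, 2))
    return ex
```

With M = 1 and taps a(-1) = a(1) = 0.5 this degenerates to linear interpolation between neighbouring even samples, which makes the mechanism easy to check by hand.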
[0056] The largest value of i in the above equation is 2(N-1)-M. This means that (M+1)/2
odd samples of ex(i), for i = 2N-M, 2N-M+2, ..., 2(N-1)+1, still are to be reconstructed.
However, for those values of index i, the upper taps of the interpolating filter point
to the future samples of the excitation which are as yet undefined. To calculate the
values of ex(i) for those indices, the future state of ex(i) for i = 2N, 2N+2, ..., 2N+M-1
is extended by evaluating at step 630:

ex(i) = λ·ex(i-2L)

for i = 2N, 2N+2, ..., 2N+M-1
[0057] The minimum value of 2L to be used in this scheme is 2M+1. This constraint may be
lifted if we define:

ex(i) = λ·F(i-2L)

for i = 2N, 2N+2, ..., 2N+M-1;
where F(i-2L) for i-2L equal to odd numbers is given by:

F(i-2L) = Σ a(j)·ex(i-2L+j), summed over j = -M, -M+2, ..., M

and where F(i-2L) for i-2L equal to even numbers is given by:

F(i-2L) = ex(i-2L)
[0058] The parameter λ, the history extension scaling factor, may be set equal to β, which
is the pitch predictor coefficient, or set to unity.
[0059] At step 632, with the excitation history thus extended, the last (M+1)/2 zeroed samples
of the current extended resolution frame are calculated using:

ex(i) = Σ a(j)·ex(i+j), summed over j = -M, -M+2, ..., M

for i = 2N-M, 2N-M+2, ..., 2(N-1)+1
[0060] These samples will be recalculated at the next subframe, once the actual excitation
samples for ex(i), i=2N,2N+2,...,2N+M-1 become available.
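Steps 630 and 632 can be sketched together as below; the λ-scaled repetition assumes the simple case 2L ≥ 2M+1 (an integer), so no generalized F(·) is needed, and the names are illustrative.

```python
def extend_and_finish(ex, a, lam, L, M, N):
    two_L = int(round(2 * L))
    # Step 630: extend the future state by lambda-scaled repetition,
    # ex(i) = lambda * ex(i - 2L) for i = 2N, 2N+2, ..., 2N+M-1.
    for i in range(2 * N, 2 * N + M, 2):
        ex[i] = lam * ex[i - two_L]
    # Step 632: reconstruct the last (M+1)/2 odd samples with the same
    # interpolating taps a(j), for i = 2N-M, 2N-M+2, ..., 2(N-1)+1.
    for i in range(2 * N - M, 2 * N, 2):
        ex[i] = sum(a[j] * ex[i + j] for j in range(-M, M + 1, 2))
    return ex
```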
[0061] Thus b(n), for n = 0, ..., N-1, has been mapped onto vector ex(i), i = 0, 2, ..., 2(N-1). The missing
zeroed samples have been reconstructed using an FIR interpolating filter. Note that
the FIR interpolation is applied only to the missing samples. This ensures that no
distortion is unnecessarily introduced into the known samples, which are stored at
even indices of ex(i). An additional benefit of processing only the missing samples
is that the computation associated with the interpolation is halved.
[0062] At step 634, finally the long term predictor history is updated by shifting down
the contents of the extended resolution excitation vector ex(i) by 2N points:

ex(i) = ex(i + 2N)

for i = -2Max_L, ..., -1
where Max_L is the maximum long term predictor delay used. Next, at step 636 the process
has been completed and stops.
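The history update of step 634 amounts to a shift by 2N positions; a minimal sketch, with Max_L passed in as the maximum lag:

```python
def shift_history(ex, N, max_L):
    # Step 634: shift the extended-resolution buffer down by 2N points,
    # keeping ex(i) = ex(i + 2N) for i = -2*Max_L, ..., -1.
    return {i: ex[i + 2 * N] for i in range(-2 * max_L, 0)}
```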
[0063] Referring now to Figure 4, a speech synthesizer block diagram is illustrated using
the long-term filter of the present invention. Synthesizer 400 obtains the short-term
predictor parameters αi, long-term predictor parameters β and L, excitation gain factor γ and the codeword
I received from the channel, via de-multiplexer 450. The codeword I is applied to
codebook ROM 420 to address the codebook of excitation vectors. The single excitation
vector uI(n) is then multiplied by the gain factor γ in block 422, filtered by long-term predictor
filter 424 and short-term predictor filter 426 to obtain reconstructed speech vector
s'i(n). This vector, which represents a frame of reconstructed speech, is then applied
to digital-to-analog (D/A) converter 408 to produce a reconstructed analog signal,
which is then low pass filtered to reduce aliasing by filter 404, and applied to an
output transducer such as speaker 402. Hence, the CELP synthesizer utilizes the same
codebook, gain block, long-term filter, and short-term filter as the CELP analyzer
of Figure 1.
[0064] Figure 7 is a detailed block diagram of a pitch post filter for intercoupling the
short term filter 426 and D/A converter 408 of the speech synthesizer in Figure 4.
A pitch post filter enhances the speech quality by removing noise introduced by the
filters 424 and 426. A frame of N samples of reconstructed speech vector s'i(n) is
applied to adder 710. The output of adder 710 produces the output vector s''i(n) for
the pitch post filter. The output vector s''i(n) is fed back to delayed sample generator
block 730 of the pitch post filter. The nominal long-term predictor lag parameter L
is also input to delayed sample generator block 730. L may take on non-integer values
for the present invention. If L is a non-integer, an interpolating FIR filter is used
to generate the fractional sample delay needed. Delayed sample generator 730 provides
output vector q(n) to multiplier block 720, which scales the pitch post filter response
by coefficient R, which is a function of the long-term predictor coefficient β. The
scaled output Rq(n) is then applied to adder 710 to complete the feedback loop of the
pitch post filter in Figure 7.
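For the integer-lag case, the Figure 7 feedback loop can be sketched as below (for a non-integer L the delayed sample generator would interpolate instead); the function and parameter names are illustrative only.

```python
def pitch_postfilter(s, L, R, history):
    # Each output sample adds back an R-scaled copy of the post filter
    # output delayed by L samples: s''(n) = s'(n) + R * s''(n - L).
    # history holds the L most recent output samples, oldest first.
    state = list(history)
    out = []
    for x in s:
        y = x + R * state[-L]
        out.append(y)
        state.append(y)
    return out
```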
[0065] In utilizing the long-term predictor response according to the present invention,
the excitation gain factor γ and the long-term predictor coefficient β can be simultaneously
optimized for all values of L in a closed-loop configuration. This joint optimization
technique was heretofore impractical for values of L < N, since the joint optimization
equations would become non-linear in the single parameter β. The present invention
modifies the structure of the long-term predictor to allow a linear joint optimization
equation. In addition, the present invention allows the long-term predictor lag to
have better resolution than one sample thereby enhancing its performance.
[0066] Moreover, the codebook search procedure has been further simplified, since the zero
state response of the long-term filter becomes zero for lags less than the frame length.
This additional feature permits those skilled in the art to remove the effect of the
long-term filter from the codebook search procedure. Hence, a CELP speech coder has
been shown which can provide higher quality speech for all pitch rates while retaining
the advantages of practical implementation and low bit rate.
[0067] While specific embodiments of the present invention have been shown and described
herein, further modifications and improvements may be made. For example, any type
of speech coding (e.g., RELP, multipulse, RPE, LPC, etc.) may be used with the sub-sample
resolution long-term predictor filtering technique described herein. Moreover, additional
equivalent configurations of the sub-sample resolution long-term predictor structure
may be made which perform the same computations as those illustrated above.
1. A method of reconstructing speech comprising the steps of:
receiving from a communication channel a set of speech parameters including a codeword
I and a delay parameter L, where the delay parameter L may have a value in a predetermined
range including integer and non-integer values related to a speech pitch period;
generating an excitation vector having N samples in response to the codeword I;
filtering the excitation vector based on at least the delay parameter L and stored
filter state samples, the step of filtering comprising the steps of:
computing interpolated filter state samples from the stored filter state samples using
a non-integer L, and
linearly combining the excitation vector with the interpolated filter state samples,
thereby forming a filter output vector having a plurality of filter output samples;
and processing the filter output vector to produce reconstructed speech,
wherein, in case of the delay parameter L < N and the delay parameter L being an integer,
the filtering step comprises repeating use of at least some of the same stored filter
state samples in producing a delayed vector for linear combination with the excitation
vector to form the filter output vector.
2. A method of reconstructing speech in accordance with claim 1 further characterized in that the step of filtering comprises combining, responsive to L being an integer, the
excitation vector with the stored filter state samples, thereby forming filter state
output samples.
3. A method of reconstructing speech in accordance with claim 1 further characterized in that the step of filtering comprises updating the stored filter state samples using the
filter output samples.
4. A method of reconstructing speech in accordance with claim 1 further comprising the
steps of: converting the reconstructed speech to an analog voice signal; and transducing
the analog voice signal into a perceptible audio output, such that the speech pitch
periods are more accurately predicted.
5. Apparatus for reconstructing speech comprising: receiving circuitry (450) receiving
from a communication channel a set of speech parameters including a codeword I and
a delay parameter L;
generating circuitry (420) for generating an excitation vector having N samples in
response to the codeword I;
filtering circuitry (124, 424) for filtering the excitation vector based on at least
the delay parameter L and stored filter state samples,
characterised in that L has a value in a predetermined range including integer and non-integer values related
to a speech pitch period and
in that the filtering circuitry comprises:
computing circuitry (530) for computing interpolated filter state samples from the
stored filter state samples using a non-integer L, and
combining circuitry (510, 520) for linearly combining the excitation vector with the
interpolated filter state samples, thereby forming a filter output vector having a
plurality of filter output samples, and
processing circuitry (426) for processing the filter output vector to produce reconstructed
speech,
wherein, in case of the delay parameter L < N and the delay parameter L being an integer,
the filtering step comprises repeating use of at least some of the same stored filter
state samples in producing a delayed vector for linear combination with the excitation
vector to form the filter output vector.
6. Apparatus for reconstructing speech in accordance with claim 5 wherein the combining
circuitry further comprises combining, responsive to L being an integer, the excitation
vector with the stored filter state samples, thereby forming filter state output samples.
7. Apparatus for reconstructing speech in accordance with claim 5 wherein the filtering
circuitry further comprises updating circuitry for updating the stored filter state
samples using the filter output samples.
8. Apparatus for reconstructing speech in accordance with claim 5 further comprising:
converting circuitry for converting the reconstructed speech to an analog voice signal;
and
transducer circuitry for transducing the analog voice signal into a perceptible audio
output, such that the speech pitch periods are more accurately predicted.
9. A method of reconstructing speech comprising the steps of:
receiving from a communication channel a set of speech parameters including a codeword
I and a delay parameter L;
generating an excitation vector having N samples in response to the codeword I;
filtering the excitation vector based on at least the delay parameter L, a set of
stored filter state samples, and at least one set of stored interpolated filter state
samples, wherein L has a value in a predetermined range including integer and non-integer
values related to a speech pitch period, and the step of filtering comprises the steps
of:
choosing a set of filter state samples from the group consisting of the set of stored
filter state samples and the at least one set of stored interpolated filter state
samples, the step of choosing using the delay parameter L, and
linearly combining the excitation vector with the chosen filter state samples, thereby
forming a filter output vector having a plurality of filter output samples, and
processing the filter output vector to produce reconstructed speech,
wherein, in case of the delay parameter L < N and the delay parameter L being an integer,
the filtering step comprises repeating use of at least some of the same stored filter
state samples in producing a delayed vector for linear combination with the excitation
vector to form the filter output vector.
10. A method of reconstructing speech in accordance with claim 9 further comprising the
steps of:
converting the reconstructed speech to an analog voice signal; and transducing the
analog voice signal into a perceptible audio output, such that the speech pitch periods
are more accurately predicted.
11. Apparatus for reconstructing speech comprising:
receiving circuitry (450) for receiving from a communication channel a set of speech
parameters including a codeword I and a delay parameter L, where L may have a value
in a predetermined range including integer and non-integer values related to a speech
pitch period;
generating circuitry (420) for generating an excitation vector having N samples in
response to the codeword I;
filtering circuitry (124, 424) for filtering the excitation vector based on at least
the delay parameter L, a set of stored filter state samples and at least one set of
stored interpolated filter state samples, the filtering circuitry comprising:
choosing circuitry for choosing a set of filter state samples from the group consisting
of the set of stored filter state samples and the at least one set of stored interpolated
filter state samples, the step of choosing using the delay parameter L, and
combining circuitry for linearly combining the excitation vector with the chosen filter
state samples, thereby forming a filter output vector having a plurality of filter
output samples; and
processing circuitry (426) for processing the filter output vector to produce reconstructed
speech,
wherein in case of the delay parameter L < N and the delay parameter L being an integer,
the filtering step comprises repeating use of at least some of the same stored filter
state samples in producing a delayed vector for linear combination with the excitation
vector to form the filter output vector.
12. Apparatus for reconstructing speech in accordance with claim 11 further comprising:
converting circuitry for converting the reconstructed speech to an analog voice signal;
and
transducing circuitry for transducing the analog voice signal into a perceptible audio
output, such that the speech pitch periods are more accurately predicted.
13. A method of encoding speech into sets of speech parameters for transmission on a communication
channel, the method comprising the steps of:
sampling a voice signal a plurality of times to provide a plurality of samples forming
a present speech vector;
generating a delay parameter L having a value in a predetermined range including integer
and non-integer values related to a speech pitch period of the present speech vector;
searching excitation vectors to determine a codeword I that best matches the present
speech vector, the step of searching comprising the steps of:
generating excitation vectors in response to corresponding codewords, each excitation
vector having N samples,
filtering each excitation vector based on at least the delay parameter L, a set of
stored filter state samples, and at least one set of interpolated filter state samples,
wherein L has a value in a predetermined range including integer and
non-integer values related to a speech pitch period, the step of filtering comprising
the steps of:
computing the interpolated filter state samples from the stored filter state samples
using a non-integer L, and
linearly combining the excitation vector with the interpolated filter state samples,
thereby forming a filter output vector having a plurality of filter output samples,
selecting the codeword I of the excitation vector for which speech synthesized using
the non-integer L differs the least from the voice signal; and
transmitting the selected parameter L together with preselected speech parameters
for the present speech vector on the communications channel, such that the speech
pitch periods are more accurately predicted,
wherein, in case of the delay parameter L < N and the delay parameter L being an integer,
the filtering step comprises repeating use of at least some of the same stored filter
state samples in producing a delayed vector for linear combination with the excitation
vector to form the filter output vector.
14. A method of encoding speech in accordance with claim 13, further
characterized in that the step of searching the excitation vectors comprises:
processing the filter output vector to produce a reconstructed speech vector; and
comparing the reconstructed speech vector to the present speech vector to determine
the difference therebetween.
15. A method of encoding speech in accordance with claim 13, further characterized in that the step of selecting the codeword I comprises selecting the codeword I of the excitation
vector for which the reconstructed speech vector differs the least from the present
speech vector.
16. Apparatus for encoding speech into sets of speech parameters for transmission on a
communication channel, the apparatus comprising:
sampling circuitry for sampling a voice signal a plurality of times to provide a plurality
of samples forming a present speech vector;
generating circuitry for generating a delay parameter L having a value in a predetermined
range including integer and non-integer values related to a speech pitch period of
the present speech vector;
searching circuitry for searching excitation vectors to determine a codeword I that
best matches the present speech vector, the searching circuitry comprising:
generating circuitry for generating excitation vectors in response to corresponding
codewords, each excitation vector having N samples;
filtering circuitry for filtering each excitation vector based on at least the delay
parameter L, a set of stored filter state samples and at least one set of stored interpolated
filter state samples, the filtering circuitry comprising:
choosing circuitry for choosing a set of filter state samples from the group consisting
of the set of stored filter state samples and the at least one set of stored interpolated
filter state samples, the choosing circuitry using the delay parameter L, and
combining circuitry for linearly combining the excitation vector with chosen filter
state samples, thereby forming a filter output vector having a plurality of filter
output samples;
processing circuitry for processing the filter output vector to produce a reconstructed
speech vector;
comparing circuitry for comparing the reconstructed speech vector to the present speech
vector to determine the difference therebetween;
selecting circuitry for selecting the codeword I of the excitation vector for which
the reconstructed speech vector differs the least from the present speech vector;
and
transmitting circuitry for transmitting the selected codeword I and delay parameter
L together with pre-selected speech parameters for the present speech vector on the
communications channel, such that the speech pitch periods are more accurately predicted,
wherein, in case of the delay parameter L < N and the delay parameter L being an integer,
the filtering step comprises repeating use of at least some of the same stored filter
state samples in producing a delayed vector for linear combination with the excitation
vector to form the filter output vector.
1. Verfahren zum Rekonstruieren von Sprache mit den folgenden Schritten:
Empfangen eines Satzes von Sprachparametern, die ein Codewort I und einen Verzögerungsparameter
L enthalten, von einem Kommunikationskanal, wobei der Verzögerungsparameter L einen
Wert in einem vorgegebenen Bereich aufweisen kann, der ganzzahlige und nicht ganzzahlige
Werte, die sich auf eine Tonhöhenperiodizität der Sprache beziehen, enthält;
Erzeugen eines Anregungsvektors mit N Abtastungen in Reaktion auf das Codewort I;
Filtern des Anregungsvektors zumindest basierend auf dem Verzögerungsparameter L und
gespeicherten Filterzustandsabtastungen, wobei der Schritt des Filterns die folgenden
Schritte umfasst:
Berechnen interpolierter Filterzustandsabtastungen aus den gespeicherten Filterzustandsabtastungen
unter Verwendung eines nicht ganzzahligen L-Werts, und
lineares Kombinieren des Anregungsvektors mit den interpolierten Filterzustandsabtastungen,
um dadurch einen Filterausgangsvektor mit einer Vielzahl Filterausgangsabtastungen zu bilden;
und Verarbeiten des Filterausgangsvektors, um rekonstruierte Sprache zu erzeugen,
wobei, wenn der Verzögerungsparameter L < N und der Verzögerungsparameter L ein ganzzahliger
Wert ist, der Filterungsschritt eine Wiederholung der Benutzung zumindest eines Teils
der gleichen gespeicherten Filterzustandsabtastungen bei der Erzeugung eines Verzögerungsvektors
zur Linearkombination mit dem Anregungsvektor umfasst, um den Filterausgangsvektor
zu erzeugen.
2. Verfahren zum Rekonstruieren von Sprache nach Anspruch 1, weiter dadurch gekennzeichnet, dass der Schritt des Filterns in Reaktion darauf, dass L ganzzahlig ist, das Kombinieren
des Anregungsvektors mit den gespeicherten Filterzustandsabtastungen enthält, um dadurch Filterzustandsausgangsabtastungen zu bilden.
3. Verfahren zum Rekonstruieren von Sprache nach Anspruch 1, weiter dadurch gekennzeichnet, dass der Schritt des Filterns das Aktualisieren der gespeicherten Filterzustandsabtastungen
unter Verwendung der Filterausgangsabtastungen enthält.
4. Verfahren zum Rekonstruieren von Sprache nach Anspruch 1, weiter die folgenden Schritte
umfassend:
Konvertieren der rekonstruierten Sprache in ein analoges Sprachsignal; und
Umwandeln des analogen Sprachsignals in eine wahmehmbare Audioausgabe, so dass die
Tonhöhenperiodizitäten der Sprache genauer vorhergesagt werden.
5. Vorrichtung zum Rekonstruieren von Sprache umfassend:
einen Empfangsschaltkreis (450) zum Empfangen eines Satzes Sprachparameter von einem
Kommunikationskanal, die ein Codewort I und einen Verzögerungsparameter L enthalten;
einen Erzeugungsschaltkreis (420) zum Erzeugen, eines Anregungsvektors mit N Abtastungen
in Reaktion auf das Codewort I;
einen Filterschaltkreis (124, 424) zum Filtern des Anregungsvektors zumindest basierend
auf dem Verzögerungsparameter L und gespeicherten Filterzustandsabtastungen, dadurch gekennzeichnet, dass L einen Wert in einen vorgegebenen Bereich aufweist, der ganzzahlige und nicht ganzzahlige
Werte enthält, die sich auf eine Tonhöhenperiodizität der Sprache beziehen, sowie
dadurch, dass der Filterschaltkreis umfasst:
einen Berechnungsschaltkreis (530) zum Berechnen interpolierter Filterzustandsabtastungen
aus den gespeicherten Filterzustandsabtastungen unter Verwendung eines nicht ganzzahligen
L-Werts, und
einen Kombinationsschaltkreis (510, 520) zum linearen Kombinieren des Anregungsvektors
mit den interpolierten Filterzustandsabtastungen, um dadurch einen Filterausgangsvektor mit einer Vielzahl von Filterausgangsabtastungen zu bilden;
und
einen Verarbeitungsschaltkreis (426) zum Verarbeiten des Filterausgangsvektors, um
rekonstruierte Sprache zu erzeugen,
wobei, wenn der Verzögerungsparameter L < N und der Verzögerungsparameter L ein ganzzahliger
Wert ist, der Filterungsschritt eine Widerholung der Benutzung zumindest eines Teils
der gleichen gespeicherten Filterzustandsabtastungen bei der Erzeugung eines Verzögerungsvektors
zur Linearkombination mit dem Anregungsvektor umfasst, um den Filterausgangsvektor
zu erzeugen.
6. Vorrichtung zum Rekonstruieren von Sprache nach Anspruch 5, wobei der Kombinierschaltkreis
weiter in Reaktion darauf, dass L ganzzahlig ist, das Kombinieren des Anregungsvektors
mit den gespeicherten Filterzustandsabtastungen enthält, um dadurch Filterausgangszustandsabtastungen zu bilden.
7. Vorrichtung zum Rekonstruieren von Sprache nach Anspruch 5, wobei der Filterschaltkreis
weiter einen Aktualisierungsschaltkreis zum Aktualisieren der gespeicherten Filterzustandsabtastungen
unter Verwendung der Filterausgangsabtastungen umfasst.
8. Vorrichtung zum Rekonstruieren von Sprache nach Anspruch 5, weiter umfassend:
einen Konvertierungsschaltkreis zum Konvertieren der rekonstruierten Sprache in ein
analoges Sprachsignal; und
einen Umwandlungsschaltkreis zum Umwandeln des analogen Sprachsignals in eine wahmehmbare
Audioausgabe, so dass die Tonhöhenperiodizitäten der Sprache genauer vorhergesagt
werden.
9. Verfahren zum Rekonstruieren von Sprache, die folgenden Schritte umfassend:
Empfangen eines Satzes Sprachparameter von einem Kommunikationskanal, die ein Codewort
I und einen Verzögerungsparameter L enthalten;
Erzeugen eines Anregungsvektors mit N Abtastungen in Reaktion auf das Codewort 1;
Filtern des Anregungsvektors zumindest auf der Grundlage des Verzögerungsparameters
L, eines Satzes gespeicherter Filterzustandsabtastungen und zumindest eines Satzes
gespeicherter interpolierter Filterzustandsabtastungen, wobei L einen Wert in einem
vorgegebenen Bereich hat, der ganzzahlige und nicht-ganzzahlige Werte, die sich auf
die Tonhöhenperiodizität der Sprache beziehen, umfasst, und wobei der Schritt des
Filterns die folgenden Schritte umfasst:
Wählen eines Satzes Filterzustandsabtastungen aus der Gruppe, die aus dem Satz gespeicherter
Filterzustandsabtastungen und dem zumindest einen Satz gespeicherter interpolierter
Filterzustandsabtastungen besteht, wobei der Schritt des Wählens den Verzögerungsparameter
L verwendet, und
lineares Kombinieren des Anregungsvektors mit den gewählten Filterzustandsabtastungen,
um dadurch einen Filterausgangsvektor mit einer Vielzahl von Filterausgangsabtastungen zu bilden,
und
Verarbeiten des Filterausgangsvektors, um rekonstruierte Sprache zu erzeugen.
wobei, wenn der Verzögerungsparameter L < N und der Verzögerungsparameter L ein ganzzahliger
Wert ist, der Filterungsschritt eine Widerholung der Benutzung zumindest eines Teils
der gleichen gespeicherten Filterzustandsabtastungen bei der Erzeugung eines Verzögerungsvektors
zur Linearkombination mit dem Anregungsvektor umfasst, um den Filterausgangsvektor
zu erzeugen.
10. Verfahren zum Rekonstruieren von Sprache nach Anspruch 9, weiter die folgenden Schritte
umfassend:
Konvertieren der rekonstruierten Sprache in ein analoges Sprachsignal; und
Umwandeln des analogen Sprachsignals in eine wahmehmbare Audioausgabe, so dass die
Tonhöhenperiodizitäten der Sprache genauer vorhergesagt werden.
11. Vorrichtung zum Rekonstruieren von Sprache umfassend:
einen Empfangsschaltkreis (450) zum Empfangen eines Satzes Sprachparameter von einem
Kommunikationskanal, die ein Codewort I und einen Verzögerungsparameter L enthalten,
wobei L einen Wert in einem vorgegebenen Bereich aufweisen kann, der ganzzahlige und
nicht ganzzahlige Werte enthält, die sich auf eine Tonhöhenperiodizität der Sprache
beziehen;
einen Erzeugungsschaltkreis (420) zum Erzeugen eines Anregungsvektors mit N Abtastungen
in Reaktion auf das Codewort l;
einen Filterschaltkreis (124, 424) zum Filtern des Anregungsvektors auf der Grundlage
zumindest des Verzögerungsparameters L, eines Satzes gespeicherter Filterzustandsabtastungen
und zumindest eines Satzes gespeicherter interpolierter Filterzustandsabtastungen,
wobei der Filterschaltkreis umfasst:
einen Auswahlschaltkreis zum Auswählen eines Satzes Filterzustandsabtastungen aus
der Gruppe, die aus dem Satz gespeicherter Filterzustandsabtastungen und dem zumindest
einem Satz gespeicherter interpolierter Filterzustandsabtastungen besteht, wobei der
Schritt des Wählens den Verzögerungsparameter L verwendet, und
einen Kombinationsschaltkreis zum linearen Kombinieren des Anregungsvektors mit den
gewählten Filterzustandsabtastungen, um dadurch einen Filterausgangsvektor mit einer Vielzahl von Filterausgangsabtastungen zu bilden;
und
einen Verarbeitungsschaltkreis (426) zum Verarbeiten des Filterausgangsvektors, um
rekonstruierte Sprache zu erzeugen,
wobei, wenn der Verzögerungsparameter L < N und der Verzögerungsparameter L ein ganzzahliger
Wert ist, der Filterungsschritt eine Wiederholung der Benutzung zumindest eines Teils
der gleichen gespeicherten Filterzustandsabtastungen bei der Erzeugung eines Verzögerungsvektors
zur Linearkombination mit dem Anregungsvektor umfasst, um den Filterausgangsvektor
zu erzeugen.
12. Vorrichtung zum Rekonstruieren von Sprache nach Anspruch 11, weiter umfassend:
einen Konvertierungsschaltkreis zum Konvertieren der rekonstruierten Sprache in ein
analoges Sprachsignal; und
einen Umwandlungsschaltkreis zum Umwandeln des analogen Sprachsignals in eine wahmehmbare
Audioausgabe, so dass die Tonhöhenperiodizitäten der Sprache genauer vorhergesagt
wird.
13. Method of coding speech into sets of speech parameters for transmission over a communications channel, the method comprising the steps of:
sampling a speech signal a plurality of times to provide a plurality of samples forming a present speech vector;
generating a delay parameter L having a value in a predetermined range including integer and non-integer values related to a pitch periodicity of the speech in the present speech vector;
searching excitation vectors to determine a codeword I that best matches the present speech vector, the searching step comprising the steps of:
generating excitation vectors in response to corresponding codewords, each excitation vector consisting of N samples;
filtering each excitation vector based on at least the delay parameter L, a set of stored filter state samples, and at least one set of interpolated filter state samples, wherein L has a value in a predetermined range including integer and non-integer values related to the pitch periodicity of the speech, the filtering step comprising the steps of:
computing the interpolated filter state samples from the stored filter state samples using a non-integer value of L, and
linearly combining the excitation vector with the interpolated filter state samples, thereby forming a filter output vector having a plurality of filter output samples;
selecting the codeword I of the excitation vector for which the speech synthesized using the non-integer value of L differs least from the speech signal; and
transmitting the selected parameter L together with preselected speech parameters for the present speech vector over the communications channel, such that the pitch periodicities of the speech are more accurately predicted,
wherein, when the delay parameter L < N and the delay parameter L is an integer value, the filtering step comprises repeated use of at least some of the same stored filter state samples in producing a delay vector for linear combination with the excitation vector to produce the filter output vector.
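The search recited in the coding method above can be sketched as an exhaustive loop over the codebook: each candidate excitation vector is passed through the synthesis filtering and the codeword whose reconstructed output differs least from the input vector is kept. The mean-squared-error criterion and the identity codebook in the example are assumptions for illustration; the claim does not fix the distance measure, and a practical CELP coder would use a perceptually weighted error.

```python
def search_codebook(codebook, speech_vector, synthesize):
    """Return the codeword (index) I whose excitation vector, after
    synthesis filtering, differs least from the present speech vector.

    codebook    : list of excitation vectors (each a list of N samples).
    synthesize  : function mapping an excitation vector to reconstructed
                  speech (long-term prediction plus synthesis filter in
                  the claimed coder; any callable here).
    """
    best_I, best_err = None, float("inf")
    for I, excitation in enumerate(codebook):
        reconstructed = synthesize(excitation)
        # Mean-squared error as an assumed distance measure.
        err = sum((r - s) ** 2 for r, s in zip(reconstructed, speech_vector))
        if err < best_err:
            best_I, best_err = I, err
    return best_I
```

For example, with a trivial identity "synthesis" the vector closest to the input wins the search.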
14. Method of coding speech according to claim 13, further characterized in that the step of searching excitation vectors comprises:
processing the filter output vector to produce a reconstructed speech vector; and
comparing the reconstructed speech vector with the present speech vector to determine the difference between them.
15. Method of coding speech according to claim 13, further characterized in that the step of selecting the codeword I comprises selecting the codeword I of the excitation vector for which the reconstructed speech vector differs least from the present speech vector.
16. Apparatus for coding speech into sets of speech parameters for transmission over a communications channel, the apparatus comprising:
a sampling circuit for sampling a speech signal a plurality of times to provide a plurality of samples forming a present speech vector;
a generating circuit for generating a delay parameter L having a value in a predetermined range including integer and non-integer values related to a pitch periodicity of the speech in the present speech vector;
a searching circuit for searching excitation vectors to determine a codeword I that best matches the present speech vector, the searching circuit comprising:
a generating circuit for generating excitation vectors in response to corresponding codewords, each excitation vector consisting of N samples;
a filtering circuit for filtering each excitation vector based on at least the delay parameter L, a set of stored filter state samples, and at least one set of stored interpolated filter state samples, the filtering circuit comprising:
a selecting circuit for selecting a set of filter state samples from the group consisting of the set of stored filter state samples and the at least one set of stored interpolated filter state samples, the selecting circuit using the delay parameter L, and
a combining circuit for linearly combining the excitation vector with the selected filter state samples, thereby forming a filter output vector having a plurality of filter output samples;
a processing circuit for processing the filter output vector to produce a reconstructed speech vector;
a comparing circuit for comparing the reconstructed speech vector with the present speech vector to determine the difference between them;
a selecting circuit for selecting the codeword I of the excitation vector for which the reconstructed speech vector differs least from the present speech vector; and
a transmitting circuit for transmitting the selected codeword I and the delay parameter L together with preselected speech parameters for the present speech vector over the communications channel, such that the pitch periodicities of the speech are more accurately predicted,
wherein, when the delay parameter L < N and the delay parameter L is an integer value, the filtering step comprises repeated use of at least some of the same stored filter state samples in producing a delay vector for linear combination with the excitation vector to produce the filter output vector.
1. Method of reconstructing speech comprising the steps of:
receiving from a communications channel a set of speech parameters including a codeword I and a delay parameter L, the delay parameter L having a value in a predetermined range including integer and non-integer values related to a pitch periodicity of the speech;
generating an excitation vector having N samples in response to the codeword I;
filtering the excitation vector based on at least the delay parameter L and stored filter state samples, the filtering step comprising the steps of:
computing interpolated filter state samples from the stored filter state samples using a non-integer L, and
linearly combining the excitation vector with the interpolated filter state samples, thereby forming a filter output vector having a plurality of filter output samples; and
processing the filter output vector to produce reconstructed speech,
wherein, when the delay parameter L < N and the delay parameter L is an integer, the filtering step comprises repeated use of at least some of the same stored filter state samples in producing a delay vector for linear combination with the excitation vector to form the filter output vector.
2. Method of reconstructing speech according to claim 1, further characterized in that the filtering step comprises combining, in response to L being an integer, the excitation vector with the stored filter state samples, thereby forming filter state output samples.
3. Method of reconstructing speech according to claim 1, further characterized in that the filtering step comprises updating the stored filter state samples using the filter output samples.
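The state update recited here can be sketched as shifting the newest filter output samples into the stored history. The fixed-length buffer and the function name are assumptions for illustration; the claim only requires that the stored filter state samples be updated using the filter output samples.

```python
def update_filter_state(state, filter_output, history_len):
    """Append the filter output samples to the stored filter-state
    history and keep only the most recent history_len samples."""
    state = list(state) + list(filter_output)
    return state[-history_len:]
```

For example, appending two new output samples to a three-sample history of length four drops the oldest sample.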
4. Method of reconstructing speech according to claim 1, further comprising the steps of:
converting the reconstructed speech into an analog speech signal; and
transforming the analog speech signal into a perceptible audio output, such that the pitch periodicities of the speech are more accurately predicted.
5. Apparatus for reconstructing speech comprising:
a receiving circuit (450) for receiving from a communications channel a set of speech parameters including a codeword I and a delay parameter L;
a generating circuit (420) for generating an excitation vector having N samples in response to the codeword I;
a filtering circuit (124, 424) for filtering the excitation vector based on at least the delay parameter L and stored filter state samples,
characterized in that L has a value in a predetermined range including integer and non-integer values related to a pitch periodicity of the speech, and in that the filtering circuit comprises:
a computing circuit (530) for computing interpolated filter state samples from the stored filter state samples using a non-integer L, and
a combining circuit (510, 520) for linearly combining the excitation vector with the interpolated filter state samples, thereby forming a filter output vector having a plurality of filter output samples; and
a processing circuit (426) for processing the filter output vector to produce reconstructed speech,
wherein, when the delay parameter L < N and the delay parameter L is an integer, the filtering step comprises repeated use of at least some of the same stored filter state samples in producing a delay vector for linear combination with the excitation vector to form the filter output vector.
6. Apparatus for reconstructing speech according to claim 5, wherein the combining circuit further comprises combining, in response to L being an integer, the excitation vector with the stored filter state samples, thereby forming the filter state output samples.
7. Apparatus for reconstructing speech according to claim 5, wherein the filtering circuit further comprises an updating circuit for updating the stored filter state samples using the filter output samples.
8. Apparatus for reconstructing speech according to claim 5, further comprising:
a converting circuit for converting the reconstructed speech into an analog speech signal; and
a transducing circuit for transforming the analog speech signal into a perceptible audio output, such that the pitch periodicities of the speech are more accurately predicted.
9. Method of reconstructing speech comprising the steps of:
receiving from a communications channel a set of speech parameters including a codeword I and a delay parameter L;
generating an excitation vector having N samples in response to the codeword I;
filtering the excitation vector based on at least the delay parameter L, a set of stored filter state samples, and at least one set of stored interpolated filter state samples, wherein L has a value in a predetermined range including integer and non-integer values related to a pitch periodicity of the speech, the filtering step comprising the steps of:
selecting a set of filter state samples from the group consisting of the set of stored filter state samples and the at least one set of stored interpolated filter state samples, the selecting step using the delay parameter L, and
linearly combining the excitation vector with the selected filter state samples, thereby forming a filter output vector having a plurality of filter output samples; and
processing the filter output vector to produce the reconstructed speech,
wherein, when the delay parameter L < N and the delay parameter L is an integer, the filtering step comprises repeated use of at least some of the same stored filter state samples in producing a delay vector for linear combination with the excitation vector to form the filter output vector.
10. Method of reconstructing speech according to claim 9, further comprising the steps of:
converting the reconstructed speech into an analog speech signal; and
transforming the analog speech signal into a perceptible audio output, such that the pitch periodicities of the speech are more accurately predicted.
11. Apparatus for reconstructing speech comprising:
a receiving circuit (450) for receiving from a communications channel a set of speech parameters including a codeword I and a delay parameter L, L having a value in a predetermined range including integer and non-integer values related to a pitch periodicity of the speech;
a generating circuit (420) for generating an excitation vector having N samples in response to the codeword I;
a filtering circuit (124, 424) for filtering the excitation vector based on at least the delay parameter L, a set of stored filter state samples, and at least one set of stored interpolated filter state samples, the filtering circuit comprising:
a selecting circuit for selecting a set of filter state samples from the group consisting of the set of stored filter state samples and the at least one set of stored interpolated filter state samples, the selecting step using the delay parameter L, and
a combining circuit for linearly combining the excitation vector with the selected filter state samples, thereby forming a filter output vector having a plurality of filter output samples; and
a processing circuit (426) for processing the filter output vector to produce reconstructed speech,
wherein, when the delay parameter L < N and the delay parameter L is an integer, the filtering step comprises repeated use of at least some of the same stored filter state samples in producing a delay vector for linear combination with the excitation vector to form the filter output vector.
12. Apparatus for reconstructing speech according to claim 11, further comprising:
a converting circuit for converting the reconstructed speech into an analog speech signal; and
a transducing circuit for transforming the analog speech signal into a perceptible audio output, such that the pitch periodicities of the speech are more accurately predicted.
13. Method of coding speech into sets of speech parameters for transmission over a communications channel, the method comprising the steps of:
sampling a speech signal a plurality of times to provide a plurality of samples forming a present speech vector;
generating a delay parameter L having a value in a predetermined range including integer and non-integer values related to the pitch periodicity of the speech in the present speech vector;
searching excitation vectors to determine a codeword I that best matches the present speech vector, the searching step comprising the steps of:
generating excitation vectors in response to corresponding codewords, each excitation vector having N samples;
filtering each excitation vector based on at least the delay parameter L, a set of stored filter state samples, and at least one set of interpolated filter state samples, wherein L has a value in a predetermined range including integer and non-integer values related to a pitch periodicity of the speech, the filtering step comprising the steps of:
computing the interpolated filter state samples from the stored filter state samples using a non-integer L, and
linearly combining the excitation vector with the interpolated filter state samples, thereby forming a filter output vector having a plurality of filter output samples;
selecting the codeword I of the excitation vector for which the speech synthesized using the non-integer L differs least from the speech signal; and
transmitting the selected parameter L together with preselected speech parameters for the present speech vector over the communications channel, such that the pitch periodicities of the speech are more accurately predicted,
wherein, when the delay parameter L < N and the delay parameter L is an integer, the filtering step comprises repeated use of at least some of the same stored filter state samples in producing a delay vector for linear combination with the excitation vector to form the filter output vector.
14. Method of coding speech according to claim 13, further characterized in that the step of searching excitation vectors comprises:
processing the filter output vector to produce a reconstructed speech vector; and
comparing the reconstructed speech vector with the present speech vector to determine the difference between them.
15. Method of coding speech according to claim 13, further characterized in that the step of selecting the codeword I comprises selecting the codeword I of the excitation vector for which the reconstructed speech vector differs least from the present speech vector.
16. Apparatus for coding speech into sets of speech parameters for transmission over a communications channel, the apparatus comprising:
a sampling circuit for sampling a speech signal a plurality of times to provide a plurality of samples forming a present speech vector;
a generating circuit for generating a delay parameter L having a value in a predetermined range including integer and non-integer values related to the pitch periodicity of the speech in the present speech vector;
a searching circuit for searching excitation vectors to determine a codeword I that best matches the present speech vector, the searching circuit comprising:
a generating circuit for generating excitation vectors in response to corresponding codewords, each excitation vector having N samples;
a filtering circuit for filtering each excitation vector based on at least the delay parameter L, a set of stored filter state samples, and at least one set of stored interpolated filter state samples, the filtering circuit comprising:
a selecting circuit for selecting a set of filter state samples from the group consisting of the set of stored filter state samples and the at least one set of stored interpolated filter state samples, the selecting circuit using the delay parameter L,
a combining circuit for linearly combining the excitation vector with the selected filter state samples, thereby forming a filter output vector having a plurality of filter output samples;
a processing circuit for processing the filter output vector to produce a reconstructed speech vector;
a comparing circuit for comparing the reconstructed speech vector with the present speech vector to determine the difference between them;
a selecting circuit for selecting the codeword I of the excitation vector for which the reconstructed speech vector differs least from the present speech vector; and
a transmitting circuit for transmitting the selected codeword I and the delay parameter L together with preselected speech parameters for the present speech vector over the communications channel, such that the pitch periodicities of the speech are more accurately predicted,
wherein, when the delay parameter L < N and the delay parameter L is an integer, the filtering step comprises repeated use of at least some of the same stored filter state samples in producing a delay vector for linear combination with the excitation vector to form the filter output vector.