TECHNICAL FIELD OF THE INVENTION
[0001] This invention relates to switched-predictive vector quantization and more particularly
to quantization of LPC coefficients transformed to line spectral frequencies.
BACKGROUND OF THE INVENTION
[0002] Many speech coders, such as the new 2.4 kb/s Federal Standard Mixed Excitation Linear
Prediction (MELP) coder (McCree, et al., "A 2.4 kbit/s MELP Coder Candidate for the New U.S. Federal Standard," Proc. ICASSP-96, pp. 200-203, May 1996) use
some form of Linear Predictive Coding (LPC) to represent the spectrum of the speech
signal. A MELP coder is described in the Applicant's co-pending Application Serial
No. 08/650,585, entitled "Mixed Excitation Linear Prediction with Fractional Pitch,"
filed 05/20/96, incorporated herein by reference. Fig. 1 illustrates such a MELP coder.
The MELP coder is based on the traditional LPC vocoder, with either a periodic impulse
train or white noise exciting a 10th-order all-pole LPC filter. In the enhanced
version, the synthesizer has the added capabilities of mixed pulse and noise excitation,
periodic or aperiodic pulses, adaptive spectral enhancement, and a pulse dispersion filter,
as shown in Fig. 1. Efficient quantization of the LPC coefficients is an important
problem in these coders, since maintaining accuracy of the LPC has a significant effect
on processed speech quality, but the bit rate of the LPC quantizer must be low in
order to keep the overall bit rate of the speech coder small. The MELP coder for the
new Federal Standard uses a 25-bit multi-stage vector quantizer (MSVQ) for line spectral
frequencies (LSF). There is a one-to-one transformation between the LPC coefficients
and the LSF coefficients.
[0003] Quantization is the process of converting input values into discrete values in accordance
with some fidelity criterion. A typical example of quantization is the conversion
of a continuous amplitude signal into discrete amplitude values. The signal is first
sampled, then quantized.
[0004] For quantization, a range of expected values of the input signal is divided into
a series of subranges. Each subrange has an associated quantization level. A sample
value of the input signal that is within a certain subrange is converted to the associated
quantizing level. For example, for 8-bit quantization, a sample of the input signal
would be converted to one of 256 levels, each level represented by an 8-bit value.
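By way of illustration only, such a uniform scalar quantizer may be sketched as follows; the signal range of [-1.0, 1.0) and the mid-point reconstruction rule are assumptions made for this example and are not part of any coder described herein.

import numpy as np

def uniform_quantize(sample, lo=-1.0, hi=1.0, bits=8):
    # Map a sample in [lo, hi) to one of 2**bits levels and return the level index
    # together with the corresponding quantization level (mid-point of the subrange).
    levels = 2 ** bits
    step = (hi - lo) / levels
    index = int(np.clip(np.floor((sample - lo) / step), 0, levels - 1))
    level = lo + (index + 0.5) * step
    return index, level

index, level = uniform_quantize(0.3)   # index 166, level approximately 0.3008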
[0005] Vector quantization is a method of quantization that exploits the linear and
non-linear correlation between samples and the shape of the probability distribution.
Essentially, vector quantization is a lookup process, where the lookup table is referred
to as a "codebook". The codebook lists each quantization level, and each level has
an associated "code-vector". The vector quantization process compares an input vector
to the code-vectors and determines the best code-vector in terms of minimum distortion.
Where x is the input vector, the comparison of distortion values may be expressed as

d(x, y(k)) ≤ d(x, y(j))

for all j not equal to k, where d denotes the distortion measure and y(k) is the selected code-vector. The codebook is represented by y(j), where y(j) is the jth code-vector, 0 ≤ j < L, and L is the number of levels in the codebook.
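By way of illustration only, this search may be sketched as follows, using the squared Euclidean distance as the distortion measure; the small two-dimensional codebook shown is a hypothetical example, not a codebook from any coder described herein.

import numpy as np

def vq_search(x, codebook):
    # Return the index k and code-vector y(k) that minimize d(x, y(j)) over all j,
    # with d taken here as the squared Euclidean distance.
    distortions = np.sum((codebook - x) ** 2, axis=1)
    k = int(np.argmin(distortions))
    return k, codebook[k]

codebook = np.array([[0.0, 0.0], [1.0, 2.0], [3.0, 1.0]])   # L = 3 levels
k, y_k = vq_search(np.array([1.2, 1.8]), codebook)          # selects k = 1, y(1) = (1, 2)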
[0006] Multi-stage vector quantization (MSVQ) is a type of vector quantization. This process
obtains a central quantized vector (the output vector) by adding a number of quantized
vectors. The output vector is sometimes referred to as a "reconstructed" vector. Each
vector used in the reconstruction is from a different codebook, each codebook corresponding
to a "stage" of the quantization process. Each codebook is designed especially for
a stage of the search. An input vector is quantized with the first codebook, and the
resulting error vector is quantized with the second codebook, etc. The set of vectors
used in the reconstruction may be expressed as

x̂ = y_0 + y_1 + … + y_(S-1),

where S is the number of stages and y_s is the code-vector selected from the codebook for the sth stage. For example, for a three-dimensional input vector such as x = (2,3,4), the reconstruction vectors for a two-stage search might be y_0 = (1,2,3) and y_1 = (1,1,1) (a perfect quantization, which is not always the case).
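By way of illustration only, this stage-by-stage quantization of the residual may be sketched as follows with a simple greedy (single-path) search; the two stage codebooks mirror the x = (2,3,4) example above and are purely hypothetical.

import numpy as np

def msvq_encode(x, codebooks):
    # Quantize x stage by stage: each stage quantizes the error vector left by the
    # previous stages, and the output vector is the sum of the selected code-vectors.
    residual = np.array(x, dtype=float)
    reconstruction = np.zeros_like(residual)
    indices = []
    for cb in codebooks:                         # one codebook per stage
        j = int(np.argmin(np.sum((cb - residual) ** 2, axis=1)))
        indices.append(j)
        reconstruction += cb[j]
        residual -= cb[j]
    return indices, reconstruction

stage0 = np.array([[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]])
stage1 = np.array([[1.0, 1.0, 1.0], [0.5, 0.5, 0.5]])
indices, x_hat = msvq_encode([2.0, 3.0, 4.0], [stage0, stage1])   # x_hat = (2, 3, 4)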
[0007] During multi-stage vector quantization, the codebooks may be searched using a sub-optimal
tree search algorithm, also known as the M-algorithm. At each stage, the M "best" code-vectors,
selected in terms of minimum distortion, are passed on to the next stage. The search continues
until the final stage, when only one best code-vector is determined.
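By way of illustration only, the M-best search may be sketched as follows; this is a straightforward rendering of the tree search described above, not the Federal Standard implementation, and M = 4 is an arbitrary example value.

import numpy as np

def msvq_mbest_search(x, codebooks, M=4):
    # Keep the M lowest-distortion partial reconstructions ("best" code-vector paths)
    # after each stage; only a single best path survives the final stage.
    x = np.asarray(x, dtype=float)
    candidates = [([], np.zeros_like(x))]          # (indices so far, partial reconstruction)
    for cb in codebooks:
        expanded = []
        for indices, recon in candidates:
            for j, code_vector in enumerate(cb):
                new_recon = recon + code_vector
                err = float(np.sum((x - new_recon) ** 2))
                expanded.append((err, indices + [j], new_recon))
        expanded.sort(key=lambda item: item[0])    # minimum-distortion candidates first
        candidates = [(idx, rec) for _, idx, rec in expanded[:M]]
    best_indices, best_reconstruction = candidates[0]
    return best_indices, best_reconstruction

With M = 1 this reduces to the greedy search sketched earlier; larger values of M trade additional computation for accuracy closer to a full search.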
[0008] In predictive quantization, the target vector for quantization in the current frame
is the mean-removed input vector minus a predicted value. The predicted value is
the previous quantized vector multiplied by a known prediction matrix. In switched
prediction, there is more than one possible prediction matrix and the best prediction
matrix is selected for each frame. See S. Wang, et al., "Product Code Vector Quantization
of LPC Parameters," in Speech and Audio Coding for Wireless and Network Applications,
Ch. 31, pp. 251-258, Kluwer Academic Publishers, 1993.
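By way of illustration only, the target-vector computation may be sketched as follows; the mean vector, prediction matrix, and previous quantized vector are placeholders supplied by the caller.

import numpy as np

def prediction_target(x, mean, prediction_matrix, prev_quantized):
    # Target vector for the current frame: the mean-removed input vector minus the
    # predicted value, where the predicted value is the previous quantized
    # (mean-removed) vector multiplied by the prediction matrix.
    predicted = prediction_matrix @ prev_quantized
    return (x - mean) - predicted

In switched prediction, a target of this form is produced for each candidate prediction matrix, and the predictor whose quantized result gives the smallest error is selected for the frame.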
[0009] It is highly desirable to provide an improved weighted distance measure that better
correlates with subjective speech quality.
SUMMARY OF THE INVENTION
[0010] In accordance with a preferred embodiment, the present invention provides an improved
method of vector quantization of the LSF transformation of LPC coefficients using a new weighted
distance measure that better correlates with subjective speech quality. The weighting
is derived by exciting the LPC synthesis filter with an impulse and applying the resulting
samples to a perceptual weighting filter.
DESCRIPTION OF THE DRAWINGS
[0011] Embodiments of the present invention will now be further described, by way of example,
with reference to the accompanying drawings in which:
Fig. 1 is a block diagram of a Mixed Excitation Linear Prediction coder;
Fig. 2 is a block diagram of a switched-predictive vector quantization encoder according
to the present invention;
Fig. 3 is a block diagram of a decoder according to the present invention;
Fig. 4 is a flow chart for determining a weighted distance measure in accordance with
an embodiment of the present invention; and
Fig. 5 is a block diagram of an encoder according to an embodiment of the present
invention.
DESCRIPTION OF PREFERRED EMBODIMENTS OF THE PRESENT INVENTION
[0012] The new quantization method, like the one used in the 2.4 kb/s Federal Standard MELP
coder, uses multi-stage vector quantization (MSVQ) of the Line Spectral Frequency
(LSF) transformation of the LPC coefficients (LeBlanc, et al., "Efficient
Search and Design Procedures for Robust Multi-Stage VQ of LPC Parameters for 4 kb/s
Speech Coding," IEEE Transactions on Speech and Audio Processing, Vol. 1, No. 4, October
1993, pp. 373-385). An efficient codebook search for multi-stage VQ is disclosed in
US Patent Application Serial No. 09/003,172 cited above. However, the method described
herein improves on the previous one in two ways: the use of switched prediction to
take advantage of time redundancy and the use of a new weighted distance measure that
better correlates with subjective speech quality.
[0013] In the Federal Standard MELP coder, the input LSF vector is quantized directly using
MSVQ. However, there is a significant redundancy between LSF vectors of neighboring
frames, and quantization accuracy can be improved by exploiting this redundancy. As
discussed previously in predictive quantization, the target vector for quantization
in the current frame is the mean-removed input vector minus a predicted value, where
the predicted value is the previous quantized vector multiplied by a known prediction
matrix. In switched prediction, there is more than one possible prediction matrix,
and the best predictor or prediction matrix is selected for each frame. In accordance
with an embodiment of the present invention, both the predictor matrix and the MSVQ
codebooks are switched. For each input frame, we search every possible predictor/codebook-set
combination for the one which minimizes the squared error.
An index corresponding to this pair and the MSVQ codebook indices are then encoded
for transmission. This differs from previous techniques in that the codebooks are
switched as well as the predictors. Traditional methods share a single codebook set
in order to reduce codebook storage, but we have found that the MSVQ codebooks used
in switched predictive quantization can be considerably smaller than non-predictive
codebooks, and that multiple smaller codebooks do not require any more storage space
than one larger codebook. In our experiments, the use of separate predictor/codebook
pairs results in a significant performance improvement over a single shared codebook,
with no increase in bit rate.
[0014] Referring to the LSF encoder with switched-predictive quantizer 20 of Fig. 2, the 10 LPC coefficients are transformed by transformer 23 into the 10 coefficients of the Line Spectral Frequency (LSF) vector. The LSF vector has 10 elements or coefficients (for the 10th-order all-pole filter). A selected mean vector is subtracted from the LSF input vector in adder 22, and a predicted value is subtracted from the mean-removed input vector in adder 25. The resulting target vector e for quantization in the current frame is applied to multi-stage vector quantizer (MSVQ) 27. The predicted value is the previous quantized vector multiplied by a known prediction matrix at multiplier 26. In switched prediction there is more than one possible prediction matrix, and the best predictor (prediction matrix and mean vector) is selected for each frame. In accordance with an embodiment of the present invention, both the predictor (the prediction matrix and mean vector) and the MSVQ codebook set are switched. A control 29 first switches in, via switch 28, prediction matrix 1 and mean vector 1 and the first set of codebooks 1 in quantizer 27. The index corresponding to this first prediction matrix and the MSVQ codebook indices for the first set of codebooks are then provided out of the quantizer to gate 37. The predicted value is added to the quantized output ê of the target vector e at adder 31 to produce a quantized mean-removed vector. The mean-removed vector is added at adder 70 to the selected mean vector to get the quantized vector X̂. The squared error for each dimension is determined at squarer 35, and the weighted squared error between the input vector X and the quantized vector X̂ is stored at control 29. The control 29 then applies control signals to switch in, via switch 28, prediction matrix 2 and mean vector 2 and codebook set 2, and the weighted squared error for this set is likewise measured at squarer 35. The error measured for the first pair, prediction matrix 1 (with mean vector 1) and codebook set 1, is compared with that for prediction matrix 2 (with mean vector 2) and codebook set 2. The set of indices for the codebooks with the minimum error is gated at gate 37 out of the encoder as the encoded transmission of indices, and a bit is sent out at terminal 38 from control 29 indicating from which pair of prediction matrix and codebook set the indices were sent (codebook set 1 with mean vector 1 and prediction matrix 1, or codebook set 2 with mean vector 2 and prediction matrix 2). The mean-removed quantized vector from adder 31 associated with the minimum error is gated at gate 33a to frame delay 33 so as to provide the previous mean-removed quantized vector to multiplier 26.
[0015] Fig. 3 illustrates a decoder 40 for use with LSF encoder 20. At the decoder 40, the codebook indices from the encoder are received at the quantizer 44, which holds two sets of codebooks corresponding to codebook sets 1 and 2 in the encoder. The bit from terminal 38 selects the codebook set that was used in the encoder. The quantized LSF target is added to the predicted value at adder 41 to get the quantized mean-removed vector, where the predicted value is the previous mean-removed quantized vector (from delay 43) multiplied at multiplier 45 by the prediction matrix from storage 42 that matches the one selected at the encoder. Both prediction matrix 1 with mean vector 1 and prediction matrix 2 with mean vector 2 are stored at storage 42 of the decoder. The one bit from terminal 38 of the encoder selects the prediction matrix and mean vector at storage 42 that match those used in the encoder. The quantized mean-removed vector is added to the selected mean vector at adder 48 to get the quantized LSF vector. The quantized LSF vector is transformed to LPC coefficients by transformer 46.
[0016] As discussed previously, the LSF vector coefficients correspond to the LPC coefficients. The LSF coefficients have better quantization properties than the LPC coefficients, and there is a one-to-one transformation between the two sets of coefficients. The weighting function is applied for a particular set of LSFs, that is, for the particular set of LPC coefficients to which those LSFs correspond.
[0017] The Federal Standard MELP coder uses a weighted Euclidean distance for LSF quantization
due to its computational simplicity. However, this distance in the LSF domain does
not necessarily correspond well with the ideal measure of quantization accuracy: perceived
quality of the processed speech signal. The applicant has previously shown in the
paper on the new 2.4 kb/s Federal Standard that a perceptually-weighted form of log
spectral distortion has close correlation with subjective speech quality. In accordance with an embodiment, the applicant teaches herein a weighted LSF distance which corresponds closely to this spectral distortion. The weighting function requires looking into the details of the LSF-to-LPC transformation for a particular input vector x, that is, for the particular set of LSFs corresponding to the current set of LPC coefficients. The coder computes the LPC coefficients and, as discussed above, for purposes of quantization these are converted to LSF vectors, which are better behaved. As shown in Fig. 1, the actual synthesizer takes the quantized vector X̂ and performs an inverse transformation to obtain an LPC filter for use in the actual speech synthesis. The optimal LSF weights for un-weighted spectral distortion are computed using the formula presented in the paper by Gardner, et al., "Theoretical Analysis of the High-Rate Vector Quantization of LPC Parameters," IEEE Transactions on Speech and Audio Processing, Vol. 3, No. 5, September 1995, pp. 367-381:

W_i = R_A(0) R_i(0) + 2 Σ_{m=1}^{9} R_A(m) R_i(m)

where R_A(m) is the autocorrelation of the impulse response of the LPC synthesis filter at lag m, and R_i(m) is the correlation of the elements in the ith column of the Jacobian matrix of the transformation from LSFs to LPC coefficients. Thus, for a particular input vector x, we compute the weight W_i.
[0018] The present solution differs in that perceptual weighting is applied to the synthesis filter impulse response prior to computation of the autocorrelation function R_A(m), so that the weights reflect a perceptually-weighted form of spectral distortion.
[0019] In accordance with the weighting function as applied to the embodiment of Fig. 2, the weighting W_i is applied to the squared error at squarer 35. The weighted output of error detector 35 is

E = Σ_{i=1}^{10} W_i (X_i − X̂_i)²

Each entry in the 10-dimensional vector has a weight value, and the error is the sum of the weighted squared differences over all elements. For example, if one of the elements has a weight value of three and the others have a weight value of one, then the element with weight three is emphasized by a factor of three relative to the other elements in determining the error.
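By way of illustration only, this weighted distance may be computed as follows, where w holds the weight values W_i.

import numpy as np

def weighted_squared_error(x, x_hat, w):
    # Weighted LSF distance: the sum over the vector elements of W_i * (X_i - X_hat_i)^2.
    x = np.asarray(x, dtype=float)
    x_hat = np.asarray(x_hat, dtype=float)
    return float(np.sum(w * (x - x_hat) ** 2))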
[0020] As stated previously, the weighting function requires looking into the details of the LPC-to-LSF conversion. The weight values are determined by applying an impulse to the LPC synthesis filter 21 and providing the resulting sampled output of the LPC synthesis filter 21 to a perceptual weighting filter 47. A computer 39 is programmed with code based on the pseudo code that follows, illustrated in the flow chart of Fig. 4. An impulse is gated to the LPC filter 21, and N samples of the LPC synthesis filter response (step 51) are taken and applied to the perceptual weighting filter 47 (step 52). In accordance with one embodiment of the invention, low frequencies are weighted more than high frequencies using the well-known Bark scale, which matches how the human ear responds to sounds. The equation for the Bark weighting W_B(f) is

The coefficients of a filter with this response are determined in advance and the time domain coefficients are stored. An 8th-order all-pole fit to this spectrum is determined, and these 8 coefficients are used as the perceptual weighting filter.
The following steps follow the equation for un-weighted spectral distortion from the Gardner, et al. paper (page 375), expressed as

W_i = R_A(0) R_i(0) + 2 Σ_{m=1}^{9} R_A(m) R_i(m)

where R_A(m) is the autocorrelation of the impulse response of the LPC synthesis filter at lag m,

R_A(m) = Σ_n h(n) h(n + m),

h(n) is the impulse response, and R_i(m),

R_i(m) = Σ_n j_i(n) j_i(n + m),

is the correlation function of the elements j_i(n) in the ith column of the Jacobian matrix J_ω(ω) of the transformation from LSFs to LPC coefficients. Each column of J_ω(ω) can be found by polynomial division. The values of j_i(n) can be found by simple polynomial division of the coefficients of P(ω) by the coefficients of the corresponding divisor polynomial; since the first coefficient of that divisor is 1, no actual divisions are necessary in this procedure. Also, j_i(n) = j_i(v + 1 − n) for i odd and 0 < n < v, so only half the values must be computed. Similar conditions with an anti-symmetry property exist for the even columns.
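By way of illustration only, the offline design of such a perceptual weighting filter may be sketched as follows. The Bark-based weighting used here, the inverse of the Zwicker critical-bandwidth formula, is only an assumed stand-in for the W_B(f) equation above and simply weights low frequencies more than high frequencies; the 8th-order all-pole fit is obtained with the standard autocorrelation (Levinson-Durbin) method, and the 8 kHz sampling rate is likewise an assumption of this sketch.

import numpy as np

def bark_weighting(f):
    # Assumed Bark-based weighting for illustration: the inverse of the critical
    # bandwidth (Zwicker), which emphasizes low frequencies over high frequencies.
    critical_bandwidth = 25.0 + 75.0 * (1.0 + 1.4 * (f / 1000.0) ** 2) ** 0.69
    return 1.0 / critical_bandwidth

def allpole_fit(magnitude, order=8):
    # Fit an all-pole filter 1/A(z) to a desired magnitude response sampled on
    # [0, fs/2]: take the inverse FFT of the power spectrum to obtain an
    # autocorrelation sequence, then run the Levinson-Durbin recursion.
    r = np.fft.irfft(magnitude ** 2)[: order + 1]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a                                        # [1, a1, ..., a8]

fs = 8000.0
freqs = np.linspace(0.0, fs / 2.0, 257)
weight_filter_a = allpole_fit(bark_weighting(freqs), order=8)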
[0021] The autocorrelation function of the weighted impulse response is calculated (step 53 in Fig. 4). From this, the Jacobian matrix for the LSFs is computed (step 54). The correlation of the rows of the Jacobian matrix is then computed (step 55), and the LSF weights are calculated by multiplying the correlation matrices (step 56). The computed weight values from computer 39 in Fig. 2 are applied to the error detector 35. The indices from the prediction matrix/codebook set with the least error are then gated from the quantizer 27. The system may be implemented using a microprocessor encapsulating computer 39 and control 29 and utilizing the following pseudo code. The pseudo code for computing the weighting vector from the current LPC and LSFs follows:
/* Compute weighting vector from current LPC and LSF's */
Compute N samples of LPC synthesis filter impulse response
Filter impulse response with perceptual weighting filter
Calculate the autocorrelation function of the weighted impulse response
Compute Jacobian matrix for LSF's
Compute correlation of rows of Jacobian matrix
Calculate LSF weights by multiplying correlation matrices
[0022] The code for the above is provided in Appendix A.
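By way of illustration only, these steps may be rendered in Python roughly as follows. Two simplifications relative to the procedure described above are assumed: the Jacobian of the LSF-to-LPC transformation is approximated by finite differences rather than by the polynomial-division procedure, and the perceptual weighting filter is passed in as a ready-made all-pole filter (for instance, one designed as sketched earlier). The convention that odd-numbered LSFs belong to the symmetric polynomial P and even-numbered LSFs to the antisymmetric polynomial Q is also an assumption of this sketch.

import numpy as np
from scipy.signal import lfilter

def lsf_to_lpc(lsf):
    # Convert ascending LSFs (in radians) to LPC coefficients a[0..p], with a[0] = 1.
    p_poly = np.array([1.0])
    q_poly = np.array([1.0])
    for i, w in enumerate(lsf):
        factor = np.array([1.0, -2.0 * np.cos(w), 1.0])
        if i % 2 == 0:
            p_poly = np.convolve(p_poly, factor)    # odd-numbered LSFs -> P
        else:
            q_poly = np.convolve(q_poly, factor)    # even-numbered LSFs -> Q
    p_poly = np.convolve(p_poly, [1.0, 1.0])        # known root of P at z = -1
    q_poly = np.convolve(q_poly, [1.0, -1.0])       # known root of Q at z = +1
    return 0.5 * (p_poly + q_poly)[: len(lsf) + 1]

def lsf_weights(lsf, weight_filter_a, n_samples=100, eps=1.0e-5):
    p = len(lsf)
    a = lsf_to_lpc(lsf)
    impulse = np.zeros(n_samples)
    impulse[0] = 1.0
    h = lfilter([1.0], a, impulse)                  # N samples of LPC synthesis filter impulse response
    hw = lfilter([1.0], weight_filter_a, h)         # filter impulse response with perceptual weighting filter
    r_a = np.array([np.dot(hw[: n_samples - m], hw[m:]) for m in range(p)])   # R_A(m)
    # Jacobian of the LPC coefficients a(1..p) with respect to each LSF (finite differences)
    jac = np.empty((p, p))
    for i in range(p):
        d = np.zeros(p)
        d[i] = eps
        jac[:, i] = (lsf_to_lpc(lsf + d)[1:] - lsf_to_lpc(lsf - d)[1:]) / (2.0 * eps)
    # Correlations R_i(m) of each Jacobian column, combined with R_A(m) into the weights
    weights = np.empty(p)
    for i in range(p):
        col = jac[:, i]
        r_i = np.array([np.dot(col[: p - m], col[m:]) for m in range(p)])
        weights[i] = r_a[0] * r_i[0] + 2.0 * np.dot(r_a[1:], r_i[1:])
    return weights

Under these assumptions, lsf_weights(current_lsf, weight_filter_a) plays the role of the weight vector produced by computer 39 and applied at error detector 35.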
The pseudo code for the encode input vector follows:
/* Encode input vector */
For all predictor, codebook pairs
Remove mean from input LSF vector
Subtract predicted value to get target vector
Search MSVQ codebooks for best match to target vector using weighted distance
If Error < Emin
Emin = Error
best predictor index = current predictor
Endif
End
Encode best predictor index and codebook indices for transmission
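By way of illustration only, the encoding loop may be sketched as follows, reusing the msvq_encode and weighted_squared_error functions sketched earlier. Two simplifications are assumed: the greedy single-path search stands in for the M-best search, and the weighted distance is applied only when comparing complete predictor/codebook-set candidates, whereas the pseudo code above also applies it inside the codebook search. All codebooks, predictors, and state are placeholders supplied by the caller.

import numpy as np

def encode_lsf(x, w, predictors, codebook_sets, prev_quantized):
    # predictors: list of (mean vector, prediction matrix) pairs.
    # codebook_sets: the MSVQ codebook set associated with each predictor.
    # prev_quantized: the previous frame's quantized mean-removed vector.
    best = None
    for pred_index, ((mean, matrix), codebooks) in enumerate(zip(predictors, codebook_sets)):
        predicted = matrix @ prev_quantized
        target = (x - mean) - predicted                   # remove mean, subtract predicted value
        indices, target_hat = msvq_encode(target, codebooks)
        x_hat = target_hat + predicted + mean             # quantized LSF vector for this candidate
        error = weighted_squared_error(x, x_hat, w)
        if best is None or error < best[0]:
            best = (error, pred_index, indices, target_hat + predicted)
    _, pred_index, indices, quantized_mean_removed = best
    # pred_index and indices are encoded for transmission; quantized_mean_removed
    # updates the frame-delay memory used to form the next frame's prediction.
    return pred_index, indices, quantized_mean_removed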
[0023] The pseudo code for regenerate quantized vector follows:
/* Regenerate quantized vector */
Sum MSVQ codevectors to produce quantized target
Add predicted value
Update memory of past quantized values (mean-removed)
Add mean to produce quantized LSF vector
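By way of illustration only, the corresponding decoder-side regeneration (mirroring Fig. 3) may be sketched as follows, with the same placeholder predictors and codebook sets as in the encoding sketch.

def decode_lsf(pred_index, indices, predictors, codebook_sets, prev_quantized):
    # Regenerate the quantized LSF vector from the received predictor bit and MSVQ indices.
    mean, matrix = predictors[pred_index]
    codebooks = codebook_sets[pred_index]
    target_hat = sum(cb[j] for cb, j in zip(codebooks, indices))   # sum MSVQ code-vectors
    predicted = matrix @ prev_quantized                            # add predicted value
    quantized_mean_removed = target_hat + predicted                # memory for the next frame
    return quantized_mean_removed + mean                           # add mean -> quantized LSF vector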
[0024] We have implemented a 20-bit LSF quantizer based on this new approach which produces performance equivalent to the 25-bit quantizer used in the Federal Standard MELP coder, at a lower bit rate. There are two predictor/codebook pairs, each consisting of a diagonal first-order prediction matrix and a four-stage MSVQ with codebooks of 64, 32, 16, and 16 vectors (one bit for the pair selection plus 6 + 5 + 4 + 4 = 19 bits of codebook indices, for 20 bits in total). Both the codebook storage and the computational complexity of this new quantizer are less than in the previous version.
[0025] Although the present invention and its advantages have been described in detail,
it should be understood that various changes, substitutions and alterations can be
made herein without departing from the spirit and scope of the invention.
[0026] For example, it is anticipated that the system and method may be used without switched prediction for each frame, as illustrated in Fig. 5, wherein the weighted error for each frame would be determined at the error detector and the codebook indices with the least error would be gated out by control 29 and gate 37. For each frame, the LPC-filtered samples of the impulse at filter 21 would be filtered by perceptual weighting filter 47 and processed by computer 39, using code such as described in the pseudo code, to provide the weight values. Also, the perceptual weighting filter may use perceptually motivated weightings other than the Bark scale, such as weighting low frequencies more than high frequencies, or the perceptual weighting filter presently used in CELP coders.
[0027] The scope of the present disclosure includes any novel feature or combination of
features disclosed therein either explicitly or implicitly or any generalisation thereof
irrespective of whether or not it relates to the claimed invention or mitigates any
or all of the problems addressed by the present invention. The applicant hereby gives
notice that new claims may be formulated to such features during the prosecution of
this application or of any such further application derived therefrom. In particular,
with reference to the appended claims, features from dependent claims may be combined
with those of the independent claims and features from respective independent claims
may be combined in any appropriate manner and not merely in the specific combinations
enumerated in the claims.


1. A method of vector quantization of LPC coefficients comprising the steps of:
translating LPC coefficients to LSF coefficients;
providing a quantizer with a codebook for quantizing LSF target vectors;
searching within said codebook for determining LSF target vectors that result in quantized
output that best match LPC coefficients;
applying said target vectors to said codebook to get quantized vectors;
said searching step comprising the step of determining the squared error multiplied
by a weighting value for each dimension between the LSF coefficients and the quantized
output wherein said weighting value is a function of perceptual weighting;
and said determining step including the steps of:
calculating an autocorrelation function of a weighted impulse response;
computing a Jacobian matrix for said LSF vectors;
computing the correlation of rows of the Jacobian matrix; and
calculating LSF weights by multiplying correlation matrices.
2. The method of Claim 1 wherein said determining step comprises the further steps for
finding said weighting value of:
applying an impulse to said LPC filter and running N samples of the LPC synthesis
response; and
filtering the samples with a perceptual filter;
calculating the autocorrelation function of the weighted impulse response;
computing the Jacobian matrix for said LSF vectors;
computing the correlation of rows of Jacobian matrix; and
calculating LSF weights by multiplying correlation matrices.
3. The method of Claim 2 wherein the step of filtering the samples with said perceptual
filter comprises weighting low frequencies more than high frequencies.
4. The method of Claim 3 wherein the step of filtering the samples with said perceptual
filter comprises following the Bark scale.
5. The method of any preceding Claim wherein said step of providing said quantizer comprises
providing a multi-stage vector quantizer.
6. The method of any preceding Claim wherein said step of providing said quantizer comprises
providing a quantizer having one or more sets of codebooks.
7. A quantizer for a coder including an LPC filter and a translator for translating LPC
coefficients to LSF coefficients comprising:
a codebook responsive to said LSF target vector for quantizing LSF target vectors;
means for searching within said codebooks for determining LSF target vectors that
result in quantized output that best match LPC coefficients;
means for applying said LSF target vectors to said codebook to provide a quantized
output;
said searching means including means for applying an impulse to said LPC filter;
means for running samples of said LPC response;
a perceptual filter for filtering said samples; and
means for calculating an autocorrelation function of the weighted response, a Jacobian
matrix for said LSF vectors, a correlation of rows of Jacobian matrix, and LSF weights
by multiplying correlation matrices.