Background of the Invention
Field of the Invention
[0001] The present invention relates to communications and, more specifically, to voice encoding.
Description of the Related Art
[0002] A voice encoder (vocoder) is used to encode voice signals so as to minimize the amount
of bandwidth that is used for transmitting over communication channels. It is important
to minimize the amount of bandwidth used per communication channel so as to maximize
the number of channels available within a given range of spectrum. Many vocoders are
known as code excited linear predictive (CELP) vocoders. Present CELP vocoders which
model the fixed codebook contribution to the filter excitation as a series of pulses
use an inefficient encoding scheme that is sensitive to bit errors. An encoding scheme
that is wasteful of precious bandwidth and is sensitive to bit errors is particularly
undesirable in an error-prone communication channel such as a wireless communication
channel.
[0003] The encoding process involves representing a series of excitation pulses or an excitation
vector as a series of bits referred to as a fixed index. The fixed index is used by
a vocoder at a receiver to reproduce the excitation pulses which are then used to
excite a speech model and thereby reproduce speech. Prior vocoders represent these
pulses using 3.5 or more bits per pulse. Additionally, prior vocoders are sensitive
to communication channel induced errors because a single bit error may produce errors
in up to two pulses.
[0004] FIG. 1 illustrates a series of pulses that are to be represented by a fixed index.
In this example there are ten pulses; each pulse may be positive or negative. The
fixed index specifies which ten of the forty possible predetermined positions are
occupied by a pulse and the sign of each pulse. An inefficient coding scheme is illustrated
by the table of FIG. 2. There are 40 possible positions for pulses; however, the table
indicates that each pulse is limited to one of eight positions. As a result, the vocoder
is limited to using excitation vectors that are composed of a series of pulses that
are permitted by the possible combinations specified by the table. FIG. 2 illustrates
a fixed index table where two pulses are associated with each row of the table. In
the first row, each of pulses I0 and I5 is restricted to one of eight positions;
namely, positions 0, 5, 10, 15, 20, 25,
30 and 35. Likewise, each remaining row specifies the possible positions that may
be assigned to each pulse of the pulse pair associated with that row. It should be
noted that specifying one of eight positions for each pulse requires three bits for
each pulse. Additionally, a sign is specified for each pulse. In this prior art system,
one bit is used to specify the sign of the first pulse of each pulse pair in each
row. The sign of the second pulse in each pulse pair is specified by the position
of that pulse. If the second pulse has a position that is smaller than the first pulse's
position, the sign of the second pulse is opposite to that of the first pulse; otherwise
the signs of the pulses are the same. As a result, for ten pulses, thirty-five bits
are used to specify their positions and signs (3.5 bits/pulse). It should be noted
that in this system if a single bit error occurs it will not only affect the position
or sign of the pulse associated with that error, but it may also affect the sign of
the second pulse in a pair of pulses.
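The error propagation described above can be illustrated with a minimal sketch. Only the first-row positions and the sign rule are taken from the description; the 7-bit field order (sign bit, first position index, second position index) and the name decode_pair are assumptions made for illustration.

```python
# First row of the table of FIG. 2: both pulses of the pair share these positions.
ROW0_POSITIONS = [0, 5, 10, 15, 20, 25, 30, 35]


def decode_pair(bits, positions=ROW0_POSITIONS):
    """Decode one 7-bit pair field into ((pos1, sign1), (pos2, sign2)).

    Assumed layout (not given in the text): bit 6 = sign of the first pulse,
    bits 5..3 = position index of the first pulse, bits 2..0 = position
    index of the second pulse.
    """
    sign1 = -1 if (bits >> 6) & 1 else 1
    pos1 = positions[(bits >> 3) & 0b111]
    pos2 = positions[bits & 0b111]
    # Sign rule from the description: the second pulse takes the opposite
    # sign when its position is smaller than that of the first pulse.
    sign2 = -sign1 if pos2 < pos1 else sign1
    return (pos1, sign1), (pos2, sign2)


good = 0b0_010_100      # sign +, first pulse at 10, second pulse at 20
bad = good ^ (1 << 5)   # one bit error in the first pulse's position index

print(decode_pair(good))  # ((10, 1), (20, 1))
print(decode_pair(bad))   # ((30, 1), (20, -1))
```

In this sketch a single flipped bit moves the first pulse from position 10 to position 30 and, because the second pulse now lies below the first, also inverts the sign of the second pulse.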
Summary of the Invention
[0005] The present invention provides a CELP vocoder that efficiently encodes an excitation
vector in a way that is less sensitive to single bit errors. Each of the pulses composing
the excitation vector is limited to one of four predetermined positions. As a result,
only three bits are required to encode each pulse (two bits for position and one sign
bit) and, in addition, a single bit error only produces an error in one pulse.
Brief Description of the Drawing
[0006]
FIG. 1 illustrates a series of pulses;
FIG. 2 is a fixed index table illustrating an inefficient encoding scheme;
FIG. 3 is a block diagram of a typical vocoder;
FIG. 4 illustrates the major functions of encoder 14 of vocoder 10;
FIG. 5 is a functional block diagram of decoder 20 of vocoder 10;
FIG. 6 is a fixed index table specifying valid pulse positions for a ten pulse excitation
vector;
FIG. 7 is a fixed index table specifying valid pulse positions for a five pulse excitation
vector; and
FIG. 8 is a fixed index table specifying valid pulse positions for a three pulse excitation
vector.
Detailed Description of the Invention
[0007] FIG. 3 illustrates a block diagram of a typical vocoder. Vocoder 10 receives digitized
speech on input 12. The digitized speech is an analog speech signal that has been
passed through an analog-to-digital converter, and has been broken into frames where
each frame is typically on the order of 20 milliseconds. The signal at input 12 is
passed to encoder section 14 which encodes the speech so as to decrease the amount of
bandwidth used to transmit the speech. The encoded speech is made available at output
16. The encoded speech is received by the decode section of a similar vocoder at the
other end of a communication channel. The decoder at the other end of the communication
channel is similar or identical to the decoder portion of vocoder 10. Encoded speech
is received by vocoder 10 through input 18, and is passed to decoder section 20. Decoder
section 20 uses the encoded signals received from the transmitting vocoder to produce
digitized speech at output 22.
[0008] Vocoders are well known in the communications arts. For example, vocoders are described
in "Speech and audio coding for wireless and network applications," edited by Bishnu
S. Atal, Vladimir Cuperman, and Allen Gersho, 1993, by Kluwer Academic Publishers.
Vocoders are widely available and manufactured by companies such as Qualcomm Incorporated
of San Diego, California, and Lucent Technologies Inc., of Murray Hill, New Jersey.
[0009] FIG. 4 illustrates the major functions of encoder 14 of vocoder 10. A digitized speech
signal is received at input 12, and is passed to linear predictive coder 40. Linear
predictive coder 40 performs a linear predictive analysis of the incoming speech once
per frame. Linear predictive analysis is well known in the art and produces a linear
predictive synthesis model of the vocal tract based on the input speech signal. The
linear predictive parameters or coefficients describing this model are transmitted
as part of the encoded speech signal through output 16. Coder 40 uses this model to
produce a residual speech signal which represents the excitation that the model uses
to reproduce the input speech signal. The residual speech signal is made available
at output 42. The residual speech from output 42 is provided to input 48 of open-loop
pitch search unit 50, to an input of adaptive codebook unit 72 and to fixed codebook
unit 82.
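As an illustration of how a residual can be derived from a linear predictive model, the following is a minimal sketch assuming a predictor of the form s[n] ≈ sum over k of a_k * s[n-k]. The sign convention, the zero history before the first sample, and the name lpc_residual are assumptions; the derivation of the coefficients themselves is not shown.

```python
def lpc_residual(speech, coeffs):
    """Return e[n] = s[n] - sum_k a_k * s[n-k] for one block of samples.

    coeffs[0] is a_1, coeffs[1] is a_2, and so on. Samples before the start
    of the block are treated as zero (an assumption of this sketch).
    """
    residual = []
    for n, sample in enumerate(speech):
        predicted = sum(
            a * speech[n - k]
            for k, a in enumerate(coeffs, start=1)
            if n - k >= 0
        )
        residual.append(sample - predicted)
    return residual


# A signal that a first-order predictor models exactly leaves only the
# initial excitation in the residual.
print(lpc_residual([1.0, 0.5, 0.25, 0.125], [0.5]))  # [1.0, 0.0, 0.0, 0.0]
```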
[0010] Impulse response unit 60 receives the linear predictive parameters from coder 40
and generates the impulse response of the model generated in coder 40. This impulse
response is used in the adaptive and fixed codebook units.
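The impulse response of an all-pole synthesis model can be sketched as follows, assuming the model s[n] = e[n] + sum over k of a_k * s[n-k]. The sign convention and the name impulse_response are assumptions of this sketch rather than details taken from the description.

```python
def impulse_response(coeffs, length):
    """Return the first `length` samples of the synthesis model's impulse response.

    The model is driven by a unit impulse at n = 0; coeffs[0] is a_1.
    """
    h = []
    for n in range(length):
        value = 1.0 if n == 0 else 0.0
        value += sum(
            a * h[n - k]
            for k, a in enumerate(coeffs, start=1)
            if n - k >= 0
        )
        h.append(value)
    return h


print(impulse_response([0.5], 4))  # [1.0, 0.5, 0.25, 0.125]
```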
[0011] Open loop pitch search unit 50 uses the residual speech signal from coder 40 to model
its pitch and provides a pitch, or what is commonly called the pitch period or pitch
delay signal, at output 52. The pitch delay signal from output 52 and the impulse
response signal from output 64 of impulse response unit 60 are received by input 70
of adaptive codebook unit 72. Adaptive codebook unit 72 produces a pitch gain output
and a pitch index output which become part of encoded speech output 16 of vocoder
10. Output 74 of adaptive codebook 72 also provides the pitch gain and pitch index
signals to input 80 of fixed codebook unit 82. Additionally, adaptive codebook 72
provides an excitation signal and an adaptive codebook target signal to input 80.
[0012] The adaptive codebook 72 produces its outputs using the digitized speech signal from
input 12 and the residual speech signal produced by linear predictive coder 40. Adaptive
codebook 72 uses the digitized speech signal and linear predictive coder 40's residual
speech signal to form an adaptive codebook target signal. The adaptive codebook target
signal is used as an input to fixed codebook 82, and as an input to a computation that
produces the pitch gain, pitch index and excitation outputs of adaptive codebook unit
72. Additionally, the adaptive codebook target signal, the pitch delay signal from
open loop pitch search unit 50, and the impulse response from impulse response unit
60 are used to produce the pitch index, the pitch gain and excitation signals which
are passed to fixed codebook unit 82. The manner in which these signals are computed
is well known in the vocoder art.
[0013] Fixed codebook 82 uses the inputs received from input 80 to produce a fixed gain
output and a fixed index output which are used as part of the encoded speech at output
16. The fixed codebook unit attempts to model the stochastic part of the linear predictive
coder 40's residual speech signal. A target for a fixed codebook search is produced
by determining a fixed codebook error or the difference between the current adaptive
codebook target signal and the residual speech signal from linear predictive coder
40. The fixed codebook error is well known in the art and is described in telecommunications
standards as the mean square error between a weighted speech signal and a weighted
synthesis speech signal. These standards are published by groups such as the International
Telecommunication Union, the European Telecommunications Standards Institute, and
the Telecommunications Industry Association. The fixed codebook search produces the
fixed gain and fixed index that minimize the fixed codebook error or the mean square
of the error. The fixed index describes a set of excitation pulses. The fixed index
is obtained by searching for a set of excitation pulses that minimize the fixed codebook
error; however, the search for a set of excitation pulses is limited to valid sets
of excitation pulses defined by the fixed codebook's fixed index table. The fixed
index table limits the number of possible positions that each pulse may occupy. The
manner in which the fixed gain and fixed index signals are computed using the outputs
from adaptive codebook unit 72 is well known in the vocoder art.
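A search restricted to the valid pulse sets of a fixed index table can be illustrated with the following minimal sketch. It is an exhaustive search using a plain squared error and a non-negative gain. The table contents, the 10-sample vector length, and the names TABLE, convolve and search are hypothetical; the table merely has four, three and three positions per row. Practical coders typically use weighted signals and faster search strategies.

```python
from itertools import product

# Hypothetical table: one row per pulse, listing that pulse's allowed positions.
TABLE = [
    [0, 3, 6, 9],
    [1, 4, 7],
    [2, 5, 8],
]


def convolve(code, h):
    """Filter the pulse vector `code` with impulse response `h` (truncated)."""
    return [
        sum(code[j] * h[i - j] for j in range(i + 1) if i - j < len(h))
        for i in range(len(code))
    ]


def search(target, h, table=TABLE):
    """Return (error, positions, signs, gain) minimizing the squared error."""
    best = None
    for positions in product(*table):
        for signs in product((1, -1), repeat=len(table)):
            code = [0.0] * len(target)
            for pos, sign in zip(positions, signs):
                code[pos] += sign
            y = convolve(code, h)
            energy = sum(v * v for v in y)
            corr = sum(t * v for t, v in zip(target, y))
            if energy == 0.0 or corr <= 0.0:
                continue  # keep the gain positive so signs are unambiguous
            gain = corr / energy
            err = sum((t - gain * v) ** 2 for t, v in zip(target, y))
            if best is None or err < best[0]:
                best = (err, positions, signs, gain)
    return best


# Build a target from a known valid pulse set and recover it.
h = [1.0, 0.5, 0.25]
true_code = [0.0] * 10
for pos, sign in ((3, 1), (4, -1), (8, 1)):
    true_code[pos] = sign
target = [2.0 * v for v in convolve(true_code, h)]
err, positions, signs, gain = search(target, h)
print(positions, signs, gain)
```

Because every candidate is drawn from the table, the result is by construction a valid set of excitation pulses, which is the restriction described above.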
[0014] FIG. 5 illustrates a functional block diagram of decoder 20 of vocoder 10. Encoded
speech signals are received at input 18 of decoder 20. The encoded speech signals
are received by decoder 100. Decoder 100 produces fixed and adaptive code vectors
corresponding to the fixed index and pitch index signals, respectively. These code
vectors are passed to the excitation construction portion of unit 110 along with the
pitch gain and the fixed gain signals. The pitch gain signal is used to scale the
adaptive vector which was produced using the pitch index signal, and the fixed gain
signal is used to scale the fixed vector which was obtained using the fixed index
signal. Decoder 100 passes the linear predictive code parameters to the filter or
model synthesis section of unit 110. Unit 110 then uses the scaled vectors to excite
the filter that is synthesized using the linear predictive coefficients produced by
linear predictive coder 40, and produces an output signal which is representative
of the digitized speech originally received at input 12. Optionally, post filter 120
may be used to shape the spectrum of the digitized speech signal that is produced
at output 22.
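The excitation construction and model synthesis described above can be sketched as follows. The sketch assumes an all-pole synthesis model of the form s[n] = e[n] + sum over k of a_k * s[n-k] and ignores filter memory carried between subframes; the function names build_excitation and synthesize are hypothetical.

```python
def build_excitation(adaptive_vec, fixed_vec, pitch_gain, fixed_gain):
    """Scale the adaptive and fixed code vectors and add them sample by sample."""
    return [
        pitch_gain * a + fixed_gain * c
        for a, c in zip(adaptive_vec, fixed_vec)
    ]


def synthesize(excitation, coeffs):
    """Excite the all-pole model with `excitation`; coeffs[0] is a_1."""
    out = []
    for n, e in enumerate(excitation):
        feedback = sum(
            a * out[n - k]
            for k, a in enumerate(coeffs, start=1)
            if n - k >= 0
        )
        out.append(e + feedback)
    return out


excitation = build_excitation([1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0], 0.5, 2.0)
print(excitation)                     # [0.5, 0.0, 2.0, 0.0]
print(synthesize(excitation, [0.5]))  # [0.5, 0.25, 2.125, 1.0625]
```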
[0015] Referring back to FIG. 4, one of fixed codebook 82's outputs is a fixed index. A
fixed index is produced four times per frame (once per subframe), which is every 5
msec for a system using 20 msec frames. The fixed index specifies an excitation vector
or a series of excitation pulses, where the bits of the fixed index describe the position
and sign of the pulses. As mentioned earlier, these excitation pulses are used as
inputs to a speech model in a receiving vocoder.
[0016] FIG. 6 illustrates a fixed index table used for specifying the possible predetermined
positions of the excitation pulses composing a valid excitation vector. Each pulse
is limited to one of four predetermined positions and therefore only requires two
bits to specify a position. A third bit is used to specify a sign. For example, if
ten pulses are to be specified, ten rows each having four possible positions are included
in the table. In this example, pulse I0 may occupy positions 0, 10, 20 or 30. Likewise,
each of the other pulses may
occupy one of the possible positions specified in its row. In this example, only thirty
bits are required to specify the position and sign of ten pulses (3 bits/pulse) because
two bits per pulse specify position and one bit per pulse specifies a sign.
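The three-bit-per-pulse encoding can be illustrated with a minimal sketch. Only the row for pulse I0 (positions 0, 10, 20 and 30) is stated in the text; the remaining rows, the order of the fields within the fixed index, and the names TABLE, encode and decode are assumptions made for illustration.

```python
# Assumed layout: pulse k may occupy k, k + 10, k + 20 or k + 30.
# Only row 0 (0, 10, 20, 30) comes from the description.
TABLE = [[k + 10 * j for j in range(4)] for k in range(10)]


def encode(pulses):
    """Pack ten (position, sign) pairs into a 30-bit fixed index.

    Each pulse uses a 3-bit field: two position bits followed by one sign
    bit (1 = negative). The first pulse occupies the most significant field.
    """
    index = 0
    for row, (pos, sign) in zip(TABLE, pulses):
        field = (row.index(pos) << 1) | (1 if sign < 0 else 0)
        index = (index << 3) | field
    return index


def decode(index):
    """Unpack a 30-bit fixed index into ten (position, sign) pairs."""
    pulses = []
    for k, row in enumerate(TABLE):
        field = (index >> (3 * (len(TABLE) - 1 - k))) & 0b111
        pulses.append((row[field >> 1], -1 if field & 1 else 1))
    return pulses


pulses = [(k + 10 * (k % 4), 1 if k % 2 == 0 else -1) for k in range(10)]
index = encode(pulses)

# Flip each of the 30 bits in turn and count how many pulses change.
damage = [
    sum(a != b for a, b in zip(pulses, decode(index ^ (1 << bit))))
    for bit in range(30)
]
print(max(damage))  # 1
```

Because each pulse has its own independent field, any single bit error in this sketch alters the position or the sign of exactly one pulse.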
[0017] FIG. 7 illustrates a fixed index table used for specifying the possible predetermined
positions of five pulses where each pulse may occupy only one of four positions.
[0018] FIG. 8 illustrates a fixed index table specifying the possible predetermined positions
of the pulses in a three pulse excitation vector where the excitation pulses specified
by the last two rows are limited to three possible predetermined locations each. It
is also possible to use a fixed index table that limits one or more excitation pulses
to two possible predetermined locations each. The schemes of FIGS. 6, 7 and 8 may
be applied to excitation vectors having any number of pulses and the number of possible
predetermined positions that each pulse may occupy may be limited to four or fewer.
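The bit budget of such tables can be worked out with a short sketch. It assumes each pulse is coded in its own fixed-width field, so that a pulse limited to three positions still uses two position bits; that assumption, the function names, and the resulting totals for the five pulse and three pulse tables are illustrative and are not stated in the description. The 5 msec subframe interval is taken from paragraph [0015].

```python
def fixed_index_bits(positions_per_pulse):
    """Total bits: position bits for n choices plus one sign bit per pulse."""
    return sum((n - 1).bit_length() + 1 for n in positions_per_pulse)


def fixed_index_bit_rate(bits_per_subframe, subframe_ms=5):
    """Bits per second spent on the fixed index alone."""
    return bits_per_subframe * 1000 // subframe_ms


ten_pulse = fixed_index_bits([4] * 10)    # table shaped like FIG. 6
five_pulse = fixed_index_bits([4] * 5)    # table shaped like FIG. 7
three_pulse = fixed_index_bits([4, 3, 3])  # table shaped like FIG. 8

print(ten_pulse, five_pulse, three_pulse)  # 30 15 9
print(fixed_index_bit_rate(ten_pulse))     # 6000
```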
[0019] The functional block diagrams can be implemented in various forms. Each block can
be implemented individually using microprocessors or microcomputers, or they can be
implemented using a single microprocessor or microcomputer. It is also possible to
implement each or all of the functional blocks using programmable digital signal processing
devices or specialized devices received from the aforementioned manufacturers or other
semiconductor manufacturers.
1. A method for encoding an excitation vector, comprising the steps of:
selecting a selected excitation pulse set from a plurality of valid excitation pulse
sets, each excitation pulse set having a plurality of excitation pulses;
restricting the plurality of valid excitation pulse sets to sets where each excitation
pulse is limited to one of up to four predetermined positions; and
producing an output describing the selected excitation pulse set.
2. A method for encoding an excitation vector, comprising the steps of:
searching through a plurality of valid excitation pulse sets for a selected excitation
pulse set that minimizes a fixed codebook error, each excitation pulse set having
a plurality of excitation pulses;
restricting the plurality of valid excitation pulse sets to sets where each excitation
pulse is limited to one of up to four predetermined positions; and
producing an output describing the selected excitation pulse set.
3. The method of claim 1 or claim 2, wherein the step of restricting comprises restricting
the plurality of valid excitation pulse sets to sets where each excitation pulse is
limited to one of four predetermined positions.
4. The method of claim 1 or claim 2, wherein the step of restricting comprises restricting
the plurality of valid excitation pulse sets to sets where a first excitation pulse
is limited to one of up to four predetermined positions and a second excitation pulse
is limited to one of up to three predetermined positions.
5. The method of claim 1 or claim 2, wherein the step of restricting comprises restricting
the plurality of valid excitation pulse sets to sets where a first excitation pulse
is limited to one of four predetermined positions and a second excitation pulse is
limited to one of three predetermined positions.
6. The method of any of the preceding claims, wherein the step of producing an output
comprises producing an output that describes a position of each excitation pulse in
the selected excitation pulse set by up to two bits.
7. The method of claim 6, wherein the step of producing an output comprises producing
an output that describes a sign of each excitation pulse in the selected excitation
pulse set by one bit.
8. The method of any of the preceding claims, wherein the step of selecting comprises
selecting a selected excitation pulse set having ten pulses.
9. The method of any of claims 1 to 7, wherein the step of selecting comprises selecting
a selected excitation pulse set having five pulses.
10. The method of any of claims 1 to 7, wherein the step of selecting comprises selecting
a selected excitation pulse set having four pulses.
11. The method of any of claims 1 to 7, wherein the step of selecting comprises selecting
a selected excitation pulse set having three pulses.