[0001] The invention relates to a speech encoder as set forth in the preamble of claim 1.
A speech encoder of this type is known from EP-8-0 176 243.
[0002] The aforementioned document discloses a coder for speech signals comprising
separation means for receiving speech signals and generating series of values, each
series representing a respective portion of the frequency spectrum of the input signal;
encoding means for digitally encoding each series; and bit allocation means for
varying the number of bits used for encoding the respective series in dependence on
the relative energy content thereof, wherein the number of series to which any given
number of bits is allocated is constant and only the selection of the series to which
respective numbers of bits are allocated is varied.
[0003] Conventional analog telephone systems are being replaced by digital systems. In digital
systems, the analog signals are sampled at a rate of about twice the bandwidth of
the analog signals or about eight kilohertz, and the samples are then encoded. In
a simple pulse code modulation system (PCM), each sample is quantized as one of a
discrete set of prechosen values and encoded as a digital word which is then transmitted
over the telephone lines. With eight bit digital words, for example, the analog sample
is quantized to 2⁸ or 256 levels, each of which is designated by a different eight
bit word. Using nonlinear quantization, excellent quality speech can be obtained with
only seven bits per sample; but since a seven bit word is still required for each
sample, transmission bit rates of 56 kilobits per second are necessary.
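The bit-rate arithmetic of this paragraph can be checked with a short sketch. The function name `pcm_bit_rate` is illustrative only and does not appear in the described system.

```python
def pcm_bit_rate(sample_rate_hz: int, bits_per_sample: int) -> int:
    """Bit rate of plain PCM: one fixed-size word is sent per sample."""
    return sample_rate_hz * bits_per_sample


# Eight-bit words distinguish 2**8 quantization levels.
levels_8bit = 2 ** 8

# Sampling at eight kilohertz with seven-bit words.
rate_7bit = pcm_bit_rate(8000, 7)
```

With these values, `levels_8bit` is 256 and `rate_7bit` is 56000 bits per second, matching the figures given above.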
[0004] Efforts have been made to reduce the bit rates required to encode the speech and
obtain a clear decoded speech signal at the receiving end of the system. The linear
predictive coding (LPC) technique is based on the recognition that speech production
involves excitation and a filtering process. The excitation is determined by the vocal
cord vibration for voiced speech and by turbulence for unvoiced speech, and that actuating
signal is then modified by the filtering process of vocal resonance chambers, including
the mouth and nasal passages. For a particular group of samples, a digital filter
which simulates the formant effects of the resonance chambers can be defined and the
definition can be encoded. A residual signal which approximates the excitation can
then be obtained by passing the speech signal through an inverse formant filter, and
the residual signal can be encoded. Because sufficient information is contained in
the lower-frequency portion of the residual spectrum, it is possible to encode only
the low frequency baseband and still obtain reasonably clear speech. At the receiver,
a definition of the formant filter and the residual baseband are decoded. The baseband
is repeated to complete the spectrum of the residual signal. By applying the decoded
filter to the repeated baseband signal, the initial speech can be reconstructed.
[0005] A major problem of the LPC approach is in defining the formant filter which must
be redefined with each window of samples. A complex encoder and a complex decoder
are required to obtain transmission rates as low as 16,000 bits per second. Another
problem with such systems is that they do not always provide a satisfactory reconstruction
of certain formants such as that resulting, for example, from nasal resonance. It
is the object of the invention to solve these problems.
[0006] This object is attained by the characterizing features of claim 1 or 14, respectively.
Preferred embodiments of the invention are subject matter of the sub-claims.
[0007] In one system, the approximate envelope of the transform spectrum in each of a plurality
of subbands of coefficients is defined and each envelope definition is encoded for
transmission. Each spectrum coefficient is then scaled relative to the defined envelope
of the respective subband, and each scaled coefficient is encoded in a number of bits
which is determined by the defined envelope of its subband.
[0008] Zero bits may be allotted to a number of less significant subbands as indicated by
the defined envelopes; and varying numbers of bits may be used for each encoded coefficient
depending on the magnitude of the defined envelope for the respective subband. Thus,
the subbands which are transmitted and the resolution with which the transmitted subbands
are encoded are determined adaptively for each sample window based on the defined
envelopes of the subbands.
[0009] At the receiver, the subbands which are transmitted are replicated to define coefficients
of frequencies which are not transmitted. A list replication procedure is followed
by which an nth coefficient which is transmitted is replicated as an nth coefficient
which is not transmitted. After replication the speech signal can be recreated by
using the transmitted envelope definitions to inverse scale the coefficients of the
respective subbands and by performing an inverse transform.
[0010] In another system the spectrum is normalized first with respect to only a few regions
and subsequently with respect to a greater number of subregions. The maximum magnitude
in each of the regions and in each of the subregions is encoded. The maximums are
logarithmically encoded and only a baseband of the normalized spectrum is encoded.
[0011] The foregoing and other objects, features, and advantages of the invention will be
apparent from the following more particular description of a preferred embodiment
of the invention, as illustrated in the accompanying drawings in which like reference
characters refer to the same parts throughout the different views. The drawings are
not necessarily to scale, emphasis instead being placed upon illustrating the principles
of the invention.
Figure 1 is a block diagram illustration of an encoder and a decoder embodying the present
invention;
Figure 2 is a block diagram of a speech encoder and corresponding decoder of a preferred
implementation of the system of Figure 1;
Figure 3 is an example of a magnitude spectrum of the Fourier transform of a window
of speech illustrating principles of the system of Figure 2;
Figure 4 is an example spectrum normalized from that of Figure 3 based on principles
of the present invention;
Figure 5 schematically illustrates a quantizer for complex values of the normalized
spectrum;
Figure 6 is an example illustration of coefficient groups which are transmitted and
illustrates the replication technique of the system of Figure 2;
Figure 7 is an example of a magnitude spectrum of a window of speech illustrating
principles of another system embodying the present invention;
Figure 8 is an example spectrum normalized from the spectrum of Figure 7 using four
formant regions;
Figure 9 is an example spectrum normalized from that of Figure 8 in subbands;
Figure 10 schematically illustrates a quantizer for complex values of the normalized
spectrum;
Figure 11 is a block diagram illustration of the spectral equalization encoding circuit
of Figure 1 in the alternative embodiment.
[0012] A block diagram of the system is shown in Fig. 1. Speech is filtered with a telephone
bandpass filter 20 which prevents aliasing when the signal is sampled 8,000 times
per second in sampling circuit 22. The analog samples are digitally encoded in an
analog to digital encoder 24 and are preprocessed at 26 prior to being applied to
a discrete Fourier transform unit 28.
[0013] The output of the Fourier transform circuit 28 is a sequence of coefficients which
indicate the magnitude and phase of the Fourier transform spectrum at each of 97 frequencies
spaced 41.667 hertz apart. The magnitude spectrum of the Fourier transform output
is illustrated as a continuous function in Fig. 3 but it is recognized that the transform
circuit 28 would actually provide only 97 incremental outputs.
[0014] In accordance with the present invention, the Fourier transform spectrum of the full
speech within a selected window is equalized and encoded in circuit 30 in a manner
which will be discussed below. The resultant digital signal can be transmitted at
16,000 bits per second over a line 32 to a receiver. At the receiver the full spectrum
of Fig. 3 is reconstructed in circuit 34. The inverse Fourier transform is performed
in circuit 36 and applied through a post-processor 38 corresponding to the pre-processor
26. That signal is then converted to analog form in digital to analog converter 40.
Final filtering in filter 42 provides clear speech to the listener.
[0015] In a preferred system, a pipelined multiprocessor architecture is employed. One microcomputer
is dedicated to the analog to digital conversion with preemphasis filtering, one is
dedicated to the forward Fourier transform and a third is dedicated to the spectral
equalization and coding. Similarly, in the receiver, one microcomputer is dedicated
to spectrum reconstruction, another to inverse Fourier transform and a third to digital
to analog conversion with deemphasis filtering.
[0016] The spectral equalization and encoding technique of the present invention is based
on the recognition that the Fourier transform of the total signal includes a relatively
flat spectrum of the pitch illustrated in Fig. 4 shaped by formant signals. In the
present system, the signal of Fig. 4 is obtained by normalizing the spectrum of Fig.
3 to at least one curve which itself can be encoded separate from the residual spectrum
of Fig. 4.
[0017] One implementation of the coding system of Figure 1 is shown in Figure 2. Prior to
compression, the analog speech signal is low pass filtered in filter 20 at 3.4 kilohertz,
sampled in sampler 22 at a rate of 8 kilohertz, and digitized using a 12 bit linear
analog to digital converter 24. It will be recognized that the input to the encoder
may already be in digital form and may require conversion to the code which can be
accepted by the encoder. The digitized speech signal, in frames of N samples, is first
scaled up in a scaler 26 to maximize its dynamic range in each frame. The scaled input
samples are then Fourier transformed in a fast Fourier transform device 28 to obtain
a corresponding discrete spectrum represented by (N/2)+ 1 complex frequency coefficients.
[0018] In a specific implementation, the input frame size equals 180 samples and corresponds
to a frame every 22.5 milliseconds. However, the discrete Fourier transform is performed
on 192 samples, including 12 samples overlapped with the previous frame, preceded
by trapezoidal windowing with a 12 point slope at each end. The resulting output of
the FFT includes 97 complex frequency coefficients spaced 41.667 Hertz apart.
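The framing figures of this implementation follow directly from the sample counts, as the sketch below illustrates. The window ramp values `(i + 0.5) / slope` are an assumption made for the example; the description states only that the window is trapezoidal with a 12 point slope at each end.

```python
SAMPLE_RATE_HZ = 8000
FRAME_ADVANCE = 180   # new samples contributed by each frame
FFT_SIZE = 192        # frame plus 12 samples overlapped with the previous frame
SLOPE = 12            # length of each sloping edge of the window

frame_period_ms = 1000.0 * FRAME_ADVANCE / SAMPLE_RATE_HZ
bin_spacing_hz = SAMPLE_RATE_HZ / FFT_SIZE
num_coefficients = FFT_SIZE // 2 + 1


def trapezoidal_window(size=FFT_SIZE, slope=SLOPE):
    """Flat-topped window with a linear ramp at each end.

    Assumption: ramp values (i + 0.5) / slope, chosen so that the falling
    edge of one frame and the rising edge of the next sum to one over the
    overlapped samples.
    """
    window = [1.0] * size
    for i in range(slope):
        ramp = (i + 0.5) / slope
        window[i] = ramp
        window[size - 1 - i] = ramp
    return window
```

Under these assumptions the frame period is 22.5 milliseconds, the coefficient spacing is 8000/192, or about 41.667 Hertz, and the transform yields 97 coefficients.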
[0019] An example magnitude spectrum of a Fourier transform output from FFT 28 is illustrated
in Figure 3. Although illustrated as a continuous function, it is recognized that
the transform circuit 28 actually provides only 97 incremental complex outputs.
[0020] The magnitude spectrum of the Fourier transform output is equalized and encoded.
To that end, the spectrum is partitioned into contiguous subbands and a spectral envelope
estimate is based on a piecewise approximation of those subbands at 44. In a specific
implementation, the spectrum is divided into twenty subbands, each including four
complex coefficients. Frequencies above 3291.67 Hertz are not encoded and are set
to zero at the receiver. To equalize the spectrum, the spectral envelope of each subband
is assumed constant and is defined by the peak magnitude in each subband as illustrated
by the horizontal lines in Figure 3. Each magnitude, or more correctly the inverse
thereof, can be treated as a scale factor for its respective subband. Each scale factor
is quantized in a quantizer 45 to four bits.
[0021] By then multiplying at 46 the magnitude of each coefficient of the spectrum by the
scale factor associated with that coefficient, the flattened residual spectrum of
Figure 4 is obtained. This flattening of the spectrum is equivalent to inverse filtering
the signal based on the piecewise-constant estimate of the spectral envelope.
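The envelope estimation and flattening steps may be sketched as follows. The sketch assumes that the twenty subbands cover coefficients 0 to 79 in order, and it applies the unquantized peak rather than the four bit quantized scale factor; the function names are illustrative.

```python
SUBBAND_SIZE = 4
NUM_SUBBANDS = 20


def subband_peaks(spectrum):
    """Peak magnitude within each subband of four complex coefficients."""
    peaks = []
    for b in range(NUM_SUBBANDS):
        band = spectrum[b * SUBBAND_SIZE:(b + 1) * SUBBAND_SIZE]
        peaks.append(max(abs(c) for c in band))
    return peaks


def flatten(spectrum, peaks):
    """Multiply each coefficient by the inverse of its subband peak."""
    flat = []
    for b, peak in enumerate(peaks):
        # An all-zero subband is left at zero (an assumption of this sketch).
        scale = 1.0 / peak if peak > 0.0 else 0.0
        band = spectrum[b * SUBBAND_SIZE:(b + 1) * SUBBAND_SIZE]
        flat.extend(c * scale for c in band)
    return flat
```

After flattening, the largest magnitude in every nonzero subband is one, which is the piecewise-constant equalization described above.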
[0022] Only selected subbands of the flattened spectrum of Figure 4 are quantized and transmitted.
Selection at 48 of subbands to be transmitted is based on the scale factor of the
subbands. In a specific implementation, the 12 subbands having the smallest scale
factors, that is the largest energy, are encoded and transmitted. For the eight lower
energy subbands only the scale factors are transmitted.
[0023] A nonuniform bit allocation is used for the complex coefficients which are transmitted.
Three separate two dimensional quantizers 50 are used for the transmitted 12 subbands.
The sixteen complex coefficients of the four subbands having the smallest scale factors
are quantized to seven bits each. The coefficients of the four subbands having the
next smallest scale factors are quantized to six bits each, and the coefficients of
the remaining four of the transmitted subbands are quantized to four bits each. In
effect, the coefficients of the eight subbands which are not transmitted are quantized
to zero bits.
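The selection and bit allocation rule can be illustrated by ranking the subbands on their envelope peaks. The function name and the tie-breaking rule (lower subband index first) are assumptions of this sketch. In the described system the ranking would be applied to the quantized scale factors, since the receiver repeats it from the transmitted values.

```python
def allocate_bits(peaks):
    """Return the number of bits per coefficient for each subband.

    Subbands are ranked by envelope peak; the largest peak corresponds to
    the smallest scale factor. The top four subbands receive 7 bits, the
    next four 6 bits, the next four 4 bits, and the rest 0 bits.
    Ties are broken by lower subband index (an assumption).
    """
    order = sorted(range(len(peaks)), key=lambda b: (-peaks[b], b))
    bits = [0] * len(peaks)
    for rank, band in enumerate(order):
        if rank < 4:
            bits[band] = 7
        elif rank < 8:
            bits[band] = 6
        elif rank < 12:
            bits[band] = 4
    return bits
```

For twenty subbands of four coefficients each, this allocation spends 4 x (4 x 7 + 4 x 6 + 4 x 4) = 272 bits on coefficients.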
[0024] Each of the two dimensional quantizers is designed using an approach presented by
Linde, et al., "An Algorithm for Vector Quantizer Design,"
IEEE Trans on Commun, Vol COM-28, pp. 84-95, Jan 1980. The result for the seven bit quantizer is shown
in Figure 5. The two dimensions of the quantizer are the real and imaginary components
of each complex coefficient. Each cluster has a seven bit representation to which
each complex point in the cluster is quantized. Actual quantization may be by table
look-up in a read only memory.
[0025] The bit allocation for a single frame may be summarized as follows:
| Scale factors 20 x 4 bits each = | 80 bits |
| 16 x 7 bits = | 112 bits |
| 16 x 6 bits = | 96 bits |
| 16 x 4 bits = | 64 bits |
| Time scaling = | 4 bits |
| Synchronization = | 4 bits |
| TOTAL | 360 bits |
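The frame budget can be cross-checked against the transmission rate with a few lines of arithmetic. The dictionary keys are descriptive labels chosen for this sketch.

```python
FRAME_BITS = {
    "scale_factors": 20 * 4,
    "coefficients_7bit": 16 * 7,
    "coefficients_6bit": 16 * 6,
    "coefficients_4bit": 16 * 4,
    "time_scaling": 4,
    "synchronization": 4,
}

total_bits = sum(FRAME_BITS.values())

# One frame of 180 samples at 8 kHz lasts 22.5 milliseconds.
frame_period_s = 180 / 8000
bit_rate = total_bits / frame_period_s
```

The total is 360 bits per 22.5 millisecond frame, which corresponds to 16,000 bits per second.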
[0026] At the receiver, the transmitted 12 groups of coefficients are applied to corresponding
seven bit, six bit and four bit inverse quantizers at 52. The frequency subbands to
which the resulting coefficients correspond are determined by the scale factors which
are transmitted in sequence for all subbands. Thus, the coefficients from the seven
bit inverse quantizer are placed in the subbands which the scale factors indicate
to be of the greatest magnitude.
[0027] The coefficients of the eight subbands which are not transmitted are approximated
by replication of transmitted subbands at 54. To that end, a list replication approach
is utilized. This approach is illustrated by Figure 6. In Figure 6, the coefficients
for each subband are illustrated by a single vector. The transmitted subbands are
indicated as T1, T2, T3, ... Tn, ... and the subbands which must be produced by
replication in the receiver are indicated as R1, R2, R3, ... Rn, ... In accordance
with the replication technique of the present system, the coefficients of the subband
Tn are used both for Tn and for Rn. Thus, the scaled coefficients for subband T1 are
repeated at subband R1, those of subband T2 are repeated at R2, and those at subband
T3 are repeated at R3. The rationale for this list replication technique is that subbands
are themselves usually grouped in blocks of transmitted subbands and blocks of nontransmitted
subbands. Thus, large blocks of coefficients are typically repeated using this approach
and speech harmonics are maintained in the replication process.
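A minimal sketch of the list replication rule is given below. The data layout (a list of subbands with `None` for untransmitted entries) and the wrap-around used when there are more untransmitted than transmitted subbands are assumptions of the example; with 12 transmitted and 8 untransmitted subbands the wrap-around is never exercised.

```python
def replicate(bands, transmitted):
    """Fill untransmitted subbands with copies of transmitted ones.

    bands: list of subbands in frequency order; untransmitted entries
        may hold None.
    transmitted: list of booleans, True where the subband was sent.
    The nth untransmitted subband receives a copy of the nth transmitted
    subband, both counted in frequency order.
    """
    sent = [i for i, t in enumerate(transmitted) if t]
    missing = [i for i, t in enumerate(transmitted) if not t]
    out = list(bands)
    if not sent:
        return out
    for n, idx in enumerate(missing):
        # Wrap around if the transmitted list is exhausted (assumption).
        out[idx] = list(bands[sent[n % len(sent)]])
    return out
```

For example, with subbands 0, 1 and 3 transmitted out of six, subband 2 is filled from subband 0, subband 4 from subband 1, and subband 5 from subband 3.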
[0028] Once the equalized spectrum of Figure 4 is recreated by replication of subbands,
a reproduction of the spectrum of Figure 3 can be generated at 56 by applying the
scale factors to the equalized spectrum. From that Fourier transform reproduction
of the original Fourier transform, the speech can be obtained through an inverse FFT
36, an inverse scaler 38, a digital to analog converter 40 and a reconstruction filter
42.
[0029] A distinct advantage of the present system is that the coder is not based on an assumed
fixed low pass spectrum model which is speech specific. Voice-band data and signaling
take the form of sine waves of some bandwidth which may occur at any frequency. Where
only a lower or an upper baseband of coefficients is transmitted, voice-band data
can be lost. With the present system, the subbands in which digital information is
transmitted are naturally selected because of their higher energy.
[0030] Another attractive feature of the coding system is its embedded data-rate coding capability.
Embedded coding, important as a method of congestion control in telephone applications,
allows the data to leave the encoder at a constant bit rate, yet be received at the
decoder at a lower bit rate as some bits are discarded en route. Embedded coding implies
a packet or block of bits within which there is a hierarchy of subblocks. Least crucial
subblocks can be discarded first as the channel gets overloaded. This hierarchical
concept is a natural one in the present system where the partial-band information,
described by a set of frequency coefficients, is ordered in a decreasing significance
and the missing coefficients can always be approximated from the received ones. The
more coefficients in the set, the higher is the rate and the better is the quality.
However, speech quality degrades very gracefully with modest drops in the rate. The
implementation of an embedded coding system in conjunction with this approach is therefore
fairly simple and very attractive.
[0031] The coding technique described above provides for excellent speech coding and reproduction
at 16 kilobits per second. Excellent results as low as 8.0 kilobits per second can
be obtained by using this technique in conjunction with a frequency scaling technique
known as time domain harmonic scaling and described by D. Malah, "Time Domain Algorithms
for Harmonic Bandwidth Reduction and Time Scaling of Speech Signals", IEEE Trans.
Acoust., Speech, Signal Processing, Vol. ASSP-27, pp. 121-133, Apr. 1979. In that
approach, prior to performing the fast Fourier transform, speech at twice the rate
of the original speech but at the original pitch is generated by combining adjacent
pitch cycles. The frequency scaled speech can then be fast Fourier transformed in
the technique described above.
[0032] Although each of the steps of residual extraction, subband selection, and quantizing
and the steps of inverse quantizing, replication and envelope excitation are shown
as individual elements of the system, it will be recognized that they can be merged
in an actual system. For example, the residual spectrum for subbands which are not
transmitted need not be obtained. The system can be implemented using a combination
of software and hardware.
[0033] In another coding system, the shape of the spectrum is determined by a two-step process.
This process also encodes the shape of the entire 100 to 3,800 Hz spectrum since this
is useful in the baseband coding. In the first step, the spectrum is divided into
four regions illustrated in Fig. 7:
125 - 583 Hz
625 - 1959 Hz
2000 - 3416 Hz
3468 - 3833 Hz
These regions correspond roughly to the usual locations of the first four formants.
The dynamic range of the magnitudes of the spectral coefficients is much smaller within
each of these regions than in the spectrum as a whole. For voiced phonemes the peak
magnitude near 250 Hz can be 30 dB above the magnitudes near 3,800 Hz. The first step
of spectral normalization is performed by finding the peak magnitudes within each
region, quantizing these peaks to 5 bits each with a logarithmic quantizer, and dividing
each spectral coefficient by the quantized peak in its region. The result is a vector
of spectral coefficients with maximum magnitude equal to unity. The division into
regions should result in the spectral coefficients being reasonably uniformly distributed
within the complex disc of radius one.
[0034] The second step extracts more detailed structure. The spectrum is divided into equal
bands of about 165 Hz each. The peak magnitude within each band is located and quantized
to 3 bits. The complex spectral coefficients within the band are divided by the quantized
magnitude and coded to 6 bits each using a hexagonal quantizer. This coding preserves
phase information that is important for reconstruction of frame boundaries.
[0035] The specifics of this alternative approach are illustrated with reference to Figs.
7 through 11. In this system, the preprocessor 26 is a single-pole pre-emphasis filter.
Low frequencies are attenuated by about 5 dB. High frequencies are boosted. The highest
frequency (4 kHz) is boosted by about 24 dB. The filter is useful in equalizing the
spectrum by reducing the low-pass effects of the anti-aliasing filter and the high-frequency
attenuation of the lips. The boosting helps to maintain numerical accuracy in the
subsequent computation of the Fourier transform.
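One filter consistent with these figures is sketched below. The transfer function 1 / (1 + a z^-1) and the coefficient a = 0.94 are assumptions; the description does not state them. They are chosen because they give roughly -5.8 dB at zero frequency and +24.4 dB at 4 kHz for an 8 kHz sampling rate.

```python
import math


def pre_emphasis(samples, a=0.94):
    """Single-pole recursive filter y[n] = x[n] - a * y[n-1].

    The coefficient a = 0.94 is an assumed value, not taken from the text.
    """
    out = []
    prev = 0.0
    for x in samples:
        prev = x - a * prev
        out.append(prev)
    return out


def gain_db(freq_hz, a=0.94, sample_rate_hz=8000):
    """Magnitude response of 1 / (1 + a z^-1) in decibels."""
    w = 2.0 * math.pi * freq_hz / sample_rate_hz
    denom = complex(1.0 + a * math.cos(w), -a * math.sin(w))
    return -20.0 * math.log10(abs(denom))
```

Calling `gain_db(0)` and `gain_db(4000)` gives the attenuation at zero frequency and the boost at the highest frequency for the assumed coefficient.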
[0036] Within each of the four formant regions, the spectrum is normalized to a curve which
in this case is selected as a horizontal line through the peak magnitude of the spectrum
in each region. These curves are shown as lines 58, 60, 62 and 64 in Figure 7. The
peak magnitude of the complex numbers in each region is determined and encoded to
five bits at unit 66 of Fig. 11 by finding a value k which is encoded such that the
peak magnitude is between $162 \times 2^{12(k-1)/32}$ and $162 \times 2^{12k/32}$. This results in
logarithmic encoding of the peak magnitude. The four k values, each
encoded in five bits, make up a total of 20 bits from the formant encoder which are
the most significant bits of the transmitted code for the window. All spectral coefficients
in each of the four regions are then divided by $162 \times 2^{12k/32}$ in the spectral
normalization unit 68. By this method, all of the resultant magnitudes,
illustrated in Figure 8, are less than 1.
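The search for the five bit value k can be written in closed form, as in the following sketch. The function names, the clamping of k to the range 0 to 31, and the treatment of peaks at or below 162 are assumptions of the example.

```python
import math

BASE = 162.0
STEP = 12.0 / 32.0   # exponent increment per code value


def encode_peak(peak):
    """Smallest k for which BASE * 2**(STEP * k) is at least the peak."""
    if peak <= BASE:
        return 0          # assumption: small peaks map to the lowest code
    k = math.ceil(math.log2(peak / BASE) / STEP)
    return min(k, 31)     # assumption: clamp to the five bit range


def decode_peak(k):
    """Quantized peak value used as the divisor for normalization."""
    return BASE * 2.0 ** (STEP * k)
```

Because the decoded value is never smaller than the true peak (within the unclamped range), dividing the coefficients of a region by `decode_peak(k)` keeps every normalized magnitude at or below one.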
[0037] Next, the normalized coefficients output from unit 68 are grouped into 20 subregions
of four and two subregions of five illustrated in Figure 8. The peak magnitude in
each of these subregions is determined and encoded to three bits with a logarithmic
quantizer in unit 70. The peak is always rounded up to the next larger quantizer value. The three
bits from each of the 22 subregions provide an additional 66 bits of the final signal
for the window. Each output within a subregion is multiplied by the reciprocal of the
quantized magnitude in the sample normalization unit 72, thus ensuring that all outputs
illustrated in Fig. 9 remain less than 1.
[0038] Each complex output from the baseband of 125 Hz to 1959 Hz of the normalized spectrum
of Fig. 9 is coded to six bits with the two dimensional quantizer and encoder 74.
The two-dimensional quantizer is formed by dividing a complex disc of radius one into
hexagons as shown in Figure 10. The x, y coordinates are radially warped by an exponential
function to approximate a logarithmic coding of the magnitude. All points within a
hexagon are quantized to the coordinates of the center of the hexagon. As a result,
coefficients of large magnitude are coded to better phase resolution than coefficients
of small magnitude. Actual quantization is done by table lookup, but efficient computational
algorithms are possible.
[0039] The bit allocation for a single frame may be summarized as follows:
| Formant region scale factors | 4 x 5 bits each = | 20 bits |
| Subband scale factors | 22 x 3 bits each = | 66 bits |
| Baseband components | 45 x 6 bits each = | 270 bits |
| TOTAL | | 356 bits |
In a practical 16-kb/s transmission system, this allows 4 bits per frame for overhead
functions, such as frame synchronization. The actual coding transformations, bit allocations,
and subband sizes may be changed as the coder is optimized for different applications.
[0040] All normalization factors (four at 5 bits each, 22 at 3 bits each) and the coded
normalized baseband coefficients (45 at 6 bits) are transmitted. At the receiver the
baseband is decoded and duplicated into the upper frequency range. The normalization
factors are applied onto the spectrum to restore the original shape. Specifically,
in the receiver, the inverse Fourier Transform Inputs 0 to 2 and 93 to 96 are set
to zero. The normalized complex coefficients for Inputs 3 to 47 are reconstructed
from the quantizer codes by table lookup. They are duplicated into Positions 48 to
92. This duplication is the nonlinear regeneration step. The scale factors for the
subregions and larger regions are then applied.
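The placement of the decoded baseband into the inverse transform inputs can be sketched as follows. The function name is illustrative; the index ranges are those stated above.

```python
def rebuild_normalized_spectrum(baseband):
    """Place 45 decoded baseband coefficients into a 97-point spectrum."""
    if len(baseband) != 45:
        raise ValueError("expected 45 baseband coefficients")
    spectrum = [0j] * 97                # inputs 0 to 2 and 93 to 96 stay zero
    spectrum[3:48] = list(baseband)     # inputs 3 to 47
    spectrum[48:93] = list(baseband)    # duplicated into positions 48 to 92
    return spectrum
```

The scale factors for the subregions and the larger regions would then be applied to this normalized spectrum before the inverse transform.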
[0041] The inverse transform is computed in unit 36. The effects of the windowing are removed
by adding the last 12 points of the previous inverse transform to the first 12 points
from the current inverse transform. The speech now passes through filter 38, which
is an inverse to the pre-emphasis filter and which attenuates the high frequencies,
removing the effects of the treble boost and reducing high-frequency quantization
noise. The outputs are converted to analog with a 12-bit linear digital to analog
converter 40.
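The removal of the windowing effect by overlap and add can be sketched as below. The sketch assumes that each inverse transform yields 192 points, of which the final 12 are held back for the following frame; the function name is illustrative.

```python
def overlap_add(previous_frame, current_frame, overlap=12):
    """Return the output samples contributed by the current frame.

    The last `overlap` points of the previous inverse transform are added
    to the first `overlap` points of the current one. The final `overlap`
    points of the current frame are withheld for the next call (assumption).
    """
    out = list(current_frame[:len(current_frame) - overlap])
    tail = previous_frame[-overlap:]
    for i in range(overlap):
        out[i] += tail[i]
    return out
```

If the rising and falling slopes of the analysis window sum to one over the overlapped points, a constant input is reproduced exactly across the frame boundary, apart from quantization effects.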
[0042] The baseband which is repeated in the spectrum reconstruction has been described
as being a band of lower frequencies. However, the baseband may include any range
of frequencies within the spectrum. For some sounds where higher energy levels are
found in the higher frequencies, a baseband of the higher frequencies is preferred.
[0043] It should be noted that the baseband suffers degradations only from quantization
errors. The reconstruction of the upper frequencies is only as good as the model and
the shaping information. However, by ensuring that at least some coefficient in each
165-Hz band of the normalized baseband is at full scale, each formant is excited at
approximately the right frequency. This is an improvement over baseband residual excitation
in which some parts of the spectrum may have too little energy. The reduction in computational
complexity due to peak finding and scaling instead of linear prediction analysis and
filtering is very significant.
[0044] This approach is a wideband approach in that the entire voice frequency range is
coded. The major problem with other wideband systems at 16 kb/s is that there are
barely enough bits available to give a rough description of the waveform. Baseband
excitation systems such as the present system meet that problem by devoting most of
the bits to the baseband and regenerating the excitation signal for higher frequencies.
In a modification of the subband transform coding just described, one could code the
baseband as described above, but code only some measure of energy for the higher frequencies.
Frequency translation of the baseband regenerates the fine structure of the upper
spectrum.
[0045] While the invention has been particularly shown and described with reference to a
preferred embodiment thereof, it will be understood by those skilled in the art that
various changes in form and details may be made therein without departing from the
scope of the invention as defined by the appended claims.
1. A speech encoder comprising:
Fourier transform means (28) for performing a discrete Fourier transform of an incoming
speech signal to generate a discrete transform spectrum of coefficients;
normalizing means (30) for modifying the transform spectrum to provide a normalized,
flatter spectrum and for encoding a function by which the discrete spectrum is modified;
and
means (30) for encoding at least a portion of the spectrum,
characterized in that
said normalizing means (30) comprises means (44) for defining the approximate envelope
of the discrete spectrum in each of a plurality of subbands of coefficients and for
encoding the defined envelope of each subband of coefficients and means for scaling
each spectrum coefficient relative to the defined envelope of the respective subband
of coefficients; and
said means (30) for encoding encodes the scaled spectrum coefficients within each
subband in a number of bits determined by the defined envelope of the subband.
2. A speech coding system as claimed in claim 1 wherein the number of bits determined
for a plurality of subbands is zero such that the scaled coefficients for those subbands
are not transmitted.
3. A speech coding system as claimed in claim 2 wherein the scaled coefficients of different
subbands are encoded in different numbers of bits other than zero.
4. A speech encoding system as claimed in claim 2 wherein the encoded speech is decoded
by replicating subbands of transmitted coefficients as substitutes for subbands of
nontransmitted coefficients, the transmitted coefficients being replicated such that
the nth subband which is transmitted is replicated as the nth subband which is not
transmitted.
5. A speech coding system as claimed in claim 1 wherein the coefficients of different
subbands are encoded in different numbers of bits other than zero.
6. A speech coding system as claimed in claim 1 wherein:
the means (30) for encoding encodes the scaled coefficients of less than all of the
subbands, the encoded scaled coefficients being those corresponding to the defined
envelopes of greater magnitude, with the scaled coefficients of subbands corresponding
to defined envelopes of greatest magnitudes being encoded in more bits than coefficients
of subbands corresponding to defined envelopes of lesser magnitudes.
7. A speech coding system as claimed in claim 6 wherein the encoded speech is decoded
by replicating subbands of transmitted coefficients as substitutes for subbands of
nontransmitted coefficients, the transmitted coefficients being replicated such that
the nth subband which is transmitted is replicated as the nth subband which is not
transmitted.
8. A speech coding system as claimed in claim 6 wherein the transform means (28) performs
a discrete Fourier transform.
9. A speech coding system as claimed in claim 1 wherein the normalizing means comprises:
means (44) for determining the maximum magnitude of the discrete spectrum within each
of a plurality of regions of the spectrum; and
means for digitally encoding the maximum magnitude of each region; and
means (45) for scaling each coefficient of the discrete spectrum in each region to
the maximum magnitude of each region to provide a first set of normalized coefficients.
10. A speech coding system as claimed in claim 9 wherein the normalizing means further
comprises:
means for determining the maximum magnitude of the first set of normalized outputs
in each of a plurality of subregions of the spectrum;
means for digitally encoding the maximum magnitude of each subregion; and
means for scaling each output of the first set of normalized outputs to the maximum
magnitude of each subregion to provide a second set of normalized outputs.
11. A speech encoder as claimed in Claim 10 wherein each of the maximum magnitudes is
logarithmically encoded.
12. A speech encoder as claimed in Claim 10 wherein the maximum magnitude is determined
for each of four regions corresponding to the first four formants.
13. A speech encoder as claimed in Claim 10 wherein only a baseband of the normalized
spectrum is encoded.
14. A method of encoding speech comprising:
performing a discrete Fourier transform of a window of speech to generate a discrete
transform spectrum;
providing a normalized spectrum by defining at least one curve approximating the
magnitude of the discrete spectrum, digitally encoding the defined curve and defining
the discrete spectrum relative to the defined curve; and
encoding at least a portion of the normalized spectrum,
characterized in that
the normalized spectrum is provided by defining the approximate envelope of the
discrete spectrum in each of a plurality of subbands of coefficients and digitally
encoding the defined envelope of each subband of coefficients and scaling each coefficient
relative to the defined magnitude of the respective subband of coefficients; and
the scaled coefficients within each subband are encoded into a number of bits determined
by the defined envelope of the subband.
15. The method as claimed in Claim 14 wherein the number of bits determined for a plurality
of subbands is zero such that the scaled coefficients for those subbands are not transmitted.
16. The method as claimed in Claim 15 wherein the scaled coefficients of different subbands
are encoded in different numbers of bits other than zero.
17. The method as claimed in Claim 15 wherein the encoded speech is decoded by replicating
subbands of transmitted coefficients as substitutes for subbands of nontransmitted
coefficients, the transmitted coefficients being replicated such that the nth subband
which is transmitted is replicated as the nth subband which is not transmitted.
18. A method as claimed in Claim 14 wherein the normalized spectrum is provided by:
determining a maximum magnitude of the discrete spectrum within each of a plurality
of regions of the spectrum;
digitally encoding the maximum magnitude of each region; and
scaling each coefficient of the discrete spectrum in each region to the maximum
magnitude of each region to provide a set of normalized coefficients.
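The steps recited in claims 9 to 18 can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the region size, the base-2 logarithmic code, the bit plan and the wrap-around rule used when fewer subbands are transmitted than omitted are all assumptions chosen for illustration, and the coefficients are taken to be real-valued magnitudes rather than the complex output of a discrete Fourier transform.

```python
import math

def normalize_by_region(coeffs, region_size):
    """Scale each coefficient to the peak magnitude of its region.

    Returns the per-region peaks and the scaled coefficients, grouped by region.
    """
    peaks, scaled = [], []
    for start in range(0, len(coeffs), region_size):
        region = coeffs[start:start + region_size]
        peak = max(abs(c) for c in region) or 1.0  # assumption: guard all-zero regions
        peaks.append(peak)
        scaled.append([c / peak for c in region])
    return peaks, scaled

def log_encode(peaks):
    """Logarithmic code for each peak (assumed here: rounded base-2 exponent)."""
    return [round(math.log2(p)) for p in peaks]

def allocate_bits(peaks, bit_plan):
    """Give the largest-envelope subbands the most bits; the rest get zero.

    bit_plan is a hypothetical descending list of bit counts, e.g. [3, 2].
    """
    order = sorted(range(len(peaks)), key=lambda i: peaks[i], reverse=True)
    bits = [0] * len(peaks)
    for rank, band in enumerate(order):
        if rank < len(bit_plan):
            bits[band] = bit_plan[rank]
    return bits

def replicate_missing(subbands, bits):
    """Decoder side: the nth transmitted subband stands in for the nth omitted one.

    Wrapping around when transmitted subbands run out is an assumption.
    """
    sent = [band for band, n in zip(subbands, bits) if n > 0]
    out, k = [], 0
    for band, n in zip(subbands, bits):
        if n > 0:
            out.append(list(band))
        else:
            out.append(list(sent[k % len(sent)]) if sent else list(band))
            k += 1
    return out

if __name__ == "__main__":
    spectrum = [4.0, -2.0, 1.0, 0.5, 8.0, 2.0, 0.25, -0.5]
    peaks, scaled = normalize_by_region(spectrum, region_size=2)
    bits = allocate_bits(peaks, bit_plan=[3, 2])
    print(log_encode(peaks), bits)
    print(replicate_missing(scaled, bits))
```

In this sketch the bit count for a subband depends only on its encoded envelope, so a decoder that receives the envelope codes could repeat the same ranking and recover the allocation without extra side information.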
1. Sprachcodierer, enthaltend:
eine Fourier-Transformationseinrichtung (28) zur Ausführung einer diskreten Fourier-Transformation
eines ankommenden Sprachsignals zur Erzeugung eines diskreten Transformationsspektrums
von Koeffizienten;
eine Normierungseinrichtung (30) zum Modifizieren des Transformationsspektrums zur
Erzeugung eines normierten, flacheren Spektrums und zum Codieren einer Funktion, durch
die das diskrete Spektrum modifiziert wird; und
eine Einrichtung (30) zum Codieren wenigstens eines Teils des Spektrums,
dadurch gekennzeichnet, daß
die Normierungseinrichtung (30) eine Einrichtung (44) zum Definieren der approximierten
Einhüllenden des diskreten Spektrums in jedem von mehreren Unterbändern von Koeffizienten
und zum Codieren der definierten Einhüllenden eines jeden Unterbandes von Koeffizienten
und Einrichtungen zum Skalieren jedes Spektrumkoeffizienten relativ zur definierten
Einhüllenden des betreffenden Unterbandes von Koeffizienten aufweist; und
die Einrichtung (30) zum Codieren die skalierten Spektrumkoeffizienten innerhalb
jedes Unterbandes in eine Anzahl von Bits codiert, die durch die definierte Einhüllende
des Unterbandes bestimmt wird.
2. Sprachcodiersystem nach Anspruch 1, bei der die Anzahl von Bits, die für mehrere Unterbänder
bestimmt wird, gleich Null ist, so daß die skalierten Koeffizienten für jene Unterbänder
nicht übertragen werden.
3. Sprachcodiersystem nach Anspruch 2, bei dem die skalierten Koeffizienten verschiedener
Unterbänder in verschiedene Anzahlen von Bits codiert werden, die von Null verschieden
sind.
4. Sprachcodiersystem nach Anspruch 2, bei dem die codierte Sprache durch Wiederholung
von Unterbändern übertragener Koeffizienten als Ersatz für Unterbänder nicht-übertragener
Koeffizienten decodiert wird, wobei die übertragenen Koeffizienten derart wiederholt
werden, daß das übertragene n-te Unterband als das nicht-übertragene n-te Unterband
wiederholt wird.
5. Sprachcodiersystem nach Anspruch 1, bei dem die Koeffizienten unterschiedlicher Unterbänder
in verschiedene Anzahlen von Bits codiert werden, die ungleich Null sind.
6. Sprachcodiersystem nach Anspruch 1, bei dem die Einrichtungen (30) zum Codieren die
skalierten Koeffizienten von weniger als allen Unterbändern codieren, wobei die codierten
skalierten Koeffizienten jene sind, die den definierten Einhüllenden größerer Amplitude
entsprechen, wobei die skalierten Koeffizienten von Unterbändern, die den definierten
Einhüllenden größter Amplituden entsprechen, in mehr Bits codiert werden als die Koeffizienten
von Unterbändern, die den definierten Einhüllenden kleinerer Amplituden entsprechen.
7. Sprachcodiersystem nach Anspruch 6, bei dem die codierte Sprache durch Wiederholen
von Unterbändern übertragener Koeffizienten als Ersatz für Unterbänder nicht-übertragener
Koeffizienten decodiert wird, wobei die übertragenen Koeffizienten derart wiederholt
werden, daß das übertragene n-te Unterband als das nicht-übertragene n-te Unterband
wiederholt wird.
8. Sprachcodiersystem nach Anspruch 6, bei dem die Transformationseinrichtung (28) eine
diskrete Fouriertransformation ausführt.
9. Sprachcodiersystem nach Anspruch 1, bei dem die Normierungseinrichtung enthält:
eine Einrichtung (44) zur Bestimmung der maximalen Amplitude des diskreten Spektrums
innerhalb jedes von mehreren Bereichen des Spektrums; und
eine Einrichtung zum digitalen Codieren der maximalen Amplitude jedes Bereichs; und
eine Einrichtung (45) zum Skalieren jedes Koeffizienten des diskreten Spektrums in
jedem Bereich auf die maximale Amplitude eines jeden Bereiches zur Erzeugung eines
ersten Satzes normierter Koeffizienten.
10. Sprachcodiersystem nach Anspruch 9, bei dem die Normierungseinrichtung weiterhin enthält:
Einrichtung zum Bestimmen der maximalen Amplitude des ersten Satzes normierter Ausgänge
in jedem von mehreren Unterbereichen des Spektrums;
Einrichtung zum digitalen Codieren der maximalen Amplitude jedes Unterbereichs; und
Einrichtung zum Skalieren jedes Ausgangs des ersten Satzes normierter Ausgänge zur
Maximalamplitude jedes Unterbereiches zur Erzeugung eines zweiten Satzes normierter
Ausgänge.
11. Sprachcodierer nach Anspruch 10, bei dem jede der maximalen Amplituden logarithmisch
codiert wird.
12. Sprachcodierer nach Anspruch 10, bei dem die maximale Amplitude für jeden von vier
Bereichen entsprechend den ersten vier Formanten bestimmt wird.
13. Sprachcodierer nach Anspruch 10, bei dem nur ein Basisband des normierten Spektrums
codiert wird.
14. Verfahren zur Sprachcodierung, enthaltend:
Ausführen einer diskreten Fouriertransformation eines Sprachfensters zur Erzeugung
eines diskreten Transformationsspektrums;
Erzeugen eines normierten Spektrums durch Definition wenigstens einer Kurve, die die
Amplitude des diskreten Spektrums approximiert, digitales Codieren der definierten
Kurve und Definition des diskreten Spektrums bezüglich der definierten Kurve; und
Codieren wenigstens eines Teils des normierten Spektrums,
dadurch gekennzeichnet, daß
das normierte Spektrum durch Definition der approximierten Einhüllenden des diskreten
Spektrums in jedem von mehreren Unterbändern von Koeffizienten und digitales Codieren
der definierten Einhüllenden jedes Unterbandes von Koeffizienten und Skalieren jedes
Koeffizienten bezüglich der definierten Amplitude des betreffenden Unterbandes von
Koeffizienten erzeugt wird; und
die skalierten Koeffizienten innerhalb jedes Unterbandes in eine Anzahl von Bits codiert
werden, die durch die definierte Einhüllende des Unterbandes bestimmt wird.
15. Verfahren nach Anspruch 14, bei dem die Anzahl von Bits, die für mehrere Unterbänder
bestimmt wird, gleich Null ist, so daß die skalierten Koeffizienten für jene Unterbänder
nicht übertragen werden.
16. Verfahren nach Anspruch 15, bei dem die skalierten Koeffizienten verschiedener Unterbänder
in verschiedene Anzahlen von Bits codiert werden, die ungleich Null sind.
17. Verfahren nach Anspruch 15, bei dem die codierte Sprache durch Wiederholen von Unterbändern
übertragener Koeffizienten als Ersatz für Unterbänder nicht-übertragener Koeffizienten
decodiert wird, wobei die übertragenen Koeffizienten derart wiederholt werden, daß
das übertragene n-te Unterband als das nicht-übertragene n-te Unterband wiederholt
wird.
18. Verfahren nach Anspruch 14, bei dem das normierte Spektrum erzeugt wird durch:
Bestimmen einer Maximalamplitude des diskreten Spektrums innerhalb jedes von mehreren
Bereichen des Spektrums;
digitales Codieren der Maximalamplitude jedes Bereiches; und
Skalieren jedes Koeffizienten des diskreten Spektrums in jedem Bereich zur Maximalamplitude
jedes Bereiches zur Erzeugung eines Satzes normierter Koeffizienten.
1. Codeur de la parole comprenant :
un moyen de transformation de Fourier (28) assurant une transformation discrète
de Fourier d'un signal de parole entrant pour engendrer un spectre transformé discret
de coefficients;
un moyen de normalisation (30) pour modifier le spectre transformé pour obtenir
un spectre normalisé plus plat et pour coder une fonction par laquelle le spectre
discret est modifié; et
un moyen (30) pour coder au moins une partie du spectre,
caractérisé en ce que
le dit moyen de normalisation (30) comprend un moyen (44) pour définir l'enveloppe
approximée du spectre discret dans chacune d'une pluralité de sous-bandes de coefficients
et pour coder l'enveloppe définie de chaque sous-bande de coefficients et un moyen
pour établir chaque coefficient du spectre par rapport à l'enveloppe définie de la
sous-bande respective de coefficients; et
le dit moyen (30) pour coder code les coefficients établis du spectre à l'intérieur
de chaque sous-bande dans un nombre de binons déterminé par l'enveloppe définie de
la sous-bande.
2. Système de codage de la parole selon la revendication 1 dans lequel le nombre déterminé
de binons pour une pluralité de sous-bandes est zéro, de telle façon que les coefficients
établis pour ces sous-bandes ne soient pas transmis.
3. Système de codage de la parole selon la revendication 2 dans lequel les coefficients
établis de différentes sous-bandes sont codés en différents nombres de binons autres
que zéro.
4. Système de codage de la parole selon la revendication 2 dans lequel la parole codée
est décodée en copiant des sous-bandes des coefficients transmis en tant que substituts
pour les sous-bandes de coefficients non-transmis, les coefficients transmis étant
copiés de telle façon que la nième sous-bande qui est transmise soit copiée en tant que nième sous-bande qui n'est pas transmise.
5. Système de codage de la parole selon la revendication 1 dans lequel les coefficients
des différentes sous-bandes sont codés en différents nombres de binons autres que
zéro.
6. Système de codage de la parole selon la revendication 1 dans lequel :
le moyen de codage (30) code les coefficients établis de moins de toutes les sous-bandes,
les coefficients établis codés étant ceux correspondant aux enveloppes définies de
plus grande amplitude, les coefficients établis des sous-bandes correspondant aux
enveloppes définies de plus grande amplitude étant codés en plus de binons que les
coefficients des sous-bandes correspondant aux enveloppes définies d'amplitude plus
petite.
7. Système de codage de la parole selon la revendication 6 dans lequel la parole codée
est décodée en copiant des sous-bandes de coefficients transmis en tant que substituts
pour des sous-bandes de coefficients non-transmis, les coefficients transmis étant
copiés de telle manière que la nième sous-bande qui est transmise soit copiée en tant que nième sous-bande qui n'est pas transmise.
8. Système de codage de la parole selon la revendication 6 dans lequel le moyen de transformation
(28) réalise une transformation discrète de Fourier.
9. Système de codage de la parole selon la revendication 1 dans lequel le moyen de normalisation
comprend :
un moyen (44) pour déterminer l'amplitude maximale du spectre discret à l'intérieur
de chacune d'une pluralité de régions du spectre; et
un moyen pour coder en numérique l'amplitude maximale de chaque région; et
un moyen (45) pour établir chaque coefficient du spectre discret dans chaque région
par rapport à l'amplitude maximale de chaque région pour obtenir un premier ensemble
de coefficients normalisés.
10. Système de codage de la parole selon la revendication 9 dans lequel le moyen de normalisation
comprend, en outre :
un moyen pour déterminer l'amplitude maximale du premier ensemble de sorties normalisées
dans chacune d'une pluralité de sous-régions du spectre;
un moyen pour coder en numérique l'amplitude maximale de chaque sous-région; et
un moyen pour établir chaque sortie du premier ensemble de sorties normalisées
par rapport à l'amplitude maximale de chaque sous-région pour obtenir un deuxième
ensemble de sorties normalisées.
11. Codeur de la parole selon la revendication 10 dans lequel chacune des amplitudes maximales
est codée de façon logarithmique.
12. Codeur de la parole selon la revendication 10 dans lequel l'amplitude maximale est déterminée
pour chacune de quatre régions correspondant aux quatre premiers formants.
13. Système de codage de la parole selon la revendication 10 dans lequel seule une bande
de base du spectre normalisé est codée.
14. Procédé de codage de la parole comprenant les étapes suivantes :
réalisation d'une transformation discrète de Fourier d'une fenêtre de parole pour
engendrer un spectre transformé discret;
obtention d'un spectre normalisé en définissant au moins une courbe approximant
l'amplitude du spectre discret, en codant en numérique la courbe définie et en définissant
le spectre discret par rapport à la courbe définie; et
codage d'au moins une partie du spectre normalisé,
caractérisé en ce que
le spectre normalisé est obtenu en définissant l'enveloppe approximée du spectre
discret dans chacune d'une pluralité de sous-bandes de coefficients et en codant en
numérique l'enveloppe définie de chaque sous-bande de coefficients et en établissant
chaque coefficient par rapport à l'amplitude définie de la sous-bande respective de
coefficients; et
les coefficients établis à l'intérieur de chaque sous-bande sont codés en un nombre
de binons déterminé par l'enveloppe définie de la sous-bande.
15. Procédé selon la revendication 14 dans lequel le nombre de binons déterminé pour une
pluralité de sous-bandes est zéro, de telle façon que les coefficients établis pour
ces sous-bandes ne soient pas transmis.
16. Procédé selon la revendication 15 dans lequel les coefficients établis de différentes
sous-bandes sont codés en différents nombres de binons autres que zéro.
17. Procédé selon la revendication 15 dans lequel la parole codée est décodée par copie
de sous-bandes de coefficients transmis en tant que substituts pour des sous-bandes
de coefficients non-transmis, les coefficients transmis étant copiés de telle façon
que la nième sous-bande qui est transmise soit copiée en tant que nième sous-bande qui n'est pas transmise.
18. Procédé selon la revendication 14 dans lequel le spectre normalisé est obtenu en :
déterminant une amplitude maximale du spectre discret à l'intérieur de chacune
d'une pluralité de régions du spectre;
codant en numérique l'amplitude maximale de chaque région; et
établissant chaque coefficient du spectre discret dans chaque région par rapport
à l'amplitude maximale de chaque région pour déterminer un ensemble de coefficients
normalisés.