(19)
(11) EP 0 208 712 B1

(12) EUROPEAN PATENT SPECIFICATION

(45) Mention of the grant of the patent:
07.04.1993 Bulletin 1993/14

(21) Application number: 86900480.4

(22) Date of filing: 11.12.1985
(51) International Patent Classification (IPC)5: G10L 5/00
(86) International application number:
PCT/US8502/448
(87) International publication number:
WO 8603/872 (03.07.1986 Gazette 1986/14)

(54)

ADAPTIVE METHOD AND APPARATUS FOR CODING SPEECH

ANPASSBARES VERFAHREN UND VORRICHTUNG FÜR SPRACHKODIERUNG

PROCEDE ET APPAREIL ADAPTATIFS DE CODAGE DE LA PAROLE


(84) Designated Contracting States:
BE DE FR GB IT

(30) Priority: 20.12.1984 US 684382
14.11.1985 US 798174

(43) Date of publication of application:
21.01.1987 Bulletin 1987/04

(73) Proprietor: GTE LABORATORIES INCORPORATED
Wilmington, DE 19800 (US)

(72) Inventors:
  • ZIBMAN, Israel, Bernard
    Newton, MA 02159 (US)
  • MAZOR, Baruch
    Newton Centre, MA 02159 (US)
  • VEENEMAN, Dale, E.
    Southborough, MA 01722 (US)

(74) Representative: Grünecker, Kinkeldey, Stockmair & Schwanhäusser Anwaltssozietät 
Maximilianstrasse 58
80538 München (DE)


(56) References cited:
EP-A- 0 124 728
DE-A- 3 102 822
US-A- 4 388 491
EP-A- 0 176 243
US-A- 4 330 689
US-A- 4 535 472
   
  • IEEE TRANSACTIONS ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, vol. ASSP-31, no. 5, October 1983, pages 1323-1327, IEEE, New York, US; B.N. SURESH BABU: "Performance of an FFT-based voice coding system in quiet and noisy environments"
  • IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, 1981, KANG et al.: "Mediumband Speech Processor", pages 820-823
   
Note: Within nine months from the publication of the mention of the grant of the European patent, any person may give notice to the European Patent Office of opposition to the European patent granted. Notice of opposition shall be filed in a written reasoned statement. It shall not be deemed to have been filed until the opposition fee has been paid. (Art. 99(1) European Patent Convention).


Description


[0001] The invention refers to a speech encoder as set forth in the preamble of claim 1. A speech encoder of this type is known from EP-A-0 176 243.

[0002] In the aforementioned document, a coder for speech signals is disclosed comprising separation means for receiving speech signals and generating series of values, each series representing respective portions of the frequency spectrum of the input signal, encoding means for digitally encoding each series, and bit allocation means for varying the number of bits used for encoding the respective series in dependence on the relative energy content thereof, wherein the number of series to which any given number of bits is allocated is constant and only the selection of the series to which respective numbers of bits are allocated is varied.

[0003] Conventional analog telephone systems are being replaced by digital systems. In digital systems, the analog signals are sampled at a rate of about twice the bandwidth of the analog signals or about eight kilohertz, and the samples are then encoded. In a simple pulse code modulation system (PCM), each sample is quantized as one of a discrete set of prechosen values and encoded as a digital word which is then transmitted over the telephone lines. With eight bit digital words, for example, the analog sample is quantized to 2⁸ or 256 levels, each of which is designated by a different eight bit word. Using nonlinear quantization, excellent quality speech can be obtained with only seven bits per sample; but since a seven bit word is still required for each sample, transmission bit rates of 56 kilobits per second are necessary.
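By way of illustration only, the PCM figures quoted above follow from simple arithmetic. The function names in this Python sketch are hypothetical and are not part of the specification.

```python
def quantizer_levels(bits_per_sample: int) -> int:
    """Number of distinct levels a word of the given width can designate."""
    return 2 ** bits_per_sample


def pcm_bit_rate(sample_rate_hz: int, bits_per_sample: int) -> int:
    """Bit rate of plain PCM: one fixed-width word is sent for every sample."""
    return sample_rate_hz * bits_per_sample


print(quantizer_levels(8))    # 256 levels for eight bit words
print(pcm_bit_rate(8000, 7))  # 56000 bits per second for seven bit words
```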

[0004] Efforts have been made to reduce the bit rates required to encode the speech and obtain a clear decoded speech signal at the receiving end of the system. The linear predictive coding (LPC) technique is based on the recognition that speech production involves excitation and a filtering process. The excitation is determined by the vocal cord vibration for voiced speech and by turbulence for unvoiced speech, and that actuating signal is then modified by the filtering process of vocal resonance chambers, including the mouth and nasal passages. For a particular group of samples, a digital filter which simulates the formant effects of the resonance chambers can be defined and the definition can be encoded. A residual signal which approximates the excitation can then be obtained by passing the speech signal through an inverse formant filter, and the residual signal can be encoded. Because sufficient information is contained in the lower-frequency portion of the residual spectrum, it is possible to encode only the low frequency baseband and still obtain reasonably clear speech. At the receiver, a definition of the formant filter and the residual baseband are decoded. The baseband is repeated to complete the spectrum of the residual signal. By applying the decoded filter to the repeated baseband signal, the initial speech can be reconstructed.

[0005] A major problem of the LPC approach is in defining the formant filter which must be redefined with each window of samples. A complex encoder and a complex decoder are required to obtain transmission rates as low as 16,000 bits per second. Another problem with such systems is that they do not always provide a satisfactory reconstruction of certain formants such as that resulting, for example, from nasal resonance. It is the object of the invention to solve these problems.

[0006] This object is attained by the characterizing features of claim 1 or 14, respectively. Preferred embodiments of the invention are subject matter of the sub-claims.

[0007] In one system, the approximate envelope of the transform spectrum in each of a plurality of subbands of coefficients is defined and each envelope definition is encoded for transmission. Each spectrum coefficient is then scaled relative to the defined envelope of the respective subband, and each scaled coefficient is encoded in a number of bits which is determined by the defined envelope of its subband.

[0008] Zero bits may be allotted to a number of less significant subbands as indicated by the defined envelopes; and varying numbers of bits may be used for each encoded coefficient depending on the magnitude of the defined envelope for the respective subband. Thus, the subbands which are transmitted and the resolution with which the transmitted subbands are encoded are determined adaptively for each sample window based on the defined envelopes of the subbands.

[0009] At the receiver, the subbands which are transmitted are replicated to define coefficients of frequencies which are not transmitted. A list replication procedure is followed by which an nth coefficient which is transmitted is replicated as an nth coefficient which is not transmitted. After replication the speech signal can be recreated by using the transmitted envelope definitions to inverse scale the coefficients of the respective subbands and by performing an inverse transform.

[0010] In another system the spectrum is normalized first with respect to only a few regions and subsequently with respect to a greater number of subregions. The maximum magnitude in each of the regions and in each of the subregions is encoded. The maximums are logarithmically encoded and only a baseband of the normalized spectrum is encoded.

[0011] The foregoing and other objects, features, and advantages of the invention will be apparent from the following more particular description of a preferred embodiment of the invention, as illustrated in the accompanying drawings in which like reference characters refer to the same parts throughout the different views. The drawings are not necessarily to scale, emphasis instead being placed upon illustrating the principles of the invention.

Figure 1 is a block diagram illustration of an encoder and a decoder embodying the present invention.

Figure 2 is a block diagram of a speech encoder and corresponding decoder of a preferred implementation of the system of Figure 1.

Figure 3 is an example of a magnitude spectrum of the Fourier transform of a window of speech illustrating principles of the system of Figure 2.

Figure 4 is an example spectrum normalized from that of Figure 3 based on principles of the present invention.

Figure 5 schematically illustrates a quantizer for complex values of the normalized spectrum.

Figure 6 is an example illustration of coefficient groups which are transmitted and illustrates the replication technique of the system of Figure 2.

Figure 7 is an example of a magnitude spectrum of a window of speech illustrating principles of another system embodying the present invention.

Figure 8 is an example spectrum normalized from the spectrum of Fig. 7 using four formant regions;

Figure 9 is an example spectrum normalized from that of Fig. 8 in subbands;

Figure 10 schematically illustrates a quantizer for complex values of the normalized spectrum;

Figure 11 is a block diagram illustration of the spectral equalization encoding circuit of Fig. 1 in the alternative embodiment.



[0012] A block diagram of the system is shown in Fig. 1. Speech is filtered with a telephone bandpass filter 20 which prevents aliasing when the signal is sampled 8,000 times per second in sampling circuit 22. The analog samples are digitally encoded in an analog to digital encoder 24 and are preprocessed at 26 prior to being applied to a discrete Fourier transform unit 28.

[0013] The output of the Fourier transform circuit 28 is a sequence of coefficients which indicate the magnitude and phase of the Fourier transform spectrum at each of 97 frequencies spaced 41.667 hertz apart. The magnitude spectrum of the Fourier transform output is illustrated as a continuous function in Fig. 3 but it is recognized that the transform circuit 28 would actually provide only 97 incremental outputs.

[0014] In accordance with the present invention, the Fourier transform spectrum of the full speech within a selected window is equalized and encoded in circuit 30 in a manner which will be discussed below. The resultant digital signal can be transmitted at 16,000 bits per second over a line 32 to a receiver. At the receiver the full spectrum of Fig. 3 is reconstructed in circuit 34. The inverse Fourier transform is performed in circuit 36 and applied through a post-processor 38 corresponding to the pre-processor 26. That signal is then converted to analog form in digital to analog converter 40. Final filtering in filter 42 provides clear speech to the listener.

[0015] In a preferred system, a pipelined multiprocessor architecture is employed. One microcomputer is dedicated to the analog to digital conversion with preemphasis filtering, one is dedicated to the forward Fourier transform and a third is dedicated to the spectral equalization and coding. Similarly, in the receiver, one microcomputer is dedicated to spectrum reconstruction, another to inverse Fourier transform and a third to digital to analog conversion with deemphasis filtering.

[0016] The spectral equalization and encoding technique of the present invention is based on the recognition that the Fourier transform of the total signal includes a relatively flat spectrum of the pitch illustrated in Fig. 4 shaped by formant signals. In the present system, the signal of Fig. 4 is obtained by normalizing the spectrum of Fig. 3 to at least one curve which itself can be encoded separate from the residual spectrum of Fig. 4.

[0017] One implementation of the coding system of Figure 1 is shown in Figure 2. Prior to compression, the analog speech signal is low pass filtered in filter 20 at 3.4 kilohertz, sampled in sampler 22 at a rate of 8 kilohertz, and digitized using a 12 bit linear analog to digital converter 24. It will be recognized that the input to the encoder may already be in digital form and may require conversion to the code which can be accepted by the encoder. The digitized speech signal, in frames of N samples, is first scaled up in a scaler 26 to maximize its dynamic range in each frame. The scaled input samples are then Fourier transformed in a fast Fourier transform device 28 to obtain a corresponding discrete spectrum represented by (N/2) + 1 complex frequency coefficients.

[0018] In a specific implementation, the input frame size equals 180 samples and corresponds to a frame every 22.5 milliseconds. However, the discrete Fourier transform is performed on 192 samples, including 12 samples overlapped with the previous frame, preceded by trapezoidal windowing with a 12 point slope at each end. The resulting output of the FFT includes 97 complex frequency coefficients spaced 41.667 Hertz apart.
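The framing figures of this paragraph can be checked with a short Python sketch. The ramp values used for the trapezoidal window are an assumption (the specification states only that the slope has 12 points); they are chosen here so that the falling ramp of one frame and the rising ramp of the next sum to one. All names are hypothetical.

```python
SAMPLE_RATE_HZ = 8000
FRAME_SAMPLES = 180     # new samples contributed by each frame
OVERLAP_SAMPLES = 12    # samples shared with the previous frame
DFT_SIZE = FRAME_SAMPLES + OVERLAP_SAMPLES  # 192-point transform


def frame_parameters():
    """Return (frame period in ms, number of coefficients, spacing in Hz)."""
    frame_ms = 1000.0 * FRAME_SAMPLES / SAMPLE_RATE_HZ
    num_coefficients = DFT_SIZE // 2 + 1
    spacing_hz = SAMPLE_RATE_HZ / DFT_SIZE
    return frame_ms, num_coefficients, spacing_hz


def trapezoidal_window(size=DFT_SIZE, slope=OVERLAP_SAMPLES):
    """Flat window with a linear ramp of `slope` points at each end.

    Assumption: ramp value (i + 0.5) / slope, so overlapping ramps sum to 1.
    """
    window = [1.0] * size
    for i in range(slope):
        ramp = (i + 0.5) / slope
        window[i] = ramp             # rising edge
        window[size - 1 - i] = ramp  # falling edge (mirror image)
    return window


frame_ms, num_coefficients, spacing_hz = frame_parameters()
print(frame_ms, num_coefficients, round(spacing_hz, 3))  # 22.5 97 41.667
```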

[0019] An example magnitude spectrum of a Fourier transform output from FFT 28 is illustrated in Figure 3. Although illustrated as a continuous function, it is recognized that the transform circuit 28 actually provides only 97 incremental complex outputs.

[0020] The magnitude spectrum of the Fourier transform output is equalized and encoded. To that end, the spectrum is partitioned into contiguous subbands and a spectral envelope estimate is based on a piecewise approximation of those subbands at 44. In a specific implementation, the spectrum is divided into twenty subbands, each including four complex coefficients. Frequencies above 3291.67 Hertz are not encoded and are set to zero at the receiver. To equalize the spectrum, the spectral envelope of each subband is assumed constant and is defined by the peak magnitude in each subband as illustrated by the horizontal lines in Figure 3. Each magnitude, or more correctly the inverse thereof, can be treated as a scale factor for its respective subband. Each scale factor is quantized in a quantizer 45 to four bits.

[0021] By then multiplying at 46 the magnitude of each coefficient of the spectrum by the scale factor associated with that coefficient, the flattened residual spectrum of Figure 4 is obtained. This flattening of the spectrum is equivalent to inverse filtering the signal based on the piecewise-constant estimate of the spectral envelope.
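A minimal Python sketch of the piecewise-constant envelope estimate and the flattening step follows. It assumes that the twenty subbands cover coefficients 0 to 79 (consistent with the 3291.67 Hertz limit at 41.667 Hertz spacing) and it uses the unquantized peaks, since the four bit scale factor quantizer is not detailed in the text. The function names are hypothetical.

```python
SUBBANDS = 20
COEFFS_PER_SUBBAND = 4


def subband_peaks(spectrum):
    """Peak magnitude of each subband: the piecewise-constant envelope."""
    peaks = []
    for band in range(SUBBANDS):
        start = band * COEFFS_PER_SUBBAND
        coeffs = spectrum[start:start + COEFFS_PER_SUBBAND]
        peaks.append(max(abs(c) for c in coeffs))
    return peaks


def flatten_spectrum(spectrum, peaks):
    """Multiply every coefficient by the scale factor (inverse peak) of its subband."""
    flat = []
    for band, peak in enumerate(peaks):
        scale = 1.0 / peak if peak > 0 else 0.0  # silent subband stays at zero
        start = band * COEFFS_PER_SUBBAND
        flat.extend(c * scale for c in spectrum[start:start + COEFFS_PER_SUBBAND])
    return flat


# Illustrative input only: 80 coefficients with steadily rising magnitude.
spectrum = [complex(i + 1, 0) for i in range(SUBBANDS * COEFFS_PER_SUBBAND)]
peaks = subband_peaks(spectrum)
flat = flatten_spectrum(spectrum, peaks)
```

After flattening, the largest magnitude in every subband is one, which is the property the subsequent quantizers rely on.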

[0022] Only selected subbands of the flattened spectrum of Figure 4 are quantized and transmitted. Selection at 48 of subbands to be transmitted is based on the scale factor of the subbands. In a specific implementation, the 12 subbands having the smallest scale factors, that is the largest energy, are encoded and transmitted. For the eight lower energy subbands only the scale factors are transmitted.

[0023] A nonuniform bit allocation is used for the complex coefficients which are transmitted. Three separate two dimensional quantizers 50 are used for the transmitted 12 subbands. The sixteen complex coefficients of the four subbands having the smallest scale factors are quantized to seven bits each. The coefficients of the four subbands having the next smallest scale factors are quantized to six bits each, and the coefficients of the remaining four of the transmitted subbands are quantized to four bits each. In effect, the coefficients of the eight subbands which are not transmitted are quantized to zero bits.
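The selection and allocation rule can be sketched in Python as a ranking of the subband peaks. Breaking ties in favour of the lower-frequency subband is an assumption, as the text does not address ties. For the decoder to reach the same ranking, both ends would presumably rank on the quantized scale factors that are actually transmitted. The function name is hypothetical.

```python
def allocate_bits(peaks):
    """Bits per complex coefficient for each subband, from the subband peaks.

    The largest peak corresponds to the smallest scale factor. The four
    strongest subbands get 7 bits, the next four 6 bits, the next four
    4 bits, and the rest 0 bits (not transmitted).
    """
    # sorted() is stable, so equal peaks keep ascending subband order.
    order = sorted(range(len(peaks)), key=lambda band: peaks[band], reverse=True)
    bits = [0] * len(peaks)
    for rank, band in enumerate(order):
        if rank < 4:
            bits[band] = 7
        elif rank < 8:
            bits[band] = 6
        elif rank < 12:
            bits[band] = 4
    return bits


# Illustrative peaks only: energy rising with subband index.
example_bits = allocate_bits([float(v) for v in range(1, 21)])
```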

[0024] Each of the two dimensional quantizers is designed using an approach presented by Linde, et al., "An Algorithm for Vector Quantizer Design," IEEE Trans on Commun, Vol COM-28, pp. 84-95, Jan 1980. The result for the seven bit quantizer is shown in Figure 5. The two dimensions of the quantizer are the real and imaginary components of each complex coefficient. Each cluster has a seven bit representation to which each complex point in the cluster is quantized. Actual quantization may be by table look-up in a read only memory.

[0025] The bit allocation for a single frame may be summarized as follows:
Scale factors                       20 x 4 bits =  80 bits
Coefficients, seven bit quantizer   16 x 7 bits = 112 bits
Coefficients, six bit quantizer     16 x 6 bits =  96 bits
Coefficients, four bit quantizer    16 x 4 bits =  64 bits
Time scaling                                    =   4 bits
Synchronization                                 =   4 bits
TOTAL                                             360 bits
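As a cross-check, the frame budget above can be summed and converted to a channel rate using the 180 sample frame at 8,000 samples per second. This Python sketch uses hypothetical labels and integer arithmetic only.

```python
SAMPLE_RATE_HZ = 8000
FRAME_SAMPLES = 180  # one frame every 22.5 milliseconds

frame_bits = {
    "scale factors": 20 * 4,
    "seven bit coefficients": 16 * 7,
    "six bit coefficients": 16 * 6,
    "four bit coefficients": 16 * 4,
    "time scaling": 4,
    "synchronization": 4,
}

total_bits = sum(frame_bits.values())
# Frames per second = SAMPLE_RATE_HZ / FRAME_SAMPLES, kept in integers here.
bit_rate = total_bits * SAMPLE_RATE_HZ // FRAME_SAMPLES

print(total_bits, bit_rate)  # 360 16000
```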


[0026] At the receiver, the transmitted 12 groups of coefficients are applied to corresponding seven bit, six bit and four bit inverse quantizers at 52. The frequency subbands to which the resulting coefficients correspond are determined by the scale factors which are transmitted in sequence for all subbands. Thus, the coefficients from the seven bit inverse quantizer are placed in the subbands which the scale factors indicate to be of the greatest magnitude.

[0027] The coefficients of the eight subbands which are not transmitted are approximated by replication of transmitted subbands at 54. To that end, a list replication approach is utilized. This approach is illustrated by Figure 6. In Figure 6, the coefficients for each subband are illustrated by a single vector. The transmitted subbands are indicated as T1, T2, T3, ..., Tn, ... and the subbands which must be produced by replication in the receiver are indicated as R1, R2, R3, ..., Rn, ... In accordance with the replication technique of the present system, the coefficients of the subband Tn are used both for Tn and for Rn. Thus, the scaled coefficients for subband T1 are repeated at subband R1, those of subband T2 are repeated at R2, and those at subband T3 are repeated at R3. The rationale for this list replication technique is that subbands are themselves usually grouped in blocks of transmitted subbands and blocks of nontransmitted subbands. Thus, large blocks of coefficients are typically repeated using this approach and speech harmonics are maintained in the replication process.
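The list replication rule may be sketched in Python as follows. The sketch assumes that both lists are taken in ascending frequency order. The wrap-around for the case of more nontransmitted than transmitted subbands is likewise an assumption, and it cannot arise with the 12 and 8 split described above. The function name is hypothetical.

```python
def replicate_subbands(subbands, transmitted):
    """Fill each nontransmitted subband with a copy of a transmitted one.

    subbands:    list of per-subband coefficient lists (None if not sent)
    transmitted: list of booleans, True where the subband was sent
    The nth nontransmitted subband (Rn) receives the nth transmitted one (Tn).
    """
    sent = [i for i, flag in enumerate(transmitted) if flag]
    missing = [i for i, flag in enumerate(transmitted) if not flag]
    if missing and not sent:
        raise ValueError("no transmitted subband available for replication")
    rebuilt = list(subbands)
    for n, target in enumerate(missing):
        source = sent[n % len(sent)]  # wrap-around is an assumption
        rebuilt[target] = list(subbands[source])
    return rebuilt


# Illustrative six-subband frame: T1, T2, R1, R2, T3, R3.
rebuilt = replicate_subbands(
    [[1], [2], None, None, [5], None],
    [True, True, False, False, True, False],
)
print(rebuilt)  # [[1], [2], [1], [2], [5], [5]]
```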

[0028] Once the equalized spectrum of Figure 4 is recreated by replication of subbands, a reproduction of the spectrum of Figure 3 can be generated at 56 by applying the scale factors to the equalized spectrum. From that Fourier transform reproduction of the original Fourier transform, the speech can be obtained through an inverse FFT 36, an inverse scaler 38, a digital to analog converter 40 and a reconstruction filter 42.

[0029] A distinct advantage of the present system is that the coder is not based on an assumed fixed low pass spectrum model which is speech specific. Voice-band data and signaling take the form of sine waves of some bandwidth which may occur at any frequency. Where only a lower or an upper baseband of coefficients is transmitted, voice-band data can be lost. With the present system, the subbands in which digital information is transmitted are naturally selected because of their higher energy.

[0030] Another attractive feature of the coding system is its embedded data-rate coding capability. Embedded coding, important as a method of congestion control in telephone applications, allows the data to leave the encoder at a constant bit rate, yet be received at the decoder at a lower bit rate as some bits are discarded en route. Embedded coding implies a packet or block of bits within which there is a hierarchy of subblocks. Least crucial subblocks can be discarded first as the channel gets overloaded. This hierarchical concept is a natural one in the present system where the partial-band information, described by a set of frequency coefficients, is ordered in decreasing significance and the missing coefficients can always be approximated from the received ones. The more coefficients in the set, the higher is the rate and the better is the quality. However, speech quality degrades very gracefully with modest drops in the rate. The implementation of an embedded coding system in conjunction with this approach is therefore fairly simple and very attractive.

[0031] The coding technique described above provides for excellent speech coding and reproduction at 16 kilobits per second. Excellent results as low as 8.0 kilobits per second can be obtained by using this technique in conjunction with a frequency scaling technique known as time domain harmonic scaling and described by D. Malah, "Time Domain Algorithms for Harmonic Bandwidth Reduction and Time Scaling of Speech Signals", IEEE Trans. Acoust., Speech, Signal Processing, Vol. ASSP-27, pp. 121-133, Apr. 1979. In that approach, prior to performing the fast Fourier transform, speech at twice the rate of the original speech but at the original pitch is generated by combining adjacent pitch cycles. The frequency scaled speech can then be fast Fourier transformed in the technique described above.

[0032] Although each of the steps of residual extraction, subband selection, and quantizing and the steps of inverse quantizing, replication and envelope excitation are shown as individual elements of the system, it will be recognized that they can be merged in an actual system. For example, the residual spectrum for subbands which are not transmitted need not be obtained. The system can be implemented using a combination of software and hardware.

[0033] In another coding system, the shape of the spectrum is determined by a two-step process. This process also encodes the shape of the entire 100 to 3,800 Hz spectrum since this is useful in the baseband coding. In the first step, the spectrum is divided into four regions illustrated in Fig. 7:
125 - 583 Hz
625 - 1959 Hz
2000 - 3416 Hz
3468 - 3833 Hz
These regions correspond roughly to the usual locations of the first four formants. The dynamic range of the magnitudes of the spectral coefficients is much smaller within each of these regions than in the spectrum as a whole. For voiced phonemes the peak magnitude near 250 Hz can be 30 dB above the magnitudes near 3,800 Hz. The first step of spectral normalization is performed by finding the peak magnitudes within each region, quantizing these peaks to 5 bits each with a logarithmic quantizer, and dividing each spectral coefficient by the quantized peak in its region. The result is a vector of spectral coefficients with maximum magnitude equal to unity. The division into regions should result in the spectral coefficients being reasonably uniformly distributed within the complex disc of radius one.

[0034] The second step extracts more detailed structure. The spectrum is divided into equal bands of about 165 Hz each. The peak magnitude within each band is located and quantized to 3 bits. The complex spectral coefficients within the band are divided by the quantized magnitude and coded to 6 bits each using a hexagonal quantizer. This coding preserves phase information that is important for reconstruction of frame boundaries.

[0035] The specifics of this alternative approach are illustrated with reference to Figs. 7 through 11. In this system, the preprocessor 26 is a single-pole pre-emphasis filter. Low frequencies are attenuated by about 5 dB. High frequencies are boosted. The highest frequency (4 kHz) is boosted by about 24 dB. The filter is useful in equalizing the spectrum by reducing the low-pass effects of the initializing filter and the high-frequency attenuation of the lips. The boosting helps to maintain numerical accuracy in the subsequent computation of the Fourier transform.

[0036] Within each of the four formant regions, the spectrum is normalized to a curve which in this case is selected as a horizontal line through the peak magnitude of the spectrum in each region. These curves are shown as lines 58, 60, 62 and 64 in Figure 7. The peak magnitude of the complex numbers in each region is determined and encoded to five bits at unit 66 of Fig. 11 by finding a value k which is encoded such that the peak magnitude is between $162 \times 2^{12(k-1)/32}$ and $162 \times 2^{12k/32}$. This results in logarithmic encoding of the peak magnitude. The four k values, each encoded in five bits, make up a total of 20 bits from the formant encoder which are the most significant bits of the transmitted code for the window. All spectral coefficients in each of the four regions are then divided by $162 \times 2^{12k/32}$ in the spectral normalization unit 68. By this method, all of the resultant magnitudes, illustrated in Figure 8, are less than 1.
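The logarithmic peak encoding can be illustrated with a short Python sketch that searches for the smallest five bit value k whose level is not below the peak. Treating the upper bound as inclusive and clamping k to the range 0 to 31 are assumptions, since the text does not say how boundary or out-of-range peaks are handled. All names are hypothetical.

```python
BASE = 162.0   # level for k = 0
STEPS = 32     # number of codes in five bits
OCTAVES = 12   # the 32 steps span 12 octaves of magnitude


def level(k):
    """Quantizer level for code k: 162 * 2 ** (12 * k / 32)."""
    return BASE * 2.0 ** (OCTAVES * k / STEPS)


def encode_peak(peak):
    """Smallest k with peak <= level(k), clamped to the five bit range."""
    for k in range(STEPS):
        if peak <= level(k):
            return k
    return STEPS - 1  # assumption: peaks above the top level are clamped


def normalize_region(coefficients):
    """Return (k, coefficients divided by the quantized peak level(k))."""
    k = encode_peak(max(abs(c) for c in coefficients))
    divisor = level(k)
    return k, [c / divisor for c in coefficients]


# Illustrative region whose peak magnitude is 500.
k, normalized = normalize_region([complex(300, 400), complex(10, 0)])
```

Unless the peak is clamped, dividing by the quantized level keeps every normalized magnitude at or below one.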

[0037] Next, the normalized coefficients output from unit 68 are grouped into 20 subregions of four and two subregions of five illustrated in Figure 8. The peak magnitude in each of these subregions is determined and encoded to three bits with a logarithmic quantizer in unit 70. The peak is always rounded up to the next larger quantizer value. The three bits from each of the 22 subregions provide an additional 66 bits of the final signal for the window. Each output within a subregion is multiplied by the reciprocal of the quantized magnitude in the sample normalization unit 72, thus ensuring that all outputs illustrated in Fig. 9 remain less than 1.

[0038] Each complex output from the baseband of 125 Hz to 1959 Hz of the normalized spectrum of Fig. 9 is coded to six bits with the two dimensional quantizer and encoder 74. The two-dimensional quantizer is formed by dividing a complex disc of radius one into hexagons as shown in Figure 10. The x, y coordinates are radially warped by an exponential function to approximate a logarithmic coding of the magnitude. All points within a hexagon are quantized to the coordinates of the center of the hexagon. As a result, coefficients of large magnitude are coded to better phase resolution than coefficients of small magnitude. Actual quantization is done by table lookup, but efficient computational algorithms are possible.

[0039] The bit allocation for a single frame may be summarized as follows:
Formant region scale factors    4 x 5 bits =  20 bits
Subband scale factors          22 x 3 bits =  66 bits
Baseband components            45 x 6 bits = 270 bits
TOTAL                                        356 bits

In a practical 16-kb/s transmission system, this allows 4 bits per frame for overhead functions, such as frame synchronization. The actual coding transformations, bit allocations, and subband sizes may be changed as the coder is optimized for different applications.
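The counts in this allocation can be cross-checked against the figures given elsewhere in the description: transform inputs 3 to 47 form the baseband, inputs 3 to 92 carry nonzero coefficients, and the subregions comprise 20 groups of four and two groups of five. The Python sketch below uses hypothetical names.

```python
SAMPLE_RATE_HZ = 8000
FRAME_SAMPLES = 180
CHANNEL_RATE_BPS = 16000

baseband_inputs = range(3, 48)        # inputs 3 to 47 inclusive
coded_inputs = range(3, 93)           # inputs 3 to 92 inclusive
subregion_sizes = [4] * 20 + [5] * 2  # 22 subregions

# The subregions should account for every nonzero input.
coefficients_covered = sum(subregion_sizes)

frame_bits = {
    "formant region scale factors": 4 * 5,
    "subband scale factors": len(subregion_sizes) * 3,
    "baseband components": len(baseband_inputs) * 6,
}
coded_bits = sum(frame_bits.values())
channel_bits = CHANNEL_RATE_BPS * FRAME_SAMPLES // SAMPLE_RATE_HZ
overhead_bits = channel_bits - coded_bits

print(coded_bits, channel_bits, overhead_bits)  # 356 360 4
```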

[0040] All normalization factors (four at 5 bits each, 22 at 3 bits each) and the coded normalized baseband coefficients (45 at 6 bits) are transmitted. At the receiver the baseband is decoded and duplicated into the upper frequency range. The normalization factors are applied onto the spectrum to restore the original shape. Specifically, in the receiver, the inverse Fourier Transform Inputs 0 to 2 and 93 to 96 are set to zero. The normalized complex coefficients for Inputs 3 to 47 are reconstructed from the quantizer codes by table lookup. They are duplicated into Positions 48 to 92. This duplication is the nonlinear regeneration step. The scale factors for the subregions and larger regions are then applied.
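The placement of the decoded baseband into the inverse transform inputs can be sketched in Python as below. The sketch covers only the zeroing and duplication steps; table lookup and the application of the scale factors are omitted. The function name is hypothetical.

```python
NUM_INPUTS = 97       # inverse transform inputs 0 to 96
BASEBAND_START = 3    # first baseband input
BASEBAND_LENGTH = 45  # inputs 3 to 47
COPY_START = 48       # duplicated into positions 48 to 92


def rebuild_normalized_spectrum(baseband):
    """Place the decoded baseband and its duplicate; all other inputs are zero."""
    if len(baseband) != BASEBAND_LENGTH:
        raise ValueError("expected 45 baseband coefficients")
    spectrum = [0j] * NUM_INPUTS  # inputs 0-2 and 93-96 remain zero
    for offset, coefficient in enumerate(baseband):
        spectrum[BASEBAND_START + offset] = coefficient
        spectrum[COPY_START + offset] = coefficient
    return spectrum


# Illustrative baseband values only.
decoded = [complex(i, -i) for i in range(BASEBAND_LENGTH)]
spectrum = rebuild_normalized_spectrum(decoded)
```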

[0041] The inverse transform is computed in unit 36. The effects of the windowing are removed by adding the last 12 points of the previous inverse transform to the first 12 points from the current inverse transform. The speech now passes through filter 38, which is an inverse to the pre-emphasis filter and which attenuates the high frequencies, removing the effects of the treble boost and reducing high-frequency quantization noise. The outputs are converted to analog with a 12-bit linear digital to analog converter 40.
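The overlap-add step that removes the windowing can be illustrated in Python. The sketch assumes a trapezoidal window whose ramps take the values (i + 0.5)/12, chosen so that overlapping ramps sum to one; the specification does not give the ramp values. The transform and inverse transform are omitted, so each block here is simply a windowed constant signal. All names are hypothetical.

```python
FRAME = 180    # output samples produced per frame
OVERLAP = 12   # points shared between consecutive blocks
BLOCK = FRAME + OVERLAP


def trapezoidal_window():
    """Assumed window: linear 12 point ramps at each end, flat in between."""
    window = [1.0] * BLOCK
    for i in range(OVERLAP):
        ramp = (i + 0.5) / OVERLAP
        window[i] = ramp
        window[BLOCK - 1 - i] = ramp
    return window


def overlap_add(previous_block, current_block):
    """Add the last 12 points of the previous block to the first 12 of the
    current block and return the 180 finished samples of the current frame."""
    output = list(current_block[:FRAME])
    tail = previous_block[-OVERLAP:]
    for i in range(OVERLAP):
        output[i] += tail[i]
    return output


# A constant input of 1.0 should come out as 1.0 once the ramps are summed.
window = trapezoidal_window()
windowed_block = [1.0 * w for w in window]
frame_out = overlap_add(windowed_block, windowed_block)
```

The final 12 points of each block are held back and completed when the next block arrives.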

[0042] The baseband which is repeated in the spectrum reconstruction has been described as being a band of lower frequencies. However, the baseband may include any range of frequencies within the spectrum. For some sounds where higher energy levels are found in the higher frequencies, a baseband of the higher frequencies is preferred.

[0043] It should be noted that the baseband suffers degradations only from quantization errors. The reconstruction of the upper frequencies is only as good as the model and the shaping information. However, by ensuring that at least some coefficient in each 165-Hz band of the normalized baseband is at full scale, each formant is excited at approximately the right frequency. This is an improvement over baseband residual excitation in which some parts of the spectrum may have too little energy. The reduction in computational complexity due to peak finding and scaling instead of linear prediction analysis and filtering is very significant.

[0044] This approach is a wideband approach in that the entire voice frequency range is coded. The major problem with other wideband systems at 16 kb/s is that there are barely enough bits available to give a rough description of the waveform. Baseband excitation systems such as the present system meet that problem by devoting most of the bits to the baseband and regenerating the excitation signal for higher frequencies. In a modification of the subband transform coding just described, one could code the baseband as described above, but code only some measure of energy for the higher frequencies. Frequency translation of the baseband regenerates the fine structure of the upper spectrum.

[0045] While the invention has been particularly shown and described with reference to a preferred embodiment thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the scope of the invention as defined by the appended claims.


Claims

1. A speech encoder comprising:
Fourier transform means (28) for performing a discrete Fourier transform of an incoming speech signal to generate a discrete transform spectrum of coefficients;
normalizing means (30) for modifying the transform spectrum to provide a normalized, flatter spectrum and for encoding a function by which the discrete spectrum is modified; and
means (30) for encoding at least a portion of the spectrum,
characterized in that
said normalizing means (30) comprises means (44) for defining the approximate envelope of the discrete spectrum in each of a plurality of subbands of coefficients and for encoding the defined envelope of each subband of coefficients and means for scaling each spectrum coefficient relative to the defined envelope of the respective subband of coefficients; and
said means (30) for encoding encodes the scaled spectrum coefficients within each subband in a number of bits determined by the defined envelope of the subband.
 
2. A speech coding system as claimed in claim 1 wherein the number of bits determined for a plurality of subbands is zero such that the scaled coefficients for those subbands are not transmitted.
 
3. A speech coding system as claimed in claim 2 wherein the scaled coefficients of different subbands are encoded in different numbers of bits other than zero.
 
4. A speech encoding system as claimed in claim 2 wherein the encoded speech is decoded by replicating subbands of transmitted coefficients as substitutes for subbands of nontransmitted coefficients, the transmitted coefficients being replicated such that the nth subband which is transmitted is replicated as the nth subband which is not transmitted.
 
5. A speech coding system as claimed in claim 1 wherein the coefficients of different subbands are encoded in different numbers of bits other than zero.
 
6. A speech coding system as claimed in claim 1 wherein:
the means (30) for encoding encodes the scaled coefficients of less than all of the subbands, the encoded scaled coefficients being those corresponding to the defined envelopes of greater magnitude, with the scaled coefficients of subbands corresponding to defined envelopes of greatest magnitudes being encoded in more bits than coefficients of subbands corresponding to defined envelopes of lesser magnitudes.
 
7. A speech coding system as claimed in claim 6 wherein the encoded speech is decoded by replicating subbands of transmitted coefficients as substitutes for subbands of nontransmitted coefficients, the transmitted coefficients being replicated such that the nth subband which is transmitted is replicated as the nth subband which is not transmitted.
 
8. A speech coding system as claimed in claim 6 wherein the transform means (28) performs a discrete Fourier transform.
 
9. A speech coding system as claimed in claim 1 wherein the normalizing means comprises:
means (44) for determining the maximum magnitude of the discrete spectrum within each of a plurality of regions of the spectrum; and
means for digitally encoding the maximum magnitude of each region; and
means (45) for scaling each coefficient of the discrete spectrum in each region to the maximum magnitude of each region to provide a first set of normalized coefficients.
 
10. A speech coding system as claimed in claim 9 wherein the normalizing means further comprises:
   means for determining the maximum magnitude of the first set of normalized outputs in each of a plurality of subregions of the spectrum;
   means for digitally encoding the maximum magnitude of each subregion; and
   means for scaling each output of the first set of normalized outputs to the maximum magnitude of each subregion to provide a second set of normalized outputs.
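
The two-stage normalization of claims 9 and 10 can be sketched by applying the same peak-scaling step twice, first over regions and then over subregions. The helper name `normalize_by_max`, the index bounds and the sample spectrum are illustrative assumptions only.

```python
def normalize_by_max(values, bounds):
    """Scale each value by the maximum magnitude of its region.

    bounds: list of (start, stop) index pairs, one per region.
    Returns (maxima, normalized_values).
    """
    maxima = []
    normalized = list(values)
    for start, stop in bounds:
        peak = max(abs(v) for v in values[start:stop])
        maxima.append(peak)
        scale = peak if peak > 0 else 1.0  # avoid dividing by zero
        for i in range(start, stop):
            normalized[i] = values[i] / scale
    return maxima, normalized


spectrum = [8.0, 4.0, 2.0, 1.0, 0.5, 0.25, 0.125, 0.0625]
regions = [(0, 4), (4, 8)]                      # first stage
subregions = [(0, 2), (2, 4), (4, 6), (6, 8)]   # second stage

region_max, first = normalize_by_max(spectrum, regions)
sub_max, second = normalize_by_max(first, subregions)
print(region_max, sub_max, second)
```

After the second stage every subregion peaks at magnitude 1, which is what makes the spectrum flatter; the two lists of maxima are the side information that would be encoded.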
 
11. A speech encoder as claimed in Claim 10 wherein each of the maximum magnitudes is logarithmically encoded.
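
As an illustration of logarithmic encoding of a maximum magnitude, the sketch below quantizes the magnitude on a decibel scale. The word length, the floor and the step size (`bits`, `floor`, `step_db`) are assumed values chosen for the example; the claim does not specify them.

```python
import math


def encode_log(magnitude, bits=4, floor=1.0, step_db=6.0):
    """Quantize a magnitude to an integer code on a logarithmic scale."""
    max_code = (1 << bits) - 1
    if magnitude <= floor:
        return 0
    code = round(20.0 * math.log10(magnitude / floor) / step_db)
    return min(code, max_code)  # clip to the available word length


def decode_log(code, floor=1.0, step_db=6.0):
    """Recover the approximate magnitude from its logarithmic code."""
    return floor * 10.0 ** (code * step_db / 20.0)


code = encode_log(1000.0)
print(code, decode_log(code))
```

A logarithmic scale keeps the relative error roughly constant across the wide dynamic range of speech spectra, which a uniform scale of the same word length would not.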
 
12. A speech encoder as claimed in Claim 10 wherein the maximum magnitude is determined for each of four regions corresponding to the first four formants.
 
13. A speech encoder as claimed in Claim 10 wherein only a baseband of the normalized spectrum is encoded.
 
14. A method of encoding speech comprising:
   performing a discrete Fourier transform of a window of speech to generate a discrete transform spectrum;
   providing a normalized spectrum by defining at least one curve approximating the magnitude of the discrete spectrum, digitally encoding the defined curve and defining the discrete spectrum relative to the defined curve; and
   encoding at least a portion of the normalized spectrum,
   characterized in that
   the normalized spectrum is provided by defining the approximate envelope of the discrete spectrum in each of a plurality of subbands of coefficients and digitally encoding the defined envelope of each subband of coefficients and scaling each coefficient relative to the defined magnitude of the respective subband of coefficients; and
   the scaled coefficients within each subband are encoded into a number of bits determined by the defined envelope of the subband.
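
The steps of claim 14 can be put together in a minimal sketch: transform a window, take the peak magnitude of each subband as its envelope, scale the coefficients by it, and quantize them with a bit count derived from the envelope. Everything named here (`dft`, `bits_from_envelope`, `encode_window`, the threshold, the subband size and the uniform quantizer) is a hypothetical choice for illustration, not the patented implementation.

```python
import cmath


def dft(window):
    """Direct discrete Fourier transform (O(n^2), for illustration)."""
    n = len(window)
    return [sum(window[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n))
            for k in range(n)]


def bits_from_envelope(envelope):
    """Assumed rule: strong subbands get 3 bits, weak ones get none."""
    return 3 if envelope >= 1.0 else 0


def encode_window(window, band_size=2):
    # Simplification: keep the first half of the spectrum of a real window.
    spectrum = dft(window)[: len(window) // 2]
    encoded = []
    for start in range(0, len(spectrum), band_size):
        band = spectrum[start:start + band_size]
        envelope = max(abs(c) for c in band)   # peak magnitude of subband
        bits = bits_from_envelope(envelope)
        codes = []
        if bits > 0:
            levels = (1 << (bits - 1)) - 1     # symmetric integer range
            for c in band:
                s = c / envelope               # scaled so that |s| <= 1
                codes.append((round(s.real * levels),
                              round(s.imag * levels)))
        encoded.append({"envelope": envelope, "bits": bits, "codes": codes})
    return encoded


impulse = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
for band in encode_window(impulse):
    print(band)
```

A subband whose envelope falls below the threshold yields an empty code list, which corresponds to the zero-bit case of claim 15.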
 
15. The method as claimed in Claim 14 wherein the number of bits determined for a plurality of subbands is zero such that the scaled coefficients for those subbands are not transmitted.
 
16. The method as claimed in Claim 15 wherein the scaled coefficients of different subbands are encoded in different numbers of bits other than zero.
 
17. The method as claimed in Claim 15 wherein the encoded speech is decoded by replicating subbands of transmitted coefficients as substitutes for subbands of nontransmitted coefficients, the transmitted coefficients being replicated such that the nth subband which is transmitted is replicated as the nth subband which is not transmitted.
 
18. A method as claimed in Claim 14 wherein the normalized spectrum is provided by:
   determining a maximum magnitude of the discrete spectrum within each of a plurality of regions of the spectrum;
   digitally encoding the maximum magnitude of each region; and
   scaling each coefficient of the discrete spectrum in each region to the maximum magnitude of each region to provide a set of normalized coefficients.
 


Claims (German text, in English translation)

1. A speech coder comprising:
Fourier transform means (28) for performing a discrete Fourier transform of an incoming speech signal to generate a discrete transform spectrum of coefficients;
normalizing means (30) for modifying the transform spectrum to produce a normalized, flatter spectrum and for encoding a function by which the discrete spectrum is modified; and
means (30) for encoding at least a portion of the spectrum,
characterized in that
the normalizing means (30) comprises means (44) for defining the approximate envelope of the discrete spectrum in each of a plurality of subbands of coefficients and for encoding the defined envelope of each subband of coefficients, and means for scaling each spectrum coefficient relative to the defined envelope of the respective subband of coefficients; and
the means (30) for encoding encodes the scaled spectrum coefficients within each subband into a number of bits determined by the defined envelope of the subband.

2. A speech coding system as claimed in claim 1 wherein the number of bits determined for a plurality of subbands is zero, so that the scaled coefficients for those subbands are not transmitted.

3. A speech coding system as claimed in claim 2 wherein the scaled coefficients of different subbands are encoded in different numbers of bits other than zero.

4. A speech coding system as claimed in claim 2 wherein the encoded speech is decoded by replicating subbands of transmitted coefficients as substitutes for subbands of nontransmitted coefficients, the transmitted coefficients being replicated such that the nth transmitted subband is replicated as the nth nontransmitted subband.

5. A speech coding system as claimed in claim 1 wherein the coefficients of different subbands are encoded in different numbers of bits other than zero.

6. A speech coding system as claimed in claim 1 wherein the means (30) for encoding encodes the scaled coefficients of less than all of the subbands, the encoded scaled coefficients being those corresponding to the defined envelopes of greater magnitude, the scaled coefficients of subbands corresponding to the defined envelopes of greatest magnitude being encoded in more bits than the coefficients of subbands corresponding to the defined envelopes of lesser magnitude.

7. A speech coding system as claimed in claim 6 wherein the encoded speech is decoded by replicating subbands of transmitted coefficients as substitutes for subbands of nontransmitted coefficients, the transmitted coefficients being replicated such that the nth transmitted subband is replicated as the nth nontransmitted subband.

8. A speech coding system as claimed in claim 6 wherein the transform means (28) performs a discrete Fourier transform.

9. A speech coding system as claimed in claim 1 wherein the normalizing means comprises:
means (44) for determining the maximum magnitude of the discrete spectrum within each of a plurality of regions of the spectrum;
means for digitally encoding the maximum magnitude of each region; and
means (45) for scaling each coefficient of the discrete spectrum in each region to the maximum magnitude of each region to produce a first set of normalized coefficients.

10. A speech coding system as claimed in claim 9 wherein the normalizing means further comprises:
means for determining the maximum magnitude of the first set of normalized outputs in each of a plurality of subregions of the spectrum;
means for digitally encoding the maximum magnitude of each subregion; and
means for scaling each output of the first set of normalized outputs to the maximum magnitude of each subregion to produce a second set of normalized outputs.

11. A speech coder as claimed in claim 10 wherein each of the maximum magnitudes is logarithmically encoded.

12. A speech coder as claimed in claim 10 wherein the maximum magnitude is determined for each of four regions corresponding to the first four formants.

13. A speech coder as claimed in claim 10 wherein only a baseband of the normalized spectrum is encoded.

14. A method of encoding speech comprising:
performing a discrete Fourier transform of a window of speech to generate a discrete transform spectrum;
producing a normalized spectrum by defining at least one curve approximating the magnitude of the discrete spectrum, digitally encoding the defined curve and defining the discrete spectrum relative to the defined curve; and
encoding at least a portion of the normalized spectrum,
characterized in that
the normalized spectrum is produced by defining the approximate envelope of the discrete spectrum in each of a plurality of subbands of coefficients, digitally encoding the defined envelope of each subband of coefficients and scaling each coefficient relative to the defined magnitude of the respective subband of coefficients; and
the scaled coefficients within each subband are encoded into a number of bits determined by the defined envelope of the subband.

15. A method as claimed in claim 14 wherein the number of bits determined for a plurality of subbands is zero, so that the scaled coefficients for those subbands are not transmitted.

16. A method as claimed in claim 15 wherein the scaled coefficients of different subbands are encoded in different numbers of bits other than zero.

17. A method as claimed in claim 15 wherein the encoded speech is decoded by replicating subbands of transmitted coefficients as substitutes for subbands of nontransmitted coefficients, the transmitted coefficients being replicated such that the nth transmitted subband is replicated as the nth nontransmitted subband.

18. A method as claimed in claim 14 wherein the normalized spectrum is produced by:
determining a maximum magnitude of the discrete spectrum within each of a plurality of regions of the spectrum;
digitally encoding the maximum magnitude of each region; and
scaling each coefficient of the discrete spectrum in each region to the maximum magnitude of each region to produce a set of normalized coefficients.
 


Claims (French text, in English translation)

1. A speech coder comprising:
   Fourier transform means (28) performing a discrete Fourier transform of an incoming speech signal to generate a discrete transform spectrum of coefficients;
   normalizing means (30) for modifying the transform spectrum to obtain a flatter, normalized spectrum and for encoding a function by which the discrete spectrum is modified; and
   means (30) for encoding at least a portion of the spectrum,
characterized in that
   said normalizing means (30) comprises means (44) for defining the approximate envelope of the discrete spectrum in each of a plurality of subbands of coefficients and for encoding the defined envelope of each subband of coefficients, and means for scaling each coefficient of the spectrum relative to the defined envelope of the respective subband of coefficients; and
   said means (30) for encoding encodes the scaled coefficients of the spectrum within each subband into a number of bits determined by the defined envelope of the subband.

2. A speech coding system as claimed in claim 1 wherein the number of bits determined for a plurality of subbands is zero, such that the scaled coefficients for those subbands are not transmitted.

3. A speech coding system as claimed in claim 2 wherein the scaled coefficients of different subbands are encoded in different numbers of bits other than zero.

4. A speech coding system as claimed in claim 2 wherein the encoded speech is decoded by copying subbands of transmitted coefficients as substitutes for subbands of nontransmitted coefficients, the transmitted coefficients being copied such that the nth subband which is transmitted is copied as the nth subband which is not transmitted.

5. A speech coding system as claimed in claim 1 wherein the coefficients of different subbands are encoded in different numbers of bits other than zero.

6. A speech coding system as claimed in claim 1 wherein:
   the means (30) for encoding encodes the scaled coefficients of less than all of the subbands, the encoded scaled coefficients being those corresponding to the defined envelopes of greater magnitude, the scaled coefficients of subbands corresponding to the defined envelopes of greatest magnitude being encoded in more bits than the coefficients of subbands corresponding to the defined envelopes of lesser magnitude.

7. A speech coding system as claimed in claim 6 wherein the encoded speech is decoded by copying subbands of transmitted coefficients as substitutes for subbands of nontransmitted coefficients, the transmitted coefficients being copied such that the nth subband which is transmitted is copied as the nth subband which is not transmitted.

8. A speech coding system as claimed in claim 6 wherein the transform means (28) performs a discrete Fourier transform.

9. A speech coding system as claimed in claim 1 wherein the normalizing means comprises:
   means (44) for determining the maximum magnitude of the discrete spectrum within each of a plurality of regions of the spectrum;
   means for digitally encoding the maximum magnitude of each region; and
   means (45) for scaling each coefficient of the discrete spectrum in each region relative to the maximum magnitude of each region to obtain a first set of normalized coefficients.

10. A speech coding system as claimed in claim 9 wherein the normalizing means further comprises:
   means for determining the maximum magnitude of the first set of normalized outputs in each of a plurality of subregions of the spectrum;
   means for digitally encoding the maximum magnitude of each subregion; and
   means for scaling each output of the first set of normalized outputs relative to the maximum magnitude of each subregion to obtain a second set of normalized outputs.

11. A speech coder as claimed in claim 10 wherein each of the maximum magnitudes is logarithmically encoded.

12. A speech coder as claimed in claim 10 wherein the maximum magnitude is determined for each of four regions corresponding to the first four formants.

13. A speech coding system as claimed in claim 10 wherein only a baseband of the normalized spectrum is encoded.

14. A method of encoding speech comprising the following steps:
   performing a discrete Fourier transform of a window of speech to generate a discrete transform spectrum;
   obtaining a normalized spectrum by defining at least one curve approximating the magnitude of the discrete spectrum, digitally encoding the defined curve and defining the discrete spectrum relative to the defined curve; and
   encoding at least a portion of the normalized spectrum,
characterized in that
   the normalized spectrum is obtained by defining the approximate envelope of the discrete spectrum in each of a plurality of subbands of coefficients, digitally encoding the defined envelope of each subband of coefficients and scaling each coefficient relative to the defined magnitude of the respective subband of coefficients; and
   the scaled coefficients within each subband are encoded into a number of bits determined by the defined envelope of the subband.

15. A method as claimed in claim 14 wherein the number of bits determined for a plurality of subbands is zero, such that the scaled coefficients for those subbands are not transmitted.

16. A method as claimed in claim 15 wherein the scaled coefficients of different subbands are encoded in different numbers of bits other than zero.

17. A method as claimed in claim 15 wherein the encoded speech is decoded by copying subbands of transmitted coefficients as substitutes for subbands of nontransmitted coefficients, the transmitted coefficients being copied such that the nth subband which is transmitted is copied as the nth subband which is not transmitted.

18. A method as claimed in claim 14 wherein the normalized spectrum is obtained by:
   determining a maximum magnitude of the discrete spectrum within each of a plurality of regions of the spectrum;
   digitally encoding the maximum magnitude of each region; and
   scaling each coefficient of the discrete spectrum in each region relative to the maximum magnitude of each region to provide a set of normalized coefficients.
 




Drawing