System for speech coding and apparatus for the same

(19)

(11)

EP 0 405 548 A2

(12)	EUROPEAN PATENT APPLICATION

(43)	Date of publication:
	02.01.1991 Bulletin 1991/01

(21)	Application number: 90112351.3

(22)	Date of filing: 28.06.1990

(51)	International Patent Classification (IPC)⁵: G10L 9/14

(84)	Designated Contracting States:
	DE FR GB

(30)

Priority:

28.06.1989 JP 166180/89
30.06.1989 JP 168645/89
27.07.1989 JP 195302/89

(71)	Applicant: FUJITSU LIMITED
	Kawasaki-shi, Kanagawa 211 (JP)

(72)	Inventors:
	Taniguchi, Tomohiko
	Kohoku-ku, Yokohama-shi, Kanagawa 223 (JP)
	Tanaka, Yoshinori
	Kawasaki-shi, Kanagawa 211 (JP)
	Ohta, Yasuji
	Kohoku-ku, Yokohama-shi, Kanagawa 223 (JP)
	Amano, Fumio
	Setagaya-ku, Tokyo 158 (JP)
	Unagami, Shigeyuki
	Atsugi-shi, Kanagawa 243-01 (JP)
	Sasama, Akira
	Fuji-shi, Shizuoka 417 (JP)

(74)	Representative: Lehn, Werner, Dipl.-Ing. et al
	Hoffmann, Eitle & Partner, Patentanwälte, Postfach 81 04 20, 81904 München (DE)

(54)	System for speech coding and apparatus for the same

(57) A CELP type of speech signal coding system, wherein a code vector obtained by applying linear prediction to a vector of a residual speech signal of white noise stored in a code book and a pitch prediction vector obtained by applying linear prediction to a residual signal of a preceding frame given a delay corresponding to a pitch frequency are added, use is made of an impulse vector obtained by applying linear prediction to a residual signal vector of impulses having a predetermined relationship with the vectors of the white noise code book, variable gains are given to at least the above code vector and impulse vector, a reproduced signal is produced, and this reproduced signal is used for identification of the input speech signal, thus enabling the creation of a pulse series corresponding to the sound source of voiced speech sounds, enabling accurate evaluation and identification of a pulse-like sound source of voiced speech sounds and enabling improvement of the quality of the reproduced speech while reducing the amount of information transmitted.

Description

BACKGROUND OF THE INVENTION

1. Field of the Invention

[0001] The present invention relates to a system for speech coding and an apparatus for the same, more particularly relates to a system for high quality speech coding and an apparatus for the same using vector quantization for data compression of speech signals.

2. Description of the Related Art

[0002] In recent years, use has been made of vector quantization for compressing the data of speech signals while maintaining their quality in intra-company communication systems, digital mobile radio systems, etc. The vector quantization system is a well known one in which predictive filtering is applied to the signal vectors of a code book to prepare reproduced signals and the error powers between the reproduced signals and an input speech signal are evaluated to determine the index of the signal vector with the smallest error. There is rising demand, however, for a more advanced method of vector quantization so as to further compress the speech data.

[0003] Figure 1 shows an example of a system for high quality speech coding using vector quantization. This system is known as the code excited LPC (CELP) system. In this, a code book 10 is preset with 2^m patterns of residual signal vectors, each an N dimensional vector produced from N samples of a white noise signal (in this case, shape vectors showing the phase, hereinafter referred to simply as vectors). The vectors are normalized so that the power of the N samples (N being, for example, 40) becomes a fixed value.
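The construction of such a code book can be sketched in Python as follows. This is only an illustrative sketch: the function name `make_codebook`, the use of Gaussian samples as the white noise, the fixed seed, and the target power of 1.0 are assumptions made for the example and are not specified in the text.

```python
import math
import random


def make_codebook(m, n, power=1.0, seed=0):
    """Return 2**m white noise shape vectors of dimension n.

    Each vector is scaled so that the sum of its squared samples equals
    `power` (an assumed value for the "fixed value" of the text).
    """
    rng = random.Random(seed)
    codebook = []
    for _ in range(2 ** m):
        # N samples of white noise (Gaussian noise is an assumption).
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        energy = sum(x * x for x in v)
        scale = math.sqrt(power / energy)
        codebook.append([x * scale for x in v])
    return codebook


# Example: 2^4 = 16 patterns of N = 40 samples each.
codebook = make_codebook(m=4, n=40)
```

Because every vector has the same power, the code book carries only the shape (phase) of the excitation, and the amplitude is left entirely to the gain of the multiplier unit 11.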

[0004] Vectors read out from the code book 10 by the command of the evaluating circuit 16 are given a gain by a multiplier unit 11, then converted to reproduced signals through two adaptive prediction units, i.e., a pitch prediction unit 12 which eliminates the long term correlation of the speech signals and a linear prediction unit 13 which eliminates the short term correlation of the same.

[0005] The reproduced signals are compared with digital speech signals of the N samples input from a terminal 15 in a subtractor 14 and the errors are evaluated by the evaluating circuit 16.

[0006] The evaluating circuit 16 selects the vector of the code book 10 giving the smallest power of the error and determines the gain of the multiplier unit 11 and a pitch prediction coefficient of the pitch prediction unit 12.

[0007] Further, as shown in Fig. 2, the linear prediction unit 13 uses the linear prediction coefficient found from the current frame sample values by a linear prediction analysis unit 18 in a linear difference equation as filter tap coefficients. The pitch prediction unit 12 uses the pitch prediction coefficient and pitch frequency of the input speech signal found by a pitch prediction analysis unit 31 through a reverse linear prediction filter 30 as filter parameters.

[0008] The index of the optimum vector in the code book 10, the gain of the multiplier unit 11, and the parameters for constituting the prediction units (pitch frequency, pitch prediction coefficient, and linear prediction coefficient) are multiplexed by a multiplexer circuit 17 and become coded information.

[0009] The pitch period of the pitch prediction unit 12 is, for example, 40 to 167 samples; each of the possible pitch periods is evaluated and the optimum period is chosen. Further, the transmission function of the linear prediction unit 13 is determined by linear predictive coding (LPC) analysis of the input speech signal. Finally, the evaluating circuit 16 searches through the code book 10 and determines the index giving the smallest error power between the input speech signal and residual signal. The index of the code book 10 which is determined, that is, the phase of the residual vector, the gain of the multiplier unit 11, that is, the amplitude of the residual vector, the frequency and coefficient of the pitch prediction unit 12, and the coefficients of the linear prediction unit 13 are transmitted multiplexed by the multiplexer circuit 17.

[0010] On the decoder side, a vector is read out from a code book 20 having the same construction as the code book 10, in accordance with the index, gain, and prediction unit parameters obtained by demultiplexing by the demultiplexer circuit 19 and is given a gain by a multiplier unit 21, then a reproduced speech signal is obtained by prediction by the prediction units 22 and 23.
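The decoder-side chain of gain, pitch prediction unit, and linear prediction unit can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function names, the one-tap form of the pitch predictor, the sign convention of the filter coefficients `a`, and the zero initial state of the linear prediction filter are all assumptions made for illustration and are not taken from the text.

```python
def pitch_synthesis(excitation, past, lag, b):
    """Long-term synthesis: v[n] = excitation[n] + b * v[n - lag].

    `past` holds earlier output samples (at least `lag` of them).
    """
    if lag < 1 or len(past) < lag:
        raise ValueError("need at least `lag` past samples")
    history = list(past)
    out = []
    for x in excitation:
        y = x + b * history[-lag]  # history[-lag] is v[n - lag]
        history.append(y)
        out.append(y)
    return out


def lpc_synthesis(excitation, a):
    """Short-term synthesis: s[n] = excitation[n] + sum_k a[k] * s[n-1-k].

    The filter starts from a zero state (an assumption of this sketch).
    """
    mem = [0.0] * len(a)  # mem[0] is s[n-1], mem[1] is s[n-2], ...
    out = []
    for x in excitation:
        y = x + sum(ak * mk for ak, mk in zip(a, mem))
        mem = ([y] + mem)[:len(a)]
        out.append(y)
    return out


def decode_frame(code_vector, gain, past, lag, b, a):
    """Gain, then pitch prediction unit, then linear prediction unit."""
    scaled = [gain * x for x in code_vector]
    return lpc_synthesis(pitch_synthesis(scaled, past, lag, b), a)


# A single pulse repeated every 2 samples with pitch coefficient 0.5.
pitched = pitch_synthesis([1.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0], 2, 0.5)
# Impulse response of a one-tap short-term filter.
shaped = lpc_synthesis([1.0, 0.0, 0.0], [0.5])
```

In a real coder the filter memories would be carried from frame to frame; the sketch resets them only to stay short.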

[0011] In such a CELP system, as the means for producing the speech signal, use is made of the code book 10 comprised of white noise and the pitch prediction unit 12 for giving periodicity at the pitch frequencies, but the decision on the phase of the code book 10, the gain (amplitude) of the multiplier unit 11, and the pitch frequency (phase) and pitch prediction coefficient (amplitude) of the prediction unit 12 is made equivalently as shown in Fig. 3.

[0012] That is, the processing for reproducing the vector of the code book 10 by the pitch prediction unit and the linear prediction unit for identification of the input signal may, considered in terms of the vectors, be described as follows. A target vector X is obtained by removing, by a subtractor 41, the effects of the previous frame S₀ stored in a previous frame storage 42 from the input signal S of one frame input from a terminal 40. A code vector gC is obtained by applying linear prediction to a vector selected from the code book 10 by a linear prediction unit 44 (corresponding to the linear prediction unit 13 of Fig. 1) and giving a gain g to the resultant vector C by a multiplier unit 45. A pitch prediction vector bP is obtained by applying linear prediction by a linear prediction unit 47 to a residual signal of the previous frame given a delay corresponding to a pitch frequency from a pitch frequency delay unit 46 (corresponding to the pitch frequency analyzed by the pitch prediction analysis unit 31 of Fig. 1) and giving a gain b (corresponding to the pitch prediction coefficient analyzed by the pitch prediction analysis unit 31 of Fig. 1) to the resultant vector P. The code vector gC and the pitch prediction vector bP are added by an adder 49 to give a vector X′, and the target vector X is identified with the vector X′ by subtraction by a subtractor 50 and evaluation of the result.

[0013] When the phase C of the code vector and the phase P of the pitch prediction vector are given, the amplitude g of the code vector and the amplitude b of the pitch prediction vector giving the minimum error signal power are, as shown in Fig. 4, those for which the error power |E|² of the following equation (1) partially differentiated by b and g is 0, that is, those which satisfy
∂|E|²/∂b = 0, ∂|E|²/∂g = 0
These amplitudes may be found from the following equations (2) and (3) for all combinations of the phases (C, P) of the two vectors, and the optimal set of amplitudes and phases (g, b, C, P) thereby sought:
|E|² = |X - bP - gC|² (1)
b = {(C,C)(X,P) - (C,P)(X,C)}/Δ (2)
g = {(P,P)(X,C) - (C,P)(X,P)}/Δ (3)
where,
Δ = (P,P)(C,C) - (C,P)(C,P)
and (,) indicates the scalar product of two vectors.
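Equations (2) and (3) can be checked numerically with the following Python sketch. The function names `dot` and `optimal_gains` and the small test vectors are hypothetical and chosen only for illustration; the formulas themselves are those of equations (2) and (3).

```python
def dot(u, v):
    """Scalar product (u, v) of two vectors."""
    return sum(a * b for a, b in zip(u, v))


def optimal_gains(x, p, c):
    """Return (b, g) minimizing |X - bP - gC|^2 by equations (2) and (3)."""
    delta = dot(p, p) * dot(c, c) - dot(c, p) * dot(c, p)
    if delta == 0:
        raise ValueError("P and C are linearly dependent")
    b = (dot(c, c) * dot(x, p) - dot(c, p) * dot(x, c)) / delta
    g = (dot(p, p) * dot(x, c) - dot(c, p) * dot(x, p)) / delta
    return b, g


# Target built as X = 2P + 3C, so the minimizing gains are b = 2 and g = 3.
p = [1.0, 0.0, 1.0]
c = [0.0, 1.0, 1.0]
x = [2.0 * pn + 3.0 * cn for pn, cn in zip(p, c)]
b, g = optimal_gains(x, p, c)
```

Repeating this calculation for every candidate pair (C, P) and keeping the pair with the smallest |E|² gives the search described in the paragraph above.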

[0014] Here, speech signals include voiced speech sounds and unvoiced speech sounds, which are characterized in that the respective drive source signals (sound sources) are periodic pulses or white noise with no periodicity.

[0015] In the CELP system, explained above as a conventional system, pitch prediction and linear prediction were applied to the vectors of the code book comprised of white noise as a sound source and the pitch periodicity of the voiced speech sounds was created by the pitch prediction unit 12.

[0016] Therefore, while the characteristics were good when the sound source signal was a white noise-like unvoiced speech sound, the pitch periodicity generated by the pitch prediction unit was created by giving a delay to the past sound source series by pitch prediction analysis, and the past sound source series was itself a series of white noise originally obtained by reading code vectors from a code book. It was therefore difficult to create a pulse series corresponding to the sound source of a voiced speech sound. The effect of this was large in the transitional state from an unvoiced speech sound to a voiced speech sound, where high frequency noise was included in the reproduced speech, resulting in a deterioration of the quality.

SUMMARY OF THE INVENTION

[0017] Therefore, the present invention has as its object, in a CELP type speech coding system and apparatus wherein a gain is given to a code vector obtained by applying linear prediction to white noise of a code book and a pitch prediction vector obtained by applying linear prediction to a residual signal of a preceding frame given a delay corresponding to the pitch frequency, a reproduced signal is generated from the same, and the reproduced signal is used to identify the input speech signal, the creation of a pulse series corresponding to the sound source of a voiced speech sound and the accurate identification and coding for even a pulse-like sound source of a voiced speech sound so as to improve the quality of the reproduced speech.

[0018] To achieve the above object, there is provided, according to one technical aspect of the present invention, a system for speech coding of the CELP type wherein a reproduced signal is generated from a code vector obtained by applying linear prediction to a vector of a residual signal of white noise of a code book and a pitch prediction vector obtained by applying linear prediction to a residual signal of a preceding frame given a delay corresponding to a pitch frequency, the error between the reproduced signal and an input speech signal is evaluated, the vector giving the smallest error is sought, and the input speech signal is encoded accordingly, the system for speech coding characterized in that in addition to the code vector and pitch prediction vector, use is made of a residual signal vector of an impulse having a predetermined relationship with the vectors of the white noise code book, variable gains are given to at least the code vector and an impulse vector obtained by applying linear prediction to the vector of the residual signal of the impulse, then the vectors are added to form a reproduced signal and the reproduced signal is used to identify the input speech signal.

[0019] Further, there is provided, according to another technical aspect of the present invention, an apparatus for speech coding characterized by being provided with a pitch frequency delay circuit giving a delay corresponding to a pitch frequency to a vector of a preceding residual signal, a first code book storing a plurality of vectors of residual signals of white noise, an impulse generating circuit generating an impulse having a predetermined relationship with the vectors of the residual signals of the white noise stored in the first code book, linear prediction circuits connected to the pitch frequency delay circuit, the first code book, and the impulse generating circuit, a variable gain circuit for giving a variable gain to vectors output from the linear prediction circuits connected to at least the first code book and the impulse generating circuit, a first addition circuit for adding the outputs of the variable gain circuit and producing a reproduced composite vector, an input speech signal input unit, a second addition circuit for adding the reproduced composite vector and the vector of the input speech signal, and an evaluating circuit for evaluating the output of the second addition circuit and identifying the input speech signal from the vector of the reproduced signal.

BRIEF DESCRIPTION OF THE DRAWINGS

[0020]

Figures 1 and 2 are block diagrams for explaining an example of a speech coding system of the related art;

Figs. 3 and 4 are views for explaining the method of analysis in the system of the related art;

Fig. 5 is a block diagram of an embodiment of the system of the present invention;

Fig. 6 is a circuit diagram for realization of the embodiment shown in Fig. 5;

Fig. 7 is a view showing the method of analysis according to the system of the present invention;

Fig. 8 is a block diagram of part of another embodiment of the system of the present invention;

Fig. 9 is a view showing signals of various portions of Fig. 8;

Fig. 10 is a circuit diagram showing another embodiment of the present invention;

Fig. 11 is a block diagram of the other embodiment of the present invention shown in Fig. 10;

Fig. 12 is a view of an example of a main element pulse position detecting circuit used in the other embodiment of the present invention shown in Fig. 10;

Fig. 13 is a block diagram showing another embodiment of the present invention;

Fig. 14 is a view showing signals of various portions in Fig. 13;

Figs. 15(A) and (B) are views for explaining the method of calculation of the pitch correlation of the embodiment of Fig. 13;

Fig. 16 is a view showing an example of the circuit for realizing the other embodiment of the present invention shown in Fig. 13; and

Fig. 17 is a view showing the method of analysis in the other embodiment of the present invention shown in Fig. 13.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

[0021] Embodiments of the speech coding system and the speech coding apparatus of the present invention will be explained in detail below while referring to the appended drawings.

[0022] The basic constitution of the speech coding system of the present invention, as mentioned above, is that of a conventionally known CELP type speech coding system wherein in addition to the code vector and pitch prediction vector, use is made of a residual signal vector of an impulse having a predetermined relationship with the vectors of the white noise code book, variable gains are given to at least the code vector and an impulse vector obtained by applying linear prediction to the vector of the residual signal of the impulse, then the vectors are added to form a reproduced signal and the reproduced signal is used to identify the input speech signal.

[0023] That is, the present invention is constituted by a conventionally known system wherein a synchronous pulse serving as a sound source for voiced speech sounds is introduced and a pulse-like sound source of voiced speech sounds is created by the use of a residual signal vector of an impulse having a predetermined relationship with the vectors of the white noise code book. By this, in the present invention, the vector of the residual signal of the white noise and the vector of the residual signal of the impulse are added while varying the amplitude components of the two vectors so as to reproduce a composite vector, so it is possible to accurately identify and code not only the white noise-like sound source of unvoiced speech sounds, but also the periodic pulse series sound source of voiced speech sounds and thereby to improve the quality of the reproduced signal.

[0024] The residual signal vector of the impulse used in the present invention may be an impulse vector having a predetermined relationship with the residual vectors of white noise stored in the first code book 10, specifically, may be one corresponding to one residual vector of white noise stored in the first code book. Further, the one impulse vector may be one corresponding to one of the predetermined sample positions, i.e., predetermined pulse positions, of a white noise residual vector in the first code book. More specifically, as mentioned later, the impulse vector may be one corresponding to a main element pulse position in the white noise residual vector or, as a simpler method, the impulse vector may be one corresponding to the maximum amplitude pulse position of the white noise residual vector. The impulse residual vector used in the present invention may be one formed by separation from a white noise residual vector stored in the first code book. Further, for that purpose, use may be made of a second code book for storing command information for separating this from the white noise residual vector stored in the first code book. Also, the second code book may store preformed impulse vectors.

[0025] Therefore, the second code book preferably is of the same size as the first code book.

[0026] Figure 5 is a block diagram of an embodiment of a speech coding system of the present invention. In the figure, portions the same as in Fig. 1 are given the same reference numerals and explanations of the same are omitted.

[0027] Figure 5 shows the constitution of the transmission side. In the code book 10 are stored 2^m patterns of N dimensional vectors of residual signals formed by white noise, as in the past. In the code book 60 are stored N patterns of N dimensional vectors of residual signals of impulses shifted successively in phase.

[0028] The impulse vectors from the code book 60 are supplied through a multiplier unit 61 to an adder 62 where they are added with vectors of white noise supplied from the code book 10 through a multiplier unit 11 and the result is supplied to a pitch prediction unit 12. An evaluating circuit 16 searches through the code books 10 and 60 and determines the vector giving the smallest error signal power between the input speech signal and the reproduced signal from the linear prediction unit 13. The index of the code book 10 decided on, that is, the phase-1 of the residual vector of the white noise, the index of the code book 60, that is, the phase-2 of the residual vector of the impulse, and the gains of the multiplier units 11 and 61, i.e., the amplitude-1 and amplitude-2 of the residual vectors, the frequency and coefficient of the pitch prediction unit 12 as in the past, and the coefficient of the linear prediction unit 13 are transmitted multiplexed by a multiplexer circuit 65.

[0029] On the receiving side, the transmitted multiplexed signal is demultiplexed by the demultiplexer circuit 66. Code books 20 and 70 have the same constitutions as the code books 10 and 60. From the code books 20 and 70 are read out the vectors indicated by the indexes (phase-1 and phase-2). These are passed through the multiplier units 21 and 71, then added by the adder 72 and reproduced by the pitch prediction unit 22 and further the linear prediction unit 23.
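The excitation formed from the two code books before pitch and linear prediction can be sketched in Python as follows. The function names `impulse_codebook` and `mixed_excitation`, the unit height of the stored impulses, and the example values are assumptions made for illustration only.

```python
def impulse_codebook(n):
    """N patterns of N dimensional impulse vectors, shifted one sample at a time."""
    return [[1.0 if j == i else 0.0 for j in range(n)] for i in range(n)]


def mixed_excitation(noise_vector, impulse_vector, g1, g2):
    """Sum of the two gain-scaled vectors, as formed by the adder."""
    return [g1 * a + g2 * b for a, b in zip(noise_vector, impulse_vector)]


# Hypothetical 4-sample example: phase-2 index 2 selects the impulse at sample 2.
book60 = impulse_codebook(4)
noise = [0.5, -0.5, 0.25, 0.0]
excitation = mixed_excitation(noise, book60[2], g1=2.0, g2=4.0)
```

With g2 set to 0 the excitation reduces to the conventional white noise excitation, and with g1 set to 0 it becomes a pure pulse, which is how the two gains let the coder move between unvoiced and voiced sound sources.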

[0030] Further, while not shown in the embodiment, use is of course made, in the same way as in Fig. 2, of a linear prediction analysis unit 18, a reverse linear prediction filter 30, and a pitch prediction analysis unit 31.

[0031] Figure 6 shows an example of the circuit constitution for realizing the above embodiment according to the speech coding system of the present invention. In Fig. 6, portions the same as in Fig. 3 are given the same reference numerals and explanations thereof are omitted.

[0032] In Fig. 6, a vector of a residual signal of white noise from a first code book 43 is subjected to prediction by a linear prediction unit 44 and multiplied by a gain g₁ by a multiplier unit 45, one example of a variable gain circuit, to obtain a white noise code vector g₁C₁. Further, the vectors of residual signals of impulses from a second code book 80 are subjected to prediction by a linear prediction unit 81 and multiplied by a gain g₂ by a multiplier unit 82, similarly an example of a variable gain circuit, to obtain an impulse code vector g₂C₂. The above-mentioned code vectors g₁C₁ and g₂C₂ and a pitch prediction vector bP output from a multiplier unit 48 are added by adders 49 and 83 to give a composite vector X″. The error E between the composite vector X″ output by the adder 83 and the target vector X is evaluated by an evaluating circuit 51. Figure 7 illustrates the vector operation mentioned above.

[0033] At this time, the equation for evaluation of the error signal power |E|² is expressed by equation (4). The amplitude b of the pitch prediction vector and the amplitudes g₁ and g₂ of the code vectors giving the minimum such power are determined by equations (5), (6), and (7):
|E|² = |X - bP - g₁C₁ - g₂C₂|² (4)
where,
∂|E|²/∂b = 0
∂|E|²/∂g₁ = 0
∂|E|²/∂g₂ = 0

[0034] By this,
b = {(Z5 × Z6 × Z7 + Z2 × Z4 × Z9 + Z3 × Z4 × Z8) - (Z3 × Z5 × Z9 + Z4 × Z4 × Z7 + Z2 × Z6 × Z8)}/Δ (5)
g₁ = {(Z1 × Z6 × Z8 + Z3 × Z4 × Z7 + Z2 × Z3 × Z9) - (Z3 × Z3 × Z8 + Z1 × Z4 × Z9 + Z2 × Z6 × Z7)}/Δ (6)
g₂ = {(Z1 × Z5 × Z9 + Z2 × Z3 × Z8 + Z2 × Z4 × Z7) - (Z3 × Z5 × Z7 + Z2 × Z2 × Z9 + Z1 × Z4 × Z8)}/Δ (7)
Δ = Z1 × Z5 × Z6 + 2 × Z2 × Z3 × Z4 - Z3 × Z3 × Z5 - Z1 × Z4 × Z4 - Z2 × Z2 × Z6
where,
Z1 = (P, P), Z2 = (P, C₁),
Z3 = (P, C₂), Z4 = (C₁, C₂),
Z5 = (C₁, C₁), Z6 = (C₂, C₂),
Z7 = (X, P), Z8 = (X, C₁),
Z9 = (X, C₂)
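The closed-form gains of equations (5) to (7) can be checked numerically with the following Python sketch. It reads the three equations as the Cramer's rule solution of the normal equations of (4), with each numerator being a difference of two three-term groups. The function name `three_gains` and the test vectors are hypothetical and used only for illustration.

```python
def dot(u, v):
    """Scalar product (u, v) of two vectors."""
    return sum(a * b for a, b in zip(u, v))


def three_gains(x, p, c1, c2):
    """Return (b, g1, g2) minimizing |X - bP - g1C1 - g2C2|^2."""
    z1, z2, z3 = dot(p, p), dot(p, c1), dot(p, c2)
    z4, z5, z6 = dot(c1, c2), dot(c1, c1), dot(c2, c2)
    z7, z8, z9 = dot(x, p), dot(x, c1), dot(x, c2)
    delta = (z1 * z5 * z6 + 2 * z2 * z3 * z4
             - z3 * z3 * z5 - z1 * z4 * z4 - z2 * z2 * z6)
    if delta == 0:
        raise ValueError("P, C1 and C2 are linearly dependent")
    b = ((z5 * z6 * z7 + z2 * z4 * z9 + z3 * z4 * z8)
         - (z3 * z5 * z9 + z4 * z4 * z7 + z2 * z6 * z8)) / delta    # (5)
    g1 = ((z1 * z6 * z8 + z3 * z4 * z7 + z2 * z3 * z9)
          - (z3 * z3 * z8 + z1 * z4 * z9 + z2 * z6 * z7)) / delta   # (6)
    g2 = ((z1 * z5 * z9 + z2 * z3 * z8 + z2 * z4 * z7)
          - (z3 * z5 * z7 + z2 * z2 * z9 + z1 * z4 * z8)) / delta   # (7)
    return b, g1, g2


# Target built as X = 1P + 2C1 + 3C2, so the gains should come out as 1, 2, 3.
p = [1.0, 0.0, 0.0, 1.0]
c1 = [0.0, 1.0, 0.0, 1.0]
c2 = [0.0, 0.0, 1.0, 1.0]
x = [pn + 2.0 * an + 3.0 * bn for pn, an, bn in zip(p, c1, c2)]
b, g1, g2 = three_gains(x, p, c1, c2)
```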

[0035] Therefore, to determine the most suitable code vector and pitch prediction vector, one may find the amplitudes g₁, g₂, and b by the equations (5), (6), and (7) for all the combinations of the phases C₁, C₂, and P of the three vectors and search for the set of the amplitudes and phases g₁, g₂, b, C₁, C₂, and P giving the smallest error signal power.

[0036] Here, the phase of the impulse code vector C₂ corresponds unconditionally to the phase of the white noise code vector C₁. To determine the optimum drive source vector, therefore, one may, for all combinations of the phases (P, C₁) of the white noise code vector C₁ and the pitch prediction vector P, find by equations (5) to (7) the amplitudes (b, g₁, g₂) giving the value of 0 for the error power |E|² partially differentiated by b, g₁, and g₂, and search for the set of amplitudes and phases (b, g₁, g₂, P, C₁) giving the smallest error signal power of equation (4).
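The reduced search over the combinations (P, C₁), with C₂ tied to the index of C₁, can be sketched in Python as follows. This is a sketch under stated assumptions: the candidate vectors are taken to be already passed through linear prediction, the gains are obtained by solving the 3 × 3 normal equations with a generic determinant helper rather than by writing out equations (5) to (7) term by term, and all names and example vectors are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def det3(m):
    """Determinant of a 3 x 3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve_gains(x, p, c1, c2):
    """Solve the normal equations of |X - bP - g1C1 - g2C2|^2 by Cramer's rule."""
    basis = [p, c1, c2]
    gram = [[dot(u, v) for v in basis] for u in basis]
    rhs = [dot(x, u) for u in basis]
    delta = det3(gram)
    if abs(delta) < 1e-12:
        return None  # degenerate combination, skipped by the search
    gains = []
    for col in range(3):
        m = [row[:] for row in gram]
        for r in range(3):
            m[r][col] = rhs[r]
        gains.append(det3(m) / delta)
    return tuple(gains)


def search(x, pitch_vectors, noise_vectors, impulse_vectors):
    """Try every (P, C1); impulse_vectors[i] is the C2 tied to noise_vectors[i]."""
    best = None
    for pi, p in enumerate(pitch_vectors):
        for ci, c1 in enumerate(noise_vectors):
            c2 = impulse_vectors[ci]
            gains = solve_gains(x, p, c1, c2)
            if gains is None:
                continue
            b, g1, g2 = gains
            err = sum((xn - b * pn - g1 * an - g2 * bn) ** 2
                      for xn, pn, an, bn in zip(x, p, c1, c2))
            if best is None or err < best[0]:
                best = (err, pi, ci, b, g1, g2)
    return best


pitch = [[1.0, 0.0, 0.0, 0.0, 0.0], [1.0, 0.0, 0.0, 1.0, 0.0]]
noise = [[0.0, 1.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0, 1.0]]
impulse = [[0.0, 0.0, 1.0, 1.0, 0.0], [0.0, 0.0, 0.0, 0.0, 1.0]]
# Target equal to pitch[1] + 2 * noise[0] + 3 * impulse[0].
x = [1.0, 2.0, 3.0, 6.0, 0.0]
err, pi, ci, b, g1, g2 = search(x, pitch, noise, impulse)
```

Because C₂ is looked up from the index of C₁, the loop runs over two indices rather than three, which is the saving in search effort that the paragraph above describes.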

[0037] In this way, it is possible to identify input speech signals by adding a periodic pulse serving as a sound source of voiced speech sounds missing in the white noise code book.

[0038] Figure 8 shows the case where, for each white noise residual vector stored in the first code book, the impulse vector is established at the pulse position showing the maximum amplitude in that white noise residual vector. In Fig. 8, the first code book 10 is provided with a table 90 (corresponding to the second code book) which shares a common index i and stores, for each pattern of white noise vector of the code book 10, the position of the element (sample) with the maximum amplitude. The white noise vector and the maximum amplitude position, read out from the code book 10 and the table 90 respectively in accordance with the search pattern indexes entering from the evaluating circuit 16 through a terminal 91, are supplied to an impulse separating circuit 92 where, as shown in Fig. 9(A), just the maximum amplitude position sample is removed from the white noise vector. As a result, two vectors are generated: the white noise vector shown in (B) of the figure, which keeps its amplitude value at every sampling position except the one at which the maximum amplitude value was obtained, where the amplitude value is "0", and the impulse shown in (C) of the figure, which has the maximum amplitude value at that sampling position and no amplitude value at any other sampling position. These are supplied respectively to the multiplier units 11 and 61, and the code book 60 is thus eliminated. Of course, the same applies to the code books 20 and 70. In this case, the sum of the white noise vector and the impulse vector output by the impulse separating circuit 92 becomes the same as the original white noise vector of the code book 10, so when the amplitude ratio g₁/g₂ of the multiplier units 11 and 61 is "1", use may be made of the original white noise and when it is "0" use may be made of the complete impulse.
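The operation of the impulse separating circuit 92 can be sketched in Python as follows. The function name `separate_impulse` is hypothetical, and "maximum amplitude" is taken here to mean the largest absolute value, which is an assumption of this sketch.

```python
def separate_impulse(vector):
    """Split a white noise vector into (position, remainder, impulse).

    `remainder` is the vector with the maximum amplitude sample set to 0;
    `impulse` holds only that sample. Their sum equals the original vector.
    """
    # Largest absolute value is assumed to define the maximum amplitude.
    pos = max(range(len(vector)), key=lambda i: abs(vector[i]))
    impulse = [0.0] * len(vector)
    impulse[pos] = vector[pos]
    remainder = list(vector)
    remainder[pos] = 0.0
    return pos, remainder, impulse


# Hypothetical 5-sample white noise vector.
noise = [0.1, -0.9, 0.3, 0.2, -0.4]
pos, remainder, impulse = separate_impulse(noise)
```

Since the position is derived from the code book vector itself, a table such as table 90 can be computed once in advance, and the same position is available to the decoder without being transmitted.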

[0039] By so making the phase of the impulse vector correspond unconditionally to the white noise vectors, the need for transmission of the phase-2 of the impulse code vector is eliminated and the effect of data compression is increased.

[0040] Since the white noise vector and the impulse vector are added by varying the gain of the amplitudes of the respective elements, it is possible to accurately identify and code not only the white noise-like sound source of unvoiced speech sounds, but also the periodic pulse series sound source of voiced speech sound, a problem in the past, and thereby to vastly improve the quality of the reproduced speech.

[0041] In the embodiment of Fig. 6, the first addition circuit is formed by an adder 49 and an adder 83, but the first addition circuit may be formed by a single unit instead of the adders 49 and 83.

[0042] Next, another embodiment of the speech coding system of the present invention will be shown in Fig. 10.

[0043] In Fig. 6, provision was made of a code book comprised of fixed impulses generated in accordance with only predetermined pulse positions of the vectors in the code book 10, but even if the input speech signal is identified by adding the vector based on the fixed impulses to the conventional pitch prediction vector and white noise vector, the optimal identification cannot necessarily be performed. This is because, as shown in Fig. 6, since linear prediction is applied even to the impulse vector, there is a distortion in space.

[0044] Therefore, in the third embodiment, the principle of which is shown in Fig. 10, instead of using fixed impulse vectors, the phase difference between the white noise vector C₁ after application of linear prediction by the linear prediction unit 44 and the vector obtained by applying linear prediction to the impulse is evaluated by the main element pulse position detection circuit 90, whereby the position of the main element pulse is detected. The main element impulse is generated at this position by the impulse generating unit 91. The three vectors, i.e., the pitch prediction vector P, the white noise code vector C₁, and the main element impulse vector are added and the composite vector is used to identify the input speech signal S.

[0045] Further, even in the third embodiment, a search is made for the set of the amplitudes and phases (b, g₁, g₂, P, C₁) giving the smallest error signal power by equations (4) to (7).

[0046] Figure 11 is a block diagram of the third embodiment of the present invention. The third embodiment differs from the embodiment of Fig. 5 only in that it uses a main element pulse position detection circuit 110 instead of an impulse code book 60.

[0047] That is, the main element pulse position detection circuit 110 extracts the position of the main element pulse for the vectors of the white noise code book 10, the main element pulse generated at that position is multiplied by the gain (amplitude) component by the multiplier unit 61, one type of variable gain circuit, then is added to the white noise read out from the code book 10 as in the past and multiplied by the gain by the multiplier unit 11, also one type of variable gain circuit, and reproduction is performed by the pitch prediction unit 12 and the linear prediction unit 13.

[0048] Further, since the independent variable gains are multiplied with the white noise and the main element impulse, the coding information may be, like with Fig. 5, the white noise code index (phase) and gain (amplitude), the amplitude of the main element impulse, and the parameters for constructing the prediction units (pitch frequency, pitch prediction coefficient, linear prediction coefficient) transmitted multiplexed by the multiplexer circuit 65. Further, the receiving side may be similarly provided with a main element pulse position detection circuit 120 and the speech signal reproduced based on the parameters demultiplexed at the demultiplexer circuit 66.

[0049] Therefore, since the sound source signal is generated by adding the white noise and the impulse, it is possible to accurately generate not only a white noise-like sound source of unvoiced speech sounds, but also a periodic pulse series sound source of voiced speech sounds by control of the amplitude components and therefore possible to improve the quality of the reproduced speech.

[0050] Figure 12 shows an embodiment of the main element pulse position detection circuit 110 used in the above-mentioned embodiment. In this embodiment, provision is made of a linear prediction unit 111 which applies linear prediction to N number of impulse vectors (these may be generated also from a separately provided memory) with different pulse positions, a phase difference calculation unit 112 which calculates a phase difference between a code vector C₁ obtained by applying linear prediction to the white noise of the code book 10 by the linear prediction unit 11 and an impulse code vector C₂ⁱ (where i = 1, 2, ...N) to which linear prediction from the linear prediction unit 111 is applied, a maximum value detection unit 113 which detects the maximum value of the phase difference calculated by the phase difference calculation unit 112, and an impulse generating circuit 114 which decides on the position of the main element pulse by the maximum value detected by the maximum value detection unit 113 and generates an impulse at the position of the main element pulse.

[0051] In such a main element pulse position detection circuit 110, the impulse code vector is sought giving the minimum phase difference ϑ_i between the code vector C₁ obtained by applying linear prediction to the vectors stored in the code book 10 and the N number of impulse code vectors C₂ⁱ, that is, giving the maximum value of
cos²ϑ_i = (C₁, C₂ⁱ)²/{(C₁, C₁)·(C₂ⁱ, C₂ⁱ)},
thereby enabling determination of the position of the main element pulse.
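The detection of the main element pulse position by maximizing cos²ϑ_i can be sketched in Python as follows. This is a minimal sketch under stated assumptions: linear prediction is modeled as an all-pole synthesis filter with coefficients `a` starting from a zero state, the impulses have unit height, and all function names and example values are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def lpc_synthesis(excitation, a):
    """s[n] = excitation[n] + sum_k a[k] * s[n-1-k], zero initial state."""
    mem = [0.0] * len(a)
    out = []
    for x in excitation:
        y = x + sum(ak * mk for ak, mk in zip(a, mem))
        mem = ([y] + mem)[:len(a)]
        out.append(y)
    return out


def main_pulse_position(noise_vector, a):
    """Return (position, cos2) of the impulse code vector closest in phase to C1."""
    n = len(noise_vector)
    c1 = lpc_synthesis(noise_vector, a)
    e11 = dot(c1, c1)
    if e11 == 0:
        raise ValueError("code vector has zero energy")
    best_pos, best_cos2 = 0, -1.0
    for i in range(n):
        unit = [0.0] * n
        unit[i] = 1.0
        c2 = lpc_synthesis(unit, a)  # impulse code vector C2^i
        cos2 = dot(c1, c2) ** 2 / (e11 * dot(c2, c2))
        if cos2 > best_cos2:
            best_pos, best_cos2 = i, cos2
    return best_pos, best_cos2


# A vector that is itself a pulse at sample 2 must be matched at position 2.
position, cos2 = main_pulse_position([0.0, 0.0, 1.0, 0.0, 0.0], [0.5])
```

Because the position depends only on the code book vector and the linear prediction coefficients, both of which the decoder already has, the same calculation can be repeated on the receiving side.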

[0052] In this case, by providing a main element pulse position detection circuit even on the decoder side, it is possible to extract the phase information of the main element pulse from the phase of the code vector even without transmission of the same and therefore it is possible to improve the characteristics by an increase of just the amplitude information of the main element pulse.

[0053] According to the above explained first to third embodiments, in addition to the addition of two vectors, i.e., the white noise code vector and the pitch prediction vector, an impulse code vector generated by a code book or table etc. at a position corresponding to the position of predetermined pulses of the white noise code vector is added, and the identification is performed using this composite vector of three vectors, so it is possible to create not only a sound source of unvoiced speech sounds, but also a pulse-like sound source of voiced speech sounds, and it is possible to improve the quality of the reproduced speech. Further, by separating the vector of the residual signal of the impulse from the vector of the residual signal of the white noise, it is possible to increase the effect of data compression.

[0054] Further, according to the above embodiment, it is possible to control the amplitude of the elements by combining the white noise vector and the impulse vector corresponding to the main element, so it is possible to create a more effective pulse sound source than even with generation of a fixed impulse.

[0055] Next, an explanation will be made of a fourth embodiment of the speech coding system of the present invention. The fourth embodiment of the present invention is based on the conventional CELP type speech coding system, wherein the vector of the residual signal of the white noise and the vector of the residual signal of the impulse are added in a ratio based on the strength of the pitch correlation of the input speech signal obtained by pitch prediction so as to obtain a composite vector. The composite vector is reproduced to obtain a reproduced signal, and the error between the reproduced signal and the input speech signal is evaluated.

[0056] Therefore, in the fourth embodiment, since the vector of the residual signal of the white noise and the vector of the residual signal of the impulse are added by a ratio based on the strength of the pitch correlation of the input speech signal and the composite vector is reproduced, it is possible to accurately identify and code not only the white noise-like sound source of unvoiced speech sounds, but also the periodic pulse series sound source of voiced speech sounds and thereby to improve the quality of the reproduced speech.

[0057] Figure 13 is a block diagram of the fourth embodiment of the system of the present invention. In the figure, portions the same as Fig. 1 are given the same reference numerals and explanations thereof are omitted.

[0058] In Fig. 13, there is additionally provided a table 60 in the code book 10 in which are stored 2^m patterns of N order vectors of residual signals of white noise. In this table 60 are stored the positions of elements (samples) of the maximum amplitude for each of the 2^m patterns of vectors in the code book 10.

[0059] The white noise vector read out from the code book 10 in accordance with the search pattern index from the evaluating circuit 16 is supplied to the impulse generating unit 61 and the weighting and addition circuit 62, while the maximum amplitude position read out from the table 60 is supplied to the impulse generating unit 61.

[0060] The impulse generating unit 61 picks out the element at the maximum amplitude position from the white noise vector as shown in Fig. 14(A), generates an impulse vector as shown in Fig. 14(B) with the remaining N-1 elements all made 0, and supplies the impulse vector to the weighting and addition circuit 62.
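The relationship between the code book 10, the table 60 and the impulse generating unit 61 may be sketched as follows. This is a minimal illustration only. The function names are hypothetical, and "maximum amplitude" is assumed here to mean the largest absolute sample value.

```python
def max_amplitude_position(vector):
    """Index of the sample with the largest absolute value (assumption)."""
    return max(range(len(vector)), key=lambda n: abs(vector[n]))


def build_position_table(code_book):
    """One stored position per code book pattern, as in table 60."""
    return [max_amplitude_position(v) for v in code_book]


def generate_impulse(noise_vector, position):
    """Keep the element at `position`; make the remaining N-1 elements 0."""
    impulse = [0.0] * len(noise_vector)
    impulse[position] = noise_vector[position]
    return impulse


# Example with a toy code book of two 3-sample patterns.
code_book = [[0.2, -0.7, 0.1], [0.5, 0.1, -0.3]]
table = build_position_table(code_book)
impulse = generate_impulse(code_book[0], table[0])
```

Since the table is computed once from the code book, the decoder side can hold the same table and needs no additional transmitted information to locate the impulse.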

[0061] The weighting and addition circuit 62 multiplies the weighting sinϑ and cosϑ supplied from the later mentioned pitch correlation calculation unit 63 with the white noise vector and impulse vector for performing the weighting, then performs the addition. The composite vector obtained here is supplied to the multiplier unit 11.

[0062] The code vector gC becomes equal to the impulse vector when the pitch correlation is maximum (cosϑ = 1) and becomes equal to the white noise vector when the pitch correlation becomes minimum (cosϑ = 0). That is, the property of the code vector may be continuously changed between the impulse and white noise in accordance with the strength of the pitch correlation of the input speech signal, whereby the precision of identification of the sound source with respect to an input speech signal can be improved.
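The weighting and addition of paragraphs [0061] and [0062] may be sketched as below. The function name is hypothetical. Deriving sinϑ from cosϑ as the square root of 1 - cos²ϑ, and limiting cosϑ to the range 0 to 1, are assumptions of this sketch rather than details stated in the text.

```python
import math


def weight_and_add(noise_vector, impulse_vector, cos_theta):
    """Composite vector = sin(theta) * white noise + cos(theta) * impulse."""
    cos_theta = min(1.0, max(0.0, cos_theta))      # assumed valid range
    sin_theta = math.sqrt(1.0 - cos_theta ** 2)    # assumed derivation
    return [sin_theta * w + cos_theta * p
            for w, p in zip(noise_vector, impulse_vector)]


noise = [0.2, -0.7, 0.1]
impulse = [0.0, -0.7, 0.0]
voiced = weight_and_add(noise, impulse, 1.0)    # maximum pitch correlation
unvoiced = weight_and_add(noise, impulse, 0.0)  # minimum pitch correlation
```

At the two extremes the composite vector reduces to the impulse vector and to the white noise vector respectively, and intermediate values of cosϑ blend the two continuously.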

[0063] The pitch correlation calculation unit 63 finds the phase difference ϑ between the later mentioned pitch prediction vector and the vector of the input speech signal to obtain the pitch correlation (weighting) cosϑ and the weighting sinϑ.

[0064] The evaluating circuit 16 searches through the code book 10 and decides on the index giving the smallest error signal power. The index of the code book 10 decided on, that is, the phase of the residual vector of the white noise, the gain, that is, the amplitude of the residual vector, of the multiplier unit 11, the frequency and coefficient (λ and cosϑ) of the pitch prediction unit 12 as in the past, and the coefficient of the linear prediction unit 13 are transmitted multiplexed by the multiplexer circuit 17. In this embodiment too, the gain is preferably variable.

[0065] The transmitted multiplexed signal is demultiplexed by the demultiplexer circuit 19. The code book 20 and the table 70 are each of the same construction as the code book 10 and the table 60. The vector and maximum amplitude position indicated by the respective indexes (phases) are read out from the code book 20 and the table 70.

[0066] The impulse generating unit 71 generates an impulse vector in the same way as the impulse generating unit 61 on the coding unit side and supplies the same to the weighting circuit 72. The weighting circuit 72 prepares the weighting sinϑ from the pitch correlation (weighting) cosϑ from among the coefficients (λ and cosϑ) from the pitch prediction unit 12 transmitted and demultiplexed. With these, the white noise vector and the impulse vector are weighted and added and the composite vector is supplied to the multiplier 21. Reproduction is performed at the pitch prediction unit 22 and the linear prediction unit 23.

[0067] The circuit construction of the speech coding system of the above embodiment may be expressed as shown in Fig. 16. In Fig. 16, portions the same as in Fig. 2 are given the same reference numerals and explanations thereof are omitted.

[0068] In Fig. 16, the vector of the residual signal of the white noise from the code book 43 is subjected to prediction by the linear prediction unit 44 and multiplied by the weighting sinϑ by the multiplier unit 80, one type of variable gain circuit, to obtain a white noise code vector. Further, the vector of the residual signal of the impulse generated from the white noise vector at the impulse generating unit 81 is subjected to prediction by the linear prediction unit 82 and multiplied by the weighting cosϑ by the multiplier 83, one type of variable gain circuit, to obtain an impulse code vector. These are added by the adder 84 and further multiplied by the gain g (amplitude of code vector) at the multiplier unit 45 to give the code vector gC. This code vector gC is added by the adder 49 with the pitch prediction vector bP output from the multiplier unit 48 and the composite vector X˝ is obtained. The error E between the composite vector X˝ and the target vector X, output by the adder 50, is evaluated by the evaluating circuit 51. Figure 17 illustrates this vector operation.
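The vector operation of paragraph [0068] may be sketched as follows, assuming that the white noise code vector, the impulse code vector and the pitch prediction vector P have already been obtained by linear prediction. The function names are hypothetical, and sinϑ is again derived from cosϑ as an assumption of the sketch.

```python
import math


def composite_vector(noise_cv, impulse_cv, pitch_vec, g, b, cos_theta):
    """X'' = g * (sin(theta) * C_noise + cos(theta) * C_impulse) + b * P."""
    sin_theta = math.sqrt(max(0.0, 1.0 - cos_theta ** 2))  # assumed
    code = [sin_theta * w + cos_theta * i
            for w, i in zip(noise_cv, impulse_cv)]
    return [g * c + b * p for c, p in zip(code, pitch_vec)]


def error_power(target, reproduced):
    """|E|^2 = |X - X''|^2, the quantity the evaluating circuit minimizes."""
    return sum((x - y) ** 2 for x, y in zip(target, reproduced))


x2 = composite_vector([1.0, 0.0], [0.0, 1.0], [1.0, 1.0],
                      g=2.0, b=0.5, cos_theta=1.0)
err = error_power([0.5, 2.5], x2)
```

In a full search, the error power would be evaluated for each candidate code book index and the index giving the smallest value retained.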

[0069] In this case, the code vector gC changes in accordance with the weighting cosϑ, sinϑ from white noise to an impulse, but the pitch prediction vector bP and the code vector gC may be used to determine the phases P and C and amplitudes b and g of the two vectors in the same way as the past without change to the process of identification of the input.

[0070] Here, an explanation will be made of the pitch correlation calculation unit 85 together with Figs. 15(A) and (B). Figure 15(A) takes out a portion of Fig. 16.

[0071] The amplitude component b of the pitch prediction vector bP is nothing other than the prediction coefficient b of the pitch prediction unit, but this value may be found by identifying the input signal by only the pitch prediction vector using the code vector gC as "0" in the above-mentioned speech signal analysis (equation (8) and equation (9)). Here, the pitch prediction coefficient b, as shown in equation (10), is the product of the amplitude ratio λ of the target vector X and the pitch prediction vector P and the pitch correlation cosϑ. The value of the pitch correlation is maximum (cosϑ = 1) when the phase of the pitch prediction vector matches the phase of the target vector (ϑ = 0). The larger the phase difference ϑ of the two vectors, the smaller this is. Further, the value is also the value showing the strength of the periodicity of the speech signal, so it is possible to use this to control the ratio of the white noise element and the impulse element in the speech signal. Figure 17 illustrates the above-mentioned vector operation.
|E|² = |X-bP|² (8)
where,
∂|E|²/∂b = 0

[0072] By this,
b = (X,P)/(P,P) (9)
b = λ·cosϑ (10)
where, λ is the amplitude ratio and ϑ is the phase difference and
λ = |X|/|P|
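Equations (8) to (10) can be checked numerically with the following minimal sketch. The function names are hypothetical, and returning zeros for a zero-energy vector is an assumption made only to keep the example well defined.

```python
import math


def dot(u, v):
    """Inner product (u, v)."""
    return sum(a * b for a, b in zip(u, v))


def pitch_coefficient(x, p):
    """Return (b, lam, cos_theta) with b = (X,P)/(P,P), lam = |X|/|P|."""
    pp, xx = dot(p, p), dot(x, x)
    if pp == 0 or xx == 0:
        return 0.0, 0.0, 0.0                   # assumed degenerate case
    b = dot(x, p) / pp                         # equation (9)
    lam = math.sqrt(xx / pp)                   # amplitude ratio
    cos_theta = dot(x, p) / math.sqrt(xx * pp) # pitch correlation
    return b, lam, cos_theta


def pitch_error_power(x, p, b):
    """|E|^2 = |X - bP|^2, equation (8)."""
    return sum((xi - b * pi) ** 2 for xi, pi in zip(x, p))


X = [3.0, 4.0]
P = [1.0, 0.0]
b, lam, cos_theta = pitch_coefficient(X, P)
```

For this example b = 3, λ = 5 and cosϑ = 0.6, so b = λ·cosϑ as in equation (10), and any other value of b gives a larger |E|².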

[0073] In this way, the white noise vector and the impulse vector are added with the amplitudes of their respective elements controlled, so it is possible to accurately identify and code not only the white noise-like sound source of unvoiced speech sounds, but also the periodic pulse series sound source of voiced speech sounds, a problem in the past, and thereby to vastly improve the quality of the reproduced speech.

[0074] Further, the phase of the impulse vector added to the white noise vector is made to correspond unconditionally to the phase of the white noise, and even the strength of the pitch correlation cosϑ is transmitted as part of the pitch prediction coefficient (b = λ·cosϑ), so there is no increase in the amount of information transmitted compared with the conventional system.

[0075] Note that the drawing of a correspondence between the phases of the impulse vectors and the phases of the white noise vectors is not limited to the above-mentioned maximum amplitude position.

[0076] As mentioned above, according to the speech coding system of this embodiment, it is possible to accurately identify and code not only the sound source of unvoiced speech sounds but also the pulse-like sound source of voiced speech sounds, which was not possible in the past, and it is possible to improve the quality of the reproduced signal. Further, there is no increase in the amount of the information transmitted, making this very practical.

[0077] That is, in the embodiment, not all the information on the gain (amplitude) and residual vectors (phase) is transmitted, so transmission is possible with the information compressed. In this invention, it is possible to select freely from the above plurality of embodiments in accordance with the desired objective, without any deterioration of the quality of the reproduced signal. For example, when desiring to obtain a compression effect without increasing the amount of information, use may be made of the second and third embodiments, while when desiring to obtain a compression effect even at the expense of the characteristics of the reproduced speech, use may be made of the fourth embodiment.

Claims

1. A system for speech coding of the CELP type wherein a reproduced signal is generated from a code vector obtained by applying linear prediction to a vector of a residual signal of white noise of a code book and a pitch prediction vector obtained by applying linear prediction to a residual signal of a preceding frame given a delay corresponding to a pitch frequency, the error between the reproduced signal and an input speech signal is evaluated, the vector giving the smallest error is sought, and the input speech signal is encoded accordingly, the system for speech coding characterized in that in addition to the code vector and pitch prediction vector, use is made of a residual signal vector of an impulse having a predetermined relationship with the vectors of the white noise code book, variable gains are given to at least the code vector and an impulse vector obtained by applying linear prediction to the vector of the residual signal of the impulse, then the vectors are added to form a reproduced signal and the reproduced signal is used to identify the input speech signal.

2. A system for speech coding according to claim 1, characterized in that the respective residual signal vectors of the impulses having a predetermined relationship with the vectors of said white noise code book correspond to the vectors of the said white noise code book.

3. A system for speech coding according to claim 2, characterized in that the vectors of the residual signals of the impulses correspond to just predetermined pulse positions in the vectors of the said white noise code book.

4. A system for speech coding according to claim 2, characterized in that the vectors of the residual signals of the impulses correspond to pulse positions of the maximum amplitude in the vectors of the said white noise code book.

5. A system for speech coding according to claim 2, characterized in that the vectors of the residual signals of the impulses corresponding to one position selected from one of said predetermined pulse positions in the vectors of the said white noise code book and the pulse positions of the maximum amplitude are stored in a separately provided code book.

6. A system for speech coding according to claim 4, characterized in that the vectors of the residual signals of the impulses corresponding to one position selected from one of said predetermined pulse positions in the vectors of the said white noise code book and the pulse positions of the maximum amplitude are stored in a separately provided code book.

7. A system for speech coding according to claim 1, characterized in that the residual signal vector of the impulse having a predetermined relationship with the vectors of the white noise code book is the main element impulse in the vectors of the white noise code book.

8. A system for speech coding according to claim 1, characterized in that the residual signal vector of the white noise and the vector of the residual signal of the impulse are adjusted by a predetermined coefficient derived from a vector of said input speech signal and a pitch prediction vector obtained by applying linear prediction to a residual signal of a preceding frame and that the error is evaluated.

9. A system for speech coding according to claim 8, characterized in that the residual signal vector of the white noise and the vector of the residual signal of the impulse are weighted by a predetermined coefficient derived from a vector of said input speech signal and a pitch prediction vector obtained by applying linear prediction to a residual signal of a preceding frame and that the error is evaluated.

10. A system for speech coding according to claim 9, characterized in that the residual signal vector of the white noise and the vector of the residual signal of the impulse are added in a ratio according to an intensity of a pitch correlation obtained by applying linear prediction to the vector of said input speech signal and the vector of the residual signal of the preceding frame, reproducing said composite vector, and evaluating the error between the resultant reproduced signal and the vector of said input speech signal.

11. A system for speech coding according to claim 10, characterized in that the said pitch correlation is a function of angle.

12. A system for speech coding according to claim 1, characterized in that the vector of the residual signal of said impulse is separated from the vector of the residual signal of the white noise.

13. An apparatus for speech coding characterized by being provided with a pitch frequency delay circuit giving a delay corresponding to a pitch frequency to a vector of a preceding residual signal, a first code book storing a plurality of vectors of residual signals of white noise, an impulse generating circuit generating an impulse having a predetermined relationship with the vectors of the residual signals of the white noise stored in the said first code book, linear prediction circuits connected to said pitch frequency delay circuit, said first code book, and said impulse generating circuit, a variable gain circuit for giving a variable gain to vectors output from said linear prediction circuits connected to at least said first code book and said impulse generating circuit, a first addition circuit for adding the outputs of the said variable gain circuit and producing a reproduced composite vector, an input speech signal input unit, a second addition circuit for adding said reproduced composite vector and the vector of said input speech signal, and an evaluating circuit for evaluating the output of said second addition circuit and identifying the input speech signal from the vector of the reproduced signal.

14. An apparatus for speech coding according to claim 13, characterized in that the said first addition circuit is comprised of a first adder which adds only the outputs from the said linear prediction circuits connected to said pitch frequency delay circuit and said first code book and a second adder which adds the output from the linear prediction circuit connected to the said impulse generating circuit.

15. An apparatus for speech coding according to claim 13, characterized in that said impulse generating circuit is driven by a main element pulse position detection circuit which receives as input the output from the said linear prediction circuit connected to said first code book.

16. An apparatus for speech coding according to claim 15, characterized in that said main element pulse position detection circuit has a function of extracting a pulse position giving the smallest phase error between an output vector from said linear prediction circuit connected to said first code book and a vector obtained by applying linear prediction to one pulse corresponding to sample times of residual signal vectors stored in said first code book.

17. An apparatus for speech coding according to claim 13, characterized in that said impulse generating circuit comprises a second code book storing a plurality of impulses corresponding to the plurality of residual signal vectors of white noise stored in said first code book.

18. An apparatus for speech coding according to claim 17, characterized in that the said second code book stores the orders showing the maximum pulses in the residual signal vectors of the white noise stored in the said first code book.

19. An apparatus for speech coding according to claim 17, characterized in that the said impulse generating circuit has an impulse separating circuit which separates said impulses from the residual signal vectors of the white noise stored in the said first code book.

20. An apparatus for speech coding according to claim 13, characterized in that in producing from the outputs of the said first code book and said impulse generating circuit a reproduced vector through the linear prediction circuit and variable gain circuit, provision is made of a weighting circuit for controlling said linear prediction circuit and variable gain circuit and that said weighting circuit is connected to a pitch correlation calculating circuit which receives as input a pitch prediction vector obtained by applying linear prediction to a vector of an input speech signal and a residual signal vector of a preceding frame.

Drawing