BACKGROUND OF THE INVENTION
1. Field of the Invention
[0001] The present invention relates to a system for speech coding and an apparatus for
the same, more particularly relates to a system for high quality speech coding and
an apparatus for the same using vector quantization for data compression of speech
signals.
2. Description of the Related Art
[0002] In recent years, vector quantization has been used to compress the data of speech
signals while maintaining their quality in intra-company communication systems,
digital mobile radio systems, etc. Vector quantization is a well-known technique
in which predictive filtering is applied to the signal vectors of a code book
to prepare reproduced signals, and the error powers between the reproduced signals
and an input speech signal are evaluated to determine the index of the signal vector
with the smallest error. There is rising demand, however, for a more advanced method
of vector quantization so as to further compress the speech data.
[0003] Figure 1 shows an example of a system for high quality speech coding using vector
quantization. This system is known as the code excited LPC (CELP) system. In this,
a code book 10 is preset with 2ᵐ patterns of residual signal vectors, each produced
from N samples of a white noise signal and thus corresponding to an N-dimensional
vector (in this case, shape vectors showing the phase, hereinafter referred to simply
as vectors). The vectors are normalized so that the power of the N samples (N being,
for example, 40) becomes a fixed value.
[0004] Vectors read out from the code book 10 by the command of the evaluating circuit 16
are given a gain by a multiplier unit 11, then converted to reproduced signals through
two adaptive prediction units, i.e., a pitch prediction unit 12 which eliminates the
long term correlation of the speech signals and a linear prediction unit 13 which
eliminates the short term correlation of the same.
[0005] The reproduced signals are compared with digital speech signals of the N samples
input from a terminal 15 in a subtractor 14 and the errors are evaluated by the evaluating
circuit 16.
[0006] The evaluating circuit 16 selects the vector of the code book 10 giving the smallest
power of the error and determines the gain of the multiplier unit 11 and a pitch prediction
coefficient of the pitch prediction unit 12.
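The selection performed by the evaluating circuit 16 can be illustrated with a minimal sketch. It assumes, for simplicity, that each code book vector has already been passed through the prediction units and that only a single gain is optimized per vector (the pitch contribution is ignored here). The names `search_codebook` and `synthesized` are hypothetical and do not appear in the specification.

```python
def dot(u, v):
    """Scalar product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def search_codebook(target, synthesized):
    """Exhaustively search for the vector giving the smallest error power.

    target      -- input speech vector of N samples
    synthesized -- list of candidate vectors (assumed already filtered)
    Returns (index, gain, error_power), or None if every candidate is zero.
    """
    best = None
    target_energy = dot(target, target)
    for index, c in enumerate(synthesized):
        energy = dot(c, c)
        if energy == 0.0:
            continue  # a zero vector cannot be scaled to match anything
        correlation = dot(target, c)
        gain = correlation / energy  # least-squares gain for this vector
        error = target_energy - gain * correlation  # |target - gain*c|^2
        if best is None or error < best[2]:
            best = (index, gain, error)
    return best
```

The closed-form gain avoids a separate search over amplitudes, so only the index has to be searched exhaustively.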
[0007] Further, as shown in Fig. 2, the linear prediction unit 13 uses the linear prediction
coefficient found from the current frame sample values by a linear prediction analysis
unit 18 in a linear difference equation as filter tap coefficients. The pitch prediction
unit 12 uses the pitch prediction coefficient and pitch frequency of the input speech
signal found by a pitch prediction analysis unit 31 through a reverse linear prediction
filter 30 as filter parameters.
[0008] The index of the optimum vector in the code book 10, the gain of the multiplier unit
11, and the parameters for constituting the prediction units (pitch frequency, pitch
prediction coefficient, and linear prediction coefficient) are multiplexed by a multiplexer
circuit 17 and become coded information.
[0009] The pitch period of the pitch prediction unit 12 is, for example, 40 to 167 samples;
each of the possible pitch periods is evaluated and the optimum period is chosen.
Further, the transmission function of the linear prediction
unit 13 is determined by linear predictive coding (LPC) analysis of the input speech
signal. Finally, the evaluating circuit 16 searches through the code book 10 and determines
the index giving the smallest error power between the input speech signal and residual
signal. The index of the code book 10 which is determined, that is, the phase of the
residual vector, the gain of the multiplier unit 11, that is, the amplitude of the
residual vector, the frequency and coefficient of the pitch prediction unit 12, and
the coefficients of the linear prediction unit 13 are transmitted multiplexed by the
multiplexer circuit 17.
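The evaluation of each possible pitch period can be sketched as follows. This is only an illustration under simplifying assumptions: the delayed past residual is compared with the target directly, without the linear prediction stage of the actual system, and the minimum lag is assumed to be at least the frame length so that the delayed vector lies entirely in the past. The names `search_pitch` and `past` are hypothetical.

```python
def dot(u, v):
    """Scalar product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def search_pitch(target, past, min_lag=40, max_lag=167):
    """Evaluate every candidate pitch period and keep the best one.

    target -- vector of N samples to be matched
    past   -- past residual samples, most recent last (len(past) >= max_lag)
    Returns (lag, coefficient, error_power), or None if nothing usable.
    """
    n = len(target)
    target_energy = dot(target, target)
    best = None
    for lag in range(min_lag, max_lag + 1):
        start = len(past) - lag
        if start < 0:
            break  # not enough history for this lag or any longer one
        p = past[start:start + n]
        if len(p) < n:
            continue  # lag shorter than the frame: skipped in this sketch
        energy = dot(p, p)
        if energy == 0.0:
            continue
        correlation = dot(target, p)
        b = correlation / energy  # pitch prediction coefficient
        error = target_energy - b * correlation
        if best is None or error < best[2]:
            best = (lag, b, error)
    return best
```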
[0010] On the decoder side, a vector is read out from a code book 20 having the same construction
as the code book 10, in accordance with the index, gain, and prediction unit parameters
obtained by demultiplexing by the demultiplexer circuit 19 and is given a gain by
a multiplier unit 21, then a reproduced speech signal is obtained by prediction by
the prediction units 22 and 23.
[0011] In such a CELP system, as the means for producing the speech signal, use is made
of the code book 10 comprised of white noise and the pitch prediction unit 12 for
giving periodicity at the pitch frequencies, but the decision on the phase of the
code book 10, the gain (amplitude) of the multiplier unit 11, and the pitch frequency
(phase) and pitch prediction coefficient (amplitude) of the prediction unit 12 is
made equivalently as shown in Fig. 3.
[0012] That is, the processing for reproducing the vector of the code book 10 by the pitch
prediction unit and linear prediction units for identification of the input signal,
considered in terms of the vectors, may be considered processing for the identification,
by subtraction and evaluation by a subtractor 50, of a target vector X obtained by
removing from the input signal S of one frame input from a terminal 40, by a subtractor
41, the effects of the previous frame S₀ stored in a previous frame storage 42, with
a vector X′ obtained by adding by an adder 49 a code vector gC obtained by applying
linear prediction to a vector selected from a code book 10 by a linear prediction
unit 44 (corresponding to the linear prediction unit 13 of Fig. 1) and giving a gain
g to the resultant vector C by a multiplier unit 45 and a pitch prediction vector
bP obtained by applying linear prediction by a linear prediction unit 47 to a residual
signal of the previous frame given a delay corresponding to a pitch frequency from
a pitch frequency delay unit 46 (corresponding to the pitch frequency analyzed by
the pitch prediction analysis unit 31 of Fig. 1) and giving a gain b (corresponding
to the pitch prediction coefficient analyzed by the pitch prediction analysis unit 31
of Fig. 1) to the resultant vector P.
[0013] When the phase C of the code vector and the phase P of the pitch prediction vector
are given, the amplitude g of the code vector and the amplitude b of the pitch prediction
vector that minimize the error signal power |E|² of equation (1), as shown in Fig. 4,
are those for which the partial derivatives of |E|² with respect to b and g are 0,
that is, those which satisfy
∂|E|²/∂b = 0, ∂|E|²/∂g = 0
These amplitudes may be found from the following equations (2) and (3) for all combinations
of the phases (C,P) of the two vectors, and the optimal set of amplitudes and phases
(g, b, C, P) is thereby sought:
|E|² = |X - bP - gC|² (1)
b = {(C,C)(X,P) - (C,P)(X,C)}/Δ (2)
g = {(P,P)(X,C) - (C,P)(X,P)}/Δ (3)
where,
Δ = (P,P)(C,C) - (C,P)(C,P)
and (,) indicates the scalar product of two vectors.
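As an illustration, equations (2) and (3) can be evaluated directly from the scalar products. The following is a minimal sketch; the function names and the tolerance used to reject a near-zero Δ are assumptions for the example and are not part of the specification.

```python
def dot(u, v):
    """Scalar product (u, v) of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def solve_two_gains(x, p, c):
    """Return (b, g) minimizing |X - bP - gC|^2 per equations (2) and (3)."""
    pp, cc, cp = dot(p, p), dot(c, c), dot(c, p)
    xp, xc = dot(x, p), dot(x, c)
    delta = pp * cc - cp * cp
    if abs(delta) < 1e-12:  # assumed tolerance: P and C are collinear
        raise ValueError("P and C are collinear; gains are not unique")
    b = (cc * xp - cp * xc) / delta  # equation (2)
    g = (pp * xc - cp * xp) / delta  # equation (3)
    return b, g


def error_power(x, p, c, b, g):
    """Error signal power |E|^2 of equation (1)."""
    e = [xi - b * pi - g * ci for xi, pi, ci in zip(x, p, c)]
    return dot(e, e)
```

For example, with P = (1, 0, 0), C = (1, 1, 0) and X = 2P + 3C, the function recovers b = 2 and g = 3 with zero error power.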
[0014] Here, speech signals include voiced speech sounds and unvoiced speech sounds, which
are characterized in that their respective drive source signals (sound sources) are
periodic pulses or white noise with no periodicity.
[0015] In the CELP system, explained above as a conventional system, pitch prediction and
linear prediction were applied to the vectors of the code book comprised of white
noise as a sound source and the pitch periodicity of the voiced speech sounds was
created by the pitch prediction unit 12.
[0016] Therefore, the characteristics were good when the sound source signal was a white
noise-like unvoiced speech sound. However, the pitch periodicity generated by the pitch
prediction unit was created by giving a delay to the past sound source series by pitch
prediction analysis, and the past sound source series was itself a series of white noise
originally obtained by reading code vectors from a code book. It was therefore difficult
to create a pulse series corresponding to the sound source of a voiced speech sound.
This was a problem in that, in the transitional state from an unvoiced speech sound
to a voiced speech sound, the effect of this was large and high frequency noise was
included in the reproduced speech, resulting in a deterioration of the quality.
SUMMARY OF THE INVENTION
[0017] Therefore, the present invention has as its object, in a CELP type speech coding
system and apparatus wherein a gain is given to a code vector obtained by applying
linear prediction to white noise of a code book and a pitch prediction vector obtained
by applying linear prediction to a residual signal of a preceding frame given a delay
corresponding to the pitch frequency, a reproduced signal is generated from the same,
and the reproduced signal is used to identify the input speech signal, the creation
of a pulse series corresponding to the sound source of a voiced speech sound and the
accurate identification and coding for even a pulse-like sound source of a voiced
speech sound so as to improve the quality of the reproduced speech.
[0018] To achieve the above object, there is provided, according to one technical aspect
of the present invention, a system for speech coding of the CELP type wherein a reproduced
signal is generated from a code vector obtained by applying linear prediction to a
vector of a residual signal of white noise of a code book and a pitch prediction vector
obtained by applying linear prediction to a residual signal of a preceding frame given
a delay corresponding to a pitch frequency, the error between the reproduced signal
and an input speech signal is evaluated, the vector giving the smallest error is sought,
and the input speech signal is encoded accordingly, the system for speech coding characterized
in that in addition to the code vector and pitch prediction vector, use is made of
a residual signal vector of an impulse having a predetermined relationship with the
vectors of the white noise code book, variable gains are given to at least the code
vector and an impulse vector obtained by applying linear prediction to the vector
of the residual signal of the impulse, then the vectors are added to form a reproduced
signal and the reproduced signal is used to identify the input speech signal.
[0019] Further, there is provided, according to another technical aspect of the present
invention, an apparatus for speech coding characterized by being provided with a pitch
frequency delay circuit giving a delay corresponding to a pitch frequency to a vector
of a preceding residual signal, a first code book storing a plurality of vectors of
residual signals of white noise, an impulse generating circuit generating an impulse
having a predetermined relationship with the vectors of the residual signals of the
white noise stored in the first code book, linear prediction circuits connected to
the pitch frequency delay circuit, the first code book, and the impulse generating
circuit, a variable gain circuit for giving a variable gain to vectors output from
the linear prediction circuits connected to at least the first code book and the impulse
generating circuit, a first addition circuit for adding the outputs of the variable
gain circuit and producing a reproduced composite vector, an input speech signal input
unit, a second addition circuit for adding the reproduced composite vector and the
vector of the input speech signal, and an evaluating circuit for evaluating the output
of the second addition circuit and identifying the input speech signal from the vector
of the reproduced signal.
BRIEF DESCRIPTION OF THE DRAWINGS
[0020]
Figures 1 and 2 are block diagrams for explaining an example of a speech coding system
of the related art;
Figs. 3 and 4 are views for explaining the method of analysis in the system of the
related art;
Fig. 5 is a block diagram of an embodiment of the system of the present invention;
Fig. 6 is a circuit diagram for realization of the embodiment shown in Fig. 5;
Fig. 7 is a view showing the method of analysis according to the system of the present
invention;
Fig. 8 is a block diagram of part of another embodiment of the system of the present
invention;
Fig. 9 is a view showing signals of various portions of Fig. 8;
Fig. 10 is a circuit diagram showing another embodiment of the present invention;
Fig. 11 is a block diagram of the other embodiment of the present invention shown
in Fig. 10;
Fig. 12 is a view of an example of a main element pulse position detecting circuit
used in the other embodiment of the present invention shown in Fig. 10;
Fig. 13 is a block diagram showing another embodiment of the present invention;
Fig. 14 is a view showing signals of various portions in Fig. 13;
Figs. 15(A) and (B) are views for explaining the method of calculation of the pitch
correlation of the embodiment of Fig. 13;
Fig. 16 is a view showing an example of the circuit for realizing the other embodiment
of the present invention shown in Fig. 13; and
Fig. 17 is a view showing the method of analysis in the other embodiment of the present
invention shown in Fig. 13.
DESCRIPTION OF THE PREFERRED EMBODIMENTS
[0021] Embodiments of the speech coding system and the speech coding apparatus of the present
invention will be explained in detail below while referring to the appended drawings.
[0022] The basic constitution of the speech coding system of the present invention, as mentioned
above, is that of a conventionally known CELP type speech coding system wherein in
addition to the code vector and pitch prediction vector, use is made of a residual
signal vector of an impulse having a predetermined relationship with the vectors of
the white noise code book, variable gains are given to at least the code vector and
an impulse vector obtained by applying linear prediction to the vector of the residual
signal of the impulse, then the vectors are added to form a reproduced signal and
the reproduced signal is used to identify the input speech signal.
[0023] That is, the present invention is constituted by a conventionally known system wherein
a synchronous pulse serving as a sound source for voiced speech sounds is introduced
and a pulse-like sound source of voiced speech sounds is created by the use of a residual
signal vector of an impulse having a predetermined relationship with the vectors of
the white noise code book. By this, in the present invention, the vector of the residual
signal of the white noise and the vector of the residual signal of the impulse are
added while varying the amplitude components of the two vectors so as to reproduce
a composite vector, so it is possible to accurately identify and code not only the
white noise-like sound source of unvoiced speech sounds, but also the periodic pulse
series sound source of voiced speech sounds and thereby to improve the quality of
the reproduced signal.
[0024] The residual signal vector of the impulse used in the present invention may be an
impulse vector having a predetermined relationship with the residual vectors of white
noise stored in the first code book 10, specifically, may be one corresponding to
one residual vector of white noise stored in the first code book. Further, the one
impulse vector may be one corresponding to one of the predetermined sample positions,
i.e., predetermined pulse positions, of a white noise residual vector in the first
code book. More specifically, as mentioned later, the impulse vector may be one corresponding
to a main element pulse position in the white noise residual vector or, as a simpler
method, the impulse vector may be one corresponding to the maximum amplitude pulse
position of the white noise residual vector. The impulse residual vector used in the
present invention may be one formed by separation from a white noise residual vector
stored in the first code book. Further, for that purpose, use may be made of a second
code book for storing command information for separating this from the white noise
residual vector stored in the first code book. Also, the second code book may store
preformed impulse vectors.
[0025] Therefore, the second code book preferably is of the same size as the first code
book.
[0026] Figure 5 is a block diagram of an embodiment of a speech coding system of the present
invention. In the figure, portions the same as in Fig. 1 are given the same reference
numerals and explanations of the same are omitted.
[0027] Figure 5 shows the constitution of the transmission side. In the code book 10 are
stored 2ᵐ patterns of N dimensional vectors of residual signals formed by white noise,
as in the past. In the code book 60 are stored N patterns of N dimensional vectors of
residual signals of impulses shifted successively in phase.
[0028] The impulse vectors from the code book 60 are supplied through a multiplier unit
61 to an adder 62 where they are added with vectors of white noise supplied from the
code book 10 through a multiplier unit 11 and the result is supplied to a pitch prediction
unit 12. An evaluating circuit 16 searches through the code books 10 and 60 and determines
the vector giving the smallest error signal power between the input speech signal
and the reproduced signal from the linear prediction unit 13. The index of the code
book 10 decided on, that is, the phase-1 of the residual vector of the white noise,
the index of the code book 60, that is, the phase-2 of the residual vector of the
impulse, and the gains of the multiplier units 11 and 61, i.e., the amplitude-1 and
amplitude-2 of the residual vectors, the frequency and coefficient of the pitch prediction
unit 12 as in the past, and the coefficient of the linear prediction unit 13 are transmitted
multiplexed by a multiplexer circuit 65.
[0029] On the receiving side, the transmitted multiplexed signal is demultiplexed by the
demultiplexer circuit 66. Code books 20 and 70 have the same constitutions as the
code books 10 and 60. From the code books 20 and 70 are read out the vectors indicated
by the indexes (phase-1 and phase-2). These are passed through the multiplier units
21 and 71, then added by the adder 72 and reproduced by the pitch prediction unit
22 and further the linear prediction unit 23.
[0030] Further, while not shown in the embodiment, use is of course made of a linear
prediction analysis unit 18, a reverse linear prediction filter 30, and a pitch prediction
analysis unit 31 in the same way as in Fig. 2.
[0031] Figure 6 shows an example of the circuit constitution for realizing the above embodiment
according to the speech coding system of the present invention. In Fig. 6, portions
the same as in Fig. 3 are given the same reference numerals and explanations thereof
are omitted.
[0032] In Fig. 6, a vector of a residual signal of white noise from a first code book 43
is subjected to prediction by a linear prediction unit 44 and multiplied by a gain
g₁ by a multiplier unit 45, one example of a variable gain circuit, to obtain a white
noise code vector g₁C₁. Further, the vectors of residual signals of impulses from
a second code book 80 are subjected to prediction by a linear prediction unit 81 and
multiplied by a gain g₂ by a multiplier unit 82, similarly an example of a variable
gain circuit, to obtain an impulse code vector g₂C₂. The above-mentioned code vectors
g₁C₁ and g₂C₂ and a pitch prediction vector bP output from a multiplier unit 48 are
added by adders 49 and 83 to give a composite vector X˝. The error E between the composite
vector X˝ output by the adder 83 and the target vector X is evaluated by an evaluating
circuit 51. Figure 7 illustrates the vector operation mentioned above.
[0033] At this time, the equation for evaluation of the error signal power |E|² is expressed
by equation (4). The amplitude b of the pitch prediction vector and the amplitudes
g₁ and g₂ of the code vectors giving the minimum such power are determined by equations
(5), (6), and (7):
|E|² = |X - bP - g₁C₁ - g₂C₂|² (4)
where,
∂|E|²/∂b = 0
∂|E|²/∂g₁ = 0
∂|E|²/∂g₂ = 0
[0034] By this,
b = {(Z5 × Z6 × Z7 + Z2 × Z4 × Z9 + Z3 × Z4 × Z8) - (Z3 × Z5 × Z9 + Z4 × Z4 × Z7 +
Z2 × Z6 × Z8)}/Δ (5)
g₁ = {(Z1 × Z6 × Z8 + Z3 × Z4 × Z7 + Z2 × Z3 × Z9) - (Z3 × Z3 × Z8 + Z1 × Z4 × Z9 +
Z2 × Z6 × Z7)}/Δ (6)
g₂ = {(Z1 × Z5 × Z9 + Z2 × Z3 × Z8 + Z2 × Z4 × Z7) - (Z3 × Z5 × Z7 + Z2 × Z2 × Z9
+ Z1 × Z4 × Z8)}/Δ (7)
Δ = Z1 × Z5 × Z6 + 2 × Z2 × Z3 × Z4 - Z3 × Z3 × Z5 - Z1 × Z4 × Z4 - Z2 × Z2 × Z6
where,
Z1 = (P, P), Z2 = (P, C₁),
Z3 = (P, C₂), Z4 = (C₁, C₂),
Z5 = (C₁, C₁), Z6 = (C₂, C₂),
Z7 = (X, P), Z8 = (X, C₁),
Z9 = (X, C₂)
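The three amplitudes can be computed from the nine scalar products Z1 to Z9 as in the following minimal sketch of equations (5) to (7). The function names and the tolerance used to reject a near-zero Δ are assumptions made for this example.

```python
def dot(u, v):
    """Scalar product (u, v) of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def solve_three_gains(x, p, c1, c2):
    """Return (b, g1, g2) minimizing |X - bP - g1*C1 - g2*C2|^2."""
    z1, z2, z3 = dot(p, p), dot(p, c1), dot(p, c2)
    z4, z5, z6 = dot(c1, c2), dot(c1, c1), dot(c2, c2)
    z7, z8, z9 = dot(x, p), dot(x, c1), dot(x, c2)
    delta = (z1 * z5 * z6 + 2 * z2 * z3 * z4
             - z3 * z3 * z5 - z1 * z4 * z4 - z2 * z2 * z6)
    if abs(delta) < 1e-12:  # assumed tolerance: vectors linearly dependent
        raise ValueError("P, C1 and C2 are linearly dependent")
    b = ((z5 * z6 * z7 + z2 * z4 * z9 + z3 * z4 * z8)
         - (z3 * z5 * z9 + z4 * z4 * z7 + z2 * z6 * z8)) / delta   # (5)
    g1 = ((z1 * z6 * z8 + z3 * z4 * z7 + z2 * z3 * z9)
          - (z3 * z3 * z8 + z1 * z4 * z9 + z2 * z6 * z7)) / delta  # (6)
    g2 = ((z1 * z5 * z9 + z2 * z3 * z8 + z2 * z4 * z7)
          - (z3 * z5 * z7 + z2 * z2 * z9 + z1 * z4 * z8)) / delta  # (7)
    return b, g1, g2
```

For example, with P = (1, 0, 0), C₁ = (1, 1, 0), C₂ = (0, 1, 1) and X = 0.5P + 2C₁ - C₂, the function recovers b = 0.5, g₁ = 2 and g₂ = -1.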
[0035] Therefore, to determine the most suitable code vector and pitch prediction vector,
one may find the amplitudes g₁, g₂, and b by the equations (5), (6), and (7) for all
the combinations of the phases C₁, C₂, and P of the three vectors and search for the
set of the amplitudes and phases g₁, g₂, b, C₁, C₂, and P giving the smallest error
signal power.
[0036] Here, the phase of the impulse code vector C₂ corresponds unconditionally to the
phase of the white noise code vector C₁, so to determine the optimum drive source
vector, one may find the b, g₁, and g₂ giving the value of 0 for the error power |E|²
partially differentiated by b, g₁, and g₂ for all combinations of the phases (P,C₁)
of the white noise code vector C₁ and the pitch prediction vector P and thereby find
amplitudes (b, g₁, and g₂) by equations (5) to (7) and search for the set of amplitudes
and phases (b, g₁, g₂, P, C₁) giving the smallest error signal power of equation (4).
[0037] In this way, it is possible to identify input speech signals by adding a periodic
pulse serving as a sound source of voiced speech sounds missing in the white noise
code book.
[0038] Figure 8 shows the case where, as the relationship between the impulse vectors and
the white noise residual vectors stored in the first code book in the present invention,
the impulse vector is established at the pulse position showing the maximum amplitude
in the white noise residual vector. In Fig. 8, the first code book 10 is provided with
a table 90 with a common index i (corresponding to the second code book), which stores
the position of the element (sample) with the maximum amplitude for each of the patterns
of white noise vectors of the code book 10. The white noise vector and maximum amplitude
position read out from the code book 10 and the table 90, respectively, in accordance
with the search pattern indexes entering from the evaluating circuit 16 through a
terminal 91 are supplied to an impulse separating circuit 92 where, as shown in Fig.
9(A), just the maximum amplitude position sample is removed from the white noise vector.
Two vectors are thereby generated: the white noise vector shown in (B) of the figure,
which keeps the original amplitude values at every sampling position except the one
where the maximum amplitude was obtained, where the amplitude is "0", and the impulse
shown in (C) of the figure, which has the maximum amplitude value at that sampling
position and no amplitude at any of the remaining sampling positions. These are supplied
respectively to the multiplier units 11 and 61, and the code book 60 is thus eliminated.
Of course, the same applies to the code books 20 and 70. In this case, the sum of the
white noise vector and the impulse vector output by the impulse separating circuit 92
becomes the same as the original white noise vector of the code book 10, so when the
amplitude ratio g₁/g₂ of the multiplier units 11 and 61 is "1", use may be made of the
original white noise and when it is "0" use may be made of the complete impulse.
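The operation of the table 90 and the impulse separating circuit 92 can be sketched as follows. The function names are hypothetical, and the table is represented simply as a function that finds the maximum amplitude position; in the described system the positions would be stored in advance.

```python
def max_amplitude_position(noise_vector):
    """Position of the sample with the largest absolute amplitude.

    Stands in for one entry of the table 90 (assumed to be precomputed).
    """
    return max(range(len(noise_vector)), key=lambda n: abs(noise_vector[n]))


def separate_impulse(noise_vector, position):
    """Split a white noise vector into (remainder, impulse).

    remainder -- the original vector with the sample at `position` set to 0
    impulse   -- zero everywhere except the original sample at `position`
    """
    impulse = [0.0] * len(noise_vector)
    impulse[position] = noise_vector[position]
    remainder = list(noise_vector)
    remainder[position] = 0.0
    return remainder, impulse


def excitation(noise_vector, g1, g2):
    """Sum of the two separated vectors after applying gains g1 and g2."""
    position = max_amplitude_position(noise_vector)
    remainder, impulse = separate_impulse(noise_vector, position)
    return [g1 * r + g2 * p for r, p in zip(remainder, impulse)]
```

With equal gains the sum reproduces the original white noise vector, and with g₁ = 0 only the impulse remains, which matches the two limiting cases described above.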
[0039] By so making the phase of the impulse vector correspond unconditionally to the white
noise vectors, the need for transmission of the phase-2 of the impulse code vector
is eliminated and the effect of data compression is increased.
[0040] Since the white noise vector and the impulse vector are added by varying the gain
of the amplitudes of the respective elements, it is possible to accurately identify
and code not only the white noise-like sound source of unvoiced speech sounds, but
also the periodic pulse series sound source of voiced speech sound, a problem in the
past, and thereby to vastly improve the quality of the reproduced speech.
[0041] In the embodiment of Fig. 6, the first addition circuit is formed by an adder 49
and an adder 83, but the first addition circuit may be formed by a single unit instead
of the adders 49 and 83.
[0042] Next, another embodiment of the speech coding system of the present invention will
be shown in Fig. 10.
[0043] In Fig. 6, provision was made of a code book comprised of fixed impulses generated
in accordance with only predetermined pulse positions of the vectors in the code book
10, but even if the input speech signal is identified by adding the vector based on
the fixed impulses to the conventional pitch prediction vector and white noise vector,
the optimal identification cannot necessarily be performed. This is because, as shown
in Fig. 6, since linear prediction is applied even to the impulse vector, there is
a distortion in space.
[0044] Therefore, in the third embodiment, the principle of which is shown in Fig. 10, instead
of using fixed impulse vectors, the phase difference between the white noise vector
C₁ after application of linear prediction 44 and the vector obtained by applying linear
prediction to the impulse by the main element pulse position detection circuit 90
is evaluated, whereby the position of the main element pulse is detected. The main
element impulse is generated at this position by the impulse generating unit 91. The
three vectors, i.e., the pitch prediction vector P, the white noise code vector C₁,
and the main element impulse vector are added and the composite vector is used to
identify the input speech signal S.
[0045] Further, even in the third embodiment, a search is made for the set of the amplitudes
and phases (b, g₁, g₂, P, C₁) giving the smallest error signal power by equations
(4) to (7).
[0046] Figure 11 is a block diagram of the third embodiment of the present invention. The
third embodiment differs from the embodiment of Fig. 5 only in that it uses a main
element pulse position detection circuit 110 instead of an impulse code book 60.
[0047] That is, the main element pulse position detection circuit 110 extracts the position
of the main element pulse for the vectors of the white noise code book 10, the main
element pulse generated at that position is multiplied by the gain (amplitude) component
by the multiplier unit 61, one type of variable gain circuit, then is added to the
white noise read out from the code book 10 as in the past and multiplied by the gain
by the multiplier unit 11, also one type of variable gain circuit, and reproduction
is performed by the pitch prediction unit 12 and the linear prediction unit 13.
[0048] Further, since the independent variable gains are multiplied with the white noise
and the main element impulse, the coding information may be, like with Fig. 5, the
white noise code index (phase) and gain (amplitude), the amplitude of the main element
impulse, and the parameters for constructing the prediction units (pitch frequency,
pitch prediction coefficient, linear prediction coefficient) transmitted multiplexed
by the multiplexer circuit 65. Further, the receiving side may be similarly provided
with a main element pulse position detection circuit 120 and the speech signal reproduced
based on the parameters demultiplexed at the demultiplexer circuit 66.
[0049] Therefore, since the sound source signal is generated by adding the white noise and
the impulse, it is possible to accurately generate not only a white noise-like sound
source of unvoiced speech sounds, but also a periodic pulse series sound source of
voiced speech sounds by control of the amplitude components and therefore possible
to improve the quality of the reproduced speech.
[0050] Figure 12 shows an embodiment of the main element pulse position detection circuit
110 used in the above-mentioned embodiment. In this embodiment, provision is made
of a linear prediction unit 111 which applies linear prediction to N number of impulse
vectors (these may be generated also from a separately provided memory) with different
pulse positions, a phase difference calculation unit 112 which calculates a phase
difference between a code vector C₁ obtained by applying linear prediction to the
white noise of the code book 10 by the linear prediction unit 11 and an impulse code
vector C₂ⁱ (where i = 1, 2, ..., N) to which linear prediction from the linear prediction unit
111 is applied, a maximum value detection unit 113 which detects the maximum value
of the phase difference calculated by the phase difference calculation unit 112, and
an impulse generating circuit 114 which decides on the position of the main element
pulse by the maximum value detected by the maximum value detection unit 113 and generates
an impulse at the position of the main element pulse.
[0051] In such a main element pulse position detection circuit 110, the impulse code vector
is sought giving the minimum phase difference ϑᵢ between the code vector C₁ obtained
by applying linear prediction to the vectors stored in the code book 10 and the N number
of impulse code vectors C₂ⁱ, that is, giving the maximum value of
cos²ϑᵢ = (C₁, C₂ⁱ)²/{(C₁, C₁)·(C₂ⁱ, C₂ⁱ)},
thereby enabling determination of the position of the main element pulse.
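The detection of the main element pulse position can be sketched as below. The linear prediction is modeled here as a simple all-pole filter with zero initial state, which is an assumption of this example (the coefficient sign convention and the function names `synthesize` and `main_pulse_position` are likewise hypothetical).

```python
def dot(u, v):
    """Scalar product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def synthesize(excitation, lpc):
    """All-pole filter: y[n] = e[n] + sum_k lpc[k] * y[n-1-k]."""
    out = []
    for n, e in enumerate(excitation):
        acc = e
        for k, a in enumerate(lpc):
            if n - 1 - k >= 0:
                acc += a * out[n - 1 - k]
        out.append(acc)
    return out


def main_pulse_position(noise_vector, lpc):
    """Return (position, cos2) of the impulse closest in phase to C1."""
    c1 = synthesize(noise_vector, lpc)
    c1_energy = dot(c1, c1)
    if c1_energy == 0.0:
        raise ValueError("code vector C1 has zero energy")
    n = len(noise_vector)
    best_position, best_cos2 = 0, -1.0
    for i in range(n):
        unit = [0.0] * n
        unit[i] = 1.0  # impulse at position i
        c2 = synthesize(unit, lpc)
        cross = dot(c1, c2)
        cos2 = cross * cross / (c1_energy * dot(c2, c2))
        if cos2 > best_cos2:
            best_position, best_cos2 = i, cos2
    return best_position, best_cos2
```

Because the same computation uses only the code vector and the prediction coefficients, a decoder holding both could repeat it without receiving the pulse position.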
[0052] In this case, by providing a main element pulse position detection circuit even on
the decoder side, it is possible to extract the phase information of the main element
pulse from the phase of the code vector even without transmission of the same and
therefore it is possible to improve the characteristics by an increase of just the
amplitude information of the main element pulse.
[0053] According to the above explained first to third embodiments, in addition to the addition
of two vectors, i.e., the white noise code vector and the pitch prediction vector,
an impulse code vector generated by a code book or table etc. at a position corresponding
to the position of predetermined pulses of the white noise code vector is added and
the identification performed by this composite vector of three vectors, so it is possible
to create not only a sound source of unvoiced speech sounds, but also a pulse-like
sound source of voiced speech sounds and possible to improve the quality of the reproduced
speech. Further, by separating the vector of the residual signal of the impulse from
the vector of the residual signal of the white noise, it is possible to increase the
effect of data compression.
[0054] Further, according to the above embodiment, it is possible to control the amplitude
of the elements by combining the white noise vector and the impulse vector corresponding
to the main element, so it is possible to create a more effective pulse sound source
than even with generation of a fixed impulse.
[0055] Next, an explanation will be made of a fourth embodiment of the speech coding system
of the present invention. The fourth embodiment of the present invention constitutes
the conventional CELP type speech coding system wherein the vector of the residual
signal of the white noise and the vector of the residual signal of the impulse are
added by a ratio based on the strength of the pitch correlation of the input speech
signal obtained by pitch prediction so as to obtain a composite vector. The composite
vector is reproduced to obtain a reproduced signal and the error of that with the
input speech signal is evaluated.
[0056] Therefore, in the fourth embodiment, since the vector of the residual signal of the
white noise and the vector of the residual signal of the impulse are added by a ratio
based on the strength of the pitch correlation of the input speech signal and the
composite vector is reproduced, it is possible to accurately identify and code not
only the white noise-like sound source of unvoiced speech sounds, but also the periodic
pulse series sound source of voiced speech sounds and thereby to improve the quality
of the reproduced speech.
[0057] Figure 13 is a block diagram of the fourth embodiment of the system of the present
invention. In the figure, portions the same as Fig. 1 are given the same reference
numerals and explanations thereof are omitted.
[0058] In Fig. 13, there is additionally provided a table 60 in the code book 10, in which
are stored 2^m patterns of N order vectors of residual signals of white noise. In this
table 60 are stored the positions of the elements (samples) of maximum amplitude for each
of the 2^m patterns of vectors in the code book 10.
[0059] The white noise vector read out from the code book 10 in accordance with the search
pattern index from the evaluating circuit 16 is supplied to the impulse generating
unit 61 and the weighting and addition circuit 62, while the maximum amplitude position
read out from the table is supplied to the impulse generating unit 61.
[0060] The impulse generating unit 61 picks out the element at the maximum amplitude position
from the white noise vector as shown in Fig. 14(A), generates an impulse vector
as shown in Fig. 14(B) with the remaining N-1 elements all made 0, and supplies the
impulse vector to the weighting and addition circuit 62.
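The table lookup and impulse generation described above can be sketched as follows. This is a minimal illustration, not the circuit of the embodiment: the function names `build_max_position_table` and `generate_impulse_vector` and the toy code book values are assumptions introduced here, and the toy vectors are not power-normalized.

```python
from typing import List


def build_max_position_table(code_book: List[List[float]]) -> List[int]:
    """For each code book vector, store the position of its maximum-amplitude element."""
    # Ties are resolved in favor of the earliest position (an assumption).
    return [max(range(len(v)), key=lambda i: abs(v[i])) for v in code_book]


def generate_impulse_vector(noise_vector: List[float], position: int) -> List[float]:
    """Keep the element at `position` and set the remaining N-1 elements to 0."""
    return [x if i == position else 0.0 for i, x in enumerate(noise_vector)]


# Toy code book: 2^m = 4 vectors of N = 5 samples (illustrative values only).
code_book = [
    [0.1, -0.7, 0.3, 0.2, -0.1],
    [0.5, 0.2, -0.1, 0.4, 0.3],
    [-0.2, 0.1, 0.1, -0.3, 0.9],
    [0.0, 0.3, -0.8, 0.2, 0.1],
]

table = build_max_position_table(code_book)  # plays the role of table 60
index = 0                                    # search pattern index
impulse = generate_impulse_vector(code_book[index], table[index])
```

Because the table depends only on the code book, it can be computed once in advance, so the search loop only needs a lookup per index.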
[0061] The weighting and addition circuit 62 multiplies the white noise vector and the
impulse vector by the weightings sinϑ and cosϑ, respectively, supplied from the later
mentioned pitch correlation calculation unit 63, then adds the weighted vectors.
The composite vector obtained here is supplied to the multiplier unit 11.
[0062] The code vector gC becomes equal to the impulse vector when the pitch correlation
is maximum (cosϑ = 1) and becomes equal to the white noise vector when the pitch correlation
becomes minimum (cosϑ = 0). That is, the property of the code vector may be continuously
changed between the impulse and white noise in accordance with the strength of the
pitch correlation of the input speech signal, whereby the precision of identification
of the sound source with respect to an input speech signal can be improved.
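The continuous change between the impulse and the white noise can be illustrated with a short sketch of the weighting and addition. The function name `weight_and_add` and the sample values are assumptions introduced here; the weightings sinϑ and cosϑ are simply passed in, as if supplied by the pitch correlation calculation unit.

```python
from typing import List


def weight_and_add(noise: List[float], impulse: List[float],
                   sin_theta: float, cos_theta: float) -> List[float]:
    """Return sin(theta) * noise + cos(theta) * impulse, element by element."""
    return [sin_theta * n + cos_theta * p for n, p in zip(noise, impulse)]


noise = [0.1, -0.7, 0.3, 0.2, -0.1]     # white noise vector (illustrative)
impulse = [0.0, -0.7, 0.0, 0.0, 0.0]    # its maximum-amplitude element only

all_impulse = weight_and_add(noise, impulse, 0.0, 1.0)  # cos(theta) = 1
all_noise = weight_and_add(noise, impulse, 1.0, 0.0)    # cos(theta) = 0
mixed = weight_and_add(noise, impulse, 0.8, 0.6)        # intermediate correlation
```

With cosϑ = 1 the result equals the impulse vector, and with cosϑ = 0 it equals the white noise vector; intermediate values emphasize the main element relative to the rest of the noise.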
[0063] The pitch correlation calculation unit 63 finds the phase difference ϑ between the
later mentioned pitch prediction vector and the vector of the input speech signal
to obtain the pitch correlation (weighting) cosϑ and the weighting sinϑ.
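A minimal sketch of how the two weightings could be obtained from the angle between the vectors is given below. The function name `pitch_correlation_weights`, the handling of a zero-energy vector, and the sample values are assumptions made for illustration, not details taken from the embodiment.

```python
import math
from typing import List, Tuple


def inner(a: List[float], b: List[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def pitch_correlation_weights(target: List[float],
                              pitch_pred: List[float]) -> Tuple[float, float]:
    """Return (cos_theta, sin_theta) for the angle between the two vectors."""
    norm_product = math.sqrt(inner(target, target) * inner(pitch_pred, pitch_pred))
    if norm_product == 0.0:
        # Assumption: with no energy in either vector, treat the correlation as zero.
        return 0.0, 1.0
    cos_theta = inner(target, pitch_pred) / norm_product
    cos_theta = max(-1.0, min(1.0, cos_theta))  # guard against rounding error
    sin_theta = math.sqrt(1.0 - cos_theta * cos_theta)
    return cos_theta, sin_theta


# Toy two-sample vectors (illustrative values only).
cos_theta, sin_theta = pitch_correlation_weights([3.0, 4.0], [3.0, 0.0])
```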
[0064] The evaluating circuit 16 searches through the code book 10 and decides on the index
giving the smallest error signal power. The index of the code book 10 decided on,
that is, the phase of the residual vector of the white noise, the gain, that is, the
amplitude of the residual vector, of the multiplier unit 11, the frequency and coefficient
(λ and cosϑ) of the pitch prediction unit 12 as in the past, and the coefficient of
the linear prediction unit 13 are transmitted multiplexed by the multiplexer circuit
17. In this embodiment too, the gain is preferably variable.
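The search for the index giving the smallest error signal power can be sketched as an exhaustive loop. In this sketch `search_code_book` and the `reproduce` callable are hypothetical names: `reproduce` stands in for the weighting, gain, pitch prediction, and linear prediction stages, and the toy usage replaces it with the identity purely for illustration.

```python
from typing import Callable, List, Sequence, Tuple


def search_code_book(
    target: Sequence[float],
    code_book: Sequence[Sequence[float]],
    reproduce: Callable[[Sequence[float]], List[float]],
) -> Tuple[int, float]:
    """Return (index, error power) of the code book vector with the smallest error."""
    best_index, best_error = -1, float("inf")
    for index, vector in enumerate(code_book):
        reproduced = reproduce(vector)
        # Error signal power between the input and the reproduced signal.
        error = sum((x - y) ** 2 for x, y in zip(target, reproduced))
        if error < best_error:
            best_index, best_error = index, error
    return best_index, best_error


# Toy usage: identity "reproduction" (an assumption, not the real signal path).
toy_book = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
best_index, best_error = search_code_book([0.4, 0.6], toy_book, lambda v: list(v))
```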
[0065] The transmitted multiplexed signal is demultiplexed by the demultiplexer circuit
19. The code book 20 and the table 70 are each of the same construction as the code
book 10 and the table 60. The vector and maximum amplitude position indicated by the
respective indexes (phases) are read out from the code book 20 and the table 70.
[0066] The impulse generating unit 71 generates an impulse vector in the same way as the
impulse generating unit 61 on the coding unit side and supplies the same to the weighting
circuit 72. The weighting circuit 72 prepares the weighting sinϑ from the pitch correlation
(weighting) cosϑ from among the coefficients (λ and cosϑ) from the pitch prediction
unit 12 transmitted and demultiplexed. With these, the white noise vector and the
impulse vector are weighted and added and the composite vector is supplied to the
multiplier 21. Reproduction is performed at the pitch prediction unit 22 and the linear
prediction unit 23.
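The decoding side can be sketched in the same way. The sketch below assumes that ϑ lies between 0 and π/2, so that sinϑ can be recovered as the non-negative square root of 1 - cos²ϑ; the function name `decode_excitation` and the toy values are introduced here for illustration, and the subsequent pitch prediction and linear prediction stages are not modeled.

```python
import math
from typing import List


def decode_excitation(code_book: List[List[float]], max_position_table: List[int],
                      index: int, cos_theta: float, gain: float) -> List[float]:
    """Rebuild the gain-scaled composite vector from the received parameters."""
    noise = code_book[index]
    position = max_position_table[index]
    # Impulse vector: the maximum-amplitude element, all other elements 0.
    impulse = [x if i == position else 0.0 for i, x in enumerate(noise)]
    # Assumption: theta in [0, pi/2], so sin(theta) >= 0.
    sin_theta = math.sqrt(max(0.0, 1.0 - cos_theta * cos_theta))
    return [gain * (sin_theta * n + cos_theta * p) for n, p in zip(noise, impulse)]


# Toy code book and table shared with the coding side (illustrative values).
book = [[0.5, -1.0, 0.25]]
positions = [1]
voiced = decode_excitation(book, positions, 0, 1.0, 2.0)    # pure impulse
unvoiced = decode_excitation(book, positions, 0, 0.0, 2.0)  # pure white noise
```

Only the index, the gain, and cosϑ are needed here, since the impulse position follows from the index and sinϑ follows from cosϑ.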
[0067] The circuit construction of the speech coding system of the above embodiment may
be expressed as shown in Fig. 16. In Fig. 16, portions the same as in Fig. 2 are given
the same reference numerals and explanations thereof are omitted.
[0068] In Fig. 16, the vector of the residual signal of the white noise from the code book
43 is subjected to prediction by the linear prediction unit 44 and multiplied with
the weighting sinϑ by the multiplier unit 80, one type of variable gain circuit, to
obtain a white noise code vector. Further, the vector of the residual signal of the
impulse generated from the white noise vector at the impulse generating unit 81 is
subjected to prediction by the linear prediction unit 82 and multiplied by the weighting
cosϑ by the multiplier 83, one type of variable gain circuit, to obtain an impulse
code vector. These are added by the adder 84 and further multiplied by the gain g
at the adder 45 (amplitude of code vector) to give the code vector gC. This code vector
gC is added by the adder 49 with the pitch prediction vector bP output from the multiplier
unit 48 and the composite vector X″ is obtained. The error E between the composite
vector X″ and the target vector X is output by the adder 50 and evaluated by the
evaluating circuit 51. Figure 17 illustrates this vector operation.
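The vector operation of Fig. 16 can be followed step by step in the sketch below. It assumes that each linear prediction unit acts as an all-pole synthesis filter with zero initial state and a shared coefficient set `lpc`; the function names, the filter form, and the sample values are assumptions made for illustration.

```python
import math
from typing import List, Sequence


def synthesize(residual: Sequence[float], lpc: Sequence[float]) -> List[float]:
    """All-pole synthesis: y[n] = residual[n] + sum_k lpc[k] * y[n-1-k]."""
    out: List[float] = []
    for n, r in enumerate(residual):
        acc = r
        for k, a in enumerate(lpc):
            if n - 1 - k >= 0:
                acc += a * out[n - 1 - k]
        out.append(acc)
    return out


def error_power(target: Sequence[float], noise: Sequence[float],
                impulse: Sequence[float], pitch_vec: Sequence[float],
                lpc: Sequence[float], cos_theta: float, g: float, b: float) -> float:
    """Return |E|^2 = |X - (gC + bP)|^2 for one candidate code vector."""
    sin_theta = math.sqrt(max(0.0, 1.0 - cos_theta * cos_theta))
    noise_code = [sin_theta * v for v in synthesize(noise, lpc)]      # white noise code vector
    impulse_code = [cos_theta * v for v in synthesize(impulse, lpc)]  # impulse code vector
    code = [g * (n + p) for n, p in zip(noise_code, impulse_code)]    # code vector gC
    composite = [c + b * p for c, p in zip(code, pitch_vec)]          # gC + bP
    return sum((x - y) ** 2 for x, y in zip(target, composite))


# Toy example: a purely impulsive source with no pitch contribution (b = 0).
err = error_power(target=[0.6, 0.3, 0.15], noise=[0.3, -0.2, 0.1],
                  impulse=[0.3, 0.0, 0.0], pitch_vec=[0.0, 0.0, 0.0],
                  lpc=[0.5], cos_theta=1.0, g=2.0, b=0.0)
```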
[0069] In this case, the code vector gC changes in accordance with the weighting cosϑ, sinϑ
from white noise to an impulse, but the pitch prediction vector bP and the code vector
gC may be used to determine the phases P and C and amplitudes b and g of the two vectors
in the same way as the past without change to the process of identification of the
input.
[0070] Here, an explanation will be made of the pitch correlation calculation unit 85 with
reference to Figs. 15(A) and (B). Figure 15(A) shows a portion extracted from Fig. 16.
[0071] The amplitude component b of the pitch prediction vector bP is nothing other than
the prediction coefficient b of the pitch prediction unit, but this value may be found
by identifying the input signal by only the pitch prediction vector using the code
vector gC as "0" in the above-mentioned speech signal analysis (equation (8) and equation
(9)). Here, the pitch prediction coefficient b, as shown in equation (10), is the
product of the amplitude ratio λ of the target vector X and the pitch prediction vector
P and the pitch correlation cosϑ. The value of the pitch correlation is maximum (cosϑ
= 1) when the phase of the pitch prediction vector matches the phase of the target
vector (ϑ = 0). The larger the phase difference ϑ of the two vectors, the smaller
this is. Further, the value is also the value showing the strength of the periodicity
of the speech signal, so it is possible to use this to control the ratio of the white
noise element and the impulse element in the speech signal. Figure 17 illustrates
the above-mentioned vector operation.
|E|² = |X-bP|² (8)
where,
∂|E|²/∂b = 0
[0072] By this,
b = (X,P)/(P,P) (9)
b = λ·cosϑ (10)
where, λ is the amplitude ratio and ϑ is the phase difference and
λ = |X|/|P|
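The step from equation (8) to equations (9) and (10) can be written out as follows, using (X,P) for the inner product as in equation (9). The derivation assumes real-valued vectors and |P| ≠ 0.

```latex
\begin{align*}
|E|^2 &= |X - bP|^2 = (X,X) - 2b\,(X,P) + b^2\,(P,P), \\
\frac{\partial |E|^2}{\partial b} &= -2\,(X,P) + 2b\,(P,P) = 0
  \;\Longrightarrow\; b = \frac{(X,P)}{(P,P)}, \\
(X,P) &= |X|\,|P|\cos\vartheta
  \;\Longrightarrow\; b = \frac{|X|\,|P|\cos\vartheta}{|P|^2}
  = \frac{|X|}{|P|}\cos\vartheta = \lambda\cos\vartheta .
\end{align*}
```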
[0073] In this way, the white noise vector and the impulse vector are added with the amplitudes
of their respective elements controlled, so it is possible to accurately identify
and code not only the white noise-like sound source of unvoiced speech sounds, but
also the periodic pulse series sound source of voiced speech sounds, a problem in
the past, and thereby to vastly improve the quality of the reproduced speech.
[0074] Further, the phase of the impulse vector added to the white noise vector is made
to correspond unconditionally to the phase of the white noise and even the strength
of the pitch correlation cosϑ is transmitted as the pitch prediction coefficient (b
= λ·cosϑ), so there is no increase in the amount of information transmitted compared
with the conventional system.
[0075] Note that the drawing of a correspondence between the phases of the impulse vectors
and the phases of the white noise vectors is not limited to the above-mentioned maximum
amplitude position.
[0076] As mentioned above, according to the speech coding system of this embodiment, it
is possible to accurately identify and code not only the sound source of unvoiced
speech sounds but also the pulse-like sound source of voiced speech sounds, not possible
in the past, and it is possible to improve the quality of the reproduced signal. Further,
there is no increase in the amount of the information transmitted, making this very
practical.
[0077] That is, in the embodiment, not all the information on the gain (amplitude) and residual
vectors (phase) is transmitted, so transmission is possible with the information compressed.
In this invention, it is possible to freely select from the above plurality of embodiments
in accordance with the desired objective, without any deterioration
of the quality of the reproduced signal. For example, when desiring to obtain a compression
effect without increasing the amount of information, use may be made of the second
and third embodiments, while when desiring to obtain a compression effect even at
the expense of the characteristics of the reproduced speech, use may be made of the
fourth embodiment.
1. A system for speech coding of the CELP type wherein a reproduced signal is generated
from a code vector obtained by applying linear prediction to a vector of a residual
signal of white noise of a code book and a pitch prediction vector obtained by applying
linear prediction to a residual signal of a preceding frame given a delay corresponding
to a pitch frequency, the error between the reproduced signal and an input speech
signal is evaluated, the vector giving the smallest error is sought, and the input
speech signal is encoded accordingly, the system for speech coding characterized in
that in addition to the code vector and pitch prediction vector, use is made of a
residual signal vector of an impulse having a predetermined relationship with the
vectors of the white noise code book, variable gains are given to at least the
code vector and an impulse vector obtained by applying linear prediction to the vector
of the residual signal of the impulse, then the vectors are added to form a reproduced
signal and the reproduced signal is used to identify the input speech signal.
2. A system for speech coding according to claim 1, characterized in that the respective
residual signal vectors of the impulses having a predetermined relationship with the
vectors of said white noise code book correspond to the vectors of the said white
noise code book.
3. A system for speech coding according to claim 2, characterized in that the vectors
of the residual signals of the impulses correspond to just predetermined pulse positions
in the vectors of the said white noise code book.
4. A system for speech coding according to claim 2, characterized in that the vectors
of the residual signals of the impulses correspond to pulse positions of the maximum
amplitude in the vectors of the said white noise code book.
5. A system for speech coding according to claim 2, characterized in that the vectors
of the residual signals of the impulses corresponding to one position selected from
one of said predetermined pulse positions in the vectors of the said white noise code
book and the pulse positions of the maximum amplitude are stored in a separately provided
code book.
6. A system for speech coding according to claim 4, characterized in that the vectors
of the residual signals of the impulses corresponding to one position selected from
one of said predetermined pulse positions in the vectors of the said white noise code
book and the pulse positions of the maximum amplitude are stored in a separately provided
code book.
7. A system for speech coding according to claim 1, characterized in that the residual
signal vector of the impulse having a predetermined relationship with the vectors
of the white noise code book is the main element impulse in the vectors of the white
noise code book.
8. A system for speech coding according to claim 1, characterized in that the residual
signal vector of the white noise and the vector of the residual signal of the impulse
are adjusted by a predetermined coefficient derived from a vector of said input speech
signal and a pitch prediction vector obtained by applying linear prediction to a residual
signal of a preceding frame and that the error is evaluated.
9. A system for speech coding according to claim 8, characterized in that the residual
signal vector of the white noise and the vector of the residual signal of the impulse
are weighted by a predetermined coefficient derived from a vector of said input speech
signal and a pitch prediction vector obtained by applying linear prediction to a residual
signal of a preceding frame and that the error is evaluated.
10. A system for speech coding according to claim 9, characterized in that the residual
signal vector of the white noise and the vector of the residual signal of the impulse
are added in a ratio according to an intensity of a pitch correlation obtained by
applying linear prediction to the vector of said input speech signal and the vector
of the residual signal of the preceding frame, reproducing said composite vector,
and evaluating the error between the resultant reproduced signal and the vector of
said input speech signal.
11. A system for speech coding according to claim 10, characterized in that the said
pitch correlation is a function of angle.
12. A system for speech coding according to claim 1, characterized in that the vector
of the residual signal of said impulse is separated from the vector of the residual
signal of the white noise.
13. An apparatus for speech coding characterized by being provided with a pitch frequency
delay circuit giving a delay corresponding to a pitch frequency to a vector of a preceding
residual signal, a first code book storing a plurality of vectors of residual signals
of white noise, an impulse generating circuit generating an impulse having a predetermined
relationship with the vectors of the residual signals of the white noise stored in
the said first code book, linear prediction circuits connected to said pitch frequency
delay circuit, said first code book, and said impulse generating circuit, a variable
gain circuit for giving a variable gain to vectors output from said linear prediction
circuits connected to at least said first code book and said impulse generating circuit,
a first addition circuit for adding the outputs of the said variable gain circuit
and producing a reproduced composite vector, an input speech signal input unit, a
second addition circuit for adding said reproduced composite vector and the vector
of said input speech signal, and an evaluating circuit for evaluating the output of
said second addition circuit and identifying the input speech signal from the vector
of the reproduced signal.
14. An apparatus for speech coding according to claim 13, characterized in that the
said first addition circuit is comprised of a first adder which adds only the outputs
from the said linear prediction circuits connected to said pitch frequency delay circuit
and said first code book and a second adder which adds the output from the linear
prediction circuit connected to the said impulse generating circuit.
15. An apparatus for speech coding according to claim 13, characterized in that said
impulse generating circuit is driven by a main element pulse position detection circuit
which receives as input the output from the said linear prediction circuit connected
to said first code book.
16. An apparatus for speech coding according to claim 15, characterized in that said
main element pulse position detection circuit has a function of extracting a pulse
position giving the smallest phase error between an output vector from said linear
prediction circuit connected to said first code book and a vector obtained by applying
linear prediction to one pulse corresponding to sample times of residual signal vectors
stored in said first code book.
17. An apparatus for speech coding according to claim 13, characterized in that said
impulse generating circuit comprises a second code book storing a plurality of impulses
corresponding to the plurality of residual signal vectors of white noise stored in
said first code book.
18. An apparatus for speech coding according to claim 17, characterized in that the
said second code book stores the orders showing the maximum pulses in the residual
signal vectors of the white noise stored in the said first code book.
19. An apparatus for speech coding according to claim 17, characterized in that the
said impulse generating circuit has an impulse separating circuit which separates
said impulses from the residual signal vectors of the white noise stored in the said
first code book.
20. An apparatus for speech coding according to claim 13, characterized in that in
producing from the outputs of the said first code book and said impulse generating
circuit a reproduced vector through the linear prediction circuit and variable gain
circuit, provision is made of a weighting circuit for controlling said linear prediction
circuit and variable gain circuit and that said weighting circuit is connected to
a pitch correlation calculating circuit which receives as input a pitch prediction
vector obtained by applying linear prediction to a vector of an input speech signal
and a residual signal vector of a preceding frame.