[0001] The present invention relates to a speech synthesizing apparatus for generating synthetic
speech with excellent quality.
[0002] Recently, various kinds of speech synthesizing apparatuses for generating desired
synthetic speech have been developed. For example, as shown in Fig. 1, this kind of
speech synthesizing apparatus usually comprises a vocal-tract-approximation-digital
filter 1 for setting the feature of a speech to be synthesized in accordance with
synthesis parameters; a random noise generator 2; a periodic sound generator 3 for
generating a sound signal at a pitch interval that depends upon the fundamental frequency
of the speech to be synthesized; a selector 4 for selectively supplying output signals
from the generators 2 and 3 to the digital filter depending upon whether the speech
to be synthesized is voiced speech or unvoiced speech; and a control circuit 5 for
respectively supplying synthesis parameters and pitch data to the digital filter 1
and to periodic sound generator 3 in accordance with the speech to be synthesized,
and for supplying sampling pulses to these circuits. The digital filter 1 receives
a linear predictive coefficient, line spectral pair (LSP) parameter, cepstrum, etc.
as synthesis parameters depending upon the speech synthesizing method, e.g., linear
predictive coding, LSP method, cepstrum method, etc. and generates the speech data
corresponding to the speech to be synthesized. In the case where the speech to be
synthesized is what is called periodic voiced speech, the voiced sound signal such
as an impulse, triangular wave, glottal pulse wave, etc. which is generated from the
periodic sound generator 3 at the pitch interval corresponding to the above-mentioned
pitch data is supplied to the digital filter 1 through the selector 4. On the other
hand, in the case where the speech to be synthesized is unvoiced, a random noise from
the random noise generator 2 is supplied to the digital filter 1 through the selector
4.
[0003] The digital filter 1 filters the voiced sound signal or random noise
that is selectively supplied through the selector 4 in accordance with the synthesis
parameters from the control circuit 5, thereby generating the speech data. The synthesis parameter
is updated, for example, at every frame of about 10 msec or at every time interval
synchronized with the pitch period. Since the synthetic speech signal which is generated
from the digital filter 1 is a discrete signal, it is converted to an analog signal
by a D/A converter 6, and thereafter it is supplied to an electric-acoustic converter
7 through a low-pass filter 8 having a cut-off frequency which is less than half of
the frequency of a sampling pulse which is supplied from the control circuit 5.
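The conventional signal path of Fig. 1 (excitation selection followed by the digital filter) can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions only: the function names, the uniform noise source, and the one-coefficient all-pole filter are hypothetical stand-ins and are not taken from the apparatus.

```python
import random

def excitation(voiced, pitch_samples, n, seed=0):
    """Selector 4 (sketch): impulse train if voiced, random noise otherwise.

    pitch_samples is the pitch interval as an integer number of sampling
    periods, which is the restriction of the conventional apparatus.
    """
    rng = random.Random(seed)  # hypothetical noise source
    if voiced:
        return [1.0 if k % pitch_samples == 0 else 0.0 for k in range(n)]
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]

def all_pole_filter(x, a):
    """Digital filter 1 (sketch): y[k] = x[k] - sum_j a[j] * y[k-1-j].

    The coefficients a stand in for the synthesis parameters (assumption).
    """
    y = []
    for k, xk in enumerate(x):
        acc = xk
        for j, aj in enumerate(a):
            if k - 1 - j >= 0:
                acc -= aj * y[k - 1 - j]
        y.append(acc)
    return y

# Voiced frame: impulses every 4 sampling periods, filtered by one pole.
speech = all_pole_filter(excitation(True, 4, 8), [-0.5])
```

In this sketch the impulse spacing can only be a whole number of sampling periods, which is the limitation discussed in the paragraphs that follow.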
[0004] In synthesizing voiced speech, the voiced sound signal is generated from the periodic
sound generator 3 synchronously with the sampling pulse, at an interval substantially
equal to the interval determined by the input pitch data. It is then filtered in the
digital filter 1 synchronously with the sampling pulse. For instance, in the case where
impulses are used as the voiced sound signal, as shown in Fig. 2A, the periodic sound
generator 3 generates impulses spaced by the integer multiple of the sampling period
that most closely approximates the interval determined by the input pitch data. Each
impulse is supplied through the selector 4 to the digital filter 1, where it is filtered,
generating the speech as shown in Fig. 2B.
[0005] Generally, since the pitch of a man's voice is low and its pitch period is
long, a pitch period equal to N times the sampling period can be set with
a relatively high degree of accuracy in accordance with the input pitch data. However,
in the case where the pitch is high, the pitch period is short, and the pitch period
varies frequently, as in the voice of a woman or child, it is difficult
to approximate the pitch period corresponding to the input pitch data by N times
the sampling period. Furthermore, a pitch period that is restricted to N times the
sampling period cannot follow variations in the pitch period smoothly. In such a case,
a vibrato component is mixed into the synthetic speech, causing the sound quality to
deteriorate. In addition, such a discontinuous change of frequency in the synthetic
speech is perceived as an unpleasant sound by listeners.
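The size of this approximation error can be illustrated numerically. The sketch below assumes a sampling frequency of 8 kHz and two example pitch frequencies; these values and the function name are illustrative assumptions and do not appear in the specification.

```python
def quantization_error(pitch_hz, fs_hz=8000.0):
    """Round the pitch period to a whole number of sampling periods.

    Returns (m, realized_hz, relative_error), where m is the integer
    multiple of the sampling period closest to the ideal pitch period.
    """
    ideal = fs_hz / pitch_hz          # pitch period in sampling periods
    m = max(1, round(ideal))          # nearest integer multiple
    realized_hz = fs_hz / m
    return m, realized_hz, abs(realized_hz - pitch_hz) / pitch_hz

low = quantization_error(105.0)    # long pitch period (low voice)
high = quantization_error(410.0)   # short pitch period (high voice)
```

For these assumed values the low voice is rounded to 76 sampling periods and the high voice to 20, and the relative frequency error of the high voice is roughly ten times larger, which is the effect described above.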
[0006] To solve this kind of problem, a method is considered whereby the interval determined
by the input pitch data is set to have a length which is N times the sampling period
with a high degree of approximation by increasing the frequency of the sampling pulse
which is used. However, in this case, in order to impart the proper synthesis parameter
to the digital filter from the control circuit 5 in accordance with each sampling
pulse, it is required to store a great amount of synthesis parameters into a memory
(not shown) in the control circuit 5, causing the readout control of these synthesis
parameters to be complicated.
[0007] It is an object of the present invention to provide a speech synthesizing apparatus
for generating synthetic speech with excellent quality irrespective of the length
and variation of the pitch period.
[0008] This object is accomplished by a speech synthesizing apparatus comprising a control
circuit for generating clock pulses, synthesis parameters and pitch data corresponding
to a speech to be synthesized; a memory which has stored therein a predetermined number
of sampling data corresponding to a predetermined number of sampling values within
a predetermined time range of a continuous wave which is obtained by developing a
voiced sound signal by use of an interpolation function; a readout control circuit
for sequentially reading out the sampling data of the continuous wave within the predetermined
time range that is synchronized with a pitch period represented by the input pitch
data from the memory in response to the clock pulse; a digital filter for filtering-processing
the sampling data read out synchronously with the clock pulse from that memory in
accordance with the synthesis parameters generated from the control circuit; and a
speech generator circuit for generating speech corresponding to the digital data from
this digital filter.
[0009] In this invention, a continuous wave is generated within a predetermined time
range synchronously with the pitch period, and its sampling values at the generation
timings of the clock pulse are sequentially generated. Therefore, irrespective
of the length of the pitch period, a voiced sound signal corresponding to the continuous
wave which is always synchronized with the pitch period is supplied to the digital
filter so that a synthesized speech of excellent quality can be obtained.
[0010] This invention can be more fully understood from the following detailed description
when taken in conjunction with the accompanying drawings, in which:
Fig. 1 is a block diagram showing a conventional speech synthesizing apparatus;
Figs. 2A and 2B are signal waveform diagrams for explaining the operation of the speech
synthesizing apparatus of Fig. 1;
Fig. 3 is a block diagram of a speech synthesizing apparatus of the present invention;
Figs. 4A to 4D are signal waveform diagrams for explaining the operation of the speech
synthesizing apparatus shown in Fig. 3;
Fig. 5 is a flow chart explaining the operation of the voiced sound generator circuit
shown in Fig. 3;
Fig. 6 is a signal waveform diagram explaining the operation of the voiced sound generator
circuit shown in Fig. 3; and
Fig. 7 is a block diagram of the voiced sound generator circuit of a speech synthesizing
apparatus according to another embodiment of the invention.
[0011] Fig. 3 is a block diagram of a speech synthesizing apparatus according to one embodiment
of the present invention. This speech synthesizing apparatus is constituted similarly
to the speech synthesizing apparatus shown in Fig. 1 except that it has a periodic
sound generator circuit 10 in place of the periodic sound generator 3. The periodic
sound generator circuit 10 comprises a central processing unit (CPU) 100; a read only
memory (ROM) 101; a random access memory (RAM) 102; an I/O port 103 which receives
the pitch data from the control circuit 5; and an I/O port 104 which supplies the
voiced sound data to the selector 4.
[0012] The fundamental operation of the periodic sound generator circuit 10 will be explained
hereinbelow. It is now assumed that the pitch data representative of the pitch period
which is equal to m (integer) times the sampling period T is supplied from the control
circuit 5 and that the periodic sound generator circuit 10 generates the voiced sound
data indicative of the impulse x_0(n) at the n-th sampling timing as shown in Fig. 4A
in response to this pitch data. A continuous wave signal x_a(t) which is obtained by
developing this impulse x_0(n) by the interpolation function based on the sampling
theorem is given by the following equation:

[0013] As this interpolation function, it is possible to use, for instance, a Lagrangean
polynomial, a spline function or the like.
[0014] As shown in Fig. 4B, this continuous wave signal x_a(t) is generated over the
time interval from -∞ to +∞; however, the signal component x_b(t) in a predetermined
time range around time 0 of the continuous wave signal x_a(t) can be regarded as
substantially equivalent to the continuous wave signal x_a(t). For instance, when using
a time window ω(t) such as a square window or a Hamming window or the like which becomes
0 in the ranges of t ≤ -2T and t ≥ 2T and becomes 1 in the range of -2T < t < 2T, the
signal component x_b(t) as shown in Fig. 4C is obtained. This signal component x_b(t)
is given by the following equation:

[0015] By sampling this signal component x_b(t) with respect to 4N points as shown in
Fig. 4D in the time range of -2T < t < 0 and in the time range of 0 < t < 2T,
respectively, 4N sampling data SD(-2N) to SD(0) or SD(0) to SD(2N) are given in each
time range. These sampling data SD(i) (i is an integer in a range of -2N < i < 2N)
are given by the following equation:

[0016] Namely, the sampling data SD(-2N)...SD(0)...SD(2N) correspond to the sampling data
at the sampling points -2N...0...2N in Fig. 4D. In Fig. 4D, although N is set to be
5, N can be set to another value.
[0017] The items of sampling data SD(-2N)...SD(0)...SD(2N) are respectively stored in memory
areas M(-2N)...M(0)...M(2N) of the ROM 101. These memory areas M(-2N)...M(0)...M(2N)
are respectively designated by address data A[0]...A[2N]...A[4N].
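The construction of the stored table can be sketched as follows. This is a minimal illustration, assuming a single unit impulse at n = 0, the sin(x)/x interpolation function of the sampling theorem, a square window, and sampling points spaced T/N apart; the function names are hypothetical and time is expressed in units of T.

```python
import math

def sinc_interp(t_over_T):
    """Sampling-theorem interpolation of a unit impulse at n = 0 (assumed form)."""
    if t_over_T == 0:
        return 1.0
    x = math.pi * t_over_T
    return math.sin(x) / x

def build_table(n_sub=5, amplitude=1.0, half_width=2):
    """Return [SD(-2N), ..., SD(0), ..., SD(2N)] as a list indexed by address.

    Address k holds SD(k - 2N), matching A[0]...A[2N]...A[4N].
    """
    table = []
    for i in range(-half_width * n_sub, half_width * n_sub + 1):
        t = i / n_sub                                # sampling point i at t = i*T/N
        w = 1.0 if abs(t) < half_width else 0.0      # square window, 1 for |t| < 2T
        table.append(amplitude * w * sinc_interp(t))
    return table

rom = build_table()   # N = 5 as in Fig. 4D, giving 4N + 1 = 21 entries
```

With N = 5 the entry at address 10 is SD(0), the full impulse amplitude, while the entries at addresses 0, 5, 15 and 20 fall on whole sampling periods and are zero to within rounding.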
[0018] The operation of the periodic sound generator circuit 10 shown in Fig. 3 will now
be explained with reference to the flow chart shown in Fig. 5. It is now assumed that
a pitch period PT of the voiced sound signal that should be synthesized is given by
the following equation:

PT = P1 × T + P2 × (T/N)     (4)

where P1 and P2 are integers and 0 ≦ P2 < N.
[0019] In the initial state, the contents of the memory areas MR1 to MR4 are all cleared.
[0020] When the pitch data indicative of an interval PX is supplied to the I/O port 103
under this state, a check is made synchronously with the sampling pulse to see if
a pitch count data PCD stored in the memory area MR3 is 0 or less in STEP 1. When
it is detected that the pitch count data PCD is 0 or less, deviation time data DT2
stored in the memory area MR2 is then stored in the memory area MR1 as deviation time
data DT1. At the same time, new deviation time data DT2 which is given by the following
equation is stored in the memory area MR2:

where Re{X} is a function representing a value which is equal to the integer portion of X.
Equation (5) denotes that the deviation time data DT2 is the sum of two terms. The
first is the time given by the deviation time data DT1, i.e., the time between the
start of the pitch period in the present cycle and the leading edge of the
sampling pulse generated simultaneously with or immediately after the start of this
pitch period. The second is the time difference between one sampling period T and the
fraction interval (the interval corresponding to P2 × (T/N) in equation (4)) of the
pitch interval expressed by the pitch data given in the next operation cycle. DT2/T
has a value that is equal to or greater than 0 but less than 2.
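The relationship described for equation (5) can be written compactly as follows. This form is inferred only from the verbal description and from the P2 × (T/N) term of equation (4); the equation as originally printed may be arranged differently, for example in terms of Re{X}.

```latex
\mathrm{DT2} \;=\; \mathrm{DT1} + \left(T - P_2 \cdot \frac{T}{N}\right),
\qquad 0 \le \frac{\mathrm{DT2}}{T} < 2
```

For example, with DT1 = 0, N = 5 and P2 = 2, this gives DT2 = T - 0.4T = 0.6T.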
[0021] Thereafter in STEP 2, a check is made to see if this deviation time data DT2 is T
or more. When it is detected that DT2/T is less than 1, namely, in the case where
it is detected that the deviation time data DT2 represents the time difference between
the end of pitch period of the present cycle and the sampling pulse generated simultaneously
with or immediately after this end of pitch period, the pitch count data PCD which
is given by the following equation is written in the memory area MR3:

[0022] On the other hand, in STEP 2, when it is detected that DT2/T is 1 or more, i.e.,
that the deviation time data DT2 indicates the time difference between the end of
the pitch period in the present cycle and the sampling pulse which is thereafter generated
at the second time, the pitch count data PCD which is given by the following equation
is written in the memory area MR3:

[0023] In this case, the value obtained by subtracting one sampling period T from this
deviation time data DT2 is written in the memory area MR2 as new deviation time data. After
the pitch count data PCD has been written in the memory area MR3 in this way, the contents
of the memory area MR4 are cleared. Thereafter, in STEP 3, a check is made
to see if count data CD of the memory area MR4 has a predetermined value Z or less.
This predetermined value Z denotes the amount of sampling data which is read out from
the ROM 101 in each operation cycle and is set to 3 in this embodiment. In STEP 3,
when it is detected that the count data CD has a predetermined value Z or less, one
of the address data A[0]...A[2N]...A[4N] corresponding to the sum of this count data
CD and 1/T of the deviation time data DT1 stored in the memory area MR1 is supplied
to the ROM 101. The corresponding one of the sampling data SD(-2N)...SD(0)...SD(2N) is
read out from this ROM 101 and is supplied to the digital filter 1 through the I/O
port 104 and selector 4. Thereafter, the count data CD in the memory area MR4 is increased
by one count. Further, thereafter, the pitch count data PCD in the memory area MR3
is decreased by one count, and the apparatus stands by until the next sampling pulse
is supplied. On the other hand, in STEP 3, when it is detected that the count data
CD becomes larger than the predetermined value Z, "0" is supplied to the digital filter,
and the pitch count data PCD is decreased by one count without increasing the count
data CD. Then, the apparatus waits for the input of the next sampling pulse. When
the next sampling pulse is supplied in this state, STEP 1 is again executed. When
the pitch count data PCD is determined to be larger than 0 in STEP 1, STEP 2 is executed.
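The per-pulse procedure described above can be sketched as a small simulation. This is a sketch under stated assumptions rather than the flow chart of Fig. 5 itself: all times are held as integers in units of T/N, the deviation update follows the verbal description of equation (5), the pitch count values P1 + 1 (when DT2 < T) and P1 (when DT2 ≥ T) are inferred from the counts quoted in the worked example of this description, and the function and variable names are hypothetical.

```python
def simulate(pitches, n_sub=5, z=3, n_pulses=60):
    """Run the per-sampling-pulse procedure for a list of (P1, P2) pitch data.

    Returns a list of (pulse_index, address) for every pulse at which a
    sampling datum is read out; at all other pulses "0" goes to the filter.
    """
    dt1 = dt2 = 0        # deviation time data, in units of T/N (MR1, MR2)
    pcd = 0              # pitch count data (MR3)
    cd = 0               # count data (MR4)
    source = iter(pitches)
    reads = []
    for k in range(n_pulses):
        if pcd <= 0:                      # STEP 1: start of a new cycle
            try:
                p1, p2 = next(source)
            except StopIteration:
                break                     # no further pitch data supplied
            dt1 = dt2
            dt2 = dt1 + (n_sub - p2)      # DT1 + (T - P2*T/N), inferred form
            if dt2 < n_sub:               # STEP 2: DT2 < T
                pcd = p1 + 1              # inferred from the worked example
            else:                         # DT2 >= T
                pcd = p1
                dt2 -= n_sub              # subtract one sampling period
            cd = 0
        if cd <= z:                       # STEP 3: read one sampling datum
            reads.append((k, n_sub * cd + dt1))   # address N*(CD + DT1/T)
            cd += 1
        pcd -= 1                          # one count per sampling pulse
    return reads

# Pitch periods 25.4T and 25.2T, followed by a hypothetical third value.
trace = simulate([(25, 2), (25, 1), (25, 0)])
```

Under these assumptions the trace contains addresses 0, 5, 10, 15 at pulses 0 to 3, addresses 3, 8, 13, 18 at pulses 26 to 29, and addresses 2, 7, 12, 17 at pulses 51 to 54. The third pitch value (25, 0) is only a placeholder needed to start the third cycle; the addresses of that cycle depend on the deviation carried over from the second cycle, not on it.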
[0024] It is now assumed that the first and second pitch data representative of the pitch
periods PX1 and PX2 of 25.4T (P1 = 25, P2 = 2) and 25.2T (P1 = 25, P2 = 1) are supplied
to the CPU 100 through the I/O port 103 in this sequence.
[0025] The CPU 100 executes STEP 1 in response to the first sampling pulse (time t = 0)
which was input immediately after the first pitch data was received. In this stage,
since the pitch count data PCD stored in the memory area MR3 is 0, the deviation time
data DT2 which is given by the following equation is stored in the memory area MR2:

[0026] Since the deviation time data DT2 of 0.6T stored in the memory area MR2 is smaller
than T, the pitch count data PCD which is given by the following equation is stored
in the memory area MR3:

[0027] Thereafter, the contents of the memory area MR4 are cleared, and it is detected in
STEP 3 that the count data CD is smaller than the predetermined value Z(=3). Due to
this, the address data A[X1] which is given by the following equation is given to
the ROM 101:

[0028] Due to this, the sampling data SD(-2N) is read out from the memory area M(-2N) of the
ROM 101 and is supplied to the digital filter 1 through the I/O port 104 and selector
4. Next, the content of the memory area MR4 is changed from 0 to 1, and the content
of the memory area MR3 is changed from 26 to 25.
[0029] The CPU 100 executes STEP 1 in response to the next sampling pulse. Since the pitch
count data PCD is now 25, STEP 2 is executed. Since the count data CD is 1, the address
data A[X2] is given by the following equation:

[0030] Thereafter, the count data CD of the memory area MR4 is increased by one count, while
the pitch count data PCD of the memory area MR3 is decreased by one count.
[0031] After that, a similar operation is repeatedly executed, so that address data A[10]
and A[15] are supplied to the ROM 101 and the corresponding sampling data are read
out from the ROM 101. After the sampling data designated by the address data A[15] has
been read out, the count data CD becomes 4 and the pitch count data PCD becomes 22. In
the following cycle, it is detected in STEP 3 that the count data CD is larger than 3.
Therefore, 0 is supplied to the digital filter 1, and the pitch count data PCD is
decreased by one count whenever a sampling pulse is supplied. This operation is
executed until the pitch count data PCD becomes 0.
[0032] As described above, after the first pitch data was input, the address data A[0],
A[5], A[10], and A[15] are sequentially given to the ROM 101 in response to the
first four sampling pulses, and then the sampling data SD(-10), SD(-5), SD(0), and
SD(5) are sequentially read out from the ROM 101 and are given to the digital filter
1. As shown in Fig. 6, the items of sampling data SD(-10), SD(-5) and SD(5) are
0, while the sampling data SD(0) corresponds to an impulse of a predetermined amplitude
x_0.
[0033] The pitch count data PCD becomes 0 after the 26th sampling pulse is generated at time
25T. When it is detected in STEP 1 that the pitch count data PCD is 0 or less in response
to the sampling pulse generated at time 26T, the CPU 100 transfers the contents of
the memory area MR2 to the memory area MR1, and the contents DT2 of the memory area
MR2 are updated in accordance with the following equation:

[0034] Since the deviation time data DT2 is larger than T, the deviation time data of 0.4T
is written in the memory area MR2. The pitch count data PCD which is given by the
following equation is stored in the memory area MR3.

[0035] Thereafter, the address data A[X26] is obtained in accordance with the following
equation in a similar manner as mentioned above:

[0036] Thus, the sampling data SD(-2N + 3) is read out from the memory area M(-2N + 3) of
the ROM 101.
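With N = 5 as in Fig. 4D, the values quoted for this second cycle can be checked by hand. The expressions below are a reconstruction from the verbal description (the deviation update, and an address equal to N times the sum of CD and DT1/T); they are not a copy of the original equations.

```latex
\begin{aligned}
\mathrm{DT1} &= 0.6T \\
\mathrm{DT2} &= 0.6T + (T - 0.2T) = 1.4T \;\ge\; T
  \;\Rightarrow\; \mathrm{DT2} \leftarrow 1.4T - T = 0.4T \\
X_{26} &= N\left(\mathrm{CD} + \frac{\mathrm{DT1}}{T}\right) = 5\,(0 + 0.6) = 3
  \;\Rightarrow\; A[3] \text{ designates } \mathrm{SD}(-2N + 3)
\end{aligned}
```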
[0037] In a similar manner, the address data A[8], A[13] and A[18] are sequentially generated
whenever the sampling pulses are supplied at times 27T to 29T. In response to the
address data A[3], A[8], A[13], and A[18], the four items of sampling data corresponding
to the impulses at the sampling points -7, -2, 3, and 8 (where N = 5) in Fig. 4D
are read out from the ROM 101. Namely, the sampling data corresponding to the four
impulses that are generated at times 26T to 29T shown in Fig. 6 is supplied to the
digital filter 1.
[0038] A similar operation is executed whenever new pitch data is supplied. For instance,
in the next operation cycle, as shown at the right of Fig. 6, the four items of sampling
data corresponding to the impulses at the sampling points -8, -3, 2, and 7 in Fig.
4D are read out from the ROM 101 at the sampling timings 51T to 54T.
[0039] The synthesized impulse of the four impulses, corresponding to the four items of
sampling data that are generated at the sampling timings 26T to 29T in Fig. 6, corresponds
to the impulse having the amplitude of x_0, as indicated by the broken line in the
diagram. The impulse is generated at time 27.4T, that is, at a time after the interval
of 25.4T from the sampling timing 2T at which the impulse having the amplitude of x_0
was generated in the first cycle. On the other hand, the synthesized impulse of the
four impulses corresponding to the four items of sampling data which are generated
at the sampling timings 51T to 54T corresponds to the impulse having the amplitude
of x_0 which is generated at time 52.6T, that is, at the time after the elapse of the
interval of 25.2T from the time 27.4T.
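The timing quoted above can be verified with exact rational arithmetic. The sketch below assumes N = 5 and expresses all times in units of T; the helper names are hypothetical.

```python
from fractions import Fraction

def impulse_centres(pitches, n_sub=5, first_centre=Fraction(2)):
    """Accumulate pitch periods P1 + P2/N onto the first impulse time (2T)."""
    centres = [first_centre]
    for p1, p2 in pitches:
        centres.append(centres[-1] + p1 + Fraction(p2, n_sub))
    return centres

def table_points(sample_times, centre, n_sub=5):
    """Offset of each sampling timing from the impulse centre, in steps of T/N."""
    return [(k - centre) * n_sub for k in sample_times]

centres = impulse_centres([(25, 2), (25, 1)])           # 2T, 27.4T, 52.6T
second = table_points(range(26, 30), centres[1])        # timings 26T to 29T
third = table_points(range(51, 55), centres[2])         # timings 51T to 54T
```

The offsets come out as -7, -2, 3, 8 for the second cycle and -8, -3, 2, 7 for the third, which are the sampling points of Fig. 4D named in the description.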
[0040] As described above, in this embodiment, one or a predetermined number of impulses
which are equivalent to the voiced speech signal that is generated synchronously with
the pitch interval determined by the input pitch data are generated synchronously
with the sampling pulses. Therefore, even if the pitch period is set to be short,
or even if the pitch period rapidly changes, synthetic speech of relatively excellent
quality can be generated.
[0041] Fig. 7 shows a block diagram of a periodic sound generator circuit of a speech synthesizing
apparatus according to another embodiment of the present invention. This periodic
sound generator circuit comprises registers 110 to 112 for respectively storing the
pitch data PX and deviation time data DT1 and DT2; a divider 113 for dividing the
pitch data PX by the sampling period T; a register 114 for storing the output data
of the divider 113; an integer detector 115 for detecting the integer part of the
output data of the register 114; a DT2 calculation circuit 116 for calculating a new
deviation time data DT2 on the basis of the output data of the registers 110 and 111;
a divider 117 for dividing the output data of the calculation circuit 116 by T; a
fraction detector 118 for detecting the fraction part of the output data of the divider
117; and a comparator 119 for comparing the output data of the divider 117 with a
constant 1. The comparator 119 generates output data "1" when it is detected that
the output data of the divider 117 is equal to or larger than 1. On the other hand,
the comparator 119 generates output data "0" when this output data is detected
to be smaller than 1. A subtracter 120 subtracts the output data of the comparator
119 from the output data generated from the integer detector 115 and sets the result
into a PCD down-counter 121. This PCD down-counter 121 executes the down-counting
operation in response to the sampling pulse. When the contents of the down-counter
121 become 0, it sets the CD up-counter 122 to 0. The CD up-counter 122 executes the
up-counting operation in response to the sampling pulse. A comparator 123 checks the
output data of the up-counter 122 to see if it is three or less. When it is detected
that the output data is three or less, the comparator 123 generates an output pulse.
An address calculation circuit 124 calculates the address data on the basis of the
output data of the register 111 and of the up-counter 122 in response to the output
pulse from the comparator 123, supplies this calculated address data to a memory 125,
and reads out the corresponding sampling data from the memory 125. The memory 125
stores the sampling data SD(-2N)... SD(0)...SD(2N) as in the ROM 101 shown in Fig.
3.
[0042] When the pitch data representative of the pitch period PX (= P1 × T + P2 × T/N) is
input during the initial state, the data (P1 + P2/N) obtained by dividing
the pitch data by T in the divider 113 is stored in the register 114. Therefore, output
data of P1 is generated from the integer detector 115. On the other hand, the DT2
calculation circuit 116 calculates the new deviation time data DT2 in accordance with
equation (5). The output data of the calculation circuit 116 is divided by T in the
divider 117. Thereafter, it is supplied to the fraction detector 118 and comparator
119. After the output data of the fraction detector 118 is multiplied by T in
a multiplier 126, it is stored in the DT2 register 112. In addition, since the output
data of the divider 117 is less than 1 in the first cycle, output data of 0
is generated from the comparator 119. Thus, the output data of P1 of the integer detector
115 is directly set into the PCD down-counter 121 in the first operation cycle. Although
not shown, in the case where the contents of the PCD down-counter 121 are one or more,
the down-counter 121 supplies inhibition signals to the registers 110 to 112, calculation
circuit 116 and subtracter 120, thereby inhibiting the operations thereof.
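The datapath of Fig. 7 for one operation cycle can be sketched in software as follows. This is a sketch under assumptions: quantities are expressed in units of T as exact fractions, the deviation update uses the form inferred from the description of equation (5), and the function name is hypothetical. The comments map each line to the circuit element it stands in for.

```python
from fractions import Fraction
import math

def fig7_cycle(px_over_T, dt1_over_T):
    """Return (value set into PCD down-counter 121, new DT2/T for register 112)."""
    p1 = math.floor(px_over_T)              # divider 113 + integer detector 115
    frac = px_over_T - p1                   # fractional part P2/N of the pitch
    dt2_raw = dt1_over_T + 1 - frac         # calculation circuit 116 + divider 117
    carry = 1 if dt2_raw >= 1 else 0        # comparator 119 (compare with 1)
    dt2 = dt2_raw - math.floor(dt2_raw)     # fraction detector 118
    pcd = p1 - carry                        # subtracter 120
    return pcd, dt2

first = fig7_cycle(Fraction(127, 5), Fraction(0))        # PX = 25.4T, DT1 = 0
second = fig7_cycle(Fraction(126, 5), first[1])          # PX = 25.2T
```

In the first cycle the comparator output is 0 and P1 = 25 is set directly into the down-counter, as stated above; in the second cycle the sum reaches 1.4, so the comparator output is 1 and the stored deviation is 0.4T.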
[0043] Since the contents of the DT1 register 111 and counter 122 are both 0, the address
data A[0] is generated in the address calculation circuit 124 in accordance with the
following equation.

[0044] The corresponding sampling data is read out from the memory 125 in accordance with
this address data A[0]. In a similar manner, whenever the sampling pulse is input, the
address data A[5], A[10] and A[15] are generated from the address calculation circuit
124, while the corresponding sampling data is read out from the memory 125. In response
to the next
sampling pulse, the contents of the CD up-counter 122 become 4, so that a "0" output
signal is generated from the comparator 123. Thus, the address calculation circuit
124 does not execute the calculation.
[0045] On the other hand, the PCD down-counter 121 executes the down-counting operation
until the contents thereof become 0 in response to the sampling pulse. When the contents
of the down-counter 121 become 0, the CD up-counter 122 is set to 0, and at the same
time the registers 110 to 112, calculation circuit 116 and subtracter 120 are set
into the operational states. Therefore, the pitch data is stored in the PX register
110 in response to the next sampling pulse, and at the same time the deviation time
data from the DT2 register 112 is stored in the DT1 register 111.
[0046] In a similar manner, the circuit shown in Fig. 7 generates the voiced sound signal
corresponding to the input pitch data in response to the sampling pulse.
[0047] Although the present invention has been described with respect to the embodiments,
the invention is not limited to these embodiments. For example, in the flow chart
shown in Fig. 5, although the determination regarding the generation of the address
data is made by checking the count data CD to see if it is three or less in STEP 3,
it is also possible to check whether or not the following condition is satisfied in
STEP 3:
[0048] 
[0049] Further, it is possible to use, as the time window ω(t), a time window which is at
a level of 1 in the range from -T to T. Other memory areas M(-N)...M(0)...M(N)
for storing the sampling data SD(-N)...SD(0)...SD(N) may also be provided in the ROM 101.