(11) EP 0 146 470 B1
(12) EUROPEAN PATENT SPECIFICATION
(45) Mention of the grant of the patent: 21.02.1990 Bulletin 1990/08
(22) Date of filing: 11.12.1984
(51) International Patent Classification (IPC)5: G06F 3/16
(54) A text to speech system
Text-zu-Sprache-Übersetzungssystem
Système de conversion texte/parole
(84) Designated Contracting States: AT BE CH DE FR GB IT LI NL SE
(30) Priority: 12.12.1983 US 560221
(43) Date of publication of application: 26.06.1985 Bulletin 1985/26
(73) Proprietor: DIGITAL EQUIPMENT CORPORATION
Maynard, MA 01754 (US)
(72) Inventor:
- Klatt, Dennis Howard
Brookline
MA 02146 (US)
(74) Representative: Mongrédien, André et al
Les Séquoias
34, rue de Marnes
F-92410 Ville d'Avray (FR)
(56) References cited:
- ELECTRONICS INTERNATIONAL, vol. 56, no. 8, April 1983, pages 133-138, New York, US;
E. BRUCKERT et al.: "Three-tiered software and VLSI aid developmental system to read
text aloud"
- JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, vol. 67, no. 3, March 1980, pages 971-995,
Acoustical Society of America, New York, US; D.H. KLATT: "Software for a cascade/parallel
formant synthesizer"
- ELECTRONICS INTERNATIONAL, vol. 53, no. 8, April 1980, pages 113-118, New York, US;
D.W. WEINRICH: "Speech-synthesis chip borrows human intonation"
- PROCEEDINGS OF THE ICASSP'82, vol. 3 of 3, 3rd-5th May 1982, Paris, FR, pages 1589-1592,
IEEE, New York, US; D.H. KLATT: "The klattalk text-to-speech conversion system"
Note: Within nine months from the publication of the mention of the grant of the European
patent, any person may give notice to the European Patent Office of opposition to
the European patent granted. Notice of opposition shall be filed in a written reasoned
statement. It shall not be deemed to have been filed until the opposition fee has been
paid. (Art. 99(1) European Patent Convention).
Background
[0001] There are many text to speech devices in the prior art. As can be verified in the
literature related to the prior art, it has been generally accepted that, since the
energy of typical human speech is distributed over a frequency spectrum of 5,000 hertz,
a sampling rate of 10,000 samples per second (twice the upper frequency of
the accepted human speech spectrum) provides sufficient points, or ordinate
lengths, to generate an accurate analog waveform representing a spoken version of
the text. Such sampling does provide an analog waveform representing the spoken
version of the text, but if the imitated speaker is a female with a relatively high
pitched voice, the imitation speech generated by prior art devices is of poor
quality. This is particularly the case in the system disclosed in the publication
"Electronics International", vol. 56, no. 8, April 1983, pages 133 to 138, by E. Bruckert
et al., "Three-tiered software and VLSI aid developmental system to read text aloud".
[0002] It is well understood in the speech simulation art that the sounds which are developed
by opening and closing human vocal chords (called voiced sounds, as compared to aspiration
sounds and frication sounds) have a fundamental frequency in the range of 50 cps to
400 cps. The speech of a typical female having a somewhat high pitched voice, in all
probability, emanates, at least in part, from vocal chords opening and closing with
a frequency somewhere between 160 cps and 400 cps. In considering the simulation of
female speech, I have found that if a digitized glottal waveform, which is to be ultimately
transformed into an analog signal, is sampled at the traditional rate of 10,000 samples
per second and that waveform
has been developed to provide a major component in an imitation of female speech,
the resulting female speech is of poor quality. I have further found that if the
digitized glottal waveform is generated so as to provide enough information (temporal
accuracy in specification of fundamental frequency) to provide 40,000 samples per
second, such a waveform provides the basis for improving the quality of the female
speech being generated. Since the digital signal processor used to generate the digitized
glottal waveform is limited in its ability to perform digital filtering at sample
rates above 10,000 samples per second, the digitized glottal waveform (having sufficient
information to provide 40,000 samples per second) must be down sampled to the rate
of 10,000 samples per second. In order to preserve some of the advantages of increased
information, the present system low pass filters the waveform to remove high frequency
signal components and to provide a desirable averaging operation before sampling at
the lower rate. Accordingly, the system provides the resulting waveform at 10,000
samples per second to be combined by software with waveforms from other sound sources.
The down sampled waveform nonetheless has been the basis for very much improved quality
of the generated female speech and slightly improved quality of the male speech.
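The phrase "temporal accuracy in specification of fundamental frequency" can be illustrated with a little arithmetic. The sketch below assumes, purely for illustration, that a pitch period must span a whole number of samples; the function name nearest_f0_values is hypothetical and does not come from the specification.

```python
def nearest_f0_values(target_hz: float, sample_rate: float) -> tuple[float, float]:
    """Return the two realizable fundamental frequencies that bracket target_hz
    when the pitch period must be a whole number of samples (an assumption)."""
    period = int(sample_rate // target_hz)  # whole samples per period, rounded down
    return sample_rate / (period + 1), sample_rate / period


for rate in (10_000, 40_000):
    low, high = nearest_f0_values(400.0, rate)
    print(f"{rate} samples/s: {low:.2f} Hz to {high:.2f} Hz, step {high - low:.2f} Hz")
```

Under that assumption the smallest pitch step near 400 cps is roughly 15 cps at 10,000 samples per second and roughly 4 cps at 40,000 samples per second.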
Summary
[0003] The present system includes a microprocessor which is adapted to receive ASCII signals
either from a main computer through a UART or the like, or from a local console. The
microprocessor is programmed in accordance with Hunnicutt rules, whereby the ASCII
signals representing text expressions are transformed into phonemic sequences. The
microprocessor is programmed to generate, in a preferred embodiment, some 18 parameters.
The parameters, in a preferred embodiment, are 16 bits in length, are computed
every 6.4 milliseconds, and represent such speech qualities as voicing source amplitude,
nasal zero frequency, first formant frequency, etc. The parameter values are generated
through a program which takes into account the arrangement of the phonemes and the
phonemes per se. The parameter values are then transmitted to a high speed digital
signal processor. In the high speed digital signal processor a set of equations is
disposed in memory and a program is stored by which the parameter values control the
additions and multiplications required to realize the signal transformations implied
by the equations. The simulation of the equations provides a digitized glottal waveform,
i.e. a model of a glottal pulse. Because the parameters are generated with factors
that represent a vocal chord operation, to a high degree of temporal accuracy, and
because of other factors commensurate with that consideration, the digitized glottal
waveform, generated in the high speed digital signal processor, includes sufficient
information to provide 40,000 samples per second. It should be understood that any
sampling rate greater than 20,000 samples per second will give improved results, but
I have found 40,000 samples per second to provide excellent results. However, since
the maximum sampling rate available for resonance filtering (within reasonable cost
constraints) is on the order of 10,000 samples per second, the system employs
two accommodating steps. First, the digitized glottal waveform is subjected to a programmed
low pass filtering operation. In a preferred embodiment such a low pass filtering
operation removes signal components which exceed a frequency of 5,000 hertz. This
of course reduces the information to be retained but removes information which the
system does not want to represent. In addition, I have found that the low pass filtering
operation provides a certain amount of desirable averaging of available point, or
ordinate, values. Second, after the digitized glottal waveform has been low pass filtered,
the signals are down sampled to a rate of 10,000 samples per second. It should be
understood that if proper equipment is employed, roughly the same results can be attained
at down sampling rates in the range of 6800 to 15,000 samples per second. I have found
that even though the sampling rate is the same as the traditional rate, the fact that
the originally developed digitized glottal waveform provided temporal accuracy in
accordance with 40,000 samples per second enables the ultimate digitized glottal waveform
to be combined with other sound source digitized waveforms and transformed into an
improved analog waveform (that retains this temporal precision) and hence an improved
speech experience.
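The frame timing stated in the summary implies some simple bookkeeping, sketched below. The constant and function names are hypothetical; only the figures (18 parameters, 16 bits, 6.4 milliseconds, and the two sample rates) come from the text.

```python
FRAME_SECONDS = 0.0064   # parameters are recomputed every 6.4 ms
PARAMS_PER_FRAME = 18    # "some 18 parameters" in the preferred embodiment
BITS_PER_PARAM = 16      # each parameter is 16 bits long


def samples_per_frame(sample_rate: int) -> int:
    """Number of waveform samples produced during one parameter frame."""
    return round(FRAME_SECONDS * sample_rate)


def parameter_bit_rate() -> float:
    """Bits per second flowing from the microprocessor to the signal processor."""
    return PARAMS_PER_FRAME * BITS_PER_PARAM / FRAME_SECONDS


print(samples_per_frame(10_000), samples_per_frame(40_000))
print(round(parameter_bit_rate()))
```

One frame therefore covers 64 samples at 10,000 samples per second and 256 samples at 40,000 samples per second, while the parameter stream itself amounts to about 45,000 bits per second.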
[0004] The invention is mainly directed to a digital processor for use in the synthesis
of speech from digitized text as defined in Claim 1.
[0005] The invention is also directed to a system having a digital processor for transforming
digital signals representing text into an analog waveform signal representing speech
as defined in Claim 7.
[0006] The objects and features of the present invention will be better understood in view
of the following description taken in conjunction with the drawing.
[0007] There are many text to speech devices and publications related thereto. For instance,
my publication "Software For A Cascade/Parallel Formant Synthesizer", published March
1980 by the Acoustical Society of America; my publication "A Text to Speech Conversion
System", published in the Proceedings Office Automation Conference, March 1981;
my publication "Review of the Science and Technology of Speech Synthesis", published
by National Academy Press in 1982; the aforementioned Bruckert et al. publication;
and in particular all the publications and bibliographies referred to therein provide
a broad review of the text to speech conversion art and many of the concepts with
which I deal in this description.
[0008] It is well understood in the speech analysis art that sounds created by the opening
and closing of human vocal chords are sounds which have a fundamental frequency in
the range of 50 cps to 400 cps. Indeed the opening and closing of the vocal chords
may operate at frequencies outside of that range, but in general the frequency range
of 50 cps to 400 cps is considered appropriate. In the prior art speech simulation
devices, a great deal of effort has been spent in developing hardware and software
to build quality into the end product, namely the speech imitation. We have developed
difference equations by which we can model the vocal tract; and we have developed
software and hardware by which we can separately simulate different sound sources such
as voicing, aspiration, and frication. However, through all of this effort little attention
has been paid to the problem of the quality of simulated female speech as compared
with the quality of simulated male speech and possibly the quality of all speech sources
in between.
[0009] It is generally accepted that the vocal chords of a female open and close with a
frequency in the range of 160 cps to 400 cps. Accordingly, if we develop a digitized
glottal waveform having a traditional sampling rate of 10,000 samples per second,
we find that at 400 cps we have only approximately twenty five information samples per
period between
vocal chord closings. Twenty five samples is insufficient to include certain features
of female speech which when present provide a good quality imitation. Accordingly
in my present system, I have increased the information available, which in turn includes
the heretofore absent features. While I have continued with the traditional sampling
rate to provide imitation speech, the imitated speech from my system shows improved
quality in the case of female speech and some improvement in the imitation of the
male speech. It is to those improvements that my present invention is directed.
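The figure of twenty five samples follows from dividing the sampling rate by the fundamental frequency. A short sketch, using a hypothetical helper name, tabulates the samples available per glottal period at both ends of the stated female range:

```python
def samples_per_period(f0_hz: float, sample_rate: float) -> float:
    """Waveform samples available between successive vocal chord closings."""
    return sample_rate / f0_hz


for rate in (10_000, 40_000):
    for f0 in (160, 400):
        print(f"{rate} samples/s at {f0} cps: {samples_per_period(f0, rate):g} samples")
```

At 400 cps the count rises from 25 samples per period to 100 when the waveform is generated at 40,000 samples per second.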
[0010] Consider the drawing. In the drawing there is shown a microprocessor 11, into which
there are fed ASCII coded alphabetical letters. In a preferred embodiment, (a system
called DECtalk, produced by Digital Equipment Corporation), the text (which may be
displayed on a CRT) is transformed into speech. As can be seen in the drawing, ASCII
signals are transmitted over channel 13 to the microprocessor 11 and thereat the ASCII
coded signals are operated on by a stored program means 15 in the microprocessor 11.
In a preferred embodiment, the microprocessor 11 is a model 68000 manufactured by
Motorola Corporation. The stored program means 15 includes a set of values generated
in accordance with the Hunnicutt rules, the details of which are not available because
such a program is licensed under an agreement of confidentiality. However, the program
is available under license to the public from the Hunnicutt company. The results and
the use of the results are well understood in the speech analysis art and the program
is not per se basic to this invention. Other programs which transform coded text letters
into phonemic expressions can be used.
[0011] The microprocessor 11 is further programmed by program means 17 to use the phonemic
expressions in the generation of a plurality of parameters. The parameter values are
composed of 16 bits and in their generation there is taken into account the peculiarities
of phonemes and the relationship of the phonemes with respect to one another. Rules
for generating the parameters can be found in "Speech Synthesis by Rule", by J. N.
Holmes, I. Mattingly and J. Shearme, published in Language and Speech, Vol. 7, (1964).
The parameters can vary from one embodiment to another, depending upon the detail
to which the published rules are followed or the altering thereof in view of empirical
considerations. The generation of the parameters is not basic to the present invention;
the use of the rules is well understood by those skilled in the art; and in view
of the publications mentioned above, no further discussion is deemed necessary.
[0012] In a preferred embodiment the parameters are generated every 6.4 ms and transmitted
from the microprocessor 11 to the FIFO memory 19. The FIFO memory 19 isolates the
high speed digital signal processor 21 from the relatively slower microprocessor 11.
In a preferred embodiment the FIFO memory is a model 74LS224 manufactured by Texas
Instruments Corp. It should be understood that other forms of memory, or isolation
circuitry, could be used.
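The isolating role of the FIFO memory 19 can be pictured with a small software analogue. This is only a sketch of first-in, first-out buffering between a slow producer and a fast consumer; the class ParameterFifo, its capacity, and its methods are hypothetical and say nothing about how the actual memory device behaves.

```python
from collections import deque


class ParameterFifo:
    """Bounded first-in, first-out buffer of parameter frames (illustrative only)."""

    def __init__(self, capacity: int) -> None:
        self._frames = deque()
        self._capacity = capacity

    def push(self, frame: list[int]) -> bool:
        """Producer side: store a frame, or report False when the buffer is full."""
        if len(self._frames) >= self._capacity:
            return False
        self._frames.append(frame)
        return True

    def pop(self):
        """Consumer side: return the oldest frame, or None when the buffer is empty."""
        return self._frames.popleft() if self._frames else None


fifo = ParameterFifo(capacity=2)
fifo.push([100] * 18)   # one frame of 18 parameter values
fifo.push([200] * 18)
print(fifo.pop()[0], fifo.pop()[0], fifo.pop())
```

The producer never needs to wait for the consumer to finish a frame; it only has to keep the buffer from running empty or overflowing.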
[0013] The parameter expressions are transmitted to the high speed digital signal processor
21, whereat they are used through the program means 20 to control additions and multiplications
in accordance with programmed simulations of certain difference equations. In a preferred
embodiment the DSP 21 is a model 32010 manufactured by Texas Instruments Corporation.
It is understood in the art that certain difference equations can be simulated to
provide a model of the vocal tract. Those equations and the programmed routines to
compute a relationship for those equations are described in my publication "Software
For A Cascade/Parallel Formant Synthesizer" published in March, 1980, by the Acoustical
Society of America. Since the difference equations, per se, are not fundamental to
the present invention, no further discussion thereof is deemed necessary. The output
of the equation simulation program is well understood and it should be recognized
that it represents a digitized glottal waveform representative of some text. I have
determined that the digitized glottal waveform generated, (under the conditions of
the present discussion), should be generated to provide enough information, i.e. enough
16 bit samples, to enable a sampling rate of 40,000 samples per second. I have determined
that 40,000 ordinate values, or 40,000 points, can provide an analog waveform signal
which includes speech features heretofore not generated in imitating human speech.
The foregoing is particularly true where the voiced sounds are generated from vocal
chord operations at the high end of vocal chord frequency range; such human speech
being that which is typically identified with a female. However, since the high speed
digital signal processor 21 is limited in its total computation power to sampling
and digital filtering at only 10,000 samples per second (and I know of no better sampling
rate by any equipment at comparable cost), the digitized glottal waveform with increased
information must be sampled at a slower rate, i.e. downsampled.
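The specification does not reproduce its difference equations. As an illustration of how parameter values can "control additions and multiplications", the sketch below uses the second-order digital resonator commonly given in the formant-synthesis literature, y[n] = A x[n] + B y[n-1] + C y[n-2]. Treating this as the form used in the present system is an assumption, and the function names are hypothetical.

```python
import math


def resonator_coefficients(freq_hz: float, bandwidth_hz: float, sample_rate: float):
    """Coefficients A, B, C of a two-pole resonator with unity gain at 0 Hz."""
    t = 1.0 / sample_rate
    c = -math.exp(-2.0 * math.pi * bandwidth_hz * t)
    b = 2.0 * math.exp(-math.pi * bandwidth_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c
    return a, b, c


def resonate(samples, a: float, b: float, c: float):
    """Apply y[n] = a*x[n] + b*y[n-1] + c*y[n-2] with zero initial state."""
    y1 = y2 = 0.0
    out = []
    for x in samples:
        y = a * x + b * y1 + c * y2   # two multiplications by past outputs, one by the input
        out.append(y)
        y2, y1 = y1, y
    return out


a, b, c = resonator_coefficients(freq_hz=500.0, bandwidth_hz=60.0, sample_rate=10_000.0)
step_response = resonate([1.0] * 2000, a, b, c)
print(round(step_response[-1], 6))
```

Here a frequency parameter and a bandwidth parameter are converted into three coefficients once per frame, after which each output sample costs only three multiplications and two additions.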
[0014] Steps are taken to maintain the advantages of the increased information, i.e. the
increased number of plottable points, or ordinate values, while nonetheless sampling
at the traditional and slower sample rate. The system provides a second program means,
23, in the digital signal processor, which effects a low pass filter operation on
the digitized glottal waveform. The rules and program steps necessary to effect a low
pass filter operation in software are found in the publication "Digital Signal Processing"
by Oppenheim and Schafer, published by Prentice Hall, 1975. The technique of low pass
filtering a digitized waveform is well understood in the art and is not, per se, basic
to the present invention. Accordingly no further detailed discussion of the programmed
low pass filter operation is deemed necessary. When the digitized glottal waveform
has been subjected to the low pass filtering operation, there results a digitized
glottal waveform which has had certain signal components, whose frequencies exceed
a certain threshold, removed. In a preferred embodiment signal components whose frequencies
exceed 5000 hertz are removed. Other thresholds could be used.
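One textbook way to realize such a programmed low pass filter is a windowed-sinc FIR design. The sketch below is an assumption about a possible realization, not the filter actually used in the preferred embodiment; the tap count, the Hamming window, and all function names are illustrative choices.

```python
import cmath
import math


def lowpass_taps(cutoff_hz: float, sample_rate: float, num_taps: int = 31):
    """Hamming-windowed sinc low pass filter, normalized to unity gain at 0 Hz."""
    fc = cutoff_hz / sample_rate              # cutoff in cycles per sample
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        k = n - mid
        ideal = 2 * fc if k == 0 else math.sin(2 * math.pi * fc * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]


def fir_filter(samples, taps):
    """Direct-form convolution; samples before the start are treated as zero."""
    out = []
    for i in range(len(samples)):
        acc = 0.0
        for j, tap in enumerate(taps):
            if i - j >= 0:
                acc += tap * samples[i - j]
        out.append(acc)
    return out


def gain_at(freq_hz: float, sample_rate: float, taps) -> float:
    """Magnitude of the filter's frequency response at freq_hz."""
    w = 2 * math.pi * freq_hz / sample_rate
    return abs(sum(t * cmath.exp(-1j * w * n) for n, t in enumerate(taps)))


taps = lowpass_taps(cutoff_hz=5_000, sample_rate=40_000)
print(f"gain at 500 Hz: {gain_at(500, 40_000, taps):.3f}")
print(f"gain at 15000 Hz: {gain_at(15_000, 40_000, taps):.5f}")
```

A filter of this kind passes the low-frequency content nearly unchanged and strongly attenuates components well above the 5000 hertz threshold, with a gradual transition around the cutoff.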
[0015] I have discovered that this low pass filtering operation performs certain averaging
functions and such averaging has proven to be useful in the end product. It should
also be noted that since the low pass filtering operation has removed certain signal
components, the amount of information retained has been reduced but the information
removed is information that the system does not want to represent. Hence the value
of the information remaining in the makeup of the digitized glottal waveform is enhanced.
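The benefit of averaging before discarding samples can be seen in a toy case. The sketch below substitutes a crude four-point moving average for the programmed low pass filter (an assumption made only to keep the example short) and places a single pulse at a position that plain selection of every fourth sample would miss.

```python
def moving_average(samples, width: int = 4):
    """Average each sample with the width - 1 samples before it (zeros before the start)."""
    out = []
    for i in range(len(samples)):
        window = samples[max(0, i - width + 1): i + 1]
        out.append(sum(window) / width)
    return out


pulse = [0.0] * 16
pulse[5] = 1.0                              # event falls between retained sample positions

without_averaging = pulse[::4]              # keeps indices 0, 4, 8, 12
with_averaging = moving_average(pulse)[::4]

print(without_averaging)
print(with_averaging)
```

Without averaging the pulse vanishes entirely from the lower-rate signal; with averaging, a scaled trace of it survives in the retained samples.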
[0016] The system next provides a third program means 25, which includes a program to select
and transmit every fourth sample, i.e. every fourth group of 16 bits. The digitized
glottal waveform is combined, in the waveform generator program means 20, with other
sound source waveforms. Thereafter, under the guidance of a fifth program 28, the combined
signal is digitally resonance filtered to add peaks to the combined signal. Finally
the combined digitized waveform is transmitted to the digital to analog converter
27. In a preferred embodiment the digital to analog converter is a model AD7541 manufactured
by the Analog Devices Corporation.
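The selection of every fourth sample and the combination with other sound sources reduce to two very small operations. In the sketch below the function names, the choice to start at the first sample, and the sample values are all hypothetical.

```python
def downsample(samples, factor: int = 4):
    """Keep every factor-th sample, starting with the first (an assumed phase)."""
    return samples[::factor]


def mix(*sources):
    """Sum several equally long sample sequences point by point."""
    return [sum(values) for values in zip(*sources)]


glottal_40k = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110]
glottal_10k = downsample(glottal_40k)        # 40,000 -> 10,000 samples per second
noise_10k = [1, -1, 1]                       # stand-in for an aspiration or frication source

print(glottal_10k)
print(mix(glottal_10k, noise_10k))
```

Because the other sound sources are already produced at the lower rate, the combination can happen only after the glottal waveform has been reduced to that rate.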
[0017] In the digital to analog converter 27, the combined waveform is transformed into
an analog waveform signal. It is well understood that in the transformation of digital
signals to analog signals alias signals are always generated. The system employs an
anti-aliasing filter 29 to remove alias signals, i.e. signals with frequencies in
excess of 5000 hertz. The use of anti-aliasing devices is well understood and no further
discussion is necessary.
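The unwanted components produced by the conversion sit around multiples of the sampling rate. A brief sketch, with a hypothetical function name, lists where they fall for a given tone:

```python
def image_frequencies(tone_hz: float, sample_rate: float, count: int = 2):
    """Frequencies k*sample_rate - tone and k*sample_rate + tone for k = 1..count."""
    images = []
    for k in range(1, count + 1):
        images.extend([k * sample_rate - tone_hz, k * sample_rate + tone_hz])
    return images


print(image_frequencies(1_000, 10_000))
```

For any tone below 5000 hertz at a conversion rate of 10,000 samples per second, every such component lies above 5000 hertz, which is why a filter with that cutoff removes them without touching the wanted signal.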
[0018] Finally the combined waveform, now in an analog version, is transmitted to the speaker
31 whereat it excites the speaker to sound out the text in good quality, good imitation
speech.
[0019] In a preferred embodiment the Motorola 68000 microprocessor is used because it has
a 10-megahertz clock and, with 24 bit addressing, can address 16 megabytes of memory.
The digital signal processor selected is a Texas Instruments TMS 32010 because of its
capability to execute fast mathematical computations. The memory means employed with
the 68000 microprocessor, in a preferred embodiment, consists of 256k bytes of ROM
and 48k bytes of RAM.
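The relationship between address width and addressable memory is a one-line calculation. The sketch below assumes that "k" and "megabyte" are meant in the binary sense (1024 bytes and 1024 × 1024 bytes); the constant names are hypothetical.

```python
ADDRESS_BITS = 24
KIB = 1024                                   # assumed meaning of "k"

addressable_bytes = 2 ** ADDRESS_BITS        # one byte per distinct address
installed_bytes = (256 + 48) * KIB           # ROM plus RAM of the preferred embodiment

print(addressable_bytes // (KIB * KIB), "megabytes addressable")
print(installed_bytes, "bytes installed")
```

On that reading the installed memory occupies only a small part of the available address space.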
1. A digital processor for use in the synthesis of speech from digitized text, characterized
by:
(a) means for receiving first digital signals representing a plurality of speech parameters;
(b) means for generating second digital signals representing a first predetermined
number of samples per second of a first digitized glottal waveform as a function of
said first digital signals;
(c) means for low pass filtering said second digital signals to form third digital
signals representing a second digitized glottal waveform with frequencies above a
predetermined threshold frequency removed; and
(d) means for down-sampling said third digital signals to form fourth digital signals
representing a second predetermined number of samples per second of a third digitized
glottal waveform,
wherein said first predetermined number of samples per second is greater than twice
said predetermined threshold frequency.
2. The digital processor as defined in claim 1, characterized in that said means for
generating second digital signals comprises a first computer program, said means for
low pass filtering comprises a second computer program, said means for down-sampling
comprises a third computer program, and said digital processor further comprises means
for storing said first, second and third computer programs.
3. The digital processor as defined in claim 1, characterized in that said first predetermined
number is greater than said second predetermined number by at least a factor of 2.
4. The digital processor as defined in claim 3, characterized in that said first predetermined
number equals at least 20,000 samples per second and said second predetermined number
equals about 10,000 samples per second.
5. The digital processor as defined in claim 4, characterized in that said first predetermined
number equals about 40,000 samples per second, and said means for down-sampling selects
and transmits every fourth sample of said third digital signals.
6. The digital processor as defined in claim 1, characterized in that said means for
low pass filtering removes components from said second digital signals having frequencies
greater than 5,000 hertz.
7. A system having a digital processor as defined in claim 1 for transforming fifth
digital signals representing text into an analog waveform signal representing speech,
characterized in that said system comprises first processing means for transforming
said fifth digital signals into said first digital signals representing a plurality
of speech parameters, second processing means for forming sixth digital signals representing
a digitized combined waveform from at least said fourth digital signals and seventh
digital signals representing digitized sound waveforms other than a digitized glottal
waveform, and means for converting said sixth digital signals into said analog waveform
signal.
8. The system as defined in claim 7, characterized in that said means for generating
second digital signals comprises a first computer program, said means for low pass
filtering comprises a second computer program, said means for down-sampling comprises
a third computer program, and said digital processor further comprises means for storing
said first, second and third computer programs.
9. The system as defined in claim 7, characterized in that said first predetermined
number is greater than said second predetermined number by at least a factor of 2.
10. The system as defined in claim 9, characterized in that said first predetermined
number equals at least 20,000 samples per second and said second predetermined number
equals about 10,000 samples per second.
11. The system as defined in claim 10, characterized in that said first predetermined
number equals about 40,000 samples per second, and said means for down-sampling selects
and transmits every fourth sample of said third digital signals.
12. The system as defined in claim 7, characterized in that said means for low pass
filtering removes components from said second digital signals having frequencies greater
than 5,000 hertz.
1. Digitaler Prozessor für eine Sprachsynthese aus digitalisiertem Text, gekennzeichnet
durch:
(a) Einrichtungen zum Empfangen erster digitaler Signale, die eine Vielzahl von Sprachparametern
repräsentieren;
(b) Einrichtungen zum Erzeugen zweiter digitaler Signale, die eine erste, vorgegebene
Anzahl von Stichproben pro Sekunde einer ersten digitalisierten, in der Stimmritze
gebildeten Wellenform als Funktion der ersten digitalen Signale repräsentieren;
(c) Einrichtungen zum Tiefpaßfiltern der genannten zweiten digitalen Signale, um dritte
digitale Signale zu bilden, die eine zweite digitalisierte, in der Stimmritze gebildete
Wellenform repräsentieren, wobei Frequenzen oberhalb einer vorgegebenen Schwellwert-Frequenz
entfernt sind; und
(d) Einrichtungen zum Abwärtsabtasten der genannten dritten digitalen Signale, um
vierte digitale Signale zu bilden, die eine zweite vorgegebene Anzahl von Stichproben
pro Sekunde einer dritten digitalisierten, in der Stimmritze gebildeten Wellenform
repräsentieren, wobei die genannte erste vorgegebene Anzahl von Stichproben pro Sekunde
größer ist als das Doppelte der genannten vorgegebenen Schwellenwert-Frequenz.
2. Digitaler Prozessor gemäß Anspruch 1, dadurch gekennzeichnet, daß die genannten
Einrichtungen zum Erzeugen zweiter digitaler Signale ein erstes Computerprogramm aufweisen,
daß die genannten Einrichtungen zum Tiefpaßfiltern ein zweites Computerprogramm aufweisen,
daß die genannten Einrichtungen zum Abwärtsabtasten ein drittes Computerprogramm aufweisen
und daß der genannte digitale Prozessor weiterhin Einrichtungen aufweist zum Speichern
des ersten, zweiten und dritten Computerprogramms.
3. Digitaler Prozessor gemäß Anspruch 1, dadurch gekennzeichnet, daß die genannte
erste vorgegebene Anzahl um zumindest einen Faktor zwei größer ist als die zweite
vorgegebene Anzahl.
4. Digitaler Prozessor gemäß Anspruch 3, dadurch gekennzeichnet, daß die genannte
erste vorgegebene Anzahl zumindest 20.000 Stichproben pro Sekunde entspricht und die
genannte zweite vorgegebene Anzahl etwa 10,000 Stichproben pro Sekunde entspricht.
5. Digitaler Prozessor gemäß Anspruch 4, dadurch gekennzeichnet, daß die genannte
erste vorgegebene Anzahl etwa 40,000 Stichproben pro Sekunde entspricht und daß die
genannte Einrichtung zum Abwärtsabtasten jede vierte Stichprobe des genannten dritten
digitalen Signals auswählt und überträgt.
6. Digitaler Prozessor gemäß Anspruch 1, dadurch gekennzeichnet, daß die genannte
Einrichtung zum Tiefpaßfiltern solche Komponenten aus den genannten zweiten digitalen
Signalen ausfiltert, welche Frequenzen größer als 5,000 Hertz aufweisen.
7. Ein System mit einem digitalen Prozessor gemäß Anspruch 1 zum Transformieren fünfter
digitaler Signale, die Text repräsentieren, in ein analoges Wellenform-Signal, welches
Sprache repräsentiert, dadurch gekennzeichnet, daß das genannte System einen ersten
Prozessor aufweist zum Transformieren der genannten fünften digitalen Signale in die
genannten ersten digitalen Signale, welche eine Vielzahl von Sprachparametern repräsentieren,
sowie einen zweiten Prozessor zum Bilden sechster digitaler Signale, welche eine digitalisierte
kombinierte Wellenform aus zumindest den genannten vierten digitalen Signalen und
siebten digitalen Signalen repräsentieren, wobei die siebten digitalen Signale eine digitalisierte
Schall-Wellenform repräsentieren, die verschieden ist von einer digitalisierten, in
der Stimmritze gebildeten Wellenform, und mit Einrichtungen zum Umwandeln der genannten
sechsten digitalen Signale in das genannte analoge Wellenform-Signal.
8. System nach Anspruch 7, dadurch gekennzeichnet, daß die genannten Einrichtungen
zum Erzeugen zweiter digitaler Signale ein erstes Computerprogramm aufweisen, die
genannten Einrichtungen zum Tiefpaßfiltern ein zweites Computerprogramm aufweisen,
die genannten Einrichtungen zum Abwärtsabtasten ein drittes Computerprogramm aufweisen,
und daß der genannte digitale Prozessor weiterhin Einrichtungen zum Speichern des
ersten, zweiten und dritten Computerprogramms aufweist.
9. System nach Anspruch 7, dadurch gekennzeichnet, daß die genannte erste vorgegebene
Anzahl um einen Faktor von zumindest zwei größer ist als die genannte zweite vorgegebene
Anzahl.
10. System nach Anspruch 9, dadurch gekennzeichnet, daß die genannte erste vorgegebene
Anzahl zumindest 20,000 Stichproben pro Sekunde entspricht und die genannte zweite
vorgegebene Anzahl etwa 10,000 Stichproben pro Sekunde entspricht.
11. System gemäß Anspruch 10, dadurch gekennzeichnet, daß die genannte erste vorgegebene
Anzahl etwa 40,000 Stichproben pro Sekunde entspricht und daß die genannte Einrichtung
zum Abwärtsabtasten jede vierte Stichprobe des genannten dritten digitalen Signals
auswählt und überträgt.
12. System nach Anspruch 7, dadurch gekennzeichnet, daß die genannten Einrichtungen
zum Tiefpaßfiltern solche Komponenten aus dem genannten zweiten digitalen Signal entfernen,
die Frequenzen größer als 5,000 Hertz aufweisen.
1. Un processeur numérique utilisable dans la synthèse de la parole à partir d'un
texte numérisé, caractérisé par:
(a) des moyens pour recevoir des premiers signaux numériques représentant une pluralité
de paramètres de la parole;
(b) des moyens pour générer des seconds signaux numériques représentant un premier
nombre prédéterminé d'échantillons par seconde d'une première forme d'onde glottale
numérisée, en fonction desdits premiers signaux numériques;
(c) des moyens pour filtrer avec un filtre passe-bas lesdits seconds signaux numériques
pour former des troisièmes signaux représentant une seconde forme d'onde glottale
numérisée pour laquelle les fréquences excédant une fréquence seuil prédéterminée
sont éliminées; et
(d) des moyens pour sous-échantillonner lesdits troisièmes signaux numériques pour
former les quatrièmes signaux numériques représentant un second nombre prédéterminé
d'échantillons par seconde d'une troisième forme d'onde glottale numérisée, dans lequel
ledit premier nombre prédéterminé d'échantillons par seconde est supérieur à deux
fois ladite fréquence seuil prédéterminée.
2. Le processeur numérique tel que défini à la revendication 1, caractérisé en ce
que lesdits moyens pour générer les seconds signaux numériques comprennent un premier
programme de calcul, lesdits moyens de filtrage par filtre passe-bas comprennent un
second programme de calcul, lesdits moyens de sous-échantillonnage comprennent un
troisième programme de calcul, et ledit processeur numérique comprend de plus des
moyens pour enregistrer lesdits premier, second et troisième programmes de calcul.
3. Le processeur numérique tel que défini à la revendication 1, caractérisé en ce que
ledit premier nombre prédéterminé est plus grand que ledit second nombre prédéterminé,
le rapport étant au moins un facteur de 2.
4. Le processeur numérique tel que défini à la revendication 3, caractérisé en ce
que ledit premier nombre prédéterminé est au moins égal à 20000 échantillons par seconde,
et ledit second nombre prédéterminé est au moins égal à environ 10000 échantillons
par seconde.
5. Le processeur numérique tel que défini à la revendication 4, caractérisé en ce
que ledit premier nombre prédéterminé est au moins égal à 40000 échantillons par seconde,
et lesdits moyens de sous-échantillonnage sélectionnent et transmettent chaque quatrième
échantillon desdits troisièmes signaux numériques.
6. Le processeur numérique tel que défini à la revendication 1, caractérisé en ce
que lesdits moyens de filtrage par le filtre passe-bas éliminent les composants desdits
seconds signaux numériques, dont les fréquences sont supérieures à 5000 hertz.
7. Un système ayant un processeur numérique tel que défini à la revendication 1, pour
transformer les cinquièmes signaux numériques représentant du texte en un signal de
forme d'onde analogique représentant de la parole, caractérisé en ce que ledit système
comprend des premiers moyens de traitement pour transformer lesdits cinquièmes signaux
numériques en lesdits premiers signaux numériques représentant une pluralité de paramètres
de la parole, des seconds moyens de traitement pour former les sixièmes signaux numériques
représentant une forme d'onde numérisée combinée à partir au moins desdits quatrièmes
signaux numériques et des septièmes signaux numériques représentant des formes d'ondes
de sons numérisés autres qu'une forme d'onde glottale numérisée, et des moyens pour
convertir lesdits sixièmes signaux numériques en ladite forme d'onde de signal analogique.
8. Le système tel que défini à la revendication 7, caractérisé en ce que lesdits moyens
pour générer les seconds signaux numériques comprennent un premier programme de calcul,
lesdits moyens de filtrage par filtre passe-bas comprennent un second programme de
calcul, lesdits moyens de sous-échantillonnage comprennent un troisième programme de
calcul, et ledit processeur numérique comprend de plus des moyens pour enregistrer
lesdits premier, second et troisième programmes de calcul.
9. Le système tel que défini à la revendication 7, caractérisé en ce que ledit premier
nombre prédéterminé est plus grand que ledit second nombre prédéterminé, le rapport
étant au moins un facteur de 2.
10. Le système tel que défini à la revendication 9, caractérisé en ce que ledit premier
nombre prédéterminé est au moins égal à 20000 échantillons par seconde, et ledit second
nombre prédéterminé est égal à environ 10000 échantillons par seconde.
11. Le système tel que défini à la revendication 10, caractérisé en ce que ledit premier
nombre prédéterminé est égal à environ 40000 échantillons par seconde, et lesdits
moyens de sous-échantillonnage sélectionnent et transmettent chaque quatrième échantillon
desdits troisièmes signaux numériques.
12. Le système tel que défini à la revendication 7, caractérisé en ce que lesdits
moyens pour filtrer par le filtre passe-bas éliminent les composants desdits seconds
signaux numériques, dont la fréquence est supérieure à 5000 hertz.
