Speech recognition system - Patent 0423800

(19)

(11)

EP 0 423 800 A2

(12)	EUROPEAN PATENT APPLICATION

(43)	Date of publication:
	24.04.1991 Bulletin 1991/17

(21)	Application number: 90120020.4

(22)	Date of filing: 18.10.1990

(51)	International Patent Classification (IPC)⁵: G10L 5/06, G10L 3/00

(84)	Designated Contracting States:
	DE FR GB

(30)

Priority:

19.10.1989 JP 272846/89

(71)	Applicant: MATSUSHITA ELECTRIC INDUSTRIAL CO., LTD.
	Kadoma-shi, Osaka-fu, 571 (JP)

(72)	Inventors:
	Morii, Shuji, Shibuya-ku, Tokyo (JP)
	Hiraoka, Shoji, Miyamae-ku, Kawasaki (JP)

(74)	Representative: Pellmann, Hans-Bernd, Dipl.-Ing. et al
	Patentanwaltsbüro Tiedtke-Bühling-Kinne & Partner Bavariaring 4 80336 München (DE)

(54)	Speech recognition system

(57) A speech recognition system for recognizing an inputted speech so as to operate a given apparatus in accordance with the recognized speech. The speech recognition system includes a phoneme recognizing section responsive to input of a speech from an external device for extracting phonemes constituting the inputted speech and outputting them as a symbol train. The symbol train from the phoneme recognizing section is supplied to a coder, which codes the symbol train and outputs the coded symbol train through a transmission line to a decoder, which decodes the coded symbol train to restore the original symbol train. The decoded symbol train is inputted to a word and sentence recognizing section, which in turn recognizes a word or a sentence on the basis of the decoded symbol train using a word dictionary.

Description

BACKGROUND OF THE INVENTION

[0001] The present invention relates generally to speech recognition systems, and more particularly to such a speech recognition system for operation of an apparatus through speech recognition.

[0002] A known system for operating a remotely located apparatus by speech is the banking service system disclosed in "Electronic Technique", Vol. 25, No. 1, pp. 43 to 46, for example. As illustrated in Fig. 1, this system is arranged such that a speech inputted through a telephone set 51 or the like is transmitted through a public line 52 or the like to a speech recognition apparatus 53 at the central processing equipment side, where the inputted speech is recognized and the recognition result is supplied to a task control apparatus. Another approach, illustrated in Fig. 2, involves recognizing an inputted speech with a speech recognition apparatus 62 incorporated into a user side terminal unit 61 and coding the recognition result with a coder 63 built into the same terminal unit 61, the coded signal being supplied through a transmission line 64 to a decoder 65 and then to a task control apparatus 66 placed at the central processing equipment side.

[0003] Both types of speech recognition system have problems, however. The former is affected by the transmission characteristics of the public line 52, such as the limitation of the frequency range of the user's speech, and by the line noise introduced during transmission, which generally reduces the recognition performance of the speech recognition apparatus 53. The latter avoids the reduction of the speech recognition rate caused by transmission, because the speech itself is not transmitted through a telephone line or the like. However, it makes it extremely difficult to change the vocabulary to be recognized and the operating procedure at the central processing equipment side, resulting in a lack of flexibility, and it increases the cost of the terminal side apparatus because the speech recognition apparatus 62 is disposed at the user's terminal unit 61 side.

SUMMARY OF THE INVENTION

[0004] It is therefore an object of the present invention to provide a speech recognition system which is capable of improving the speech recognition rate by preventing the influence of line noise and the like, and which further allows the vocabulary to be recognized and the operating procedure to be freely set at the central processing equipment side so as to provide flexibility.

[0005] In accordance with the present invention, there is provided a speech recognition system comprising: means responsive to an input of a speech from an external device for extracting phonemes or syllables constituting the inputted speech to output them as a symbol train; means coupled to the extracting means for coding the symbol train and outputting the coded symbol train; means for transmitting the coded symbol train; means coupled through the transmitting means to the coding means for decoding the coded symbol train to restore it to the original symbol train; and means responsive to the decoded symbol train from the decoding means for recognizing a word or a sentence on the basis of the decoded symbol train.

BRIEF DESCRIPTION OF THE DRAWINGS

[0006] The present invention will be described in further detail with reference to the accompanying drawings, in which:

Figs. 1 and 2 are block diagrams showing conventional speech recognition systems;

Fig. 3 is a block diagram showing a speech recognition system according to a first embodiment of the present invention;

Fig. 4 is an illustration for describing one example of a word dictionary to be used in the Fig. 3 speech recognition system;

Fig. 5 is a graphic illustration for describing an inputted speech signal and a phoneme recognition; and

Fig. 6 is a block diagram showing a speech recognition system according to a second embodiment of this invention.

DETAILED DESCRIPTION OF THE INVENTION

[0007] Referring now to Fig. 3, there is illustrated a speech recognition system according to a first embodiment of the present invention. Although speech recognition is generally performed by using words, syllables, phonemes and the like as basic units for recognition, in this invention the syllables, phonemes or the like, which are units that allow a sentence and a word to be expressed, are used as the basic units. The embodiment will be described for the case of using phonemes, which are the minimum and indispensable phonological units for describing a given speech.

[0008] In Fig. 3, the speech recognition system illustrated at numeral 1 comprises a phoneme recognizing section 3 which recognizes an inputted speech and converts it into a phoneme-symbol train, each phoneme being a basic unit of the inputted speech. The phoneme-symbol train is supplied to a coder 4 to be coded. The coded phoneme-symbol train is supplied through a transmission line 5 to a decoder 6, which in turn decodes the coded phoneme-symbol train. The decoded phoneme-symbol train is supplied to a word and sentence recognizing section 7 for recognizing a word and a sentence making up the speech. The word and sentence recognizing section 7 is also coupled to a word dictionary 8 storing a phoneme notation. The word and sentence recognizing section 7 performs the matching between the phoneme-symbol train outputted from the decoder 6 and the phoneme notation stored in the word dictionary 8. The output of the word and sentence recognizing section 7 is supplied to a task control apparatus 2 which performs applications such as the banking service, information retrieval and others. In this embodiment, the task control apparatus 2 gives instructions to the speech recognition system 1, for example, selection of a different dictionary to change the words to be recognized (one dictionary has a group of words which can be recognized with one speech, and the words to be recognized are changeable by selecting one of the dictionaries), and start of the recognition.
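
The coder 4 and decoder 6 path described above can be sketched as follows. This is only an illustrative sketch, not the coding used in the embodiment: the phoneme inventory `PHONEMES`, the one-byte-per-symbol mapping, and the function names are assumptions introduced here (Table 1 is not reproduced in this text), and the line-level modulation is left out entirely.

```python
# Hypothetical phoneme inventory (an assumption for illustration only).
PHONEMES = ["a", "i", "u", "e", "o", "j", "w", "s", "b", "k",
            "t", "n", "m", "r", "h", "g", "d", "z", "p"]
CODE_OF = {symbol: index for index, symbol in enumerate(PHONEMES)}


def encode_symbol_train(symbols):
    """Sketch of coder 4: map each phoneme symbol to one byte."""
    return bytes(CODE_OF[symbol] for symbol in symbols)


def decode_symbol_train(payload):
    """Sketch of decoder 6: reverse the mapping to restore the symbol train."""
    return [PHONEMES[code] for code in payload]


train = list("sibuja")                   # phoneme-symbol train at the user side
payload = encode_symbol_train(train)     # what would be sent over transmission line 5
restored = decode_symbol_train(payload)  # symbol train at the central side
assert restored == train
```

Only a handful of bytes per utterance cross the line in this sketch, which is why the symbol train, unlike the speech waveform, is not degraded by the band limitation of the line.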

[0009] Here, the phoneme recognizing section 3 and the coder 4 are placed at the user side and the decoder 6, the word and sentence recognizing section 7 and the word dictionary 8 are placed at the central processing equipment side which is remotely disposed from the user side.

[0010] Fig. 4 shows one example of the contents of the word dictionary 8, which are written with phoneme symbols. In Fig. 4, the "word" column shows the Japanese Kanji (Chinese) characters corresponding to the respective word dictionary items, but these are not used for the actual recognition. With this arrangement, an operation will be described hereinbelow. The following table 1 shows the kinds of the phonemes of the Japanese language used.

[0011] A speech is inputted as an electric signal through a microphone, a handset or the like to the phoneme recognizing section 3 in order to recognize the uttered phonemes. For example, in response to the utterance of "SHIBUYA", the speech signal takes the form illustrated by (a) in Fig. 5 and, as is obvious from the above-mentioned table 1, the phoneme symbol train becomes "sibuja" as illustrated by (b) in Fig. 5. With the current speech recognition technique it is impossible to obtain a 100% phoneme recognition rate, and hence the phoneme train contains errors. The recognized phoneme symbol train is supplied to the coder 4, where it is coded into a form suitable for the transmission line 5 and outputted. In the case that the transmission line 5 is a general public telephone line, the coding is performed in accordance with the frequency shift keying (FSK) system, the phase shift keying (PSK) system or the like. It is also appropriate to use a digital line such as a bus-structure network (Ethernet) as the transmission line 5. The decoder 6 performs the reverse process of the coding on the signal transmitted through the transmission line 5 so as to restore it to the original phoneme symbol train. The word and sentence recognizing section 7 performs a matching of the phoneme symbol train from the decoder 6 with the phonemes of the respective dictionary items in the word dictionary 8 illustrated in Fig. 4. In the case of word recognition, the word number for the word most similar thereto, i.e., "001" in this embodiment, is outputted as the recognition result to the task control apparatus 2. Here, the word dictionary 8 can be constructed with a plurality of groups so as to be selectively used for every speech recognition process in order to limit the vocabulary. In the case of sentence recognition, it is required to additionally use syntax information, word-semantic information and others.
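
The matching step in the word and sentence recognizing section 7 can be illustrated with a short sketch. The text does not specify the similarity measure, so the Levenshtein edit distance used here is an assumption chosen because it tolerates the phoneme recognition errors mentioned above. The entry "001" for "sibuja" follows the text; the other dictionary entries and all function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two phoneme symbol trains."""
    prev = list(range(len(b) + 1))
    for i, symbol_a in enumerate(a, start=1):
        curr = [i]
        for j, symbol_b in enumerate(b, start=1):
            cost = 0 if symbol_a == symbol_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


# Word number -> phoneme notation. Only "001" is taken from the text;
# the remaining entries are hypothetical placeholders for Fig. 4.
WORD_DICTIONARY = {
    "001": "sibuja",
    "002": "sinjuku",
    "003": "ikebukuro",
}


def recognize_word(symbol_train, dictionary):
    """Return the word number whose phoneme notation is most similar."""
    return min(dictionary,
               key=lambda number: edit_distance(symbol_train, dictionary[number]))


# A train with one misrecognized phoneme ("p" for "b") still selects word 001.
assert recognize_word("sipuja", WORD_DICTIONARY) == "001"
```

Restricting `dictionary` to one group of words per recognition process, as the text suggests, shrinks the candidate set and makes a wrong nearest match less likely.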

[0012] A speech recognition system according to a second embodiment of this invention will be described hereinbelow with reference to Fig. 6, where parts corresponding to those in Fig. 3 are marked with the same numerals. In Fig. 6, the speech recognition system indicated by a dotted line and illustrated at numeral 1 is included in a dialogue or interaction system comprising a terminal apparatus 11 and a central apparatus 12 which are coupled to each other through a transmission line 5. The speech recognition system 1 comprises a phoneme recognizing section 3 responsive to an inputted speech, a coder 4 coupled to the phoneme recognizing section 3, a decoder 6 coupled through the transmission line 5 to the coder 4, a word and sentence recognizing section 7 and a word dictionary 8. Of these sections of the speech recognition system 1, the phoneme recognizing section 3 and the coder 4 are placed at the terminal apparatus 11 side, and the decoder 6, the word and sentence recognizing section 7 and the word dictionary 8 are disposed at the central apparatus 12 side. Further, at the central apparatus 12 side are disposed a task control apparatus 2 coupled to the word and sentence recognizing section 7 and another coder 13 coupled to the task control apparatus 2, and at the terminal apparatus 11 side are disposed another decoder 14 coupled through the transmission line 5 to the coder 13 and a terminal control section 15 coupled to the decoder 14.

[0013] An operation of the above-mentioned arrangement will be described hereinbelow. As in the above-described first embodiment, speech uttered by a user at the terminal apparatus 11 side is recognized by the speech recognition system 1. The operation performed by the task control apparatus 2 in response to the recognition result is transmitted through the coder 13, the transmission line 5 and the decoder 14 to the terminal control section 15, which in turn delivers it to the user as speech or letters through an indicator, a loudspeaker or the like. After the operation of the task control apparatus 2, a speech is again introduced into the phoneme recognizing section 3 of the speech recognition system 1. Here, a recognition start command for the speech recognition system 1 is transmitted from the task control apparatus 2 to the word and sentence recognizing section 7 and further through the terminal control section 15 to the phoneme recognizing section 3. With the above-described arrangement, it is possible to provide flexibility because the recognition vocabulary and the operation procedure can be set at the central processing apparatus side.
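
The flexibility gained by keeping the vocabulary at the central side can be sketched as a small dialogue loop in which the task control logic selects a dictionary group for each turn. Everything in this sketch is hypothetical: the class name, the group names, the dictionary entries other than "sibuja", and the use of `difflib.get_close_matches` from the Python standard library as a stand-in for the unspecified matching method.

```python
import difflib

# Hypothetical dictionary groups: phoneme notation -> word number.
DICTIONARY_GROUPS = {
    "station": {"sibuja": "001", "sinjuku": "002"},
    "confirm": {"hai": "101", "iie": "102"},
}


class CentralApparatus:
    """Sketch of the central apparatus 12: recognizer plus task control."""

    def __init__(self):
        # The task control logic decides which vocabulary is active.
        self.active_group = "station"

    def recognize(self, symbol_train):
        notations = DICTIONARY_GROUPS[self.active_group]
        # cutoff=0.0 so that the closest entry is always returned.
        best = difflib.get_close_matches(symbol_train, list(notations),
                                         n=1, cutoff=0.0)
        return notations[best[0]]

    def handle_turn(self, symbol_train):
        """Recognize one utterance, then switch vocabulary for the next turn."""
        word_number = self.recognize(symbol_train)
        if self.active_group == "station":
            self.active_group = "confirm"
            return word_number, "confirm"
        self.active_group = "station"
        return word_number, "done"


center = CentralApparatus()
first = center.handle_turn("sipuja")  # noisy phoneme train from the terminal
second = center.handle_turn("hai")    # answer matched against the "confirm" group
```

In this sketch the terminal side never needs to know which group is active; changing the vocabulary or the dialogue procedure only touches the central side.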

[0014] According to the above-described first and second embodiments, the phonemes expressing a speech are recognized, and the resulting symbol train is coded and transmitted through a transmission means to a central processing apparatus. The central processing apparatus decodes it, and recognizes and outputs the corresponding word or sentence. Thus, as compared with the direct transmission of a speech, it is possible to prevent reduction of the speech recognition rate due to line noise and the like, and further possible to recognize spoken words and sentences transmitted from a remote place. Moreover, as compared with the conventional system of Fig. 2, it is possible to reduce the cost of the terminal apparatus to be disposed at the user side.

[0015] It should be understood that the foregoing relates to only preferred embodiments of the present invention, and that it is intended to cover all changes and modifications of the embodiments of this invention herein used for the purposes of the disclosure, which do not constitute departures from the spirit and scope of the invention. For example, although in the above-described embodiments phonemes are used as the basic units of a language to be recognized, the present invention is not limited thereto and it is also appropriate to use syllables as the basic units. In addition, although the description is made in connection with the Japanese language, recognition of languages other than Japanese can be performed if the recognition is carried out in accordance with the phonemes or other units of those languages.

[0016] A speech recognition system for recognizing an inputted speech so as to operate a given apparatus in accordance with the recognized speech. The speech recognition system includes a phoneme recognizing section responsive to input of a speech from an external device for extracting phonemes constituting the inputted speech and outputting them as a symbol train. The symbol train from the phoneme recognizing section is supplied to a coder, which codes the symbol train and outputs the coded symbol train through a transmission line to a decoder, which decodes the coded symbol train to restore the original symbol train. The decoded symbol train is inputted to a word and sentence recognizing section, which in turn recognizes a word or a sentence on the basis of the decoded symbol train using a word dictionary.

Claims

1. A speech recognition system comprising:

means responsive to an input of a speech from an external device for extracting phonemes or syllables constituting the inputted speech to output them as a symbol train;

means coupled to said extracting means for coding said symbol train and outputting the coded symbol train;

means for transmitting the coded symbol train;

means coupled through said transmitting means to said coding means for decoding the coded symbol train to restore it to the original symbol train; and

means responsive to the decoded symbol train from said decoding means for recognizing a word or a sentence on the basis of the decoded symbol train.
