BACKGROUND
[0001] The present invention relates to telephony services for hearing-impaired individuals,
and more specifically to an automated speech-to-text encoding/decoding method and
apparatus for use in a data communication network.
[0002] Hearing-impaired individuals encounter inconveniences when using a telephone or other
voice communication device. These individuals require special equipment, such as an
electronic Teletype device, so that they may read whatever is being "said" by a party
at the other end of a call. Alternatively, hearing-impaired individuals may use a
third-party telecommunication relay service (TRS) offered by the service provider
which, under the Americans with Disabilities Act, must provide this service if requested
by the hearing-impaired individual. TRS services require a live operator who uses
a Teletype machine to transcribe speech into text, and perhaps also to transcribe
text into speech. To access a TRS service, the hearing-impaired individual dials a
special TRS telephone number to establish a connection with the TRS operator. When
initially contacted to place a call, the operator will complete the second leg of
the call to the called party. An impaired or non-impaired person may initiate the
call to an impaired or non-impaired individual by calling a TRS operator.
[0003] In addition to being cumbersome, the aforementioned procedures require that the calling
party know in advance whether the called party is impaired.
[0004] Moreover, these types of services do not provide the hearing-impaired individual
with transparent, unimpaired telephone service. In addition, the service provider
must bear the cost of providing TRS services.
[0005] WO 02/03693 A, entitled "Advanced Set Top Terminal Having a Video Call Feature" by Asmussen, discloses
a set top converter box or terminal for an interactive cable television program delivery
system. The set top features the capability to send and receive video calls through
the set top terminal equipped with a camera and microphone. In response to a video
call, message, web page, or other triggering event, the system automatically pauses
the video program and displays an indication of the call on the television monitor.
WO 01/16940 A, entitled "System, Method, and Article of Manufacture for a Voice Recognition System
for Identity Authentication in Order to Gain Access to Data on the Internet" by St.
John, discloses a system that provides an additional layer of security in secure web
sites. In the disclosed system, in addition to the traditional verification methods,
a user requesting access to a web site is prompted to provide a voice sample prior
to being granted access to a secure web page. The voice sample provided by the party
requesting access is compared to a previously collected voice sample that is stored
with the authorized user's registration information. If the samples do not match,
access to the web site is denied.
EP 0,856,976 A, entitled "Communication System For Hearing-Impaired People, Telephone, and Method
For Using Such A System" by Naumburger, discloses a system having a speech recognition
unit which converts the signals received via the telephone network into a computer
readable code. It specifically converts speech signals into the corresponding ASCII
text. The resulting code or ASCII text is displayed as text on a monitor. The speech
recognition unit has a memory in which announcement text can be stored. The received
signals can be temporarily stored in this memory.
SUMMARY OF THE INVENTION
[0006] The present invention addresses the aforementioned problems by assisting the communication
needs of hearing-impaired subscribers and is particularly suited for use in almost
any type of network, such as a packet data network (Internet Protocol (IP), circuit-switched,
or asynchronous transfer mode (ATM)) that offers VoIP (Voice over IP) services. Such
networks and/or associated terminal devices possess specific hardware and software
elements that may be configured to implement features of the present invention without
substantial additional costs. The invention may also be implemented in an end-to-end
public-switched telephone network (PSTN), digital subscriber line (DSL), or other
routing or circuit-switched network.
[0007] The invention provides a speech-to-text translation device, a method of providing
automated speech-to-text translation and a computer-readable medium, as set out in
the accompanying claims.
[0008] Other features, aspects and advantages will become apparent upon review of the following
drawings taken in connection with the accompanying description. The invention, though,
is pointed out with particularity by the appended claims.
BRIEF DESCRIPTION OF THE DRAWINGS
[0009]
FIG. 1 shows a block diagram of a telecommunication relay service system in accordance
with prior art.
FIG. 2 depicts a block diagram of an exemplary system in accordance with an aspect
of the present invention.
FIG. 3 depicts a system diagram in accordance with a more detailed aspect of the present
invention.
FIG. 4 illustrates one manner of speaker identification according to an aspect of
the present invention.
FIG. 5 illustrates another manner of speaker identification according to an aspect
of the present invention.
FIG. 6 shows textual feedback on a monitor resulting from action taken by a subscriber
according to a feature of the present invention.
FIG. 7 illustrates display of status information on a monitor according to another
feature of the present invention.
DESCRIPTION OF ILLUSTRATIVE EMBODIMENTS
[0010] In an illustrative embodiment, the invention extends existing VoIP (Voice over IP)
services to hearing-impaired individuals. Speech-to-text translation methods and apparatuses
described herein may be performed by or provided in the network itself, or in terminal-based
customer premises equipment (CPE) of a hearing-impaired individual.
[0011] Fig. 1 illustrates a conventional telecommunications relay service (TRS) used in
a public switched network (PSN) 42 in which a hearing-impaired individual, i.e., a calling
party, uses a text telephone (TT) 40 to establish a connection 41 over PSN 42 with a
non-impaired individual in a communication relay session via a live communications
assistant (CA) or TRS operator 43 located at a service center 44. Operator 43 provides
a "relay" service and employs a compatible text telephone 45. The text telephones may
comprise a personal computer, a data terminal, an ASCII-based Teletype device, a
telecommunication device for the deaf (TDD), a TTY, and/or other means for generating
and receiving text communications. Operator 43 ascertains with whom the hearing-impaired
person desires to communicate, i.e., the called party, and thereafter establishes a voice
connection 46 to establish a link between the operator's voice telephone 47 and the voice
telephone 48 of the desired party. Communication proceeds by alternating between text
communication and voice communication, as explained below.
[0012] In text communication, the hearing-impaired individual supplies text message segments
to text telephone 40. The hearing-impaired individual completes each message segment by
supplying an end-of-message code word, such as "GA," which means "go ahead," indicating
that he or she has completed their message. The text message segments appear at text
telephone 45 of the operator 43 who reads and then speaks messages into the operator's
voice telephone 47, thereby relaying the messages so that the text messages supplied by
the hearing-impaired individual are heard on the desired party's voice telephone 48.
[0013] When the non-impaired individual receiving the call hears the end-of-message code
word, he or she begins to speak into his or her voice telephone 48. Operator 43 hears,
via the operator's voice telephone 47, that which is spoken by the non-impaired individual,
and then transcribes and supplies the message to the operator's text telephone 45 for
transmission to the first text telephone 40 of the hearing-impaired individual. When the
non-impaired person finishes speaking, he or she says an end-of-message code word, e.g.,
"go ahead." When the hearing-impaired person reads the message at his or her text
telephone 40, as transcribed by operator 43, he or she may enter a new message, or send
an appropriate message such as "SK" to indicate the end of the relay session.
[0014] Fig. 2 illustrates an environment in which an embodiment of the present invention
may be used to eliminate the cumbersome "relay" service described above. Other environments
or architectures may be provided according to the methods and/or apparatuses described
herein. The illustrated environment includes an IP network 51 that carries Internet traffic
and a PSTN network 53 which carries telephone circuits. Cable modem 57 (or similar data
terminal device) located at a first terminal end 58 of the network conveys data packets
to and from IP network 51. Cable modem 57 (or similar data terminal device) located at
a second terminal end 60 of the network similarly conveys data packets to and from the
IP network 51. A third terminal end 59 of the network terminates at a conventional
telephone 62, which is connected with PSTN 53 and which transfers information to and
from the telephone. PSTN 53 and IP network 51 intercommunicate via conventional gateways
and interfaces as known in the art. Either impaired or non-impaired individuals, as
subsequently explained, may use the first and second terminal ends 58 and 60 of the
network while the third terminal end 59 is suited for a non-impaired individual.
[0015] In accordance with an embodiment of the present invention, terminal end 58 located
at the premises of a hearing-impaired subscriber includes a broadband terminal characterized
by a multimedia terminal adapter (MTA) 50 that is also known as a broadband telephony
interface (BTI). MTA 50 communicates with IP network 51 via cable modem 57. MTA 50 also
has a display interface to enable visual display of text information on monitor 61 using
conventional device drivers, as well as a telephone interface to link with a conventional
telephone 62. By way of link 54, MTA 50 connects with a hybrid fiber coax (HFC) converter
box 57 which, in turn, communicates with IP network 51 via an HFC network under established
protocols, e.g., MCNS DOCSIS standards. Network interfacing of MTA 50 may also occur
directly with network 51 when cable modem functionality is integrated with MTA 50. An HFC
network is mentioned here only for illustrative purposes, and is not meant to limit the
invention to such a network.
[0016] A similar arrangement is provided at terminal end 60 of the network that may be
located at the premises of a hearing-impaired or non-impaired individual. In the case
where two hearing-impaired subscribers desire to talk to each other, a communication link
is established between respective MTAs 50 at terminal ends 58 and 60. A non-impaired
subscriber using a conventional telephone 62 located at terminal ends 59 or 60 may also
communicate with a hearing-impaired subscriber located at terminal end 58.
[0017] Fig. 3 depicts an exemplary MTA 50 in greater detail. MTA 50 includes functional
components of a personal computer (PC), namely a processor 70 with buffers, registers,
and random access memory, as well as a mass storage or memory device 90, such as a flash
RAM, magnetic storage drive, or CD-ROM. Processor 70 preferably includes executable code
that enables conversion of speech to text, and vice versa, as well as encoding and decoding
of IP packets conveyed over the IP network. The processor also utilizes speech data buffers
typically implemented by RAM and performs the function of a tonal and inflection analyzer.
Software executed by processor 70 may be downloaded from the network to which MTA 50 is
connected, stored in a memory, and then executed from memory. Alternatively, certain
processor functions may be implemented in hardware or firmware. Speech buffers within the
processor 70, typically implemented by RAM, temporarily store speech data packets during
speech processing. Processor 70 may perform the operations of a digital speech processor,
or such a device (e.g., a commercially available CODEC (coder-decoder)) may be separately
provided and interfaced with the processor 70 to encode/decode speech data packets.
[0018] MTA 50 also includes an analog (or digital) telephone interface 63 that interfaces
with a conventional analog (or digital) telephone 62 and a television (or other conventional
monitor) interface 56 employing, for example, NTSC, HDTV or other standards. The interface
56 conveys textual information to a monitor 61 using a standard format, i.e., it may
perform or assist in performing the function of converting a television to a display device
at the direction of a processor that controls MTA 50. As in many processing devices, a
central bus 71 provides an information transfer path among various units within MTA 50.
[0019] As speech data is received from the network via cable modem interface 94, it is
placed in a buffer of processor 70 on a first-in-first-out (FIFO) basis. When receiving
speech data from the network, speech data in the buffer is automatically decoded by
processor 70 to display textual information of spoken words, and optionally to add
punctuation, exclamation, emphasis, highlighting, or other attributes of the speech. The
size of the buffer in processor 70 may be fixed or variable according to needs of the
system, e.g., processor speed, or the needs of hearing-impaired individuals, e.g., voice
pattern identification, punctuation, text display rate, etc. Buffer size may be increased
or decreased dynamically in accordance with encoding/decoding loading of the processor,
or the subscriber may manually set or adjust the size of the buffer.
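By way of illustration only, the FIFO buffering with an adjustable size described above might be sketched as follows. The class name `SpeechBuffer`, its methods, and the choice to measure capacity in packets are assumptions made for this sketch and do not appear in the embodiment; the buffer simply retains up to a set number of packets and releases the oldest ones for decoding when that number is exceeded.

```python
from __future__ import annotations

from collections import deque


class SpeechBuffer:
    """Hypothetical FIFO buffer of speech packets with an adjustable capacity."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._capacity = capacity
        self._packets = deque()

    def push(self, packet: bytes) -> list[bytes]:
        """Store a packet and return any packets released for decoding, oldest first."""
        self._packets.append(packet)
        return self._drain()

    def resize(self, capacity: int) -> list[bytes]:
        """Change the capacity; shrinking releases the oldest surplus packets."""
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._capacity = capacity
        return self._drain()

    def flush(self) -> list[bytes]:
        """Release everything still held, e.g. at the end of a call."""
        released = list(self._packets)
        self._packets.clear()
        return released

    def _drain(self) -> list[bytes]:
        # Release packets in arrival order until the buffer fits its capacity.
        released = []
        while len(self._packets) > self._capacity:
            released.append(self._packets.popleft())
        return released
```

In such a sketch, a larger capacity gives the decoder more context before text is shown, while a smaller one reduces display delay; either the system or the subscriber could call `resize` to trade one for the other.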
[0020] Thus, when used by a hearing-impaired subscriber located at terminal end 58 (Fig. 2),
for example, and after a telephone link is established with another party, each word
spoken by that other party is conveniently displayed on monitor 61 located in the
subscriber's premises. Speech-to-text translation may be performed between two
hearing-impaired subscribers located, for example, at terminal stations 58 and 60 (Fig. 2),
or between a hearing-impaired subscriber and a non-impaired subscriber respectively
located at terminal stations 58 and 59 (Fig. 2).
[0021] Processor 70, which performs speech/text CODEC functions, converts representations
of voice signals received from user telephone 62 to a digital format and then transmits
the resulting digital data to cable modem interface 94 and ultimately to cable modem 57
(Fig. 2) for conveyance over IP network 51. To convert spoken words sent from a remote
station, e.g., terminal end 59, for display on a local monitor 61, processor 70 captures
digital voice data packets on the data bus 71 (which were sent from a remote subscriber
terminal), converts the digital voice signals to analog, and then encodes the analog voice
to text for display on TV monitor 61. A hearing-impaired subscriber may then read the
displayed message.
[0022] In one implementation, processor 70 receives packets that contain about ten to
twenty milliseconds of speech data. As speech packets are received, they are routed to
the processor's buffer and stored in a first-in-first-out (FIFO) order. By increasing the
buffer size, speech-to-text processor 70 may "look ahead" for various speech inflections
or patterns. This enables punctuation to be added, and corrections or modifications to be
made, to the speech before it is displayed on monitor 61 (Fig. 2). By way of an example,
a brief but sustained period of silence allows processor 70 to infer the proper position
of a period. A longer period of silence allows the processor to identify the beginning of
a new paragraph. "Looking ahead," however, need not be the normal operating mode because
the additional buffering and processing load may induce delay in the textual display
function. This may depend on the speed and power of processor 70. More importantly, any
delay may impact non-impaired subscribers because they must wait longer for a reply.
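The silence-based inference described above may be illustrated by the following minimal sketch. It assumes that recognized words arrive paired with the duration of the silence that follows each word; the threshold values `PERIOD_MS` and `PARAGRAPH_MS`, the function name `punctuate`, and the rule that the final word always closes a sentence are hypothetical choices for illustration and are not specified by the embodiment.

```python
from __future__ import annotations

# Hypothetical thresholds; the embodiment does not specify values.
PERIOD_MS = 400      # a brief but sustained silence ends a sentence
PARAGRAPH_MS = 1500  # a longer silence also starts a new paragraph


def punctuate(words: list[tuple[str, int]]) -> str:
    """Build display text from (word, trailing_silence_ms) pairs."""
    pieces = []
    start_sentence = True
    for index, (word, silence_ms) in enumerate(words):
        # Capitalize the first word of each sentence.
        text = word[:1].upper() + word[1:] if start_sentence else word
        last = index == len(words) - 1
        if last or silence_ms >= PERIOD_MS:
            text += "."
            start_sentence = True
        else:
            start_sentence = False
        if last:
            separator = ""
        elif silence_ms >= PARAGRAPH_MS:
            separator = "\n\n"  # paragraph break
        else:
            separator = " "
        pieces.append(text + separator)
    return "".join(pieces)


if __name__ == "__main__":
    sample = [("hello", 50), ("there", 600), ("call", 40), ("me", 2000), ("bye", 0)]
    print(punctuate(sample))
```

Because the silence following a word is only known after later packets arrive, a routine of this kind depends on the look-ahead buffering, which is the source of the display delay noted above.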
[0023] MTA 50 may also be programmed by the subscriber to respond to indications of a user,
such as dual-tone multi-frequency (DTMF) digits, via a controller (shown as DTMF decoder
80 for illustrative purposes only) to activate or deactivate the functionality desired by
the subscriber. A keypad or keyboard for entering DTMF tones may be incorporated in MTA
50, or the keypad of an existing telephone may be detected in order to implement operating
mode changes of MTA 50. Non-impaired persons may, for example, disable these functions
when they use telephone 62. In effect, controller 80 (which may also be implemented by
processor 70) effects turn-on and turn-off of certain functionality in response to DTMF
tones input by a subscriber so that, for example, telephone 62 (Fig. 2) may be used
normally, e.g., without speech-to-text encoding, or to place the MTA apparatus in a
"hearing-impaired" mode of operation where speech-to-text encoding takes place. Processor
70 may also be programmed to respond to respective unique DTMF tones to enable, disable,
or adjust the period of a "look ahead" speech analysis feature provided by an internal
speech buffer; to activate/deactivate an internal tonal and inflection analyzer; to
increase or decrease the size of the speech buffer; to enable/disable speaker recognition
capabilities; or to make other mode changes in MTA 50. The buffer may comprise internal
memory and the inflection and tonal analyzer may comprise a software module, as known in
the art.
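One way such DTMF-driven mode changes might be organized is sketched below. The assignment of digits to functions, the `MtaModes` fields, the step size, and the `apply_dtmf` function are all hypothetical; the embodiment states only that unique DTMF tones may enable, disable, or adjust these features. The returned string is modeled on the "DTMF 3 Pressed" feedback of Fig. 6.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class MtaModes:
    """Hypothetical operating modes of the MTA."""
    speech_to_text: bool = True
    look_ahead: bool = False
    inflection_analyzer: bool = False
    speaker_recognition: bool = False
    buffer_packets: int = 10


# Assumed digit assignments, for illustration only.
TOGGLES = {
    "1": "speech_to_text",
    "2": "look_ahead",
    "3": "inflection_analyzer",
    "4": "speaker_recognition",
}
BUFFER_STEP = 5  # packets added or removed per key press (assumed)


def apply_dtmf(modes: MtaModes, digit: str) -> str:
    """Apply one decoded DTMF digit and return feedback text for the monitor."""
    if digit in TOGGLES:
        name = TOGGLES[digit]
        setattr(modes, name, not getattr(modes, name))
    elif digit == "5":
        modes.buffer_packets += BUFFER_STEP
    elif digit == "6":
        # Never shrink the buffer below a single packet.
        modes.buffer_packets = max(1, modes.buffer_packets - BUFFER_STEP)
    else:
        return f"DTMF {digit} Ignored"
    return f"DTMF {digit} Pressed"
```

Keeping the digit assignments in a table, as here, would let a subscriber or service provider remap keys without changing the control logic.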
[0024] With reference to Fig. 4, processor 70 provides the ability to determine, using
speaker or voice pattern recognition, the actual identification (i.e., the name) of a
particular speaker. This generally requires that the speaker had previously provided the
MTA of the hearing-impaired subscriber with a speech sample, e.g., during a prior call,
whose characteristics were stored as a reference. The identification, once made, is stored
in a voice and speech pattern database of storage device 90 (Fig. 3). Storage of speech
samples for later recall is typically accomplished by a series of prompts generated by
processor 70. For example, processor 70 may generate prompts on the monitor 61 (Fig. 2)
requesting the hearing-impaired subscriber to respond through keypad or keyboard inputs
in order to store a speech sample (e.g., voice pattern) in a database of storage device
90 for later recall, and to associate the stored sample with a name or other identification
by inputting other information. When the same party later engages in a telephone
conversation with the hearing-impaired individual, processor 70 effects visual presentation
of the caller's identity on monitor 61 (Fig. 2), as shown in Fig. 4, based upon the
previously provided speech sample which, in the illustrated example, is identified as
"Mom" and/or "Dad." Processor 70 may also distinguish separate callers on a conference or
"extension phone."
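As a rough illustration of matching an incoming voice against stored reference samples, the sketch below compares feature vectors by cosine similarity. The representation of a voice pattern as a short list of numbers, the similarity measure, the threshold value, and the names `cosine` and `identify` are assumptions for this sketch; the embodiment does not specify how voice patterns are encoded or compared.

```python
from __future__ import annotations

import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify(sample: list[float],
             database: dict[str, list[float]],
             threshold: float = 0.9) -> str | None:
    """Return the stored name that best matches the sample, or None.

    The threshold is a hypothetical cut-off below which the caller is
    treated as unknown rather than mislabeled.
    """
    best_name, best_score = None, threshold
    for name, reference in database.items():
        score = cosine(sample, reference)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


if __name__ == "__main__":
    # Made-up reference patterns keyed by the names entered by the subscriber.
    patterns = {"Mom": [1.0, 0.0, 0.2], "Dad": [0.1, 1.0, 0.0]}
    print(identify([0.9, 0.1, 0.2], patterns))
```

Returning no name for a weak match reflects the requirement that a speaker must have previously supplied a sample before an identity can be displayed.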
[0025] With reference to Fig. 5, processor 70 may separate and identify different speakers'
voices based on sex, gender, or other characteristics. For example, text can be labeled
as Voice 1: <spoken text> [female] and Voice 2: <spoken text> [male] [laughing], as
depicted in Fig. 5. In addition, processor 70 may, without limitation, annotate textual
presentations, such as by indicating whether the speaker has a male or female voice, is
a child or an adult, is hard- or soft-spoken, or is laughing or shouting, or by indicating
other attributes of voice based on known characteristics of speech. To provide feedback
of action ordered by the subscriber or action taken by the system, monitor 61 may display
certain commands or prompts, as illustrated in Fig. 6, e.g., "DTMF 3 Pressed." In addition,
textual presentations associated with commonly used audible signals of the network such
as ringing, busy, all circuits busy, misdial warnings, etc., are displayed as exemplified
in Fig. 7.
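The labeling convention of Fig. 5 may be sketched, purely for illustration, as a small formatting routine. The function name `format_line` and its parameters are hypothetical; the optional `name` parameter reflects the case in which a speaker has been identified from a stored sample and is shown by name instead of by a numbered voice label.

```python
from __future__ import annotations


def format_line(voice_number: int,
                text: str,
                annotations: tuple[str, ...] = (),
                name: str | None = None) -> str:
    """Format one line of display text with a speaker label and annotations."""
    label = name if name else f"Voice {voice_number}"
    tags = "".join(f" [{annotation}]" for annotation in annotations)
    return f"{label}: {text}{tags}"


if __name__ == "__main__":
    print(format_line(1, "How are you?", ("female",)))
    print(format_line(2, "Fine, thanks", ("male", "laughing")))
```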
[0026] As previously indicated, the functionality provided by MTA 50 of Fig. 2 may reside
at various elements of network 51 or 53 of Fig. 2, as opposed to being resident in MTA 50
located at a subscriber's premises. Having some of the functionality reside in an existing
network may benefit deployment of the inventive methods and apparatuses, and also may
enable providing a service offering to hearing-impaired individuals not having ready
access to an MTA 50.
[0027] When implemented in a network, aspects of the present invention may additionally
support language translation at each end of a call when there is interaction with
network elements performing such functions through, for example, common gateway interface
(CGI) protocols. Furthermore, tonal inflections are easier for a mechanical translator
to add, symbolically, in text form than in a direct verbal translation using synthetic
voice. A conventional language database may be made available as a download from the
network and stored on the voice and speech pattern database 90.
[0028] The invention advantageously allows a subscriber to remotely "bridge" to a home unit
(e.g., via wireless phone) and obtain transcription capability for a call. The transcription
capability may be used for other business services (e.g., e-commerce). If combined
with a PC, the present invention allows a subscriber to create his or her own voice-to-email
application.
[0029] If two or more speakers simultaneously confer, the speech-to-text processor 70 (indicated
in Fig. 3 as a digital signal processor) indicates in real time on monitor 61 which
speaker is speaking using voice recognition data from the voice and speech pattern
database 90 (indicated in Fig. 3 as "mass storage"). Whenever the database 90 has
identified a speaker, based on speech samples previously analyzed by MTA 50, it displays
the name of the speaker along with their associated text on monitor 61.
[0030] The above-described embodiments are merely illustrative of methods and apparatuses
of the invention. Based on the teachings herein, various modifications and changes
may be made thereto by those skilled in the art and therefore fall within the scope
of the invention, as defined by the appended claims.
1. A speech-to-text device (50) for use in a network that includes a modem (57) to communicate
with the network, said device comprising:
an interface (94) that enables communication with the modem (57), a display interface
(56) that communicates with a visual display device (61) to display information, a
telephone interface (63) that enables communication with a telephone (62) to convey
voice information of a user, and characterised by further comprising:
a processor (70) adapted to decode the user voice information received from the network
(51) and adapted to display the user voice information as text on the display device
(61), the processor analyzing tone and inflections within speech segments of the voice
information to modify the text by adding punctuation.
2. The speech-to-text translation device as recited in claim 1, wherein said tonal and
inflection analyzer is adapted to detect a variation in at least one of tone, volume,
and inflection to modify said text.
3. The speech-to-text translation device as recited in claim 1, further including a storage
device (91) that stores voice patterns of a prior caller and said processor includes
a speech analyzer that recognizes an incoming voice pattern based on information stored
in said storage device (91) and effects a display of an identity of said prior caller
on said display device.
4. The speech-to-text translation device as recited in claim 1, wherein said processor
includes a detector that is adapted to respond to subscriber inputs to activate and
deactivate said tonal and inflection analyzer.
5. The speech-to-text translation device as recited in claim 4, wherein said detector
comprises a DTMF tone detector and said user inputs comprise DTMF tones of a telephone.
6. The speech-to-text translation device as recited in claim 1, wherein said tonal and
inflection analyzer of said processor is adapted to analyze exclamatory characteristics
of said speech that includes at least one of gender, soft-spoken words, hard-spoken
words, shouting, laughter, and human expression and said processor effects visual
modification of said text on said display device to denote said exclamatory characteristics.
7. A method of providing automated speech-to-text translation, the method comprising:
receiving speech packets from a network; and
storing the speech packets; and characterised by:
displaying textual representations of said speech packets including punctuation added
based on an analysis of tone and inflections associated with said speech packets.
8. The method as recited in claim 7, further comprising:
analyzing characteristics of the stored speech packets to insert punctuation in displayed
textual representations of said speech packets.
9. The method as recited in claim 8, wherein said analyzing is based on at least one
of changes in tone, volume, or inflection.
10. The method as recited in claim 9, further comprising:
responding to a command from the individual to activate and deactivate said analyzing.
11. A computer-readable medium storing instructions for controlling a computing device
to perform the steps:
receiving speech packets from a network; and
storing the speech packets; and characterised by:
displaying textual representations of said speech packets including punctuation added
based on an analysis of tone and inflections associated with said speech packets.
12. The computer-readable medium of claim 11, wherein the instructions further comprise:
analyzing characteristics of the stored speech packets to insert punctuation in displayed
textual representations of said speech packets.
13. The computer-readable medium of claim 12, wherein said analyzing is based on at least
one of changes in tone, volume, or inflection.
14. The computer-readable medium of claim 13, wherein the instructions further comprise:
responding to a command from the individual to activate and deactivate said analyzing.