[0001] The present invention relates to a system and method for generating surround sound.
In particular, the present invention relates to a surround sound environment independent
of the number of loudspeakers and of the configuration/placement of the respective loudspeakers.
[0002] The prior art defines surround sound systems, such as Dolby Digital and DTS, based
on multichannel transmission and presentation of sound. A disadvantage of these solutions
is the dependence of the obtained effect on loudspeaker placement and room acoustics.
Both technologies suggest an optimal loudspeaker placement, which, however, is often
infeasible due to the shape and arrangement of the room.
[0003] Sound correction systems are known which, however, are most often based on suitably
delaying the signals destined for each loudspeaker. The problem of sound reflections
off the walls is corrected in a similar manner.
[0004] Reflections may be used to generate virtual surround sound. This is the case in
so-called sound projectors (an array of loudspeakers in a single casing, a so-called sound
bar).
[0005] Problems with surround sound arise from the fact that the data in the acoustic stream
assume specific locations of each loudspeaker relative to the listener. Even the names
of the channels define particular arrangements, i.e. center, left front, right front,
left rear, right rear. In these prior art surround systems the same sound data stream
is sent to the speakers of each listener, regardless of the actual position of the
speakers in the presentation room.
[0006] It would be advantageous to provide a surround sound solution that is independent
of the number of loudspeakers and of the configuration/placement of the respective loudspeakers.
[0007] The prior art further discloses the Ambisonics system, which is a full-sphere surround sound technique:
in addition to the horizontal plane, it covers sound sources above and below the listener.
[0008] Unlike other multichannel surround formats, its transmission channels do not carry
speaker signals. Instead, they contain a speaker-independent representation of a sound
field called B-format, which is then decoded to the listener's speaker setup. This
extra step allows the producer to think in terms of source directions rather than
loudspeaker positions, and offers the listener a considerable degree of flexibility
as to the layout and number of speakers used for playback (source: Wikipedia).
[0009] The aim of the development of the present invention is a surround system and method
that is independent of the number of loudspeakers and of the configuration/placement of the
respective loudspeakers.
SUMMARY AND OBJECTS OF THE PRESENT INVENTION
[0010] An object of the present invention is a signal according to claim 1.
[0011] These and other objects of the invention presented herein are accomplished by providing
a system and method for generating surround sound. Further details and features of
the present invention, its nature and various advantages will become more apparent
from the following detailed description of the preferred embodiments shown in a drawing,
in which:
- Fig. 1
- presents a diagram of a sound event;
- Fig. 2
- presents a diagram of the method according to the present invention;
- Fig. 3
- presents a diagram of the system according to the present invention;
- Figs 4A - 5B
- depict audio data packets.
NOTATION AND NOMENCLATURE
[0012] Some portions of the detailed description which follows are presented in terms of
data processing procedures, steps or other symbolic representations of operations
on data bits that can be performed in computer memory. A computer executes
such logical steps, which require physical manipulations of physical quantities.
[0013] Usually these quantities take the form of electrical or magnetic signals capable
of being stored, transferred, combined, compared, and otherwise manipulated in a computer
system. For reasons of common usage, these signals are referred to as bits, packets,
messages, values, elements, symbols, characters, terms, numbers, or the like.
[0014] Additionally, all of these and similar terms are to be associated with the appropriate
physical quantities and are merely convenient labels applied to these quantities.
Terms such as "processing" or "creating" or "transferring" or "executing" or "determining"
or "detecting" or "obtaining" or "selecting" or "calculating" or "generating" or the
like, refer to the action and processes of a computer system that manipulates and
transforms data represented as physical (electronic) quantities within the computer's
registers and memories into other data similarly represented as physical quantities
within the memories or registers or other such information storage.
[0015] A computer-readable (storage) medium, such as referred to herein, typically may be
non-transitory and/or comprise a non-transitory device. In this context, a non-transitory
storage medium may include a device that may be tangible, meaning that the device
has a concrete physical form, although the device may change its physical state. Thus,
for example, non-transitory refers to a device remaining tangible despite a change
in state.
DESCRIPTION OF EMBODIMENTS
[0016] The present invention is independent of loudspeaker placement due to the fact
that the acoustic stream is not divided into channels but rather into sound events present
in a three-dimensional space.
[0017] Fig. 1 presents a diagram of a sound event according to the present invention. The
sound event 101 represents the presence of a sound source in an acoustic space.
Each such event has an associated set of parameters, such as: the time of the event 102 and
the location in space with respect to a reference location point 103. The location may be given
as x, y, z coordinates (alternatively, spherical coordinates r, α, β may be used).
[0018] The sound event 101 further comprises a movement trajectory in space (for example
in the case of a vehicle changing its location) 104. The movement trajectory may be defined
as n, Δt1, x1, y1, z1, γ1, δ1, Δt2, x2, y2, z2, γ2, δ2, ..., Δtn, xn, yn, zn, γn, δn,
which is a definition of the curve along which the sound source moves. n is the number
of points of the curve, the xi, yi, zi are points in space, γi, δi is the momentary
orientation of the sound source (azimuth and elevation), and Δti is an increment in time.
[0019] The sound event 101 further comprises an orientation (γ, δ: the direction in which the highest
sound amplitude is generated; azimuth and elevation are defined relative to the orientation
of the coordinate system) 105.
[0020] Additionally, the sound event 101 comprises a spatial characteristic of the source
of the event (the shape of a curve of the sound amplitude with respect to the emission angle,
where a zero angle means emission in the direction of the highest amplitude) 106. This parameter
may be provided as s, λ1, u1, v1, λ2, u2, v2, λ3, u3, v3, ..., λs, us, vs,
where the characteristic is symmetrical and described with s points, whereas the ui
describe the shape of the sound beam in the horizontal plane and the vi
the respective shape in the vertical plane.
[0021] The sound event 101 further comprises information on the sampling frequency (in case
it is different from the base sampling frequency of the sound stream) 107, the signal
resolution (the number of bits per sample; this parameter is present if a given source
has a different span than the standard span of the sound stream) 108, and a set of acoustic
samples 109 of the given frequency and resolution.
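Taken together, a sound event may be viewed as a small record of the parameters 102-109. Below is a minimal sketch in Python of such a record; the class and field names are illustrative assumptions and not part of the stream format defined herein.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class TrajectoryPoint:
    dt: float         # time increment since the previous point (delta t)
    x: float          # position in space
    y: float
    z: float
    azimuth: float    # momentary orientation of the source (gamma)
    elevation: float  # (delta)

@dataclass
class SoundEvent:
    time: float                                      # time of event 102
    location: Tuple[float, float, float]             # x, y, z w.r.t. reference point 103
    trajectory: List[TrajectoryPoint]                # movement trajectory 104
    azimuth: float                                   # orientation 105 (gamma)
    elevation: float                                 # orientation 105 (delta)
    spatial_char: List[Tuple[float, float, float]]   # (lambda, u, v) points 106
    sampling_freq: Optional[int] = None              # 107; None means base stream frequency
    resolution_bits: Optional[int] = None            # 108; None means standard stream span
    samples: List[int] = field(default_factory=list) # monophonic samples 109
```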
[0022] A plurality of sound events will typically be encoded into an output audio data stream.
[0023] The samples are always monophonic and are present as long as a given sound source
emits a sound. In the case of speech, this means that a sound source appears and disappears
in the sound stream. This is the reason for naming such an event a sound event. In the case
of a recording of an orchestra, there will occur appear/disappear events of the respective
instruments. As can easily be seen, such an approach to the sound data stream results in
a variable bitrate, wherein the changes may be substantial. When there are no sound
events the bitrate will be close to zero, while in the case of multiple sound events the
bitrate may be higher (even higher than in prior art surround systems).
[0024] The loudspeakers may be located in an arbitrary way; however, preferably they should
not all be placed in a single place, for example on a single wall. According to the present
invention, the plurality of loudspeakers may be considered a cloud of loudspeakers.
The more loudspeakers there are, the better the spatial effect that may be achieved. Preferably,
the loudspeakers are scattered around the presentation location, preferably on different walls
of a room.
[0025] The loudspeakers may be either wired or wireless and are communicatively coupled to
a sound decoder according to the present invention. The decoder may use the loudspeakers
of other electronic devices as long as communication may be established with the controllers
of such speakers (e.g. Bluetooth or Wi-Fi communication with the loudspeakers of a TV set
or a mobile device).
[0026] The sound decoder according to the present invention may obtain information on the location
and characteristic of a given loudspeaker by sending a test sound stream to its controller,
subsequently recording the played-back test sound stream and analyzing
the relevant acoustic response.
[0027] For the purpose of obtaining information on the location and characteristic of a given
loudspeaker, an array of omnidirectional microphones may be used, for example
spaced 10 cm from each other and positioned on the vertices of a cube or a tetrahedron.
By measuring the delays of a signal reaching the respective microphones, one may estimate
the sound location. The characteristics of a given loudspeaker may be obtained by analyzing
the recorded sound at different frequencies.
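For illustration, the direction towards a sound source can be estimated from such inter-microphone delays with a least-squares fit, here under a far-field (plane wave) assumption; the function and variable names below are illustrative only.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s at roughly room temperature

def estimate_direction(mic_positions, delays):
    """Estimate the unit direction towards a far-field sound source.

    mic_positions: (N, 3) microphone coordinates in metres, e.g. the
                   vertices of a 10 cm cube or tetrahedron.
    delays:        (N,) arrival times in seconds relative to microphone 0.

    For a plane wave with unit direction d (from the array to the source),
    delay_i - delay_0 = -(r_i - r_0) . d / c, which is linear in d.
    """
    r = np.asarray(mic_positions, dtype=float)
    tau = np.asarray(delays, dtype=float)
    A = -(r[1:] - r[0]) / SPEED_OF_SOUND    # (N-1, 3) system matrix
    b = tau[1:] - tau[0]                    # (N-1,) relative delays
    d, *_ = np.linalg.lstsq(A, b, rcond=None)
    return d / np.linalg.norm(d)            # unit vector towards the source
```

With four microphones on the vertices of a tetrahedron there are three independent baselines, the minimum needed to resolve a direction in three dimensions; the distance may then be refined, for example from the signal level.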
[0028] Other methods for obtaining information on location and characteristic of a given
loudspeaker include solutions presented in
US20140112484 or in "Analysis of Loudspeaker Placement and Loudspeaker-Room Interaction, and Correction
of Associated Effects" by Michael Hlatky of University of Applied Sciences Offenburg,
Media and Information Technology, Bang & Olufsen a/s, Department of Acoustics, August
2007.
[0029] According to the present invention, sound reflections are used in order to generate
sounds from directions in which no loudspeaker is present. To this end, the
sound decoder executes a sound location analysis aimed at using reflective surfaces
(such as walls) to generate reflected sounds. All sound-reflecting surfaces are divided
into triangles, and each of the triangles is treated by the decoder as a virtual sound
source. Each triangle has an associated function defining the dependence of the sound virtually
emitted by this triangle on the sounds emitted by the physical loudspeakers. This function
defines the amplitude as well as the spatial characteristics of the emission, which may be
different for each physical loudspeaker. In order for the system to operate properly,
it is necessary to place, at the sound presentation location, microphones used by the
sound decoder for constant measurement of the compliance of the emitted sounds with the
expected sounds and for fine-tuning the system.
[0030] Such a function is a sum of the reflected signals emitted by all loudspeakers in a room,
wherein the signal reflected from a given triangle depends on the triangle's location,
the loudspeaker(s') location(s), the loudspeaker(s') emission characteristics and the acoustic
pressure emitted by the loudspeaker(s). The signal virtually emitted by the triangle will be
a sum of the reflections generated by all loudspeakers. The spatial acoustic emission characteristic
of such a triangle will depend on the physical loudspeakers, whereas each physical loudspeaker
will influence it partially. Such a characteristic may be discrete, comprising narrow
beams generated by different loudspeakers. Therefore, in order to control the sound
reflected at a given location, an appropriate loudspeaker
or a linear combination of loudspeakers has to be selected (appropriate meaning in line with the
acoustic target, e.g. generating, from a given plane, a reflection in the direction of the listener
such that other reflections do not ruin the effect).
[0031] The most important module of the system is a local sound renderer. The
renderer receives separate sound events and composes from them the acoustic output
streams that are subsequently sent to the loudspeakers.
[0032] Due to the fact that the sound events comprise information on the location of the sound
sources with respect to a reference location (for example the listener), the renderer shall
select the speaker or speakers closest to the location in space from which
the sound was emitted. In case a speaker is not present at that location, speakers
adjacent to this location shall be used, preferably speakers located at opposite sides
of the location, so that they may be configured to create the impression for
the listener that the sound is emitted from its original location in space.
[0033] More than two loudspeakers may be used for one sound event, in particular when a virtual
sound source is to be positioned between them.
[0034] In case there are no physical loudspeakers in the vicinity of the location (direction)
of the sound of a sound event, reflections from adjacent planes (such as walls) may
be used to position the sound. Knowing the sound reflection function for a given reflective
section, the optimal physical loudspeakers need to be chosen for generating the reflection
effect.
[0035] The reference point location may be selected differently for a given sound rendering
location or room. For example, one may listen to music in an armchair and watch
television sitting on a sofa. Therefore, there are two different reference locations
depending on the circumstances. Consequently, the coordinate system changes. The reference
location may be obtained automatically by different sensors, such as an infrared camera,
or manually input by the listener. Such a solution is possible only because of local
sound rendering.
[0036] An exemplary normalized characteristic of a physical loudspeaker is shown in Fig.
1B. The characteristic is usually symmetrical and described with s points, whereas
u describes the shape of the sound beam in the horizontal plane and
v the respective shape in the vertical plane. Such a characteristic may be determined using
an array of microphones, as previously described.
[0037] In the case of a reflection, the characteristic can be asymmetrical and discontinuous.
[0038] Fig. 2 presents a diagram of the method according to the present invention. The method
starts, after receiving a sound data stream according to Fig. 1, at step 201 with
accessing a database of the loudspeakers present at the sound presentation location. Subsequently,
at step 202, it is calculated which of the available loudspeakers may be used
so as to achieve the effect closest to a perfect arrangement.
This may be effected by location thresholding based on the records of the loudspeaker
database.
[0039] Such a calculation needs to be executed for each sound event, because sound events may
run in parallel and the same loudspeaker(s) may be needed to emit them. The data for each
loudspeaker have to be combined by applying a superposition approach (summing all sound events at
a given moment of time that affect a selected loudspeaker).
[0040] In case a loudspeaker is close to the location at which a sound source is located,
this loudspeaker will be used. In case the sound source is located between physical
loudspeakers, the closest loudspeakers will be used in order to simulate a virtual
loudspeaker located where the sound source is located. A superposition principle
may be applied for this purpose. During this process, it is necessary to take into account
the emission characteristics of the loudspeakers.
[0041] The physical loudspeakers selected for simulating a virtual loudspeaker will emit
sound in the direction of the listener at predefined angles of azimuth and elevation.
For these angles, the attenuation level is to be read from the emission characteristic
of the loudspeaker (the characteristic is normalized and therefore it will be a number
in the range 0 ... 1) and multiplied by the emission strength of the loudspeaker (acoustic
pressure). Only after that may the superposition be executed. The signals are to be added
by assigning weights to the loudspeakers, the weights arising from the location of the virtual
loudspeaker with respect to those used for its generation (based on a proportionality
rule).
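A minimal numerical sketch of this step is given below; the one-dimensional interpolation over a single combined emission angle and the helper names are simplifying assumptions made for illustration only.

```python
import numpy as np

def speaker_contribution(char_angles, char_gains, pressure,
                         azimuth, elevation, samples):
    """Scale a sound event's samples for one physical loudspeaker.

    char_angles, char_gains: the normalized emission characteristic (0..1)
    pressure:                emission strength (acoustic pressure)
    azimuth, elevation:      angles at which the speaker emits towards the listener
    """
    angle = np.hypot(azimuth, elevation)               # combined angle (assumption)
    attenuation = np.interp(angle, char_angles, char_gains)
    return attenuation * pressure * np.asarray(samples, float)

def superpose(contributions, weights):
    """Weighted superposition of per-speaker signals for a virtual loudspeaker.

    The weights follow a proportionality rule: speakers closer to the virtual
    loudspeaker location receive larger weights; they are normalized to sum to 1.
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return sum(wi * ci for wi, ci in zip(w, contributions))
```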
[0042] The calculations shall include not only the direction from which a sound event is
emitted but also its distance from the listener (i.e. a delay of the signal chosen in such
a way as to simulate the correct distance from the listener to the sound event).
The properly selected loudspeakers surround the sound event location. More than two
loudspeakers may be selected to emit the data of a particular sound event.
[0043] At step 203, an angular difference between the sound source location and the positions
of the candidate loudspeakers is calculated in spherical coordinates. The sound event
location is:
rssi - the distance of the i-th sound event location from the listener;
γi - the azimuth of the i-th sound event location;
δi - the elevation angle of the i-th sound event;
and the loudspeaker location is:
rsj - the distance of the j-th loudspeaker location from the listener;
γj - the azimuth of the j-th loudspeaker location;
δj - the elevation angle of the j-th loudspeaker.
[0044] Thus the angular difference may be computed as the angle between the two directions,
for example in the great-circle form Δij = arccos(sin δi · sin δj + cos δi · cos δj · cos(γi - γj)).
A set of loudspeakers that have the lowest angular distance from the sound event location
is selected at step 204. The loudspeakers are to be located at opposite sides (when
facing the reference location of the user) with respect to the sound event location,
so that the listener has the impression that the sound arrives from the sound event
location.
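Steps 203 and 204 may be sketched as follows; the great-circle form of the angular difference and the helper names are assumptions.

```python
import math

def angular_difference(az_i, el_i, az_j, el_j):
    """Angle (radians) between the sound event direction (az_i, el_i)
    and a loudspeaker direction (az_j, el_j), in great-circle form."""
    cos_d = (math.sin(el_i) * math.sin(el_j)
             + math.cos(el_i) * math.cos(el_j) * math.cos(az_i - az_j))
    return math.acos(max(-1.0, min(1.0, cos_d)))

def select_speakers(event_dir, speakers, count=2):
    """Step 204: pick the loudspeakers closest in angle to the sound event.

    speakers: list of (speaker_id, azimuth, elevation) tuples. A fuller
    implementation would additionally require the selected speakers to lie
    on opposite sides of the event direction as seen from the listener.
    """
    az_i, el_i = event_dir
    ranked = sorted(speakers,
                    key=lambda s: angular_difference(az_i, el_i, s[1], s[2]))
    return ranked[:count]
```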
[0045] Subsequently, at step 205, in case of an insufficient number of physical loudspeakers,
one or more virtual loudspeaker(s) may be created. Reflection of sound is utilized
for this purpose. The reflections are generated by the physical loudspeakers so that they
imitate a physical loudspeaker at a given position within the sound presentation location.
The generated sound will reflect from a selected surface and be directed towards the
listener.
[0046] Knowing the location of the virtual loudspeaker, a straight line is to be virtually
drawn from the listener through this location and further to a reflective plane (such as
a wall). The point of intersection of this line with the reflective plane
indicates the triangle on the reflective plane which is to be used in order to
generate the reflected sound. From the emission characteristic of that triangle,
it is read which physical loudspeakers are to be used. Subsequently, the function
defining the dependency of the triangle's emission on the particular
loudspeakers is used in order to generate the data streams 206 that are to be sent to the
physical loudspeakers in order to achieve a reflected sound from that particular triangle.
These data streams are to be added to the other data emitted by the respective loudspeakers
207.
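The intersection point described here follows from standard ray-plane geometry. A minimal sketch with illustrative names is given below; the subsequent lookup of the triangle containing the point (for example via barycentric coordinates) is omitted.

```python
import numpy as np

def reflection_point(listener, virtual_speaker, plane_point, plane_normal):
    """Intersect the listener -> virtual loudspeaker ray with a reflective plane.

    Returns the point on the plane used to look up the triangle that acts as
    the virtual sound source, or None if the ray is parallel to the plane or
    the plane lies behind the listener.
    """
    p0 = np.asarray(listener, dtype=float)
    d = np.asarray(virtual_speaker, dtype=float) - p0   # ray direction
    n = np.asarray(plane_normal, dtype=float)
    denom = d.dot(n)
    if abs(denom) < 1e-9:
        return None                      # ray parallel to the plane
    t = (np.asarray(plane_point, dtype=float) - p0).dot(n) / denom
    if t <= 0:
        return None                      # intersection behind the listener
    return p0 + t * d
```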
[0047] Fig. 3 presents a diagram of the system according to the present invention. The system
may be realized using dedicated components or custom-made FPGA or ASIC circuits. The
system comprises a data bus 301 communicatively coupled to a memory 304. Additionally,
other components of the system are communicatively coupled to the system bus 301 so
that they may be managed by a controller 305.
[0048] The memory 304 may store computer program or programs executed by the controller
305 in order to execute steps of the method according to the present invention.
[0049] The system comprises a sound input interface 303, such as an audio/video communication
connector, e.g. HDMI, or a communication connector such as Ethernet. The received sound
data are processed by a sound renderer 302, which manages the presentation of sounds using
the loudspeaker setup at the listener's premises. The management of the presentation of
sounds includes virtual loudspeaker management, which is effected by a virtual loudspeakers
module 307 operating according to the method described above.
[0050] Figs 4A - 5B depict audio data packets that are multiplexed into an output audio data
stream by a suitable encoder. The audio data stream may comprise a header and packets
of acoustic data (for example sound event 101 data packets). The packets are preferably
multiplexed in chronological order, but some shifts of the data encoding/decoding time
versus the presentation time are allowable, since each packet of acoustic data comprises
information regarding its presentation time and must be received sufficiently ahead
of that presentation.
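The wire format of these packets is not specified beyond the field lists. Purely as an illustration, a simplified acoustic event packet carrying a presentation time could be serialized as below; the byte layout, field widths and type tags are assumptions.

```python
import struct

# hypothetical packet type tags (not defined by this specification)
ACOUSTIC_EVENT, TEXTUAL_EVENT, SYNTH_EVENT, SYNTH_LIBRARY = range(4)

def pack_acoustic_event(presentation_time_ms, language_id,
                        sampling_freq, resolution_bits, samples):
    """Serialize a simplified acoustic event packet. Assumed layout:
    tag(1) | time(8) | language(2) | fs(4) | bits(1) | count(4) | samples
    """
    header = struct.pack('<BQHIBI', ACOUSTIC_EVENT, presentation_time_ms,
                         language_id, sampling_freq, resolution_bits,
                         len(samples))
    body = struct.pack('<%dh' % len(samples), *samples)  # 16-bit samples here
    return header + body
```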
[0051] The header may, for example, define a global sampling frequency and sample resolution.
[0052] The audio data stream may comprise acoustic events, as shown in Fig. 4A. All properties
of a sound event 101 are maintained, with the addition of a language field that identifies
the audio language, for example by means of an appropriate identifier. In case more
than one language version is present, the acoustic event packets of the different language
audio tracks will differ by the language identifier 401 and the audio samples data 107, 108, 109.
The remaining packet data fields will be identical between the respective audio language
versions. The audio renderer will output only the packets related to the language selected
by the user.
[0053] Fig. 4B presents a special sound event packet, which is a textual event packet. Instead
of sound samples, this packet comprises a library identifier 402 and a textual data
field 403. Such textual data may be used to generate sound with a speech synthesizer.
The library identifier may select a suitable speech synthesizer voice to be used
by the sound renderer, as well as provide processing parameters for the renderer.
[0054] Optionally, the textual event packet may comprise a field specifying emotions in
the textually defined event, such as a whisper, a scream, a cry or the like. Further, a field
of a person's characteristics may be defined, such as gender, age, accent or the like.
Thus, the generation of sound may be more accurate.
[0055] As another option, the textual event packet may comprise a field defining tempo.
In particular, this field may define the speech synthesis timing, such as the length of
different syllables and/or the pauses between words.
[0056] The aforementioned has the advantage of data reduction, since textual data consume
far less space than compressed audio samples data.
[0057] Similarly, Fig. 5A defines a synthetic non-verbal event packet. Instead of sound
samples and a language field, this packet comprises at least one code in the data field
408 and a library selection field 402 referring to a music synthesizer library. The
codes configure a music synthesizer. Thereby, sounds are generated locally based on the
codes, thus saving transmission bandwidth.
[0058] Such synthesizers are usually based on built-in sound libraries used for synthesis.
By their nature such libraries are limited; therefore, it may be necessary to transmit
such a library to a receiver so that the local library may be changed. This allows
an optimal acoustic effect to be achieved. Such a synthetic library packet is presented
in Fig. 5B. The library comprises an identifier 404, a language identifier 405 and audio
samples data 406. The library may further be extended with additional data depending
on the applied synthesizers. A synthetic non-verbal event packet may reference such a library
by identifying a specific sample and, if applicable, its parameters.
[0059] Optionally, the textual event packets and/or the synthetic non-verbal event packets may
comprise a field defining the volume of the sound to be synthesized.
[0060] In one embodiment, in the case of textual event packets and synthetic non-verbal event
packets, the renderer interprets the data (text or command) with built-in synthesizers
and creates dynamic acoustic event packets that are subject to final sound rendering
just like regular acoustic event packets.
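A sketch of that interpretation step is given below; the packet keys and the synthesizer interfaces are assumed for illustration, not defined by the present description.

```python
def render_packet(packet, speech_synth, music_synth):
    """Turn textual or synthetic non-verbal packets into regular acoustic data.

    speech_synth and music_synth stand for the renderer's built-in
    synthesizers; their interfaces here are assumptions, not a defined API.
    """
    if packet['type'] == 'textual':
        voice = speech_synth.voice(packet['library_id'])
        samples = voice.say(packet['text'],
                            emotion=packet.get('emotion'),
                            tempo=packet.get('tempo'),
                            volume=packet.get('volume'))
    elif packet['type'] == 'synthetic':
        library = music_synth.library(packet['library_id'])
        samples = library.play(packet['codes'], volume=packet.get('volume'))
    else:
        samples = packet['samples']       # already a regular acoustic event
    return {**packet, 'type': 'acoustic', 'samples': samples}
```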
[0061] The presence of textual event packets and synthetic non-verbal event packets allows
for a radical decrease of the bandwidth required for the transmission of audio data. In turn,
the synthetic library packet requires some bandwidth, but allows the synthesis
quality to be increased and still does not require as much data as regular audio samples
recorded in real time.
[0062] The present invention relates to recording, encoding and decoding of sound in order
to provide surround playback independent of the loudspeaker setup at the sound presentation
location. Therefore, the invention provides a useful, concrete and tangible result.
[0063] The aforementioned recording, encoding and decoding of sound takes place in dedicated
systems and processes sound data. Therefore, the machine-or-transformation test is
fulfilled and the idea is not abstract.
[0064] It can easily be recognized, by one skilled in the art, that the aforementioned method
for generating surround sound may be performed and/or controlled by one or more computer
programs. Such computer programs are typically executed by utilizing the computing
resources of a computing device. Applications are stored on a non-transitory medium.
An example of a non-transitory medium is a non-volatile memory, for example flash
memory, while an example of a volatile memory is RAM. The computer instructions are
executed by a processor. These memories are exemplary recording media for storing
computer programs comprising computer-executable instructions performing all the steps
of the computer-implemented method according to the technical concept presented herein.
[0065] While the invention presented herein has been depicted, described and defined
with reference to particular preferred embodiments, such references and examples of
implementation in the foregoing specification do not imply any limitation on the invention.
It will, however, be evident that various modifications and changes may be made thereto
without departing from the broader scope of the technical concept. The presented preferred
embodiments are exemplary only, and are not exhaustive of the scope of the technical
concept presented herein.
[0066] Accordingly, the scope of protection is not limited to the preferred embodiments
described in the specification, but is only limited by the claims that follow.
CLAIMS
1. A signal comprising sound events (101), wherein each sound event (101) comprises:
• time of event information (102);
• information regarding the location in space with respect to a reference location point (103);
• a movement trajectory in space (104);
• orientation information (105);
characterized in that the signal further comprises at least three sound event data, comprising at least
one acoustic sound event data, comprising
• a spatial characteristic of the source of the event (106), comprising spatial characteristics
of the sound output of an associated sound source, defined as a series of points
of the spatial characteristic in the horizontal and vertical planes;
• information on the sampling frequency (107);
• information on the signal resolution (108); and
• a series of acoustic samples (109) of the sampling frequency (107) and with the signal resolution
(108);
at least one textual sound event, the data further comprising
• a library identifier (402) and a textual data field (403), wherein the textual data
are to be used to generate sound by means of a speech synthesizer;
and at least one synthetic non-verbal sound event, the data further comprising
• at least one code data (408) and a library selection field (402) referring to
a music synthesizer library, wherein the at least one code is for configuring a
music synthesizer.
2. The signal according to claim 1, characterized in that it further comprises a synthetic library packet, comprising an identifier
(404), a language identifier (405) and audio samples data (406), referenced by at least
one synthetic non-verbal sound event.
3. The signal according to claim 1, characterized in that the at least one textual sound event data further comprises a field specifying
emotions in the textually defined event.
4. The signal according to claim 1, characterized in that the at least one textual sound event data further comprises a field of the
characteristics of a person.
5. The signal according to claim 1, characterized in that the at least one textual sound event data and/or the at least one synthetic
non-verbal event data further comprise a field defining the volume of the sound to be
synthesized.
6. The signal according to claim 1, characterized in that the at least one textual sound event data further comprise a field defining
the tempo, comprising information on speech synthesis timing, including the length of
syllables and/or the pauses between words.