[0001] The present invention relates to a system and method for generating surround sound. In particular, the present invention relates to a surround sound environment independent of the number of loudspeakers and of the configuration/placement of the respective loudspeakers.
 
[0002] The prior art defines surround sound systems, such as Dolby Digital and DTS, based on multichannel transmission and presentation of sound. A disadvantage of these solutions is the dependence of the obtained effect on loudspeaker placement and room acoustics. Both technologies suggest an optimal loudspeaker placement, which, however, is often infeasible due to the shape and arrangement of the room.
 
[0003] Known sound correction systems are most often based on applying a suitable delay to the signals destined for each loudspeaker. The problem of sound reflections off the walls is corrected in a similar manner.
 
[0004] Reflections may be used to generate virtual surround sound. This is the case in so-called sound projectors (an array of loudspeakers in a single casing, a so-called sound bar).
 
[0005] Problems with surround sound arise from the fact that the data in the acoustic stream assume specific locations of each loudspeaker relative to the listener. Even the names of the channels define particular arrangements, i.e.: center, left front, right front, left rear, right rear. In these prior art surround systems the same sound data stream is sent to the speakers of each listener, regardless of the actual position of the speakers in the presentation room.
 
[0006] It would be advantageous to provide a surround sound solution that is independent of the number of loudspeakers and of the configuration/placement of the respective loudspeakers.
 
[0007] The prior art also discloses the Ambisonics system, which is a full-sphere surround sound technique: in addition to the horizontal plane, it covers sound sources above and below the listener.
 
            [0008] Unlike other multichannel surround formats, its transmission channels do not carry
               speaker signals. Instead, they contain a speaker-independent representation of a sound
               field called B-format, which is then decoded to the listener's speaker setup. This
               extra step allows the producer to think in terms of source directions rather than
               loudspeaker positions, and offers the listener a considerable degree of flexibility
               as to the layout and number of speakers used for playback (source: Wikipedia).
 
[0009] The aim of the development of the present invention is a surround system and method that is independent of the number of loudspeakers and of the configuration/placement of the respective loudspeakers.
 
            SUMMARY AND OBJECTS OF THE PRESENT INVENTION
[0010] An object of the present invention is a signal according to claim 1.
 
[0011] These and other objects of the invention presented herein are accomplished by providing a system and method for generating surround sound. Further details and features of the present invention, its nature and various advantages will become more apparent from the following detailed description of the preferred embodiments shown in a drawing, in which:
               
               
- Fig. 1 presents a diagram of a sound event;

- Fig. 2 presents a diagram of the method according to the present invention;

- Fig. 3 presents a diagram of the system according to the present invention;

- Figs. 4A - 5B depict audio data packets.
               
 
            NOTATION AND NOMENCLATURE
[0012] Some portions of the detailed description which follows are presented in terms of data processing procedures, steps or other symbolic representations of operations on data bits that can be performed in computer memory. A computer executing such logical steps thus performs physical manipulations of physical quantities.
 
            [0013] Usually these quantities take the form of electrical or magnetic signals capable
               of being stored, transferred, combined, compared, and otherwise manipulated in a computer
               system. For reasons of common usage, these signals are referred to as bits, packets,
               messages, values, elements, symbols, characters, terms, numbers, or the like.
 
            [0014] Additionally, all of these and similar terms are to be associated with the appropriate
               physical quantities and are merely convenient labels applied to these quantities.
               Terms such as "processing" or "creating" or "transferring" or "executing" or "determining"
               or "detecting" or "obtaining" or "selecting" or "calculating" or "generating" or the
               like, refer to the action and processes of a computer system that manipulates and
               transforms data represented as physical (electronic) quantities within the computer's
               registers and memories into other data similarly represented as physical quantities
               within the memories or registers or other such information storage.
 
            [0015] A computer-readable (storage) medium, such as referred to herein, typically may be
               non-transitory and/or comprise a non-transitory device. In this context, a non-transitory
               storage medium may include a device that may be tangible, meaning that the device
               has a concrete physical form, although the device may change its physical state. Thus,
               for example, non-transitory refers to a device remaining tangible despite a change
               in state.
 
            DESCRIPTION OF EMBODIMENTS
[0016] The present invention is independent of loudspeaker placement due to the fact that the acoustic stream is not divided into channels but rather into sound events present in a three-dimensional space.
 
[0017] Fig. 1 presents a diagram of a sound event according to the present invention. The sound event 101 represents the fact of the presence of a sound source in an acoustic space. Each such event has an associated set of parameters, such as: a time of event 102 and a location in space with respect to a reference location point 103. The location may be given as x, y, z coordinates (alternatively, spherical coordinates r, α, β may be used).
 
[0018] The sound event 101 further comprises a movement trajectory in space (for example, in the case of a vehicle changing its location) 104. The movement trajectory may be defined as n, Δt1, x1, y1, z1, γ1, δ1, Δt2, x2, y2, z2, γ2, δ2, ..., Δtn, xn, yn, zn, γn, δn, which is a definition of a curve on which the sound source moves. n is the number of points of the curve, xi, yi, zi are points in space, γi, δi is the temporary orientation of the sound source (azimuth and elevation), and Δti is a time increment.
 
[0019] The sound event 101 further comprises an orientation (γ, δ - the direction in which the highest sound amplitude is generated; azimuth and elevation are defined relative to the orientation of the coordinate system) 105.
 
[0020] Additionally, the sound event 101 comprises a spatial characteristic of the source of the event (a shape of a curve of the sound amplitude with respect to the emission angle - a zero angle means emission in the direction of the highest amplitude) 106. This parameter may be provided as s, λ1, u1, v1, λ2, u2, v2, λ3, u3, v3, ..., λs, us, vs, where the characteristic is symmetrical and described with s points, whereas ui describe the shape of the sound beam in the horizontal plane while vi describe the respective shape in the vertical plane.
 
[0021] The sound event 101 further comprises information on the sampling frequency (in case it differs from the base sampling frequency of the sound stream) 107, the signal resolution (the number of bits per sample; this parameter is present if a given source has a different span than the standard span of the sound stream) 108 and a set of acoustic samples 109 of the given frequency and resolution.
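
By way of illustration only, the sound event parameters 102 - 109 described above may be gathered into a single data structure, as in the following Python sketch (the class and field names are illustrative assumptions of this sketch, not part of any claimed format):

    from dataclasses import dataclass, field
    from typing import List, Tuple

    @dataclass
    class TrajectoryPoint:          # one point of the trajectory 104
        dt: float                   # time increment (Delta t_i)
        x: float                    # position in space (x_i, y_i, z_i)
        y: float
        z: float
        azimuth: float              # temporary orientation gamma_i
        elevation: float            # temporary orientation delta_i

    @dataclass
    class SoundEvent:
        time: float                                        # time of event (102)
        location: Tuple[float, float, float]               # x, y, z w.r.t. reference point (103)
        trajectory: List[TrajectoryPoint]                  # movement trajectory (104)
        orientation: Tuple[float, float]                   # azimuth gamma, elevation delta (105)
        characteristic: List[Tuple[float, float, float]]   # (lambda_i, u_i, v_i) points (106)
        sampling_frequency: int = 48000                    # (107), overrides the stream default
        resolution_bits: int = 16                          # (108), bits per sample
        samples: List[int] = field(default_factory=list)   # monophonic acoustic samples (109)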
 
            [0022] A plurality of sound events will typically be encoded into an output audio data stream.
 
[0023] The samples are always monophonic and are present as long as a given sound source emits a sound. In the case of speech this means that a sound source appears and disappears in the sound stream. This is the reason for naming such an event a sound event. In the case of a recording of an orchestra there will occur appear/disappear events of the respective instruments. As can easily be seen, such an approach to the sound data stream results in a variable bitrate, wherein the changes may be substantial. When there are no sound events the bitrate will be close to zero, while in the case of multiple sound events the bitrate may be higher (even higher than in the case of prior art surround systems).
 
[0024] The loudspeakers may be located in an arbitrary way; however, preferably they should not all be placed in a single place, for example on a single wall. According to the present invention the plurality of loudspeakers may be considered a cloud of loudspeakers. The more loudspeakers there are, the better the spatial effect that may be achieved. Preferably the loudspeakers are scattered around the presentation location, preferably on different walls of a room.
 
[0025] The loudspeakers may be either wired or wireless and are communicatively coupled to a sound decoder according to the present invention. The decoder may use the loudspeakers of other electronic devices as long as communication may be established with the controllers of such speakers (e.g., Bluetooth or Wi-Fi communication with the loudspeakers of a TV set or a mobile device).
 
[0026] The sound decoder according to the present invention may obtain information on the location and characteristics of a given loudspeaker by sending a test sound stream to its controller, subsequently recording the played-back test sound stream and analyzing the relevant acoustic response.
 
[0027] For the purpose of obtaining information on the location and characteristics of a given loudspeaker, an array of omnidirectional microphones may be used, for example spaced from each other by 10 cm and positioned on the vertices of a cube or a tetrahedron. By measuring the delays of a signal reaching the respective microphones, one may estimate the sound location. The characteristics of a given loudspeaker may be obtained by analyzing the recorded sound at different frequencies.
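
Assuming such a tetrahedral array, the delay-based location estimate may be sketched as a least-squares multilateration problem; the following Python sketch (using numpy and scipy, with an assumed speed of sound of 343 m/s and illustrative microphone coordinates) is one possible realization, not the only one:

    import numpy as np
    from scipy.optimize import least_squares

    SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value

    # Four omnidirectional microphones on the vertices of a tetrahedron,
    # spaced about 10 cm from each other.
    MICS = np.array([[0.00, 0.0000, 0.0000],
                     [0.10, 0.0000, 0.0000],
                     [0.05, 0.0866, 0.0000],
                     [0.05, 0.0289, 0.0816]])

    def locate_source(tdoas, mics=MICS, ref=0):
        """Estimate a source position from time differences of arrival,
        where tdoas[i] is the delay at microphone i relative to mic `ref`."""
        def residuals(pos):
            dists = np.linalg.norm(mics - pos, axis=1)
            return (dists - dists[ref]) / SPEED_OF_SOUND - tdoas
        return least_squares(residuals, x0=np.ones(3)).x

    # Example: a loudspeaker at (2, 1, 0.5) m from the array
    true_pos = np.array([2.0, 1.0, 0.5])
    d = np.linalg.norm(MICS - true_pos, axis=1)
    print(locate_source((d - d[0]) / SPEED_OF_SOUND))  # approx. [2. 1. 0.5]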
 
[0028] Other methods for obtaining information on the location and characteristics of a given loudspeaker include the solutions presented in US20140112484 or in "Analysis of Loudspeaker Placement and Loudspeaker-Room Interaction, and Correction of Associated Effects" by Michael Hlatky of the University of Applied Sciences Offenburg, Media and Information Technology, Bang & Olufsen a/s, Department of Acoustics, August 2007.
 
[0029] According to the present invention, sound reflections are used in order to generate sounds from directions where no loudspeaker is present. To this end the sound decoder executes a sound location analysis aimed at using reflective surfaces (such as walls) to generate reflected sounds. All sound-reflecting surfaces are divided into triangles and each of the triangles is treated by the decoder as a virtual sound source. Each triangle has an associated function defining the dependence of the sound virtually emitted by this triangle on the sounds emitted by the physical loudspeakers. This function defines the amplitude as well as the spatial characteristics of emission, which may be different for each physical loudspeaker. In order for the system to operate properly it is necessary to place, at the sound presentation location, microphones used by the sound decoder for constant measurements of the compliance of the emitted sounds with the expected sounds and for fine-tuning the system.
 
[0030] Such a function is a sum of the reflected signals emitted by all loudspeakers in a room, wherein a signal reflected from a given triangle depends on the triangle location, the loudspeaker(s) location(s), the loudspeaker(s) emission characteristics and the acoustic pressure emitted by the loudspeaker(s). The signal virtually emitted by the triangle will be a sum of the reflections generated by all loudspeakers. The spatial acoustic emission characteristics of such a triangle will depend on the physical loudspeakers, whereas each physical loudspeaker will influence it partially. Such characteristics may be discrete, comprising narrow beams generated by different loudspeakers. Therefore, in order to emit a reflected sound at a given location, an appropriate loudspeaker or a linear combination of loudspeakers has to be selected (appropriate meaning in line with the acoustic target, e.g. generating, from a given plane, a reflection in the direction of the listener such that other reflections do not ruin the effect).
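
A minimal sketch of such a function follows, under the assumption that each triangle's virtual signal is a gain-weighted sum of the physical loudspeaker signals; the gain model below (combining distance attenuation, the loudspeaker's characteristic towards the triangle and an assumed wall reflectivity) is illustrative only:

    import numpy as np

    def reflection_gain(speaker_pos, triangle_centroid, characteristic_value,
                        reflectivity=0.7):
        """Illustrative per-loudspeaker gain for one wall triangle: the value
        of the loudspeaker's emission characteristic towards the triangle,
        scaled by an assumed wall reflectivity and inverse distance."""
        distance = np.linalg.norm(np.asarray(triangle_centroid, dtype=float)
                                  - np.asarray(speaker_pos, dtype=float))
        return reflectivity * characteristic_value / max(distance, 1e-6)

    def triangle_virtual_signal(speaker_signals, gains):
        """Signal virtually emitted by one triangle: the sum of the reflected
        signals of all loudspeakers, each weighted by its reflection gain.
        speaker_signals: array of shape (n_speakers, n_samples)."""
        gains = np.asarray(gains, dtype=float)[:, np.newaxis]
        return np.sum(gains * np.asarray(speaker_signals), axis=0)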
 
[0031] The most important module of the system is a local sound renderer. The renderer receives separate sound events and composes from them acoustic output streams that are subsequently sent to the loudspeakers.
 
[0032] Due to the fact that the sound events comprise information on the location of the sound sources with respect to a reference location (for example the listener), the renderer shall select a speaker or speakers which is/are closest to the location in space the sound was emitted from. In case a speaker is not present at that location, speakers adjacent to this location shall be used, preferably speakers located at opposite sides of the location, so that they may be configured in order to create an impression for the listener that the sound is emitted from its original location in space.
 
[0033] More than two loudspeakers may be used for one sound event, in particular when a virtual sound source is to be positioned between them.
 
[0034] In case there are no physical loudspeakers in the vicinity of the location (direction) of the sound of a sound event, reflections from adjacent planes (such as walls) may be used to position the sound. Knowing a sound reflection function for a given reflective section, optimal physical loudspeakers need to be chosen for generating the reflection effect.
 
[0035] The reference point location may be selected differently for a given sound rendering location or room. For example, one may listen to music in an armchair and watch television sitting on a sofa. Therefore, there are two different reference locations depending on the circumstances. Consequently, the coordinate system changes. The reference location may be obtained automatically by different sensors, such as an infrared camera, or input manually by the listener. Such a solution is possible only because of local sound rendering.
 
[0036] An exemplary normalized characteristic of a physical loudspeaker is shown in Fig. 1B. The characteristic is usually symmetrical and described with s points, whereas u describes the shape of the sound beam in the horizontal plane while v describes the respective shape in the vertical plane. Such characteristics may be determined using an array of microphones as previously described.
 
[0037] In the case of a reflection, the characteristic can be asymmetrical and discontinuous.
 
[0038] Fig. 2 presents a diagram of the method according to the present invention. The method starts, after receiving a sound data stream according to Fig. 1, at step 201 by accessing a database of the loudspeakers present at the sound presentation location. Subsequently, at step 202, it is calculated which of the available loudspeakers may be used so as to achieve the effect closest to a perfect arrangement. This may be effected by location thresholding based on the records of the loudspeakers database.
 
[0039] Such a calculation needs to be executed for each sound event, because sound events may run in parallel and the same loudspeaker(s) may be needed to emit them. The data for each loudspeaker have to be combined by applying a superposition approach (summing all sound events at a given moment of time that affect a selected loudspeaker), as in the sketch below.
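
In sketch form, the superposition for one loudspeaker is a plain sample-wise summation (assuming the per-event signals have already been aligned to the same time base and length):

    import numpy as np

    def mix_for_speaker(event_signals):
        """Superpose all sound events that address the same loudspeaker at a
        given moment of time (clipping/limiting handling omitted)."""
        return np.sum(np.asarray(event_signals), axis=0)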
 
[0040] In case a loudspeaker is close to the location in which a sound source is located, this loudspeaker will be used. In case the sound source is located between physical loudspeakers, the closest loudspeakers will be used in order to simulate a virtual loudspeaker located where the sound source is located. A superposition principle may be applied for this purpose. It is necessary to take into account, during this process, the emission characteristics of the loudspeakers.
 
[0041] The physical loudspeakers selected for simulating a virtual loudspeaker will emit sound in the direction of the listener at predefined angles of azimuth and elevation. For these angles an attenuation level is to be read from the emission characteristic of the loudspeaker (the characteristic is normalized and therefore it will be a number in the range 0 ... 1) and multiplied by the emission strength of the loudspeaker (acoustic pressure). Only after that may superposition be executed. The signals are to be added by assigning weights to the loudspeakers, the weights arising from the location of the virtual loudspeaker with respect to those used for its generation (based on a proportionality rule).
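
A hedged Python sketch of this step follows; the inverse-distance form of the proportionality rule and the dictionary field names are assumptions of the sketch:

    import numpy as np

    def virtual_speaker_mix(signal, speakers, virtual_pos):
        """Distribute one sound event signal over the physical loudspeakers
        simulating a virtual loudspeaker at virtual_pos.

        speakers: list of dicts with 'pos' (3-vector), 'pressure' (emission
        strength) and 'attenuation' (the 0..1 value read from the normalized
        emission characteristic at the azimuth/elevation towards the
        listener)."""
        positions = np.array([s['pos'] for s in speakers], dtype=float)
        # Proportionality rule (assumed inverse-distance form): loudspeakers
        # closer to the virtual loudspeaker receive larger weights.
        inv_dist = 1.0 / np.maximum(
            np.linalg.norm(positions - np.asarray(virtual_pos), axis=1), 1e-6)
        weights = inv_dist / inv_dist.sum()
        # Attenuation read from the characteristic, multiplied by the
        # emission strength, then the weighted superposition.
        return [w * s['attenuation'] * s['pressure'] * signal
                for s, w in zip(speakers, weights)]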
 
[0042] The calculations shall include not only the direction from which a sound event is emitted but also its distance from the listener (i.e. a delay of the signal such that the correct distance from the listener to the sound event is simulated). The properly selected loudspeakers surround the sound event location. There may be more than two selected loudspeakers that will emit a particular sound event's data.
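
The distance-dependent delay may be computed directly from the speed of sound, as in this minimal illustration (343 m/s is an assumed room-temperature value):

    SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value

    def distance_delay(event_distance_m, speaker_distance_m):
        """Extra delay so that a sound event appears at its own distance
        rather than at the distance of the loudspeaker rendering it."""
        return max(event_distance_m - speaker_distance_m, 0.0) / SPEED_OF_SOUND

    print(distance_delay(5.0, 2.0))  # event 5 m away on a 2 m speaker: ~8.7 ms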
 
[0043] At step 203 an angular difference between the sound source location and the positions of the candidate loudspeakers is calculated in spherical coordinates. The sound event location is (rssi, γi, δi), where:

- rssi - the distance of the i-th sound event location from the listener;
- γi - the azimuth of the i-th sound event location;
- δi - the elevation angle of the i-th sound event;

and the loudspeaker location is (rsj, γj, δj), where:

- rsj - the distance of the j-th loudspeaker location from the listener;
- γj - the azimuth of the j-th loudspeaker location;
- δj - the elevation angle of the j-th loudspeaker.
[0044] Thus the angular difference between the i-th sound event and the j-th loudspeaker is the angle between the two directions:

Δij = arccos( sin δi · sin δj + cos δi · cos δj · cos(γi - γj) )
A set of loudspeakers having the lowest angular distance from the sound event location is selected at step 204. The loudspeakers are to be located at opposite sides (when facing the reference location of a user) with respect to the sound event location, so that the listener has an impression that the sound arrives from the sound event location.
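
Steps 203 and 204 may be sketched as follows, assuming the angular difference is implemented as the angle given above (the opposite-sides constraint of step 204 would be checked on top of this ranking):

    import numpy as np

    def angular_difference(az1, el1, az2, el2):
        """Angle between two directions given as (azimuth, elevation) in
        radians, per the formula of paragraph [0044]."""
        cos_d = (np.sin(el1) * np.sin(el2)
                 + np.cos(el1) * np.cos(el2) * np.cos(az1 - az2))
        return np.arccos(np.clip(cos_d, -1.0, 1.0))

    def select_speakers(event_dir, speaker_dirs, k=2):
        """Step 204: rank loudspeakers by angular difference from the sound
        event direction and keep the k closest candidates."""
        diffs = [angular_difference(event_dir[0], event_dir[1], az, el)
                 for az, el in speaker_dirs]
        return np.argsort(diffs)[:k]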
 
[0045] Subsequently, at step 205, in case of an insufficient number of physical loudspeakers, one or more virtual loudspeaker(s) may be created. Reflection of sound is utilized for this purpose. The reflections are generated by physical loudspeakers so that they imitate a physical loudspeaker at a given position within the sound presentation location. The generated sound will reflect from a selected surface and be directed towards the listener.
 
[0046] Knowing the location of the virtual loudspeaker, a straight line is to be virtually drawn from the listener to this location and further to a reflective plane (such as a wall). The point indicated by the intersection of this line with the reflective plane will indicate a triangle on the reflective plane, which is to be used in order to generate a reflected sound. From the emission characteristics of that triangle it needs to be read which physical loudspeakers are to be used. Subsequently, the function defining the dependency of the triangle's emission on particular loudspeakers needs to be used in order to generate the data streams 206 that are to be sent to the physical loudspeakers in order to achieve a reflected sound from that particular triangle. These data streams are to be added to the other data emitted by the respective loudspeakers 207.
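
The geometric part of this step (drawing the line and finding the triangle hit by it) may be sketched as a ray-plane intersection followed by a barycentric point-in-triangle test; the sketch below assumes the wall triangles are given by their vertices as 3-vectors:

    import numpy as np

    def ray_plane_intersection(listener, virtual_speaker, plane_point,
                               plane_normal):
        """Intersect the ray from the listener through the virtual
        loudspeaker with a reflective plane (such as a wall)."""
        listener = np.asarray(listener, dtype=float)
        direction = np.asarray(virtual_speaker, dtype=float) - listener
        denom = np.dot(plane_normal, direction)
        if abs(denom) < 1e-9:
            return None  # ray parallel to the wall
        t = np.dot(plane_normal, np.asarray(plane_point) - listener) / denom
        if t <= 0:
            return None  # wall is behind the listener
        return listener + t * direction

    def point_in_triangle(p, a, b, c):
        """Barycentric test: does the intersection point fall inside this
        wall triangle?"""
        v0, v1, v2 = b - a, c - a, p - a
        d00, d01, d11 = np.dot(v0, v0), np.dot(v0, v1), np.dot(v1, v1)
        d20, d21 = np.dot(v2, v0), np.dot(v2, v1)
        denom = d00 * d11 - d01 * d01
        v = (d11 * d20 - d01 * d21) / denom
        w = (d00 * d21 - d01 * d20) / denom
        return v >= 0 and w >= 0 and (v + w) <= 1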
 
[0047] Fig. 3 presents a diagram of the system according to the present invention. The system may be realized using dedicated components or custom-made FPGA or ASIC circuits. The system comprises a data bus 301 communicatively coupled to a memory 304. Additionally, other components of the system are communicatively coupled to the system bus 301 so that they may be managed by a controller 305.
 
[0048] The memory 304 may store a computer program or programs executed by the controller 305 in order to execute the steps of the method according to the present invention.
 
[0049] The system comprises a sound input interface 303, such as an audio/video connector, e.g. HDMI, or a communication connector such as Ethernet. The received sound data is processed by a sound renderer 302 managing the presentation of sounds using the loudspeaker setup at the listener's premises. The management of the presentation of sounds includes virtual loudspeakers management, which is effected by a virtual loudspeakers module 307 operating according to the method described above.
 
[0050] Figs. 4A - 5B depict audio data packets that are multiplexed into an output audio data stream by a suitable encoder. The audio data stream may comprise a header and packets of acoustic data (for example a sound event 101 data packet). The packets are preferably multiplexed in chronological order, but some shifts of data encoding/decoding time versus presentation time are allowable, since each packet of acoustic data comprises information regarding its presentation time and must be received sufficiently ahead of that presentation.
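
The exact binary layout is not prescribed above; the following Python sketch therefore only illustrates the stated principle (a stream header followed by chronologically multiplexed packets carrying their own presentation times), and all field widths are assumptions:

    import struct

    HEADER_FMT = "<IH"  # global sampling frequency (Hz), sample resolution (bits)
    PACKET_FMT = "<dI"  # presentation time (s), payload length (bytes)

    def encode_stream(sampling_hz, resolution_bits, events):
        """events: iterable of (presentation_time, payload_bytes) tuples,
        one per acoustic data packet (e.g. a sound event 101 packet)."""
        out = struct.pack(HEADER_FMT, sampling_hz, resolution_bits)
        # Multiplex the packets in chronological presentation order.
        for t, payload in sorted(events, key=lambda e: e[0]):
            out += struct.pack(PACKET_FMT, t, len(payload)) + payload
        return out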
 
[0051] The header may, for example, define a global sampling frequency and sample resolution.
 
[0052] The audio data stream may comprise acoustic events as shown in Fig. 4A. All properties of a sound event 101 are maintained, with the addition of a language field that identifies the audio language, for example with the use of an appropriate identifier. In case more than one language version is present, the acoustic event packets of the different language audios will differ by the language identifier 401 and the audio samples data 107, 108, 109. The remaining packet data fields will be identical between the respective audio language versions. An audio renderer will output only packets related to a language selected by a user.
 
[0053] Fig. 4B presents a special sound event packet which is a textual event packet. Instead of sound samples this packet comprises a library identifier 402 and a textual data field 403. Such textual data may be used to generate sound by means of a speech synthesizer. The library identifier may select a suitable voice of the speech synthesizer to be used by the sound renderer as well as provide processing parameters for the renderer.
 
[0054] Optionally, the textual event packet may comprise a field specifying emotions in the textually defined event, such as whisper, scream, cry or the like. Further, a field with a person's characteristics may be defined, such as gender, age, accent or the like. Thus, the generation of sound may be more accurate.
 
            [0055] As another option, the textual event packet may comprise a field defining tempo.
               In particular this field may define speech synthesis timing, such as length of different
               syllables and/or pauses between words.
 
[0056] The aforementioned has the advantage of data reduction, since textual data consumes far less space than compressed audio samples data.
 
[0057] Similarly, Fig. 5A defines a synthetic non-verbal event packet. Instead of sound samples and a language field, this packet comprises at least one code in the data field 408 and a library selection field 402 referring to a music synthesizer library. The codes configure a music synthesizer. Thereby sounds are generated locally based on codes, thus saving transmission bandwidth.
 
[0058] Such synthesizers are usually based on built-in sound libraries used for synthesis. By their nature such libraries are limited; therefore it may be necessary to transmit such a library to a receiver so that the local library may be changed. This allows for achieving an optimal acoustic effect. Such a synthetic library packet is presented in Fig. 5B. The library comprises an identifier 404, a language identifier 405 and audio samples data 406. The library may further be extended with additional data depending on the applied synthesizers. A synthetic non-verbal event packet may reference such a library by identifying a specific sample and its parameters, if applicable.
 
[0059] Optionally, the textual event packets and/or the synthetic non-verbal event packets may comprise a field defining the volume of the sound to be synthesized.
 
[0060] In one embodiment, in the case of textual event packets and synthetic non-verbal event packets, the renderer interprets the data (text or command) with built-in synthesizers and creates dynamic acoustic event packets that are subject to final sound rendering just as regular acoustic event packets.
 
[0061] The presence of textual event packets and synthetic non-verbal event packets allows for a radical decrease of the bandwidth required for the transmission of audio data. In turn, the synthetic library packet requires some bandwidth, but it allows for increasing the synthesis quality and still does not require as much data as regular audio samples recorded in real time.
 
[0062] The present invention relates to recording, encoding and decoding of sound in order to provide for surround playback independent of the loudspeaker setup at the sound presentation location. Therefore, the invention provides a useful, concrete and tangible result.
 
[0063] The aforementioned recording, encoding and decoding of sound takes place in special systems and processes sound data. Therefore the machine-or-transformation test is fulfilled and the idea is not abstract.
 
[0064] It can easily be recognized, by one skilled in the art, that the aforementioned method for generating surround sound may be performed and/or controlled by one or more computer programs. Such computer programs are typically executed by utilizing computing resources in a computing device. Applications are stored on a non-transitory medium. An example of a non-transitory medium is a non-volatile memory, for example a flash memory, while an example of a volatile memory is RAM. The computer instructions are executed by a processor. These memories are exemplary recording media for storing computer programs comprising computer-executable instructions performing all the steps of the computer-implemented method according to the technical concept presented herein.
 
[0065] While the invention presented herein has been depicted, described, and defined with reference to particular preferred embodiments, such references and examples of implementation in the foregoing specification do not imply any limitation on the invention. It will, however, be evident that various modifications and changes may be made thereto without departing from the broader scope of the technical concept. The presented preferred embodiments are exemplary only, and are not exhaustive of the scope of the technical concept presented herein.
 
            [0066] Accordingly, the scope of protection is not limited to the preferred embodiments
               described in the specification, but is only limited by the claims that follow.
 
          
         
            
CLAIMS

1. A signal comprising sound events (101), wherein each sound event (101) comprises:

• time of event information (102);

• information regarding the location in space with respect to a reference location point (103);

• a movement trajectory in space (104);

• orientation information (105);

characterized in that the signal further comprises at least three sound event data, comprising at least one acoustic sound event data, comprising

• a spatial characteristic of the source of the event (106), comprising spatial characteristics of the sound emission of an associated sound source, defined as a set of points of the spatial characteristic in the horizontal and vertical planes;

• sampling frequency information (107);

• signal resolution information (108); and

• a set of acoustic samples (109) of the sampling frequency (107) and with the signal resolution (108);

at least one textual sound event, the data further comprising

• a library identifier (402) and a textual data field (403), wherein the textual data are to be used to generate sound by means of a speech synthesizer;

and at least one synthetic non-verbal sound event, the data further comprising

• at least one code data (408) and a library selection field (402) referring to a music synthesizer library, wherein the at least one code is for configuring a music synthesizer.

2. The signal according to claim 1, characterized in that it further comprises a synthetic library packet, comprising an identifier (404), a language identifier (405) and audio samples data (406), referenced by at least one synthetic non-verbal sound event.

3. The signal according to claim 1, characterized in that the at least one textual sound event data further comprise a field specifying emotions in the textually defined event.

4. The signal according to claim 1, characterized in that the at least one textual sound event data further comprise a field of a person's characteristics.

5. The signal according to claim 1, characterized in that the at least one textual sound event data and/or the at least one synthetic non-verbal event data further comprise a field defining the volume of the sound to be synthesized.

6. The signal according to claim 1, characterized in that the at least one textual sound event data further comprise a field defining tempo, comprising speech synthesis timing information, including the length of syllables and/or pauses between words.
 
          
         
            