Audio signal, method and apparatus for encoding or transmitting the same and method and apparatus for processing the same

(11)	EP 2 094 032 A1

(12)	EUROPEAN PATENT APPLICATION

(43)	Date of publication:
	26.08.2009 Bulletin 2009/35

(21)	Application number: 08101732.9

(22)	Date of filing: 19.02.2008

(51)	International Patent Classification (IPC):
	H04S 3/02 (2006.01)

(84)	Designated Contracting States:
	AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MT NL NO PL PT RO SE SI SK TR
	Designated Extension States:
	AL BA MK RS

(71)	Applicant: Deutsche Thomson OHG
	30625 Hannover (DE)

(72)	Inventors:
	Batke, Johann-Markus, 30161 Hannover (DE)
	Eilts-Grimm, Klaus, 21339 Lüneburg (DE)
	Schmidt, Jürgen, 31515 Wunstorf (DE)

(74)	Representative: Rittner, Karsten
	Deutsche Thomson OHG, European Patent Operations, Karl-Wiechert-Allee 74, 30625 Hannover (DE)

(54)	Audio signal, method and apparatus for encoding or transmitting the same and method and apparatus for processing the same

(57) Commonly used audio file formats relate directly to the audio channels, i.e. the audio information is stored as it is fed to the loudspeaker. However, for obtaining a general description of audio information, a paradigm shift is necessary. Instead of storing the loudspeaker channels, a new audio format may use the description of the spatial sound field. A method for encoding or transmitting an audio signal comprises the steps of providing one or more audio source signals (X_src), determining for each of said audio source signals a specific position to which it relates, generating data sets containing the determined positions (r_src) of the audio sources, and encoding or transmitting (62) the data sets together with said audio source signals. Advantageously, this allows conversion into Higher Order Ambisonics (HOA) format and subsequent re-conversion into conventional audio data that can be adapted to a given loudspeaker configuration.

Description

Field of the invention

[0001] This invention relates to a generic audio signal format, a method and an apparatus for encoding or transmitting and a method and an apparatus for processing the same.

Background

[0002] Commonly used audio file formats relate directly to the audio channels, i.e. the audio information is stored as it is fed to the loudspeaker. The playback of such files is strictly bound to correct positioning of the loudspeakers. Audio formats like 2.0 (stereo) and 5.1 (surround sound) are able to reproduce a spatial impression of the audio content. However, this spatial impression is strictly two-dimensional (2D). Extensions of these audio formats with a higher number of audio channels, like 7.1 or 9.1, also stick to a 2D sound field representation. The newly developed 22.2 format¹ is capable of representing audio content with height information.
¹ K. Hamasaki, T. Nishiguchi, R. Okumura, Y. Nakayama: "Wide listening area with exceptional spatial sound quality of a 22.2 multichannel sound system" (Audio Engineering Society Preprints, Vienna, Austria, May 2007)

[0003] For obtaining a general description of audio information, a paradigm shift is necessary. Instead of storing the loudspeaker channels, a new audio format may use the description of the spatial sound field. A known solution for spatial sound field description is based on Higher Order Ambisonics (HOA), a technology that describes spatial sound fields using the coefficients of the Fourier-Bessel series (also known under different names²). The possible spatial resolution using this description is determined by the order N of the series. This representation is very flexible and can hold any type of audio information, e.g. traditional stereo signals or surround sound. Loudspeaker channels are treated as a source at a distinct position (e.g. the loudspeaker's position). It is generally known in the art how to convert conventional audio signals into HOA representations and vice versa, whereby the positioning information of the audio signals is required. However, HOA files containing traditional audio formats may result in a higher number of audio channels than the original file. Therefore traditional sound representations are usually used instead of HOA.
² e.g. Jerome Daniel: Spatial Sound Encoding Including Near Field Effect: Introducing Distance Coding Filters and a Viable, New Ambisonic Format. AES 23rd International Conference, Copenhagen, Denmark, 2003

Summary of the Invention

[0004] It would be desirable to reproduce sound as close as possible to the original sound source, using a given loudspeaker configuration, and optimizing the possibilities of the given loudspeaker configuration. Further, it would be desirable to have a sound representation that can be adapted to different actual loudspeaker configurations, so that sound can be reproduced optimally in any case. According to one aspect of the invention, an Ambisonics representation and particularly a Higher Order Ambisonics representation would enable such optimized playback of sound. It would also be desirable to minimize the effort for encoding, decoding and transcoding to and from Ambisonics representations, which also would minimize a loss of quality.

[0005] According to one aspect of the invention, a conventional audio signal is enhanced by additional data or metadata, wherein the additional data comprise sound source position information that enables conversion of the conventional audio signal into a Higher Order Ambisonics (HOA) representation of the sound field. Advantageously, this allows subsequent re-conversion into conventional audio data that can be adapted to a given loudspeaker configuration (e.g. during said re-conversion).

[0006] According to another aspect of the invention, a method for encoding or transmitting an audio signal comprises steps of providing one or more audio source signals, determining respective position information and encoding or transmitting the audio source signals together with metadata that comprise the determined audio source position information.

[0007] According to yet another aspect of the invention, the method for encoding or transmitting an audio signal further comprises steps of determining the number of source channels that are required for Ambisonics encoding, said number being (2N+1) for the 2D case and (N+1)² for the 3D case (N being the order of Ambisonics encoding); determining the number of available transmission or storage channels; comparing the number of source channels required for Ambisonics encoding of the order N with the number of available transmission or storage channels; depending on said comparison, generating a mode decision information having either a first value, if the number of source channels required for Ambisonics encoding of the order N is not less than the number of available transmission or storage channels, or a different second value otherwise; and generating, and storing or transmitting, the Ambisonics encoded version of the audio signal if the mode decision information has said first value, or otherwise storing or transmitting the received or retrieved audio signal. In one embodiment where the Ambisonics encoded version of the audio signal is transmitted, the mode decision information will also be transmitted. The order N may be determined as a function of a target number of reproduction channels, or from the number of available transmission or storage channels.

[0008] According to a further aspect of the invention, a method for processing an audio signal comprises steps of receiving or retrieving from storage an encoded audio signal, extracting first audio source signals and additional information from the received or retrieved signal, wherein the first audio source signals relate to first audio source positions provided by the additional information, transforming the first audio source signals relating to first audio source positions into second audio source signals relating to different second audio source positions, and supplying said second audio source signals for storage or playback.

[0009] According to one aspect of the invention, in the method for processing an audio signal, the step of transforming the first audio source signals into second audio source signals comprises generating an Ambisonics representation of the sound field from a conventional audio source signal and said additional information describing the positions of sound sources, wherein the Ambisonics signal can be of higher order (HOA).

[0010] Corresponding apparatuses that utilize the methods are disclosed in the following detailed description.

[0011] Advantageous embodiments of the invention are disclosed in the dependent claims, the following description and the figures.

Brief description of the drawings

[0012] Exemplary embodiments of the invention are described with reference to the accompanying drawings, which show in

Fig.1 a general audio production chain;

Fig.2 conventional loudspeaker setup for playback in stereo and surround sound;

Fig.3 the principle of Ambisonics encoding;

Fig.4 an encoder according to one embodiment of the invention;

Fig.5 a decoder according to one embodiment of the invention;

Fig.6a an audio processing system according to one embodiment;

Fig.6b an audio processing system according to another embodiment; and

Fig.7 an audio transmission system according to one embodiment.

Detailed description of the invention

[0013] In a general audio production chain, as shown in Fig.1, acquisition of audio signals is achieved by one or more microphones M. The audio signals are encoded E, stored S and later decoded D for reproduction via one or more loudspeakers LS. Conventionally, each of the audio signals in the decoded signal relates to a particular loudspeaker. E.g. in a stereo setup, also denominated as 2.0 (since it has two direction-related audio channels and no audio channel that is not direction-related), the audio signals relate to the left and right microphones and loudspeakers.

[0014] Fig.2 a) shows a conventional loudspeaker setup for playback in stereo, and Fig.2 b) a conventional loudspeaker setup for surround sound, also known as 5.0 format. It is a convention that an angle of 60° must be between the two stereo loudspeaker boxes in order to reproduce the audio signal in the best possible manner. Similarly, in the 5.0 format the optimal angles between loudspeakers are subject to convention. Thus, the respective audio signals relate to specific relative positions that cannot be changed for a given reproduction system. That is, loudspeaker boxes need to be positioned according to these fixed relative positions in order to optimize the sound reproduction.

[0015] Using Ambisonics, and particularly HOA, as general representation of audio content has the following advantages. First, the audio content is independent from the loudspeaker setup. Thus, it has to be processed to match a given setup, wherein it will be optimized to match this setup. Second, 3D representation and high spatial resolution of audio content is fully supported.

[0016] A general Ambisonics based system is shown in Fig.3. A microphone array MA acquires the signals in a spatial manner. Position information P describing the microphone positions is added in an encoder E, which generates an Ambisonics representation 30 of given order N of the signals. This signal can be transcoded TR into a conventional audio signal having a desired number of channels that relate to desired positions of the loudspeakers. The higher the order N of the Ambisonics representation, the better the localization of the different loudspeaker channels. However, the order N and the spatial arrangement (2-dimensional or 3-dimensional) also have an impact on the number of channels that the Ambisonics signal 30 requires, as described below. The conventional audio signal can be reproduced LSA on a loudspeaker array that may but need not correspond to the microphone array. However, the positions of the loudspeakers must be known for transcoding the signal.

[0017] There are various aspects of the invention. In one aspect, a receiver performs a conversion from a given audio source arrangement to a required audio target arrangement, such as an individual loudspeaker arrangement. An encoder or transmitter provides a conventional audio signal with one or more microphone/loudspeaker related channels (such as 5.0) and attached position information that defines the positions of the microphone/loudspeaker of each channel. At the receiver side this signal can be converted to an Ambisonics representation, and in particular to a HOA (Higher Order Ambisonics) representation. This signal can be stored or transcoded/re-mapped for a desired channel and position configuration (e.g. according to an actual loudspeaker configuration or a particular configuration desired for other reasons). It is possible to select at the receiver side whether the Ambisonics representation or the conventional representation with additional position information shall be stored or further processed.

[0018] Advantages of this aspect are that the transmitted/received signal is backward compatible, since it can be decoded by conventional receivers that ignore the additional metadata, and that it uses practically the same bandwidth as a conventional audio signal, since the additional metadata is very small compared to the audio information (although it may be transmitted frequently, e.g. at fixed time intervals such as once every second, or every k audio frames).
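
As a rough illustration of why the position metadata is negligible, the following sketch compares bit rates. All numeric values (5 channels of 48 kHz / 16-bit PCM, three 32-bit coordinates per source, one metadata update per second) are assumptions chosen only to show the order of magnitude; none of them is taken from the description.

```python
# Illustrative bit-rate comparison. Every numeric value below is an
# assumption made for this sketch, not a figure from the description.
CHANNELS = 5              # e.g. a 5.0 signal
SAMPLE_RATE_HZ = 48_000   # assumed PCM sample rate
BITS_PER_SAMPLE = 16      # assumed PCM word length
COORDS_PER_SOURCE = 3     # (r, theta, phi) per source position
BITS_PER_COORD = 32       # assumed 32-bit coordinate values
UPDATES_PER_SECOND = 1    # positions resent once every second

# Bit rate of the audio channels alone.
audio_bps = CHANNELS * SAMPLE_RATE_HZ * BITS_PER_SAMPLE

# Bit rate of the position metadata for all sources.
metadata_bps = CHANNELS * COORDS_PER_SOURCE * BITS_PER_COORD * UPDATES_PER_SECOND

# Relative overhead of the metadata with respect to the audio.
overhead = metadata_bps / audio_bps

print(f"audio: {audio_bps} bit/s, metadata: {metadata_bps} bit/s, "
      f"overhead: {overhead:.4%}")
```

Under these assumptions the metadata stays far below one percent of the audio bit rate, even when it is repeated every second.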

[0019] In another aspect of the invention, the conversion from a given audio source arrangement into a HOA representation can be performed before transmitting, so that either the conventional audio signal plus position information or the generic HOA representation of the audio signal is transmitted. The latter is preferred if the required number of transmission channels is equal for both formats. The transmission signal comprises a mode indication showing whether the audio format is HOA or conventional, because two different formats are possible. The receiver extracts and evaluates the indication and performs the further processing according to the mode indication.

[0020] An audio processing system according to one embodiment of the invention is shown in Fig.6 a). A signal as described above, having one or more audio source signals X_src and position information r_src giving the positions of the audio source signals is received, and multiplexed 62 into a common signal 60. This signal can be stored (not shown) or transmitted, e.g. between different devices in a network. The signal is demultiplexed 63, wherein the audio source signals X'_src and position information r'_src are regained, and these are input into an Ambisonics encoder 64 that generates an Ambisonics signal 61. The order of this signal may be determined according to the number of available positions r'_src, but can also be influenced by available storage area and/or processing bandwidth. In particular, it is advantageous that the Ambisonics encoder 64 can be a HOA encoder, i.e. N>1. The Ambisonics signal is fed into a transcoder TR and there re-mapped to a given loudspeaker configuration, and output to conventional multi-channel audio processing and loudspeakers LSA.

[0021] One advantage of this processing system is that the step of re-mapping can easily be adapted to the actual loudspeaker configuration, so that after a change in this configuration the optimization can also be changed according to the new loudspeaker number and/or positions. E.g. when a new loudspeaker is added to the reproduction system, its position information is provided to the transcoder and the Ambisonics signal can be re-mapped to match the new configuration.

[0022] The position information can be provided by user input (e.g. using a GUI), or by automatic loudspeaker position measuring systems. E.g. relative loudspeaker positions can be determined by reproducing a reference signal at a known position and measuring the different signal run-times. For the 3D case, reproduction of three reference signals at three distinct known positions can be used for automatic loudspeaker location.

[0023] An audio processing system according to another embodiment of the invention is shown in Fig.6 b). The signal 60 being composed of one or more audio source signals X_src and position information r_src giving the positions of the audio source signals is received and demultiplexed 62 into its components. It is possible to select S₆ whether these components or the HOA encoded signal representation 61 shall be used for the further processing, such as optional storing 66, transcoding and multi-channel audio processing. As described above, it may be advantageous to store the generic HOA representation, depending on the application. The selection signal 65 may depend on the parameters mentioned below, such as required order N, number of source positions or number of target (loudspeaker) positions.

[0024] The following section gives a brief overview on encoding and decoding HOA signals.

[0025] In the following, the spatial positions refer to a spherical coordinate system. The distance of sources (audio signals on the encoding side, loudspeakers on the playback side) is not taken into account for the sake of clarity. However, it is easily integrated into this encoding scheme using a known distance coding scheme, e.g. that of Jerome Daniel (reference cited above).

[0026] HOA encoding of audio signals is done using

A = Ψ w

where Ψ is the mode matrix, w holds the speaker signals and A are the resulting HOA coefficients. The HOA coefficients in A are arranged in this order:

A = [A_0^0, A_1^{-1}, A_1^0, A_1^1, ..., A_N^N]^T

Vector A holds

O = 2N + 1 (2D case) or O = (N + 1)² (3D case)

elements. The speaker signals w are arranged as

w = [w_1(t), w_2(t), ..., w_L(t)]^T

where L is the number of loudspeakers. As an example, a stereo signal is simply described as w = [w_1(t), w_2(t)]^T with left and right channel respectively. The mode matrix Ψ finally contains

Ψ = [Ψ_1, Ψ_2, ..., Ψ_L]

where Ψ_i with i = 1...L are the mode vectors for the individual speaker positions containing

Ψ_i = [Y_0^0(θ_i, φ_i), Y_1^{-1}(θ_i, φ_i), Y_1^0(θ_i, φ_i), Y_1^1(θ_i, φ_i), ..., Y_N^N(θ_i, φ_i)]^T

[0027] The directional position of the individual speakers is given by θ_i, φ_i in spherical coordinates, and Y_n^m(θ_i, φ_i) is the spherical harmonic function. The position of the speaker is referred to as r_i = (r_i, θ_i, φ_i). As an example, the stereo setup of two loudspeakers is described by r_left = (r, 90°, -30°) and r_right = (r, 90°, 30°), where r denotes the speaker distance in meters, 90° is the declination angle and ±30° is the azimuth angle.
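
The position triples can be held in a small data structure. The sketch below is illustrative only: the class name `SourcePosition`, the helper `is_planar` and the distance of 2 m are hypothetical, and the declination is assumed to be measured from the vertical axis, so that 90° corresponds to the horizontal plane.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SourcePosition:
    """Hypothetical container for one position r_i = (r, theta, phi)."""
    r: float          # distance in meters
    theta_deg: float  # declination, assumed to be measured from the vertical axis
    phi_deg: float    # azimuth

    def to_cartesian(self):
        """Convert to (x, y, z) under the assumed angle convention."""
        theta = math.radians(self.theta_deg)
        phi = math.radians(self.phi_deg)
        return (self.r * math.sin(theta) * math.cos(phi),
                self.r * math.sin(theta) * math.sin(phi),
                self.r * math.cos(theta))


def is_planar(positions, tol_deg=1e-6):
    """True if every source has a declination of 90 degrees (2D arrangement)."""
    return all(abs(p.theta_deg - 90.0) <= tol_deg for p in positions)


r = 2.0  # assumed speaker distance in meters
stereo = [SourcePosition(r, 90.0, -30.0), SourcePosition(r, 90.0, 30.0)]

print(is_planar(stereo))
print([tuple(round(c, 3) for c in p.to_cartesian()) for p in stereo])
```

Since both stereo positions have a declination of 90°, the arrangement is recognised as 2D, which matters later when the number of HOA channels is determined.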

[0028] The decoding of the HOA coefficients A is done using

w = D A

where D is the decoding matrix. It is chosen as the pseudo-inverse of the mode matrix,

D = (Ψ^† Ψ)^{-1} Ψ^†

where † denotes the conjugate transpose. The property of the pseudo-inverse

D Ψ = I

with I denoting the identity matrix ensures the proper reconstruction of w.
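
The encoding and decoding steps can be tried out numerically. The following is a minimal sketch under stated assumptions, not the implementation of the invention: it covers only the 2D case, it swaps the spherical harmonics for real circular harmonics [1, cos φ, sin φ, ...] (so the conjugate transpose reduces to a plain transpose), and it builds the decoding matrix as a left pseudo-inverse with a hand-written Gauss-Jordan inversion. All function names are hypothetical.

```python
import math


def mode_vector_2d(order, azimuth_deg):
    """Assumed 2D mode vector: real circular harmonics up to the given order."""
    phi = math.radians(azimuth_deg)
    vec = [1.0]
    for n in range(1, order + 1):
        vec += [math.cos(n * phi), math.sin(n * phi)]
    return vec


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def matvec(m, v):
    return [sum(x * y for x, y in zip(row, v)) for row in m]


def inverse(m):
    """Gauss-Jordan inversion with partial pivoting for a small square matrix."""
    n = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * c for v, c in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


order = 1
azimuths = [-30.0, 30.0]  # stereo setup from the text
psi = transpose([mode_vector_2d(order, az) for az in azimuths])  # O x L = 3 x 2

# Left pseudo-inverse D = (Psi^T Psi)^-1 Psi^T, so that D Psi = I.
psi_t = transpose(psi)
D = matmul(inverse(matmul(psi_t, psi)), psi_t)

w = [0.8, -0.2]        # one sample of the left and right channel
A = matvec(psi, w)     # encoding: A = Psi w
w_rec = matvec(D, A)   # decoding: w = D A

print(len(A), [round(v, 6) for v in w_rec])
```

The stereo sample is carried by three HOA coefficients and recovered from them, which mirrors the reconstruction property stated above.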

[0029] The HOA coefficient transmission is usually done by transmission of the individual vector elements of A resulting from the Fourier-Bessel representation. This results in a possibly higher number of channels than formerly required and very high numbers of channels for high orders. Existing ideas for HOA usage inside audio formats are therefore limited to an order of N = 1.

[0030] The invention results in a new audio format that is backward compatible to existing audio content. It is capable of holding audio content with full 3D information and any high spatial resolution, and therefore it is forward compatible with any audio content.

[0031] Using a pure HOA signal representation to carry standard audio formats like 2.0 and 5.1 and some others has the disadvantage of a higher number of channels required for sound field representation. Therefore, in one embodiment of the invention, a parametric approach is suggested driven by the following: Stereo (2.0) is 2D and two audio channels are required, transport using HOA however requires O = 3 channels. Surround sound (5.1) is 2D and 6 audio channels are required, transport using HOA however requires O = 7 channels.
New formats are capable of representing 3D information. As an example, 22.2 format requires 24 audio channels. For 3D HOA representation a number of O = 25 is necessary. The invention aims to provide an efficient solution to this problem: for 2.0 and 5.1 audio content it is generally less expensive in terms of channel/storage capacity to transmit/store the original audio information and additionally the locations of the sources. The result is a parameterised HOA signal representation at lowest cost. The full HOA representation is calculated on the receiver side, if necessary. Thus, a smooth mode selection is proposed that allows selection of the best possible format. The HOA representation also provides the advantage of scalability. A format like 22.2 requires 24 audio channels, as stated above. The order N defines the spatial resolution. To reduce the number of channels, the spatial resolution could be diminished. E.g. a HOA representation of order N = 3 would require only O = 16 channels. Generally, the HOA representation is scalable in terms of the spatial resolution.
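
The channel counts quoted above, and the scalability argument, can be checked with a short sketch. It assumes that the HOA coefficients are sorted by ascending degree, so that reducing the order amounts to keeping a leading part of the coefficient vector; the function names are hypothetical.

```python
def hoa_channels(order, dims):
    """Number of HOA channels: 2N+1 in the 2D case, (N+1)^2 in the 3D case."""
    if dims == 2:
        return 2 * order + 1
    if dims == 3:
        return (order + 1) ** 2
    raise ValueError("dims must be 2 or 3")


def truncate_order(coeffs, new_order):
    """Reduce a 3D HOA coefficient vector to a lower order.

    Assumes the coefficients are sorted by ascending degree, so the
    lower-order representation is a prefix of the higher-order one.
    """
    keep = hoa_channels(new_order, 3)
    if keep > len(coeffs):
        raise ValueError("new order exceeds the order of the input")
    return coeffs[:keep]


# Channel counts for stereo (2D, N=1), 5.1 (2D, N=3) and 22.2 (3D, N=4).
print(hoa_channels(1, 2), hoa_channels(3, 2), hoa_channels(4, 3))

# Placeholder coefficient channels of an order-4 signal, reduced to order 3.
full = list(range(hoa_channels(4, 3)))
reduced = truncate_order(full, 3)
print(len(full), len(reduced))
```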

[0032] An audio channel of a traditional audio format like stereo is viewed upon here as an audio source with a distinct position. An exemplary audio file format for HOA coefficients carrying several audio sources can be generated as follows:

1. Determine whether the arrangement of signal positions is planar (2D) or spatial (3D). Also the listener's position can be taken into account. The signal comprises O_src channels, which is 2 in the stereo case. Depending on this information the necessary order N is calculated using

N = ⌈(O_src - 1)/2⌉ for the 2D case, N = ⌈√O_src - 1⌉ for the 3D case

Using this order N in turn yields

O_s = 2N + 1 for the 2D case, O_s = (N + 1)² for the 3D case

[0033] O_s source channels are required for HOA coefficient representation (see eq.3).

2. The minimum number of channels for transport or storage is achieved as follows:

[0034] If O_s < O_c, then store a file containing

(a) the positions of the sources (for an example see above)
(b) whether the arrangement is 2D or 3D (implicitly given by the source positions)
(c) the order N (implicitly given by the number of sources)
(d) the unprocessed audio signals
otherwise do HOA encoding of the audio signals using order N as described above and store the HOA coefficients.
The result is an audio file with the minimum number of required channels. Depending on the playback device, the audio content is HOA encoded using the additionally stored parameters and then transcoded to the loudspeaker setup, or it is played back unprocessed (e.g. stereo content on a stereo device). In one embodiment, the file is converted into a signal for transmission using the following steps:

3. If O_s < O_c, the signal is multiplexed using a first multiplexer MX1, else the signal is HOA encoded.

4. The result of the former step is multiplexed with a mode indication indicating the result of condition O_s < O_c (i.e. the encoding mode) using a multiplexer MX2.
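
Steps 1 to 4 can be sketched as follows. This is a minimal illustration under assumptions, not the encoder of Fig.4: the order N is taken as the smallest order whose channel count is at least O_src (this rounding up is inferred from the examples 2 → N=1, 6 → N=3 and 24 → N=4 given in the text), the dictionary layout of the stream is hypothetical, and the actual HOA encoder is passed in as a callable that is replaced here by a placeholder.

```python
import math


def hoa_channels(order, dims):
    """Number of HOA channels: 2N+1 in the 2D case, (N+1)^2 in the 3D case."""
    if dims == 2:
        return 2 * order + 1
    if dims == 3:
        return (order + 1) ** 2
    raise ValueError("dims must be 2 or 3")


def required_order(o_src, dims):
    """Smallest order N whose HOA channel count is at least o_src (assumed rounding)."""
    if o_src < 1:
        raise ValueError("at least one source channel is required")
    if dims == 2:
        return o_src // 2              # equals ceil((o_src - 1) / 2)
    if dims == 3:
        return math.isqrt(o_src - 1)   # equals ceil(sqrt(o_src)) - 1
    raise ValueError("dims must be 2 or 3")


def build_stream(signals, positions, dims, o_c, hoa_encode):
    """Select the encoding mode and attach the mode indication."""
    order = required_order(len(signals), dims)
    o_s = hoa_channels(order, dims)
    if o_s < o_c:
        # Step 3, first branch (MX1): unprocessed signals plus positions.
        mode = "sources"
        payload = {"positions": positions, "signals": signals}
    else:
        # Step 3, second branch: HOA encode before storage or transmission.
        mode = "hoa"
        payload = {"order": order, "dims": dims,
                   "coefficients": hoa_encode(signals, positions, order, dims)}
    # Step 4 (MX2): add the mode indication to the stream.
    return {"mode": mode, "payload": payload}


def placeholder_encoder(signals, positions, order, dims):
    """Stand-in for a real HOA encoder: returns silent coefficient channels."""
    return [[0.0] * len(signals[0]) for _ in range(hoa_channels(order, dims))]


stereo = [[0.1, 0.2], [0.3, 0.4]]                    # two channels, two samples
positions = [(1.0, 90.0, -30.0), (1.0, 90.0, 30.0)]  # (r, theta, phi)

stream_a = build_stream(stereo, positions, 2, 4, placeholder_encoder)
stream_b = build_stream(stereo, positions, 2, 3, placeholder_encoder)
print(stream_a["mode"], stream_b["mode"])
```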

[0035] In one embodiment, decoding of this new signal is done as follows:

1. To extract the encoding mode information (i.e. the result of O_s < O_c), DMX2 is used.
2. If O_s < O_c is true (i.e. the signal carries the unprocessed audio signals and their positions), the signal is demultiplexed using DMX1 and is then HOA encoded at the receiver. In the other case it is already HOA encoded and can be transcoded directly.
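
The two decoding steps can be sketched in the same spirit. The stream layout (a dictionary with the keys `mode` and `payload`) and the stand-in encoder are hypothetical and serve only to show the branching on the mode indication.

```python
def decode_stream(stream, hoa_encode):
    """Return HOA coefficients for a received stream.

    The mode indication is read first (DMX2). In "sources" mode the audio
    signals and positions are separated (DMX1) and HOA encoded at the
    receiver; in "hoa" mode the coefficients are passed on unchanged.
    """
    mode = stream["mode"]
    payload = stream["payload"]
    if mode == "sources":
        return hoa_encode(payload["signals"], payload["positions"])
    if mode == "hoa":
        return payload["coefficients"]
    raise ValueError(f"unknown mode indication: {mode!r}")


def placeholder_encoder(signals, positions):
    """Stand-in for a real HOA encoder: only tags its input."""
    return {"encoded_from": len(signals), "positions": list(positions)}


raw_stream = {"mode": "sources",
              "payload": {"signals": [[0.1], [0.2]],
                          "positions": [(1.0, 90.0, -30.0), (1.0, 90.0, 30.0)]}}
hoa_stream = {"mode": "hoa",
              "payload": {"coefficients": [[0.3], [0.0], [0.1]]}}

from_raw = decode_stream(raw_stream, placeholder_encoder)
from_hoa = decode_stream(hoa_stream, placeholder_encoder)
print(from_raw["encoded_from"], len(from_hoa))
```

In both cases the result is an HOA representation that can be handed to the transcoder.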

[0036] Another aspect of usage of the invention is described in the following. The goal is to use a given number of channels available for transport in an optimal way. It is assumed that the number of source channels O_src is higher than the number of available channels.

[0037] A vector r_src holds all positions of source channels, e.g. L sources are described using the positions

r_src = [r_1, r_2, ..., r_L]

with positions r_i = (r_i ,θ_i ,φ_i) of source number i. All positions are assumed to be different from each other (otherwise the situation is trivial, since two sources with same position can be added into one). Using spherical coordinates is not mandatory, though.

[0038] A vector x_src holds all channels belonging to the source, e.g. L time signals

x_src = [x_1(t), x_2(t), ..., x_L(t)]^T

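[0039] The integer number O_chan defines the number of available channels. The order N available for a HOA description of signal vector x_src is calculated using

N = ⌊(O_chan - 1)/2⌋ for the 2D case, N = ⌊√O_chan - 1⌋ for the 3D case

[0040] Using this order N in turn yields

O_c = 2N + 1 for the 2D case, O_c = (N + 1)² for the 3D case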

[0041] This is the number of channels to use to describe a HOA signal with order N as calculated above. Encoding of a signal x_src according to eq.11 with positional description r_src, as described by eq.10, is done as follows. In this case the signal is adapted to the channel properties.

1. Use all positions in r_src to determine if the arrangement is 2D or 3D.
2. Using this result and the given number O_chan of transport channels, the HOA representation of the source with maximum spatial resolution requires O_c signals following eq.13.
If O_s > O_c, this encoding ensures usage of the given channels with the maximum possible spatial resolution of the audio sources.
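
The channel-limited case can be sketched as follows. The sketch assumes that the order is rounded down, so that the resulting HOA signal never needs more than the O_chan available channels; the function names and the example channel counts are hypothetical.

```python
import math


def hoa_channels(order, dims):
    """Number of HOA channels: 2N+1 in the 2D case, (N+1)^2 in the 3D case."""
    if dims == 2:
        return 2 * order + 1
    if dims == 3:
        return (order + 1) ** 2
    raise ValueError("dims must be 2 or 3")


def max_order(o_chan, dims):
    """Largest order N whose HOA channel count fits into o_chan channels."""
    if o_chan < 1:
        raise ValueError("at least one transport channel is required")
    if dims == 2:
        return (o_chan - 1) // 2
    if dims == 3:
        return math.isqrt(o_chan) - 1
    raise ValueError("dims must be 2 or 3")


# Assumed examples: (available channels, arrangement).
results = []
for o_chan, dims in [(6, 2), (8, 3), (16, 3)]:
    n = max_order(o_chan, dims)
    results.append((o_chan, dims, n, hoa_channels(n, dims)))
    print(f"O_chan={o_chan}, {dims}D: N={n}, O_c={hoa_channels(n, dims)}")
```

With 8 channels available for a 3D arrangement, for instance, only order 1 fits, and 4 of the 8 channels are used.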

[0042] Mixing different HOA representations with different orders is possible. This is useful if different audio contents are encoded with different effort regarding spatial resolution. E.g. for a computer game, environment noise needs only low spatial resolution, whereas the audio information of the actor in the game should be encoded with high resolution.
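
Mixing representations of different orders can be illustrated with a short sketch. It assumes 3D coefficient vectors that are sorted by ascending degree and share the same normalisation, so that the lower-order vector can be zero-padded and added element by element. Each coefficient is shown as a single sample value for brevity; the function name and the sample values are hypothetical.

```python
def mix_hoa(a, b):
    """Add two HOA coefficient vectors of possibly different order.

    Assumes both vectors are sorted by ascending degree and use the same
    normalisation; missing higher-order coefficients are treated as zero.
    """
    longer, shorter = (a, b) if len(a) >= len(b) else (b, a)
    return [x + (shorter[i] if i < len(shorter) else 0.0)
            for i, x in enumerate(longer)]


ambience = [0.5, 0.1, 0.0, -0.1]   # order 1 (3D): 4 coefficients
actor = [1.0] + [0.2] * 15         # order 3 (3D): 16 coefficients

mixed = mix_hoa(ambience, actor)
print(len(mixed), mixed[:4])
```

The mixed signal has the order of the higher-resolution input, while the low-resolution content contributes only to the first coefficients.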

[0043] An encoder according to one embodiment of the invention is shown in Fig.4. Audio source signals X_src and position information r_src are provided, as described above, and can be encoded in two different modes: either they are multiplexed MX1 into a common data stream, so that a receiver is enabled to generate a HOA representation (since all the data necessary for a HOA representation are included), or a HOA representation is generated HOA_e1 before transmission. Depending on a mode selection signal MD it is possible to select S₁,S₂,S₃ one of the two modes. This mode decision signal is obtained by the above-described comparison CMP between the required number of channels O_s and the available number of channels O_c. The latter may be fixed or given. The required number of channels O_s is determined in a block eq4 according to equation 4 above, using the result of the block eq3 that performs equation 3 above, and a spatial arrangement information 2D3D indicating whether the spatial arrangement is 2-dimensional or 3-dimensional. The spatial arrangement information 2D3D is also an input to the block eq3 that performs equation 3. Finally, the mode decision information MD is multiplexed MX2 into the output data stream A_enc so that all necessary information for proper decoding is contained.

[0044] A decoder according to one embodiment of the invention is shown in Fig.5. A signal A_enc as encoded by the encoder of Fig.4 is demultiplexed DMX2 so that the mode decision information MD' is obtained. Depending on this information MD' the remaining signal is either demultiplexed into its audio and position components X'_src,r'_src and then HOA encoded HOA_e (if it was not HOA encoded), or it is directly used if it is already HOA encoded. Switching means S₄,S₅ controlled by the mode decision information MD' switch between these modes. The HOA encoded signal is provided to a transcoder, as described above.

[0045] In one embodiment, a device for encoding or transmitting an audio signal, comprises means for providing one or more audio source signals, means for determining for each of said audio source signals a specific position to which it relates, means for generating data sets containing the determined positions of the audio sources, and means for encoding or transmitting the data sets together with said audio source signals.

[0046] In one embodiment, said data sets are suitable for calculating a generic audio field representation based on Ambisonics representation.

[0047] In one embodiment, the device further comprises means for determining the number (O_s) of source channels that are required for Ambisonics encoding, said number being (2N+1) for the 2D case and (N+1)² for the 3D case, where N is the order of Ambisonics encoding; means for determining the number (O_c) of available transmission or storage channels; means for comparing the number (O_s) of source channels required for Ambisonics encoding of the order N with the number (O_c) of available transmission or storage channels; means for generating, depending on said comparison, a mode decision information (MD), having a first value if the number (O_s) of source channels required for Ambisonics encoding of the order N is not less than the number of available transmission or storage channels (O_c), or having a different second value otherwise; and means (HOA_e) for generating, and means for storing or transmitting, the Ambisonics encoded version of the audio signal if the mode decision information has said first value, or otherwise storing or transmitting the received or retrieved audio signal.

[0048] In another embodiment, a device for processing audio signals comprises means for receiving or retrieving from storage encoded audio signals, means for extracting first audio source signals and additional information from the received or retrieved signals, wherein the first audio source signals relate to first audio source positions provided by the additional information, means (TRC) for transforming the first audio source signals relating to first audio source positions into second audio source signals relating to different second audio source positions, and means (LSA) for supplying said second audio source signals for storage or playback.

[0049] The invention can be used for all kinds of audio processing devices. These may be targeting music reproduction, but also voice reproduction, such as multi-channel teleconferencing systems. Advantageously, spatial information can be added to conventional multi-channel audio signals, and scalability in terms of spatial resolution can be provided.

[0050] It will be understood that the present invention has been described purely by way of example, and modifications of detail can be made without departing from the scope of the invention.
Each feature disclosed in the description and (where appropriate) the claims and drawings may be provided independently or in any appropriate combination. Features may, where appropriate, be implemented in hardware, software, or a combination of the two. Connections may, where applicable, be implemented as wireless connections or wired, not necessarily direct or dedicated, connections. Reference numerals appearing in the claims are by way of illustration only and shall have no limiting effect on the scope of the claims.

Claims

1. Audio signal (60) comprising one or more audio source signals (X_src) relating to specific positions and additional data (r_src), characterized in that the additional data define said specific positions to which the audio source signals relate.

2. Audio signal according to claim 1, wherein said additional data (r_src) are suitable for calculating a generic audio field representation based on Ambisonics representation.

3. Method for encoding or transmitting an audio signal, comprising the steps of

- providing one or more audio source signals (X_src);

characterized in the further steps of

- determining for each of said audio source signals a specific position to which it relates;

- generating data sets containing the determined positions (r_src) of the audio sources; and

- encoding or transmitting (62) the data sets together with said audio source signals.

4. Method according to claim 3, wherein said data sets are suitable for calculating a generic audio field representation based on Ambisonics representation.

5. Method according to claim 3 or 4, further comprising the steps of

- determining the number (O_s) of source channels that are required for Ambisonics encoding, said number being (2N+1) for the 2D case and (N+1)² for the 3D case, where N is the order of Ambisonics encoding;

- determining the number (O_c) of available transmission or storage channels; and

- comparing the number (O_s) of source channels required for Ambisonics encoding of the order N with the number (O_c) of available transmission or storage channels;

- depending on said comparison, generating a mode decision information (MD), having a first value if the number (O_s) of source channels required for Ambisonics encoding of the order N is not less than the number of available transmission or storage channels (O_c), or having a different second value otherwise; and

- generating (HOA_e, 64), and storing or transmitting, the Ambisonics encoded version (61) of the audio signal if the mode decision information (MD,65) has said first value, or otherwise storing or transmitting the received or retrieved audio signal (60).

6. Method according to claim 5, wherein the Ambisonics encoded version (61) of the audio signal is transmitted, further comprising the step of transmitting said mode decision information (MD).

7. Method according to claim 5, further comprising the steps of

- determining a number of sources, such as microphones, or target reproduction channels, such as loudspeakers;

- selecting N being the order of HOA encoding as a function of the determined number of sources or reproduction channels.

8. Method according to claim 5, wherein N is determined from the number (O_c) of available transmission or storage channels.

9. Method for processing an audio signal, comprising the steps of

- receiving or retrieving from storage an encoded audio signal (60);

characterized in the further steps of

- extracting (63) first audio source signals (X'_src) and additional information (r'_src) from the received or retrieved signal, wherein the first audio source signals (X'_src) relate to first audio source positions provided by the additional information (r'_src) ;

- transforming (TRC) the first audio source signals relating to first audio source positions into second audio source signals relating to different second audio source positions; and

- supplying (LSA) said second audio source signals for storage or playback.

10. Method according to claim 9, wherein a generic audio field representation is calculated.

11. Method according to claim 10, wherein the generic audio field representation is an Ambisonics representation (HOA) of an order higher than one.

12. Apparatus for encoding or transmitting an audio signal, using a method according to claims 3-8.

13. Apparatus for processing an audio signal, using a method according to any of the claims 9-11.

Cited references

REFERENCES CITED IN THE DESCRIPTION

This list of references cited by the applicant is for the reader's convenience only. It does not form part of the European patent document. Even though great care has been taken in compiling the references, errors or omissions cannot be excluded and the EPO disclaims all liability in this regard.

Non-patent literature cited in the description

K. Hamasaki, T. Nishiguchi, R. Okumura, Y. Nakayama: Wide listening area with exceptional spatial sound quality of a 22.2 multichannel sound system. Audio Engineering Society Preprints, 2007 [0002]
Jerome Daniel: Spatial Sound Encoding Including Near Field Effect: Introducing Distance Coding Filters and a Viable, New Ambisonic Format. 2003 [0003]