FIELD OF THE INVENTION
[0001] The invention relates to the field of audio signal processing. More specifically,
the invention provides a processor and a method for converting a multi-channel audio
signal, such as a sound field signal, into another type of multi-channel audio signal
suited for playback via headphones or loudspeakers, while preserving spatial information
in the original signal.
BACKGROUND OF THE INVENTION
[0002] The use of B-format measurements, recordings and playback in the provision of more
ideal acoustic reproductions which capture part of the spatial characteristics of
an audio reproduction is well known.
[0003] In the case of conversion of B-format signals to multiple loudspeakers in a loudspeaker
array, there is a well recognized problem due to the spreading of individual virtual
sound sources over a large number of playback speaker elements. In the case of binaural
playback of B-format signals, the approximations inherent in the B-format sound field
can lead to less precise localization of sound sources, and a loss of the out-of-head
sensation that is an important part of the binaural playback experience.
[0004] In the prior art, filter banks are used to split each component of the spatial sound
field set into a set of frequency bands. The short-term correlation between the W
(omni) channel and each of the three other channels is then used to estimate the direction
of arrival of sound within each frequency band. The input signal is split into two parts:
one consisting of the frequency bands where a clear direction of arrival was detected,
and one consisting of the remainder of the frequency bands. The first part of the
signal is processed through head-related transfer functions corresponding to the estimated
direction of arrival in each frequency band. The second part is processed through
a linear decoding matrix and a fixed set of head-related transfer functions corresponding
to a virtual loudspeaker array.
[0005] US 6,628,787 by Lake Technology Ltd. describes a specific method for creating a multi-channel
or binaural signal from a B-format sound-field signal. The sound-field signal is split
into frequency bands, and in each band a direction factor is determined. Based on
the direction factor, speaker drive signals are computed for each band by panning
the signals to drive the nearest speaker. In addition, residual signal components
are apportioned to the speaker signals by means of known decoding techniques.
[0006] There are several problems with the known methods. Firstly, the direction estimate
is generally missing or incorrect in the case where more than a single sound source
emits sound at the same time and within the same frequency band. This leads to imprecise
or incorrect localization when there is more than one sound source present and when
echoes interfere with the direct sound from a single source. Secondly, the use of
head-related transfer functions from different directions in different frequency bands
leads to a phase mismatch which increases proportionally with frequency. This in turn
leads to two problems: Firstly, the group delay of high-frequency input signals does
not correspond to the group delay encoded in the head-related transfer functions.
This gives the wrong inter-aural time difference and therefore inaccurate localization.
Secondly, the temporal evolution of the input signal is distorted, leading to poor
reproduction of such transient sounds as applause and percussion instruments.
SUMMARY OF THE INVENTION
[0007] In view of the above, it may be seen as an object of the present invention to provide
a processor and a method for converting a multi-channel audio input, such as a B-format
sound field input, into an audio output suited for playback over headphones or via
loudspeakers, while still preserving the substantial spatial information contained
in the original multi-channel input.
[0008] In a first aspect, the invention provides an audio processor arranged to convert
a multi-channel audio input signal comprising at least two channels, such as a B-format
Sound Field signal, into a set of audio output signals, such as a set of two audio
output signals arranged for headphone reproduction, the audio processor comprising
- a filter bank arranged to separate the input signal into a plurality of frequency
bands, such as partially overlapping frequency bands,
- a sound source separation unit arranged, for at least a part of the plurality of frequency
bands, to
- perform a plane wave expansion computation on the multi-channel audio input signal
so as to determine at least one dominant direction corresponding to a direction of
a dominant sound source in the audio input signal,
- determine an array of at least two, such as four, virtual loudspeaker positions selected
such that one or more of the virtual loudspeaker positions at least substantially
coincides, such as precisely coincides, with the at least one dominant direction,
and
- decode the audio input signal into virtual loudspeaker signals corresponding to each
of the virtual loudspeaker positions, and
- a summation unit arranged to sum the virtual loudspeaker signals for the at least
part of the plurality of frequency bands to arrive at the set of audio output signals.
[0009] Such an audio processor provides an advantageous conversion of the multi-channel input
signal due to the combination of plane wave expansion extraction of directions for
dominant sound sources for each frequency band and the selection of at least one virtual
loudspeaker position coinciding with a direction for at least one dominant sound source.
For example, this provides a virtual loudspeaker signal highly suited for generation
of a binaural output signal by applying Head-Related Transfer Functions to the virtual
loudspeaker signals. The reason is that it is ensured that a dominant sound source
is represented in the virtual loudspeaker signal by its direction, whereas prior art
systems with a fixed set of virtual loudspeaker positions will in general split such
dominant sound source between the nearest fixed virtual loudspeaker positions. When
applying Head-Related Transfer Functions, this means that the dominant sound source
will be reproduced through two sets of Head-Related Transfer Functions corresponding
to the two fixed virtual loudspeaker positions which results in a rather blurred spatial
image of the dominant sound source. According to the invention, the dominant sound
source will be reproduced through one set of Head-Related Transfer Functions corresponding
to its actual direction, thereby resulting in an optimal reproduction of the 3D spatial
information contained in the original input signal.
[0010] Thus, in a preferred embodiment, the audio processor is arranged to generate the
set of audio output signals such that it is arranged for playback over headphones,
e.g. by applying Head-Related Transfer Functions, or other known ways of creating
a spatial effect based on a single input signal and its direction.
[0011] The filter bank may comprise at least 500, such as 1000 to 5000, preferably partially
overlapping filters covering the frequency range of 0 Hz to 22 kHz. For example,
an FFT analysis with a window length of 2048 to 8192 samples, i.e. 1024-4096 bands
covering 0-22050 Hz, may be used. However, it is appreciated that the invention may
also be performed with fewer filters if a reduced performance is accepted.
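For illustration only, the following minimal sketch (Python/NumPy) shows how such an FFT-based filter bank could be realized; the function name, the Hann window and the default window length are illustrative assumptions and not part of the invention:

```python
import numpy as np

def fft_filter_bank(b_format_block, window_length=4096):
    """Split a block of a 4-channel B-format signal (shape 4 x window_length)
    into frequency bands by a windowed real FFT per channel.

    Returns complex values of shape 4 x (window_length // 2 + 1), i.e. one
    value per channel and frequency band (about 2049 bands for a 4096-sample
    window, covering 0-22050 Hz at a 44.1 kHz sampling rate)."""
    window = np.hanning(window_length)
    return np.fft.rfft(b_format_block * window, axis=1)
```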
[0012] The sound source separation unit preferably determines the at least one dominant
direction in each frequency band for each time frame, such as a time frame having
a size of 2,000 to 10,000 samples, e.g. 2048-8192, as mentioned. However, it is to
be understood that a lower update rate of the dominant direction may be used if a
reduced performance is accepted.
[0013] For audio input signals where a panning technique is used to position sound sources,
such as stereo recordings or surround sound recordings, less spatial information is
present, in comparison with a B-format input. To compensate, the filter outputs from
two consecutive time frames should be sent to the sound source separation unit. This
is preferably achieved with a plurality of delay elements.
[0014] The virtual loudspeaker positions may be selected by a rotation of a set of at least
two positions in a fixed spatial interrelation. Especially, the set of positions in
a fixed spatial interrelation comprises four positions, such as four positions arranged
in a tetrahedron, which has been found to provide an accurate localization of the
sound sources without increasing the noise level of the recorded signal.
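As a sketch of one possible realization of such a rotation (assuming a regular tetrahedron and Rodrigues' rotation formula; the construction and the names are illustrative, not taken from the invention):

```python
import numpy as np

# Four virtual loudspeaker positions in a fixed spatial interrelation
# (vertices of a regular tetrahedron, expressed as unit vectors).
TETRAHEDRON = np.array([[ 1.0,  1.0,  1.0],
                        [ 1.0, -1.0, -1.0],
                        [-1.0,  1.0, -1.0],
                        [-1.0, -1.0,  1.0]]) / np.sqrt(3.0)

def rotate_to_dominant_direction(dominant_dir):
    """Rotate the fixed set so that its first position coincides with the
    estimated dominant direction, using Rodrigues' rotation formula."""
    d = dominant_dir / np.linalg.norm(dominant_dir)
    v = TETRAHEDRON[0]
    axis = np.cross(v, d)
    s, c = np.linalg.norm(axis), float(np.dot(v, d))
    if s < 1e-12:                       # already aligned (or exactly opposite)
        return TETRAHEDRON if c > 0 else -TETRAHEDRON
    k = axis / s
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    R = np.eye(3) + s * K + (1.0 - c) * (K @ K)
    return TETRAHEDRON @ R.T            # rotated virtual loudspeaker positions
```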
[0015] The plane wave expansion may determine two dominant directions, in which case the array
of at least two virtual loudspeaker positions is selected such that two of the virtual
loudspeaker positions at least substantially coincide, such as precisely coincide,
with the two dominant directions. Hereby it is ensured that the two most dominant
sound sources in a frequency band are as precisely spatially represented as possible,
thus leading to the best possible spatial reproduction of audio material with two
dominant sound sources spatially distributed, e.g. two singers or two musical instruments
playing at the same time.
[0016] In order to generate a binaural two-channel output signal, the audio processor may
comprise a binaural synthesizer unit arranged to generate first and second audio output
signals by applying Head-Related Transfer Functions to each of the virtual loudspeaker
signals. Especially, such audio processor may be implemented by a decoding matrix
corresponding to the determined virtual loudspeaker positions and a transfer function
matrix corresponding to the Head-Related Transfer Functions being combined into an
output transfer matrix prior to being applied to the audio input signals. Hereby a
smoothing may be performed on transfer functions of such output transfer matrix prior
to being applied to the input signals, which will serve to improve reproduction of
transient sounds.
[0017] In a preferred embodiment, the phase of the Head-Related Transfer Functions is differentiated
with respect to frequency, and after combining components of Head-Related Transfer
Functions corresponding to different directions, the phase of the combined transfer
functions is integrated with respect to frequency. This serves to preserve the group
delay of the Head-Related Transfer Functions. Even more specifically, the phase of
the Head-Related Transfer Functions may be differentiated with respect to frequency
at frequencies above a frequency limit only, such as above 1.8 kHz, and after combining
components of Head-Related Transfer Functions corresponding to different directions,
the phase of the combined transfer functions is integrated with respect to frequency
at frequencies above the frequency limit. Hereby, only the phase at higher frequencies
is manipulated, and thus at lower frequencies where the interaural phase difference
is significant, the phase is left unchanged. In a more specific embodiment, the phase
of the Head-Related Transfer Functions may be left unaltered below a first frequency
limit, such as below 1.6 kHz, and differentiated with respect to frequency at frequencies
above a second frequency limit with a higher frequency than the first frequency limit,
such as 2.0 kHz, and with a gradual transition in between, and after combining components
of HRTFs corresponding to different directions, the inverse operation is applied to
the combined function.
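A minimal sketch of this phase manipulation is given below (Python/NumPy). For brevity it uses a hard frequency limit of 1.8 kHz instead of the gradual transition described above, and all names are illustrative assumptions:

```python
import numpy as np

def differentiate_phase(hrtf, freqs, f_limit=1800.0):
    """Replace the phase of the HRTF bins above f_limit by the band-to-band
    phase difference (a discrete derivative with respect to frequency),
    leaving amplitude and low-frequency phase unchanged."""
    amp, phase = np.abs(hrtf), np.unwrap(np.angle(hrtf))
    hi = freqs >= f_limit
    out = phase.copy()
    out[hi] = np.diff(phase[hi], prepend=0.0)
    return amp * np.exp(1j * out)

def integrate_phase(hrtf, freqs, f_limit=1800.0):
    """Inverse operation, applied after HRTF components corresponding to
    different directions have been combined: re-integrate the
    high-frequency phase by a cumulative sum over the bands."""
    amp, phase = np.abs(hrtf), np.angle(hrtf)
    hi = freqs >= f_limit
    out = phase.copy()
    out[hi] = np.cumsum(phase[hi])
    return amp * np.exp(1j * out)
```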
[0018] The audio input signal is preferably a multi-channel audio signal arranged for decomposition
into plane wave components. Especially, the input signal may be one of: a B-format
sound field signal, a higher-order ambisonics recording, a stereo recording, and a
surround sound recording.
[0019] In a second aspect, the invention provides a device comprising an audio processor
according to the first aspect. Especially, the device may be one of: a device for
recording sound or video signals, a device for playback of sound or video signals,
a portable device, a computer device, a video game device, a hi-fi device, an audio
converter device, and a headphone unit.
[0020] In a third aspect, the invention provides a method for converting a multi-channel
audio input signal comprising at least two channels, such as a B-format Sound Field
signal, into a set of audio output signals, such as a set of two audio output signals
arranged for headphone reproduction, the method comprising
- separating the input signal into a plurality of frequency bands, such as partially
overlapping frequency bands,
- performing a sound source separation for at least a part of the plurality of frequency
bands, comprising
- performing a plane wave expansion computation on the multi-channel audio input signal
so as to determine at least one dominant direction corresponding to a direction of
a dominant sound source in the audio input signal,
- determining an array of at least two, such as four, virtual loudspeaker positions
selected such that one or more of the virtual loudspeaker positions at least substantially
coincides, such as precisely coincides, with the at least one dominant direction,
and
- decoding the audio input signal into virtual loudspeaker signals corresponding to
each of the virtual loudspeaker positions, and
- summing the virtual loudspeaker signals for the at least part of the plurality of
frequency bands to arrive at the set of audio output signals.
[0021] The method may be implemented in pure software, e.g. in the form of a generic code
or in the form of a processor specific executable code. Alternatively, the method
may be implemented partly in specific analog and/or digital electronic components
and partly in software. Still alternatively, the method may be implemented in a single
dedicated chip.
[0022] It is appreciated that two or more of the mentioned embodiments can advantageously
be combined. It is also appreciated that embodiments and advantages mentioned for
the first aspect apply as well to the second and third aspects.
BRIEF DESCRIPTION OF THE DRAWING
[0023] Embodiments of the invention will be described, by way of example only, with reference
to the drawings.
Fig. 1 illustrates basic components of one embodiment of the audio processor,
Fig. 2 illustrates details of an embodiment for converting a B-format sound field
signal into a binaural signal,
Fig. 3 illustrates a possible implementation of the transfer matrix generator referred
to in Fig. 2, and
Fig. 4 illustrates an audio device with an audio processor according to the invention.
DESCRIPTION OF EMBODIMENTS
[0024] Fig. 1 shows the basic components of an audio processor according to the
invention. Input to the audio processor is a multi-channel audio signal. This signal
is split into a plurality of frequency bands in a filter bank, e.g. in the form of
an FFT analysis performed on each of the plurality of channels. A sound source separation
unit SSS then operates on the frequency-separated signal. First, a plane wave
expansion calculation PWE is performed on each frequency band in order to determine
one or two dominant sound source directions. The one or two dominant sound source
directions are then applied to a virtual loudspeaker position calculation algorithm
VLP serving to select a set of virtual sound source or virtual loudspeaker directions,
e.g. by rotation of a fixed set of virtual loudspeaker directions, such that the one
or two dominant sound source directions coincide with respective
virtual loudspeaker directions. Then, the input signal is transferred or decoded DEC
according to a decoding matrix corresponding to the selected virtual loudspeaker directions,
and optionally Head-Related Transfer Functions corresponding to the virtual loudspeaker
directions are applied before the frequency components are finally combined in a summation
unit SU to form a set of output signals, e.g. two output signals in case of a binaural
implementation, or such as four, five, six, seven or even more output signals in case
of conversion to a format suitable for reproduction through a surround sound set-up
of loudspeakers.
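As an aside, a strongly simplified direction estimate for a single frequency band is sketched below (Python/NumPy). It uses the active acoustic intensity vector of the B-format bin values and therefore yields only one direction per band, whereas the plane wave expansion PWE of the invention can resolve two simultaneous plane waves; the sketch is illustrative only:

```python
import numpy as np

def dominant_direction(w, x, y, z):
    """Estimate one dominant sound source direction for a single frequency
    band from the complex B-format bin values (w, x, y, z), using the
    active intensity vector Re{conj(W) * (X, Y, Z)}."""
    intensity = np.real(np.conj(w) * np.array([x, y, z]))
    norm = np.linalg.norm(intensity)
    if norm < 1e-12:
        return np.array([1.0, 0.0, 0.0])   # arbitrary fallback direction
    # Under the usual B-format sign convention this unit vector points
    # towards the dominant source.
    return intensity / norm
```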
[0025] The audio processor can be implemented in various ways, e.g. in the form of a processor
forming part of a device, wherein the processor is provided with executable code to
perform the invention.
[0026] Figs. 2 and 3 illustrate components of a preferred embodiment suited to convert an
input signal which has three-dimensional characteristics and is in an "ambisonic B-format".
The ambisonic B-format system is a very high quality sound positioning system which
operates by breaking down the directionality of the sound into spherical harmonic
components termed W, X, Y and Z. The ambisonic system is then designed to utilize
a plurality of output speakers to cooperatively recreate the original directional
components. For a description of the B-format system, reference is made to: http://en.wikipedia.org/wiki/Ambisonics.
[0027] Referring to Fig. 2, the preferred embodiment is directed at providing an improved
spatialization of input audio signals. A B-format signal is input having X, Y, Z and
W components. Each component of the B-format input set is processed through a corresponding
filter bank
1-4, each of which divides the input into a number of output frequency bands (the number
of bands being implementation dependent, typically in the range of 1024 to 4096).
[0028] Elements 5, 6, 7, 8 and 10 are replicated once for each frequency band, although only
one of each is shown in Fig. 2. For each frequency band, the four signals (one from each
filter bank 1-4) are processed by a plane wave expansion element 5, which determines the
smallest number of plane waves necessary to recreate the local sound field encoded in the
four signals. The plane wave expansion element also calculates the direction, phase and
amplitude of these waves. The input signal is denoted w, x, y, z, with subscripts r and i.
The local sound field can in most cases be recreated by two plane waves, as expressed
in the following equations:

[0030] Equation 5 gives zero, one or two real values for cos 2ϕ, corresponding to zero, one
or two solutions to the equations. Each value for cos 2ϕ corresponds to several possible
values of ϕ, one in each quadrant, or the values 0 and π. Only one of these is correct.
The correct quadrant can be determined from equation 9 and the requirement that w1 and w2
should be positive.

[0033] As before, the quadrant of ϕ can be determined based on another equation (18) and
the requirement that w'1 and w'2 should be positive.

[0034] The values of w0 and ϕ0 are not used in subsequent steps.
[0035] The output of 5 consists of the two vectors <x1, y1, z1> and <x2, y2, z2>. This output
is connected to an element 6 which sorts these two vectors according to the value of their
y element. In an alternative embodiment of the invention, only one of the two vectors is
passed on from element 6. The choice can be that of the longest vector or the one with the
highest degree of similarity with neighbouring vectors. The output of 6 is connected to a
smoothing element 7 which suppresses rapid changes in the direction estimates. The output
of 7 is connected to an element 8 which generates suitable transfer functions from each of
the input signals to each of the output signals, a total of eight transfer functions. Each
of these transfer functions is passed through a smoothing element 9. This element suppresses
large differences in phase and amplitude between neighbouring frequency bands and also
suppresses rapid changes in phase and amplitude. The output of 9 is passed to a matrix
multiplier 10 which applies the transfer functions to the input signals and creates two
output signals. Elements 11 and 12 sum each of the output signals from 10 across all filter
bands to produce a binaural signal.
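For illustration, the per-band matrix multiplication 10 and the band summation 11, 12 could be sketched as follows (Python/NumPy, assuming an FFT-based filter bank so that the summation over bands is realized by an inverse FFT; names and shapes are illustrative):

```python
import numpy as np

def apply_and_sum(band_signals, transfer_matrices, window_length=4096):
    """band_signals:      complex array of shape (4, n_bands), one column of
                          (W, X, Y, Z) values per frequency band
       transfer_matrices: complex array of shape (n_bands, 2, 4), the eight
                          smoothed transfer functions of each band
       Returns a two-channel (left/right) time-domain block."""
    n_bands = band_signals.shape[1]
    binaural = np.zeros((2, n_bands), dtype=complex)
    for band in range(n_bands):                       # element 10, per band
        binaural[:, band] = transfer_matrices[band] @ band_signals[:, band]
    return np.fft.irfft(binaural, n=window_length, axis=1)   # elements 11, 12
```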
[0036] Referring to Fig. 3, there is illustrated schematically the preferred embodiment of the
transfer matrix generator referenced in Fig. 2. While one transfer matrix generator is
provided for each frequency band, elements 3, 4 and 5 are shared across all frequency bands.
An element 1 generates two new vectors whose directions are chosen so as to maximize the
angles between the four resulting vectors. In an alternative embodiment of the invention,
only one vector is passed into the transfer matrix generator. In this case, element 1 must
generate three new vectors, preferably such that the resulting four vectors point towards
the vertices of a regular tetrahedron. This alternative approach is also beneficial in cases
where the two input vectors are collinear or nearly collinear.
[0037] The four vectors are used to represent the directions to four virtual loudspeakers which
will be used to play back the input signals. An element 6 calculates a decoding matrix by
inverting the following matrix:

where

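Since the matrix itself is not reproduced here, the following sketch (Python/NumPy) merely illustrates the principle: an encoding matrix is built from the four virtual loudspeaker directions and inverted. The W gain of 1/sqrt(2) is the conventional B-format weighting and is an assumption of the sketch, not a statement about the matrix of the preferred embodiment:

```python
import numpy as np

def decoding_matrix(speaker_dirs):
    """speaker_dirs: array of shape (4, 3) holding the unit direction
    vectors (x, y, z) of the four virtual loudspeakers."""
    # Column i encodes a unit plane wave arriving from virtual loudspeaker i
    # into B-format (W, X, Y, Z); W is weighted by 1/sqrt(2) by convention.
    encoding = np.vstack([np.full(4, 1.0 / np.sqrt(2.0)), speaker_dirs.T])
    # Inverting the encoding matrix yields the decoding matrix that maps the
    # per-band (W, X, Y, Z) values to the four virtual loudspeaker signals.
    return np.linalg.inv(encoding)
```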
[0038] An element 5 stores a set of head-related transfer functions. An element 3 alters the
phase of these transfer functions, leaving the amplitude unchanged. The reason for this
transformation is that any uncertainty in the direction estimates translates to an
uncertainty in the absolute phase of the head-related transfer functions which increases
proportionally with frequency. Without this alteration to the transfer functions, too much
noise would be added to the phase of high-frequency components in the signal, resulting in
poor reproduction of transient sounds. The transformation performed by element 3 is:

[0039] This transformation has no effect at low frequencies. At high frequencies, the phase
of the transfer function is differentiated with respect to frequency. The transition happens
around f = fc, in a transition region of approximate width f0. In this equation, Δf is the
band-to-band frequency difference. Since the human ability to perceive inter-aural phase
difference is limited to frequencies below approx. 1200-1600 Hz, reasonable values for fc
and f0 are 1800 Hz and 200 Hz, respectively. Above this transition frequency, humans are
still sensitive to inter-aural group delay, which will be restored after performing the
inverse transformation, as is done in element 4:
[0040] Element 2 uses the virtual loudspeaker directions to select and interpolate between the
modified head-related transfer functions closest to the direction of each virtual loudspeaker.
For each virtual loudspeaker, there are two head-related transfer functions; one for each ear,
providing a total of eight transfer functions which are passed to element 4.
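A minimal sketch of the selection step of element 2 is given below (Python/NumPy); interpolation between neighbouring measurement directions is omitted for brevity, and all names are illustrative:

```python
import numpy as np

def nearest_hrtf_pair(direction, hrtf_directions, hrtf_left, hrtf_right):
    """Select the stored left/right HRTF pair whose measurement direction is
    closest (largest cosine similarity) to a virtual loudspeaker direction.

    hrtf_directions: (n, 3) unit vectors of the measurement directions
    hrtf_left/right: (n, n_bands) complex transfer functions per direction"""
    index = int(np.argmax(hrtf_directions @ direction))
    return hrtf_left[index], hrtf_right[index]
```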
[0041] The outputs of elements 4 and 6 are multiplied in a matrix multiplication 7 to produce
the suitable transfer matrix.
[0042] In the arrangement shown in Fig. 3, the decoding matrix is multiplied with the transfer
function matrix before their product is multiplied with the input signals. In an alternative
embodiment of the invention, the input signals are first multiplied with the decoding
matrix and their product subsequently multiplied with the transfer function matrix.
However, this would preclude the possibility of smoothing of the overall transfer
functions. Such smoothing is advantageous for the reproduction of transient sounds.
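The following sketch (Python/NumPy) illustrates this ordering: the 2x4 HRTF matrix and the 4x4 decoding matrix are combined into a 2x4 output transfer matrix per band, after which a simple first-order smoothing across neighbouring bands can be applied; the smoothing filter shown is an assumption, since the embodiment does not specify it:

```python
import numpy as np

def output_transfer_matrix(hrtf_matrix, decode_matrix):
    """Combine the 2x4 HRTF matrix (left/right pair per virtual loudspeaker)
    with the 4x4 decoding matrix into one 2x4 transfer matrix that is later
    applied directly to the (W, X, Y, Z) values of a band."""
    return hrtf_matrix @ decode_matrix

def smooth_across_bands(transfer_matrices, weight=0.25):
    """Suppress large band-to-band changes of the combined transfer functions
    by a simple first-order recursive smoothing over the band index.
    transfer_matrices: array of shape (n_bands, 2, 4)."""
    smoothed = transfer_matrices.astype(complex)
    for band in range(1, smoothed.shape[0]):
        smoothed[band] = (1 - weight) * transfer_matrices[band] + weight * smoothed[band - 1]
    return smoothed
```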
[0043] The overall effect of the arrangement shown in Figs. 2 and 3 is to decompose the
local sound field into a large number of plane waves and to pass these plane waves
through corresponding head-related transfer functions in order to produce a binaural
signal suited for headphone reproduction.
[0044] Fig. 4 illustrates a block diagram of an audio device with an audio processor according
to the invention, e.g. the one illustrated in Figs. 2 and 3. The device may be a dedicated
headphone unit, a general audio device offering the conversion of a multi-channel
input signal to another output format as an option, or the device may be a general
computer with a sound card provided with software suited to perform the conversion
method according to the invention.
[0045] The device may be able to perform on-line conversion of the input signal, e.g. by
receiving the multi-channel input audio signal in the form of a digital bit stream.
Alternatively, e.g. if the device is a computer, the device may generate the output
signal in the form of an audio output file based on an audio file as input.
[0046] To sum up, the invention provides an audio processor for converting a multi-channel
audio input signal X, Y, Z, W, such as a B-format Sound Field signal, into a set of
audio output signals L, R, such as a set of two audio output signals L, R arranged
for headphone reproduction. A filter bank splits the input signal X, Y, Z, W into
frequency bands. A sound source separation unit uses plane wave expansion on the input signal
X, Y, Z, W to determine one or two dominant sound source directions. These are used
to determine virtual loudspeaker positions selected such that one or both of the virtual
loudspeaker positions coincide with one or both of the dominant directions. The input
signal X, Y, Z, W is then decoded into virtual loudspeaker signals corresponding to
each of the virtual loudspeaker positions, and finally the frequency components are
combined in a summation unit to arrive at the set of audio output signals L, R. E.g.
Head-Related Transfer Functions (HRTFs) are applied to arrive at a binaural signal
suited for headphone reproduction. A high spatial fidelity is obtained due to the
coincidence of virtual loudspeaker positions and the determined dominant sound source
direction(s).
[0047] Improved performance can be obtained by differentiating the phase of a high-frequency
part of the HRTFs with respect to frequency before combining HRTF components corresponding
to different directions, followed by a corresponding integration of this part with respect
to frequency after the combination.
[0048] In the claims, the term "comprising" does not exclude the presence of other elements
or steps. Additionally, although individual features may be included in different
claims, these may possibly be advantageously combined, and the inclusion in different
claims does not imply that a combination of features is not feasible and/or advantageous.
In addition, singular references do not exclude a plurality. Thus, references to "a",
"an", "first", "second" etc. do not preclude a plurality. Reference signs are included
in the claims however the inclusion of the reference signs is only for clarity reasons
and should not be construed as limiting the scope of the claims.
1. An audio processor arranged to convert a multi-channel audio input signal (X, Y, Z,
W) comprising at least two channels, such as a B-format Sound Field signal, into a
set of audio output signals (L, R), such as a set of two audio output signals (L,
R) arranged for headphone reproduction, the audio processor comprising
- a filter bank arranged to separate the input signal (X, Y, Z, W) into a plurality
of frequency bands, such as partially overlapping frequency bands,
- a sound source separation unit arranged, for at least a part of the plurality of
frequency bands, to
- perform a plane wave expansion computation on the multi-channel audio input signal
(X, Y, Z, W) so as to determine at least one dominant direction corresponding to a
direction of a dominant sound source in the audio input signal (X, Y, Z, W),
- determine an array of at least two, such as four, virtual loudspeaker positions
selected such that one or more of the virtual loudspeaker positions at least substantially
coincides, such as precisely coincides, with the at least one dominant direction,
and
- decode the audio input signal (X, Y, Z, W) into virtual loudspeaker signals corresponding
to each of the virtual loudspeaker positions, and
- a summation unit arranged to sum the virtual loudspeaker signals for the at least
part of the plurality of frequency bands to arrive at the set of audio output signals
(L, R).
2. Audio processor according to claim 1, wherein the filter bank comprises at least 500,
such as 1000 to 5000, partially overlapping filters covering a frequency range of
0 Hz to 22 kHz.
3. Audio processor according to claim 1 or 2, wherein the virtual loudspeaker positions
are selected by a rotation of a set of at least three positions in a fixed spatial
interrelation.
4. Audio processor according to claim 3, wherein the set of positions in a fixed spatial
interrelation comprises four positions, such as four positions arranged in a tetrahedron.
5. Audio processor according to any of the preceding claims, wherein the wave expansion
determines two dominant directions, and wherein the array of at least two virtual
loudspeaker positions is selected such that two of the virtual loudspeaker positions
at least substantially coincides, such as precisely coincides, with the two dominant
directions.
6. Audio processor according to any of the preceding claims, comprising a binaural synthesizer
unit arranged to generate first and second audio output signals (L, R) by applying
Head-Related Transfer Functions (HRTF) to each of the virtual loudspeaker signals.
7. Audio processor according to claim 6, wherein a decoding matrix corresponding to the
determined virtual loudspeaker positions and a transfer function matrix corresponding
to the Head-Related Transfer Functions (HRTF) are combined into an output transfer
matrix prior to being applied to the audio input signals (X, Y, Z, W).
8. Audio processor according to claim 7, wherein a smoothing is performed on transfer
functions of the output transfer matrix prior to being applied to the input signals
(X, Y, Z, W).
9. Audio processor according to any of claims 6-8, wherein the phase of the Head-Related
Transfer Functions (HRTF) is differentiated with respect to frequency, and after combining
components of Head-Related Transfer Functions (HRTF) corresponding to different directions,
the phase of the combined transfer functions is integrated with respect to frequency.
10. Audio processor according to claim 9, wherein the phase of the Head-Related
Transfer Functions (HRTF) is left unaltered below a first frequency limit, such as
below 1.6 kHz, and differentiated with respect to frequency at frequencies above a
second frequency limit with a higher frequency than the first frequency limit, such
as 2.0 kHz, and with a gradual transition in between, and after combining components
of Head-Related Transfer Functions (HRTF) corresponding to different directions, the
inverse operation is applied to the combined function.
11. Audio processor according to any of the preceding claims, wherein the audio input
signal is a multi-channel audio signal arranged for decomposition into plane wave
components, such as one of: a B-format sound field signal, a higher-order ambisonics
recording, a stereo recording, and a surround sound recording.
12. Audio processor according to any of the preceding claims, wherein the sound source
separation unit determines the at least one dominant direction in each frequency band
for each time frame, wherein a time frame has a size of 2,000 to 10,000 samples.
13. Audio processor according to any of the preceding claims, wherein the set of audio
output signals (L, R) is arranged for playback over headphones.
14. Device comprising an audio processor according to any of claims 1-13, such as the
device being one of: a device for recording sound or video signals, a device for playback
of sound or video signals, a portable device, a computer device, a video game device,
a hi-fi device, an audio converter device, and a headphone unit.
15. Method for converting a multi-channel audio input signal (X, Y, Z, W) comprising at
least two channels, such as a B-format Sound Field signal, into a set of audio output
signals (L, R), such as a set of two audio output signals (L, R) arranged for headphone
reproduction, the method comprising
- separating the input signal (X, Y, Z, W) into a plurality of frequency bands, such
as partially overlapping frequency bands,
- performing a sound source separation for at least a part of the plurality of frequency
bands, comprising
- performing a plane wave expansion computation on the multi-channel audio input signal
(X, Y, Z, W) so as to determine at least one dominant direction corresponding to a
direction of a dominant sound source in the audio input signal (X, Y, Z, W),
- determining an array of at least two, such as four, virtual loudspeaker positions
selected such that one or more of the virtual loudspeaker positions at least substantially
coincides, such as precisely coincides, with the at least one dominant direction,
and
- decoding the audio input signal (X, Y, Z, W) into virtual loudspeaker signals corresponding
to each of the virtual loudspeaker positions, and
- summing the virtual loudspeaker signals for the at least part of the plurality of
frequency bands to arrive at the set of audio output signals (L, R).