TECHNOLOGICAL FIELD
[0001] Embodiments of the present invention relate to control of audio rendering. In particular,
they relate to control of audio rendering of a sound scene comprising at least one
sound object.
BACKGROUND
[0002] A sound scene in this document is used to refer to the arrangement of one or more
sound sources in a three-dimensional space. When a sound source changes position,
the sound scene changes. When the sound source changes its audio properties such as
its audio output, then the sound scene changes.
[0003] A sound scene may be defined in relation to recording sounds (a recorded sound scene)
and in relation to rendering sounds (a rendered sound scene).
[0004] Some current technology focuses on accurately reproducing a recorded sound scene
as a rendered sound scene either in real time or at a distance in time and/or space
from the recorded sound scene. The recorded sound scene is encoded for storage and/or
transmission and/or rendering.
[0005] A sound object within a sound scene may be a source sound object that represents
a sound source within the sound scene or may be a recorded sound object which represents
sounds recorded at a particular microphone. In this document, reference to a sound
object refers to both a recorded sound object and a source sound object. However,
in some examples, the sound object(s) may be only source sound objects and in other
examples the sound object(s) may be only recorded sound objects.
[0006] By using audio processing it may be possible, in some circumstances, to convert a
recorded sound object into a source sound object and/or to convert a source sound
object into a recorded sound object.
[0007] It may be desirable in some circumstances to record a sound scene using multiple
microphones. Some microphones, such as Lavalier microphones, or other portable microphones,
may be attached to or may follow a sound source in the sound scene. Other microphones
may be static in the sound scene.
[0008] The combination of outputs from the various microphones defines a recorded sound
scene. However, it may not always be possible to render the sound scene exactly as
it was when it was recorded.
BRIEF SUMMARY
[0009] According to various, but not necessarily all, embodiments of the invention there
is provided a method comprising: detecting a change in position of rendering a sound
object from a first position at a first time to a second position, different to the
first position, at a second time immediately after the first time; and at the second
time, generating a visual distraction.
[0010] According to various, but not necessarily all, embodiments of the invention there
is provided a computer program that, when run on a processor, causes performance of:
detecting a change in position of rendering a sound object from a first position at
a first time to a second position, different to the first position, at a second time
immediately after the first time; and at the second time, generating a visual distraction.
[0011] According to various, but not necessarily all, embodiments of the invention there
is provided a system or apparatus comprising means for performing: detecting a change
in position of rendering a sound object from a first position at a first time to a
second position, different to the first position, at a second time immediately after
the first time; and at the second time, generating a visual distraction.
[0012] According to various, but not necessarily all, embodiments of the invention there
is provided an apparatus comprising: at least one processor; and at least one memory
including computer program code, the at least one memory and the computer program
code configured to, with the at least one processor, cause the apparatus at least
to perform: detecting a change in position of rendering a sound object from a first
position at a first time to a second position, different to the first position, at
a second time immediately after the first time; and at the second time, generating
a visual distraction.
[0013] According to various, but not necessarily all, embodiments of the invention there
is provided examples as claimed in the appended claims.
BRIEF DESCRIPTION
[0014] For a better understanding of various examples that are useful for understanding
the detailed description, reference will now be made by way of example only to the
accompanying drawings in which:
Fig. 1 illustrates an example of a system and also an example of a method for recording
and encoding a sound scene;
Fig. 2 schematically illustrates relative positions of a portable microphone (PM)
and static microphone (SM) relative to an arbitrary reference point (REF);
Fig. 3 illustrates a system as illustrated in Fig. 1, modified to rotate the rendered
sound scene relative to the recorded sound scene;
Figs. 4A and 4B illustrate a change in relative orientation between a listener and
the rendered sound scene so that the rendered sound scene remains fixed in space;
Fig. 5 illustrates a module which may be used, for example, to perform the functions
of the positioning block, orientation block and distance block of the system;
Figs. 6A and 6B illustrate examples of a direct processing block and an indirect processing
block for use in the module of Fig. 5;
Fig. 7 illustrates an example of the system implemented using an apparatus;
Figs. 8A and 8B illustrate an example of a first mode of operation of the system;
Figs. 9A and 9B illustrate an example of a second mode of operation of the system;
Fig. 10 illustrates an example of a method;
Figs. 11A, 11B and 11C illustrate an example of a sound scene visually modified by
the method of Fig. 10.
DETAILED DESCRIPTION
[0015] Fig. 1 illustrates an example of a system 100 and also an example of a method 200.
The system 100 and method 200 record a sound scene 10 and process the recorded sound
scene to enable an accurate rendering of the recorded sound scene as a rendered sound
scene for a listener at a particular position (the origin) within the recorded sound
scene 10.
[0016] The system 100 comprises one or more portable microphones 110 and may comprise one
or more static microphones 120.
[0017] In this example, but not necessarily all examples, the origin of the sound scene
is at a microphone. In this example, the microphone at the origin is a static microphone
120. It may record one or more channels, for example it may be a microphone array.
[0018] In this example, only a single static microphone 120 is illustrated. However, in
other examples multiple static microphones 120 may be used independently. In such
circumstances the origin may be at any one of these static microphones 120 and it
may be desirable to switch, in some circumstances, the origin between static microphones
120 or to position the origin at an arbitrary position within the sound scene.
[0019] The system 100 comprises one or more portable microphones 110. The portable microphone
110 may, for example, move with a sound source within the recorded sound scene 10.
The portable microphone may, for example, be an 'up-close' microphone that remains
close to a sound source. This may be achieved, for example, using a boom microphone
or, for example, by attaching the microphone to the sound source, for example, by
using a Lavalier microphone. The portable microphone 110 may record one or more recording
channels.
[0020] Fig. 2 schematically illustrates the relative positions of the portable microphone
(PM) 110 and the static microphone (SM) 120 (if present) relative to an arbitrary
reference point (REF). The position of the static microphone 120 relative to the reference
point REF is represented by the vector
x. The position of the portable microphone PM relative to the reference point REF is
represented by the vector
y. The relative position of the portable microphone PM 110 from the static microphone
SM 120 is represented by the vector
z. It will be understood that
z =
y -
x. The vector
z gives the relative position of the portable microphone 110 relative to the static
microphone 120 which, in this example, is the origin of the sound scene 10. The vector
z therefore positions the portable microphone 110 relative to a notional listener of
the recorded sound scene 10. As the origin is static, the vector
x is constant. Therefore, if one has knowledge of
x and tracks variations in
y, it is possible to also track variations in
z, the relative position of the portable microphone 110 relative to the origin of the
sound scene 10.
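By way of non-limiting illustration only, the relationship z = y - x, together with the orientation Arg(z) and distance |z| used later in this description, may be sketched as follows (Python; the function names, the planar bearing convention and the coordinate layout are illustrative assumptions, not part of any embodiment):

```python
import math

def relative_position(x, y):
    # z = y - x: position of the portable microphone relative to the
    # static microphone (the origin), both measured from the arbitrary
    # reference point REF.
    return tuple(yc - xc for yc, xc in zip(y, x))

def bearing(z):
    # Arg(z): orientation of the portable microphone relative to the
    # origin, taken here in the horizontal plane (radians).
    return math.atan2(z[1], z[0])

def distance(z):
    # |z|: distance of the portable microphone from the origin.
    return math.sqrt(sum(c * c for c in z))

# As the origin is static, x is constant; tracking variations in y
# therefore tracks variations in z.
x = (1.0, 0.0, 0.0)   # static microphone position (constant)
y = (4.0, 4.0, 0.0)   # tracked portable microphone position
z = relative_position(x, y)
```

Because x is constant, only y need be tracked over time to keep z, Arg(z) and |z| current.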
[0021] When the sound scene 10 as recorded is rendered to a user (listener) by the system
100 in Fig. 1, it is rendered to the listener as if the listener is positioned at
the origin of the recorded sound scene 10. It is therefore important that, as the
portable microphone 110 moves in the recorded sound scene 10, its position
z relative to the origin of the recorded sound scene 10 is tracked and is correctly
represented in the rendered sound scene. The system 100 is configured to achieve this.
[0022] In the example of Fig. 1, the audio signals 122 output from the static microphone
120 are coded by audio coder 130 into a multichannel audio signal 132. If multiple
static microphones were present, the output of each would be separately coded by an
audio coder into a multichannel audio signal.
[0023] The audio coder 130 may be a spatial audio coder such that the multichannels 132
represent the sound scene 10 as recorded by the static microphone 120 and can be rendered
giving a spatial audio effect. For example, the audio coder 130 may be configured
to produce multichannel audio signals 132 according to a defined standard such as,
for example, binaural coding, 5.1 surround sound coding, 7.1 surround sound coding
etc. If multiple static microphones were present, the multichannel signal of each
static microphone would be produced according to the same defined standard such as,
for example, binaural coding, 5.1 surround sound coding, and 7.1 surround sound coding
and in relation to the same common rendered sound scene.
The multichannel audio signals 132 from the one or more static microphones 120 are
mixed by mixer 102 with multichannel audio signals 142 from the one or more portable
microphones 110 to produce a multi-microphone multichannel audio signal 103 that represents
the recorded sound scene 10 relative to the origin and which can be rendered by an
audio decoder corresponding to the audio coder 130 to reproduce a rendered sound scene
to a listener that corresponds to the recorded sound scene when the listener is at
the origin.
[0025] The multichannel audio signal 142 from the, or each, portable microphone 110 is processed
before mixing to take account of any movement of the portable microphone 110 relative
to the origin at the static microphone 120.
[0026] The audio signals 112 output from the portable microphone 110 are processed by the
positioning block 140 to adjust for movement of the portable microphone 110 relative
to the origin. The positioning block 140 takes as an input the vector
z or some parameter or parameters dependent upon the vector
z. The vector
z represents the relative position of the portable microphone 110 relative to the origin
at the static microphone 120 in this example.
[0027] The positioning block 140 may be configured to adjust for any time misalignment between
the audio signals 112 recorded by the portable microphone 110 and the audio signals
122 recorded by the static microphone 120 so that they share a common time reference
frame. This may be achieved, for example, by correlating naturally occurring or artificially
introduced (non-audible) audio signals that are present within the audio signals 112
from the portable microphone 110 with those within the audio signals 122 from the
static microphone 120. Any timing offset identified by the correlation may be used
to delay/advance the audio signals 112 from the portable microphone 110 before processing
by the positioning block 140.
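A minimal sketch of such correlation-based time alignment is given below (Python; the exhaustive lag search and the short sample-domain signals are illustrative assumptions, and a practical implementation would operate on much longer windows):

```python
def estimate_offset(ref, sig, max_lag):
    # Cross-correlate the portable-microphone signal `sig` against the
    # static-microphone reference `ref` over lags in [-max_lag, max_lag]
    # and return the lag with the highest correlation. A positive result
    # means `sig` leads `ref` and should be delayed by that many samples
    # before processing by the positioning block 140.
    def corr(lag):
        if lag >= 0:
            pairs = zip(ref[lag:], sig)
        else:
            pairs = zip(ref, sig[-lag:])
        return sum(a * b for a, b in pairs)
    return max(range(-max_lag, max_lag + 1), key=corr)

# The same (e.g. artificially introduced, non-audible) marker appears
# two samples earlier in the portable-microphone signal:
ref = [0.0, 0.0, 0.0, 1.0, 2.0, 1.0, 0.0, 0.0]
sig = [0.0, 1.0, 2.0, 1.0, 0.0, 0.0, 0.0, 0.0]
```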
[0028] The positioning block 140 processes the audio signals 112 from the portable microphone
110, taking into account the relative orientation (Arg(
z)) of that portable microphone 110 relative to the origin at the static microphone
120.
[0029] The audio coding of the static microphone audio signals 122 to produce the multichannel
audio signal 132 assumes a particular orientation of the rendered sound scene relative
to an orientation of the recorded sound scene and the audio signals 122 are encoded
to the multichannel audio signals 132 accordingly.
[0030] The relative orientation Arg (
z) of the portable microphone 110 in the recorded sound scene 10 is determined and
the audio signals 112 representing the sound object are coded to the multichannels
defined by the audio coding 130 such that the sound object is correctly oriented within
the rendered sound scene at a relative orientation Arg (
z) from the listener. For example, the audio signals 112 may first be mixed or encoded
into the multichannel signals 142 and then a transformation T may be used to rotate
the multichannel audio signals 142, representing the moving sound object, within the
space defined by those multiple channels by Arg (
z).
[0031] Referring to Figs. 4A and 4B, in some situations, for example when the sound scene
is rendered to a listener through a head-mounted audio output device 300, for example
headphones using binaural audio coding, it may be desirable for the rendered sound
scene 310 to remain fixed in space 320 when the listener turns their head 330 in space.
This means that the rendered sound scene 310 needs to be rotated relative to the audio
output device 300 by the same amount in the opposite sense to the head rotation.
[0032] In Figs. 4A and 4B, the relative orientation between the listener and the rendered
sound scene 310 is represented by an angle θ. The sound scene is rendered by the audio
output device 300 which physically rotates in the space 320. The relative orientation
between the audio output device 300 and the rendered sound scene 310 is represented
by an angle α. As the audio output device 300 does not move relative to the user's
head 330 there is a fixed offset between θ and α of 90° in this example. When the
user turns their head θ changes. If the sound scene is to be rendered as fixed in
space then α must change by the same amount in the same sense.
[0033] Moving from Fig. 4A to 4B, the user turns their head clockwise increasing θ by magnitude
Δ and increasing α by magnitude Δ. The rendered sound scene is rotated relative to
the audio device in an anticlockwise direction by magnitude Δ so that the rendered
sound scene 310 remains fixed in space.
[0034] The orientation of the rendered sound scene 310 tracks with the rotation of the listener's
head so that the orientation of the rendered sound scene 310 remains fixed in space
320 and does not move with the listener's head 330.
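This head-tracking behaviour may be sketched as follows (Python; the convention that bearings are measured in degrees clockwise from the forward direction of the audio output device is an illustrative assumption):

```python
def counter_rotate(object_bearings_deg, head_turn_deg):
    # When the listener's head turns by head_turn_deg (positive =
    # clockwise), the rendered sound scene is rotated relative to the
    # audio output device 300 by the same magnitude in the opposite
    # sense: each sound object's bearing relative to the device
    # decreases by head_turn_deg, so the scene stays fixed in space.
    return [(b - head_turn_deg) % 360.0 for b in object_bearings_deg]
```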
[0035] Fig. 3 illustrates a system 100 as illustrated in Fig. 1, modified to rotate the
rendered sound scene 310 relative to the recorded sound scene 10. This will rotate
the rendered sound scene 310 relative to the audio output device 300, which has a fixed
relationship with the recorded sound scene 10.
[0036] An orientation block 150 is used to rotate the multichannel audio signals 142 by
Δ, determined by rotation of the user's head.
[0037] Similarly, an orientation block 150 is used to rotate the multichannel audio signals
132 by Δ, determined by rotation of the user's head.
[0038] The functionality of the orientation block 150 is very similar to the functionality
of the orientation function of the positioning block 140.
[0039] The audio coding of the static microphone signals 122 to produce the multichannel
audio signals 132 assumes a particular orientation of the rendered sound scene relative
to the recorded sound scene. This orientation is offset by Δ. Accordingly, the audio
signals 122 are encoded to the multichannel audio signals 132 and the audio signals
112 are encoded to the multichannel audio signals 142 accordingly. The transformation
T may be used to rotate the multichannel audio signals 132 within the space defined
by those multiple channels by Δ. An additional transformation T may be used to rotate
the multichannel audio signals 142 within the space defined by those multiple channels
by Δ.
[0040] In the example of Fig. 3, the portable microphone signals 112 are additionally processed
to control the perception of the distance D of the sound object from the listener
in the rendered sound scene, for example, to match the distance |
z| of the sound object from the origin in the recorded sound scene 10. This can be
useful when binaural coding is used so that the sound object is, for example, externalized
from the user and appears to be at a distance rather than within the user's head,
between the user's ears. The distance block 160 processes the multichannel audio signal
142 to modify the perception of distance.
[0041] Fig. 5 illustrates a module 170 which may be used, for example, to perform the functions
of the positioning block 140, orientation block 150 and distance block 160 in Fig.
3. The module 170 may be implemented using circuitry and/or programmed processors.
Fig. 5 illustrates the processing of a single channel of the multichannel audio
signal 142 before it is mixed with the multichannel audio signal 132 to form the multi-microphone
multichannel audio signal 103. A single input channel of the multichannel signal 142
is input as signal 187.
[0043] The input signal 187 passes in parallel through a "direct" path and one or more "indirect"
paths before the outputs from the paths are mixed together, as multichannel signals,
by mixer 196 to produce the output multichannel signal 197. The output multichannel
signals 197, one for each of the input channels, are mixed to form the multichannel
audio signal 142 that is mixed with the multichannel audio signal 132.
[0044] The direct path represents audio signals that appear, to a listener, to have been
received directly from an audio source and an indirect path represents audio signals
that appear to a listener to have been received from an audio source via an indirect
path such as a multipath or a reflected path or a refracted path.
The distance block 160, by modifying the relative gain between the direct path and
the indirect paths, changes the perception of the distance D of the sound object from
the listener in the rendered sound scene 310.
[0046] Each of the parallel paths comprises a variable gain device 181, 191 which is controlled
by the distance block 160.
[0047] The perception of distance can be controlled by controlling relative gain between
the direct path and the indirect (decorrelated) paths. Increasing the indirect path
gain relative to the direct path gain increases the perception of distance.
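One non-limiting way such distance-dependent gain control might be sketched is given below (Python; the 1/D law, the unit reference distance and the fixed indirect gain are assumptions for illustration only, not a disclosed gain law):

```python
def path_gains(distance_d, reference_distance=1.0):
    # Gains for the variable gain devices 181 (direct path) and 191
    # (indirect path). Holding the indirect gain constant while the
    # direct gain falls with the intended distance D increases the
    # indirect/direct ratio, and so the perceived distance, as D grows.
    direct = min(1.0, reference_distance / max(distance_d, 1e-6))
    indirect = 1.0
    return direct, indirect
```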
[0048] In the direct path, the input signal 187 is amplified by variable gain device 181,
under the control of the distance block 160, to produce a gain-adjusted signal 183.
The gain-adjusted signal 183 is processed by a direct processing module 182 to produce
a direct multichannel audio signal 185.
[0049] In the indirect path, the input signal 187 is amplified by variable gain device 191,
under the control of the distance block 160, to produce a gain-adjusted signal 193.
The gain-adjusted signal 193 is processed by an indirect processing module 192 to
produce an indirect multichannel audio signal 195.
[0050] The direct multichannel audio signal 185 and the one or more indirect multichannel
audio signals 195 are mixed in the mixer 196 to produce the output multichannel audio
signal 197.
[0051] The direct processing block 182 and the indirect processing block 192 both receive
direction of arrival signals 188. The direction of arrival signal 188 gives the orientation
Arg(
z) of the portable microphone 110 (moving sound object) in the recorded sound scene
10 and the orientation Δ of the rendered sound scene 310 relative to the audio output
device 300.
[0052] The position of the moving sound object changes as the portable microphone 110 moves
in the recorded sound scene 10 and the orientation of the rendered sound scene 310
changes as the head-mounted audio output device rendering the sound scene rotates.
[0053] The direct processing block 182 may, for example, include a system 184 similar to
that illustrated in Figure 6A that rotates the single channel audio signal, gain-adjusted
input signal 183, in the appropriate multichannel space producing the direct multichannel
audio signal 185.
The system 184 uses a transfer function to perform a transformation T that rotates
multichannel signals within the space defined for those multiple channels by Arg(
z) and by Δ, defined by the direction of arrival signal 188. For example, a head related
transfer function (HRTF) interpolator may be used for binaural audio. As another example,
Vector Base Amplitude Panning (VBAP) may be used for loudspeaker format (e.g. 5.1)
audio.
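For the loudspeaker-format case, a two-dimensional VBAP pair may be sketched as follows (Python; restricting the panning to one loudspeaker pair in the horizontal plane with constant-power normalisation is a simplifying assumption):

```python
import math

def vbap_pair_gains(source_deg, spk1_deg, spk2_deg):
    # Solve g1*l1 + g2*l2 = p, where l1 and l2 are the unit vectors of
    # a loudspeaker pair and p is the unit vector towards the (rotated)
    # sound object, then normalise the gains to constant power. The
    # resulting gains pan the object to the bearing given by the
    # direction of arrival signal 188.
    def unit(deg):
        r = math.radians(deg)
        return (math.cos(r), math.sin(r))
    (l1x, l1y), (l2x, l2y) = unit(spk1_deg), unit(spk2_deg)
    px, py = unit(source_deg)
    det = l1x * l2y - l1y * l2x
    g1 = (px * l2y - py * l2x) / det
    g2 = (l1x * py - l1y * px) / det
    norm = math.hypot(g1, g2)
    return g1 / norm, g2 / norm
```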
[0055] The indirect processing block 192 may, for example, be implemented as illustrated
in Fig. 6B. In this example, the direction of arrival signal 188 controls the gain
of the single channel audio signal, the gain-adjusted input signal 193, using a variable
gain device 194. The amplified signal is then processed using a static decorrelator
196 and then a system 198 that applies a static transformation T to produce the indirect
multichannel audio signal 195. The static decorrelator in this example uses a pre-delay
of at least 2 ms. The transformation T rotates multichannel signals within the space
defined for those multiple channels in a manner similar to the system 184 but by a
fixed amount. For example, a static head related transfer function (HRTF) interpolator
may be used for binaural audio.
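The pre-delay of the indirect path of Fig. 6B may be sketched as follows (Python; modelling the static decorrelator 196 as a pure pre-delay is a simplification, as a practical decorrelator would typically also apply filtering):

```python
def static_decorrelator(signal, sample_rate_hz, predelay_ms=2.0):
    # Offset the indirect-path signal in time by a pre-delay of at
    # least 2 ms so that it no longer correlates sample-for-sample
    # with the direct-path signal.
    delay_samples = int(sample_rate_hz * predelay_ms / 1000.0)
    return [0.0] * delay_samples + list(signal)
```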
[0056] It will therefore be appreciated that the module 170 can be used to process the portable
microphone signals 112 and perform the functions of:
- (i) changing the relative position (orientation Arg(z) and/or distance |z|) of a sound object, represented by a portable microphone audio signal 112, from
a listener in the rendered sound scene and
- (ii) changing the orientation of the rendered sound scene (including the sound object
positioned according to (i)) relative to a rotating rendering audio output device
300.
[0057] It should also be appreciated that the module 170 may also be used for performing
the function of the orientation block 150 only, when processing the audio signals
122 provided by the static microphone 120. However, the direction of arrival signal
will include only Δ and will not include Arg(
z). In some but not necessarily all examples, the gain of the variable gain devices
191 modifying the gain to the indirect paths may be set to zero and the gain of the
variable gain device 181 for the direct path may be fixed. In this instance, the module 170
reduces to the system 184 illustrated in Fig. 6A that rotates the recorded sound scene
to produce the rendered sound scene according to a direction of arrival signal that
includes only Δ and does not include Arg(
z).
[0058] Fig. 7 illustrates an example of the system 100 implemented using an apparatus 400.
The apparatus 400 may, for example, be a static electronic device, a portable electronic
device or a hand-portable electronic device that has a size that makes it suitable
to be carried on a palm of a user or in an inside jacket pocket of the user.
[0059] In this example, the apparatus 400 comprises the static microphone 120 as an integrated
microphone but does not comprise the one or more portable microphones 110 which are
remote. In this example, but not necessarily all examples, the static microphone 120
is a microphone array. However, in other examples, the apparatus 400 does not comprise
the static microphone 120.
[0060] The apparatus 400 comprises an external communication interface 402 for communicating
externally with external microphones, for example, the remote portable microphone(s)
110. This may, for example, comprise a radio transceiver.
[0061] A positioning system 450 is illustrated as part of the system 100. This positioning
system 450 is used to position the portable microphone(s) 110 relative to the origin
of the sound scene e.g. the static microphone 120. In this example, the positioning
system 450 is illustrated as external to both the portable microphone 110 and the
apparatus 400. It provides information dependent on the position
z of the portable microphone 110 relative to the origin of the sound scene to the apparatus
400. In this example, the information is provided via the external communication interface
402, however, in other examples a different interface may be used. Also, in other
examples, the positioning system may be wholly or partially located within the portable
microphone 110 and/or within the apparatus 400.
The position system 450 provides an update of the position of the portable microphone
110 with a particular frequency, and the terms 'accurate' and 'inaccurate' positioning
of the sound object should be understood to mean accurate or inaccurate within the
constraints imposed by the frequency of the positional update. That is, accurate and
inaccurate are relative terms rather than absolute terms.
[0063] The apparatus 400 wholly or partially operates the system 100 and method 200 described
above to produce a multi-microphone multichannel audio signal 103.
[0064] The apparatus 400 provides the multi-microphone multichannel audio signal 103 via
an output communications interface 404 to an audio output device 300 for rendering.
[0065] In some but not necessarily all examples, the audio output device 300 may use binaural
coding. Alternatively or additionally, in some but not necessarily all examples, the
audio output device 300 may be a head-mounted audio output device.
[0066] In this example, the apparatus 400 comprises a controller 410 configured to process
the signals provided by the static microphone 120 and the portable microphone 110
and the positioning system 450. In some examples, the controller 410 may be required
to perform analogue to digital conversion of signals received from microphones 110,
120 and/or perform digital to analogue conversion of signals to the audio output device
300 depending upon the functionality at the microphones 110, 120 and audio output
device 300. However, for clarity of presentation no converters are illustrated in
Fig. 7.
[0067] Implementation of a controller 410 may be as controller circuitry. The controller
410 may be implemented in hardware alone, have certain aspects in software including
firmware alone or can be a combination of hardware and software (including firmware).
[0068] As illustrated in Fig. 7 the controller 410 may be implemented using instructions
that enable hardware functionality, for example, by using executable instructions
of a computer program 416 in a general-purpose or special-purpose processor 412 that
may be stored on a computer readable storage medium (disk, memory etc) to be executed
by such a processor 412.
[0069] The processor 412 is configured to read from and write to the memory 414. The processor
412 may also comprise an output interface via which data and/or commands are output
by the processor 412 and an input interface via which data and/or commands are input
to the processor 412.
[0070] The memory 414 stores a computer program 416 comprising computer program instructions
(computer program code) that controls the operation of the apparatus 400 when loaded
into the processor 412. The computer program instructions, of the computer program
416, provide the logic and routines that enable the apparatus to perform the methods
illustrated in Figs. 1-12. The processor 412 by reading the memory 414 is able to
load and execute the computer program 416.
[0071] As illustrated in Fig. 7, the computer program 416 may arrive at the apparatus 400
via any suitable delivery mechanism 430. The delivery mechanism 430 may be, for example,
a non-transitory computer-readable storage medium, a computer program product, a memory
device, a record medium such as a compact disc read-only memory (CD-ROM) or digital
versatile disc (DVD), or an article of manufacture that tangibly embodies the computer
program 416. The delivery mechanism may be a signal configured to reliably transfer
the computer program 416. The apparatus 400 may propagate or transmit the computer
program 416 as a computer data signal.
[0072] Although the memory 414 is illustrated as a single component/circuitry it may be
implemented as one or more separate components/circuitry some or all of which may
be integrated/removable and/or may provide permanent/semi-permanent/ dynamic/cached
storage.
[0073] Although the processor 412 is illustrated as a single component/circuitry it may
be implemented as one or more separate components/circuitry some or all of which may
be integrated/removable. The processor 412 may be a single core or multi-core processor.
[0074] The position system 450 enables a position of the portable microphone 110 to be determined.
The position system 450 may receive positioning signals and determine a position which
is provided to the processor 412 or it may provide positioning signals or data dependent
upon positioning signals so that the processor 412 may determine the position of the
portable microphone 110.
[0075] There are many different technologies that may be used by a position system 450 to
position an object including passive systems where the positioned object is passive
and does not produce a positioning signal and active systems where the positioned
object produces one or more positioning signals. An example of a system, used in the
Kinect™ device, is when an object is painted with a non-homogeneous pattern of symbols
using infrared light and the reflected light is measured using multiple cameras and
then processed, using the parallax effect, to determine a position of the object.
An example of an active radio positioning system is when an object has a transmitter
that transmits a radio positioning signal to multiple receivers to enable the object
to be positioned by, for example, trilateration or triangulation. An example of a
passive radio positioning system is when an object has a receiver or receivers that
receive a radio positioning signal from multiple transmitters to enable the object
to be positioned by, for example, trilateration or triangulation. Trilateration requires
an estimation of a distance of the object from multiple, non-aligned, transmitter/receiver
locations at known positions. A distance may, for example, be estimated using time
of flight or signal attenuation. Triangulation requires an estimation of a bearing
of the object from multiple, non-aligned, transmitter/receiver locations at known
positions. A bearing may, for example, be estimated using a transmitter that transmits
with a variable narrow aperture, a receiver that receives with a variable narrow aperture,
or by detecting phase differences at a diversity receiver.
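The trilateration described above may be sketched in two dimensions as follows (Python; three non-aligned anchors and noise-free distance estimates are simplifying assumptions, and a practical radio positioning system would use least squares over many noisy measurements):

```python
def trilaterate_2d(anchors, dists):
    # Subtract the circle equation of the first anchor from those of
    # the other two; the squared unknowns cancel, leaving a linear
    # 2x2 system in the object position (x, y), solved by Cramer's rule.
    (x1, y1), (x2, y2), (x3, y3) = anchors
    r1, r2, r3 = dists
    a11, a12 = 2 * (x2 - x1), 2 * (y2 - y1)
    a21, a22 = 2 * (x3 - x1), 2 * (y3 - y1)
    b1 = r1**2 - r2**2 + x2**2 - x1**2 + y2**2 - y1**2
    b2 = r1**2 - r3**2 + x3**2 - x1**2 + y3**2 - y1**2
    det = a11 * a22 - a12 * a21
    return ((b1 * a22 - a12 * b2) / det, (a11 * b2 - a21 * b1) / det)
```

The distances would in practice be estimated using time of flight or signal attenuation, as noted above.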
[0076] Other positioning systems may use dead reckoning and inertial movement or magnetic
positioning.
The object that is positioned may be the portable microphone 110 or it may be an object
worn or carried by a person associated with the portable microphone 110 or it may
be the person associated with the portable microphone 110.
A problem can arise in relation to positioning using transmission and/or reception
of radio signals, particularly indoors, because of multi-path effects arising from
reflections. While the high accuracy indoor positioning system (HAIP) is a radio positioning
system that addresses such problems, problems can still arise in consistently and
accurately positioning a portable microphone 110.
[0079] It is possible for one or more positioning signals received or transmitted by the
position system 450 to be subject to noise, resulting in an incorrect position of
a portable microphone 110 (incorrect
y and incorrect z). In a rendered sound scene, this would result in an incorrect positioning
of the rendered sound object associated with the portable microphone 110 which can
be disconcerting to a listener.
[0080] In some circumstances, for example during the first mode described below, it is possible
to correct for noise in the position of the portable microphone 110, for example,
because there is enough confidence as to when a position of the portable microphone
110 is incorrect.
[0081] In other circumstances, for example during the second mode described below, it is
more difficult to correct for noise in the position of the portable microphone 110,
for example, because there is not enough confidence as to when a position of the portable
microphone 110 is incorrect. There may, for example, be confidence that there is an
error in one of two positions but there may not be confidence as to which position
is erroneous at that time.
[0082] Figs. 8A and 9A both illustrate a plot of how a determined position p_i(t_i) of the
portable microphone 110 varies with time t_i. The determined position p_i(t_i) of
the portable microphone 110 is the position that is determined, or which would be
determined, based on the original positioning signals by the position system 450.
It is the position of the portable microphone 110 based on the measurements made.
[0083] The determined position p_i(t_i) of the portable microphone 110 suffers from noise
(deviations from a true position value). There is, in the examples illustrated, frequent
low intensity deviation of the determined position p_i(t_i) from a position p_1.
Some of the deviation may arise from small variations in the actual position of the
portable microphone 110 but some of the deviation may be modeled as an unpredictable
small amplitude noise n_i(t_i).
[0084] There may be larger intensity deviations of the determined position p_i(t_i) from
the position p_1. In the plots illustrated, there are larger intensity deviations
when, for example, p_i(t_i) = p_2 and p_i(t_i) = p_3. Some of the deviation may arise
from variations in the actual position of the portable microphone 110 but some of
the deviation may be modeled as an unpredictable error E_i(t_i).
[0085] In a first mode of operation, the signal processing may, for example, use filtering
to remove noise from the determined positions p_i(t_i).
[0086] For example, a filter may average the determined positions of the portable microphone
over a time window, which may be variable. For example, a number N of the immediately
preceding determined positions p_n(t_n), for n = i-1, i-2, ..., i-N, may be averaged
with the determined position p_i(t_i) to provide a processed position P_i(t_i). This
filter may be used to remove the unpredictable small amplitude noise n_i(t_i).
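By way of illustration only, the averaging filter of paragraph [0086] may be sketched as follows. The sketch assumes positions represented as tuples and a fixed window of N+1 samples; the names make_average_filter and process are illustrative and not part of the described system.

```python
from collections import deque

def make_average_filter(window):
    """Return a filter that averages each determined position p_i(t_i)
    with up to window-1 immediately preceding determined positions,
    yielding the processed position P_i(t_i)."""
    history = deque(maxlen=window)  # window = N + 1 samples; could be made variable

    def process(p_i):
        history.append(p_i)
        # Component-wise mean over the positions currently in the window.
        return tuple(sum(c) / len(history) for c in zip(*history))

    return process
```

Averaging in this way attenuates the unpredictable small amplitude noise n_i(t_i) but does not by itself reject the larger errors E_i(t_i), which the threshold filters of paragraphs [0087] and [0088] address.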
[0087] For example, a filter may ignore a change in position because it occurs at greater
than a threshold speed. If a change in position is Δp_i = |p_i(t_i) - p_{i-m}(t_{i-m})|
which occurs over a time interval Δt_i = t_i - t_{i-m}, then the speed v_i is given
by Δp_i / Δt_i. For example, m may equal 1, and if v_i > T_i then p_i(t_i) = p_i(t_{i-1}),
else p_i(t_i) = p_i(t_i). The threshold T_i may be variable. This filter may be used
to remove the unpredictable errors E_i(t_i).
[0088] For example, a filter may ignore a change in position because it occurs at greater
than a threshold distance. If a change in position Δp_i = |p_i(t_i) - p_{i-1}(t_{i-1})|
exceeds a threshold X_i then p_i(t_i) = p_i(t_{i-1}), else p_i(t_i) = p_i(t_i). The
threshold X_i may be variable. This filter may be used to remove the unpredictable
errors E_i(t_i).
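By way of illustration only, the speed-threshold filter of paragraph [0087] and the distance-threshold filter of paragraph [0088] may be sketched as follows, with m = 1 and one-dimensional positions for brevity. Comparing each sample against the previously accepted (filtered) value is one possible reading of the text; the thresholds are shown as constants although the text allows them to vary, and all names are illustrative.

```python
def speed_filter(positions, times, t_threshold):
    """Replace a sample by the previous value when the implied speed
    v_i = Δp_i / Δt_i exceeds the threshold T_i (paragraph [0087])."""
    out = [positions[0]]
    for i in range(1, len(positions)):
        dp = abs(positions[i] - out[-1])
        dt = times[i] - times[i - 1]
        out.append(out[-1] if dp / dt > t_threshold else positions[i])
    return out

def distance_filter(positions, x_threshold):
    """Replace a sample by the previous value when the change in
    position Δp_i exceeds the threshold X_i (paragraph [0088])."""
    out = [positions[0]]
    for p in positions[1:]:
        out.append(out[-1] if abs(p - out[-1]) > x_threshold else p)
    return out
```

In both sketches an implausibly large jump is held at the previous position, which is the first-mode behavior of rejecting gross variances.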
[0089] In a rendered sound scene, this would result in a correct positioning of the rendered
sound object associated with the portable microphone 110 despite inaccurate determined
positions p_i(t_i).
[0090] Figs. 9A and 9B will be used to explain a second mode of operation of the system
100. In the second mode, some of the deviation (noise) illustrated in Fig. 8A is modeled,
using a second model, as arising from variations in the actual position of the portable
microphone 110, and some of the deviation is modeled as arising from unpredictable
small amplitude noise n_i(t_i).
[0091] However, whereas the first model models all or most large intensity deviations as
unpredictable errors E_i(t_i) that do not change the processed position, the second
model does not, and large intensity deviations in the determined position are more
likely to cause a large intensity deviation of the processed position.
[0092] The system 100, in the second mode, may use this second model to attempt to remove
the deviation arising from unpredictable small amplitude noise n_i(t_i) in a manner
similar to that described for the first model using a filter. The resultant processed
position P_i(t_i) of the portable microphone 110 should, according to the second model,
correspond to the actual position of the portable microphone 110.
[0093] Fig. 9B illustrates a plot of how a processed position P_i(t_i) of the portable
microphone 110 varies with time t_i. This processed position P_i(t_i) is used to control
rendering of the sound object associated with the portable microphone 110. The processed
position P_i(t_i), not the determined position p_i(t_i), is used to provide z used
by the positioning block 140 to adjust for movement of the sound object associated
with the portable microphone 110 relative to the origin.
[0094] It will be appreciated that the processing of the determined position p_i(t_i) to
produce the usable processed position P_i(t_i), in the second mode, does not necessarily
reduce a variance in the position of the portable microphone 110.
[0095] In the second mode of operation, the signal processing used in the first mode to
remove the errors E_i(t_i) is not used on the determined positions p_i(t_i) and, in
a rendered sound scene, a change in the processed positions P_i(t_i) positioning the
rendered sound object associated with the portable microphone 110 occurs when there
is a change in the determined position of the portable microphone 110. In the example
of Fig. 9B, a change in the processed position P_i(t_i) from P_1 to P_2 occurs when
the determined position p_i(t_i) of the portable microphone 110 changes from p_1 to
p_2. In some examples P_1 is the same as or very similar to p_1. In some examples
P_2 is the same as or very similar to p_2. In the example of Fig. 9B, a change in
the processed position P_i(t_i) from P_1 to P_3 occurs when the determined position
p_i(t_i) of the portable microphone 110 changes from p_1 to p_3. In some examples
P_1 is the same as or very similar to p_1 and/or P_3 is the same as or very similar
to p_3.
[0096] Fig. 10 illustrates an example of a method 600 suitable for use during the second
mode or on transition from the first mode to the second mode. The method generates
a distraction, for example a visual distraction, when there is a discontinuous or
abrupt change in the processed position P_i(t) of the rendered sound object.
[0097] When there is a discontinuous or abrupt change in the processed position P_i(t)
of the rendered sound object, there is some likelihood that the change in processed
position is erroneous. The generation of a visual distraction, simultaneously with
the change in position, distracts the listener from the error.
[0098] The method 600 comprises, at block 602, providing a process for detecting a change
in position P_i(t) of rendering a sound object from a first position at a first time
to a second position, different to the first position, at a second time immediately
after the first time. If block 602 detects such a change, then the method moves to
block 604.
[0099] The method 600 comprises, at block 604, providing a process for generating a visual
distraction at the second time.
[0100] Referring back to Fig. 9B, when the processed position P_i(t) changes 502 between
P_1(t) and P_3(t), a visual distraction 504 is generated.
[0101] Also, when the processed position P_i(t) changes 502 between P_1(t) and P_2(t)
in Fig. 9B, a visual distraction 504 is generated.
[0102] In the example illustrated, a visual distraction 504 is generated on each transition
between P_1(t) and P_3(t) in a first sense only (away from P_1(t)). That is, a visual
distraction 504 is generated on each transition from P_1(t) to P_3(t). However, in
other examples (not illustrated), a visual distraction 504 is generated on each transition
between P_1(t) and P_3(t) in a first sense and a second opposite sense (away from
and towards P_1(t)). That is, a visual distraction 504 is generated on each transition
from P_1(t) to P_3(t) and a visual distraction 504 is generated on each transition
to P_1(t) from P_3(t).
[0103] In some examples (illustrated), a visual distraction 504 is generated on every qualifying
transition between two processed positions P_i(t_i) in a first sense only, and in
some examples (not illustrated), a visual distraction 504 is generated on every qualifying
transition between two processed positions P_i(t_i) irrespective of the sense of transition,
that is, in the first sense and in the opposite second sense.
[0104] A qualifying transition may be a transition in processed position that satisfies
a qualifying criterion or criteria.
[0105] One example of a qualifying condition for classifying a change in processed position
as a qualifying transition is that the change in position occurs at greater than a
threshold speed. If a change in position is ΔP_i = P_i(t_i) - P_{i-m}(t_{i-m}) which
occurs over a time interval Δt_i = t_i - t_{i-m}, then the speed V_i is given by
ΔP_i / Δt_i. For example, m may equal 1, and if V_i > Y_i then the transition P_{i-1}(t_{i-1})
to P_i(t_i) is a qualifying transition and a distraction, for example a visual distraction,
is generated at time t_i. The threshold Y_i may be variable.
[0106] One example of a qualifying condition for classifying a change in processed position
as a qualifying transition is that the change in position occurs by greater than a
threshold distance, that is, it is a gross change in position. If a change in position
is ΔP_i = P_i(t_i) - P_{i-m}(t_{i-m}) which occurs over a time interval, then the
distance D_i is given by |ΔP_i|. For example, m may equal 1, and if D_i > Z_i then
the transition P_{i-1}(t_{i-1}) to P_i(t_i) is a qualifying transition and a distraction,
for example a visual distraction, is generated at time t_i. The threshold Z_i may
be variable.
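By way of illustration only, the qualifying-transition tests of paragraphs [0105] and [0106] may be sketched as follows, again with m = 1 and scalar positions for brevity. The thresholds Y_i and Z_i are shown as constants although the text allows them to vary, and all names are illustrative.

```python
def qualifying_transition_times(P, t, y_speed=None, z_distance=None):
    """Return the times t_i at which a change in processed position
    P_{i-1}(t_{i-1}) -> P_i(t_i) is a qualifying transition, i.e. the
    times at which a distraction (for example a visual distraction)
    would be generated."""
    out = []
    for i in range(1, len(P)):
        dP = abs(P[i] - P[i - 1])
        dt = t[i] - t[i - 1]
        speed_qualifies = y_speed is not None and dP / dt > y_speed
        distance_qualifies = z_distance is not None and dP > z_distance
        if speed_qualifies or distance_qualifies:
            out.append(t[i])
    return out
```

Note the contrast with the first-mode filters above: the same tests that there cause a sample to be rejected here cause the change to be retained and accompanied by a distraction.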
[0107] Referring to Fig. 9B, in the second mode, when there is a qualifying transition a
visual distraction is generated, whereas, referring to Fig. 8B, in the first mode,
visual distractions are not generated. Changes in position in the first mode that
would qualify for generation of a visual distraction in the second mode are removed
by filtering, whereas in the second mode the changes in position are retained but
accompanied by a visual distraction.
[0108] The classification of a change in position as a qualifying transition may identify
the change as an anomalous change in a processed position of the portable microphone,
that is, a change in position that cannot physically occur because, for example, the
speed of position change is too great.
[0109] Another example of a qualifying condition for classifying a change in processed position
as a qualifying transition is that the change in position occurs to a second position
at which there is not stage lighting or some other stage effect. The classification
of a change in position as a qualifying transition, in this example, may identify
the change as an incorrect change in a processed position of the portable microphone.
The generated visual distraction may provide the stage lighting or stage effect at
the second position, previously determined to be absent.
[0110] In the second mode, a visual distraction may be generated with each qualifying transition,
until the second mode is exited.
[0111] In some examples, the visual distractions generated may be generated in real time
within an audio scene of the sound object.
[0112] A visual distraction generated may comprise a stage effect, for example a lighting
effect, a smoke effect, pyrotechnics and/or moving stage objects.
[0113] A visual distraction generated may comprise a lighting effect. A lighting effect
may for example comprise a change in a lighting property or lighting properties.
[0114] A change in a lighting effect may comprise one or more of: changing a position of
a spotlight, changing a beam width of a spotlight, adding a spotlight, removing a
spotlight, changing a color, intensity and/or number of spotlights, and changing a
lighting pattern.
[0115] In some examples, a visual distraction generated may be dependent upon a classification
of a qualifying transition e.g. as anomalous or incorrect.
[0116] In some examples, a visual distraction generated may be dependent upon a property
of a qualifying transition, for example, a size of a change in processed position
of the rendered sound object.
[0117] In some examples, a visual distraction generated may be dependent upon a history
of previous stage effects and/or visual distractions.
[0118] Figs. 11A, 11B and 11C illustrate one example of a changing stage effect 610, in
this example a lighting effect, in accordance with the rendering of a sound object
based on the processed position of the portable microphone illustrated in Fig. 9B.
[0119] As illustrated in Fig. 11A, when the processed position P is p_1, a spotlight 612
is trained on the position p_1.
[0120] As illustrated in Fig. 11B, when there is a qualifying transition and the processed
position P changes from p_1 to p_3, a visual distraction is generated by moving the
spotlight 612 so that it is no longer trained on p_1 but is trained on p_3, while
the processed position P is p_3.
[0121] As illustrated in Fig. 11C, when there is a qualifying transition and the processed
position P changes from p_1 to p_2, a visual distraction is generated by moving the
spotlight 612 so that it is no longer trained on p_1 but is trained on p_2, while
the processed position P is p_2.
[0122] In the examples of Figs. 11A-11C, the spotlight 612 follows the processed position
P. The distraction generation corresponds to a gross change in position of the spotlight.
[0123] Figs. 12A, 12B and 12C illustrate an example of a changing stage effect 610, in this
example a lighting effect, in accordance with the rendering of a sound object based
on the processed position of the portable microphone illustrated in Fig. 9B.
[0124] As illustrated in Fig. 12A, when the processed position P is p_1, a spotlight 612
is trained on the position p_1.
[0125] As illustrated in Fig. 12B, when there is a qualifying transition and the processed
position P changes from position p_1 to position p_3, a visual distraction is generated
by training an additional spotlight 612' on the position p_3, while the processed
position P is position p_3.
[0126] As illustrated in Fig. 12C, when there is a qualifying transition and the processed
position P changes from position p_1 to position p_2, a visual distraction is generated
by training an additional spotlight 612' on position p_2, while the processed position
P is position p_2.
[0127] In the examples of Figs. 12A-12C, the distraction generation corresponds to a new
additional spotlight.
[0128] It may, at this stage, be informative to compare and contrast operation of the system
100 under the first mode and the second mode.
[0129] The modes differ in how large intensity deviations in the determined position of
a portable microphone (processed positions of a sound object) are handled.
[0130] In the first mode, a position of the sound object (portable microphone 110) is compensated
to prevent rapid changes in a position of the rendered sound object. The first mode
rejects gross variances in position of the sound object. This compensation removes
the unpredictable error E_i(t_i).
[0131] In the second mode, the unpredictable error E_i(t_i) is not modeled or removed,
a position of the sound object (portable microphone 110) is not compensated, and there
may be rapid changes in a position of the rendered sound object. The second mode accepts
gross variances in position of the sound object.
[0132] The first mode may be suitable when it is possible to discriminate between a correct
and an incorrect position and to correct the incorrect position. That is, it is possible
to confidently identify an error and confidently remove it. There is a high level
of confidence as to what is an accurate position.
[0133] The second mode may be suitable when it is not possible to discriminate between
a correct and an incorrect position and/or it is not possible to correct the incorrect
position. That is, it is not possible to confidently identify an error and confidently
remove it. There is a low level of confidence as to what is an accurate position.
There may, for example, be a high degree of confidence that at least some positions
are not possible, anomalous, incorrect and arise from error, but a lower degree of
confidence as to which positions are not possible, anomalous, incorrect and arise
from error; that is, it is difficult to discriminate between a correct and an incorrect
position.
[0134] The transition between the first mode and the second mode may be based on processing
the determined positions of the portable microphone.
[0135] When there is a level of confidence at or above a threshold value that it is possible
to remove an error, the first mode is used.
[0136] When there is a level of confidence below a threshold value that it is possible to
remove an error, the second mode is used.
[0137] The system 100 may therefore automatically transition between the first mode and
the second mode.
[0138] Thus as noise levels increase and a level of confidence as to what is an accurate
position and what is an inaccurate position decreases, the system changes from the
first mode to the second mode and generates visual distractions.
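By way of illustration only, the automatic mode transition of paragraphs [0135]-[0138] may be sketched as a simple thresholded selection. How the level of confidence is computed from the positioning signals is not specified by the text; the numeric scale and names here are assumptions.

```python
def select_mode(error_removal_confidence, threshold):
    """Select the first mode (filter out errors E_i(t_i)) when the
    confidence that an error can be removed is at or above a threshold
    value; otherwise select the second mode (retain changes in position
    and generate visual distractions)."""
    return "first" if error_removal_confidence >= threshold else "second"
```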
[0139] The description of processing positioning signals described should be understood
to also encompass processing of data dependent upon the positioning signals.
[0140] The processing of positioning signals to determine a processed position at which
a sound object is rendered may or may not occur, wholly or partially, within the position
system 450. The processing of positioning signals to determine a processed position
at which a sound object is rendered may or may not occur, wholly or partially, within
the processor 412 of the apparatus 400.
[0141] The processing of positioning signals to determine a mode for rendering a sound object
may or may not occur, wholly or partially, within the position system 450. The processing
of positioning signals to determine a mode for rendering a sound object may or may
not occur, wholly or partially, within the processor 412 of the apparatus 400.
[0142] The processing to cause a visual distraction to accompany rendering of a sound object
may or may not occur, wholly or partially, within the position system 450. The processing
to cause a visual distraction to accompany rendering of a sound object may or may
not occur, wholly or partially, within the processor 412 of the apparatus 400.
[0143] The method 600 may, for example, be performed by the system 100, for example, using
the controller 410 of the apparatus 400.
[0144] It will be appreciated from the foregoing that the various methods 600 described
may be performed by an apparatus 400, for example an electronic apparatus 400.
[0145] The electronic apparatus 400 may in some examples be a part of an audio output device
300 such as a head-mounted audio output device or a module for such an audio output
device 300. The electronic apparatus 400 may in some examples additionally or alternatively
be a part of a head-mounted apparatus 800 comprising the display 420 that displays
images to a user.
[0146] It will be appreciated from the foregoing that the various methods 600 described
may be performed by a computer program used by such an apparatus 400.
[0147] For example, an apparatus 400 may comprise:
at least one processor 412; and
at least one memory 414 including computer program code;
the at least one memory 414 and the computer program code configured to, with the
at least one processor 412, cause the apparatus 400 at least to perform:
causing or performing detecting a change in position of rendering a sound object
from a first position at a first time to a second position, different to the first
position, at a second time immediately after the first time; and
causing, at the second time, generating a visual distraction.
[0148] References to 'computer-readable storage medium', 'computer program product', 'tangibly
embodied computer program' etc. or a 'controller', 'computer', 'processor' etc. should
be understood to encompass not only computers having different architectures such
as single /multi- processor architectures and sequential (Von Neumann)/parallel architectures
but also specialized circuits such as field-programmable gate arrays (FPGA), application
specific circuits (ASIC), signal processing devices and other processing circuitry.
References to computer program, instructions, code etc. should be understood to encompass
software for a programmable processor or firmware such as, for example, the programmable
content of a hardware device whether instructions for a processor, or configuration
settings for a fixed-function device, gate array or programmable logic device etc.
[0149] As used in this application, the term 'circuitry' refers to all of the following:
- (a) hardware-only circuit implementations (such as implementations in only analog
and/or digital circuitry) and
- (b) to combinations of circuits and software (and/or firmware), such as (as applicable):
(i) to a combination of processor(s) or (ii) to portions of processor(s)/software
(including digital signal processor(s)), software, and memory(ies) that work together
to cause an apparatus, such as a mobile phone or server, to perform various functions
and
- (c) to circuits, such as a microprocessor(s) or a portion of a microprocessor(s),
that require software or firmware for operation, even if the software or firmware
is not physically present.
This definition of 'circuitry' applies to all uses of this term in this application,
including in any claims. As a further example, as used in this application, the term
"circuitry" would also cover an implementation of merely a processor (or multiple
processors) or portion of a processor and its (or their) accompanying software and/or
firmware. The term "circuitry" would also cover, for example and if applicable to
the particular claim element, a baseband integrated circuit or applications processor
integrated circuit for a mobile phone or a similar integrated circuit in a server,
a cellular network device, or other network device.
[0150] The blocks illustrated in the Figs. 1-12 may represent steps in a method and/or sections
of code in the computer program 416. The illustration of a particular order to the
blocks does not necessarily imply that there is a required or preferred order for
the blocks and the order and arrangement of the block may be varied. Furthermore,
it may be possible for some blocks to be omitted.
[0151] Where a structural feature has been described, it may be replaced by means for performing
one or more of the functions of the structural feature whether that function or those
functions are explicitly or implicitly described.
[0152] As used here 'module' refers to a unit or apparatus that excludes certain parts/components
that would be added by an end manufacturer or a user.
[0153] The term 'comprise' is used in this document with an inclusive not an exclusive meaning.
That is any reference to X comprising Y indicates that X may comprise only one Y or
may comprise more than one Y. If it is intended to use 'comprise' with an exclusive
meaning then it will be made clear in the context by referring to "comprising only
one..." or by using "consisting".
[0154] In this brief description, reference has been made to various examples. The description
of features or functions in relation to an example indicates that those features or
functions are present in that example. The use of the term 'example' or 'for example'
or 'may' in the text denotes, whether explicitly stated or not, that such features
or functions are present in at least the described example, whether described as an
example or not, and that they can be, but are not necessarily, present in some of
or all other examples. Thus 'example', 'for example' or 'may' refers to a particular
instance in a class of examples. A property of the instance can be a property of only
that instance or a property of the class or a property of a sub-class of the class
that includes some but not all of the instances in the class. It is therefore implicitly
disclosed that a feature described with reference to one example but not with reference
to another example, can where possible be used in that other example but does not
necessarily have to be used in that other example.
[0155] Although embodiments of the present invention have been described in the preceding
paragraphs with reference to various examples, it should be appreciated that modifications
to the examples given can be made without departing from the scope of the invention
as claimed.
[0156] Features described in the preceding description may be used in combinations other
than the combinations explicitly described.
[0157] Although functions have been described with reference to certain features, those
functions may be performable by other features whether described or not.
[0158] Although features have been described with reference to certain embodiments, those
features may also be present in other embodiments whether described or not.
[0159] Whilst endeavoring in the foregoing specification to draw attention to those features
of the invention believed to be of particular importance it should be understood that
the Applicant claims protection in respect of any patentable feature or combination
of features hereinbefore referred to and/or shown in the drawings whether or not particular
emphasis has been placed thereon.