TECHNOLOGICAL FIELD
[0001] Embodiments of the present disclosure relate to audio. Some enable the distribution
of common content for rendering to both advanced audio output devices and less advanced
audio output devices.
BACKGROUND
[0002] Advanced audio output devices are capable of rendering multiple received audio channels
as different spatially positioned audio sources. The spatial separation of audio sources
(spatial audio) can aid hearing when the sources simultaneously provide sound.
[0003] Less advanced audio output devices may only be capable of rendering one monophonic
audio channel. They cannot render multiple received audio channels as different spatially
positioned audio sources.
[0004] Content that is suitable for rendering spatial audio via an advanced audio output device may be unsuitable for a less advanced audio output device, and content that is suitable for rendering by a less advanced audio output device may under-utilize the spatial audio capabilities of an advanced audio output device.
BRIEF SUMMARY
[0005] According to various, but not necessarily all, embodiments there is provided an apparatus
comprising means for:
receiving at least N audio channels where each of the N audio channels can be rendered
as a different audio source;
controlling mixing of the N audio channels to produce at least an output audio channel,
wherein the mixing selects for mixing to produce the output audio channel, a sub-set
of M audio channels from the N audio channels, wherein the selection is in dependence
upon prioritization of the N audio channels, and wherein the prioritization is adaptive
depending at least upon a changing content of one or more of the N audio channels;
and
providing for rendering at least the output audio channel.
[0006] In some but not necessarily all examples, the apparatus comprises means for: automatically
controlling mixing of the N audio channels to produce at least the output audio channel,
in dependence upon time-variation of content of one or more of the N audio channels.
[0007] In some but not necessarily all examples, the N audio channels are N spatial audio
channels where each of the N spatial audio channels can be rendered as a differently
positioned audio source.
[0008] In some but not necessarily all examples, N is at least two and M is one, the output
audio channel being a monophonic audio output channel.
[0009] In some but not necessarily all examples, the apparatus comprises means for analyzing
the N audio channels to adapt a prioritization of the N audio channels in dependence
upon, at least, changing content of one or more of the N audio channels.
[0010] In some but not necessarily all examples, prioritization depends upon one or more
of:
timing of content of at least one of the N audio channels relative to timing of content
of at least another one of the N audio channels;
history of content of at least one of the N audio channels;
mapping to a particular person, an identified voice in content of at least one of
the N audio channels;
detection that content of at least one of the N audio channels is voice content;
detection that content of at least one of the N audio channels comprises an identified
word.
[0011] In some but not necessarily all examples, controlling mixing of the N audio channels
to produce at least an output audio channel, comprises:
selecting a first sub-set of the N audio channels to be mixed to provide background
audio;
selecting a second sub-set of the N audio channels to be mixed to provide foreground
audio that is for rendering at greater loudness than the background audio, wherein
the selection of the first sub-set and selection of the second sub-set is dependent
upon the prioritization of the N audio channels; and
mixing the background audio and the foreground audio to produce the output audio channel.
[0012] In some but not necessarily all examples, the apparatus comprises means for controlling
mixing of the N audio channels to produce M audio channels in response to a communication
bandwidth for receiving the audio channels or for providing output audio signals falling
beneath a threshold value.
[0013] In some but not necessarily all examples, the apparatus comprises means for controlling mixing of the N audio channels to produce M audio channels when there is conflict between a first audio channel of the N audio channels and a second audio channel of the N audio channels, wherein the first audio channel is included within the M audio channels and the second audio channel is not included within the M audio channels, wherein over-talking is an example of conflict.
[0014] In some but not necessarily all examples, the audio channels of the N audio channels
that are not the selected M audio channels are available for later rendering.
[0015] In some but not necessarily all examples, the apparatus comprises a user input interface
for controlling prioritization of the N audio channels.
[0016] In some but not necessarily all examples, the apparatus comprises a user input interface,
wherein the user input interface provides a spatial representation of the N audio
channels and indicates which of the N audio channels are comprised in the sub-set
of M audio channels.
[0017] According to various, but not necessarily all, embodiments there is provided a multi-party,
live communication system that enables live audio communication between multiple remote
participants using at least the N audio channels wherein different ones of the multiple
remote participants provide audio input for different ones of the N audio channels,
wherein the system comprises the apparatus.
[0018] According to various, but not necessarily all, embodiments there is provided a method
comprising:
receiving at least N audio channels where each of the N audio channels can be rendered
as a different audio source;
controlling mixing of the N audio channels to produce at least an output audio channel,
wherein the mixing selects a sub-set of at least M audio channels from the N audio
channels in dependence upon prioritization of the N audio channels, wherein the prioritization
is adaptive and depends at least upon a content of one or more of the N audio channels;
and
rendering at least the output audio channel.
[0019] According to various, but not necessarily all, embodiments there is provided a computer
program that when run on one or more processors enables:
controlling mixing of N received audio channels, where each of the N audio channels can
be rendered as a different audio source, to produce at least an output audio channel
for rendering,
wherein the mixing selects a sub-set of at least M audio channels from the N audio
channels in dependence upon prioritization of the N audio channels, wherein the prioritization
is adaptive and depends at least upon a content of one or more of the N audio channels.
[0020] According to various, but not necessarily all, embodiments there is provided an apparatus
comprising means for:
receiving at least N audio channels where each of the N audio channels can be rendered
as a different audio source;
adapting a prioritization of the N audio channels in dependence upon, at least, changing
content of one or more of the N audio channels; and
controlling mixing of the N audio channels to produce at least an output audio channel,
wherein the mixing selects for mixing to produce the output audio channel, a sub-set
of M audio channels from the N audio channels, wherein the selection is in dependence
upon the prioritization; and
providing for rendering at least the output audio channel.
[0021] According to various, but not necessarily all, embodiments there is provided an apparatus
comprising means for:
receiving at least N audio channels where each of the N audio channels can be rendered
as a different audio source;
analyzing the N audio channels to adapt a prioritization of the N audio channels in
dependence upon, at least, changing content of one or more of the N audio channels;
and
controlling mixing of the N audio channels to produce at least an output audio channel,
wherein the mixing selects for mixing to produce the output audio channel, a sub-set
of M audio channels from the N audio channels, wherein the selection is in dependence
upon the prioritization; and
providing for rendering at least the output audio channel.
[0022] According to various, but not necessarily all, embodiments there is provided examples
as claimed in the appended claims.
BRIEF DESCRIPTION
[0023] Some examples will now be described with reference to the accompanying drawings in
which:
FIG. 1 illustrates an example of an apparatus for providing an output audio channel
for rendering;
FIG. 2 illustrates an example of an apparatus in which an analyzer is configured to
analyze the N audio channels to adapt the prioritization of the N audio channels in
dependence upon, at least, changing content of one or more of the N audio channels;
FIG. 3 illustrates another example of the apparatus;
FIG. 4 illustrates an example of a multi-party, live communication system comprising
the apparatus;
FIGS. 5A and 5B illustrate alternative topologies of the system;
FIG. 6 illustrates an example of prioritization based on timing of content;
FIG. 7 illustrates an example of prioritization based on content type;
FIG. 8 illustrates an example of storage of unselected audio channels;
FIGS. 9A, 9B and 9C illustrate examples of prioritization based on keywords in content;
FIG. 10 illustrates an example of informing a consumer of the output audio channel
of an option to change the audio channels included within the output audio channel;
FIG. 11A illustrates an example of spatial audio rendered, based on the N audio channels,
at an output end-point configured for rendering spatial audio;
FIG. 11B illustrates an example of audio rendered, based on the output audio channel,
at an output end-point that is not configured for rendering spatial audio;
FIGS. 12, 13 and 15 illustrate examples of a method;
FIG. 14 illustrates an example of changing prioritization based on timing of content;
FIG. 16 illustrates an example of a controller; and
FIG. 17 illustrates an example of a computer program.
DETAILED DESCRIPTION
[0024] The following description and the attached drawings describe various examples of
an apparatus 10 that receives at least N audio channels 20 and enables the rendering
of one or more output audio channels 52.
[0025] The set of N audio channels is referenced using reference number 20. Each audio channel of the set of N audio channels is referenced using reference number 20i, where i is 1, 2, ... N-1, N.
[0026] The apparatus 10 comprises means for receiving at least N audio channels 20 where each of the N audio channels 20i can be rendered as a different audio source.
[0027] The apparatus 10 comprises means 40, 50 for controlling selection and mixing of the
N audio channels 20 to produce at least an output audio channel 52.
[0028] A selector 40 selects for mixing (to produce the output audio channel 52) a sub-set
30 of M audio channels from the N audio channels 20. The selection is dependent upon
prioritization 32 of the N audio channels 20. The prioritization 32 is adaptive depending
at least upon a changing content 34 of one or more of the N audio channels 20.
[0029] The sub-set 30 of M audio channels is referenced using reference number 30. Each audio channel of the sub-set of M audio channels is referenced using reference number 20j, where j is any M of the N values of i. The sub-set 30 can, for example, be varied by changing the value of M and/or by changing which audio channels 20j are used to comprise the M audio channels of the sub-set 30. In the description, different sub-sets 30 can, in some examples, be differentiated using the same reference 30 with different numeric sub-scripts.
[0030] A mixer 50 mixes the sub-set 30 of M audio channels to produce the output audio channel
52 which is suitable for rendering.
[0031] An advanced spatial audio output device (an example is illustrated at FIG 11A) can
render the N audio channels 20 as multiple different spatially positioned audio sources.
A less advanced audio output device (an example is illustrated at FIG 11B) can render
the output audio channel 52.
[0032] The apparatus 10 therefore allows a common content, the N audio channels 20, to provide
audio output at both the advanced spatial audio output device and the less advanced
audio output device.
[0033] FIG. 1 illustrates an example of an apparatus 10 for providing an output audio channel
52 for rendering. The rendering of the output audio channel 52 can occur at the apparatus
10 or can occur at some other device.
[0034] The apparatus 10 receives at least N audio channels 20. An audio channel 20i of the N audio channels 20 can be rendered as a distinct audio source.
[0035] The apparatus 10 comprises a mixer 50 for mixing a sub-set 30 of M audio channels from the N audio channels 20 to produce at least an output audio channel 52.
[0036] A selector 40 selects for mixing, at mixer 50, the sub-set 30 of M audio channels
from the N audio channels 20. The selection, by the selector 40, is dependent upon
prioritization 32 of the N audio channels 20. The prioritization 32 is adaptive depending
at least upon a changing content 34 of one or more of the N audio channels 20. The
apparatus 10 provides, from the mixer 50, the output audio channel 52 for rendering.
[0037] The sub-set 30 of M audio channels has fewer audio channels than the N audio channels 20, that is, M is less than N. N is at least two and in at least some examples is
greater than 2. In at least some examples M is one and the output audio channel 52
is a monophonic audio output channel.
[0038] The prioritization 32 is adaptive. The prioritization 32 depends at least on a changing
content 34 of one or more of the N audio channels 20.
[0039] In some but not necessarily all examples, the apparatus 10 is configured to automatically
control the mixing of the N audio channels 20 to produce at least the output audio
channel 52, in dependence upon time-variation of content 34 of one or more of the
N audio channels 20.
[0040] FIG. 2 illustrates an example of an apparatus 10 in which an analyzer 60 is configured
to analyze the N audio channels 20 to adapt the prioritization 32 of the N audio channels
20 in dependence upon, at least, changing content 34 of one or more of the N audio
channels 20.
[0041] The analysis can be performed before (or simultaneously with) the aforementioned selection.
[0042] In some examples, the analyzer 60 is configured to process metadata associated with
the N audio channels 20. Additionally or alternatively, in some examples, the analyzer
60 is configured to process the audio content of the audio channels 20. This processing
could, for example, comprise voice activity detection, voice recognition processing,
spectral analysis, semantic processing of speech or other processing including machine
learning and artificial intelligence processing used to identify characteristics of
the content 34 of one or more of the N audio channels 20.
[0043] The prioritization 32 can depend upon one or more parameters of the content 34.
[0044] In one example, the prioritization 32 depends upon timing of content 34i of an audio channel 20i relative to timing of content 34j of an audio channel 20j. Thus, the audio channel 20 that first satisfies a trigger condition has temporal priority. In some examples the trigger condition may be that the audio channel 20 has activity above a threshold, and/or has activity above a threshold in a particular spectral range and/or has voice activity and/or has voice activity associated with a specific person and/or the voice activity comprises semantic content including a particular keyword or phrase.
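As a non-limiting illustration of such a trigger condition, the sketch below uses a simple frame-energy threshold as a stand-in for voice activity detection; the threshold value and the function names are assumptions introduced for illustration only.

```python
import numpy as np

def channel_is_active(frame, energy_threshold=1e-4):
    """Illustrative trigger condition: mean-square frame energy above a threshold."""
    return float(np.mean(frame ** 2)) > energy_threshold

def first_active_channel(frames, energy_threshold=1e-4):
    """Return the index of the first channel satisfying the trigger, or None.

    In a temporal-priority scheme, the channel that first satisfies the
    trigger condition is given priority for the mono downmix.
    """
    for i, frame in enumerate(frames):
        if channel_is_active(frame, energy_threshold):
            return i
    return None
```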
[0045] An initial prioritization 32 can cause an initial selection of a first sub-set 301 of audio channels 20 that are mixed to form the output audio channel 52. A change in prioritization 32 can cause a new selection of a second, different sub-set 302 of audio channels 20 that are mixed to form a new, different output audio channel 52. The first sub-set 301 and the second sub-set 302 are not equal sets. Thus, apparatus 10 can prioritize one or more of the N audio channels 20 as a sub-set 30 until a new selection by the selector 40 based on a new prioritization 32 changes the sub-set 30.
[0046] If a person is speaking in a particular audio channel 20, first, that channel may
be prioritized ahead of a second audio channel. However, if the person speaking in
the first audio channel stops speaking then the prioritization 32 of the audio channels
can change and there can be a consequential reselection at the selector 40 of the
sub-set 30 of M audio channels provided for mixing to produce the output audio channel
52.
[0047] The apparatus 10 can flag at least one input audio channel 20 corresponding to a
first active talker, or generally active content 34, during a selection period and
prioritize this selection over other audio channels 20. The apparatus 10 can determine
whether the active talker continues before introducing content 34 from non-prioritized
channels to the mixed output audio channel 52. The introduction of such additional
content 34 from non-prioritized channels is controlled by the selector 40 during a
following selection period.
[0048] In some examples, non-prioritized audio channels 20 can be completely omitted from
the mixed output audio channel 52 and thus the mixed output audio channel 52 will
contain only the prioritized channel(s). However, in other examples, the non-prioritized
channels can be mixed with a lower gain or higher attenuation than the prioritized
channel and/or with other suitable processing to produce the output audio channel
52.
[0049] It will therefore be appreciated that in at least some examples, a history of content 34 of at least one of the N audio channels 20 can be used to control the prioritization 32. For example, it may be possible to vary the "inertia" of the system, that is, control the rate of change of the prioritization 32. It is therefore possible to make the apparatus 10 more or less responsive to short term variations in the content 34 of one or more of the N audio channels 20.
[0050] The selector 40 in making a selection of which of the N audio channels 20 to select
for mixing to produce the output audio channel 52 can, for example, use decision thresholds
for selection. A decision threshold can be changed over time and can be dependent
upon a history of the content 34. In addition, different decision thresholds can be
used for different audio channels 20.
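One way such history-dependent decision thresholds could be realized is sketched below, assuming an exponential moving average of per-channel activity with per-channel thresholds; the class name, the smoothing factor and the threshold mechanism are illustrative assumptions, not the claimed implementation.

```python
class SmoothedActivity:
    """Illustrative history-dependent activity tracker (an assumption for
    illustration). A higher smoothing factor gives the system more "inertia",
    making selection less responsive to short-term variations in content."""

    def __init__(self, n_channels, smoothing=0.9):
        self.smoothing = smoothing
        self.levels = [0.0] * n_channels

    def update(self, instantaneous_levels):
        # Exponential moving average of each channel's activity level.
        self.levels = [
            self.smoothing * old + (1.0 - self.smoothing) * new
            for old, new in zip(self.levels, instantaneous_levels)
        ]
        return self.levels

    def selected(self, thresholds):
        # Different decision thresholds can be used for different audio channels.
        return [i for i, (lvl, thr) in enumerate(zip(self.levels, thresholds)) if lvl > thr]
```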
[0051] In some examples, the prioritization 32 can be dependent upon mapping to a particular
person an identified voice in content 34 of at least one of the N audio channels 20.
The analyzer 60 can for example perform voice recognition based upon the content 34
of one or more of the N audio channels 20. Alternatively, the analyzer 60 can identify
a particular person based upon metadata comprised within the content 34 of at least
one of the N audio channels 20. It may therefore be possible to identify a particular
one of the N audio channels 20 as relating to a person whose contribution it is particularly
important to hear such as, for example, a chairman of a meeting.
[0052] In some examples, the analyzer 60 is configured to adapt the prioritization 32 when
the presence of voice content is detected within the content 34 of at least one of
the N audio channels 20. Thus, the analyzer 60 is able to prioritize the spoken word
within the output audio channel 52. It is also possible to adapt the analyzer 60 to
prioritize other types of content.
[0053] In some, but not necessarily all, examples, the analyzer 60 is configured to adapt
the prioritization 32 based upon detection that content 34 of at least one of the
N audio channels 20 comprises an identified keyword. The analyzer 60 can, for example,
listen to the content 34 and identify within the stream of content a keyword or identify
semantic meaning within the stream of content. This can be used to modify the prioritization
32. For example, it may be desirable for a consumer of the output audio channel 52
to have that output audio channel 52 personalized so that if one of the N audio channels
20 comprises content 34 that includes the consumer's name or other information associated
with the consumer then that audio channel 20 is prioritized by the analyzer 60.
[0054] In some, but not necessarily all, examples, the N audio channels 20 can represent
live content. In this example, the analysis by the analyzer 60, the selection by the
selector 40 and the mixing by the mixer 50 can occur in real time such that the output
audio channel 52 is also live.
[0055] FIG. 3 illustrates an example of the apparatus of FIG. 1 in more detail. In this
example one possible operation of the mixer 50 is illustrated in more detail. In this
example, the mixing is a weighted mixing in which different sub-sets of the sub-set
30 of selected audio channels are weighted with different attenuation/gain before
being finally mixed to produce the output audio channel 52.
[0056] In the illustrated example, the selector 40, based upon the prioritization 32, selects a first sub-set SS1 of the M audio channels to be mixed to provide background audio B and selects a second sub-set SS2 of the M audio channels 20 to be mixed to provide foreground audio F that is for rendering at greater loudness than the background audio B. The selection of the first sub-set SS1 and the selection of the second sub-set SS2 is dependent upon the prioritization 32 of the N audio channels 20. The first sub-set SS1 of audio channels 20 is mixed 501 to provide background audio B which is then amplified/attenuated G1 to adjust the loudness of the background audio before it is provided to the mixer 503 for mixing to produce the output audio channel 52. The second sub-set SS2 of audio channels 20 is mixed 502 to provide foreground audio F which is then amplified/attenuated G2 to adjust the loudness of the foreground audio before it is provided to the mixer 503 for mixing to produce the output audio channel 52.
[0057] The gain/attenuation G2 applied to the foreground audio F makes it significantly
louder than the background audio B in the output audio channel 52. In some situations,
the foreground audio F is naturally louder than background audio B. Thus, it can be
but need not be that G2 > G1.
[0058] The gain/attenuation G1, G2 can, in some examples, vary with frequency.
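A minimal sketch of such a weighted foreground/background mix is given below. The gain values, the simple clipping limiter and the function name are illustrative assumptions only; in particular, frequency-dependent gains as mentioned above are not shown.

```python
import numpy as np

def mix_foreground_background(foreground_frames, background_frames, g_fg=1.0, g_bg=0.25):
    """Illustrative weighted mix: the foreground sub-set SS2 is rendered louder
    than the background sub-set SS1 (gain values are assumptions for illustration)."""
    fg = sum(foreground_frames) if foreground_frames else 0.0
    bg = sum(background_frames) if background_frames else 0.0
    out = g_fg * fg + g_bg * bg      # corresponds to mixer 503 combining F and B
    return np.clip(out, -1.0, 1.0)   # simple limiter to avoid clipping the output
```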
[0059] FIG. 4 illustrates an example of a multi-party, live communication system 200 that enables live audio communication between multiple remote participants Ai, B, C, Di using at least the N audio channels 20. Different ones of the multiple remote participants Ai, B, C, Di provide audio input for different ones of the N audio channels 20.
[0060] The system 200 comprises input end-points 206 for capturing audio channels 20. The system 200 comprises output end-points 204 for rendering audio channels. One or more output end-points 204s (spatial output end-points) are configured for rendering spatial audio as distinct rendered audio sources. One or more output end-points 204m (mono output end-points) are not configured for rendering spatial audio.
[0061] The N audio channels 20 are N spatial audio channels where each of the N spatial
audio channels is captured as a differently positioned captured audio source, and
can be rendered using spatial audio as a differently positioned rendered audio source.
In some examples the captured audio source (input end-point 206) has a fixed and stationary
position. However, in other examples it can vary in position. When such an input end-point
206 is rendered as a rendered audio source at an output end-point 204 using spatial
audio, then the rendered audio source can either be fixed or can move, for example,
in a manner corresponding to the moving input end-point 206.
[0062] In this example, the system 200 is for enabling immersive teleconferencing or telepresence
for remote terminals. The different terminals have varying device capabilities and
different (and possibly variable) network conditions.
[0063] Spatial/immersive audio refers to audio that typically has a three-dimensional space
representation or is presented (rendered) to a participant with the intention of the
participant being able to hear a specific audio source from a specific direction.
In the specific example illustrated there is a multi-participant audio/visual conference call between remote participants. Some of the participants share a room. For example, participants A1, A2, A3, A4 share the room A and the participants D1, D2, D3, D4, D5 share the room D.
[0064] Some of the terminals can be characterized as "advanced spatial audio output devices" that have an output end-point 204s that is configured for spatial audio. However, some of the terminals are less advanced audio output devices that have an output end-point 204m that is not configured for spatial audio.
[0065] In a spatial audio experience, the voices of the participants Ai, B, C, Di are spatially separated. The voices may, for example, have fixed spatial positions relative to each other or the directions may be adaptive, for example, according to participant movements, conference bridge settings or based upon inputs by participants. A similar experience is available to the participants who are using the output end-points 204s and they have the ability to interact much more naturally than in traditional voice calls and voice conferencing. For example, they can talk at the same time and still understand each other thanks to effects such as the well-known cocktail party effect.
[0066] In rooms A and D, each of the respective participants Ai, Di has a personal input end-point 206 which captures a personal captured audio source as a personal audio channel 20. The personal input end-point 206 can, for example, be provided by a directional microphone or by a Lavalier microphone.
[0067] The participants B and C each have a single personal input end-point 206 which captures
a personal audio channel 20.
[0068] In rooms A and D, the output end-points 204s are configured for spatial audio. For example, each room can have a surround sound system as an output end-point 204s. An output end-point 204s is configured to render each captured sound source represented by an audio channel 20 as a rendered sound source.
[0069] In room D, each participant Ai, B, C has a personal output audio channel 20. Each personal output audio channel 20 is rendered from a different location as a different rendered audio source. The collection of rendered audio sources associated with the participants Ai creates a virtual room A.
[0070] In room A, each participant Di, B, C has a personal output audio channel 20. Each personal output audio channel 20 is rendered from a different location as a different rendered sound source. The collection of the rendered audio sources associated with the participants Di creates a virtual room D.
[0071] For participant C, the output end-point 204s is configured for spatial audio. An output end-point 204s is configured to render each captured sound source represented by an audio channel 20 as a rendered sound source.
[0072] The participant C has an output end-point 204s that is configured for spatial audio. In this example, the participant C is using a headset configured for binaural spatial audio that is suitable for virtual reality (VR). Binauralization methods can be used to render personal audio channels 20 as spatially positioned rendered audio sources. Each participant Ai, Di, B has a personal output audio channel 20. Each personal output audio channel 20 is or can be rendered from a different location as a different rendered sound source.
[0073] The participant B has an output end-point 204m that is not configured for spatial audio. In this example it is a monophonic output end-point. In the example illustrated, the participant B is using a mobile device (e.g. a mobile phone) to provide the input end-point 206 and the output end-point 204m. The mobile device has a single output end-point 204m which provides the output audio channel 52 as previously described. The processing to produce the output audio channel 52 can be performed at the mobile device of the participant B or at the server 202.
[0074] The mono-capability limitation of participant B can, for example, be caused by the device, for example because it is only configured for decoding of mono audio, or by the available audio output facilities such as a mono-only earpiece or headset.
[0075] In the preceding examples the spatial audio has been described at a high resolution.
Each of the input end-points 206 is rendered in spatial audio as a spatially distinct
rendered audio source. However, in other examples multiple ones of the input end-points
206 may be mixed together to produce a single rendered audio source. This can be used
to reduce the number of rendered audio sources using spatial audio. Therefore, in
some examples, a spatial audio device may render multiple ones of output audio channels
52.
[0076] In the example illustrated in FIG. 4, a star topology similar to that illustrated in FIG. 5A is used. The central server 202 interconnects the input end-points 206 and the output end-points 204. In the example of FIG. 5A, the input end-points 206 provide the N audio channels 20 to a central server 202 which produces the output audio channel 52 as previously described and provides it to the output end-point 204m. In this example, the apparatus 10 is located in the central server 202, however, in other examples the apparatus 10 is located at the output end-point 204m.
[0077] FIG. 5B illustrates an alternative topology in which there is no centralized architecture
but a peer-to-peer architecture. In this example, the apparatus 10 is located at the
output end-point 204m.
[0078] The 3GPP IVAS codec is an example of a voice and audio communications codec for spatial
audio. The IVAS codec is an extension of the 3GPP EVS codec and is intended for new
immersive voice and audio services over 4G and 5G. Such immersive services include,
for example, immersive voice and audio for virtual reality (VR). The multi-purpose
audio codec is expected to handle encoding, decoding and rendering of speech, music
and generic audio. It is expected to support channel-based audio and scene-based audio
inputs including spatial information about the sound field and sound sources. It is
also expected to operate with low latency to enable conversational services as well
as support high error robustness under various transmission conditions. The audio
channels 20 can, for example, be coded/decoded using the 3GPP IVAS codec.
[0079] The spatial audio channels 20 can, for example, be provided as metadata-assisted
spatial audio (MASA), object-based audio, channel-based audio (5.1, 7.1+4), non-parametric
scene-based audio (e.g. First Order Ambisonics, High Order Ambisonics) and any combination
of these formats. These audio formats can be binauralized for headset listening such
that a participant can hear the audio sources outside their head.
[0080] It will therefore be appreciated from the foregoing that the apparatus 10 provides
a better experience, including improved intelligibility for a mono user participating
in a spatial audio teleconference with several potentially overlapping spatial audio
inputs. The apparatus 10 means that it is not necessary, in some cases, to simplify
the spatial audio conference experience for the spatial audio users due to having
a mono-audio participant. Thus, a mono user can participate in a spatial audio conference
without compromising the experience of the other users.
[0081] FIGS 6, 7, 8 and 9A illustrate examples of an apparatus 10 that comprises a controller
70. The controller receives N audio channels 20 and performs control processing to
select the sub-set 30 of M audio channels. In the examples previously described, the
controller 70 comprises the selector 40 and, optionally, the analyzer 60. In these
examples, the mixer 50 is present but not illustrated.
[0082] In at least some of these examples, the controller 70 is configured to control mixing
of the N audio channels 20 to produce the sub-set 30 of M audio channels when a conflict
between a first audio channel of the N audio channels 20 and a second audio channel
of the N audio channels occurs. For example, the control can cause the first audio
channel 20 to be included within the sub-set 30 of M audio channels and cause the
second audio channel 20 not to be included within the sub-set 30 of M audio channels.
[0083] In some examples, at a later time, when there is no longer conflict between the first
audio channel and the second audio channel, the second audio channel is included within
the sub-set 30 of M audio channels.
[0084] One example of when there is conflict between audio channels is when there is simultaneous
activity from different prioritized sound sources. For example, overtalking (simultaneous
speech) associated with different audio channels 20 can be an example of conflict.
[0085] In the example illustrated in FIG. 6, the prioritization 32 used for the selection
of audio channels to form the sub-set 30 of M audio channels depends upon timing of
content 34 of at least one of the N audio channels 20 relative to timing of content
34 of at least another one of the N audio channels 20.
[0086] In this example, the participant 3 speaks first and the audio channel 203 associated with the participant 3 is selected as a 'priority' for inclusion within the sub-set 30 of M=1 audio channels used to form the output audio channel 52. The later speech by participants 4 and 5 is not selected for inclusion within the sub-set 30 of audio channels used to form the output audio channel 52.
[0087] The audio channel 203 preferentially remains prioritized and remains included within the output audio channel 52, while there is voice activity in the audio channel 203, whereas the audio channels 204, 205 are excluded. If voice activity is no longer detected in the audio channel 203 then in some examples a selection process may immediately change the identity of the audio channel 20 selected for inclusion within the output audio channel 52. However, in other examples there can be a selection grace period. During this grace period, there can be a greater likelihood of selection/reselection of the originally selected audio channel 203. Thus, during the grace period prioritization 32 is biased in favor of the previously selected audio channel.
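The grace-period bias described above could, purely as an illustrative sketch, take the following form; the function name, the arguments and the fallback choice of the earliest-active channel are assumptions introduced here and do not represent the claimed implementation.

```python
def choose_priority_channel(active, previous, grace_frames_left):
    """Illustrative grace-period selection (names and logic are assumptions).

    active: list of channel indices with current voice activity
    previous: index of the previously selected channel (or None)
    grace_frames_left: frames remaining in the selection grace period
    """
    # During the grace period, bias selection towards the previously selected channel.
    if grace_frames_left > 0 and previous is not None and previous in active:
        return previous
    # Otherwise fall back to the first (e.g. earliest-active) channel, if any.
    return active[0] if active else previous
```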
[0088] It will therefore be appreciated that in at least some examples, prioritization 32
used for the selection depends upon a history of content 34 of at least one of the
N audio channels 20.
[0089] In some examples, the prioritization 32 used for the selection can depend upon mapping
to a particular person (an identifiable human), an identified voice in content 34
of at least one of the N audio channels 20. A voice can be identified using metadata
or by analysis of the content 34. The prioritization 32 would more favorably select
the particular person's audio channel 20 for inclusion within the output audio channel
52.
[0090] The particular person could, for example, be based upon service policy. A teleconference
service may have a moderator or chairman role and this participant may for example
be made audible to all participants or may be able to force themselves to be audible
to all participants. In other examples, the particular person could for example be
indicated by a user consuming the output audio channel 52. That consumer could for
example indicate which of the other participants' content 34 or audio channels 20
they wish to consume. This audio channel 20 could then be included, or be more likely
to be included, within the output audio channel 52. The inclusion of the user-selected
audio channel 20 can for example be dependent upon voice activity within the audio
channel 20, that is, the user-selected audio channel 20 is only included if there
is active voice activity within that audio channel 20. The prioritization 32 used
for the selection therefore strongly favors the user-selected audio channel 20. The
selection by the consumer of the output audio channel 52 of a particular audio channel
20 can for example be based upon an identity of the participant who is speaking or
should speak in that audio channel. Alternatively, it could be based upon a user-selection
of that audio channel because of the content 34 rendered within that audio channel.
[0091] FIG. 7 illustrates an example similar to FIG. 6. In this example, the audio channels 20 include a mixture of different audio types. The audio channel 203 associated with participant 3 is predominantly a voice channel. The audio channels 204, 205 associated with participants 4 and 5 are predominantly instrumental/music channels. In this example, the selection of which of the audio channels 20 is to be included within the output audio channel 52 can be based upon the audio type present within the audio channel 20. The detection of the audio type within the audio channel 20 can for example be achieved using metadata or, alternatively, by analyzing the content 34 of the audio channel 20. Thus, the prioritization 32 used for selection can be dependent upon detection that content 34 of at least one of the N audio channels 20 is voice content. In such a voice-centric case, natural pauses in the active content 34 allow for changes in the mono downmix. That is, the output audio channel 52 can switch between the inclusion of different audio channels 20 in dependence upon which of them includes active voice content. In this way priority can be given to spoken language. The other channels, for example the music channels 204, 205, may optionally be included, for example as background audio as previously described with relation to FIG. 3.
[0092] In the examples illustrated in FIGS 6 and 7, the apparatus 10 deliberately loses
information by excluding (or diminishing) audio channels 20 with respect to the output
audio channel 52. Information is generally lost by the selective downmixing which
is required to maintain or guarantee intelligibility. It is, however, possible for
there to be two simultaneously important audio channels 20, only one of which is selected
for inclusion in the output audio channel 52. The apparatus illustrated in FIG. 8
addresses this issue.
[0093] The apparatus 10 illustrated is similar to that illustrated in FIGS 6 and 7. However, it additionally comprises a memory 82 for storage of a further sub-set 80 of the N audio channels 20 that is different from the sub-set 30 of M audio channels. Thus, in this example at least some of the audio channels of the N audio channels 20 that are not selected for inclusion in the sub-set 30 of M audio channels are stored as sub-set 80 and are available for later rendering. In some examples, the later rendering may be at a faster playback rate and that playback rate may be fixed or may be adaptive. In some examples, the sub-set 80 of audio channels is mixed to form an alternative audio output channel for storage in the memory 82.
[0094] In the specific example illustrated at least some of the audio channels of the N
audio channels that are not selected to be in the sub-set 30 of M audio channels are
stored in memory 82 for later rendering.
[0095] In the particular illustrated example, there is selection of a first sub-set 30 of
M audio channels from the N audio channels based upon prioritization 32 of the N audio
channels. The first sub-set 30 of M audio channels are mixed to produce a first output
audio channel 52. There is selection of a different second sub-set 80 of audio channels
from the N audio channels based upon prioritization 32 of the N audio channels. The
second sub-set 80 of audio channels are mixed to produce a second output audio channel
for storage.
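A minimal sketch of producing both downmixes is given below, assuming the stored downmix is simply an equal-gain mix of the non-selected channels; the function name and the mixing rule are illustrative assumptions only.

```python
import numpy as np

def make_downmixes(frames, selected_indices):
    """Illustrative sketch (assumed, not the claimed implementation): produce a
    primary mono downmix from the selected sub-set 30 and a second downmix from
    the remaining channels (sub-set 80) for storage and later rendering."""
    primary = [frames[i] for i in selected_indices]
    remainder = [f for i, f in enumerate(frames) if i not in selected_indices]
    primary_mix = sum(primary) / max(len(primary), 1)
    stored_mix = sum(remainder) / max(len(remainder), 1) if remainder else np.zeros_like(frames[0])
    return primary_mix, stored_mix
```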
[0096] In the example illustrated in FIG. 8, the audio channel 203 includes content 34 comprising voice content from a single participant, and it is selected for inclusion within the sub-set 30 of audio channels. It is used to produce the output audio channel 52. The audio channels 204, 205, which have not been included within the output audio channel 52, or included only as background (as described with reference to FIG. 3), are selected for mixing to produce the second output audio signal that is stored in memory 82.
[0097] When there is storage of a second sub-set 80 of audio channels as a second audio
signal, it is desirable to let the consumer of the output audio channel 52 know of
the existence of the stored audio signal. This can for example facilitate user control
of switching from rendering the output audio channel 52 to rendering the stored audio
channel.
[0098] FIG. 10 illustrates an example of how such an indication may be provided to the consumer
of the output audio channel 52. Fig 10 is described in detail later.
[0099] In some examples, it may be possible to automatically switch from rendering the output
audio channel 52 to rendering the stored audio channel. For example, there may be
automatic switching during periods of inactivity of the output audio channel 52. An
apparatus 10 may switch to the stored audio channel and play that back at a higher
speed. For example, the apparatus 10 can monitor the typical length of inactivity
in the preferred output audio channel 52 and adjust the speed of playback for the
stored audio channel such that the relevant portions can be played back during a typical
inactive period.
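As a simple illustration of this playback-speed adjustment, the stored duration could be divided by the typical inactive period, capped to a maximum rate; the figures and the function name below are assumptions for illustration only.

```python
def catch_up_speed(stored_duration_s, typical_inactive_period_s, max_speed=2.0):
    """Illustrative calculation (an assumption): choose a playback speed so the
    stored audio fits into a typical inactive period of the primary channel."""
    if typical_inactive_period_s <= 0:
        return max_speed
    speed = stored_duration_s / typical_inactive_period_s
    return min(max(speed, 1.0), max_speed)

# Example: 12 s of stored audio and typical pauses of 8 s give 1.5x playback.
print(catch_up_speed(12.0, 8.0))
```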
[0100] FIG. 9A illustrates an example in which the apparatus 10 detects that content 34
of at least one of the N audio channels 20 comprises an identified keyword and adapts
the prioritization 32 accordingly. The prioritization 32 in turn controls selection
of which of the audio channels 20 are included in the sub-set 30 and the output audio
channel 52 (and, if implemented, the stored alternative audio channel).
[0101] In the example illustrated in FIG. 9B, the participant 'User 3' is speaking first and has priority. Therefore, the audio channel 203 associated with the User 3 is initially selected as the priority audio channel and is included within the output audio channel 52. Even though the participant 'User 5' begins to talk, the prioritization is not changed and the audio channel 203 remains the priority audio channel included within the output audio channel 52. At time T1 it is detected that User 5 says a keyword, in this example the name of the consumer of the output audio channel 52 (Dave). While this event increases the likelihood of a switch in the prioritization of the audio channels 203, 205 such that the audio channel 205 becomes prioritized and included in the output audio channel 52, in this example there is insufficient cause to change the prioritization 32 and consequently change which of the audio channels 20 is included within the output audio channel 52.
[0102] In the example illustrated in FIG. 9C, the participant 'User 3' is speaking first and has priority. Therefore, the audio channel 203 associated with the User 3 is initially selected as the priority audio channel and is included within the sub-set 30 used to produce the output audio channel 52. Even though the participant 'User 5' begins to talk, the prioritization is not changed and the audio channel 203 remains the priority audio channel included within the sub-set 30 and the output audio channel 52. At time T1 it is detected that User 5 says a keyword, in this example the name of the consumer of the output audio channel 52 (Dave). This event causes a switch in the prioritization of the audio channels 203, 205 such that the audio channel 205 becomes prioritized and included in the sub-set 30 and the output audio channel 52 and the audio channel 203 becomes de-prioritized and excluded from the sub-set 30 and the output audio channel 52.
[0103] In some examples, the consumer of the output audio channel 52 can via user input
settings control the likelihood of a switch when a keyword is mentioned within an
audio channel 20. For example, the consumer of the output audio channel 52 can, for
example, require a switch if a keyword is detected. Alternatively, the likelihood
of a switch can be increased.
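One way the keyword-dependent likelihood of a switch could be expressed is sketched below; the boost value, the force-switch flag and the use of a text transcript are illustrative assumptions, not the claimed implementation.

```python
def keyword_priority_boost(current_priority, transcript, keywords, boost=0.5, force_switch=False):
    """Illustrative keyword handling (names and values are assumptions).

    If an identified keyword (e.g. the consumer's name) appears in a channel's
    transcript, either force that channel to maximum priority or increase its
    priority score, making a switch of the downmix more likely."""
    if any(k.lower() in transcript.lower() for k in keywords):
        return float("inf") if force_switch else current_priority + boost
    return current_priority
```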
[0104] In other examples, the occurrence of a keyword can increase the prioritization of
an audio channel 20 such that it is stored, for example as described in relation to
FIG. 8.
[0105] In other examples, the detection of a keyword may provide an option to the consumer
of the output audio channel 52, to enable the consumer to cause a change in the audio
channel 20 included within the sub-set 30 and the output audio channel 52. For example,
if the name of the consumer of the output audio channel 52 is included within an audio
channel 20 that is not being rendered, as a priority, within the output audio channel
52 then the consumer of the output audio channel 52 can be presented with an option
to change prioritization 32 and switch to using a sub-set 30 and output audio channel
52 that includes the audio channel 20 in which their name was detected.
[0106] Where a detected keyword causes a switch in the audio channels included in the sub-set
30 and output audio channel 52, the new output audio channel 52 based on the detected
keyword may be played back from the occurrence of the detected keyword. In some examples
the playback is at a faster rate to allow a catch-up with real time.
[0107] FIG. 10 illustrates an example in which a consumer of the output audio channel 52
is provided with information to allow that consumer to make an informed decision to
switch audio channels 20 included within the sub-set 30 and the output audio channel
52.
[0108] In some examples, some form of indication is given to indicate a change in activity
status. For example, if a particular participant begins to talk or there is a second
separate discussion ongoing, the consumer of the original output audio channel 52
is made aware of this.
[0109] A suitable indicator could for example be an audible indicator that is added to the
output audio channel 52. In some examples, each participant may have an associated
different tone and a beep with a particular tone may indicate which participant has
begun to speak. Alternatively, an indicator could be a visual indicator in a user input interface.
[0110] In the example illustrated in FIG. 10, the background audio is adapted to provide an audible indication. Initially, the consumer listening to the output audio channel 52 hears the audio channel 201 associated with a first participant's voice (User A voice). If a second audio channel 20 is mixed with the audio channel 201, then it may, for example, be an audio channel 202 that captures the ambient audio of the first participant (User A ambience). At time T1 a second participant, User B, begins to talk. This does not initiate a switch of prioritization 32 sufficient to change the sub-set 30. The primary audio channel 20 in the sub-set 30 and the output audio channel 52 remains the audio channel 201. However, an indication is provided to indicate to the consumer of the output audio channel 52 that there is an alternative, available, audio channel 203. The indication is provided by mixing the primary audio channel 201 with an additional audio channel 20 associated with the User B. For example, the additional audio channel 20 can be an attenuated version of the audio channel 203 or can be an ambient audio channel 204 for the User B (User B ambience). In this example, the second audio channel 202 is replaced by the additional audio channel 204.
[0111] The consumer of the output audio channel 52 can then decide whether or not they wish to cause a change in the prioritization 32 to prioritize the audio channel 203 associated with the User B above the audio channel 201 associated with the User A. If this change in prioritization occurs then there is a switch in the primary audio channel within the sub-set 30 and the output audio channel 52 from being the audio channel 201 to being the audio channel 203. In the example illustrated, the consumer does not make this switch. The switch does however occur automatically when the User A stops talking at time T2.
[0112] In the example of FIG. 10, referring back to the example of FIG. 3, the background
audio B can be included and/or varied as an indication to the consumer of the output
audio channel 52 that an alternative audio channel 20 is available for selection.
[0113] FIG. 11A schematically illustrates audio rendered to a participant (User 5) at an output end-point 204s of the system 200 (not illustrated) that is configured for rendering spatial audio. In accordance with the preceding examples, the audio output at the end-point 204s has multiple rendered sound sources associated with audio channels 201, 202, 203, 204 at different locations. FIG. 11A illustrates that even with the presence in the system 200 (not illustrated) of an output end-point 204m (FIG 11B) that is not configured for spatial audio rendering, there may be no need to reduce the immersive capabilities or experience at the output end-points 204s of the system 200 that are configured for rendering spatial audio.
[0114] FIG. 11B schematically illustrates audio rendered to a participant (User 1) at an output end-point 204m of the system 200 (not illustrated) that is not configured for rendering spatial audio. In accordance with the preceding examples, the audio output at the end-point 204m provided by the output audio channel 52 has a single monophonic output audio channel 52 that is based on the sub-set 30 of selected audio channels 20 and has good intelligibility. In the example illustrated, the audio channel 202 is the primary audio channel that is included in the sub-set 30 and the output audio channel 52.
[0115] The apparatus 10 can be configured to automatically switch the composition of the audio channels 20 mixed to form the output audio channel 52 in dependence upon an adaptive prioritization 32. Additionally or alternatively, in some examples, the switching can be effected manually by the consumer at the end-point 204m using a user interface which includes a user input interface 90.
[0116] In the example illustrated in FIG. 11B, the device at the output end-point 204m, which in some examples may be the apparatus 10, comprises a user input interface 90 for controlling prioritization 32 of the N audio channels 20. For example, the user input interface 90 can be configured to highlight or label selected ones of the N audio channels 20 for selection. The user input interface 90 can be used to control if and to what extent manual or automatic switching occurs to produce the output audio channel 52 from selected ones of the audio channels 20. An adaptation of the prioritization 32 can cause an automatic switching or can cause a prompt to a consumer for manual switching.
[0117] In some examples, the user input interface 90 can control if and the extent to which
prioritization 32 depends upon one or more of timing of content 34 of at least one
of the N audio channels 20 relative to timing of content 34 of at least another one
of the N audio channels 20; history of content 34 of at least one of the N audio channels
20; mapping to a particular person an identified voice in content 34 of at least one
of the N audio channels 20; detection that content 34 of at least one of the N audio
channels 20 is voice content; and/or detection that content 34 of at least one of
the N audio channels comprises an identified word.
[0118] In the example illustrated, within the user input interface 90, there is an option 914 that allows the participant, User 1, to select the audio channel 204 as a replacement primary audio channel that is included in the sub-set 30 and the output audio channel 52 instead of the audio channel 202. There is also an option 913 that allows User 1 to select the audio channel 203 as a replacement primary audio channel that is included in the sub-set 30 and the output audio channel 52 instead of the audio channel 202.
[0119] In some but not necessarily all examples, the user input interface 90 can provide
a visual spatial representation of the N audio channels 20 and indicate which of the
N audio channels 20 are comprised in the sub-set 30 of M audio channels.
[0120] The user input interface 90 can also indicate which of the N audio channels are not
comprised in the sub-set 30 of M audio channels and which, if any, of these are active.
[0121] In some, but not necessarily all, examples, the user input interface 90 may provide textual information about an audio channel 20 that is active and available for selection. For example, speech-to-text algorithms may be utilized to convert speech within that audio channel 20 into an alert displayed at the user input interface 90. Referring back to the example illustrated in FIG. 9A, the apparatus 10 may be configured to cause the user input interface 90 to provide an option to a consumer of the output audio channel 52 that enables that consumer to switch audio channels 20 included within the sub-set 30 and output audio channel 52. In this example, the keyword is "Dave" and the textual output provided by the user input interface 90 could, for example, say "option to switch to User 5 who addressed you and said: 'In our last teleco Dave made an interesting'". If the consumer, Dave, then selects the option to switch, the sub-set 30 and the output audio channel 52 then includes the audio channel 205 from the User 5 and starts from the position "In our last teleco Dave made an interesting...". A memory 82 (not illustrated in the FIG) could be used to store the audio channel 205 from the User 5.
[0122] In the preceding examples, the apparatus 10 can be permanently operational to perform the selection of the sub-set 30 of audio channels 20 used to produce the output audio channel 52. However, in other examples the apparatus 10 has a state in which it is operational in this way and a state in which it is not operational in this way and it can transition between these states, for example when a trigger event is or is not detected. The apparatus 10 can be configured to control mixing, by a mixer 50, of the N audio channels 20 to produce M audio channels in response to a trigger event.
[0123] One example of a trigger event is conflict between audio channels 20. An example
of detecting conflict would be when there is overlapping speech in audio channels
20.
[0124] Another example of a trigger event is a reduction in communication bandwidth for
receiving the audio channels 20 below a threshold value. In this example, the value
of M can be dependent upon the available bandwidth.
[0125] Another example of a trigger event is a reduction in communication bandwidth for
providing the output audio channel 52 beneath a threshold value. In this example,
the value of M can be dependent upon the available bandwidth.
[0126] In some examples, the apparatus 10 can also be configured to control the transmission of audio channels 20 to it, and reduce the number of audio channels received from N to M, wherein only the M audio channels that may be required for mixing to produce the output audio channel 52 are received.
[0127] FIG. 12 illustrates an example of a method 100 that can for example be performed
by the apparatus 10. The method comprises, at block 102, receiving at least N audio
channels 20 where each of the N audio channels 20 can be rendered as a different audio
source.
[0128] The method 100 comprises, at block 104, controlling mixing of the N audio channels
20 to produce at least an output audio channel 52, wherein the mixer 50 selects a
sub-set 30 of at least M audio channels from the N audio channels 20 in dependence
upon prioritization 32 of the N audio channels 20, wherein the prioritization 32 is
adaptive and depends at least upon a content 34 of one or more of the N audio channels
20. The method 100 further comprises, at block 106, causing rendering of at least
the output audio channel 52.
[0129] FIG. 13 illustrates a method 110 for producing the output audio channel 52. This
method broadly corresponds to the method previously described with reference to FIG.
6.
[0130] At block 112, the method 110 comprises obtaining spatial audio signals from at least
two sources as distinct audio channels 20. At block 114, the method 110 comprises
determining temporal activity of each of the spatial audio signals (of the two audio
channels 20) and selecting at least one spatial audio signal (audio channel 20) for
mono downmix (for inclusion within the sub-set 30 and the output audio channel 52)
for the duration of its activity. At block 116, the method 110 comprises determining a
content-based priority for at least one of the spatial audio signals (audio channels
20) for temporarily altering a previous selection. At block 118, the method 110 comprises
determining a first mono downmix (sub-set 30 and output audio channel 52) based on
at least one of the prioritized spatial audio signals (audio channels 20). The output
audio channel 52 is based upon the selected sub-set M which is in turn based upon
the prioritization 32. Then at block 120, the method 110 provides the first mono downmix
(the output audio channel 52) to the participant for listening. That is, it provides
the output audio channel 52 for rendering.
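A non-limiting illustrative sketch of the method 110 is given below; the energy-based activity test and the content_priority values are assumptions of the sketch.

    # Illustrative sketch only of method 110: block 112 obtains the signals, block 114
    # selects active signals for the mono downmix, block 116 lets a content-based
    # priority temporarily alter that selection, block 118 forms the first mono
    # downmix and block 120 provides it for listening.
    def is_active(frame, threshold=0.01):
        return (sum(s * s for s in frame) / max(len(frame), 1)) > threshold

    def method_110(frames, content_priority):
        # frames: {channel_id: current frame}; content_priority: {channel_id: score}
        active = [cid for cid, frame in frames.items() if is_active(frame)]         # block 114
        candidates = active if active else list(frames)
        selected = max(candidates, key=lambda cid: content_priority.get(cid, 0))     # block 116
        downmix = list(frames[selected])                                             # block 118
        return downmix, selected                                                     # block 120

    frames = {3: [0.2, -0.2] * 80, 4: [0.0] * 160, 5: [0.3, -0.3] * 80}
    priority = {3: 1.0, 5: 2.0}                  # e.g. channel 5 addressed the listener by name
    mix, chosen = method_110(frames, priority)   # chosen == 5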
[0131] It will therefore be appreciated that the prioritization 32 determined at block 116
is used to adaptively adjust selection of the sub-set 30 of M audio channels 20 used
to produce the output audio channel 52.
[0132] FIG. 14 illustrates an example in which the audio channel 20₃ is first selected, based on prioritization, as the primary audio channel in the output audio channel 52. In this example, at this time, the output audio channel 52 does not comprise the audio channel 20₄ or 20₅. Until the activity in the selected audio channel 20₃ ends, the audio channel 20₃ remains prioritized. There is no change to the selection of the sub-set 30 of M audio channels until the activity in the audio channel 20₃ ends. When the activity in the audio channel 20₃ ends then a new selection process can occur based upon the prioritization 32 of other channels. In this example there is a selection grace period after the end of activity in the audio channel 20₃. If there is resumed activity in the audio channel 20₃ during this selection grace period then the audio channel 20₃ will be re-selected as the primary channel to be included in the sub-set 30 and the output audio channel 52. Thus during the selection grace period the audio channel 20₃ can have a higher prioritization and be selected if it becomes active. After the selection grace period expires, the prioritization of the audio channel 20₃ can be decreased.
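The selection behaviour of FIG. 14, including the selection grace period, could be sketched as follows; the grace period duration and the PrimarySelector name are assumptions made only for the sketch.

    # Illustrative sketch only: the primary channel keeps its selection while active,
    # keeps a raised prioritization for a grace period after its activity ends, and
    # competes on its ordinary prioritization once the grace period has expired.
    class PrimarySelector:
        def __init__(self, grace_period=2.0):
            self.grace_period = grace_period
            self.primary = None
            self.last_active_time = None

        def update(self, now, active_channels, priority):
            if self.primary in active_channels:
                within_grace = (self.last_active_time is None
                                or now - self.last_active_time <= self.grace_period)
                if within_grace:                     # continued or resumed activity: re-selected
                    self.last_active_time = now
                    return self.primary
            if active_channels:                      # new selection based on prioritization 32
                self.primary = max(active_channels, key=lambda cid: priority.get(cid, 0))
                self.last_active_time = now
            return self.primary

    sel = PrimarySelector(grace_period=2.0)
    sel.update(0.0, {3}, {3: 1, 4: 1, 5: 1})      # channel 20_3 selected while active
    sel.update(1.0, set(), {3: 1, 4: 1, 5: 1})    # activity in 20_3 ends; grace period running
    sel.update(1.5, {3, 4}, {3: 1, 4: 5, 5: 1})   # 20_3 resumes within the grace period: re-selected
    sel.update(5.0, {4}, {3: 1, 4: 5, 5: 1})      # 20_3 inactive; 20_4 becomes the primary channel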
[0133] FIG. 15 illustrates an example of a method 130 that broadly corresponds to the method
previously described in relation to FIG. 8. At block 132, the method 130 comprises
obtaining spatial audio signals (audio channels 20) from at least two sources. This
corresponds to the receiving of at least two audio channels 20. At block 134, the method 130 determines a first mono downmix (sub-set 30 and output audio channel 52) based on at least one of the spatial audio signals (audio channels 20). Next, at block 136, the method 130 comprises determining at least one second mono downmix (sub-set 80 and additional audio channel) based on at least one of the spatial audio signals (audio channels 20) not present in the first mono downmix. At block 138, the first mono downmix
is provided to a participant for listening as the output audio channel 52. At block
140, the second mono downmix is provided to a memory for storage.
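As an illustration only, the first and second mono downmixes of the method 130 could be formed as in the following sketch; the equal-gain averaging rule is an assumption of the sketch.

    # Illustrative sketch only of method 130: the first mono downmix (sub-set 30) is
    # provided for listening as the output audio channel 52, the second mono downmix
    # (sub-set 80) is formed from the remaining channels and provided to a memory.
    def mono_downmix(channels, ids):
        """Average the listed channels, sample by sample, into one mono signal."""
        if not ids:
            return []
        length = max(len(channels[i]) for i in ids)
        return [sum(channels[i][n] for i in ids if n < len(channels[i])) / len(ids)
                for n in range(length)]

    def method_130(channels, first_subset):
        second_subset = [i for i in channels if i not in first_subset]
        first = mono_downmix(channels, list(first_subset))        # provided for listening
        second = mono_downmix(channels, second_subset)            # provided for storage
        return first, second

    channels = {3: [0.2, 0.1], 4: [0.0, 0.4], 5: [0.1, 0.1]}
    listen, stored = method_130(channels, first_subset={3})
    storage = {"additional audio channel": stored}                # stored in a memory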
[0134] In any of the examples, when an audio channel 20 associated with a particular input
end-point 206 is selected for inclusion within the sub-set 30 of audio channels used
to create the output audio channel 52, then this information may be provided as feedback at an output end-point 204 associated with that included input end-point
206.
[0135] In any of the examples, when an audio channel 20 associated with a particular input
end-point 206 is not selected for inclusion within the sub-set 30 of audio channels
used to create the output audio channel 52 at a particular output end point 204, then
this information may be provided as feedback at an output end-point 204 associated
with that excluded input end-point 206. The information can for example identify the
input end-points 206 not selected for inclusion for rendering at a particular identified
output end-point 204.
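By way of illustration only, the feedback described in the two preceding paragraphs could be signalled as in the following sketch; the notify callable and the message fields are hypothetical.

    # Illustrative sketch only: each input end-point 206 is informed whether its audio
    # channel 20 was included in, or excluded from, the sub-set 30 rendered at a
    # particular output end-point 204.
    def send_feedback(notify, output_endpoint, input_endpoints, included):
        for ep in input_endpoints:
            notify(ep, {"output_end_point": output_endpoint, "included": ep in included})

    send_feedback(lambda ep, msg: print(ep, msg),
                  output_endpoint="204-A",
                  input_endpoints=["206-1", "206-2", "206-3"],
                  included={"206-1"})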
[0136] FIG. 16 illustrates an example of a controller 70. Implementation of a controller
70 may be as controller circuitry. The controller 70 may be implemented in hardware
alone, have certain aspects in software (including firmware) alone, or can be a combination
of hardware and software (including firmware).
[0137] As illustrated in FIG. 16 the controller 70 may be implemented using instructions
that enable hardware functionality, for example, by using executable instructions
of a computer program 76 in a general-purpose or special-purpose processor 72 that
may be stored on a computer readable storage medium (disk, memory etc) to be executed
by such a processor 72.
[0138] The processor 72 is configured to read from and write to the memory 74. The processor
72 may also comprise an output interface via which data and/or commands are output
by the processor 72 and an input interface via which data and/or commands are input
to the processor 72.
[0139] The memory 74 stores a computer program 76 comprising computer program instructions
(computer program code) that controls the operation of the apparatus when loaded into
the processor 72. The computer program instructions, of the computer program 76, provide
the logic and routines that enables the apparatus to perform the previously methods
illustrated and/or described. The processor 72 by reading the memory 74 is able to
load and execute the computer program 76.
[0140] The apparatus 10 therefore comprises:
at least one processor 72; and
at least one memory 74 including computer program code;
the at least one memory 74 and the computer program code configured to, with the at least one processor 72, cause the apparatus 10 at least to perform:
receiving at least N audio channels where each of the N audio channels can be rendered as a different audio source;
controlling mixing of the N audio channels to produce at least an output audio channel, wherein the mixing selects a sub-set of at least M audio channels from the N audio channels in dependence upon prioritization of the N audio channels, wherein the prioritization is adaptive and depends at least upon a content of one or more of the N audio channels; and
causing rendering of at least the output audio channel.
[0141] As illustrated in FIG. 17, the computer program 76 may arrive at the apparatus 10
via any suitable delivery mechanism 78. The delivery mechanism 78 may be, for example,
a machine readable medium, a computer-readable medium, a non-transitory computer-readable
storage medium, a computer program product, a memory device, a record medium such
as a Compact Disc Read-Only Memory (CD-ROM) or a Digital Versatile Disc (DVD) or a
solid state memory, an article of manufacture that comprises or tangibly embodies
the computer program 76. The delivery mechanism may be a signal configured to reliably
transfer the computer program 76. The apparatus 10 may propagate or transmit the computer
program 76 as a computer data signal.
[0142] Computer program instructions for causing an apparatus to perform at least the following
or for performing at least the following:
receiving at least N audio channels where each of the N audio channels can be rendered as a different audio source;
controlling mixing of the N audio channels to produce at least an output audio channel, wherein the mixing selects a sub-set of at least M audio channels from the N audio channels in dependence upon prioritization of the N audio channels, wherein the prioritization is adaptive and depends at least upon a content of one or more of the N audio channels; and
causing rendering of at least the output audio channel.
[0143] The computer program instructions may be comprised in a computer program, a non-transitory
computer readable medium, a computer program product, a machine readable medium. In
some but not necessarily all examples, the computer program instructions may be distributed
over more than one computer program.
[0144] Although the memory 74 is illustrated as a single component/circuitry it may be implemented
as one or more separate components/circuitry some or all of which may be integrated/removable
and/or may provide permanent/semi-permanent/dynamic/cached storage.
[0145] Although the processor 72 is illustrated as a single component/circuitry it may be
implemented as one or more separate components/circuitry some or all of which may
be integrated/removable. The processor 72 may be a single core or multi-core processor.
[0146] References to 'computer-readable storage medium', 'computer program product', 'tangibly
embodied computer program' etc. or a 'controller', 'computer', 'processor' etc. should
be understood to encompass not only computers having different architectures such
as single/multi-processor architectures and sequential (Von Neumann)/parallel architectures
but also specialized circuits such as field-programmable gate arrays (FPGA), application
specific circuits (ASIC), signal processing devices and other processing circuitry.
References to computer program, instructions, code etc. should be understood to encompass
software for a programmable processor or firmware such as, for example, the programmable
content of a hardware device whether instructions for a processor, or configuration
settings for a fixed-function device, gate array or programmable logic device etc.
[0147] As used in this application, the term 'circuitry' may refer to one or more or all
of the following:
- (a) hardware-only circuitry implementations (such as implementations in only analog
and/or digital circuitry) and
- (b) combinations of hardware circuits and software, such as (as applicable):
- (i) a combination of analog and/or digital hardware circuit(s) with software/firmware
and
- (ii) any portions of hardware processor(s) with software (including digital signal
processor(s)), software, and memory(ies) that work together to cause an apparatus,
such as a mobile phone or server, to perform various functions and
- (c) hardware circuit(s) and/or processor(s), such as a microprocessor(s) or a portion
of a microprocessor(s), that requires software (e.g. firmware) for operation, but
the software may not be present when it is not needed for operation.
This definition of circuitry applies to all uses of this term in this application,
including in any claims. As a further example, as used in this application, the term
circuitry also covers an implementation of merely a hardware circuit or processor
and its (or their) accompanying software and/or firmware. The term circuitry also
covers, for example and if applicable to the particular claim element, a baseband
integrated circuit for a mobile device or a similar integrated circuit in a server,
a cellular network device, or other computing or network device.
[0148] The blocks illustrated in the preceding Figs may represent steps in a method and/or
sections of code in the computer program 76. The illustration of a particular order
to the blocks does not necessarily imply that there is a required or preferred order
for the blocks and the order and arrangement of the blocks may be varied. Furthermore,
it may be possible for some blocks to be omitted.
[0149] Where a structural feature has been described, it may be replaced by means for performing
one or more of the functions of the structural feature whether that function or those
functions are explicitly or implicitly described.
[0150] The above described examples find application as enabling components of:
automotive systems; telecommunication systems; electronic systems including consumer
electronic products; distributed computing systems; media systems for generating or
rendering media content including audio, visual and audio visual content and mixed,
mediated, virtual and/or augmented reality; personal systems including personal health
systems or personal fitness systems; navigation systems; user interfaces also known
as human machine interfaces; networks including cellular, non-cellular, and optical
networks; ad-hoc networks; the internet; the internet of things; virtualized networks;
and related software and services.
[0151] The term 'comprise' is used in this document with an inclusive not an exclusive meaning.
That is, any reference to X comprising Y indicates that X may comprise only one Y or
may comprise more than one Y. If it is intended to use 'comprise' with an exclusive
meaning then it will be made clear in the context by referring to "comprising only
one.." or by using "consisting".
[0152] In this description, reference has been made to various examples. The description
of features or functions in relation to an example indicates that those features or
functions are present in that example. The use of the term 'example' or 'for example'
or 'can' or 'may' in the text denotes, whether explicitly stated or not, that such
features or functions are present in at least the described example, whether described
as an example or not, and that they can be, but are not necessarily, present in some
of or all other examples. Thus 'example', 'for example', 'can' or 'may' refers to
a particular instance in a class of examples. A property of the instance can be a
property of only that instance or a property of the class or a property of a sub-class
of the class that includes some but not all of the instances in the class. It is therefore
implicitly disclosed that a feature described with reference to one example but not
with reference to another example, can where possible be used in that other example
as part of a working combination but does not necessarily have to be used in that
other example.
[0153] Although examples have been described in the preceding paragraphs with reference
to various examples, it should be appreciated that modifications to the examples given
can be made without departing from the scope of the claims.
[0154] Features described in the preceding description may be used in combinations other
than the combinations explicitly described above.
[0155] Although functions have been described with reference to certain features, those
functions may be performable by other features whether described or not.
[0156] Although features have been described with reference to certain examples, those features
may also be present in other examples whether described or not.
[0157] The term 'a' or 'the' is used in this document with an inclusive not an exclusive
meaning. That is, any reference to X comprising a/the Y indicates that X may comprise
only one Y or may comprise more than one Y unless the context clearly indicates the
contrary. If it is intended to use 'a' or 'the' with an exclusive meaning then it
will be made clear in the context. In some circumstances the use of 'at least one'
or 'one or more' may be used to emphasize an inclusive meaning but the absence of these
terms should not be taken to infer any exclusive meaning.
[0158] The presence of a feature (or combination of features) in a claim is a reference
to that feature or (combination of features) itself and also to features that achieve
substantially the same technical effect (equivalent features). The equivalent features
include, for example, features that are variants and achieve substantially the same
result in substantially the same way. The equivalent features include, for example,
features that perform substantially the same function, in substantially the same way
to achieve substantially the same result.
[0159] In this description, reference has been made to various examples using adjectives
or adjectival phrases to describe characteristics of the examples. Such a description
of a characteristic in relation to an example indicates that the characteristic is
present in some examples exactly as described and is present in other examples substantially
as described.
[0160] Whilst endeavoring in the foregoing specification to draw attention to those features
believed to be of importance it should be understood that the Applicant may seek protection
via the claims in respect of any patentable feature or combination of features hereinbefore
referred to and/or shown in the drawings whether or not emphasis has been placed thereon.