TECHNICAL FIELD
[0001] The disclosure relates to a method of operating a hearing device configured to be worn
at an ear of a user, according to the preamble of claim 1. The disclosure further
relates to a hearing device, according to the preamble of claim 15.
BACKGROUND
[0002] Hearing devices may be used to improve the hearing capability or communication capability
of a user, for instance by compensating a hearing loss of a hearing-impaired user,
in which case the hearing device is commonly referred to as a hearing instrument such
as a hearing aid, or hearing prosthesis. A hearing device may also be used to output
sound based on an audio signal which may be communicated by a wire or wirelessly to
the hearing device. A hearing device may also be used to reproduce a sound in a user's
ear canal detected by an input transducer such as a microphone or a microphone array.
The reproduced sound may be amplified to account for a hearing loss, such as in a
hearing instrument, or may be output without accounting for a hearing loss, for instance
to provide for a faithful reproduction of detected ambient sound and/or to add audio
features of an augmented reality in the reproduced ambient sound, such as in a hearable.
A hearing device may also provide for a situational enhancement of an acoustic scene,
e.g. beamforming and/or active noise cancelling (ANC), with or without amplification
of the reproduced sound. A hearing device may also be implemented as a hearing protection
device, such as an earplug, configured to protect the user's hearing. Different types
of hearing devices configured to be worn at an ear include earbuds, earphones,
hearables, and hearing instruments such as receiver-in-the-canal (RIC) hearing aids,
behind-the-ear (BTE) hearing aids, in-the-ear (ITE) hearing aids, invisible-in-the-canal
(IIC) hearing aids, completely-in-the-canal (CIC) hearing aids, cochlear implant systems
configured to provide electrical stimulation representative of audio content to a
user, bimodal hearing systems configured to provide both amplification and electrical
stimulation representative of audio content to a user, or any other suitable hearing
prostheses. A hearing system comprising two hearing devices configured to be worn
at different ears of the user is sometimes also referred to as a binaural hearing
device. A hearing system may also comprise a hearing device, e.g., a single monaural
hearing device or a binaural hearing device, and a user device, e.g., a smartphone
and/or a smartwatch, communicatively coupled to the hearing device.
[0003] Hearing devices are often employed in conjunction with communication devices, such
as smartphones or tablets, for instance when listening to sound data processed by
the communication device and/or during a phone conversation operated by the communication
device. More recently, communication devices have been integrated with hearing devices
such that the hearing devices at least partially comprise the functionality of those
communication devices. A hearing system may comprise, for instance, a hearing device
and a communication device.
[0004] In recent times, hearing devices have also increasingly been equipped with different
sensor types. Traditionally, those sensors often include an input transducer to detect
a sound, e.g., a sound detector such as a microphone or a microphone array. An amplified
and/or signal processed version of the detected sound may then be outputted to the
user by an output transducer, e.g., a receiver, loudspeaker, or electrodes to provide
electrical stimulation representative of the outputted signal. In an effort to provide
the user with even more information about himself and/or the ambient environment,
various other sensor types are progressively implemented, in particular sensors which
are not directly related to the sound reproduction and/or amplification function of
the hearing device. Those sensors include inertial sensors, such as accelerometers,
allowing the user's movements to be monitored. Physiological sensors, such as optical sensors
and bioelectric sensors, are mostly employed for monitoring the user's health.
[0005] Modern hearing devices provide several features that aim to facilitate speech intelligibility,
improve sound quality, reduce noise level, etc. Many of such sound cleaning features
are designed to benefit the hearing device user's hearing performance in very specific
situations. In order to activate the functionalities only in the situations where
benefit can be expected, an automatic steering system is often implemented which activates
sound cleaning features depending on a combination of, e.g., an acoustic environment
classification, a physical activity classification, a directional classification,
etc.
[0006] To provide for the acoustic environment classification, hearing devices have been
equipped with a sound classifier to classify an ambient sound. An input transducer
can provide an audio signal representative of the ambient sound. The sound classifier
can classify the audio signal, allowing different listening situations to be identified
by determining a characteristic from the audio signal and assigning the audio signal
to at least one relevant class from a plurality of predetermined classes depending
on the characteristic. Usually, the sound classification does not directly modify
a sound output of the hearing device. Instead, different audio processing instructions
are stored in a memory of the hearing device specifying different audio processing
parameters for a processing of the audio signal, wherein the different classes are
each associated with one of the different audio processing instructions. After assigning
the audio signal to one or more classes, the one or more associated audio processing
instructions are executed. The audio processing parameters specified by the audio
processing instructions can then provide a processing of the audio signal customized
for the particular listening situation corresponding to the at least one class identified
by the classifier. The different listening situations may comprise, for instance,
different classes of listening conditions and/or different classes of sounds. For
example, the different classes may comprise speech and/or nonspeech and/or music and/or
traffic noise and/or other ambient noise.
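By way of illustration only, the following Python sketch shows one way the association
between attributed classes and audio processing instructions described above could be
represented; the class names, parameter fields and values are hypothetical assumptions,
not part of this disclosure.

    # Hypothetical sketch: each sound class is associated with an audio
    # processing instruction, here represented as a parameter dictionary.
    AUDIO_PROCESSING_INSTRUCTIONS = {
        "speech":        {"gain_db": 6.0, "beamformer": True,  "noise_canceling": 0.3},
        "music":         {"gain_db": 3.0, "beamformer": False, "noise_canceling": 0.0},
        "traffic_noise": {"gain_db": 0.0, "beamformer": True,  "noise_canceling": 0.8},
    }

    def select_instructions(attributed_classes):
        # Return the instructions associated with the class(es)
        # previously attributed to the audio signal.
        return [AUDIO_PROCESSING_INSTRUCTIONS[c]
                for c in attributed_classes
                if c in AUDIO_PROCESSING_INSTRUCTIONS]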
[0007] The classification may be based on a statistical evaluation of the audio signal,
as disclosed in
EP 3 036 915 B1. More recently, machine learning (ML) algorithms have been employed to classify the
ambient sound. The classifier can be implemented by an artificial intelligence (AI)
chip which may be configured to classify the audio signal by at least one deep neural
network (DNN). The classifier may comprise a sound source separator configured to
separate sound generated by different sound sources, for instance a conversation partner,
passengers passing by the user, vehicles moving in the vicinity of the user such as
cars, airborne traffic such as a helicopter, a sound scene in a restaurant, a sound
scene including road traffic, a sound scene during public transport, a sound scene
in a home environment, and/or the like. Examples of such a sound source separator
are disclosed in international patent application Nos.
PCT/EP 2020/051 734 and
PCT/EP 2020/051 735, and in German patent application No.
DE 2019 206 743.3.
[0008] Another approach would be to mix different features associated with different classes.
To this end, a mixed mode classifier has been proposed in
EP 1 858 292 B1. The mixed mode classifier can attribute one, two or more classes to the audio signal,
wherein the different features in the form of audio processing instructions associated
with the different classes can be mixed in dependence of class similarity factors.
The class similarity factors are indicative of a similarity of the current acoustic
environment with a respective predetermined acoustic environment associated with the
different classes. The mixing of the different audio processing instructions may imply,
e.g., a linear combination of base parameter sets representing the audio processing
instructions associated with the different classes, or other non-linear ways of mixing
the audio processing instructions. The different audio processing instructions may
be provided as sub-functions, which can be included into a transfer function used
by the signal processing circuit according to the desired mixing of the audio processing
instructions. For example, audio processing instructions, e.g., in the form of the
base parameter sets, related to a beamformer and/or a gain model (i.e., an amplification
characteristic) may be mixed depending on whether or to which degree the audio signal
is attributed, e.g., by the class similarity factors, to one or more of the classes
music and/or speech in noise and/or speech.
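As a minimal sketch of the linear mixing rule mentioned above (the disclosure also allows
non-linear mixing), base parameter sets may be combined using the class similarity factors
as weights; the example values are assumptions for illustration.

    import numpy as np

    def mix_parameter_sets(base_sets, similarity_factors):
        # Weighted (linear) combination of base parameter sets; the weights
        # are the class similarity factors, normalized to sum to one.
        w = np.asarray(similarity_factors, dtype=float)
        w = w / w.sum()
        sets = np.asarray(base_sets, dtype=float)
        return w @ sets  # one mixed parameter set

    # e.g., base sets for the classes "music", "speech in noise", "speech":
    mixed = mix_parameter_sets([[6.0, 0.0], [3.0, 0.8], [4.0, 0.4]],
                               [0.1, 0.6, 0.3])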
[0009] EP 2 201 793 B1 discloses a classifier configured for an automatic adaption of the audio processing
instructions associated with the different classes depending on adjustments performed
by the user. Adjustment data indicative of the user adjustments can be logged, e.g.,
stored in a storage unit, and evaluated to learn correction data for correcting the
audio processing instructions. In a mixed mode classifier, for a current sound environment
and depending on the adjustment data, an offset can be learned for the mixed base
parameter sets representing the audio processing instructions associated with the
different classes. For the purpose of learning, correction data may be separately
provided for different classes.
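A minimal sketch of learning such correction data from logged user adjustments is given
below, assuming a per-class offset updated by an exponential moving average; the learning
rate and the data layout are assumptions, not the method of the cited document.

    import numpy as np

    class OffsetLearner:
        # Illustrative adaptation: learn, per class, an offset for the base
        # parameter sets from logged user adjustments.

        def __init__(self, n_params, classes, rate=0.1):
            self.rate = rate
            self.offsets = {c: np.zeros(n_params) for c in classes}

        def log_adjustment(self, cls, adjustment):
            # Blend the newly logged adjustment into the learned offset.
            self.offsets[cls] = ((1 - self.rate) * self.offsets[cls]
                                 + self.rate * np.asarray(adjustment, float))

        def corrected(self, cls, base_params):
            # Base parameter set corrected by the learned offset.
            return np.asarray(base_params, float) + self.offsets[cls]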
[0010] A rather specific use case of operating a hearing device concerns a faithful reproduction
of sound which is emitted from a localized media source in the user's environment.
In principle, the above described acoustic environment classification could also be
employed to determine whether an audio signal representative of the ambient sound
would include such a media content, e.g., by attributing the audio signal to a dedicated
class characteristic for the media sound. Subsequently, when the audio signal would
be attributed to such a class, at least one audio processing instruction associated
with the class which is optimized for perceiving sound from the localized media source
could be applied to the audio signal.
[0011] A difficulty of such an approach is that media content, which may be presented to
the user by various media sources in his environment, can, in general, vary greatly,
which makes a reliable classification of the environmental sound with regard to such
a media content rather complex and/or challenging. For example, some media sound,
such as a TV program and/or a movie presented at a movie theater, may comprise sound
features also typically occurring in other daily situations of the user, e.g., speech
from a single talker, conversations of other people, traffic sound and/or sound emitted
from other noise sources; a classifier may therefore find it hard to distinguish whether
such sound stems from a localized media source or not.
[0012] Another problem arising from such an approach is that, even if such a media content
is present in the user's environment, it remains questionable whether the user would
be interested in following and/or consuming such a content. In particular, initiating
an operation of the hearing device which would be optimized for perceiving the sound
from the localized media source would be mostly desirable when the user is also interested
in the media content.
SUMMARY
[0013] It is an object of the present disclosure to avoid at least one of the above mentioned
disadvantages and to provide for a hearing device functionality allowing an optimized
reproduction of sound emitted by a localized media source, in particular depending
on a presence of such a media source in the user's environment and/or depending on
the user's interest in perceiving such sound. It is another object to provide for
an audio processing automatically accounting for situations in which the user wants
to perceive sound from a localized media source. It is a further object to increase
a reliability of determining situations in which such a media source is present in
the user's environment and/or the user is interested in perceiving such sound. It
is yet another object to provide for an audio processing which is optimized for perceiving
sound from the localized media source. It is a further object to provide a hearing
device which is configured to operate in such a manner.
[0014] At least one of these objects can be achieved by a method of operating a hearing
device configured to be worn at an ear of a user comprising the features of claim
1 and/or a hearing device comprising the features of claim 15. Advantageous embodiments
of the invention are defined by the dependent claims and the following description.
[0015] Accordingly, the present disclosure proposes a method of operating a hearing device
configured to be worn at an ear of a user, the method comprising
- receiving, from an input transducer included in the hearing device, an audio signal
indicative of a sound detected in the environment of the user;
- receiving, from a displacement sensor included in the hearing device, displacement
data indicative of a displacement of the hearing device;
- processing the audio signal by applying one or more audio processing instructions
to the audio signal;
- initiating outputting, by an output transducer included in the hearing device, an
output audio signal based on the processed audio signal so as to stimulate the user's
hearing;
- determining, based on the audio signal and the displacement data, whether the user
is interested in perceiving sound from a media source localized in the environment;
and, when it is determined that the user is interested,
- initiating an operation of the hearing device optimizing the processing of the audio
signal for perceiving sound from the localized media source.
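By way of illustration, a minimal Python sketch of these steps applied to one signal
frame is given below; the variability and movement proxies, the thresholds, and the gain
values are hypothetical assumptions rather than the claimed processing.

    import numpy as np

    def operate_on_frame(audio_frame, displacement_frame):
        # Minimal end-to-end sketch of the claimed steps for one frame.
        sound_variability = np.std(audio_frame)         # crude audio-based indicator
        movement = np.mean(np.abs(displacement_frame))  # crude movement-based indicator
        interested = sound_variability > 0.5 and movement < 0.2
        gain = 2.0 if interested else 1.0               # media-optimized vs. default
        processed = gain * np.asarray(audio_frame, dtype=float)
        return processed, interested                    # processed signal is output to the user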
[0016] In this way, by employing information in the audio signal and the displacement data
to determine whether the user is interested in perceiving sound from the localized
media source, the hearing device operation optimized for perceiving sound from the
localized media source can be evoked in a more reliable way and/or can be better attuned
to the user's individual needs and/or sound properties of the media content. In particular,
an interest of the user in the media content may thus not be presumed based solely on
determining a presence of such a media source in the user's environment, e.g., based
on the audio signal, but rather on indications contained in the audio signal and/or
displacement data of the user's intention and/or preference to engage in such a content.
For instance, the operation optimized for perceiving sound from the localized media
source may thus be automatically activated depending on the determined user interest,
facilitating an interaction of the user with the hearing device.
[0017] Independently, the present disclosure also proposes a non-transitory computer-readable
medium storing instructions that, when executed by a processor, cause a hearing device
to perform operations of the method.
[0018] Independently, the present disclosure also proposes a hearing device configured to
be worn at an ear of a user, the hearing device comprising
- an input transducer configured to provide an audio signal indicative of a sound detected
in the environment of the user;
- a displacement sensor configured to provide displacement data indicative of a displacement
of the hearing device;
- a processor configured to process the audio signal by applying one or more audio processing
instructions to the audio signal; and
- an output transducer configured to output an output audio signal based on the processed
audio signal so as to stimulate the user's hearing, wherein the processor is further
configured to
- determine, based on the audio signal and the displacement data, whether the user is
interested in perceiving sound from a media source localized in the environment; and,
when it is determined that the user is interested,
- initiate an operation of the hearing device optimizing said processing of the audio
signal for perceiving sound from the localized media source.
[0019] Subsequently, additional features of some implementations of the method of operating
a hearing device and/or the computer-readable medium and/or the hearing device are
described. Each of those features can be provided solely or in combination with at
least another feature. The features can be correspondingly provided in some implementations
of the method and/or the hearing device.
[0020] In some implementations, the method further comprises
- determining, based on the audio signal, a first parameter indicative of a property
of the sound detected in the environment;
- determining, based on the displacement data, a second parameter indicative of a property
of movements performed by the user, wherein said determining whether the user is interested
in perceiving sound from the localized media source is based on the first parameter
and the second parameter.
[0021] In some implementations, the first parameter is indicative of a variability of the
sound detected in the environment, wherein said determining whether the user is interested
in perceiving sound from the localized media source includes a condition that the
first parameter exceeds a threshold. In some instances, the variability of the sound
may be determined with respect to a variability of at least one sound content, e.g.,
sound type, encoded in the audio signal and/or a level and/or a frequency and/or a
number of onsets and/or a direction of arrival (DOA) of the audio signal. In some
instances, the sound content may be characteristic for sound which is typical for
one or more acoustic objects emitting the sound. The variability of the sound content
may then be characteristic for an amount by which sound typical for one or more acoustic
objects varies.
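As a minimal sketch of one such first parameter, assuming the level variability mentioned
above is measured as the spread of frame-wise RMS levels (the frame length and threshold
are assumptions):

    import numpy as np

    def sound_variability(audio, frame_len=1024):
        # Illustrative first parameter: temporal variability of the
        # frame-wise RMS level of the detected sound.
        audio = np.asarray(audio, dtype=float)
        frames = [audio[i:i + frame_len]
                  for i in range(0, len(audio) - frame_len + 1, frame_len)]
        levels = np.array([np.sqrt(np.mean(f ** 2)) for f in frames])
        return np.std(levels)  # large value indicates highly varying sound

    # condition from this implementation (threshold value is an assumption):
    # interest_candidate = sound_variability(audio) > 0.1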
[0022] In some implementations, the second parameter is indicative of an amplitude and/or
an amount and/or a variability of the movements performed by the user, wherein said
determining whether the user is interested in perceiving sound from the localized
media source includes a condition that the second parameter falls below a threshold.
[0023] In some implementations, the first parameter is indicative of a sound content encoded
in the audio signal, and said determining whether the user is interested in perceiving
sound from the localized media source includes a condition that the first parameter
is characteristic of a predetermined media sound.
[0024] In some implementations, the second parameter is indicative of a movement behavior
of the user over time, e.g., a type and/or sequence and/or lack of movements performed
by the user over time, and said determining whether the user is interested in perceiving
sound from the localized media source includes a condition that the second parameter
is characteristic of a predetermined movement pattern.
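A minimal sketch of such a second parameter is given below, assuming accelerometer samples
as displacement data; the quantities derived and the threshold values are illustrative
assumptions.

    import numpy as np

    def movement_parameters(accel_samples):
        # Illustrative second parameter(s) from accelerometer samples of
        # shape (n, 3), in units of g: amount and temporal variability of
        # the movements performed by the user.
        magnitude = np.linalg.norm(np.asarray(accel_samples, float), axis=1)
        amount = float(np.mean(np.abs(magnitude - 1.0)))  # deviation from gravity only
        variability = float(np.std(magnitude))
        return amount, variability

    # conditions from these implementations (thresholds are assumptions):
    # amount, variability = movement_parameters(accel_samples)
    # user_keeps_still = amount < 0.05 and variability < 0.02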
[0025] In some implementations, the method further comprises classifying the audio signal
by attributing at least one class from a plurality of predetermined classes to the
audio signal, wherein said determining whether the user is interested in perceiving
sound from the localized media source is based on the at least one class attributed
to the audio signal. In some implementations, the first parameter is indicative of
a variability, e.g., alteration over time, of the at least one class attributed to
the audio signal and/or whether the at least one class attributed to the audio signal
is characteristic of a predetermined media sound.
[0026] In some implementations, the method further comprises classifying the displacement
data by attributing at least one class from a plurality of predetermined classes to
the displacement data, wherein said determining whether the user is interested in
perceiving sound from the localized media source is based on the at least one class
attributed to the displacement data. In some implementations, the second parameter
is indicative of a variability of the at least one class attributed to the displacement
data and/or whether the at least one class attributed to the displacement data is
characteristic of a predetermined movement pattern of the user when focusing his attention
to the localized media source.
[0027] In some implementations, the method further comprises
- receiving, from an environmental sensor, environmental sensor data indicative of a
property of the environment; and/or
- receiving, from a physiological sensor, physiological sensor data indicative of a
physiological property of the user; and/or
- receiving, from a user interface, user interaction data indicative of an interaction
of the user with the user interface; and/or
- receiving, from a location sensor, location data indicative of a current location
of the user; and/or
- receiving, from a clock, time data indicative of a current time,
wherein said determining whether the user is interested in perceiving sound from the
media source is also based on the environmental sensor data and/or physiological sensor
data and/or user interaction data and/or location data and/or time data.
[0028] In some implementations, the method further comprises
- classifying the audio signal by attributing at least one class from a plurality of
predetermined classes to the audio signal, wherein different audio processing instructions
are associated with different classes; and
- processing the audio signal by applying the audio processing instruction associated
with the class attributed to the audio signal.
[0029] In some implementations, the operation of the hearing device optimizing the processing
of the audio signal comprises
- reducing a rate at which different audio processing instructions are applied to the
audio signal; and/or
- disabling applying of at least one audio processing instruction which is unsuitable
for perceiving sound from the localized media source to the audio signal; and/or
- applying at least one audio processing instruction optimized for perceiving sound
from the localized media source to the audio signal.
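As a minimal sketch of the first option, reducing the rate at which different audio
processing instructions are applied could be realized by holding each selected program
for a minimum number of frames; the hold time is an assumption.

    class ProgramSwitchLimiter:
        # Illustrative rate reduction: a newly requested audio processing
        # program is only adopted after the current one has been held for
        # a minimum number of frames.

        def __init__(self, hold_frames=500):
            self.hold_frames = hold_frames
            self.current = None
            self.age = 0

        def update(self, requested):
            if self.current is None or (requested != self.current
                                        and self.age >= self.hold_frames):
                self.current, self.age = requested, 0
            else:
                self.age += 1
            return self.current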
[0030] In some implementations, the audio processing instruction optimized for perceiving
sound from the localized media source comprises
- enhancing an intelligibility of speech encoded in the audio signal; and/or
- enhancing a quality of sound encoded in the audio signal; and/or
- enhancing sound from the media source encoded in the audio signal relative to other
environmental sound encoded in the audio signal; and/or
- separating sound from the media source encoded in the audio signal from other sound
encoded in the audio signal.
[0031] In some implementations, the method further comprises
- determining a degree of correlation between the audio signal and the displacement
data; and/or
- determining a variability of a degree of correlation between the audio signal and
the displacement data,
wherein said determining whether the user is interested in perceiving sound from the
localized media source is based on the degree of correlation and/or the variability
of the degree of correlation.
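A minimal sketch of such a computation, assuming the audio-signal envelope and the
movement magnitude are sampled at the same rate (the window length is an assumption):

    import numpy as np

    def correlation_and_variability(audio_envelope, movement, win=256):
        # Illustrative degree of correlation: windowed Pearson correlation
        # between envelope and movement, and its variability across windows.
        a = np.asarray(audio_envelope, float)
        m = np.asarray(movement, float)
        coeffs = []
        for i in range(0, min(len(a), len(m)) - win + 1, win):
            wa, wm = a[i:i + win], m[i:i + win]
            if wa.std() > 0 and wm.std() > 0:
                coeffs.append(np.corrcoef(wa, wm)[0, 1])
        coeffs = np.asarray(coeffs)
        if coeffs.size == 0:
            return 0.0, 0.0
        return coeffs.mean(), coeffs.std()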
[0032] In some implementations, the method further comprises
- determining whether the audio signal contains an own voice of the user, and, when
the audio signal contains the user's own voice,
- disabling the operation of the hearing device optimized for perceiving the sound from
the localized media source.
[0033] In some implementations, the localized media source is configured to provide for
a visual media content. In some implementations, the localized media source comprises
a screen for displaying the visual content. E.g., the media source may comprise a
television and/or a screen in a movie theater. In some implementations, the operation
of the hearing device is optimized for perceiving the sound of a television program
and/or a movie shown in a movie theater.
[0035] In some implementations, the method further comprises
- receiving, from a user interface, a user command indicative of whether the user accepts
or rejects the operation of the hearing device, wherein the operation optimizing the
processing of the audio signal is initiated depending on the user command.
[0036] In some implementations, the audio signal and/or the first parameter and the displacement
data and/or the second parameter are input into a machine learning (ML) algorithm,
which outputs, e.g., a probability and/or likelihood that the user is interested
in perceiving sound from the localized media source, wherein the ML algorithm has
been trained with previous audio signals and/or first parameters and associated displacement
data and/or second parameters. E.g., the operation of the hearing device optimizing
said processing of the audio signal for perceiving sound from the localized media
source may then be initiated when the probability and/or likelihood exceeds a threshold.
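As a minimal sketch of such an ML stage, assuming a logistic model whose weights and bias
stand in for a model trained on previous audio signals and associated displacement data
(the feature choice, weights, and threshold are assumptions):

    import numpy as np

    def media_interest_probability(features, weights, bias):
        # Illustrative ML stage: map input features (e.g., the first and
        # second parameters) to a probability that the user is interested.
        z = float(np.dot(weights, features)) + bias
        return 1.0 / (1.0 + np.exp(-z))

    # initiate the optimized operation when the probability exceeds a threshold:
    # p = media_interest_probability([first_param, second_param], [4.0, -8.0], -1.0)
    # if p > 0.8: ...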
[0037] In some implementations, the variability of the sound and/or movements and/or at
least one class attributed to the audio signal may be defined as a temporal variability,
e.g., alteration over time, of the sound and/or movements and/or class, in particular
an amount by which the sound and/or movements and/or class varies over time.
BRIEF DESCRIPTION OF THE DRAWINGS
[0038] Reference will now be made in detail to embodiments, examples of which are illustrated
in the accompanying drawings. The drawings illustrate various embodiments and are
a part of the specification. The illustrated embodiments are merely examples and do
not limit the scope of the disclosure. Throughout the drawings, identical or similar
reference numbers designate identical or similar elements. In the drawings:
Fig. 1 schematically illustrates an exemplary hearing device;
Fig. 2 schematically illustrates an exemplary sensor unit comprising one or more sensors
which may be implemented in the hearing device illustrated in Fig. 1;
Fig. 3 schematically illustrates an embodiment of the hearing device illustrated in Fig.
1 as a RIC hearing aid;
Fig. 4 schematically illustrates a media source localized in an environment of a user;
Fig. 5 schematically illustrates an exemplary algorithm of processing an audio signal
according to principles described herein; and
Figs. 6, 7 schematically illustrate some exemplary methods of processing an audio signal
according to principles described herein.
DETAILED DESCRIPTION OF THE DRAWINGS
[0039] FIG. 1 illustrates an exemplary hearing device 110 configured to be worn at an ear
of a user. Hearing device 110 may be implemented by any type of hearing device configured
to enable or enhance hearing or a listening experience of a user wearing hearing device
110. For example, hearing device 110 may be implemented by a hearing aid configured
to provide an amplified version of audio content to a user, a sound processor included
in a cochlear implant system configured to provide electrical stimulation representative
of audio content to a user, a sound processor included in a bimodal hearing system
configured to provide both amplification and electrical stimulation representative
of audio content to a user, any other suitable hearing prosthesis, or an earbud, an
earphone, or a hearable.
[0040] Different types of hearing device 110 can also be distinguished by the position at
which they are worn at the ear. Some hearing devices, such as behind-the-ear (BTE)
hearing aids and receiver-in-the-canal (RIC) hearing aids, typically comprise an earpiece
configured to be at least partially inserted into an ear canal of the ear, and an
additional housing configured to be worn at a wearing position outside the ear canal,
in particular behind the ear of the user. Some other hearing devices, as for instance
earbuds, earphones, hearables, in-the-ear (ITE) hearing aids, invisible-in-the-canal
(IIC) hearing aids, and completely-in-the-canal (CIC) hearing aids, commonly comprise
such an earpiece to be worn at least partially inside the ear canal without an additional
housing for wearing at the different ear position.
[0041] As shown, hearing device 110 includes a processor 112 communicatively coupled to
a memory 113, an input transducer 115, a displacement sensor 125, and an output transducer
117. A sensor unit 120 comprises input transducer 115 and displacement sensor 125.
Hearing device 110 may include additional or alternative components as may serve a
particular implementation.
[0042] Input transducer 115 may be implemented by any suitable device configured to detect
sound in the environment of the user and to provide an input audio signal indicative
of the detected sound, e.g., a microphone or a microphone array. Output transducer
117 may be implemented by any suitable audio transducer configured to output an output
audio signal to the user, for instance a receiver of a hearing aid, an output electrode
of a cochlear implant system, or a loudspeaker of an earbud.
[0043] Displacement sensor 125 may be implemented by any suitable detector configured to
provide displacement data indicative of a rotational displacement and/or a translational
displacement of hearing device 110. The displacement data may then also indicate a
corresponding movement of the user wearing hearing device 110. Examples include a
rotation of the user's head and/or body, a walking activity of the user and/or any
translational movement of the user's body and/or head and/or the like. In particular,
displacement sensor 125 may comprise at least one inertial sensor. The inertial sensor
can include, for instance, an accelerometer configured to provide the displacement
data representative of an acceleration and/or a translational movement and/or a rotation,
and/or a gyroscope configured to provide the displacement data representative of a
rotation. Displacement sensor 125 may also comprise an optical detector such as a
light sensor and/or a camera. Examples include a charge-coupled device (CCD) sensor,
a photodetector sensitive for light in the red and/or infrared electromagnetic spectrum,
a photoplethysmography (PPG) sensor, a pulse oximeter including a photodetector for
determining an oxygen saturation (SpO2 level) of the user's blood, and/or the like. In
some instances, the optical detector
may be implemented to be disposed inside the ear canal and/or at the concha of the
ear when the hearing device is worn at the ear. Displacement sensor 125 may also comprise
an electronic compass such as a magnetometer configured to provide the displacement
data representative of a change of a magnetic field, in particular a magnetic field
in an ambient environment of hearing device 110 such as the Earth's magnetic field.
[0044] Processor 112 is configured to receive, from input transducer 115, an input audio
signal indicative of a sound detected in the environment of the user; to receive,
from displacement sensor 125, displacement data indicative of a displacement of the
hearing device; to process the audio signal; to initiate outputting, by output transducer
117, an output audio signal based on the processed audio signal so as to stimulate
the user's hearing; to determine, based on the audio signal and the displacement data,
whether the user is interested in perceiving sound from a localized media source;
and, when it is determined that the user is interested, to initiate an operation of
hearing device 110 optimizing the processing of the audio signal for perceiving sound
from the localized media source. These and other operations, which may be performed
by processor 112, are described in more detail in the description that follows.
[0045] Memory 113 may be implemented by any suitable type of storage medium and is configured
to maintain, e.g. store, data controlled by processor 112, in particular data generated,
accessed, modified and/or otherwise used by processor 112. For example, memory 113
may be configured to store instructions used by processor 112 to process the input
audio signal received from input transducer 115, e.g., audio processing instructions
in the form of one or more audio processing programs. The audio processing programs
may comprise different audio processing instructions of modifying the input audio
signal received from input transducer 115. For instance, the audio processing instructions
may include algorithms providing a gain model, noise cleaning, noise cancelling, wind
noise cancelling, reverberation cancelling, narrowband coupling, beamforming, in particular
static and/or adaptive beamforming, and/or the like.
[0046] As another example, memory 113 may be configured to store instructions used by processor
112 to classify the input audio signal received from input transducer 115 by attributing
at least one class from a plurality of predetermined sound classes to the input audio
signal. Exemplary classes may include, but are not limited to, low ambient noise,
high ambient noise, traffic noise, machine noise, babble noise, public area
noise, background noise, speech, nonspeech, speech in quiet, speech in babble, speech
in noise, speech from the user, speech from a significant other, background speech,
speech from multiple sources, quiet indoor, quiet outdoor, speech in a car, speech
in traffic, speech in a reverberating environment, speech in wind noise, speech in
a lounge, car noise, applause, music, e.g. classical music, and/or the like. In some
instances, the different audio processing instructions can be associated with different
classes.
[0047] As another example, memory 113 may be configured to store instructions used by processor
112 to classify the displacement data received from displacement sensor 125 by attributing
at least one class from a plurality of predetermined displacement classes to the movement
data. Exemplary classes may include at least one of movements of the user indicating
an interest of the user in perceiving sound from a localized media source, movements
indicating an absence and/or lack of such an interest, movements indicating the user
is moving in a certain direction, rotating his head, resting, sitting, lying down,
walking, running, dancing and/or the like.
[0048] Memory 113 may comprise a non-volatile memory from which the maintained data may
be retrieved even after having been power cycled, for instance a flash memory and/or
a read only memory (ROM) chip such as an electrically erasable programmable ROM (EEPROM).
A non-transitory computer-readable medium may thus be implemented by memory 113. Memory
113 may further comprise a volatile memory, for instance a static or dynamic random
access memory (RAM).
[0049] As illustrated, hearing device 110 may further comprise a communication port 119.
Communication port 119 may be implemented by any suitable data transmitter and/or
data receiver and/or data transducer configured to exchange data with another device.
For instance, the other device may be another hearing device configured to be worn
at the other ear of the user than hearing device 110 and/or a communication device
such as a smartphone, smartwatch, tablet and/or the like. Communication port 119 may
be configured for wired and/or wireless data communication. For instance, data may
be communicated in accordance with a Bluetooth™ protocol and/or by any other type of
radio frequency (RF) communication.
[0050] Hearing device 110 may also comprise at least one further sensor communicatively
coupled to processor 112 in addition to input transducer 115 and displacement sensor
125, as further described below.
[0051] FIG. 2 illustrates a sensor unit 130, which may be implemented in hearing device
110 in place of sensor unit 120, comprising input transducer 115 and displacement sensor
125. Sensor unit 130 may further include at least one environmental sensor configured
to provide environmental data indicative of a property of the environment of the user
in addition to input transducer 115, for example a barometric sensor 131 and/or an
ambient temperature sensor 132. Sensor unit 130 may further include at least one physiological
sensor configured to provide physiological data indicative of a physiological property
of the user, for example an optical sensor 133 and/or a bioelectric sensor 134 and/or
a body temperature sensor 135. Optical sensor 133 may be configured to emit light
at a wavelength absorbable by an analyte contained in blood such that the physiological
sensor data comprises information about the blood flowing through tissue at the ear.
E.g., optical sensor 133 can be configured as a photoplethysmography (PPG) sensor
such that the physiological sensor data comprises PPG data, e.g. a PPG waveform. Bioelectric
sensor 134 may be implemented as a skin impedance sensor and/or an electrocardiogram
(ECG) sensor and/or an electroencephalogram (EEG) sensor and/or an electrooculography
(EOG) sensor. Sensor unit 130 may include a user interface 137 configured to provide
interaction data indicative of an interaction of the user with hearing device 110,
e.g., a touch sensor and/or a push button. Sensor unit 130 may include at least one
location sensor 138 configured to provide location data indicative of a current location
of the user, for instance a GPS sensor. Sensor unit 130 may include a clock 139 configured
to provide time data indicative of a current time.
[0052] In some implementations, the sensor data may be received by processor 112 from an
external device via communication port 119, e.g., from a communication device and/or
a portable device such as a smartphone. E.g., one or more of sensors 131 - 135, 137
- 139 may then be included in the external device. The external device may also include
a further displacement sensor configured to provide displacement data and/or a further
input transducer configured to provide an audio signal indicative of a sound detected
in the environment of the user.
[0053] FIG. 3 illustrates an exemplary implementation of hearing device 110 as a RIC hearing
aid 210. RIC hearing aid 210 comprises a BTE part 220 configured to be worn at an
ear at a wearing position behind the ear, and an ITE part 240 configured to be worn
at the ear at a wearing position at least partially inside an ear canal of the ear.
BTE part 220 comprises a BTE housing 221 configured to be worn behind the ear. BTE
housing 221 accommodates processor 112 communicatively coupled to input transducer
115 and displacement sensor 125. BTE part 220 further includes a battery 227 as a
power source. ITE part 240 is an earpiece comprising an ITE housing 241 at least partially
insertable in the ear canal. ITE housing 241 accommodates output transducer 117 and
may also include another sensor 245, which may include, e.g., another input transducer,
such as an ear canal microphone, and/or another displacement sensor and/or any of
further sensors 131 - 135, 137 - 139. BTE part 220 and ITE part 240 are interconnected
by a cable 251. Processor 112 is communicatively coupled to output transducer 117
and sensor 245 of ITE part 240 via cable 251 and cable connectors 252, 253 provided
at BTE housing 221 and ITE housing 241.
[0054] FIG. 4 schematically illustrates a media source 410 localized in an environment of
a user 450 who may be interested in perceiving sound from media source 410 or may
be disinterested. Localized media source 410 comprises at least one sound source 411
emitting sound 413 in a direction 415 toward user 450. As also illustrated, media
source 410 may comprise at least another sound source 412 also emitting sound 414
in a direction 416 toward user 450 which may be different from direction 415. A media
boundary 421 of media source 410 may be defined as a region in which media content
is presented to user 450. The media content comprises sound 413, 414 emitted from
sound sources 411, 412. The media content may further comprise visual content which
can be seen by user 450.
[0055] For example, the media content may be a TV program. Localized media source 410 may
then be implemented as a television comprising a screen and sound source 411, 412,
e.g., a loudspeaker, at a fixed position relative to a ground on which user 450 is
positioned and/or within boundary 421. As another example, the media content may be
a movie shown in a movie theater. Localized media source 410 may then comprise a movie
screen on which the movie is displayed and sound source 411, 412 as a loudspeaker
at a fixed position relative to the ground on which user 450 is positioned and/or
a fixed position within boundary 421. As another example, the media content may be
a play performed in a theater or an opera performed in an opera house or a concert
performed at a concert venue. Localized media source 410 may then comprise a stage
on which the play or opera or concert is performed and sound source 411, 412 may be
implemented as a voice and/or a music instrument of an actor or singer performing
the play or opera or concert. Sound source 411, 412 may then be positioned at a fixed
position and/or moving relative to the ground on which user 450 is positioned and/or
within boundary 421.
[0056] User 450 is wearing a hearing device 451 at an ear which may be implemented, for
instance, as hearing device 110 or hearing device 210. As also illustrated, user 450
may be wearing a second hearing device 452 at a second ear which may be implemented,
for instance, corresponding to first hearing device 451 worn at the first ear. Hearing
devices 451, 452 may thus be provided in a binaural configuration. Sound 413, 414
emitted by sound source 411, 412 can be received and/or detected by input transducer
115 included in hearing device 451, 452. A movement and/or orientation of hearing
device 451, 452 can be detected by displacement sensor 125 included in hearing device
451, 452. The movement and/or orientation of hearing device 451, 452 worn by user
450 can thus be indicative of a movement and/or orientation of user 450 and/or a viewing
direction 461 of user 450.
[0057] The displacement, e.g., movement, of hearing device 451, 452 may be detected and/or
determined and/or logged, e.g., by displacement sensor 125 and/or processor 112 and/or
memory 113, over time and/or relative to a reference point and/or relative to a reference
direction. In some implementations, the reference point and/or reference direction
may be arbitrarily fixed relative to a position and/or orientation of hearing device
451, 452. In some implementations, the reference point and/or reference direction
may be a location and/or direction at which localized media source 410, e.g., sound
source 411, 412, is positioned, or relative to this location and/or direction. To
this end, a location of sound source 411, 412 and/or a direction of arrival (DOA)
of sound 413, 414 emitted by sound source 411, 412 may be determined beforehand. E.g.,
the audio signal provided by input transducer 115 may be evaluated to determine the
location of sound source 411, 412 and/or the DOA of sound 413, 414.
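A minimal sketch of one way such a DOA could be estimated from two microphone signals,
assuming a cross-correlation-based time difference of arrival (TDOA); the microphone
spacing and the function names are assumptions.

    import numpy as np

    def estimate_doa(left, right, fs, mic_distance=0.15, c=343.0):
        # Illustrative DOA estimate: find the lag of the cross-correlation
        # peak between the two microphone signals and convert it to an angle.
        left = np.asarray(left, float)
        right = np.asarray(right, float)
        corr = np.correlate(left, right, mode="full")
        lag = np.argmax(corr) - (len(right) - 1)  # lag in samples
        tdoa = lag / fs                           # lag in seconds
        sin_theta = np.clip(tdoa * c / mic_distance, -1.0, 1.0)
        return np.degrees(np.arcsin(sin_theta))   # angle re broadside, degrees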
[0058] A displacement of hearing device 451, 452 may indicate a translational displacement
465 and/or a rotational displacement 466 of user 450, in particular of the user's
head and/or viewing direction 461 and/or body. In a case in which user 450 is interested
in perceiving sound 413, 414 from media source 410, user 450 may exhibit a different
movement behavior as compared to a case in which user 450 is not interested. Accordingly,
displacement data provided by displacement sensor 125 may be employed as an indicator
whether user 450 is interested or not. Based on the displacement data, a parameter
indicative of a property of movements performed by the user may be determined as an
indicator whether user 450 is interested or not.
[0059] To illustrate, in a case in which media source 410 is configured to also provide
for a visual media content, e.g., on a screen or a stage, and in a situation in which
user 450 is interested in following the media content from media source 410 including
perceiving sound 413, 414 from media source 410, viewing direction 461 of user 450
may be typically more fixed toward the visual media content as compared to a situation
in which user 450 is not interested. As another example, in a case in which media
source 410 is not providing for visual media content and in a situation in which user
450 is interested in perceiving sound 413, 414 from media source 410, the head of
user 450 may be typically more fixed in a certain direction, e.g., to keep his ears
and/or hearing device 451, 452 in a stable position, as compared to a situation in
which user 450 is not interested. Accordingly, in the situation in which user 450
is interested, displacement data provided by displacement sensor 125 may be indicative
of a lower amplitude and/or a lower amount and/or a smaller temporal variability of
movements performed by the user as compared to the situation in which user 450 is
not interested. Furthermore, e.g., when media source 410 is providing for visual media
content, displacement data provided by displacement sensor 125 may be indicative of
viewing direction 461 pointing more and/or longer and/or more frequently toward media
source 410 as compared to the situation in which user 450 is not interested.
[0060] Furthermore, a prerequisite for user 450 perceiving sound 413, 414 from media source
410, in which he could be interested, is that the sound detected in the environment comprises
such sound 413, 414. Based on the audio signal provided by input transducer 115, a
parameter indicative of a property of sound detected in the environment may thus be
determined as another indicator whether user 450 could be interested in perceiving
sound 413, 414 from media source 410 or not. In particular, the parameter may be indicative
of a property which is typical and/or characteristic for sound 413, 414 emitted by
media source 410. In some examples, such a property may comprise a level and/or a
temporal variability and/or a sound content and/or a frequency content and/or a signal
to noise ratio (SNR) of the sound detected in the environment.
[0061] To illustrate, at least certain types of media content which may be presented by
media source 410, e.g., a TV program and/or a movie, may exhibit a rather high variability
of sound 413, 414. For instance, a content of speech and/or music and/or traffic noise
in sound 413, 414 may alter at a rather high rate as compared to other sound in the
user's environment unrelated to the media content. Furthermore, a level and/or content
and/or SNR of sound 413, 414 may also alter at an extraordinarily high rate when compared
to other sound in the user's environment. In particular, it may be exploited that
the variability of sound in the user's environment other than media content is typically
rather low and/or stable, since sound occurring in a normal daily routine of user
450 is typically more constant and does not change as fast as media content.
[0062] As another example, the media content presented by media source 410 may comprise
sound 413, 414 having a level rather unusual as compared to other sound in the user's
environment. E.g., a conversation between people may exhibit a rather low or high
volume when compared to a typical conversation occurring elsewhere in the environment
of the user. As another example, the media content presented by media source 410 may
comprise sound 413, 414 having an SNR rather unusual as compared to other sound in
the user's environment. E.g., an SNR of media content may be untypically high when
compared to a typical SNR in the user's environment.
[0063] Thus, in order to predict whether user 450 would be interested in perceiving sound
413, 414 from a media source 410 localized in the environment of user 450, a first
parameter indicative of a property of the sound detected in the environment and a
second parameter indicative of a property of movements performed by user 450 may be
compared relative to a threshold. Depending on the comparison, the user's interest
may be confirmed or rejected. E.g., when the first parameter is indicative of a variability
of the sound detected in the environment and the second parameter is indicative of
an amplitude and/or an amount and/or a variability of the movements performed by the
user, the user's interest may be identified when the first parameter exceeds a first
threshold and the second parameter falls below a second threshold. In a contrary case,
when the first parameter falls below the first threshold and/or the second parameter
exceeds the second threshold, the user's interest may be denied.
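A minimal sketch of this decision rule is given below; the threshold values are arbitrary
assumptions, and the parameters are assumed to be computed as in the earlier sketches.

    def user_interested(first_param, second_param,
                        first_threshold=0.1, second_threshold=0.05):
        # Illustrative decision rule: high variability of the detected
        # sound combined with little user movement confirms the user's
        # interest; otherwise the interest is denied.
        return first_param > first_threshold and second_param < second_threshold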
[0064] FIG. 5 illustrates a functional block diagram of an exemplary audio signal processing
algorithm that may be executed by a processor 310. For instance, processor 310 may
comprise processor 112 of hearing device 110 and/or another processor communicatively
coupled to processor 112. As shown, the algorithm is configured to be applied to an
input audio signal 302 indicative of a sound detected in the environment of the user,
which may be provided by input transducer 115. After a processing of audio signal
302, the algorithm provides a processed audio signal based on which an output audio
signal 305 can be outputted by output transducer 117. Furthermore, the algorithm is
configured to be applied to displacement data 303 indicative of a displacement of
the hearing device, which may be provided by displacement sensor 125. The algorithm
may also be configured to be applied to further sensor data 304, which may be provided
by any of sensors 115, 131 - 135, 137 - 139.
[0065] The algorithm comprises an audio signal analyzing module 318, a sensor data analyzing
module 319, a media interest determination module 315, a processing instruction selection
module 317, and an audio processing module 313. Audio signal 302 can be received by
audio processing module 313. Audio processing module 313 is configured to process
audio signal 302, e.g., based on one or more audio processing instructions provided
by processing instruction selection module 317.
[0066] Audio signal 302 can be received by audio signal analyzing module 318. Audio signal
analyzer 318 is configured to determine, based on audio signal 302, a first parameter
indicative of a property of the sound detected in the environment. In particular,
as described above, the parameter may be determined as an indicator whether user 450
could be interested in perceiving sound from a media source localized in the environment.
For example, the parameter may be indicative of a temporal variability of the sound
detected in the environment. To illustrate, some media sources, e.g., a television
program and/or a movie shown in a movie theater and/or a theater play, may emit sound
with a large variability. For instance, the emitted sound may change rather
frequently between different sound types and/or sound contents. Different sound types
and/or sound contents may be defined as sound which, when perceived by a human, is
typically associated with different acoustic objects emitting the sound. E.g.,
in a television program and/or a movie, a plurality of such acoustic objects may be
presented to the user within a rather short time leading to a rather large variability
of the sound emitted by the media source. Examples of such different sound types and/or
sound content, which may be emitted by a media source, include speech and/or music
and/or environmental sound and/or background noise and/or sound special effects and/or
silence. Similarly, a level and/or frequency content and/or a number of onsets and/or,
when sound is emitted by the media source from a plurality of sound sources 411, 412,
a DOA of the sound at hearing device 110, 210 may change rather frequently.
[0067] As another example, the parameter may be indicative of a level and/or a content and/or
an SNR of the sound detected in the environment. Other examples may include, but are
not limited to, a mean-squared signal power, a standard deviation of a signal envelope,
a mel-frequency cepstrum (MFC), a mel-frequency cepstrum coefficient (MFCC), a delta
mel-frequency cepstrum coefficient (delta MFCC), a spectral centroid such as a power
spectrum centroid, a standard deviation of the centroid, a spectral entropy such as
a power spectrum entropy, a zero crossing rate (ZCR), a standard deviation of the
ZCR, a broadband envelope correlation lag and/or peak, and a four-band envelope correlation
lag and/or peak.
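As a minimal sketch of two of the features listed above, the zero crossing rate and a
power-spectrum centroid could be computed per audio frame as follows (a standard
computation, not specific to this disclosure).

    import numpy as np

    def zero_crossing_rate(frame):
        # Fraction of samples at which the signal changes sign.
        signs = np.signbit(np.asarray(frame, float)).astype(int)
        return float(np.mean(np.abs(np.diff(signs))))

    def spectral_centroid(frame, fs):
        # Centroid of the power spectrum, in Hz.
        power = np.abs(np.fft.rfft(frame)) ** 2
        freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
        return float(np.sum(freqs * power) / (np.sum(power) + 1e-12))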
[0068] In some implementations, audio signal analyzing module 318 comprises an audio signal
classifier. The audio signal classifier can be configured to classify audio signal
302 by attributing at least one class from a plurality of predetermined classes to
audio signal 302. The first parameter may be indicative of a temporal variability
of the one or more classes attributed to the audio signal and/or contain information
about the one or more classes attributed to audio signal 302, e.g., whether a class
unrelated to the media sound and/or a class associated with the media sound has been
attributed. To illustrate, when a localized media source emits sound at a rather large
variability, the one or more classes attributed to the audio signal may change rather
often. Thus, a variability of the one or more classes attributed to audio signal 302
over time can indicate a sound stemming from a media source, and thus a presence of
the media source, localized in the environment of the user. For instance, the audio
signal classifier may be implemented as a sound classification module configured for
a statistical evaluation of audio signal 302 as disclosed, e.g., in
EP 3 036 915 B1, and/or a mixed mode classifier as disclosed, e.g., in
EP 1 858 292 B1, and/or a sound source separator configured to separate sound generated by different
sound sources in the environment, as disclosed, e.g., in
PCT/EP 2020/051 734,
PCT/EP 2020/051 735 and
DE 2019 206 743.3, which may comprise one or more neural networks (NNs).
[0069] The classes may represent a specific sound content and/or sound type encoded in audio
signal 302. Exemplary classes include, but are not limited to, low ambient noise,
high ambient noise, traffic noise, machine noise, babble noise, public area noise,
background noise, speech, nonspeech, speech in quiet, speech in babble, speech in
noise, speech from the user, own voice of the user, speech from a significant other,
background speech, speech from multiple sources, quiet indoor, quiet outdoor, speech
in a car, speech in traffic, speech in a reverberating environment, speech in wind
noise, speech in a lounge, car noise, applause, music, e.g. classical music, and/or
the like. Information about the classes may be stored in a database, e.g., in memory
113, and accessed by audio signal analyzer 318. E.g., the information may comprise
different patterns associated with each class wherein it is determined whether audio
signal 302, in particular characteristics and/or features determined from audio signal
302, matches, at least to a certain extent, the respective pattern such that the respective
class can be attributed to the audio signal 302. E.g., a probability may be determined
whether the respective pattern associated with the respective class matches the characteristics
and/or features determined from audio signal 302, wherein the respective class may
be attributed to audio signal 302 when the probability exceeds a threshold. In some
instances, at least one of the classes may indicate whether audio signal 302 contains
sound from a localized media source, e.g., as a precondition for the user being interested
in perceiving the media sound, and/or at least one of the classes may indicate whether
audio signal 302 does not contain such sound.
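A minimal sketch of such probability-based class attribution is given below, assuming a
simplistic distance-based similarity in place of the pattern matching described above;
the class patterns and threshold are assumptions.

    import numpy as np

    def attribute_classes(features, class_patterns, threshold=0.6):
        # Illustrative pattern matching: compare the feature vector
        # determined from the audio signal against a stored pattern per
        # class and attribute every class whose match probability exceeds
        # the threshold.
        features = np.asarray(features, float)
        attributed = []
        for name, pattern in class_patterns.items():
            distance = np.linalg.norm(features - np.asarray(pattern, float))
            probability = float(np.exp(-distance))  # map distance to (0, 1]
            if probability > threshold:
                attributed.append(name)
        return attributed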
[0070] In some implementations, as illustrated, the parameter determined by audio signal
analyzing module 318 can be received by processing instruction selection module 317.
Processing instruction selector 317 may select one or more audio processing instructions
depending on the parameter which can then be applied to audio signal 302 by audio
processing module 313. In some instances, when audio signal analyzing module 318 comprises
an audio signal classifier, one or more of the audio processing instructions may be
associated with at least one respective class, or a plurality of respective classes.
For example, the audio processing instructions may be stored in a database, e.g.,
in memory 113, and accessed by processing instruction selector 317 and/or audio signal
processor 313. For instance, the audio processing instructions may be implemented
as different audio processing programs which can be executed by audio signal processing
module 313. The audio processing instructions may include, e.g., instructions executable
by processor 310 providing for at least one of a gain model (GM), noise cancelling
(NC), wind noise cancelling (WNC), reverberation cancelling (RevC), narrowband coupling,
feedback cancelling (FC), speech enhancement (SE), noise cleaning, beamforming (BF),
in particular static and/or adaptive beamforming, and/or the like.
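By way of illustration only, the selection of audio processing instructions depending on the attributed classes might be sketched in Python as a lookup, reusing the instruction abbreviations listed above; the concrete class-to-program associations are hypothetical assumptions.

```python
# Hypothetical association of classes with audio processing programs (memory 113).
PROGRAM_FOR_CLASS = {
    "speech in noise":      ("BF", "NC", "SE"),
    "speech in wind noise": ("WNC", "SE"),
    "music":                ("GM",),
    "high ambient noise":   ("NC",),
}
DEFAULT_PROGRAM = ("GM",)

def select_instructions(attributed_classes):
    """attributed_classes: (class name, probability) pairs, e.g., from the classifier.
    Returns the program set of the most probable class, as module 317 might do."""
    if not attributed_classes:
        return DEFAULT_PROGRAM
    best_class, _ = max(attributed_classes, key=lambda c: c[1])
    return PROGRAM_FOR_CLASS.get(best_class, DEFAULT_PROGRAM)
```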
[0071] In some instances, the audio processing instructions may also include one or more
instructions optimized for perceiving sound from a localized media source by the user.
For example, the at least one audio processing instruction may provide for a separation
of sound from the localized media source from other sound features contained in audio
signal 302 such that the separated sound from the localized media source can be presented
to user 450 via output transducer 117. As another example, the audio processing instruction
may provide for enhancing the media sound encoded in audio signal 302 relative
to other environmental sound encoded in audio signal 302. E.g., the audio processing
instruction may provide for noise reduction of the other environmental sound. As another
example, the audio processing instruction may provide for enhancing an intelligibility
of speech encoded in the audio signal 302. As another example, the audio processing
instructions may provide for enhancing a quality of sound encoded in the audio signal
302. For instance, the quality of sound may be improved with regard to a clarity of
the sound, e.g., by increasing a sharpness of the sound, and/or with regard to a listening
comfort of the sound, e.g., by modifying the sound to be more pleasing and/or less
aggressive.
[0072] Displacement data 303 can be received by sensor data analyzing module 319. Sensor
data analyzer 319 is configured to determine, based on displacement data 303, a second
parameter indicative of a property of movements performed by the user. In particular,
as described above, the parameter may be determined as another indicator whether user
450 could be interested in perceiving sound from a media source localized in the environment. E.g., the parameter may be indicative of an amplitude and/or an amount and/or a temporal variability of movements performed by the user. As another example, the parameter may be indicative of a viewing direction of the user relative to the location of the media source. E.g., the location of the media
source may be determined based on a DOA of the sound of the media source contained
in audio signal 302.
[0073] In some implementations, sensor data analyzing module 319 comprises a displacement
data classifier. The displacement data classifier can be configured to classify displacement
data 303 by attributing at least one class from a plurality of predetermined classes
to displacement data 303. The classes may represent a specific movement pattern performed
by the user. Exemplary classes include, but are not limited to, the user sitting,
lying, walking, running, turning his head, shaking his head, orienting his head in
a specific direction, moving in a specific direction, moving steady, moving irregularly,
being in a sedentary position, being restless, and/or the like. Information about
the classes may be stored in a database, e.g., in memory 113, and accessed by sensor
data analyzer 319. E.g., a probability may be determined whether the respective pattern
associated with the respective class matches a characteristic and/or feature determined
from displacement data 303, wherein the respective class may be attributed to displacement
data 303 when the probability exceeds a threshold.
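By way of illustration only, such a displacement data classifier might be sketched in Python as follows, assuming accelerometer samples as displacement data 303; the statistics and the threshold values are illustrative assumptions.

```python
import numpy as np

def classify_displacement(acc: np.ndarray) -> list:
    """acc: (N, 3) accelerometer samples in units of g. Returns attributed movement
    classes based on simple activity statistics (thresholds purely illustrative)."""
    mag = np.linalg.norm(acc, axis=1)
    activity = float(np.mean(np.abs(mag - 1.0)))  # deviation from gravity-only
    variability = float(np.var(mag))
    classes = []
    if activity < 0.02 and variability < 1e-4:
        classes.append("being in a sedentary position")
    elif variability > 0.05:
        classes.append("being restless")
    elif variability > 0.01:
        classes.append("moving steady")
    return classes
```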
[0074] In some instances, at least one of the classes may indicate whether displacement
data 303 is typical for the user being interested in perceiving sound from a localized
media source and/or at least one of the classes may indicate whether the displacement
data is typical for the user not being interested. E.g., a rather motionless behavior
of the user and/or movements of small amplitude and/or a head orientation toward a
media source may be attributed to the class of the user being interested. Rather frequent
movements and/or movements of large amplitude and/or a large amount of movements may
be attributed to the class of the user not being interested.
[0075] In some instances, sensor data analyzing module 319 can be configured to determine,
based on displacement data 303, a movement behavior of the user over time. E.g., the
movement behavior may include a type and/or sequence and/or rate and/or amplitude
and/or duration and/or lack of movements performed by the user. The second parameter
may then be indicative of the user's movement behavior. For example, sensor data analyzing
module 319 may be configured to log displacement data 303 over time and to extract
a type and/or sequence and/or lack of movements performed by the user over time from
the logged displacement data 303. As another example, sensor data analyzing module
319 may be configured to determine a type and/or sequence and/or lack of movements
performed by the user from currently received displacement data 303 and to log the
determined movement characteristics over time. E.g., for the purpose of the data logging
of the user's movement behavior over time, the displacement data 303 and/or movement
characteristics may be stored and/or accessed in memory 113.
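By way of illustration only, the logging of movement characteristics over time might be sketched in Python as follows; the class names follow the examples above, while the sliding horizon and the interface are hypothetical.

```python
import time
from collections import deque

class MovementLogger:
    """Minimal sketch of logging movement characteristics over a sliding horizon,
    e.g., in memory 113, to derive a movement behavior of the user over time."""
    def __init__(self, horizon_s: float = 60.0):
        self.horizon_s = horizon_s
        self.entries = deque()  # (timestamp, movement class) pairs

    def record(self, movement_class: str, now: float = None):
        now = time.monotonic() if now is None else now
        self.entries.append((now, movement_class))
        while self.entries and now - self.entries[0][0] > self.horizon_s:
            self.entries.popleft()

    def rate(self, movement_class: str) -> float:
        """Occurrences of a movement class per second within the horizon."""
        count = sum(1 for _, c in self.entries if c == movement_class)
        return count / self.horizon_s
```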
[0076] In some implementations, further sensor data 304, which may be provided, e.g., by
any of sensors 115, 131 - 135, 137 - 139, may also be received by sensor data analyzing
module 319. Sensor data analyzer 319 may then be configured to determine, based on
sensor data 304, a third parameter. For example, sensor data 304 may be provided by
any of environmental sensors 115, 131, 132. The parameter may then be indicative of
whether the environment of the user is suitable and/or typical for perceiving sound
from a media source localized in the environment, e.g., for a media source being localized
in the environment, or not. To illustrate, when audio signal 302 provided by input transducer 115 includes traffic sound, and/or when barometric data provided by the barometric sensor indicates a rather high altitude, and/or when ambient temperature sensor 132 indicates a rather hot environment, it may be concluded that the user's interest in perceiving sound from a localized media source is rather small.
[0077] As another example, sensor data 304 may be provided by any of physiological sensors
133 - 135. The parameter may then be indicative of whether a physiological condition
of the user is suitable and/or typical for perceiving sound from a localized media
source, or not. To illustrate, when physiological data provided by optical sensor 133, e.g., a PPG sensor, and/or by a bioelectric sensor, e.g., an ECG sensor, indicates a rather high heart rate of the user, and/or body temperature sensor 135 indicates a rather elevated temperature, it may be concluded that the user is not interested in perceiving sound from a localized media source, e.g., because he is engaged in a sports activity. As a further example, sensor data 304 may be provided by location
sensor 138 and/or clock 139. E.g., a current location and/or time may be typical or
rather unusual for the user being interested in perceiving sound from a localized
media source. As a further example, sensor data 304 may be provided by user interface
137. E.g., some adjustments of hearing device 110, 210 performed by the user on the
user interface may be typical or rather unusual for the user being interested in perceiving
sound from a localized media source.
[0078] Media interest determination module 315 is configured to receive the first parameter
determined by audio signal analyzer 318 indicative of a property of sound detected
in the environment, and the second parameter determined by sensor data analyzer 319
indicative of a property of movements performed by the user. In some instances, when
sensor data analyzer 319 is configured to determine, based on sensor data 304, a third
parameter, media interest determinator 315 may also be configured to receive the third
parameter. Media interest determinator 315 is configured to determine, based on the
first and second parameter, and optionally also based on the third parameter, whether
the user is interested in perceiving sound from a media source localized in the environment.
In particular, media interest determinator 315 may be configured to determine whether
the first parameter and the second parameter fulfill a condition as a requirement
for concluding and/or predicting that the user could be interested in perceiving sound
from the localized media source. In some instances, a further requirement may be that
the third parameter fulfills such a condition.
[0079] In some instances, the condition may be determined relative to a threshold for at
least one of the parameters, e.g., a first threshold for the first parameter and/or
a second threshold for the second parameter and/or a third threshold for the third
parameter. E.g., the condition may be determined to be fulfilled when the first parameter
exceeds a first threshold and the second parameter falls below a second threshold.
To illustrate, when the first parameter is indicative of a temporal variability of
sound detected in the environment, the first parameter exceeding the threshold may
indicate a rather large variability of the sound suggesting that the sound may originate
from such a localized media source. When the second parameter is indicative of an
amplitude and/or an amount and/or a temporal variability of the movements performed
by the user, the second parameter falling below the threshold may further indicate
a rather small amplitude and/or amount and/or temporal variability of the movements
suggesting that the user has the intention to dedicate his attention to the sound
and/or other media content, e.g., visual content, originating from the media source.
Accordingly, evaluating the first parameter relative to the first threshold and the second parameter relative to the second threshold can allow concluding and/or predicting an interest of the user in perceiving sound from a localized media source with a higher certainty as compared to evaluating only one of the parameters. In some examples,
the certainty of such a prediction may be further enhanced by also evaluating the
third parameter, e.g., relative to a third threshold.
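By way of illustration only, the threshold evaluation performed by media interest determinator 315 might be sketched in Python as follows; the parameter scaling and the threshold values are hypothetical assumptions.

```python
def media_interest_condition(first_param: float, second_param: float,
                             third_param: float = None,
                             first_threshold: float = 0.6,
                             second_threshold: float = 0.2,
                             third_threshold: float = 0.5) -> bool:
    """Fulfilled when the sound variability (first parameter) is large and the
    movement activity (second parameter) is small; the optional third parameter
    must then additionally exceed its own threshold."""
    fulfilled = first_param > first_threshold and second_param < second_threshold
    if third_param is not None:
        fulfilled = fulfilled and third_param > third_threshold
    return fulfilled
```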
[0080] In some instances, the condition may be determined with respect to a content of the
sound detected in the environment, e.g., whether the content is characteristic of
a predetermined media sound. In particular, the first parameter may be indicative
of a content of the sound detected in the environment. Media interest determination
module 315 may then be configured to determine whether the content of the environmental
sound matches the predetermined media sound. E.g., the predetermined media sound may
be provided as a sound pattern which can be compared to the environmental sound content.
When the environmental sound content matches the sound pattern, at least part of the
condition for the user being interested in perceiving the media sound may be concluded
and/or predicted to be fulfilled.
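By way of illustration only, the comparison of the environmental sound content with a predetermined sound pattern might be sketched in Python as a similarity measure over spectral features; the feature representation and the similarity threshold are assumptions.

```python
import numpy as np

def matches_media_pattern(content_features: np.ndarray,
                          pattern_features: np.ndarray,
                          min_similarity: float = 0.8) -> bool:
    """Cosine similarity between a feature vector of the environmental sound
    content and a stored media sound pattern (both hypothetical representations)."""
    a = content_features / (np.linalg.norm(content_features) + 1e-12)
    b = pattern_features / (np.linalg.norm(pattern_features) + 1e-12)
    return float(np.dot(a, b)) >= min_similarity
```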
[0081] In some instances, media interest determination module 315 can be configured to execute
a machine learning (ML) algorithm configured to predict and/or indicate and/or output
a likelihood whether the first parameter indicative of a content of the sound detected
in the environment matches the predetermined media sound, e.g., in the form of a sound
pattern indicative of a localized media source. In particular, the ML algorithm may
be trained with previously recorded media sound of such a localized media source.
For instance, one or more NNs may be employed configured to provide for a separation
of sound emanated from a localized media source from other content and/or sound components
contained in audio signal 302. The first parameter may then be indicative of the separated
sound received from the localized media source and/or about the circumstance whether
such media sound is present in audio signal 302. Examples of such NNs, which may be
implemented as one or more deep neural networks (DNNs), configured to separate content
and/or sound components stemming from different acoustic objects from audio signal
302 are disclosed in international patent application Nos.
PCT/EP 2020/051 734 and
PCT/EP 2020/051 735, and in German patent application No.
DE 2019 206 743.3.
[0082] In some instances, when audio signal analyzing module 318 comprises an audio signal
classifier and the first parameter is indicative of a temporal variability of the
one or more classes attributed to audio signal 302 and/or contains information about
the one or more classes attributed to audio signal 302, media interest determination
module 315 can be configured to determine whether the first parameter fulfills a condition
of the one or more classes attributed to audio signal 302 indicating that audio signal
302 contains sound and/or sound components from a localized media source in which
the user could be interested.
[0083] For example, when the first parameter is indicative of a temporal variability of
the one or more classes attributed to audio signal 302, the condition may be evaluated
relative to a threshold. To illustrate, an indicator for sound and/or sound components
from a localized media source contained in audio signal 302 can be that the variability
of the one or more classes attributed to audio signal 302 exceeds the threshold. In
particular, the threshold may be exceeded when the one or more classes attributed
to audio signal 302 change rather often. In this way the circumstance may be exploited
that other sound in the user's environment, in particular sound unrelated to a localized
media source, may typically result in a more steady attribution of the one or more
classes to audio signal 302 and/or a variability of the one or more classes attributed
to audio signal 302 falling below the threshold. Thus, the first parameter exceeding
the threshold may be taken as a condition for sound and/or sound components from a
localized media source contained in audio signal 302 in which the user could be interested.
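By way of illustration only, the evaluation of the class variability relative to a threshold might be sketched in Python as follows; the window length and the change-rate threshold are illustrative assumptions.

```python
from collections import deque

class ClassVariabilityMeter:
    """Counts changes of the attributed class within a sliding window; a change
    rate above the threshold hints at sound from a localized media source."""
    def __init__(self, window_s: float = 30.0, threshold: float = 0.2):
        self.window_s = window_s
        self.threshold = threshold  # class changes per second (illustrative)
        self.change_times = deque()
        self.last_class = None

    def update(self, attributed_class: str, now: float) -> bool:
        if self.last_class is not None and attributed_class != self.last_class:
            self.change_times.append(now)
        self.last_class = attributed_class
        while self.change_times and now - self.change_times[0] > self.window_s:
            self.change_times.popleft()
        return len(self.change_times) / self.window_s > self.threshold
```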
[0084] As another example, when the first parameter contains information about the one or
more classes attributed to audio signal 302, the condition may be evaluated relative
to whether the information is characteristic of a predetermined media sound which
may be characteristic of sound and/or sound components from a localized media source.
E.g., the information may comprise a label and/or identifier and/or other characteristic
of the one or more classes attributed to audio signal 302. The condition may be determined
to be fulfilled when it is determined that the information is characteristic of the
predetermined media sound. To illustrate, some classes which may be attributed to
audio signal 302, e.g., quiet indoor, quiet outdoor, speech in a reverberating environment,
speech in noise, speech from the user, own voice of the user, and/or the like, may
be less characteristic of sound and/or sound components from a localized media source
as compared to other classes, e.g., public area noise, speech, nonspeech, speech in quiet, applause, music, and/or the like. Accordingly, the condition may be deemed to be fulfilled when the information yields that at least one of the classes which are more characteristic of sound and/or sound components from the localized
media source has been attributed to audio signal 302.
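By way of illustration only, the evaluation whether the attributed classes are characteristic of a predetermined media sound might be sketched in Python as a set membership test; the partition of the classes follows the examples above.

```python
# Classes more characteristic of sound from a localized media source (see above).
MEDIA_CHARACTERISTIC = {"public area noise", "speech", "nonspeech",
                        "speech in quiet", "applause", "music"}
# Classes less characteristic of such sound (see above).
LESS_CHARACTERISTIC = {"quiet indoor", "quiet outdoor", "speech in noise",
                       "speech in a reverberating environment",
                       "speech from the user", "own voice of the user"}

def classes_indicate_media(attributed_classes) -> bool:
    """Condition fulfilled when at least one media-characteristic class is present."""
    return any(c in MEDIA_CHARACTERISTIC for c in attributed_classes)
```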
[0085] In some instances, when the second parameter is indicative of a movement behavior
of the user over time, the condition may be determined with respect to whether the
movement behavior is characteristic of the user being interested in perceiving sound
from a localized media source. E.g., media interest determination module 315 may be
configured to determine whether the user's movement behavior matches a predetermined
movement pattern. The movement pattern may be indicative of a movement behavior of
the user, e.g., a type and/or sequence and/or rate and/or amplitude and/or duration
and/or lack of movements performed by the user, which are typical for the user being
interested in perceiving sound from the localized media source. When the movement
behavior matches the movement pattern, at least part of the condition for the user
being interested in perceiving the media sound may be concluded and/or predicted to
be fulfilled. For instance, media interest determination module 315 can be configured
to execute an ML algorithm configured to predict and/or indicate and/or output a likelihood
whether the second parameter matches the movement pattern. In particular, the ML algorithm
may be trained with previously recorded movement behaviors of the user and/or other
users over time. The training data may be labelled with regard to whether the user
has been interested or uninterested in perceiving sound from a localized media source
when executing the respective movement behavior.
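By way of illustration only, an ML-based matching of the movement behavior against a learned pattern might be sketched in Python as a logistic score; the feature choice and the weights, which would in practice be obtained from the labelled training data described above, are hypothetical.

```python
import numpy as np

# Hypothetical weights for [movement rate, movement amplitude, head-to-source
# alignment], standing in for a model trained on labelled movement behaviors.
WEIGHTS = np.array([-1.8, -0.9, 2.1])
BIAS = 0.3

def interest_likelihood(behavior_features: np.ndarray) -> float:
    """Logistic score in [0, 1]: likelihood that the second parameter matches the
    movement pattern of a user interested in perceiving the media sound."""
    return float(1.0 / (1.0 + np.exp(-(WEIGHTS @ behavior_features + BIAS))))
```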
[0086] In some instances, when a third parameter is provided by sensor data analyzer 319
based on sensor data 304 and received by media interest determination module 315,
the condition may also be determined to be fulfilled by the third parameter. E.g.,
the third parameter may be evaluated with respect to an environmental condition to
be fulfilled by environmental sensor data provided by any of environmental sensors
115, 131, 132 and/or a physiological condition to be fulfilled by physiological sensor
data provided by any of physiological sensors 133 - 135 and/or with respect to a location
and/or time and/or adjustments via user interface 137 indicating the user's interest
in perceiving sound from a localized media source.
[0087] When it is determined, by media interest determination module 315, that the user
is interested in perceiving sound from a localized media source, media interest determination
module 315 can be configured to initiate an operation of hearing device 110, 210 optimizing
the processing of audio signal 302, as performed by audio processing module 313, for
perceiving sound from the localized media source. In particular, media interest determination
module 315 may be configured to provide one or more instructions for the optimizing
of the processing of audio signal 302 to processing instruction selection module 317.
[0088] In some instances, the operation for optimizing the processing of audio signal 302
comprises reducing a rate at which different audio processing instructions are applied
to audio signal 302. In particular, some media sources may emit sound at a large variability
of the sound. For instance, the emitted sound may change rather frequently between
different sound types and/or sound contents, e.g., speech and/or music and/or environmental
sound and/or background noise and/or sound special effects and/or silence. Similarly,
a level and/or frequency content and/or a number of onsets and/or, when sound is emitted
by the media source from a plurality of sound sources 411, 412, a DOA of the sound
at hearing device 110, 210 may change rather frequently. In a standard operation of
hearing device 110, 210, the large variability of the detected sound may lead to a frequent change of the audio processing instructions applied to audio signal 302.
[0089] To illustrate, some audio processing instructions may be optimized for a reproduction
of speech encoded in audio signal 302, other audio processing instructions may be
optimized for a reproduction of music encoded in audio signal 302, still other audio
processing instructions may be optimized for noise reduction. When the sound type
and/or sound content in audio signal 302 changes frequently, e.g., between music and/or
speech and/or background noise, the applied audio processing instructions optimized
for the respective sound type and/or sound content may change accordingly. Such a
frequent switching between different audio processing instructions applied to audio
signal 302 can be rather disturbing for the user, e.g., due to a varying sound reproduction
and/or processing delays and/or sound artefacts caused by the switching. Thus, reducing
the rate at which different audio processing instructions are applied to audio signal
302, can optimize the processing of audio signal 302 for perceiving sound from a localized
media source.
[0090] To further illustrate, when audio signal analyzing module 318 comprises an audio
signal classifier, audio signal 302 may be processed by audio signal processor 313
by applying one or more audio processing instructions associated with the one or more
classes attributed to audio signal 302 by the audio signal classifier. To this end,
in a standard operation of hearing device 110, 210, processing instruction selection
module 317 may be configured to select the audio processing instructions applied by
audio signal processor 313 depending on the classification performed by the audio
signal classifier. In a case in which sound encoded in audio signal 302 has a rather
large variability, however, the one or more classes attributed to audio signal 302
may change rather frequently, leading to a frequent change of the applied audio processing
instructions. In order to overcome the negative side effects of such a frequent change
of the audio processing, when it is determined that the user is interested in perceiving
sound from a localized media source, media interest determination module 315 can provide
instructions to processing instruction selection module 317 to reduce the rate at
which different audio processing instructions are applied to audio signal 302. The
instructions may include, e.g., to apply currently applied audio processing instructions
for a minimum time even if a different class has been attributed to audio signal 302
by the audio signal classifier. The instructions may also include, e.g., to only select
one or more audio processing instructions associated with one of the classes attributed
to audio signal 302 to be applied to audio signal 302 which are most appropriate for
reproducing sound from the localized media source. For instance, one or more audio
processing instructions for enhancing an intelligibility of speech encoded in audio
signal 302 may then be selected to be applied to audio signal 302, e.g., under the
presumption that the user is mostly interested in comprehending a speech content presented
by the media source, even if speech content and music content would be reproduced
by the media source.
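By way of illustration only, the minimum-time behavior described above might be sketched in Python as follows; the hold time is an illustrative assumption.

```python
import time

class RateLimitedProgramSelector:
    """Sketch of module 317 holding the currently applied audio processing
    instructions for a minimum time, even if the classifier attributes a new class."""
    def __init__(self, min_hold_s: float = 10.0):
        self.min_hold_s = min_hold_s
        self.current = None
        self.switched_at = float("-inf")

    def select(self, requested_program, now: float = None):
        now = time.monotonic() if now is None else now
        if self.current is None or now - self.switched_at >= self.min_hold_s:
            if requested_program != self.current:
                self.current = requested_program
                self.switched_at = now
        return self.current  # switch requests within the hold time are ignored
```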
[0091] In some instances, the operation for optimizing the processing of audio signal 302
comprises disabling applying of at least one audio processing instruction which is
unsuitable for perceiving sound from the localized media source to audio signal 302.
To illustrate, some audio processing instructions, which may be applied to audio signal
302 in a standard operation of hearing device 110, 210, may be unsuitable for reproducing
sound from a localized media source and/or may adversely affect a desired perception
of such sound for the user. For instance, some media sources, e.g., a television program
and/or a movie shown in a movie theater, may reproduce sound features such as traffic
noise, machine noise, speech in babble, and/or the like as part of the media content,
e.g., to provide for a desired sound ambience and/or for entertainment purposes. In
such a case, it may be undesirable to process the media content encoded in audio signal
302 equivalently to a corresponding content which does not originate from a media
source but from another sound source in the user's environment. Accordingly, when
it is determined that the user is interested in perceiving sound from a localized
media source, media interest determination module 315 can provide instructions to
processing instruction selection module 317 to disable a selection of at least one
audio processing instruction which is unsuitable for perceiving sound from the localized
media source when applied to audio signal 302. E.g., audio processing instructions
usually employed for a noise reduction of environmental sound may then be disabled,
e.g., to avoid an undesired influence and/or distortion of the media content.
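By way of illustration only, disabling the selection of unsuitable instructions might be sketched in Python as a filter over the selected program set; which instructions count as unsuitable is a hypothetical assumption reusing the abbreviations above.

```python
# Instructions assumed unsuitable while perceiving media sound, e.g., environmental
# noise reduction that could distort sound effects reproduced by the media source.
UNSUITABLE_FOR_MEDIA = {"NC", "WNC"}

def filter_instructions(selected_instructions, media_interest: bool):
    """Drop unsuitable instructions when an interest in media sound is determined."""
    if not media_interest:
        return selected_instructions
    return tuple(i for i in selected_instructions if i not in UNSUITABLE_FOR_MEDIA)
```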
[0092] In some instances, the operation for optimizing the processing of audio signal 302
comprises applying at least one audio processing instruction optimized for perceiving
sound from the localized media source to audio signal 302. In some examples, at least
one audio processing instruction may be provided which is uniquely applicable to audio
signal 302 when it is determined that the user is interested in perceiving sound from
a localized media source. E.g., the at least one audio processing instruction may
not be associated with one or more classes attributed to audio signal 302 by an audio
signal classifier included in audio signal analyzing module 318. Accordingly, when
it is determined that the user is interested in perceiving sound from the localized
media source, media interest determination module 315 can provide instructions to
processing instruction selection module 317 to select the at least one audio processing
instruction to be applied to audio signal 302.
[0093] In some instances, the at least one audio processing instruction optimized for perceiving
sound from the localized media source may provide for at least one of enhancing an
intelligibility of speech encoded in the audio signal, in particular speech presented
from the localized media source; enhancing a quality of sound encoded in the audio
signal; enhancing sound from the media source encoded in the audio signal relative
to other environmental sound encoded in the audio signal; and separating sound from
the media source encoded in the audio signal from other sound encoded in the audio
signal. E.g., the at least one audio processing instruction may provide for noise
reduction of the other environmental sound and/or improve the quality of sound with
regard to a clarity of the reproduced sound and/or with regard to a listening comfort
for the user when perceiving the sound.
[0094] FIG. 6 illustrates a block flow diagram for an exemplary method of processing input
audio signal 302. The method may be executed by processor 112, 310 of hearing device
110, 210 and/or another processor communicatively coupled to processor 112, 310. At
operation S12, after receiving audio signal 302, which may be provided by input transducer
115, a processing of audio signal 302 is performed by applying one or more audio processing
instructions to audio signal 302. Based on the processed audio signal, an output audio
signal 305 is provided which can be output by output transducer 117 so as to stimulate
the user's hearing.
[0095] At operation S13, after receiving audio signal 302 and displacement data 303, which
may be provided by displacement sensor 125, it is determined whether the user is interested
in perceiving sound from a media source localized in the environment. Operation S13
may be performed independently and/or in parallel to the audio processing performed
at S12. In some implementations, further sensor data 304 may be received at S13 to
determine whether the user is interested. In a case in which it is determined that
the user is interested, operation S14 is executed. At S14, an operation optimizing
the processing of audio signal 302 for perceiving sound from the localized media source
is initiated, which can then be applied in the audio processing at S12.
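By way of illustration only, the interplay of operations S12 to S14 for one block of audio signal 302 might be wired together in Python as follows; both helper functions are hypothetical stubs standing in for the processing described above.

```python
def determine_media_interest(audio_block, displacement, sensor_data=None) -> bool:
    # Stub for operation S13; see the parameter and threshold sketches above.
    return False

def apply_processing_instructions(audio_block, media_optimized: bool):
    # Stub for operation S12; a real implementation would apply the selected
    # audio processing instructions (optimized at S14 when interest is given).
    return audio_block

def process_block(audio_block, displacement, sensor_data=None):
    interested = determine_media_interest(audio_block, displacement, sensor_data)  # S13
    return apply_processing_instructions(audio_block, media_optimized=interested)  # S12/S14
```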
[0096] FIG. 7 illustrates a block flow diagram of an exemplary implementation of the method
illustrated in FIG. 6. In this implementation, operation S13 is replaced by operations
S22 and S23. At S22, after receiving audio signal 302, audio signal 302 is classified
by attributing at least one class from a plurality of classes to audio signal 302,
wherein different audio processing instructions are associated with different classes.
The at least one audio processing instruction associated with the at least one class
attributed to audio signal 302 can then be applied in the audio processing at S12.
[0097] At operation S23, after receiving displacement data 303, it is determined whether
the user is interested in perceiving sound from a media source localized in the environment.
The determining whether the user is interested can thus be based on displacement data
303 and the at least one class which has been attributed to audio signal 302 at S22.
E.g., whether the user is interested may be determined depending on
a temporal variability of the attribution of the at least one class to audio signal
302 and/or whether the at least one class attributed to audio signal 302 is characteristic
of a predetermined media sound. In some instances, as illustrated, audio signal 302
may be further employed at S23 for the determining whether the user is interested.
E.g., in addition to the information about the at least one class which has been attributed
to audio signal 302 at S22, the determining whether the user is interested may be
based on another characteristic of audio signal 302, e.g., a level and/or an SNR and/or
a frequency content of audio signal 302.
[0098] While the principles of the disclosure have been described above in connection with
specific devices and methods, it is to be clearly understood that this description
is made only by way of example and not as limitation on the scope of the invention.
The above described preferred embodiments are intended to illustrate the principles
of the invention, but not to limit the scope of the invention. Various other embodiments
and modifications to those preferred embodiments may be made by those skilled in the
art without departing from the scope of the present invention that is solely defined
by the claims. In the claims, the word "comprising" does not exclude other elements
or steps, and the indefinite article "a" or "an" does not exclude a plurality. A single
processor or controller or other unit may fulfil the functions of several items recited
in the claims. The mere fact that certain measures are recited in mutually different
dependent claims does not indicate that a combination of these measures cannot be
used to advantage. Any reference signs in the claims should not be construed as limiting
the scope.