TECHNICAL FIELD
[0001] The disclosure relates to a method of operating a hearing device configured to be worn
at an ear of a user, according to the preamble of claim 1. The disclosure further
relates to a hearing device, according to the preamble of claim 15.
BACKGROUND
[0002] Hearing devices may be used to improve the hearing capability or communication capability
of a user, for instance by compensating a hearing loss of a hearing-impaired user,
in which case the hearing device is commonly referred to as a hearing instrument such
as a hearing aid, or hearing prosthesis. A hearing device may also be used to output
sound based on an audio signal which may be communicated by a wire or wirelessly to
the hearing device. A hearing device may also be used to reproduce a sound in a user's
ear canal detected by an input transducer such as a microphone or a microphone array.
The reproduced sound may be amplified to account for a hearing loss, such as in a
hearing instrument, or may be output without accounting for a hearing loss, for instance
to provide for a faithful reproduction of detected ambient sound and/or to add audio
features of an augmented reality in the reproduced ambient sound, such as in a hearable.
A hearing device may also provide for a situational enhancement of an acoustic scene,
e.g. beamforming and/or active noise cancelling (ANC), with or without amplification
of the reproduced sound. A hearing device may also be implemented as a hearing protection
device, such as an earplug, configured to protect the user's hearing. Different types
of hearing devices configured to be worn at an ear include earbuds, earphones,
hearables, and hearing instruments such as receiver-in-the-canal (RIC) hearing aids,
behind-the-ear (BTE) hearing aids, in-the-ear (ITE) hearing aids, invisible-in-the-canal
(IIC) hearing aids, completely-in-the-canal (CIC) hearing aids, cochlear implant systems
configured to provide electrical stimulation representative of audio content to a
user, bimodal hearing systems configured to provide both amplification and electrical
stimulation representative of audio content to a user, or any other suitable hearing
prostheses. A hearing system comprising two hearing devices configured to be worn
at different ears of the user is sometimes also referred to as a binaural hearing
device. A hearing system may also comprise a hearing device, e.g., a single monaural
hearing device or a binaural hearing device, and a user device, e.g., a smartphone
and/or a smartwatch, communicatively coupled to the hearing device.
[0003] Hearing devices are often employed in conjunction with communication devices, such
as smartphones or tablets, for instance when listening to sound data processed by
the communication device and/or during a phone conversation operated by the communication
device. More recently, communication devices have been integrated with hearing devices
such that the hearing devices at least partially comprise the functionality of those
communication devices. A hearing system may comprise, for instance, a hearing device
and a communication device.
[0004] In recent times, hearing devices have also increasingly been equipped with different
sensor types. Traditionally, those sensors often include an input transducer to detect
a sound, e.g., a sound detector such as a microphone or a microphone array. An amplified
and/or signal processed version of the detected sound may then be outputted to the
user by an output transducer, e.g., a receiver, loudspeaker, or electrodes to provide
electrical stimulation representative of the outputted signal. In an effort to provide
the user with even more information about himself and/or the ambient environment,
various other sensor types are progressively implemented, in particular sensors which
are not directly related to the sound reproduction and/or amplification function of
the hearing device. Those sensors include inertial sensors, such as accelerometers,
allowing the user's movements to be monitored. Physiological sensors, such as optical sensors
and bioelectric sensors, are mostly employed for monitoring the user's health.
[0005] Modern hearing devices provide several features that aim to facilitate speech intelligibility,
improve sound quality, reduce noise level, etc. Many of such sound cleaning features
are designed to benefit the hearing device user's hearing performance in very specific
situations. In order to activate the functionalities only in the situations where
benefit can be expected, an automatic steering system is often implemented which activates
sound cleaning features depending on a combination of, e.g., an acoustic environment
classification, a physical activity classification, a directional classification,
etc.
[0006] To provide for the acoustic environment classification, hearing devices have been
equipped with a sound classifier to classify an ambient sound. An input transducer
can provide an audio signal representative of the ambient sound. The sound classifier
can classify the audio signal, allowing different listening situations to be identified
by determining a characteristic from the audio signal and assigning the audio signal
to at least one relevant class from a plurality of predetermined classes depending
on the characteristic. Usually, the sound classification does not directly modify
a sound output of the hearing device. Instead, different audio processing instructions
are stored in a memory of the hearing device specifying different audio processing
parameters for a processing of the audio signal, wherein the different classes are
each associated with one of the different audio processing instructions. After assigning
the audio signal to one or more classes, the one or more associated audio processing
instructions are executed. The audio processing parameters specified by the audio
processing instructions can then provide a processing of the audio signal customized
for the particular listening situation corresponding to the at least one class identified
by the classifier. The different listening situations may comprise, for instance,
different classes of listening conditions and/or different classes of sounds. For
example, the different classes may comprise speech and/or nonspeech and/or music and/or
traffic noise and/or other ambient noise.
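By way of illustration only, the following Python sketch shows one way the association
between attributed classes and audio processing instructions described above could be
represented; the class names, parameter fields and values are hypothetical assumptions,
not part of this disclosure.

    # Hypothetical sketch: each sound class is associated with an audio
    # processing instruction, here represented as a parameter dictionary.
    AUDIO_PROCESSING_INSTRUCTIONS = {
        "speech":        {"gain_db": 6.0, "beamformer": True,  "noise_canceling": 0.3},
        "music":         {"gain_db": 3.0, "beamformer": False, "noise_canceling": 0.0},
        "traffic_noise": {"gain_db": 0.0, "beamformer": True,  "noise_canceling": 0.8},
    }

    def select_instructions(attributed_classes):
        # Return the instructions associated with the class(es)
        # previously attributed to the audio signal.
        return [AUDIO_PROCESSING_INSTRUCTIONS[c]
                for c in attributed_classes
                if c in AUDIO_PROCESSING_INSTRUCTIONS]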
[0007] The classification may be based on a statistical evaluation of the audio signal,
as disclosed in
EP 3 036 915 B1. More recently, machine learning (ML) algorithms have been employed to classify the
ambient sound. The classifier can be implemented by an artificial intelligence (AI)
chip which may be configured to classify the audio signal by at least one deep neural
network (DNN). The classifier may comprise a sound source separator configured to
separate sound generated by different sound sources, for instance a conversation partner,
passengers passing by the user, vehicles moving in the vicinity of the user such as
cars, airborne traffic such as a helicopter, a sound scene in a restaurant, a sound
scene including road traffic, a sound scene during public transport, a sound scene
in a home environment, and/or the like. Examples of such a sound source separator
are disclosed in international patent application Nos.
PCT/EP 2020/051 734 and
PCT/EP 2020/051 735, and in German patent application No.
DE 2019 206 743.3.
[0008] Another approach would be to mix different features associated with different classes.
To this end, a mixed mode classifier has been proposed in
EP 1 858 292 B1. The mixed mode classifier can attribute one, two or more classes to the audio signal,
wherein the different features in the form of audio processing instructions associated
with the different classes can be mixed in dependence of class similarity factors.
The class similarity factors are indicative of a similarity of the current acoustic
environment with a respective predetermined acoustic environment associated with the
different classes. The mixing of the different audio processing instructions may imply,
e.g., a linear combination of base parameter sets representing the audio processing
instructions associated with the different classes, or other non-linear ways of mixing
the audio processing instructions. The different audio processing instructions may
be provided as sub-functions, which can be included into a transfer function used
by the signal processing circuit according to the desired mixing of the audio processing
instructions. For example, audio processing instructions, e.g., in the form of the
base parameter sets, related to a beamformer and/or a gain model (i.e., an amplification
characteristic) may be mixed depending on whether or to which degree the audio signal
is attributed, e.g., by the class similarity factors, to one or more of the classes
music and/or speech in noise and/or speech.
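As a minimal sketch of the linear mixing rule mentioned above (the disclosure also allows
non-linear mixing), base parameter sets may be combined using the class similarity factors
as weights; the example values are assumptions for illustration.

    import numpy as np

    def mix_parameter_sets(base_sets, similarity_factors):
        # Weighted (linear) combination of base parameter sets; the weights
        # are the class similarity factors, normalized to sum to one.
        w = np.asarray(similarity_factors, dtype=float)
        w = w / w.sum()
        sets = np.asarray(base_sets, dtype=float)
        return w @ sets  # one mixed parameter set

    # e.g., base sets for the classes "music", "speech in noise", "speech":
    mixed = mix_parameter_sets([[6.0, 0.0], [3.0, 0.8], [4.0, 0.4]],
                               [0.1, 0.6, 0.3])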
[0009] EP 2 201 793 B1 discloses a classifier configured for an automatic adaption of the audio processing
instructions associated with the different classes depending on adjustments performed
by the user. Adjustment data indicative of the user adjustments can be logged, e.g.,
stored in a storage unit, and evaluated to learn correction data for correcting the
audio processing instructions. In a mixed mode classifier, for a current sound environment
and depending on the adjustment data, an offset can be learned for the mixed base
parameter sets representing the audio processing instructions associated with the
different classes. For the purpose of learning, correction data may be separately
provided for different classes.
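A minimal sketch of learning such correction data from logged user adjustments is given
below, assuming a per-class offset updated by an exponential moving average; the learning
rate and the data layout are assumptions, not the method of the cited document.

    import numpy as np

    class OffsetLearner:
        # Illustrative adaptation: learn, per class, an offset for the base
        # parameter sets from logged user adjustments.

        def __init__(self, n_params, classes, rate=0.1):
            self.rate = rate
            self.offsets = {c: np.zeros(n_params) for c in classes}

        def log_adjustment(self, cls, adjustment):
            # Blend the newly logged adjustment into the learned offset.
            self.offsets[cls] = ((1 - self.rate) * self.offsets[cls]
                                 + self.rate * np.asarray(adjustment, float))

        def corrected(self, cls, base_params):
            # Base parameter set corrected by the learned offset.
            return np.asarray(base_params, float) + self.offsets[cls]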
[0010] A rather specific use case of operating a hearing device concerns a faithful reproduction
of sound which is emitted from a localized media source in the user's environment.
In principle, the above described acoustic environment classification could also be
employed to determine whether an audio signal representative of the ambient sound
would include such a media content, e.g., by attributing the audio signal to a dedicated
class characteristic for the media sound. Subsequently, when the audio signal would
be attributed to such a class, at least one audio processing instruction associated
with the class which is optimized for perceiving sound from the localized media source
could be applied to the audio signal.
[0011] A difficulty of such an approach is that media content, which may be presented to
the user by various media sources in his environment, can, in general, vary greatly,
which makes a reliable classification of the environmental sound with regard to such
a media content rather complex and/or challenging. For example, some media sound,
such as a TV program and/or a movie presented at a movie theater, may comprise sound
features also typically occurring in other daily situations of the user, e.g., speech
from a single talker, conversations of other people, traffic sound and/or sound emitted
from other noise sources; a classifier may therefore find it hard to distinguish whether
such sound stems from a localized media source or not.
[0012] Another problem arising from such an approach is that, even if such a media content
is present in the user's environment, it remains questionable whether the user would
be interested in following and/or consuming such a content. In particular, initiating
an operation of the hearing device which would be optimized for perceiving the sound
from the localized media source would be mostly desirable when the user is also interested
in the media content.
SUMMARY
[0013] It is an object of the present disclosure to avoid at least one of the above mentioned
disadvantages and to provide for a hearing device functionality allowing an optimized
reproduction of sound emitted by a localized media source, in particular depending
on a presence of such a media source in the user's environment and/or depending on
the user's interest in perceiving such sound. It is another object to provide for
an audio processing automatically accounting for situations in which the user wants
to perceive sound from a localized media source. It is a further object to increase
a reliability of determining situations in which such a media source is present in
the user's environment and/or the user is interested in perceiving such sound. It
is yet another object to provide for an audio processing which is optimized for perceiving
sound from the localized media source. It is a further object to provide a hearing
device which is configured to operate in such a manner.
[0014] At least one of these objects can be achieved by a method of operating a hearing
device configured to be worn at an ear of a user comprising the features of claim
1 and/or a hearing device comprising the features of claim 15. Advantageous embodiments
of the invention are defined by the dependent claims and the following description.
[0015] Accordingly, the present disclosure proposes a method of operating a hearing device
configured to be worn at an ear of a user, the method comprising
- receiving, from an input transducer included in the hearing device, an audio signal
indicative of a sound detected in the environment of the user;
- receiving, from a displacement sensor included in the hearing device, displacement
data indicative of a displacement of the hearing device;
- processing the audio signal by applying one or more audio processing instructions
to the audio signal;
- initiating outputting, by an output transducer included in the hearing device, an
output audio signal based on the processed audio signal so as to stimulate the user's
hearing;
- determining, based on the audio signal and the displacement data, whether the user
is interested in perceiving sound from a media source localized in the environment;
and, when it is determined that the user is interested,
- initiating an operation of the hearing device optimizing the processing of the audio
signal for perceiving sound from the localized media source.
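By way of illustration, a minimal Python sketch of these steps applied to one signal
frame is given below; the variability and movement proxies, the thresholds, and the gain
values are hypothetical assumptions rather than the claimed processing.

    import numpy as np

    def operate_on_frame(audio_frame, displacement_frame):
        # Minimal end-to-end sketch of the claimed steps for one frame.
        sound_variability = np.std(audio_frame)         # crude audio-based indicator
        movement = np.mean(np.abs(displacement_frame))  # crude movement-based indicator
        interested = sound_variability > 0.5 and movement < 0.2
        gain = 2.0 if interested else 1.0               # media-optimized vs. default
        processed = gain * np.asarray(audio_frame, dtype=float)
        return processed, interested                    # processed signal is output to the user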
[0016] In this way, by employing information in the audio signal and the displacement data
to determine whether the user is interested in perceiving sound from the localized
media source, the hearing device operation optimized for perceiving sound from the
localized media source can be evoked in a more reliable way and/or can be better attuned
to the user's individual needs and/or sound properties of the media content. In particular,
an interest of the user in the media content may thus not be presumed based solely on
determining a presence of such a media source in the user's environment, e.g., based
on the audio signal, but rather on indications contained in the audio signal and/or
displacement data of the user's intention and/or preference to engage in such a content.
For instance, the operation optimized for perceiving sound from the localized media
source may thus be automatically activated depending on the determined user interest,
facilitating an interaction of the user with the hearing device.
[0017] Independently, the present disclosure also proposes a non-transitory computer-readable
medium storing instructions that, when executed by a processor, cause a hearing device
to perform operations of the method.
[0018] Independently, the present disclosure also proposes a hearing device configured to
be worn at an ear of a user, the hearing device comprising
- an input transducer configured to provide an audio signal indicative of a sound detected
in the environment of the user;
- a displacement sensor configured to provide displacement data indicative of a displacement
of the hearing device;
- a processor configured to process the audio signal by applying one or more audio processing
instructions to the audio signal; and
- an output transducer configured to output an output audio signal based on the processed
audio signal so as to stimulate the user's hearing, wherein the processor is further
configured to
- determine, based on the audio signal and the displacement data, whether the user is
interested in perceiving sound from a media source localized in the environment; and,
when it is determined that the user is interested,
- initiate an operation of the hearing device optimizing said processing of the audio
signal for perceiving sound from the localized media source.
[0019] Subsequently, additional features of some implementations of the method of operating
a hearing device and/or the computer-readable medium and/or the hearing device are
described. Each of those features can be provided solely or in combination with at
least another feature. The features can be correspondingly provided in some implementations
of the method and/or the hearing device.
[0020] In some implementations, the method further comprises
- determining, based on the audio signal, a first parameter indicative of a property
of the sound detected in the environment;
- determining, based on the displacement data, a second parameter indicative of a property
of movements performed by the user, wherein said determining whether the user is interested
in perceiving sound from the localized media source is based on the first parameter
and the second parameter.
[0021] In some implementations, the first parameter is indicative of a variability of the
sound detected in the environment, wherein said determining whether the user is interested
in perceiving sound from the localized media source includes a condition that the
first parameter exceeds a threshold. In some instances, the variability of the sound
may be determined with respect to a variability of at least one sound content, e.g.,
sound type, encoded in the audio signal and/or a level and/or a frequency and/or a
number of onsets and/or a direction of arrival (DOA) of the audio signal. In some
instances, the sound content may be characteristic for sound which is typical for
one or more acoustic objects emitting the sound. The variability of the sound content
may then be characteristic for an amount by which sound typical for one or more acoustic
objects varies.
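As a minimal sketch of one such first parameter, assuming the level variability mentioned
above is measured as the spread of frame-wise RMS levels (the frame length and threshold
are assumptions):

    import numpy as np

    def sound_variability(audio, frame_len=1024):
        # Illustrative first parameter: temporal variability of the
        # frame-wise RMS level of the detected sound.
        audio = np.asarray(audio, dtype=float)
        frames = [audio[i:i + frame_len]
                  for i in range(0, len(audio) - frame_len + 1, frame_len)]
        levels = np.array([np.sqrt(np.mean(f ** 2)) for f in frames])
        return np.std(levels)  # large value indicates highly varying sound

    # condition from this implementation (threshold value is an assumption):
    # interest_candidate = sound_variability(audio) > 0.1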
[0022] In some implementations, the second parameter is indicative of an amplitude and/or
an amount and/or a variability of the movements performed by the user, wherein said
determining whether the user is interested in perceiving sound from the localized
media source includes a condition that the second parameter falls below a threshold.
[0023] In some implementations, the first parameter is indicative of a sound content encoded
in the audio signal, and said determining whether the user is interested in perceiving
sound from the localized media source includes a condition that the first parameter
is characteristic of a predetermined media sound.
[0024] In some implementations, the second parameter is indicative of a movement behavior
of the user over time, e.g., a type and/or sequence and/or lack of movements performed
by the user over time, and said determining whether the user is interested in perceiving
sound from the localized media source includes a condition that the second parameter
is characteristic of a predetermined movement pattern.
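A minimal sketch of such a second parameter is given below, assuming accelerometer samples
as displacement data; the quantities derived and the threshold values are illustrative
assumptions.

    import numpy as np

    def movement_parameters(accel_samples):
        # Illustrative second parameter(s) from accelerometer samples of
        # shape (n, 3), in units of g: amount and temporal variability of
        # the movements performed by the user.
        magnitude = np.linalg.norm(np.asarray(accel_samples, float), axis=1)
        amount = float(np.mean(np.abs(magnitude - 1.0)))  # deviation from gravity only
        variability = float(np.std(magnitude))
        return amount, variability

    # conditions from these implementations (thresholds are assumptions):
    # amount, variability = movement_parameters(accel_samples)
    # user_keeps_still = amount < 0.05 and variability < 0.02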
[0025] In some implementations, the method further comprises classifying the audio signal
by attributing at least one class from a plurality of predetermined classes to the
audio signal, wherein said determining whether the user is interested in perceiving
sound from the localized media source is based on the at least one class attributed
to the audio signal. In some implementations, the first parameter is indicative of
a variability, e.g., alteration over time, of the at least one class attributed to
the audio signal and/or whether the at least one class attributed to the audio signal
is characteristic of a predetermined media sound.
[0026] In some implementations, the method further comprises classifying the displacement
data by attributing at least one class from a plurality of predetermined classes to
the displacement data, wherein said determining whether the user is interested in
perceiving sound from the localized media source is based on the at least one class
attributed to the displacement data. In some implementations, the second parameter
is indicative of a variability of the at least one class attributed to the displacement
data and/or whether the at least one class attributed to the displacement data is
characteristic of a predetermined movement pattern of the user when focusing his attention
to the localized media source.
[0027] In some implementations, the method further comprises
- receiving, from an environmental sensor, environmental sensor data indicative of a
property of the environment; and/or
- receiving, from a physiological sensor, physiological sensor data indicative of a
physiological property of the user; and/or
- receiving, from a user interface, user interaction data indicative of an interaction
of the user with the user interface; and/or
- receiving, from a location sensor, location data indicative of a current location
of the user; and/or
- receiving, from a clock, time data indicative of a current time,
wherein said determining whether the user is interested in perceiving sound from the
media source is also based on the environmental sensor data and/or physiological sensor
data and/or user interaction data and/or location data and/or time data.
[0028] In some implementations, the method further comprises
- classifying the audio signal by attributing at least one class from a plurality of
predetermined classes to the audio signal, wherein different audio processing instructions
are associated with different classes; and
- processing the audio signal by applying the audio processing instruction associated
with the class attributed to the audio signal.
[0029] In some implementations, the operation of the hearing device optimizing the processing
of the audio signal comprises
- reducing a rate at which different audio processing instructions are applied to the
audio signal; and/or
- disabling applying of at least one audio processing instruction which is unsuitable
for perceiving sound from the localized media source to the audio signal; and/or
- applying at least one audio processing instruction optimized for perceiving sound
from the localized media source to the audio signal.
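As a minimal sketch of the first option, reducing the rate at which different audio
processing instructions are applied could be realized by holding each selected program
for a minimum number of frames; the hold time is an assumption.

    class ProgramSwitchLimiter:
        # Illustrative rate reduction: a newly requested audio processing
        # program is only adopted after the current one has been held for
        # a minimum number of frames.

        def __init__(self, hold_frames=500):
            self.hold_frames = hold_frames
            self.current = None
            self.age = 0

        def update(self, requested):
            if self.current is None or (requested != self.current
                                        and self.age >= self.hold_frames):
                self.current, self.age = requested, 0
            else:
                self.age += 1
            return self.current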
[0030] In some implementations, the audio processing instruction optimized for perceiving
sound from the localized media source comprises
- enhancing an intelligibility of speech encoded in the audio signal; and/or
- enhancing a quality of sound encoded in the audio signal; and/or
- enhancing sound from the media source encoded in the audio signal relative to other
environmental sound encoded in the audio signal; and/or
- separating sound from the media source encoded in the audio signal from other sound
encoded in the audio signal.
[0031] In some implementations, the method further comprises
- determining a degree of correlation between the audio signal and the displacement
data; and/or
- determining a variability of a degree of correlation between the audio signal and
the displacement data,
wherein said determining whether the user is interested in perceiving sound from the
localized media source is based on the degree of correlation and/or the variability
of the degree of correlation.
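A minimal sketch of such a computation, assuming the audio-signal envelope and the
movement magnitude are sampled at the same rate (the window length is an assumption):

    import numpy as np

    def correlation_and_variability(audio_envelope, movement, win=256):
        # Illustrative degree of correlation: windowed Pearson correlation
        # between envelope and movement, and its variability across windows.
        a = np.asarray(audio_envelope, float)
        m = np.asarray(movement, float)
        coeffs = []
        for i in range(0, min(len(a), len(m)) - win + 1, win):
            wa, wm = a[i:i + win], m[i:i + win]
            if wa.std() > 0 and wm.std() > 0:
                coeffs.append(np.corrcoef(wa, wm)[0, 1])
        coeffs = np.asarray(coeffs)
        if coeffs.size == 0:
            return 0.0, 0.0
        return coeffs.mean(), coeffs.std()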
[0032] In some implementations, the method further comprises
- determining whether the audio signal contains an own voice of the user, and, when
the audio signal contains the user's own voice,
- disabling the operation of the hearing device optimized for perceiving the sound from
the localized media source.
[0033] In some implementations, the localized media source is configured to provide for
a visual media content. In some implementations, the localized media source comprises
a screen for displaying the visual content. E.g., the media source may comprise a
television and/or a screen in a movie theater. In some implementations, the operation
of the hearing device is optimized for perceiving the sound of a television program
and/or a movie shown in a movie theater.
[0035] In some implementations, the method further comprises
- receiving, from a user interface, a user command indicative of whether the user accepts
or rejects the operation of the hearing device, wherein the operation optimizing the
processing of the audio signal is initiated depending on the user command.
[0036] In some implementations, the audio signal and/or the first parameter and the displacement
data and/or the second parameter are input into a machine learning (ML) algorithm,
which outputs, e.g., a probability and/or likelihood that the user is interested
in perceiving sound from the localized media source, wherein the ML algorithm has
been trained with previous audio signals and/or first parameters and associated displacement
data and/or second parameters. E.g., the operation of the hearing device optimizing
said processing of the audio signal for perceiving sound from the localized media
source may then be initiated when the probability and/or likelihood exceeds a threshold.
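As a minimal sketch of such an ML stage, assuming a logistic model whose weights and bias
stand in for a model trained on previous audio signals and associated displacement data
(the feature choice, weights, and threshold are assumptions):

    import numpy as np

    def media_interest_probability(features, weights, bias):
        # Illustrative ML stage: map input features (e.g., the first and
        # second parameters) to a probability that the user is interested.
        z = float(np.dot(weights, features)) + bias
        return 1.0 / (1.0 + np.exp(-z))

    # initiate the optimized operation when the probability exceeds a threshold:
    # p = media_interest_probability([first_param, second_param], [4.0, -8.0], -1.0)
    # if p > 0.8: ...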
[0037] In some implementations, the variability of the sound and/or movements and/or at
least one class attributed to the audio signal may be defined as a temporal variability,
e.g., alteration over time, of the sound and/or movements and/or class, in particular
an amount by which the sound and/or movements and/or class varies over time.
BRIEF DESCRIPTION OF THE DRAWINGS
[0038] Reference will now be made in detail to embodiments, examples of which are illustrated
in the accompanying drawings. The drawings illustrate various embodiments and are
a part of the specification. The illustrated embodiments are merely examples and do
not limit the scope of the disclosure. Throughout the drawings, identical or similar
reference numbers designate identical or similar elements. In the drawings:
Fig. 1 schematically illustrates an exemplary hearing device;
Fig. 2 schematically illustrates an exemplary sensor unit comprising one or more sensors
which may be implemented in the hearing device illustrated in Fig. 1;
Fig. 3 schematically illustrates an embodiment of the hearing device illustrated in Fig.
1 as a RIC hearing aid;
Fig. 4 schematically illustrates a media source localized in an environment of a user;
Fig. 5 schematically illustrates an exemplary algorithm of processing an audio signal
according to principles described herein; and
Figs. 6, 7 schematically illustrate some exemplary methods of processing an audio signal
according to principles described herein.
DETAILED DESCRIPTION OF THE DRAWINGS
[0039] FIG. 1 illustrates an exemplary hearing device 110 configured to be worn at an ear
of a user. Hearing device 110 may be implemented by any type of hearing device configured
to enable or enhance hearing or a listening experience of a user wearing hearing device
110. For example, hearing device 110 may be implemented by a hearing aid configured
to provide an amplified version of audio content to a user, a sound processor included
in a cochlear implant system configured to provide electrical stimulation representative
of audio content to a user, a sound processor included in a bimodal hearing system
configured to provide both amplification and electrical stimulation representative
of audio content to a user, any other suitable hearing prosthesis, or an earbud, an
earphone, or a hearable.
[0040] Different types of hearing device 110 can also be distinguished by the position at
which they are worn at the ear. Some hearing devices, such as behind-the-ear (BTE)
hearing aids and receiver-in-the-canal (RIC) hearing aids, typically comprise an earpiece
configured to be at least partially inserted into an ear canal of the ear, and an
additional housing configured to be worn at a wearing position outside the ear canal,
in particular behind the ear of the user. Some other hearing devices, as for instance
earbuds, earphones, hearables, in-the-ear (ITE) hearing aids, invisible-in-the-canal
(IIC) hearing aids, and completely-in-the-canal (CIC) hearing aids, commonly comprise
such an earpiece to be worn at least partially inside the ear canal without an additional
housing for wearing at the different ear position.
[0041] As shown, hearing device 110 includes a processor 112 communicatively coupled to
a memory 113, an input transducer 115, a displacement sensor 125, and an output transducer
117. A sensor unit 120 comprises input transducer 115 and displacement sensor 125.
Hearing device 110 may include additional or alternative components as may serve a
particular implementation.
[0042] Input transducer 115 may be implemented by any suitable device configured to detect
sound in the environment of the user and to provide an input audio signal indicative
of the detected sound, e.g., a microphone or a microphone array. Output transducer
117 may be implemented by any suitable audio transducer configured to output an output
audio signal to the user, for instance a receiver of a hearing aid, an output electrode
of a cochlear implant system, or a loudspeaker of an earbud.
[0043] Displacement sensor 125 may be implemented by any suitable detector configured to
provide displacement data indicative of a rotational displacement and/or a translational
displacement of hearing device 110. The displacement data may then also indicate a
corresponding movement of the user wearing hearing device 110. Examples include a
rotation of the user's head and/or body, a walking activity of the user and/or any
translational movement of the user's body and/or head and/or the like. In particular,
displacement sensor 125 may comprise at least one inertial sensor. The inertial sensor
can include, for instance, an accelerometer configured to provide the displacement
data representative of an acceleration and/or a translational movement and/or a rotation,
and/or a gyroscope configured to provide the displacement data representative of a
rotation. Displacement sensor 125 may also comprise an optical detector such as a
light sensor and/or a camera. Examples include a charge-coupled device (CCD) sensor,
a photodetector sensitive for light in the red and/or infrared electromagnetic spectrum,
a photoplethysmography (PPG) sensor, a pulse oximeter including a photodetector for
determining an oxygen saturation (SpO2 level) of the user's blood, and/or the like. In
some instances, the optical detector
may be implemented to be disposed inside the ear canal and/or at the concha of the
ear when the hearing device is worn at the ear. Displacement sensor 125 may also comprise
an electronic compass such as a magnetometer configured to provide the displacement
data representative of a change of a magnetic field, in particular a magnetic field
in an ambient environment of hearing device 110 such as the Earth's magnetic field.
[0044] Processor 112 is configured to receive, from input transducer 115, an input audio
signal indicative of a sound detected in the environment of the user; to receive,
from displacement sensor 125, displacement data indicative of a displacement of the
hearing device; to process the audio signal; to initiate outputting, by output transducer
117, an output audio signal based on the processed audio signal so as to stimulate
the user's hearing; to determine, based on the audio signal and the displacement data,
whether the user is interested in perceiving sound from a localized media source;
and, when it is determined that the user is interested, to initiate an operation of
hearing device 110 optimizing the processing of the audio signal for perceiving sound
from the localized media source. These and other operations, which may be performed
by processor 112, are described in more detail in the description that follows.
[0045] Memory 113 may be implemented by any suitable type of storage medium and is configured
to maintain, e.g. store, data controlled by processor 112, in particular data generated,
accessed, modified and/or otherwise used by processor 112. For example, memory 113
may be configured to store instructions used by processor 112 to process the input
audio signal received from input transducer 115, e.g., audio processing instructions
in the form of one or more audio processing programs. The audio processing programs
may comprise different audio processing instructions of modifying the input audio
signal received from input transducer 115. For instance, the audio processing instructions
may include algorithms providing a gain model, noise cleaning, noise cancelling, wind
noise cancelling, reverberation cancelling, narrowband coupling, beamforming, in particular
static and/or adaptive beamforming, and/or the like.
[0046] As another example, memory 113 may be configured to store instructions used by processor
112 to classify the input audio signal received from input transducer 115 by attributing
at least one class from a plurality of predetermined sound classes to the input audio
signal. Exemplary classes may include, but are not limited to, low ambient noise,
high ambient noise, traffic noise, machine noise, babble noise, public area
noise, background noise, speech, nonspeech, speech in quiet, speech in babble, speech
in noise, speech from the user, speech from a significant other, background speech,
speech from multiple sources, quiet indoor, quiet outdoor, speech in a car, speech
in traffic, speech in a reverberating environment, speech in wind noise, speech in
a lounge, car noise, applause, music, e.g. classical music, and/or the like. In some
instances, the different audio processing instructions can be associated with different
classes.
[0047] As another example, memory 113 may be configured to store instructions used by processor
112 to classify the displacement data received from displacement sensor 125 by attributing
at least one class from a plurality of predetermined displacement classes to the movement
data. Exemplary classes may include at least one of movements of the user indicating
an interest of the user in perceiving sound from a localized media source, movements
indicating an absence and/or lack of such an interest, movements indicating the user
is moving in a certain direction, rotating his head, resting, sitting, lying down,
walking, running, dancing and/or the like.
[0048] Memory 113 may comprise a non-volatile memory from which the maintained data may
be retrieved even after having been power cycled, for instance a flash memory and/or
a read only memory (ROM) chip such as an electrically erasable programmable ROM (EEPROM).
A non-transitory computer-readable medium may thus be implemented by memory 113. Memory
113 may further comprise a volatile memory, for instance a static or dynamic random
access memory (RAM).
[0049] As illustrated, hearing device 110 may further comprise a communication port 119.
Communication port 119 may be implemented by any suitable data transmitter and/or
data receiver and/or data transducer configured to exchange data with another device.
For instance, the other device may be another hearing device configured to be worn
at the other ear of the user than hearing device 110 and/or a communication device
such as a smartphone, smartwatch, tablet and/or the like. Communication port 119 may
be configured for wired and/or wireless data communication. For instance, data may
be communicated in accordance with a Bluetooth™ protocol and/or by any other type of
radio frequency (RF) communication.
[0050] Hearing device 110 may also comprise at least one further sensor communicatively
coupled to processor 112 in addition to input transducer 115 and displacement sensor
125, as further described below.
[0051] FIG. 2 illustrates a sensor unit 130, which may be implemented in hearing device
110 in place of sensor unit 120, comprising input transducer 115 and displacement sensor
125. Sensor unit 130 may further include at least one environmental sensor configured
to provide environmental data indicative of a property of the environment of the user
in addition to input transducer 115, for example a barometric sensor 131 and/or an
ambient temperature sensor 132. Sensor unit 130 may further include at least one physiological
sensor configured to provide physiological data indicative of a physiological property
of the user, for example an optical sensor 133 and/or a bioelectric sensor 134 and/or
a body temperature sensor 135. Optical sensor 133 may be configured to emit light
at a wavelength absorbable by an analyte contained in blood such that the physiological
sensor data comprises information about the blood flowing through tissue at the ear.
E.g., optical sensor 133 can be configured as a photoplethysmography (PPG) sensor
such that the physiological sensor data comprises PPG data, e.g. a PPG waveform. Bioelectric
sensor 134 may be implemented as a skin impedance sensor and/or an electrocardiogram
(ECG) sensor and/or an electroencephalogram (EEG) sensor and/or an electrooculography
(EOG) sensor. Sensor unit 130 may include a user interface 137 configured to provide
interaction data indicative of an interaction of the user with hearing device 110,
e.g., a touch sensor and/or a push button. Sensor unit 130 may include at least one
location sensor 138 configured to provide location data indicative of a current location
of the user, for instance a GPS sensor. Sensor unit 130 may include a clock 139 configured
to provide time data indicative of a current time.
[0052] In some implementations, the sensor data may be received by processor 112 from an
external device via communication port 119, e.g., from a communication device and/or
a portable device such as a smartphone. E.g., one or more of sensors 131 - 135, 137
- 139 may then be included in the external device. The external device may also include
a further displacement sensor configured to provide displacement data and/or a further
input transducer configured to provide an audio signal indicative of a sound detected
in the environment of the user.
[0053] FIG. 3 illustrates an exemplary implementation of hearing device 110 as a RIC hearing
aid 210. RIC hearing aid 210 comprises a BTE part 220 configured to be worn at an
ear at a wearing position behind the ear, and an ITE part 240 configured to be worn
at the ear at a wearing position at least partially inside an ear canal of the ear.
BTE part 220 comprises a BTE housing 221 configured to be worn behind the ear. BTE
housing 221 accommodates processor 112 communicatively coupled to input transducer
115 and displacement sensor 125. BTE part 220 further includes a battery 227 as a
power source. ITE part 240 is an earpiece comprising an ITE housing 241 at least partially
insertable in the ear canal. ITE housing 241 accommodates output transducer 117 and
may also include another sensor 245, which may include, e.g., another input transducer,
such as an ear canal microphone, and/or another displacement sensor and/or any of
further sensors 131 - 135, 137 - 139. BTE part 220 and ITE part 240 are interconnected
by a cable 251. Processor 112 is communicatively coupled to output transducer 117
and sensor 245 of ITE part 240 via cable 251 and cable connectors 252, 253 provided
at BTE housing 221 and ITE housing 241.
[0054] FIG. 4 schematically illustrates a media source 410 localized in an environment of
a user 450 who may be interested in perceiving sound from media source 410 or may
be disinterested. Localized media source 410 comprises at least one sound source 411
emitting sound 413 in a direction 415 toward user 450. As also illustrated, media
source 410 may comprise at least another sound source 412 also emitting sound 414
in a direction 416 toward user 450 which may be different from direction 415. A media
boundary 421 of media source 410 may be defined as a region in which media content
is presented to user 450. The media content comprises sound 413, 414 emitted from
sound sources 411, 412. The media content may further comprise visual content which
can be seen by user 450.
[0055] For example, the media content may be a TV program. Localized media source 410 may
then be implemented as a television comprising a screen and sound source 411, 412,
e.g., a loudspeaker, at a fixed position relative to a ground on which user 450 is
positioned and/or within boundary 421. As another example, the media content may be
a movie shown in a movie theater. Localized media source 410 may then comprise a movie
screen on which the movie is displayed and sound source 411, 412 as a loudspeaker
at a fixed position relative to the ground on which user 450 is positioned and/or
a fixed position within boundary 421. As another example, the media content may be
a play performed in a theater or an opera performed in an opera house or a concert
performed at a concert venue. Localized media source 410 may then comprise a stage
on which the play or opera or concert is performed and sound source 411, 412 may be
implemented as a voice and/or a music instrument of an actor or singer performing
the play or opera or concert. Sound source 411, 412 may then be positioned at a fixed
position and/or moving relative to the ground on which user 450 is positioned and/or
within boundary 421.
[0056] User 450 is wearing a hearing device 451 at an ear which may be implemented, for
instance, as hearing device 110 or hearing device 210. As also illustrated, user 450
may be wearing a second hearing device 452 at a second ear which may be implemented,
for instance, corresponding to first hearing device 451 worn at the first ear. Hearing
devices 451, 452 may thus be provided in a binaural configuration. Sound 413, 414
emitted by sound source 411, 412 can be received and/or detected by input transducer
115 included in hearing device 451, 452. A movement and/or orientation of hearing
device 451, 452 can be detected by displacement sensor 125 included in hearing device
451, 452. The movement and/or orientation of hearing device 451, 452 worn by user
450 can thus be indicative of a movement and/or orientation of user 450 and/or a viewing
direction 461 of user 450.
[0057] The displacement, e.g., movement, of hearing device 451, 452 may be detected and/or
determined and/or logged, e.g., by displacement sensor 125 and/or processor 112 and/or
memory 113, over time and/or relative to a reference point and/or relative to a reference
direction. In some implementations, the reference point and/or reference direction
may be arbitrarily fixed relative to a position and/or orientation of hearing device
451, 452. In some implementations, the reference point and/or reference direction
may be a location and/or direction at which localized media source 410, e.g., sound
source 411, 412, is positioned, or relative to this location and/or direction. To
this end, a location of sound source 411, 412 and/or a direction of arrival (DOA)
of sound 413, 414 emitted by sound source 411, 412 may be determined beforehand. E.g.,
the audio signal provided by input transducer 115 may be evaluated to determine the
location of sound source 411, 412 and/or the DOA of sound 413, 414.
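A minimal sketch of one way such a DOA could be estimated from two microphone signals,
assuming a cross-correlation-based time difference of arrival (TDOA); the microphone
spacing and the function names are assumptions.

    import numpy as np

    def estimate_doa(left, right, fs, mic_distance=0.15, c=343.0):
        # Illustrative DOA estimate: find the lag of the cross-correlation
        # peak between the two microphone signals and convert it to an angle.
        left = np.asarray(left, float)
        right = np.asarray(right, float)
        corr = np.correlate(left, right, mode="full")
        lag = np.argmax(corr) - (len(right) - 1)  # lag in samples
        tdoa = lag / fs                           # lag in seconds
        sin_theta = np.clip(tdoa * c / mic_distance, -1.0, 1.0)
        return np.degrees(np.arcsin(sin_theta))   # angle re broadside, degrees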
[0058] A displacement of hearing device 451, 452 may indicate a translational displacement
465 and/or a rotational displacement 466 of user 450, in particular of the user's
head and/or viewing direction 461 and/or body. In a case in which user 450 is interested
in perceiving sound 413, 414 from media source 410, user 450 may exhibit a different
movement behavior as compared to a case in which user 450 is not interested. Accordingly,
displacement data provided by displacement sensor 125 may be employed as an indicator
whether user 450 is interested or not. Based on the displacement data, a parameter
indicative of a property of movements performed by the user may be determined as an
indicator whether user 450 is interested or not.
[0059] To illustrate, in a case in which media source 410 is configured to also provide
for a visual media content, e.g., on a screen or a stage, and in a situation in which
user 450 is interested in following the media content from media source 410 including
perceiving sound 413, 414 from media source 410, viewing direction 461 of user 450
may be typically more fixed toward the visual media content as compared to a situation
in which user 450 is not interested. As another example, in a case in which media
source 410 is not providing for visual media content and in a situation in which user
450 is interested in perceiving sound 413, 414 from media source 410, the head of
user 450 may be typically more fixed in a certain direction, e.g., to keep his ears
and/or hearing device 451, 452 in a stable position, as compared to a situation in
which user 450 is not interested. Accordingly, in the situation in which user 450
is interested, displacement data provided by displacement sensor 125 may be indicative
of a lower amplitude and/or a lower amount and/or a smaller temporal variability of
movements performed by the user as compared to the situation in which user 450 is
not interested. Furthermore, e.g., when media source 410 is providing for visual media
content, displacement data provided by displacement sensor 125 may be indicative of
viewing direction 461 pointing more and/or longer and/or more frequently toward media
source 410 as compared to the situation in which user 450 is not interested.
[0060] Furthermore, a prerequisite for user 450 perceiving sound 413, 414 from media source
410, in which he could be interested, is that the sound detected in the environment comprises
such sound 413, 414. Based on the audio signal provided by input transducer 115, a
parameter indicative of a property of sound detected in the environment may thus be
determined as another indicator whether user 450 could be interested in perceiving
sound 413, 414 from media source 410 or not. In particular, the parameter may be indicative
of a property which is typical and/or characteristic for sound 413, 414 emitted by
media source 410. In some examples, such a property may comprise a level and/or a
temporal variability and/or a sound content and/or a frequency content and/or a signal
to noise ratio (SNR) of the sound detected in the environment.
[0061] To illustrate, at least certain types of media content which may be presented by
media source 410, e.g., a TV program and/or a movie, may exhibit a rather high variability
of sound 413, 414. For instance, a content of speech and/or music and/or traffic noise
in sound 413, 414 may alter at a rather high rate as compared to other sound in the
user's environment unrelated to the media content. Furthermore, a level and/or content
and/or SNR of sound 413, 414 may also alter at an extraordinarily high rate when compared
to other sound in the user's environment. In particular, it may be exploited that
the variability of sound in the user's environment other than media content is typically
rather low and/or stable, since sound occurring in a normal daily routine of user
450 is typically more constant and does not change as fast as media content.
[0062] As another example, the media content presented by media source 410 may comprise
sound 413, 414 having a level rather unusual as compared to other sound in the user's
environment. E.g., a conversation between people may exhibit a rather low or high
volume when compared to a typical conversation occurring elsewhere in the environment
of the user. As another example, the media content presented by media source 410 may
comprise sound 413, 414 having an SNR rather unusual as compared to other sound in
the user's environment. E.g., an SNR of media content may be untypically high when
compared to a typical SNR in the user's environment.
[0063] Thus, in order to predict whether user 450 would be interested in perceiving sound
413, 414 from a media source 410 localized in the environment of user 450, a first
parameter indicative of a property of the sound detected in the environment and a
second parameter indicative of a property of movements performed by user 450 may be
compared relative to a threshold. Depending on the comparison, the user's interest
may be confirmed or rejected. E.g., when the first parameter is indicative of a variability
of the sound detected in the environment and the second parameter is indicative of
an amplitude and/or an amount and/or a variability of the movements performed by the
user, the user's interest may be identified when the first parameter exceeds a first
threshold and the second parameter falls below a second threshold. In a contrary case,
when the first parameter falls below the first threshold and/or the second parameter
exceeds the second threshold, the user's interest may be denied.
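A minimal sketch of this decision rule is given below; the threshold values are arbitrary
assumptions, and the parameters are assumed to be computed as in the earlier sketches.

    def user_interested(first_param, second_param,
                        first_threshold=0.1, second_threshold=0.05):
        # Illustrative decision rule: high variability of the detected
        # sound combined with little user movement confirms the user's
        # interest; otherwise the interest is denied.
        return first_param > first_threshold and second_param < second_threshold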
[0064] FIG. 5 illustrates a functional block diagram of an exemplary audio signal processing
algorithm that may be executed by a processor 310. For instance, processor 310 may
comprise processor 112 of hearing device 110 and/or another processor communicatively
coupled to processor 112. As shown, the algorithm is configured to be applied to an
input audio signal 302 indicative of a sound detected in the environment of the user,
which may be provided by input transducer 115. After a processing of audio signal
302, the algorithm provides a processed audio signal based on which an output audio
signal 305 can be outputted by output transducer 117. Furthermore, the algorithm is
configured to be applied to displacement data 303 indicative of a displacement of
the hearing device, which may be provided by displacement sensor 125. The algorithm
may also be configured to be applied to further sensor data 304, which may be provided
by any of sensors 115, 131 - 135, 137 - 139.
[0065] The algorithm comprises an audio signal analyzing module 318, a sensor data analyzing
module 319, a media interest determination module 315, a processing instruction selection
module 317, and an audio processing module 313. Audio signal 302 can be received by
audio processing module 313. Audio processing module 313 is configured to process
audio signal 302, e.g., based on one or more audio processing instructions provided
by processing instruction selection module 317.
[0066] Audio signal 302 can be received by audio signal analyzing module 318. Audio signal
analyzer 318 is configured to determine, based on audio signal 302, a first parameter
indicative of a property of the sound detected in the environment. In particular,
as described above, the parameter may be determined as an indicator whether user 450
could be interested in perceiving sound from a media source localized in the environment.
For example, the parameter may be indicative of a temporal variability of the sound
detected in the environment. To illustrate, some media sources, e.g., a television
program and/or a movie shown in a movie theater and/or a theater play, may emit sound
with a large variability. For instance, the emitted sound may change rather
frequently between different sound types and/or sound contents. Different sound types
and/or sound contents may be defined as sound which, when perceived by a human, is
typically associated with different acoustic objects emitting the sound. E.g.,
in a television program and/or a movie, a plurality of such acoustic objects may be
presented to the user within a rather short time leading to a rather large variability
of the sound emitted by the media source. Examples of such different sound types and/or
sound content, which may be emitted by a media source, include speech and/or music
and/or environmental sound and/or background noise and/or sound special effects and/or
silence. Similarly, a level and/or frequency content and/or a number of onsets and/or,
when sound is emitted by the media source from a plurality of sound sources 411, 412,
a DOA of the sound at hearing device 110, 210 may change rather frequently.
[0067] As another example, the parameter may be indicative of a level and/or a content and/or
an SNR of the sound detected in the environment. Other examples may include, but are
not limited to, a mean-squared signal power, a standard deviation of a signal envelope,
a mel-frequency cepstrum (MFC), a mel-frequency cepstrum coefficient (MFCC), a delta
mel-frequency cepstrum coefficient (delta MFCC), a spectral centroid such as a power
spectrum centroid, a standard deviation of the centroid, a spectral entropy such as
a power spectrum entropy, a zero crossing rate (ZCR), a standard deviation of the
ZCR, a broadband envelope correlation lag and/or peak, and a four-band envelope correlation
lag and/or peak.
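As a minimal sketch of two of the features listed above, the zero crossing rate and a
power-spectrum centroid could be computed per audio frame as follows (a standard
computation, not specific to this disclosure).

    import numpy as np

    def zero_crossing_rate(frame):
        # Fraction of samples at which the signal changes sign.
        signs = np.signbit(np.asarray(frame, float)).astype(int)
        return float(np.mean(np.abs(np.diff(signs))))

    def spectral_centroid(frame, fs):
        # Centroid of the power spectrum, in Hz.
        power = np.abs(np.fft.rfft(frame)) ** 2
        freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
        return float(np.sum(freqs * power) / (np.sum(power) + 1e-12))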
[0068] In some implementations, audio signal analyzing module 318 comprises an audio signal
classifier. The audio signal classifier can be configured to classify audio signal
302 by attributing at least one class from a plurality of predetermined classes to
audio signal 302. The first parameter may be indicative of a temporal variability
of the one or more classes attributed to the audio signal and/or contain information
about the one or more classes attributed to audio signal 302, e.g., whether a class
unrelated to the media sound and/or a class associated with the media sound has been
attributed. To illustrate, when a localized media source emits sound at a rather large
variability, the one or more classes attributed to the audio signal may change rather
often. Thus, a variability of the one or more classes attributed to audio signal 302
over time can indicate a sound stemming from a media source, and thus a presence of
the media source, localized in the environment of the user. For instance, the audio
signal classifier may be implemented as a sound classification module configured for
a statistical evaluation of audio signal 302 as disclosed, e.g., in
EP 3 036 915 B1, and/or a mixed mode classifier as disclosed, e.g., in
EP 1 858 292 B1, and/or a sound source separator configured to separate sound generated by different
sound sources in the environment, as disclosed, e.g., in
PCT/EP 2020/051 734,
PCT/EP 2020/051 735 and
DE 2019 206 743.3, which may comprise one or more neural networks (NNs).
[0069] The classes may represent a specific sound content and/or sound type encoded in audio
signal 302. Exemplary classes include, but are not limited to, low ambient noise,
high ambient noise, traffic noise, machine noise, babble noise, public area noise,
background noise, speech, nonspeech, speech in quiet, speech in babble, speech in
noise, speech from the user, own voice of the user, speech from a significant other,
background speech, speech from multiple sources, quiet indoor, quiet outdoor, speech
in a car, speech in traffic, speech in a reverberating environment, speech in wind
noise, speech in a lounge, car noise, applause, music, e.g. classical music, and/or
the like. Information about the classes may be stored in a database, e.g., in memory
113, and accessed by audio signal analyzer 318. E.g., the information may comprise
different patterns associated with each class wherein it is determined whether audio
signal 302, in particular characteristics and/or features determined from audio signal
302, matches, at least to a certain extent, the respective pattern such that the respective
class can be attributed to the audio signal 302. E.g., a probability may be determined
whether the respective pattern associated with the respective class matches the characteristics
and/or features determined from audio signal 302, wherein the respective class may
be attributed to audio signal 302 when the probability exceeds a threshold. In some
instances, at least one of the classes may indicate whether audio signal 302 contains
sound from a localized media source, e.g., as a precondition for the user being interested
in perceiving the media sound, and/or at least one of the classes may indicate whether
audio signal 302 does not contain such sound.
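A minimal sketch of such probability-based class attribution is given below, assuming a
simplistic distance-based similarity in place of the pattern matching described above;
the class patterns and threshold are assumptions.

    import numpy as np

    def attribute_classes(features, class_patterns, threshold=0.6):
        # Illustrative pattern matching: compare the feature vector
        # determined from the audio signal against a stored pattern per
        # class and attribute every class whose match probability exceeds
        # the threshold.
        features = np.asarray(features, float)
        attributed = []
        for name, pattern in class_patterns.items():
            distance = np.linalg.norm(features - np.asarray(pattern, float))
            probability = float(np.exp(-distance))  # map distance to (0, 1]
            if probability > threshold:
                attributed.append(name)
        return attributed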
[0070] In some implementations, as illustrated, the parameter determined by audio signal
analyzing module 318 can be received by processing instruction selection module 317.
Processing instruction selector 317 may select one or more audio processing instructions
depending on the parameter which can then be applied to audio signal 302 by audio
processing module 313. In some instances, when audio signal analyzing module 318 comprises
an audio signal classifier, one or more of the audio processing instructions may be
associated with at least one respective class, or a plurality of respective classes.
For example, the audio processing instructions may be stored in a database, e.g.,
in memory 113, and accessed by processing instruction selector 317 and/or audio signal
processor 313. For instance, the audio processing instructions may be implemented
as different audio processing programs which can be executed by audio signal processing
module 313. The audio processing instructions may include, e.g., instructions executable
by processor 310 providing for at least one of a gain model (GM), noise cancelling
(NC), wind noise cancelling (WNC), reverberation cancelling (RevC), narrowband coupling,
feedback cancelling (FC), speech enhancement (SE), noise cleaning, beamforming (BF),
in particular static and/or adaptive beamforming, and/or the like.
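By way of illustration only, the selection of audio processing instructions depending on the attributed classes might be sketched in Python as a lookup, reusing the instruction abbreviations listed above; the concrete class-to-program associations are hypothetical assumptions.

```python
# Hypothetical association of classes with audio processing programs (memory 113).
PROGRAM_FOR_CLASS = {
    "speech in noise":      ("BF", "NC", "SE"),
    "speech in wind noise": ("WNC", "SE"),
    "music":                ("GM",),
    "high ambient noise":   ("NC",),
}
DEFAULT_PROGRAM = ("GM",)

def select_instructions(attributed_classes):
    """attributed_classes: (class name, probability) pairs, e.g., from the classifier.
    Returns the program set of the most probable class, as module 317 might do."""
    if not attributed_classes:
        return DEFAULT_PROGRAM
    best_class, _ = max(attributed_classes, key=lambda c: c[1])
    return PROGRAM_FOR_CLASS.get(best_class, DEFAULT_PROGRAM)
```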
[0071] In some instances, the audio processing instructions may also include one or more
instructions optimized for perceiving sound from a localized media source by the user.
For example, the at least one audio processing instruction may provide for a separation
of sound from the localized media source from other sound features contained in audio
signal 302 such that the separated sound from the localized media source can be presented
to user 450 via output transducer 117. As another example, the audio processing instruction
may provide for enhancing the media sound encoded in audio signal 302 relative
to other environmental sound encoded in audio signal 302. E.g., the audio processing
instruction may provide for noise reduction of the other environmental sound. As another
example, the audio processing instruction may provide for enhancing an intelligibility
of speech encoded in the audio signal 302. As another example, the audio processing
instructions may provide for enhancing a quality of sound encoded in the audio signal
302. For instance, the quality of sound may be improved with regard to a clarity of
the sound, e.g., by increasing a sharpness of the sound, and/or with regard to a listening
comfort of the sound, e.g., by modifying the sound to be more pleasing and/or less
aggressive.
[0072] Displacement data 303 can be received by sensor data analyzing module 319. Sensor
data analyzer 319 is configured to determine, based on displacement data 303, a second
parameter indicative of a property of movements performed by the user. In particular,
as described above, the parameter may be determined as another indicator whether user
450 could be interested in perceiving sound from a media source localized in the environment. E.g., the parameter may be indicative of an amplitude and/or an amount and/or a temporal variability of movements performed by the user. As another example, the parameter may be indicative of a viewing direction of the user relative to the location of the media source. E.g., the location of the media
source may be determined based on a DOA of the sound of the media source contained
in audio signal 302.
[0073] In some implementations, sensor data analyzing module 319 comprises a displacement
data classifier. The displacement data classifier can be configured to classify displacement
data 303 by attributing at least one class from a plurality of predetermined classes
to displacement data 303. The classes may represent a specific movement pattern performed
by the user. Exemplary classes include, but are not limited to, the user sitting,
lying, walking, running, turning his head, shaking his head, orienting his head in
a specific direction, moving in a specific direction, moving steady, moving irregularly,
being in a sedentary position, being restless, and/or the like. Information about
the classes may be stored in a database, e.g., in memory 113, and accessed by sensor
data analyzer 319. E.g., a probability may be determined whether the respective pattern
associated with the respective class matches a characteristic and/or feature determined
from displacement data 303, wherein the respective class may be attributed to displacement
data 303 when the probability exceeds a threshold.
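By way of illustration only, such a displacement data classifier might be sketched in Python as follows, assuming accelerometer samples as displacement data 303; the statistics and the threshold values are illustrative assumptions.

```python
import numpy as np

def classify_displacement(acc: np.ndarray) -> list:
    """acc: (N, 3) accelerometer samples in units of g. Returns attributed movement
    classes based on simple activity statistics (thresholds purely illustrative)."""
    mag = np.linalg.norm(acc, axis=1)
    activity = float(np.mean(np.abs(mag - 1.0)))  # deviation from gravity-only
    variability = float(np.var(mag))
    classes = []
    if activity < 0.02 and variability < 1e-4:
        classes.append("being in a sedentary position")
    elif variability > 0.05:
        classes.append("being restless")
    elif variability > 0.01:
        classes.append("moving steady")
    return classes
```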
[0074] In some instances, at least one of the classes may indicate whether displacement
data 303 is typical for the user being interested in perceiving sound from a localized
media source and/or at least one of the classes may indicate whether the displacement
data is typical for the user not being interested. E.g., a rather motionless behavior
of the user and/or movements of small amplitude and/or a head orientation toward a
media source may be attributed to the class of the user being interested. Rather frequent
movements and/or movements of large amplitude and/or a large amount of movements may
be attributed to the class of the user not being interested.
[0075] In some instances, sensor data analyzing module 319 can be configured to determine,
based on displacement data 303, a movement behavior of the user over time. E.g., the
movement behavior may include a type and/or sequence and/or rate and/or amplitude
and/or duration and/or lack of movements performed by the user. The second parameter
may then be indicative of the user's movement behavior. For example, sensor data analyzing
module 319 may be configured to log displacement data 303 over time and to extract
a type and/or sequence and/or lack of movements performed by the user over time from
the logged displacement data 303. As another example, sensor data analyzing module
319 may be configured to determine a type and/or sequence and/or lack of movements
performed by the user from currently received displacement data 303 and to log the
determined movement characteristics over time. E.g., for the purpose of the data logging
of the user's movement behavior over time, the displacement data 303 and/or movement
characteristics may be stored and/or accessed in memory 113.
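By way of illustration only, the logging of movement characteristics over time might be sketched in Python as follows; the class names follow the examples above, while the sliding horizon and the interface are hypothetical.

```python
import time
from collections import deque

class MovementLogger:
    """Minimal sketch of logging movement characteristics over a sliding horizon,
    e.g., in memory 113, to derive a movement behavior of the user over time."""
    def __init__(self, horizon_s: float = 60.0):
        self.horizon_s = horizon_s
        self.entries = deque()  # (timestamp, movement class) pairs

    def record(self, movement_class: str, now: float = None):
        now = time.monotonic() if now is None else now
        self.entries.append((now, movement_class))
        while self.entries and now - self.entries[0][0] > self.horizon_s:
            self.entries.popleft()

    def rate(self, movement_class: str) -> float:
        """Occurrences of a movement class per second within the horizon."""
        count = sum(1 for _, c in self.entries if c == movement_class)
        return count / self.horizon_s
```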
[0076] In some implementations, further sensor data 304, which may be provided, e.g., by
any of sensors 115, 131 - 135, 137 - 139, may also be received by sensor data analyzing
module 319. Sensor data analyzer 319 may then be configured to determine, based on
sensor data 304, a third parameter. For example, sensor data 304 may be provided by
any of environmental sensors 115, 131, 132. The parameter may then be indicative of
whether the environment of the user is suitable and/or typical for perceiving sound
from a media source localized in the environment, e.g., for a media source being localized
in the environment, or not. To illustrate, when audio signal 302 provided by input transducer 115 includes traffic sound, and/or when barometric data provided by the barometric sensor indicates a rather high altitude, and/or when ambient temperature sensor 132 indicates a rather hot environment, it may be concluded that the user's interest in perceiving sound from a localized media source is rather small.
[0077] As another example, sensor data 304 may be provided by any of physiological sensors
133 - 135. The parameter may then be indicative of whether a physiological condition
of the user is suitable and/or typical for perceiving sound from a localized media
source, or not. To illustrate, when physiological data provided by optical sensor 133, e.g., a PPG sensor, and/or by a bioelectric sensor, e.g., an ECG sensor, indicates a rather high heart rate of the user, and/or body temperature sensor 135 indicates a rather elevated temperature, it may be concluded that the user is not interested in perceiving sound from a localized media source, e.g., because he is engaged in a sports activity. As a further example, sensor data 304 may be provided by location
sensor 138 and/or clock 139. E.g., a current location and/or time may be typical or
rather unusual for the user being interested in perceiving sound from a localized
media source. As a further example, sensor data 304 may be provided by user interface
137. E.g., some adjustments of hearing device 110, 210 performed by the user on the
user interface may be typical or rather unusual for the user being interested in perceiving
sound from a localized media source.
[0078] Media interest determination module 315 is configured to receive the first parameter
determined by audio signal analyzer 318 indicative of a property of sound detected
in the environment, and the second parameter determined by sensor data analyzer 319
indicative of a property of movements performed by the user. In some instances, when
sensor data analyzer 319 is configured to determine, based on sensor data 304, a third
parameter, media interest determinator 315 may also be configured to receive the third
parameter. Media interest determinator 315 is configured to determine, based on the
first and second parameter, and optionally also based on the third parameter, whether
the user is interested in perceiving sound from a media source localized in the environment.
In particular, media interest determinator 315 may be configured to determine whether
the first parameter and the second parameter fulfill a condition as a requirement
for concluding and/or predicting that the user could be interested in perceiving sound
from the localized media source. In some instances, a further requirement may be that
the third parameter fulfills such a condition.
[0079] In some instances, the condition may be determined relative to a threshold for at
least one of the parameters, e.g., a first threshold for the first parameter and/or
a second threshold for the second parameter and/or a third threshold for the third
parameter. E.g., the condition may be determined to be fulfilled when the first parameter
exceeds a first threshold and the second parameter falls below a second threshold.
To illustrate, when the first parameter is indicative of a temporal variability of
sound detected in the environment, the first parameter exceeding the threshold may
indicate a rather large variability of the sound suggesting that the sound may originate
from such a localized media source. When the second parameter is indicative of an
amplitude and/or an amount and/or a temporal variability of the movements performed
by the user, the second parameter falling below the threshold may further indicate
a rather small amplitude and/or amount and/or temporal variability of the movements
suggesting that the user has the intention to dedicate his attention to the sound
and/or other media content, e.g., visual content, originating from the media source.
Accordingly, evaluating the first parameter relative to the first threshold and the second parameter relative to the second threshold can allow concluding and/or predicting an interest of the user in perceiving sound from a localized media source with a higher certainty as compared to evaluating only one of the parameters. In some examples,
the certainty of such a prediction may be further enhanced by also evaluating the
third parameter, e.g., relative to a third threshold.
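By way of illustration only, the threshold evaluation performed by media interest determinator 315 might be sketched in Python as follows; the parameter scaling and the threshold values are hypothetical assumptions.

```python
def media_interest_condition(first_param: float, second_param: float,
                             third_param: float = None,
                             first_threshold: float = 0.6,
                             second_threshold: float = 0.2,
                             third_threshold: float = 0.5) -> bool:
    """Fulfilled when the sound variability (first parameter) is large and the
    movement activity (second parameter) is small; the optional third parameter
    must then additionally exceed its own threshold."""
    fulfilled = first_param > first_threshold and second_param < second_threshold
    if third_param is not None:
        fulfilled = fulfilled and third_param > third_threshold
    return fulfilled
```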
[0080] In some instances, the condition may be determined with respect to a content of the
sound detected in the environment, e.g., whether the content is characteristic of
a predetermined media sound. In particular, the first parameter may be indicative
of a content of the sound detected in the environment. Media interest determination
module 315 may then be configured to determine whether the content of the environmental
sound matches the predetermined media sound. E.g., the predetermined media sound may
be provided as a sound pattern which can be compared to the environmental sound content.
When the environmental sound content matches the sound pattern, at least part of the
condition for the user being interested in perceiving the media sound may be concluded
and/or predicted to be fulfilled.
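By way of illustration only, the comparison of the environmental sound content with a predetermined sound pattern might be sketched in Python as a similarity measure over spectral features; the feature representation and the similarity threshold are assumptions.

```python
import numpy as np

def matches_media_pattern(content_features: np.ndarray,
                          pattern_features: np.ndarray,
                          min_similarity: float = 0.8) -> bool:
    """Cosine similarity between a feature vector of the environmental sound
    content and a stored media sound pattern (both hypothetical representations)."""
    a = content_features / (np.linalg.norm(content_features) + 1e-12)
    b = pattern_features / (np.linalg.norm(pattern_features) + 1e-12)
    return float(np.dot(a, b)) >= min_similarity
```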
[0081] In some instances, media interest determination module 315 can be configured to execute
a machine learning (ML) algorithm configured to predict and/or indicate and/or output
a likelihood whether the first parameter indicative of a content of the sound detected
in the environment matches the predetermined media sound, e.g., in the form of a sound
pattern indicative of a localized media source. In particular, the ML algorithm may
be trained with previously recorded media sound of such a localized media source.
For instance, one or more NNs may be employed configured to provide for a separation
of sound emanated from a localized media source from other content and/or sound components
contained in audio signal 302. The first parameter may then be indicative of the separated
sound received from the localized media source and/or about the circumstance whether
such media sound is present in audio signal 302. Examples of such NNs, which may be
implemented as one or more deep neural networks (DNNs), configured to separate content
and/or sound components stemming from different acoustic objects from audio signal
302 are disclosed in international patent application Nos.
PCT/EP 2020/051 734 and
PCT/EP 2020/051 735, and in German patent application No.
DE 2019 206 743.3.
[0082] In some instances, when audio signal analyzing module 318 comprises an audio signal
classifier and the first parameter is indicative of a temporal variability of the
one or more classes attributed to audio signal 302 and/or contains information about
the one or more classes attributed to audio signal 302, media interest determination
module 315 can be configured to determine whether the first parameter fulfills a condition
of the one or more classes attributed to audio signal 302 indicating that audio signal
302 contains sound and/or sound components from a localized media source in which
the user could be interested.
[0083] For example, when the first parameter is indicative of a temporal variability of
the one or more classes attributed to audio signal 302, the condition may be evaluated
relative to a threshold. To illustrate, an indicator for sound and/or sound components
from a localized media source contained in audio signal 302 can be that the variability
of the one or more classes attributed to audio signal 302 exceeds the threshold. In
particular, the threshold may be exceeded when the one or more classes attributed
to audio signal 302 change rather often. In this way the circumstance may be exploited
that other sound in the user's environment, in particular sound unrelated to a localized
media source, may typically result in a more steady attribution of the one or more
classes to audio signal 302 and/or a variability of the one or more classes attributed
to audio signal 302 falling below the threshold. Thus, the first parameter exceeding
the threshold may be taken as a condition for sound and/or sound components from a
localized media source contained in audio signal 302 in which the user could be interested.
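By way of illustration only, the evaluation of the class variability relative to a threshold might be sketched in Python as follows; the window length and the change-rate threshold are illustrative assumptions.

```python
from collections import deque

class ClassVariabilityMeter:
    """Counts changes of the attributed class within a sliding window; a change
    rate above the threshold hints at sound from a localized media source."""
    def __init__(self, window_s: float = 30.0, threshold: float = 0.2):
        self.window_s = window_s
        self.threshold = threshold  # class changes per second (illustrative)
        self.change_times = deque()
        self.last_class = None

    def update(self, attributed_class: str, now: float) -> bool:
        if self.last_class is not None and attributed_class != self.last_class:
            self.change_times.append(now)
        self.last_class = attributed_class
        while self.change_times and now - self.change_times[0] > self.window_s:
            self.change_times.popleft()
        return len(self.change_times) / self.window_s > self.threshold
```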
[0084] As another example, when the first parameter contains information about the one or
more classes attributed to audio signal 302, the condition may be evaluated relative
to whether the information is characteristic of a predetermined media sound which
may be characteristic of sound and/or sound components from a localized media source.
E.g., the information may comprise a label and/or identifier and/or other characteristic
of the one or more classes attributed to audio signal 302. The condition may be determined
to be fulfilled when it is determined that the information is characteristic of the
predetermined media sound. To illustrate, some classes which may be attributed to
audio signal 302, e.g., quiet indoor, quiet outdoor, speech in a reverberating environment,
speech in noise, speech from the user, own voice of the user, and/or the like, may
be less characteristic of sound and/or sound components from a localized media source
as compared to other classes, e.g., public area noise, speech, nonspeech, speech in quiet, applause, music, and/or the like. Accordingly, the condition may be deemed to be fulfilled when the information yields that at least one of the classes which are more characteristic of sound and/or sound components from the localized
media source has been attributed to audio signal 302.
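By way of illustration only, the evaluation whether the attributed classes are characteristic of a predetermined media sound might be sketched in Python as a set membership test; the partition of the classes follows the examples above.

```python
# Classes more characteristic of sound from a localized media source (see above).
MEDIA_CHARACTERISTIC = {"public area noise", "speech", "nonspeech",
                        "speech in quiet", "applause", "music"}
# Classes less characteristic of such sound (see above).
LESS_CHARACTERISTIC = {"quiet indoor", "quiet outdoor", "speech in noise",
                       "speech in a reverberating environment",
                       "speech from the user", "own voice of the user"}

def classes_indicate_media(attributed_classes) -> bool:
    """Condition fulfilled when at least one media-characteristic class is present."""
    return any(c in MEDIA_CHARACTERISTIC for c in attributed_classes)
```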
[0085] In some instances, when the second parameter is indicative of a movement behavior
of the user over time, the condition may be determined with respect to whether the
movement behavior is characteristic of the user being interested in perceiving sound
from a localized media source. E.g., media interest determination module 315 may be
configured to determine whether the user's movement behavior matches a predetermined
movement pattern. The movement pattern may be indicative of a movement behavior of
the user, e.g., a type and/or sequence and/or rate and/or amplitude and/or duration
and/or lack of movements performed by the user, which are typical for the user being
interested in perceiving sound from the localized media source. When the movement
behavior matches the movement pattern, at least part of the condition for the user
being interested in perceiving the media sound may be concluded and/or predicted to
be fulfilled. For instance, media interest determination module 315 can be configured
to execute an ML algorithm configured to predict and/or indicate and/or output a likelihood
whether the second parameter matches the movement pattern. In particular, the ML algorithm
may be trained with previously recorded movement behaviors of the user and/or other
users over time. The training data may be labelled with regard to whether the user
has been interested or uninterested in perceiving sound from a localized media source
when executing the respective movement behavior.
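By way of illustration only, an ML-based matching of the movement behavior against a learned pattern might be sketched in Python as a logistic score; the feature choice and the weights, which would in practice be obtained from the labelled training data described above, are hypothetical.

```python
import numpy as np

# Hypothetical weights for [movement rate, movement amplitude, head-to-source
# alignment], standing in for a model trained on labelled movement behaviors.
WEIGHTS = np.array([-1.8, -0.9, 2.1])
BIAS = 0.3

def interest_likelihood(behavior_features: np.ndarray) -> float:
    """Logistic score in [0, 1]: likelihood that the second parameter matches the
    movement pattern of a user interested in perceiving the media sound."""
    return float(1.0 / (1.0 + np.exp(-(WEIGHTS @ behavior_features + BIAS))))
```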
[0086] In some instances, when a third parameter is provided by sensor data analyzer 319
based on sensor data 304 and received by media interest determination module 315,
the condition may also be determined to be fulfilled by the third parameter. E.g.,
the third parameter may be evaluated with respect to an environmental condition to
be fulfilled by environmental sensor data provided by any of environmental sensors
115, 131, 132 and/or a physiological condition to be fulfilled by physiological sensor
data provided by any of physiological sensors 133 - 135 and/or with respect to a location
and/or time and/or adjustments via user interface 137 indicating the user's interest
in perceiving sound from a localized media source.
[0087] When it is determined, by media interest determination module 315, that the user
is interested in perceiving sound from a localized media source, media interest determination
module 315 can be configured to initiate an operation of hearing device 110, 210 optimizing
the processing of audio signal 302, as performed by audio processing module 313, for
perceiving sound from the localized media source. In particular, media interest determination
module 315 may be configured to provide one or more instructions for the optimizing
of the processing of audio signal 302 to processing instruction selection module 317.
[0088] In some instances, the operation for optimizing the processing of audio signal 302
comprises reducing a rate at which different audio processing instructions are applied
to audio signal 302. In particular, some media sources may emit sound at a large variability
of the sound. For instance, the emitted sound may change rather frequently between
different sound types and/or sound contents, e.g., speech and/or music and/or environmental
sound and/or background noise and/or sound special effects and/or silence. Similarly,
a level and/or frequency content and/or a number of onsets and/or, when sound is emitted
by the media source from a plurality of sound sources 411, 412, a DOA of the sound
at hearing device 110, 210 may change rather frequently. In a standard operation of
hearing device 110, 210, the large variability of the detected sound may lead to a frequent change of the audio processing instructions applied to audio signal 302.
[0089] To illustrate, some audio processing instructions may be optimized for a reproduction
of speech encoded in audio signal 302, other audio processing instructions may be
optimized for a reproduction of music encoded in audio signal 302, still other audio
processing instructions may be optimized for noise reduction. When the sound type
and/or sound content in audio signal 302 changes frequently, e.g., between music and/or
speech and/or background noise, the applied audio processing instructions optimized
for the respective sound type and/or sound content may change accordingly. Such a
frequent switching between different audio processing instructions applied to audio
signal 302 can be rather disturbing for the user, e.g., due to a varying sound reproduction
and/or processing delays and/or sound artefacts caused by the switching. Thus, reducing
the rate at which different audio processing instructions are applied to audio signal
302, can optimize the processing of audio signal 302 for perceiving sound from a localized
media source.
[0090] To further illustrate, when audio signal analyzing module 318 comprises an audio
signal classifier, audio signal 302 may be processed by audio signal processor 313
by applying one or more audio processing instructions associated with the one or more
classes attributed to audio signal 302 by the audio signal classifier. To this end,
in a standard operation of hearing device 110, 210, processing instruction selection
module 317 may be configured to select the audio processing instructions applied by
audio signal processor 313 depending on the classification performed by the audio
signal classifier. In a case in which sound encoded in audio signal 302 has a rather
large variability, however, the one or more classes attributed to audio signal 302
may change rather frequently, leading to a frequent change of the applied audio processing
instructions. In order to overcome the negative side effects of such a frequent change
of the audio processing, when it is determined that the user is interested in perceiving
sound from a localized media source, media interest determination module 315 can provide
instructions to processing instruction selection module 317 to reduce the rate at
which different audio processing instructions are applied to audio signal 302. The
instructions may include, e.g., to apply currently applied audio processing instructions
for a minimum time even if a different class has been attributed to audio signal 302
by the audio signal classifier. The instructions may also include, e.g., to only select
one or more audio processing instructions associated with one of the classes attributed
to audio signal 302 to be applied to audio signal 302 which are most appropriate for
reproducing sound from the localized media source. For instance, one or more audio
processing instructions for enhancing an intelligibility of speech encoded in audio
signal 302 may then be selected to be applied to audio signal 302, e.g., under the
presumption that the user is mostly interested in comprehending a speech content presented
by the media source, even if speech content and music content would be reproduced
by the media source.
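By way of illustration only, the minimum-time behavior described above might be sketched in Python as follows; the hold time is an illustrative assumption.

```python
import time

class RateLimitedProgramSelector:
    """Sketch of module 317 holding the currently applied audio processing
    instructions for a minimum time, even if the classifier attributes a new class."""
    def __init__(self, min_hold_s: float = 10.0):
        self.min_hold_s = min_hold_s
        self.current = None
        self.switched_at = float("-inf")

    def select(self, requested_program, now: float = None):
        now = time.monotonic() if now is None else now
        if self.current is None or now - self.switched_at >= self.min_hold_s:
            if requested_program != self.current:
                self.current = requested_program
                self.switched_at = now
        return self.current  # switch requests within the hold time are ignored
```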
[0091] In some instances, the operation for optimizing the processing of audio signal 302
comprises disabling applying of at least one audio processing instruction which is
unsuitable for perceiving sound from the localized media source to audio signal 302.
To illustrate, some audio processing instructions, which may be applied to audio signal
302 in a standard operation of hearing device 110, 210, may be unsuitable for reproducing
sound from a localized media source and/or may adversely affect a desired perception
of such sound for the user. For instance, some media sources, e.g., a television program
and/or a movie shown in a movie theater, may reproduce sound features such as traffic
noise, machine noise, speech in babble, and/or the like as part of the media content,
e.g., to provide for a desired sound ambience and/or for entertainment purposes. In
such a case, it may be undesirable to process the media content encoded in audio signal
302 equivalently to a corresponding content which does not originate from a media
source but from another sound source in the user's environment. Accordingly, when
it is determined that the user is interested in perceiving sound from a localized
media source, media interest determination module 315 can provide instructions to
processing instruction selection module 317 to disable a selection of at least one
audio processing instruction which is unsuitable for perceiving sound from the localized
media source when applied to audio signal 302. E.g., audio processing instructions
usually employed for a noise reduction of environmental sound may then be disabled,
e.g., to avoid an undesired influence and/or distortion of the media content.
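By way of illustration only, disabling the selection of unsuitable instructions might be sketched in Python as a filter over the selected program set; which instructions count as unsuitable is a hypothetical assumption reusing the abbreviations above.

```python
# Instructions assumed unsuitable while perceiving media sound, e.g., environmental
# noise reduction that could distort sound effects reproduced by the media source.
UNSUITABLE_FOR_MEDIA = {"NC", "WNC"}

def filter_instructions(selected_instructions, media_interest: bool):
    """Drop unsuitable instructions when an interest in media sound is determined."""
    if not media_interest:
        return selected_instructions
    return tuple(i for i in selected_instructions if i not in UNSUITABLE_FOR_MEDIA)
```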
[0092] In some instances, the operation for optimizing the processing of audio signal 302
comprises applying at least one audio processing instruction optimized for perceiving
sound from the localized media source to audio signal 302. In some examples, at least
one audio processing instruction may be provided which is uniquely applicable to audio
signal 302 when it is determined that the user is interested in perceiving sound from
a localized media source. E.g., the at least one audio processing instruction may
not be associated with one or more classes attributed to audio signal 302 by an audio
signal classifier included in audio signal analyzing module 318. Accordingly, when
it is determined that the user is interested in perceiving sound from the localized
media source, media interest determination module 315 can provide instructions to
processing instruction selection module 317 to select the at least one audio processing
instruction to be applied to audio signal 302.
[0093] In some instances, the at least one audio processing instruction optimized for perceiving
sound from the localized media source may provide for at least one of enhancing an
intelligibility of speech encoded in the audio signal, in particular speech presented
from the localized media source; enhancing a quality of sound encoded in the audio
signal; enhancing sound from the media source encoded in the audio signal relative
to other environmental sound encoded in the audio signal; and separating sound from
the media source encoded in the audio signal from other sound encoded in the audio
signal. E.g., the at least one audio processing instruction may provide for noise
reduction of the other environmental sound and/or improve the quality of sound with
regard to a clarity of the reproduced sound and/or with regard to a listening comfort
for the user when perceiving the sound.
[0094] FIG. 6 illustrates a block flow diagram for an exemplary method of processing input
audio signal 302. The method may be executed by processor 112, 310 of hearing device
110, 210 and/or another processor communicatively coupled to processor 112, 310. At
operation S12, after receiving audio signal 302, which may be provided by input transducer
115, a processing of audio signal 302 is performed by applying one or more audio processing
instructions to audio signal 302. Based on the processed audio signal, an output audio
signal 305 is provided which can be output by output transducer 117 so as to stimulate
the user's hearing.
[0095] At operation S13, after receiving audio signal 302 and displacement data 303, which
may be provided by displacement sensor 125, it is determined whether the user is interested
in perceiving sound from a media source localized in the environment. Operation S13
may be performed independently and/or in parallel to the audio processing performed
at S12. In some implementations, further sensor data 304 may be received at S13 to
determine whether the user is interested. In a case in which it is determined that
the user is interested, operation S14 is executed. At S14, an operation optimizing
the processing of audio signal 302 for perceiving sound from the localized media source
is initiated, which can then be applied in the audio processing at S12.
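By way of illustration only, the interplay of operations S12 to S14 for one block of audio signal 302 might be wired together in Python as follows; both helper functions are hypothetical stubs standing in for the processing described above.

```python
def determine_media_interest(audio_block, displacement, sensor_data=None) -> bool:
    # Stub for operation S13; see the parameter and threshold sketches above.
    return False

def apply_processing_instructions(audio_block, media_optimized: bool):
    # Stub for operation S12; a real implementation would apply the selected
    # audio processing instructions (optimized at S14 when interest is given).
    return audio_block

def process_block(audio_block, displacement, sensor_data=None):
    interested = determine_media_interest(audio_block, displacement, sensor_data)  # S13
    return apply_processing_instructions(audio_block, media_optimized=interested)  # S12/S14
```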
[0096] FIG. 7 illustrates a block flow diagram of an exemplary implementation of the method
illustrated in FIG. 6. In this implementation, operation S13 is replaced by operations
S22 and S23. At S22, after receiving audio signal 302, audio signal 302 is classified
by attributing at least one class from a plurality of classes to audio signal 302,
wherein different audio processing instructions are associated with different classes.
The at least one audio processing instruction associated with the at least one class
attributed to audio signal 302 can then be applied in the audio processing at S12.
[0097] At operation S23, after receiving displacement data 303, it is determined whether
the user is interested in perceiving sound from a media source localized in the environment.
The determining whether the user is interested can thus be based on displacement data
303 and the at least one class which has been attributed to audio signal 302 at S22.
E.g., whether the user is interested may be determined depending on
a temporal variability of the attribution of the at least one class to audio signal
302 and/or whether the at least one class attributed to audio signal 302 is characteristic
of a predetermined media sound. In some instances, as illustrated, audio signal 302
may be further employed at S23 for the determining whether the user is interested.
E.g., in addition to the information about the at least one class which has been attributed
to audio signal 302 at S22, the determining whether the user is interested may be
based on another characteristic of audio signal 302, e.g., a level and/or an SNR and/or
a frequency content of audio signal 302.
[0098] While the principles of the disclosure have been described above in connection with
specific devices and methods, it is to be clearly understood that this description
is made only by way of example and not as limitation on the scope of the invention.
The above described preferred embodiments are intended to illustrate the principles
of the invention, but not to limit the scope of the invention. Various other embodiments
and modifications to those preferred embodiments may be made by those skilled in the
art without departing from the scope of the present invention that is solely defined
by the claims. In the claims, the word "comprising" does not exclude other elements
or steps, and the indefinite article "a" or "an" does not exclude a plurality. A single
processor or controller or other unit may fulfil the functions of several items recited
in the claims. The mere fact that certain measures are recited in mutually different
dependent claims does not indicate that a combination of these measures cannot be
used to advantage. Any reference signs in the claims should not be construed as limiting
the scope.