(19)
(11) EP 4 404 584 A1

(12) EUROPEAN PATENT APPLICATION

(43) Date of publication:
24.07.2024 Bulletin 2024/30

(21) Application number: 23152475.2

(22) Date of filing: 19.01.2023
(51) International Patent Classification (IPC): 
H04R 1/10(2006.01)
(52) Cooperative Patent Classification (CPC):
H04R 2460/15; H04R 1/1016; H04R 1/1008
(84) Designated Contracting States:
AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC ME MK MT NL NO PL PT RO RS SE SI SK SM TR
Designated Extension States:
BA
Designated Validation States:
KH MA MD TN

(71) Applicant: Nokia Technologies Oy
02610 Espoo (FI)

(72) Inventors:
  • DEMIREL, Berken Utku
    Kirsehir (TR)
  • AL-NAIMI, Khaldoon
    Beaconsfield (GB)
  • MONTANARI, Alessandro
    Cambridge (GB)

(74) Representative: Nokia EPO representatives
Nokia Technologies Oy
Karakaari 7
02610 Espoo (FI)

   


(54) APPARATUS, METHODS AND COMPUTER PROGRAMS FOR ANALYZING EARPHONE SEALING


(57) Examples of the disclosure relate to apparatus, methods and computer programs for analyzing earphone sealing. This can be used to determine a quality of sealing between the earphone and a user's ear. In examples of the disclosure an apparatus can be configured to compare a microphone signal obtained from at least one microphone located in an earphone with a reference microphone signal. The reference microphone signal can be obtained using a model of the at least one microphone and at least one speaker so as to obtain a correlation between the obtained microphone signal and the reference microphone signal. The apparatus can also be configured to process the correlation between the obtained microphone signal and the reference microphone signal to determine a quality of sealing between the earphone and a user's ear.


Description

TECHNOLOGICAL FIELD



[0001] Examples of the disclosure relate to apparatus, methods and computer programs for analyzing earphone sealing. Some relate to apparatus, methods and computer programs for determining a quality of sealing between the earphone and a user's ear.

BACKGROUND



[0002] Earphones are audio output devices that are configured to be worn in or on a user's ear. When a user is wearing an earphone a seal, or partial seal, can be created between the earphone and the user's ear. The quality of this seal and/or the air leakage through this seal can affect the sound level within the user's ear canal. This can affect measurements made by sensors in the earphones, audio signal quality, pressure levels within the user's ear and/or other functions of the earphones.

BRIEF SUMMARY



[0003] According to various, but not necessarily all, examples of the disclosure there may be provided an apparatus comprising means for:

comparing a microphone signal obtained from at least one microphone located in an earphone with a reference microphone signal obtained using a model of the at least one microphone and at least one speaker so as to obtain a correlation between the obtained microphone signal and the reference microphone signal;

processing the correlation between the obtained microphone signal and the reference microphone signal to determine a quality of sealing between the earphone and a user's ear.



[0004] The model of the at least one microphone and at least one speaker may be trained with the at least one microphone and the at least one speaker not positioned in a user's ear.

[0005] The model of the at least one microphone and at least one speaker may comprise a non-linear model.

[0006] A machine learning program may be used to model the at least one microphone and at least one speaker.

[0007] The processing of the correlation may determine two or more parameters and the two or more parameters may be used to determine the quality of sealing between the earphone and a user's ear.

[0008] The means may be for estimating a sealing coefficient by normalizing the correlation wherein the sealing coefficient is used to determine the quality of sealing between the earphone and the user's ear.

[0009] The means may be for calculating a proportion of energy in a low frequency range of the obtained microphone signal.

[0010] The means may be for estimating a correlation coefficient between the obtained microphone signal and the reference microphone signal.

[0011] The microphone signal obtained from at least one microphone may comprise at least one of: external noise, reflections, or distortions from at least part of an ear.

[0012] The earphone may be configured to be worn inside a user's ear.

[0013] The earphone may be configured to be worn over a user's ear.

[0014] According to various, but not necessarily all, examples of the disclosure there may be provided an earphone comprising an apparatus as described herein.

[0015] According to various, but not necessarily all, examples of the disclosure there may be provided a method comprising:

comparing a microphone signal obtained from at least one microphone located in an earphone with a reference microphone signal obtained using a model of the at least one microphone and at least one speaker so as to obtain a correlation between the obtained microphone signal and the reference microphone signal;

processing the correlation between the obtained microphone signal and the reference microphone signal to determine a quality of sealing between the earphone and a user's ear.



[0016] According to various, but not necessarily all, examples of the disclosure there may be provided a computer program comprising instructions which, when executed by an apparatus, cause the apparatus to perform:

comparing a microphone signal obtained from at least one microphone located in an earphone with a reference microphone signal obtained using a model of the at least one microphone and at least one speaker so as to obtain a correlation between the obtained microphone signal and the reference microphone signal;

processing the correlation between the obtained microphone signal and the reference microphone signal to determine a quality of sealing between the earphone and a user's ear.



[0017] While the above examples of the disclosure and optional features are described separately, it is to be understood that their provision in all possible combinations and permutations is contained within the disclosure. It is to be understood that various examples of the disclosure can comprise any or all of the features described in respect of other examples of the disclosure, and vice versa. Also, it is to be appreciated that any one or more or all of the features, in any combination, may be implemented by/comprised in/performable by an apparatus, a method, and/or computer program instructions as desired, and as appropriate.

BRIEF DESCRIPTION



[0018] Some examples will now be described with reference to the accompanying drawings in which:

FIG. 1 shows an example system;

FIG. 2 shows an example method;

FIG. 3 shows training of an example model;

FIG. 4 shows deployment of an example model;

FIG. 5 shows results obtained for examples of the disclosure;

FIG. 6 shows results obtained for examples of the disclosure;

FIG. 7 shows results obtained for examples of the disclosure;

FIG. 8 shows example earphones according to examples of the disclosure; and

FIG. 9 shows an apparatus according to examples of the disclosure.



[0019] The figures are not necessarily to scale. Certain features and views of the figures can be shown schematically or exaggerated in scale in the interest of clarity and conciseness. For example, the dimensions of some elements in the figures can be exaggerated relative to other elements to aid explication. Corresponding reference numerals are used in the figures to designate corresponding features. For clarity, all reference numerals are not necessarily displayed in all figures.

DETAILED DESCRIPTION



[0020] Earphones can be worn by a user to provide audio signals to a user's ears. The earphones can be worn in a user's ear; for example, the earphones could comprise earbud devices. In some examples the earphones could be worn over a user's ear. For instance, they could comprise headphones or headsets.

[0021] In some examples the earphones can also comprise additional sensors that can be configured to monitor one or more characteristics of the user. For example, the earphones could comprise a microphone, vibration sensors, infrared sensors, air pressure sensors, motion sensors or other suitable types of sensors. These sensors can be used for monitoring health characteristics of a user, identifying a user, identifying activities or actions of a user or for any other suitable purpose.

[0022] The earphone can form a seal with the user's ear. The quality of this seal and/or the air leakage through this seal can affect the sound level within the user's ear canal. This can affect measurements made by sensors in the earphones, audio signal quality, pressure levels within the user's ear and/or other functions of the earphones. It is therefore useful to determine the quality of the seal between the earphone and the user's ear.

[0023] Examples of the disclosure provide a reliable method for determining the quality of the seal.

[0024] Fig. 1 schematically shows an example system 101 that can be used to implement some examples of the disclosure. The system 101 comprises an earphone 103 and an apparatus 105. Only components that are referred to in the following description are shown in Fig. 1. The system 101 could comprise other components in other examples.

[0025] The earphone 103 can be a device that is configured to be worn by a user 111. The earphone 103 can be configured to be worn in or over a user's ear. This can enable the earphone 103 to be used to provide acoustic signals to the user 111. For example, the earphone 103 can be used for the playback of audio content.

[0026] The earphone 103 comprises at least one speaker 107 and at least one microphone 109. The earphone 103 could comprise additional components in some examples.

[0027] The speakers 107 can comprise any means that can be configured to generate an acoustic signal. The speakers 107 can be configured to convert an electrical input signal into an output acoustic signal. The speakers 107 can be positioned within the earphone 103 so that when a user 111 is using the earphone the speaker 107 is positioned in, or close to, the user's ear.

[0028] The microphones 109 can comprise any means that can be configured to detect acoustic signals. The microphones 109 can be configured to detect acoustic signals and convert the acoustic signals into an output electric signal. The microphones 109 therefore provide microphone signals as an output. The microphone signals can comprise audio signals.

[0029] The microphones 109 can be positioned within the earphones so that they can detect acoustic signals generated by the speaker 107 and/or reflections and reverberations of the acoustic signals.

[0030] In the system 101 of Fig. 1 the earphone 103 and the apparatus 105 are shown as separate devices. For example, the apparatus 105 could be provided within a different device to the earphone 103. Example devices that could comprise the apparatus 105 could be a personal device, such as a mobile phone, belonging to the user 111 or any other suitable device. The apparatus 105 can be configured to communicate with the earphone via a wireless communication link or by any other suitable means.

[0031] In some examples the apparatus 105 could be comprised within the earphone. For example, the earphone could be part of a headset or other wearable device that comprises an apparatus 105.

[0032] The apparatus 105 can comprise a controller comprising a processor and memory. Examples of an apparatus 105 are shown in Fig. 9. The apparatus 105 can be configured to process microphone signals or to perform any other suitable function.

[0033] As mentioned above the system 101 could comprise additional components that are not shown in Fig. 1 in some examples of the disclosure. For instance, the system 101 could comprise one or more sensors within the earphone 103. The sensors could comprise any means that are configured to detect a physical characteristic of a user 111. The sensors could be located in any suitable position within the earphone 103.

[0034] In some examples the sensors could be positioned so that, when the earphone 103 is in use, the sensors are located within the ear of the user 111. The sensors could comprise microphones, vibration sensors, infrared sensors, air pressure sensors, motion sensors or other suitable types of sensors. These sensors can be used for monitoring health characteristics of a user, identifying a user, identifying activities or actions of a user, or for any other suitable purpose.

[0035] In the example of Fig. 1 only one earphone 103 is shown. In some examples there could be two earphones 103 so that one is provided for each ear of the user 111.

[0036] Fig. 2 shows an example method. The method could be implemented using the system of Fig.1 or any other suitable system.

[0037] The method comprises, at block 201, comparing a microphone signal obtained from at least one microphone with a reference microphone signal. The microphone signal can be obtained from a microphone 109 within an earphone 103. The reference microphone signal can be obtained using a model of the microphone 109 and the speaker 107. The reference microphone signal is an estimated microphone signal rather than one that has been captured using a microphone.

[0038] The microphone that is used to obtain the microphone signal can be positioned within the earphone so that it is facing inwards. The microphone 109 could be an in-ear microphone 109. The microphone signal obtained from the microphone 109 comprises external noise, and also reflections and/or distortions from at least part of an ear. The amount of external noise in the microphone signal will be increased if the seal is of poor quality. The part of the ear that causes the reflections and/or distortions will be determined by the type of earphone used and the position of the microphones 109 within the earphone 103. If the earphone 103 is an in-ear device, then the reflections and distortions will come from the user's ear canal, while if the earphone is an over-ear device the reflections and distortions could come from other parts of the ear.

[0039] The model that is used for the reference microphone signal can be a pre-trained model. The model can be trained using a microphone 109 and a speaker 107 that are not positioned in a user's ear.

[0040] The model can comprise a non-linear model. In some examples the model can comprise a machine learning program such as a neural network, or any other suitable type of machine learning program.

[0041] In some examples a correlation between the obtained microphone signal and the reference microphone signal can be used to obtain a difference between the obtained microphone signal and the reference microphone signal. The correlation gives an indication of the differences between the respective signals. These differences can be due to leakage of external noise into the microphone 109 and reflections and reverberations of the acoustic signal within the user's ear.

[0042] At block 203 the method comprises processing the correlation between the obtained microphone signal and the reference microphone signal to determine a quality of sealing between the earphone and a user's ear. The amount of leakage of external noise into the microphone 109 and the relative levels of reflections and reverberations are affected by the quality of the seal between the earphone 103 and the user's ear. Therefore, by determining and analysing these components of the signal a quality level of the seal can be determined.

[0043] Any suitable methods can be used to process the correlation between the obtained microphone signal and the reference microphone signal. In some examples the process could comprise determining two or more parameters. The parameters could comprise a measurement of a physical characteristic of the response of the user's ear to the acoustic signal. The parameters could then be used to determine the quality of sealing between the earphone and a user's ear.

[0044] In some examples the processing of the correlation could comprise estimating a sealing coefficient by normalizing the correlation. In some examples the processing of the correlation could comprise calculating a proportion of energy in a low frequency range. This could be calculated as a ratio of low frequency energy to all frequency energy or could be calculated using any other suitable method. In some examples the processing of the correlation could comprise estimating a correlation coefficient such as a Pearson coefficient or any other suitable coefficient.

[0045] In some examples multiple parameters can be determined and used in combination to determine the quality of sealing between the earphone and a user's ear. Using multiple different parameters can provide a more reliable estimate of the quality of the seal.
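The parameters described above can be sketched as follows. This is an illustrative computation only: the function name, the sampling rate, and the 50 Hz cut-off are assumptions chosen to mirror the examples given elsewhere in the description, and the exact normalizations used in the disclosure may differ.

```python
import numpy as np

def seal_parameters(y_obtained, y_reference, fs=16000, f_low=50.0):
    """Illustrative computation of three seal-quality parameters.

    Function and parameter names are assumptions, not taken from the
    application text.
    """
    # Residual: obtained microphone signal minus the model's reference
    # prediction; with a good seal this is dominated by reflections.
    resid = y_obtained - y_reference

    # 1) Sealing coefficient: zero-lag cross-correlation of the residual
    #    with the reference, normalised by the reference signal's energy
    #    (a magnitude-based normalisation).
    K = np.dot(resid, y_reference) / np.dot(y_reference, y_reference)

    # 2) Proportion of residual energy below f_low (the occlusion effect
    #    boosts low frequencies when the seal is good).
    spectrum = np.abs(np.fft.rfft(resid)) ** 2
    freqs = np.fft.rfftfreq(len(resid), d=1.0 / fs)
    low_ratio = spectrum[freqs <= f_low].sum() / spectrum.sum()

    # 3) Pearson correlation coefficient between obtained and reference.
    pearson = np.corrcoef(y_obtained, y_reference)[0, 1]
    return K, low_ratio, pearson
```

Combining several such parameters, rather than relying on a single one, is what the paragraph above suggests for a more reliable estimate.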

[0046] Fig. 3 shows training of an example model 303. The model 303 can be a model of the speaker 107 and the microphone 109. The model 303 can account for non-linearities within the system of the speaker 107 and the microphone 109. The model 303 can be trained with the speaker 107 and the microphone 109 in open air so that they are not positioned within a user's ear during training. In this case there is no seal.

[0047] The model 303 can be trained to predict an output microphone signal for a given input audio signal. The model 303 can be trained to account for non-linearities of the speaker 107 and/or the microphone 109. For example, the model 303 can account for intermodulation distortions. Intermodulation distortions are multi-tone distortion products that appear in the output of the non-linear speaker 107 when two or more frequency components are present and interact with each other. The effect of intermodulation distortions can be particularly pronounced for devices such as earbuds, which have to be small enough to fit into a user's ear and so only have space for a small speaker. Small speakers have limitations on the movement of the cone and so can show stronger non-linear behaviour than bigger speakers.

[0048] The model 303 could comprise a machine learning program. The machine learning program can comprise a neural network or any other suitable type of trainable model. The term "machine learning program" refers to any kind of artificial intelligence (AI), intelligent or other method that is trainable or tunable using data. The machine learning program can comprise a computer program. The machine learning program can be trained to perform a task, such as obtaining a reference microphone signal for a given input to a speaker 107, without being explicitly programmed to perform that task. The machine learning program can be configured to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E. In these examples the machine learning program can often learn from reference data to make estimations on future data. The machine learning program can also be a trainable computer program. Other types of machine learning programs could be used in other examples.

[0049] The training of the machine learning program can be performed by any suitable apparatus or system. For example, the machine learning program could be trained by a system or other apparatus that has a high processing capacity. In some examples the machine learning program could be trained by a system comprising one or more graphical processing units (GPUs) or any other suitable type of processor.

[0050] In the example training process shown in Fig. 3 an audio signal 301 is provided as an input to the model 303. The audio signal 301 is denoted as x[n] in Fig. 3. The model 303 provides a reference microphone signal as an output. The reference microphone signal is denoted fθ(.) in Fig. 3.

[0051] The audio signal 301 can comprise any suitable type of audio content. In some examples the audio signals could comprise sound signals with varying magnitude. The sound signals could comprise different languages or other types of sound signals. The different languages could be Turkish, English, Italian, Arabic, or any other suitable languages.

[0052] Before the model 303 is trained it can be configured in an initial state. In the initial state the weights or other parameters of the model 303 can be set to any suitable value. In some examples the initial weights of the model 303 can be set to unity.

[0053] The audio signal 301 is also provided to a digital to analogue converter 305 to generate the analogue audio signal (t). The analogue audio signal (t) is then provided as an input to a speaker 107.

[0054] The speaker 107 is configured to convert an input electrical signal into an output acoustic signal 307. In this example the acoustic signal 307 comprises a pressure wave that could be detected by a user's ear. The output signal can be any suitable acoustic signal. The output acoustic signal could comprise content that a user might be listening to such as music or people talking or any other suitable type of audio. The output acoustic signal 307 is denoted (t) in this example.

[0055] To enable training of the model 303 one or more microphones 109 are configured to detect the acoustic signal 307 that is output from the speaker 107. The microphone 109 can be an inward facing microphone so that when the earphone 103 is in use the microphone 109 would be facing inwards to the user's ear canal. The one or more microphones 109 can comprise any means that can be configured to convert an acoustic input signal into a corresponding electrical output signal. The microphones 109 can be part of an earphone 103.

[0056] The output microphone signal from the microphone 109 is provided to an analogue to digital converter 309. The analogue to digital converter 309 is configured to convert the analogue output from the microphone 109 into a digital signal. The digital microphone signal is denoted y[n] in Fig. 3. This provides an obtained microphone signal that can be compared to the reference microphone signal of the model 303.

[0057] For the purposes of training the model 303 the output of the microphone 109 is the ground truth and the audio signal 301 is the stimuli.

[0058] The comparison between the predicted microphone signal fθ(.) from the model 303 and the obtained output of the microphones 109 enables the output of the model 303 to be evaluated relative to the actual output of the microphone 109. The comparison provides an error signal 311 as an output. The error signal 311 can comprise an error value or other indication of the difference between the estimated microphone signal and the actual microphone signal.

[0059] The error value or other indication can be provided as an input to the model 303. One or more weights or other parameters of the model 303 can be adjusted in response to the error input. For example, if the model 303 comprises a neural network one or more weights of the neural network can be adjusted. The weights or other parameters can be updated to decrease the loss between the output of the model 303 and the output of the microphone 109.

[0060] The process shown in Fig. 3 can be repeated until the outputs of the model 303 have sufficiently converged with the output of the microphone 109. The outputs of the model 303 can be determined to have converged with the output of the microphone if the error value is below a given threshold or if any other suitable criteria has been satisfied.

[0061] Once the outputs of the model 303 have been determined to be converged the model 303 is considered to be trained. The trained model 303 can then be used in systems such as the system 101 of Fig. 1 and used to determine a quality of a seal between a user's ear and an earphone 103.

[0062] Any suitable functions can be used in the model 303 to approximate the non-linearities of the speaker 107 and/or microphone 109. In some examples a Volterra filter can be used. It can be assumed that the Volterra kernels have a finite memory length N. Cubic and higher-order terms can be ignored for computational ease because the effect of the higher-order distortions is smaller. The input-output relation of the speaker 107 and the microphone 109 can be represented by:

y[n] = Σ_{k1=0}^{N-1} h1[k1] x[n-k1] + Σ_{k1=0}^{N-1} Σ_{k2=0}^{N-1} h2[k1, k2] x[n-k1] x[n-k2]

[0063] Where x[n] is the input signal and y[n] is the output signal, and h1[k1] and h2[k1, k2] are the first and second-order discrete Volterra kernels. This equation can also be written as:

y[n] = w1^T x1[n] + w2^T x2[n]

where:

x1[n] = [x[n], x[n-1], ..., x[n-N+1]]^T

x2[n] = [x[n]x[n], x[n]x[n-1], ..., x[n-N+1]x[n-N+1]]^T

[0064] And w1 and w2 are weight vectors. The filter coefficients can be obtained using a normalised least mean squares (NLMS) algorithm. The update rule for the weights w can be written as:

w(n+1) = w(n) + (µ / (η + x(n)^T x(n))) e(n) x(n)

[0065] Where:

w(n) = [w1(n)^T, w2(n)^T]^T

[0066] And

x(n) = [x1[n]^T, x2[n]^T]^T

[0067] And µ is the step size, η is a small positive number to avoid division by zero, and e(n) is the error between the output of the model 303 and the output of the microphone 109. Other types of functions could be used in other examples.
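Under the assumptions stated above (finite memory length N, kernels truncated at second order, NLMS weight updates), a minimal sketch of such a filter could look as follows. The function names, step size, and memory length are illustrative choices, not values from the application.

```python
import numpy as np

def train_volterra_nlms(x, y, N=8, mu=0.5, eta=1e-6, epochs=1):
    """Second-order Volterra filter trained with NLMS (a sketch).

    Stacked weight vector w = [w1; w2]: N linear taps plus N*N
    second-order taps. Cubic and higher kernels are dropped.
    """
    w = np.zeros(N + N * N)
    for _ in range(epochs):
        for n in range(N - 1, len(x)):
            x1 = x[n - N + 1:n + 1][::-1]        # linear regressor (newest first)
            x2 = np.outer(x1, x1).ravel()        # second-order regressor
            u = np.concatenate([x1, x2])         # stacked regressor x(n)
            e = y[n] - w @ u                     # prediction error e(n)
            # Normalised LMS update: step scaled by regressor energy,
            # with eta guarding against division by zero.
            w += (mu / (eta + u @ u)) * e * u
    return w

def predict_volterra(w, x, N=8):
    """Run the trained filter over an input signal."""
    out = np.zeros(len(x))
    for n in range(N - 1, len(x)):
        x1 = x[n - N + 1:n + 1][::-1]
        u = np.concatenate([x1, np.outer(x1, x1).ravel()])
        out[n] = w @ u
    return out
```

Training against a known second-order system (for example y[n] = 0.5x[n] + 0.3x[n-1] + 0.2x[n]²) should drive the prediction error close to zero, since that system lies within the filter's span.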

[0068] The training of the model 303 can be performed offline. The training of the model 303 only needs to be performed once. For example, the training can be performed when a new earphone 103 is designed and built. The training of the model 303 only needs to be performed for a new type of earphone 103 and not for every unit of the earphone 103 that is manufactured. The trained model 303 can then be provided to the relevant devices for deployment.

[0069] Fig. 4 shows deployment of an example trained model 403. The trained model 403 can be deployed after the training of the model 303 from Fig. 3 has been completed. The training could be the process of Fig. 3, or any other suitable training process.

[0070] When the trained model 403 is deployed the earphone 103 comprising the speaker 107 and the microphone 109 is located in or over a user's ear. This means that the microphone 109 will detect leaked external audio and also reflections and distortions of the acoustic signals from the space created by the user's ear and the earphone 103. For example, if the earphone 103 is an in-ear device the reflections and distortions can come from the ear canal that is at least partly blocked by the earphone 103.

[0071] In the example deployment shown in Fig. 4 an audio signal 401 is provided as an input to the trained model 403. The audio signal 401 is denoted as x[n] in Fig. 4. The audio signal 401 can comprise any suitable type of audio. The audio signal 401 does not need to be the same as the audio signal 301 that is used for training the model 303. The audio signal 401 does not need to be a reference signal or contain any specific parameters. The audio signal 401 can comprise audio content that a user is listening to, for example it could comprise speech or music or other types of content.

[0072] The trained model 403 provides a reference microphone signal fθ(.) as an output. The reference microphone signal fθ(.) is an estimation of the signal that would be expected from a microphone 109.

[0073] The audio signal 401 is also provided to a digital to analogue converter 305 to generate the analogue audio signal (t). The analogue audio signal (t) is then provided as an input to the speaker 107.

[0074] The speaker 107 is configured to convert an input electrical signal into an output acoustic signal 407. The output acoustic signal 407 is denoted (t) in this example. The output acoustic signal 407 provides audio content that a user can hear. The output acoustic signal 407 is dependent upon the audio signal 401. The acoustic signal 407 can then be captured by the microphone 109.

[0075] The microphone 109 captures the acoustic signal 407 and provides a microphone signal as an output in response to the captured acoustic signal 407. The microphone signal is provided to an analogue to digital converter 309. The analogue to digital converter 309 is configured to convert the analogue output from the microphone 109 into a digital signal. This provides an obtained microphone signal.

[0076] The obtained microphone signal can be expressed as:

y[n] = Λ(x[n]) + α r[n] + β v[n]

[0077] Where Λ(x[n]) represents the captured waveform of the signal emitted from the speaker 107, r[n] represents the reflections and distortions, and v[n] is the external or outside noise as measured by the inward-facing microphone 109. In this example the earphone 103 is an in-ear device and the reflections can come from a user's ear canal and eardrum. If the earphone 103 is an over-ear device then the reflections would come from other parts of the user's ear. The distortions can be caused by putting the speaker 107 under pressure.
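A toy construction of the obtained microphone signal from these three components might look as follows. The smoothing, the delay, and the values of α and β are invented for illustration; the point is that r[n] stays correlated with the playback while v[n] does not, which is what makes separating α and β non-trivial.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 4096
# Λ(x[n]): the captured speaker waveform, here smoothed noise so that a
# short delay of it stays correlated with the original (as r[n] does).
raw = rng.standard_normal(n + 8)
speaker = np.convolve(raw, np.ones(8) / 8, mode="valid")[:n]
reflections = 0.6 * np.roll(speaker, 3)   # r[n]: delayed copy of the playback
noise = rng.standard_normal(n)            # v[n]: external noise, uncorrelated

def observed(alpha, beta):
    # y[n] = Λ(x[n]) + α·r[n] + β·v[n]; a good seal means strong
    # reflections (high α) and little leaked-in noise (low β).
    return speaker + alpha * reflections + beta * noise

y_good_seal = observed(alpha=0.8, beta=0.1)
y_poor_seal = observed(alpha=0.1, beta=0.8)
```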

[0078] The obtained microphone signal y[n] can be compared to the reference microphone signal fθ(.) to determine the amount of reflections and external noise 409 in the obtained microphone signal y[n]. This can give an indication of the quality of the seal between the earphone and the user's ear.

[0079] Any suitable processing can be used to determine the amount of reflections and external noise in the obtained microphone signal y[n]. In some examples one or more parameters indicative of the amount of reflections and noise in the obtained microphone signal y[n] can be calculated and used to provide an indication of the quality of the seal.

[0080] When the quality of the seal between the earphone 103 and the user's ear is poor, the air leakage will be high. This results in fewer distortions and reflections, and therefore in a smaller value for α in the equation for y[n]. Similarly, a poor seal increases the amount of external noise that is detected by an inward-facing microphone 109, resulting in a higher value for β in the equation for y[n]. Therefore, the ratio between α and β gives information about the quality of the seal between the earphone 103 and the user's ear. Separating α and β from the obtained microphone signal is not trivial because x[n] and r[n] are correlated.

[0081] The obtained microphone signal y[n] can be compared to the reference microphone signal fθ(.) from the trained model 403. Subtracting the reference microphone signal fθ(.) from the obtained microphone signal y[n] gives the resultant signal ỹ[n], which can be denoted as:

ỹ[n] = y[n] - fθ(x[n]) = e[n] + α r[n] + β v[n]

[0082] Where e[n] is the error between the two nonlinear functions fθ(.) and Λ(.). That is, e[n] gives the error between the trained model 403 of the speaker 107 and microphone 109 and the actual performance of the speaker 107 and the microphone 109. The trained model 403 can be assumed to be a good approximation of the speaker 107 and microphone 109, so the error term can be ignored. This results in:

ỹ[n] ≈ α r[n] + β v[n]

[0083] The correlation between the resultant signal ỹ[n] and the reference microphone signal fθ(.) is:

E{ỹ[n] fθ(x[n])} = α E{r[n] fθ(x[n])} + β E{v[n] fθ(x[n])}

[0084] The external noise that is captured by the inward-facing microphone 109 will be uncorrelated with the output of the speaker 107 and with the trained model 403 that represents the speaker 107. This means that the term relating to the correlation with the external noise can be ignored, so that:

E{ỹ[n] fθ(x[n])} ≈ α E{r[n] fθ(x[n])}

[0085] This gives the overall reflections and distortions from the user's ear that are captured by the microphone 109. This can be used to estimate a sealing coefficient K. The sealing coefficient K can be obtained by normalizing the correlation to give:

K = E{ỹ[n] fθ(x[n])} / E{fθ(x[n])²}

[0086] The obtained sealing coefficient K gives information about the overall magnitude of the reflections. Lower values of K indicate a lower sealing quality. For example, a value of K that is close to zero indicates poor sealing and few reflections, while a value of K that is close to one indicates good sealing and a larger amount of reflections.

[0087] The sealing coefficient K is obtained using a magnitude-based normalization. This can provide a more reliable discriminator between poor sealing and good sealing compared to normalizations where the autocorrelations at zero lag equal 1. This is mainly because the amplitude of the input signal also changes the magnitude of the reflections and distortions. If a normalization using zero lag is used, for example a Pearson correlation coefficient, then the magnitude of the input signal does not affect the resulting sealing coefficient K. Such a coefficient would not reflect the amount of vibrations in the obtained microphone signal and therefore would not give a good indication of the sealing quality.
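As a concrete illustration of the difference between a magnitude-based normalization and a zero-lag (Pearson) normalization, the sketch below compares the two on synthetic signals. The signal names and the exact normalization (zero-lag magnitude products divided by the reference energy) are assumptions made for this sketch, not the formulation used in the disclosure.

```python
# Illustrative comparison of a magnitude-based sealing coefficient
# with an amplitude-invariant Pearson coefficient. The signal names
# and the exact normalization are assumptions for this sketch.

def sealing_coefficient(residual, reference):
    """Magnitude-based: zero-lag magnitude products normalized by the
    reference energy, so the result scales with the residual amplitude."""
    num = sum(abs(a * b) for a, b in zip(residual, reference))
    den = sum(b * b for b in reference)
    return num / den if den else 0.0

def pearson(x, y):
    """Zero-lag, amplitude-invariant normalization for comparison."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0

# A strongly reflected (well sealed) residual is a large scaled copy
# of the reference; a weakly reflected (poorly sealed) one is small.
reference = [1.0, -0.5, 0.3, -0.8, 0.6, -0.2]
good_seal = [0.9 * v for v in reference]
poor_seal = [0.1 * v for v in reference]
```

Here `sealing_coefficient` distinguishes the two cases (roughly 0.9 versus 0.1), while the Pearson coefficient is 1.0 for both, which is why an amplitude-sensitive normalization is a better discriminator of seal quality.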

[0088] The reflections r[n] that come from the ear canal and eardrum or other parts of the user's ear are a function of frequency. The function will change from person to person. However, the overall pattern of the reflections will be the same for different users because of the occlusion effect, which is an enhancement of low-frequency components of sounds in an occluded ear canal. Therefore, in addition to the sealing coefficient K, other parameters can be used. The other parameters can be selected so as to address the dependency on the user when estimating the sealing quality.

[0089] A good seal between the earphone 103 and the user's ear will leave traces in the low frequency bands of the obtained microphone signals due to the occlusion effect.

[0090] These traces in the low frequency bands can preserve some structure of the motion or body signals of the user 111. For example, they can preserve structure of heartbeats, chewing motions, walking motions or other suitable body signals. These low frequency traces can be distinguished from the acoustic signal played back by the speaker 107 and detected by the microphone 109, and analysed to give further parameters that can be used as an indication of the quality of the seal.

[0091] In some examples a further parameter that can be used as an indication of the quality of the seal could be a proportion of energy in a low frequency range. Any suitable means or process can be used to determine the proportion of energy in the low frequency range. In some examples the proportion of energy in a low frequency range can be determined by transforming the resultant signal ỹ[n] to obtain energy levels for different frequency points. A fast Fourier transform, or any other suitable type of transform, can be used to transform the signal. The sum of energies at low frequencies can then be divided by the sum of the energies across all frequency ranges. In some examples the low frequency cut-off could be 50 Hz so that the ratio is given as:

K_LP = (Σ_{k=0..50} |Ỹ[k]|²) / (Σ_{k=0..N/2} |Ỹ[k]|²)

where Ỹ[k] is the FFT of the resultant signal ỹ[n] and N is the FFT length. The FFT can use any suitable resolution. In some examples the FFT could use 1 Hz resolution.
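The low-frequency energy ratio described above can be sketched as follows. A naive DFT stands in for the FFT, the 50 Hz cut-off and 1 Hz resolution follow the example in the text, and the signal names are invented for illustration.

```python
import cmath
import math

# Sketch of the low-frequency energy ratio: energy below a 50 Hz
# cut-off divided by the total energy. A naive DFT stands in for the
# FFT; with N samples spanning one second the bins are 1 Hz apart.

def dft_magnitudes(x):
    """Magnitudes of DFT bins 0..N//2 of a real-valued signal."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2 + 1)]

def low_freq_ratio(x, bin_hz=1.0, cutoff_hz=50.0):
    """Sum of energies at low frequencies over the sum at all bins."""
    mags = dft_magnitudes(x)
    k_cut = int(cutoff_hz / bin_hz)
    low = sum(m * m for m in mags[:k_cut + 1])
    total = sum(m * m for m in mags)
    return low / total if total else 0.0

# 200 samples over one second -> 1 Hz resolution, as in the text.
fs = 200
tone_10hz = [math.sin(2 * math.pi * 10 * t / fs) for t in range(fs)]
tone_80hz = [math.sin(2 * math.pi * 80 * t / fs) for t in range(fs)]
```

A 10 Hz tone yields a ratio close to 1 and an 80 Hz tone a ratio close to 0, mirroring how a good seal boosts the low band relative to the rest of the spectrum.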

[0092] The ratio K_LP, or any other suitable parameter, gives information about the low-frequency boost caused by a seal. The ratio K_LP will be higher if there is a good seal between the user's ear and the earphone 103 and will be lower if there is a poor seal between the user's ear and the earphone 103. This enables the ratio K_LP to be used as another parameter that indicates the quality of the seal.

[0093] In some examples external artefacts in the ear canal, or other parts of the user's ear, can be removed before the ratio KLP is used as a discriminator. The external artefacts can originate from the user's body. For example, they could be caused by walking, heartbeats, chewing or any other suitable factor. The artefacts can be conducted to the user's ear canal through bone conduction and amplified within the same low frequency region. If the seal is poor then the artefacts could be caused by external sounds.

[0094] Any suitable means or process can be used to discriminate the artefacts from an increase in the low frequency energy levels caused by the seal. In some examples the output reference signal from the trained model 403 can be used. If the trained model 403 has been trained with the speaker 107 and the microphone 109 in open air, so that they are not positioned within a user's ear, then the output of the trained model 403 represents the signal captured by the microphone 109 when there is no sealing. Therefore a correlation coefficient between the output of the trained model 403 and the signal captured by the microphone 109 can provide another parameter that can be used to estimate the sealing quality.

[0095] Any suitable correlation coefficient can be used. In some examples a low pass filter can be applied to both the output of the trained model 403 and the resultant signal ỹ[n], and then a Pearson coefficient, or other suitable coefficient, can be calculated between them.
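The low-pass-then-correlate step of paragraph [0095] might look like the sketch below. The moving-average filter is only an illustrative stand-in for whatever low pass filter an implementation uses, and the signal names are assumptions.

```python
# Sketch of paragraph [0095]: low-pass filter both signals, then take
# a Pearson coefficient between them. The moving-average low pass and
# the signal names are illustrative assumptions.

def moving_average(x, width=5):
    """Crude FIR low pass: mean over a sliding window (edges shrink)."""
    half = width // 2
    out = []
    for i in range(len(x)):
        window = x[max(0, i - half):i + half + 1]
        out.append(sum(window) / len(window))
    return out

def pearson(x, y):
    """Standard Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0

def low_pass_correlation(model_output, residual, width=5):
    """Correlation between the low-pass filtered model output and the
    low-pass filtered residual signal."""
    return pearson(moving_average(model_output, width),
                   moving_average(residual, width))
```

A real implementation would likely use a properly designed IIR or FIR filter with a defined cut-off frequency rather than a plain moving average, but the structure of the computation is the same.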

[0096] In some examples two or more of the parameters can be used to determine the quality of the seal. Using more than one parameter can provide a more robust indication of the quality of the seal. Other parameters and/or combinations of parameters could be used in other examples.
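Paragraph [0096] notes that two or more parameters can be combined for robustness. One of many possible fusion rules is sketched below; the thresholds and the score-averaging rule are invented for illustration and are not taken from the disclosure.

```python
# Illustrative fusion of the three seal parameters discussed above.
# The thresholds and the voting rule are invented for this sketch.

def seal_quality(k, k_lp, lp_corr,
                 k_thresholds=(0.3, 0.7),
                 lp_thresholds=(0.3, 0.6)):
    """Score each parameter 0 (bad), 1 (average) or 2 (good) and
    average the scores into a coarse overall label."""
    def score(value, thresholds):
        lo, hi = thresholds
        return 0 if value < lo else (2 if value > hi else 1)
    total = (score(k, k_thresholds)
             + score(k_lp, lp_thresholds)
             + score(lp_corr, lp_thresholds))
    return ("bad", "average", "good")[round(total / 3)]
```

A continuous score (for example total / 6) could be reported instead of a label; paragraph [0101] suggests a continuous estimate is more informative than a simple good/bad indication.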

[0097] Figs. 5, 6 and 7 show results obtained for examples of the disclosure. These results were obtained from experiments performed using twelve participants. The twelve participants included two females and ten males with an age range from 24 to 50. The participants were informed about the goal of the experiments and asked to perform several tasks in their natural way. The experiments were performed in an office environment while the participants wore an earbud prototype in the usual position.

[0098] The participants positioned the earbud for good, average and bad sealing cases according to the mean reduction of six frequencies. For the respective earbud positions (Good, Average and Bad), each participant was asked to repeat all the situations twice, once in a quiet environment and once with external noise provided by a speaker.

[0099] Three different situations were used to represent daily life usage of earbuds. The different situations used for these experiments were running, talking and sitting. For talking, each participant was asked to speak during the recording in their normal tone. Overall, six different cases (three situations in two different environments) were performed by the subjects. In total two minutes of audio data from different users were collected and used for overall evaluation. The system used does not have a user registration or user-specific training and so all recorded data was used for the evaluation. Figs. 5 to 7 represent all data points collected during the study.

[0100] Figs. 5 and 6 show a three-dimensional plot of the results obtained from the experiments conducted. Figs. 5 and 6 show the same plot from two different perspectives. The different axes in the plot show the different parameters for indicating seal quality as described above. The x axis shows a magnitude-based correlation, the y axis shows a low frequency correlation, and the z axis shows a low frequency ratio. The different symbols used for the points indicate whether the seal was good, bad or average. Figs. 5 and 6 show that good, bad and average regions can be clearly defined using the respective parameters.

[0101] Fig. 7 shows the t-distributed stochastic neighbour embedding of extracted features using implementations of the disclosure. This shows that the above-mentioned parameters are discriminative for the sealing quality. Also, the overlapping region between Good-Average and Average-Bad shows that discrete and continuous sealing quality estimation is crucial for earphones 103 because it is much more informative compared to a simple indication of good or bad.

[0102] Examples of the disclosure could be used in different use cases and scenarios. Examples of the disclosure could be used for long-term continuous monitoring of ear canal sealing with ear buds. In such use cases the examples of the disclosure enable a measurement of the air leakage level without using a test or reference signal. This continuous monitoring will therefore not be intrusive because it will not interfere with audio content that the user is listening to.

[0103] Examples of the disclosure can also be used to provide information about the quality of the seal to other applications. The other applications could be non-audio applications such as face recognition, user authentication, respiration rate monitoring, activity recognition, or any other suitable type of application that uses sensors in an earphone 103. Having information about the quality of the seal can enable the applications to control which sensors or sensor combinations are used for the relevant applications. For example, a respiration rate monitoring application can use any of the inertial measurement unit, Photoplethysmography sensors, and speaker-microphone pairs together or separately. If implementations of the disclosure are used the respiration rate monitoring application can select which of these to use based on the sealing quality and how this affects the respective sensors.

[0104] In some cases implementations of the disclosure can be used to provide feedback to active noise cancellation applications to prevent divergence. Active noise cancellation functions to cancel or remove unwanted noise by introducing an additional, electronically controlled sound field referred to as anti-noise. The anti-noise is electronically designed to have the proper pressure amplitude and phase that destructively interferes with the unwanted noise or disturbance. If the active noise cancellation algorithms are used in scenarios with high leakage then the adaptive filters will increase the gain more and more so as to cancel the noise, leading to a possible divergence. This can be prevented by providing feedback about the seal quality to the active noise cancellation algorithm.
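The safeguard described in paragraph [0104] could be as simple as clamping the adaptive gain when the seal is poor. The threshold, gain cap and function names below are invented for this sketch.

```python
# Illustrative ANC safeguard: cap the adaptive gain when the reported
# sealing coefficient indicates a poor seal, so the adaptive filter
# cannot keep raising its gain against an uncancellable leak.
# The 0.3 threshold and the 2.0 gain cap are invented values.

def limited_anc_gain(requested_gain, sealing_coefficient,
                     poor_seal_threshold=0.3, gain_cap=2.0):
    """Return the requested adaptive gain, clamped if the seal is poor."""
    if sealing_coefficient < poor_seal_threshold:
        return min(requested_gain, gain_cap)
    return requested_gain
```

The point of the clamp is that a leaky coupling makes full cancellation impossible, so an uncapped adaptation loop would keep increasing its gain and risk divergence.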

[0105] In some examples implementations of the disclosure could be used to reduce power consumption of the earphones 103. For instance, they can be used to detect the presence or absence of an ear by detecting whether there is a seal or not. The audio content can then be controlled based on whether the earphone 103 is still positioned next to the user's ear.

[0106] Fig. 8 shows example earphones 103 that could be used in examples of the disclosure. The example earphones 103 comprise earbuds 801. Other types of earphones 103 could be used in other examples.

[0107] The earbuds 801 of Fig. 8 comprise a housing 803 and an in ear portion 805. The in ear portion 805 is sized and shaped to fit into the ear of a user. When the earbuds are in use the in ear portion is inserted into the ear of the user.

[0108] The housing 803 can be configured to house an apparatus or any other suitable control means for controlling the earphones 103. In some examples the apparatus could be housed in a different device such as a mobile phone or other personal electronic device. An example apparatus is shown in Fig. 9.

[0109] Fig. 9 schematically illustrates an apparatus 901 that can be used to implement examples of the disclosure. In this example the apparatus 901 comprises a controller 903. The controller 903 can be a chip or a chip-set. In some examples the controller 903 can be provided within any suitable device such as earphones or a device such as a smartphone that can be configured to communicate with the earphones.

[0110] In the example of Fig. 9 the implementation of the controller 903 can be as controller circuitry. In some examples the controller 903 can be implemented in hardware alone, have certain aspects in software including firmware alone or can be a combination of hardware and software (including firmware).

[0111] As illustrated in Fig. 9 the controller 903 can be implemented using instructions that enable hardware functionality, for example, by using executable instructions of a computer program 909 in a general-purpose or special-purpose processor 905 that may be stored on a computer readable storage medium (disk, memory etc.) to be executed by such a processor 905.

[0112] The processor 905 is configured to read from and write to the memory 907. The processor 905 can also comprise an output interface via which data and/or commands are output by the processor 905 and an input interface via which data and/or commands are input to the processor 905.

[0113] The memory 907 stores a computer program 909 comprising computer program instructions (computer program code) that controls the operation of the controller 903 when loaded into the processor 905. The computer program instructions, of the computer program 909, provide the logic and routines that enable the controller 903 to perform the methods illustrated in the accompanying Figs. The processor 905 by reading the memory 907 is able to load and execute the computer program 909.

[0114] The apparatus 901 comprises:

at least one processor 905; and

at least one memory 907 including computer program code 911;

the at least one memory 907 and the computer program code 911 configured to, with the at least one processor 905, cause the apparatus 901 at least to perform:

comparing 201 a microphone signal obtained from at least one microphone located in an earphone with a reference microphone signal obtained using a model of the at least one microphone and at least one speaker so as to obtain a correlation between the obtained microphone signal and the reference microphone signal;

processing 203 the correlation between the obtained microphone signal and the reference microphone signal to determine a quality of sealing between the earphone and a user's ear.



[0115] As illustrated in Fig. 9, the computer program 909 can arrive at the controller 903 via any suitable delivery mechanism 913. The delivery mechanism 913 can be, for example, a machine readable medium, a computer-readable medium, a non-transitory computer-readable storage medium, a computer program product, a memory device, a record medium such as a Compact Disc Read-Only Memory (CD-ROM) or a Digital Versatile Disc (DVD) or a solid-state memory, or an article of manufacture that comprises or tangibly embodies the computer program 909. The delivery mechanism can be a signal configured to reliably transfer the computer program 909. The controller 903 can propagate or transmit the computer program 909 as a computer data signal. In some examples the computer program 909 can be transmitted to the controller 903 using a wireless protocol such as Bluetooth, Bluetooth Low Energy, Bluetooth Smart, 6LoWPAN (IPv6 over low power personal area networks), ZigBee, ANT+, near field communication (NFC), radio frequency identification, wireless local area network (wireless LAN) or any other suitable protocol.

[0116] The computer program 909 comprises computer program instructions for causing an apparatus 901 to perform at least the following or for performing at least the following:

comparing 201 a microphone signal obtained from at least one microphone located in an earphone with a reference microphone signal obtained using a model of the at least one microphone and at least one speaker so as to obtain a correlation between the obtained microphone signal and the reference microphone signal;

processing 203 the correlation between the obtained microphone signal and the reference microphone signal to determine a quality of sealing between the earphone and a user's ear.



[0117] The computer program instructions can be comprised in a computer program 909, a non-transitory computer readable medium, a computer program product, a machine readable medium. In some but not necessarily all examples, the computer program instructions can be distributed over more than one computer program 909.

[0118] Although the memory 907 is illustrated as a single component/circuitry it can be implemented as one or more separate components/circuitry some or all of which can be integrated/removable and/or can provide permanent/semi-permanent/ dynamic/cached storage.

[0119] Although the processor 905 is illustrated as a single component/circuitry it can be implemented as one or more separate components/circuitry some or all of which can be integrated/removable. The processor 905 can be a single core or multi-core processor.

[0120] References to 'computer-readable storage medium', 'computer program product', 'tangibly embodied computer program' etc. or a 'controller', 'computer', 'processor' etc. should be understood to encompass not only computers having different architectures such as single/multi-processor architectures and sequential (Von Neumann)/parallel architectures but also specialized circuits such as field-programmable gate arrays (FPGA), application specific circuits (ASIC), signal processing devices and other processing circuitry. References to computer program, instructions, code etc. should be understood to encompass software for a programmable processor or firmware such as, for example, the programmable content of a hardware device whether instructions for a processor, or configuration settings for a fixed-function device, gate array or programmable logic device etc.

[0121] As used in this application, the term 'circuitry' may refer to one or more or all of the following:
  (a) hardware-only circuitry implementations (such as implementations in only analog and/or digital circuitry) and
  (b) combinations of hardware circuits and software, such as (as applicable):
    (i) a combination of analog and/or digital hardware circuit(s) with software/firmware and
    (ii) any portions of hardware processor(s) with software (including digital signal processor(s)), software, and memory or memories that work together to cause an apparatus, such as a mobile phone or server, to perform various functions and
  (c) hardware circuit(s) and/or processor(s), such as a microprocessor(s) or a portion of a microprocessor(s), that requires software (for example, firmware) for operation, but the software may not be present when it is not needed for operation.
This definition of circuitry applies to all uses of this term in this application, including in any claims. As a further example, as used in this application, the term circuitry also covers an implementation of merely a hardware circuit or processor and its (or their) accompanying software and/or firmware. The term circuitry also covers, for example and if applicable to the particular claim element, a baseband integrated circuit for a mobile device or a similar integrated circuit in a server, a cellular network device, or other computing or network device.

[0122] The blocks illustrated in Fig. 2 can represent steps in a method and/or sections of code in the computer program 909. The illustration of a particular order to the blocks does not necessarily imply that there is a required or preferred order for the blocks and the order and arrangement of the blocks can be varied. Furthermore, it can be possible for some blocks to be omitted.

[0123] The term 'comprise' is used in this document with an inclusive not an exclusive meaning. That is, any reference to X comprising Y indicates that X may comprise only one Y or may comprise more than one Y. If it is intended to use 'comprise' with an exclusive meaning then it will be made clear in the context by referring to "comprising only one..." or by using "consisting".

[0124] In this description, the wording 'connect', 'couple' and 'communication' and their derivatives mean operationally connected/coupled/in communication. It should be appreciated that any number or combination of intervening components can exist (including no intervening components), i.e., so as to provide direct or indirect connection/coupling/communication. Any such intervening components can include hardware and/or software components.

[0125] As used herein, the term "determine/determining" (and grammatical variants thereof) can include, not least: calculating, computing, processing, deriving, measuring, investigating, identifying, looking up (for example, looking up in a table, a database or another data structure), ascertaining and the like. Also, "determining" can include receiving (for example, receiving information), accessing (for example, accessing data in a memory), obtaining and the like. Also, "determine/determining" can include resolving, selecting, choosing, establishing, and the like.

[0126] In this description, reference has been made to various examples. The description of features or functions in relation to an example indicates that those features or functions are present in that example. The use of the term 'example' or 'for example' or 'can' or 'may' in the text denotes, whether explicitly stated or not, that such features or functions are present in at least the described example, whether described as an example or not, and that they can be, but are not necessarily, present in some of or all other examples. Thus 'example', 'for example', 'can' or 'may' refers to a particular instance in a class of examples. A property of the instance can be a property of only that instance or a property of the class or a property of a sub-class of the class that includes some but not all of the instances in the class. It is therefore implicitly disclosed that a feature described with reference to one example but not with reference to another example, can where possible be used in that other example as part of a working combination but does not necessarily have to be used in that other example.

[0127] Although examples have been described in the preceding paragraphs with reference to various examples, it should be appreciated that modifications to the examples given can be made without departing from the scope of the claims.

[0128] Features described in the preceding description may be used in combinations other than the combinations explicitly described above.

[0129] Although functions have been described with reference to certain features, those functions may be performable by other features whether described or not.

[0130] Although features have been described with reference to certain examples, those features may also be present in other examples whether described or not.

[0131] The term 'a', 'an' or 'the' is used in this document with an inclusive not an exclusive meaning. That is, any reference to X comprising a/an/the Y indicates that X may comprise only one Y or may comprise more than one Y unless the context clearly indicates the contrary. If it is intended to use 'a', 'an' or 'the' with an exclusive meaning then it will be made clear in the context. In some circumstances the use of 'at least one' or 'one or more' may be used to emphasize an inclusive meaning but the absence of these terms should not be taken to imply any exclusive meaning.

[0132] The presence of a feature (or combination of features) in a claim is a reference to that feature or (combination of features) itself and also to features that achieve substantially the same technical effect (equivalent features). The equivalent features include, for example, features that are variants and achieve substantially the same result in substantially the same way. The equivalent features include, for example, features that perform substantially the same function, in substantially the same way to achieve substantially the same result.

[0133] In this description, reference has been made to various examples using adjectives or adjectival phrases to describe characteristics of the examples. Such a description of a characteristic in relation to an example indicates that the characteristic is present in some examples exactly as described and is present in other examples substantially as described.

[0134] The above description describes some examples of the present disclosure however those of ordinary skill in the art will be aware of possible alternative structures and method features which offer equivalent functionality to the specific examples of such structures and features described herein above and which for the sake of brevity and clarity have been omitted from the above description. Nonetheless, the above description should be read as implicitly including reference to such alternative structures and method features which provide equivalent functionality unless such alternative structures or method features are explicitly excluded in the above description of the examples of the present disclosure.

[0135] Whilst endeavoring in the foregoing specification to draw attention to those features believed to be of importance it should be understood that the Applicant may seek protection via the claims in respect of any patentable feature or combination of features hereinbefore referred to and/or shown in the drawings whether or not emphasis has been placed thereon.


Claims

1. An apparatus comprising means for:

comparing a microphone signal obtained from at least one microphone located in an earphone with a reference microphone signal obtained using a model of the at least one microphone and at least one speaker so as to obtain a correlation between the obtained microphone signal and the reference microphone signal; and

processing the correlation between the obtained microphone signal and the reference microphone signal to determine a quality of sealing between the earphone and a user's ear.


 
2. An apparatus as claimed in claim 1 wherein the model of the at least one microphone and at least one speaker is trained with the at least one microphone and the at least one speaker not positioned in a user's ear.
 
3. An apparatus as claimed in any preceding claim wherein the model of the at least one microphone and at least one speaker comprises a non-linear model.
 
4. An apparatus as claimed in any preceding claim wherein a machine learning program is used to model the at least one microphone and at least one speaker.
 
5. An apparatus as claimed in any preceding claim wherein the processing of the correlation determines two or more parameters and the two or more parameters are used to determine the quality of sealing between the earphone and the user's ear.
 
6. An apparatus as claimed in any preceding claim wherein the means are also for estimating a sealing coefficient by normalizing the correlation wherein the sealing coefficient is used to determine the quality of sealing between the earphone and the user's ear.
 
7. An apparatus as claimed in any preceding claim wherein the means are also for calculating a proportion of energy in a low frequency range of the obtained microphone signal.
 
8. An apparatus as claimed in any preceding claim wherein the means are also for estimating a correlation coefficient between the obtained microphone signal and the reference microphone signal.
 
9. An apparatus as claimed in any preceding claim wherein the microphone signal obtained from at least one microphone comprises at least one of: external noise, reflections or distortions from at least part of an ear.
 
10. An apparatus as claimed in any preceding claim wherein the earphone is configured to be worn inside a user's ear.
 
11. An apparatus as claimed in any of claims 1 to 10 wherein the earphone is configured to be worn over a user's ear.
 
12. An earphone comprising an apparatus as claimed in any preceding claim.
 
13. A method comprising:

comparing a microphone signal obtained from at least one microphone located in an earphone with a reference microphone signal obtained using a model of the at least one microphone and at least one speaker so as to obtain a correlation between the obtained microphone signal and the reference microphone signal; and

processing the correlation between the obtained microphone signal and the reference microphone signal to determine a quality of sealing between the earphone and a user's ear.


 
14. A method as claimed in claim 13 wherein the model of the at least one microphone and at least one speaker is trained with the at least one microphone and the at least one speaker not positioned in a user's ear.
 
15. A computer program comprising instructions which, when executed by an apparatus, cause the apparatus to perform:

comparing a microphone signal obtained from at least one microphone located in an earphone with a reference microphone signal obtained using a model of the at least one microphone and at least one speaker so as to obtain a correlation between the obtained microphone signal and the reference microphone signal; and

processing the correlation between the obtained microphone signal and the reference microphone signal to determine a quality of sealing between the earphone and a user's ear.


 




Drawing
Search report