TECHNOLOGICAL FIELD
[0001] Examples of the disclosure relate to apparatus, methods and computer programs for
analyzing earphone sealing. Some relate to apparatus, methods and computer programs
for determining a quality of sealing between the earphone and a user's ear.
BACKGROUND
[0002] Earphones are audio output devices that are configured to be worn in or on a user's
ear. When a user is wearing an earphone a seal, or partial seal, can be created between
the earphone and the user's ear. The quality of this seal and/or the air leakage through
this seal can affect the sound level within the user's ear canal. This can affect
measurements made by sensors in the earphones, audio signal quality, pressure levels
within the user's ear and/or other functions of the earphones.
BRIEF SUMMARY
[0003] According to various, but not necessarily all, examples of the disclosure there may
be provided an apparatus comprising means for:
comparing a microphone signal obtained from at least one microphone located in an
earphone with a reference microphone signal obtained using a model of the at least
one microphone and at least one speaker so as to obtain a correlation between the
obtained microphone signal and the reference microphone signal;
processing the correlation between the obtained microphone signal and the reference
microphone signal to determine a quality of sealing between the earphone and a user's
ear.
[0004] The model of the at least one microphone and at least one speaker may be trained
with the at least one microphone and the at least one speaker not positioned in a
user's ear.
[0005] The model of the at least one microphone and at least one speaker may comprise a
non-linear model.
[0006] A machine learning program may be used to model the at least one microphone and at
least one speaker.
[0007] The processing of the correlation may determine two or more parameters and the two
or more parameters may be used to determine the quality of sealing between the earphone
and a user's ear.
[0008] The means may be for estimating a sealing coefficient by normalizing the correlation
wherein the sealing coefficient is used to determine the quality of sealing between
the earphone and the user's ear.
[0009] The means may be for calculating a proportion of energy in a low frequency range
of the obtained microphone signal.
[0010] The means may be for estimating a correlation coefficient between the obtained microphone
signal and the reference microphone signal.
[0011] The microphone signal obtained from at least one microphone may comprise at least
one of: external noise, reflections, or distortions from at least part of an ear.
[0012] The earphone may be configured to be worn inside a user's ear.
[0013] The earphone may be configured to be worn over a user's ear.
[0014] According to various, but not necessarily all, examples of the disclosure there may
be provided an earphone comprising an apparatus as described herein.
[0015] According to various, but not necessarily all, examples of the disclosure there may
be provided a method comprising:
comparing a microphone signal obtained from at least one microphone located in an
earphone with a reference microphone signal obtained using a model of the at least
one microphone and at least one speaker so as to obtain a correlation between the
obtained microphone signal and the reference microphone signal;
processing the correlation between the obtained microphone signal and the reference
microphone signal to determine a quality of sealing between the earphone and a user's
ear.
[0016] According to various, but not necessarily all, examples of the disclosure there may
be provided a computer program comprising instructions which, when executed by an
apparatus, cause the apparatus to perform:
comparing a microphone signal obtained from at least one microphone located in an
earphone with a reference microphone signal obtained using a model of the at least
one microphone and at least one speaker so as to obtain a correlation between the
obtained microphone signal and the reference microphone signal;
processing the correlation between the obtained microphone signal and the reference
microphone signal to determine a quality of sealing between the earphone and a user's
ear.
[0017] While the above examples of the disclosure and optional features are described separately,
it is to be understood that their provision in all possible combinations and permutations
is contained within the disclosure. It is to be understood that various examples of
the disclosure can comprise any or all of the features described in respect of other
examples of the disclosure, and vice versa. Also, it is to be appreciated that any
one or more or all of the features, in any combination, may be implemented by/comprised
in/performable by an apparatus, a method, and/or computer program instructions as
desired, and as appropriate.
BRIEF DESCRIPTION
[0018] Some examples will now be described with reference to the accompanying drawings in
which:
FIG. 1 shows an example system;
FIG. 2 shows an example method;
FIG. 3 shows training of an example model;
FIG. 4 shows deployment of an example model;
FIG. 5 shows results obtained for examples of the disclosure;
FIG. 6 shows results obtained for examples of the disclosure;
FIG. 7 shows results obtained for examples of the disclosure;
FIG. 8 shows example earphones according to examples of the disclosure; and
FIG. 9 shows an apparatus according to examples of the disclosure.
[0019] The figures are not necessarily to scale. Certain features and views of the figures
can be shown schematically or exaggerated in scale in the interest of clarity and
conciseness. For example, the dimensions of some elements in the figures can be exaggerated
relative to other elements to aid explication. Corresponding reference numerals are
used in the figures to designate corresponding features. For clarity, all reference
numerals are not necessarily displayed in all figures.
DETAILED DESCRIPTION
[0020] Earphones can be worn by a user to provide audio signals to a user's ears. The earphones
can be worn in a user's ear, for example the earphones could comprise ear bud devices.
In some examples the earphones could be worn over a user's ear. For instance, they
could comprise headphones or headsets.
[0021] In some examples the earphones can also comprise additional sensors that can be configured
to monitor one or more characteristics of the user. For example, the earphones could
comprise a microphone, vibration sensors, infrared sensors, air pressure sensors,
motion sensors or other suitable types of sensors. These sensors can be used for monitoring
health characteristics of a user, identifying a user, identifying activities or actions
of a user or for any other suitable purpose.
[0022] The earphone can form a seal with the user's ear. The quality of this seal and/or
the air leakage through this seal can affect the sound level within the user's ear
canal. This can affect measurements made by sensors in the earphones, audio signal
quality, pressure levels within the user's ear and/or other functions of the earphones.
It is therefore useful to determine the quality of the seal between the earphone and
the user's ear.
[0023] Examples of the disclosure provide a reliable method for determining the quality
of the seal.
[0024] Fig. 1 schematically shows an example system 101 that can be used to implement some
examples of the disclosure. The system 101 comprises an earphone 103 and an apparatus
105. Only components that are referred to in the following description are shown in
Fig. 1. The system 101 could comprise other components in other examples.
[0025] The earphone 103 can be a device that is configured to be worn by a user 111. The
earphone 103 can be configured to be worn in or over a user's ear. This can enable
the earphone 103 to be used to provide acoustic signals to the user 111. For example,
the earphone 103 can be used for the playback of audio content.
[0026] The earphone 103 comprises at least one speaker 107 and at least one microphone 109.
The earphone 103 could comprise additional components in some examples.
[0027] The speakers 107 can comprise any means that can be configured to generate an acoustic
signal. The speakers 107 can be configured to convert an electrical input signal into
an output acoustic signal. The speakers 107 can be positioned within the earphone
103 so that when a user 111 is using the earphone the speaker 107 is positioned in,
or close to, the user's ear.
[0028] The microphones 109 can comprise any means that can be configured to detect acoustic
signals. The microphones 109 can be configured to detect acoustic signals and convert
the acoustic signals into an output electric signal. The microphones 109 therefore
provide microphone signals as an output. The microphone signals can comprise audio
signals.
[0029] The microphones 109 can be positioned within the earphones so that they can detect
acoustic signals generated by the speaker 107 and/or reflections and reverberations
of the acoustic signals.
[0030] In the system 101 of Fig. 1 the earphone 103 and the apparatus 105 are shown as separate
devices. For example, the apparatus 105 could be provided within a different device
to the earphone 103. Example devices that could comprise the apparatus 105 could be
a personal device, such as a mobile phone, belonging to the user 111 or any other
suitable device. The apparatus 105 can be configured to communicate with the earphone
via a wireless communication link or by any other suitable means.
[0031] In some examples the apparatus 105 could be comprised within the earphone. For example,
the earphone could be part of a headset or other wearable device that comprises an
apparatus 105.
[0032] The apparatus 105 can comprise a controller comprising a processor and memory. Examples
of an apparatus 105 are shown in Fig. 9. The apparatus 105 can be configured to process
microphone signals or to perform any other suitable function.
[0033] As mentioned above the system 101 could comprise additional components that are not
shown in Fig. 1 in some examples of the disclosure. For instance, the system 101 could
comprise one or more sensors within the earphone 103. The sensors could comprise any
means that are configured to detect a physical characteristic of a user 111. The sensors
could be located in any suitable position within the earphone 103.
[0034] In some examples the sensors could be positioned so that, when the earphone 103 is
in use, the sensors are located within the ear of the user 111. The sensors could
comprise microphones, vibration sensors, infrared sensors, air pressure sensors, motion
sensors or other suitable types of sensors. These sensors can be used for monitoring
health characteristics of a user, identifying a user, identifying activities or actions
of a user, or for any other suitable purpose.
[0035] In the example of Fig. 1 only one earphone 103 is shown. In some examples there could
be two earphones 103 so that one is provided for each ear of the user 111.
[0036] Fig. 2 shows an example method. The method could be implemented using the system
of Fig. 1 or any other suitable system.
[0037] The method comprises, at block 201, comparing a microphone signal obtained from at
least one microphone with a reference microphone signal. The microphone signal can
be obtained from a microphone 109 within an earphone 103. The reference microphone
signal can be obtained using a model of the microphone 109 and the speaker 107. The
reference microphone signal is an estimated microphone signal rather than one that
has been captured using a microphone.
[0038] The microphone that is used to obtain the microphone signal can be positioned within
the earphone so that it is facing inwards. The microphone 109 could be an in-ear microphone
109. The microphone signal obtained from the microphone 109 comprises external noise,
and also reflections and/or distortions from at least part of an ear. The amount of
external noise in the microphone signal will be increased if the seal is of poor quality.
The part of the ear that causes the reflections and/or distortions will be determined
by the type of earphones used and the position of the microphones 109 within the earphone
103. If the earphone 103 is an in ear device, then the reflections and distortions
will come from the user's ear canal while if the earphone is an over ear device the
reflections and distortions could come from other parts of the ear.
[0039] The model that is used for the reference microphone signal can be a pre-trained model.
The model can be trained using a microphone 109 and a speaker 107 that are not positioned
in a user's ear.
[0040] The model can comprise a non-linear model. In some examples the model can comprise
a machine learning program such as a neural network, or any other suitable type of
machine learning program.
[0041] In some examples a correlation between the obtained microphone signal and the reference
microphone signal can be used to obtain a difference between the obtained microphone
signal and the reference microphone signal. The correlation gives an indication of
the differences between the respective signals. These differences can be due to leakage
of external noise into the microphone 109 and reflections and reverberations of the
acoustic signal within the user's ear.
[0042] At block 203 the method comprises processing the correlation between the obtained
microphone signal and the reference microphone signal to determine a quality of sealing
between the earphone and a user's ear. The amount of leakage of external noise into
the microphone 109 and the relative levels of reflections and reverberations are affected
by the quality of the seal between the earphone 103 and the user's ear. Therefore,
by determining and analysing these components of the signal a quality level of the
seal can be determined.
[0043] Any suitable methods can be used to process the correlation between the obtained
microphone signal and the reference microphone signal. In some examples the process
could comprise determining two or more parameters. The parameters could comprise a
measurement of a physical characteristic of the response of the user's ear to the
acoustic signal. The parameters could then be used to determine the quality of sealing
between the earphone and a user's ear.
[0044] In some examples the processing of the correlation could comprise estimating a sealing
coefficient by normalizing the correlation. In some examples the processing of the
correlation could comprise calculating a proportion of energy in a low frequency range.
This could be calculated as a ratio of low frequency energy to all frequency energy
or could be calculated using any other suitable method. In some examples the processing
of the correlation could comprise estimating a correlation coefficient such as a Pearson
coefficient or any other suitable coefficient.
[0045] In some examples multiple parameters can be determined and used in combination to
determine the quality of sealing between the earphone and a user's ear. Using multiple
different parameters can provide a more reliable estimate of the quality of the seal.
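As an illustration of combining multiple parameters, the sketch below fuses three normalized seal parameters into a single quality label. The parameter names follow the description above, but the fusion rule (a simple average) and the threshold are hypothetical choices for illustration only; the disclosure does not fix a particular combination method.

```python
# Illustrative sketch only: combines several seal-quality parameters into a
# single decision. The averaging rule and the 0.5 threshold are hypothetical,
# not values taken from the disclosure.

def seal_quality(sealing_coeff: float, low_freq_ratio: float,
                 corr_coeff: float, threshold: float = 0.5) -> str:
    """Fuse parameters (assumed normalized to [0, 1]) into a coarse label."""
    score = (sealing_coeff + low_freq_ratio + corr_coeff) / 3.0
    return "good seal" if score >= threshold else "poor seal"

print(seal_quality(0.9, 0.8, 0.85))  # → good seal
print(seal_quality(0.1, 0.2, 0.15))  # → poor seal
```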
[0046] Fig. 3 shows training of an example model 303. The model 303 can be a model of the
speaker 107 and the microphone 109. The model 303 can account for non-linearities
within the system of the speaker 107 and the microphone 109. The model 303 can be
trained with the speaker 107 and the microphone 109 in open air so that they are not
positioned within a user's ear during training. In this case there is no seal.
[0047] The model 303 can be trained to predict an output microphone signal for a given input
audio signal. The model 303 can be trained to account for nonlinearities of the speaker
107 and/or the microphone 109. For example, the model 303 can account for intermodulation
distortions. The intermodulation distortions are multi-tone distortion products of
two or more signals being present in the output of the non-linear speaker 107. They
arise due to the non-linearity of the speaker and the interaction of the respective
frequency components with each other. The effect of intermodulation distortions can be particularly pronounced
for devices such as earbuds which have to be small enough to fit into a user's ear
and so only have space for a small speaker. The small speakers have limitations on
the movement of the cone and so can show higher non-linear behaviours than bigger
speakers.
[0048] The model 303 could comprise a machine learning program. The machine learning program
can comprise a neural network or any other suitable type of trainable model. The term
"machine learning program" refers to any kind of artificial intelligence (AI), intelligent
or other method that is trainable or tunable using data. The machine learning program
can comprise a computer program. The machine learning program can be trained to perform
a task, such as obtaining a reference microphone signal for a given input to a speaker
107, without being explicitly programmed to perform that task. The machine learning
program can be configured to learn from experience E with respect to some class of
tasks T and performance measure P if its performance at tasks in T, as measured by
P, improves with experience E. In these examples the machine learning program can
often learn from reference data to make estimations on future data. The machine learning
program can also be a trainable computer program. Other types of machine learning
programs could be used in other examples.
[0049] The training of the machine learning program can be performed by any suitable apparatus
or system. For example, the machine learning program could be trained by a system
or other apparatus that has a high processing capacity. In some examples the machine
learning program could be trained by a system comprising one or more graphical processing
units (GPUs) or any other suitable type of processor.
[0050] In the example training process shown in Fig. 3 an audio signal 301 is provided as
an input to the model 303. The audio signal 301 is denoted as x[n] in Fig. 3. The
model 303 provides a reference microphone signal as an output. The reference microphone
signal is denoted fθ(.) in Fig. 3.
[0051] The audio signal 301 can comprise any suitable type of audio content. In some examples
the audio signals could comprise sound signals with varying magnitude. The sound signals
could comprise different languages or other types of sound signals. The different
languages could be Turkish, English, Italian, Arabic, or any other suitable languages.
[0052] Before the model 303 is trained it can be configured in an initial state. In the
initial state the weights or other parameters of the model 303 can be set to any suitable
value. In some examples the initial weights of the model 303 can be set to unity.
[0053] The audio signal 301 is also provided to a digital to analogue converter 305 to generate
the analogue audio signal x̃(t). The analogue audio signal x̃(t) is then provided as
an input to a speaker 107.
[0054] The speaker 107 is configured to convert an input electrical signal into an output
acoustic signal 307. In this example the acoustic signal 307 comprises a pressure
wave that could be detected by a user's ear. The output signal can be any suitable
acoustic signal. The output acoustic signal could comprise content that a user might
be listening to such as music or people talking or any other suitable type of audio.
The output acoustic signal 307 is denoted ỹ(t) in this example.
[0055] To enable training of the model 303 one or more microphones 109 are configured to
detect the acoustic signal 307 that is output from the speaker 107. The microphone
109 can be an inward facing microphone so that when the earphone 103 is in use the
microphone 109 would be facing inwards to the user's ear canal. The one or more microphones
109 can comprise any means that can be configured to convert an acoustic input signal
into a corresponding electrical output signal. The microphones 109 can be part of
an earphone 103.
[0056] The output microphone signal from the microphone 109 is provided to an analogue to
digital converter 309. The analogue to digital converter 309 is configured to convert
the analogue output from the microphone 109 into a digital signal. The digital microphone
signal is denoted y[n] in Fig. 3. This provides an obtained microphone signal that can be compared
to the reference microphone signal of the model 303.
[0057] For the purposes of training the model 303 the output of the microphone 109 is the
ground truth and the audio signal 301 is the stimuli.
[0058] The comparison between the predicted microphone signal fθ(.) from the model 303 and
the obtained output of the microphones 109 enables the output of the model 303 to
be evaluated relative to the actual output of the microphone 109. The comparison provides
an error signal 311 as an output. The error signal 311 can comprise an error value
or other indication of the difference between the estimated microphone output and
the actual microphone output.
[0059] The error value or other indication can be provided as an input to the model 303.
One or more weights or other parameters of the model 303 can be adjusted in response
to the error input. For example, if the model 303 comprises a neural network one or
more weights of the neural network can be adjusted. The weights or other parameters
can be updated to decrease the loss between the output of the model 303 and the output
of the microphone 109.
[0060] The process shown in Fig. 3 can be repeated until the outputs of the model 303 have
sufficiently converged with the output of the microphone 109. The outputs of the model
303 can be determined to have converged with the output of the microphone if the error
value is below a given threshold or if any other suitable criteria has been satisfied.
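The repeat-until-convergence training loop described above can be sketched as follows. The model_step callback, error threshold, and iteration cap are hypothetical placeholders; any error measure and convergence criterion could be used.

```python
# Illustrative sketch of training until the error falls below a threshold.
# model_step is assumed to perform one weight update and return the current
# error value; threshold and max_iters are hypothetical choices.

def train_until_converged(model_step, error_threshold=1e-4, max_iters=10000):
    """Repeat weight updates until convergence or the iteration cap."""
    err = float("inf")
    for i in range(max_iters):
        err = model_step()
        if err < error_threshold:
            return i, err        # converged: error below threshold
    return max_iters, err        # stopped at the iteration cap
```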
[0061] Once the outputs of the model 303 have been determined to be converged the model
303 is considered to be trained. The trained model 303 can then be used in systems
such as the system 101 of Fig. 1 and used to determine a quality of a seal between
a user's ear and an earphone 103.
[0062] Any suitable functions can be used in the model 303 to approximate the non-linearities
of the speaker 107 and/or microphone 109. In some examples a Volterra filter can be
used. It can be assumed that the Volterra kernels have a finite memory length N. Cubic
and higher-order terms can be ignored for computational ease because the effect of
the higher-order distortions is smaller. The input-output relation of the speaker 107
and the microphone 109 can be represented by:

y[n] = Σ_{k1=0}^{N-1} h1[k1] x[n-k1] + Σ_{k1=0}^{N-1} Σ_{k2=0}^{N-1} h2[k1, k2] x[n-k1] x[n-k2]

[0063] where x[n] is the input signal, y[n] is the output signal, and h1[k1] and h2[k1, k2]
are the first- and second-order discrete Volterra kernels. This equation can also
be written as:

y[n] = w1ᵀ x1[n] + w2ᵀ x2[n]

where:

x1[n] = [x[n], x[n-1], ..., x[n-N+1]]ᵀ
x2[n] = [x[n]x[n], x[n]x[n-1], ..., x[n-N+1]x[n-N+1]]ᵀ

[0064] and w1 and w2 are weight vectors containing the kernel coefficients. The filter coefficients
can be obtained using a normalised least mean square (NLMS) algorithm. The update rule
for the weights w can be written as:

w(n+1) = w(n) + (μ / (η + x(n)ᵀx(n))) e(n) x(n)

[0065] where:

w(n) = [w1(n)ᵀ, w2(n)ᵀ]ᵀ

[0066] and

x(n) = [x1[n]ᵀ, x2[n]ᵀ]ᵀ

[0067] and μ is the step size, η is a small positive number to avoid division by zero, and
e(n) is the error between the output of the model 303 and the output of the microphone
109. Other types of functions could be used in other examples.
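A minimal sketch of the second-order Volterra identification described above, using NumPy and an NLMS weight update. The memory length, step size, regularizer, and number of passes over the data are illustrative assumptions, not values from the disclosure.

```python
import numpy as np

def volterra_features(x, n, N):
    """Build the stacked first- and second-order input vector at time n."""
    x1 = np.array([x[n - k] if n - k >= 0 else 0.0 for k in range(N)])
    # Unique second-order products x[n-k1]*x[n-k2] (upper triangle of outer product).
    x2 = np.outer(x1, x1)[np.triu_indices(N)]
    return np.concatenate([x1, x2])

def nlms_volterra(x, y, N=4, mu=0.5, eta=1e-6, passes=30):
    """Identify a second-order Volterra model with the NLMS update rule."""
    dim = N + N * (N + 1) // 2
    w = np.zeros(dim)
    for _ in range(passes):
        for n in range(len(x)):
            u = volterra_features(x, n, N)
            e = y[n] - w @ u                  # error against the measured output
            w += mu * e * u / (eta + u @ u)   # normalised LMS weight update
    return w
```

A known second-order system (for example y[n] = 0.5 x[n] + 0.2 x[n]²) can be used to check that the learned weights reproduce the output.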
[0068] The training of the model 303 can be performed offline. The training of the model
303 only needs to be performed once. For example, the training can be performed when
a new earphone 103 is designed and built. The training of the model 303 only needs to
be performed for a new type of earphone 103 and not for all units of the earphone
103 that are manufactured. The trained model 303 can then be provided to the relevant
devices for deployment.
[0069] Fig. 4 shows deployment of an example trained model 403. The trained model 403 can
be deployed after the training of the model 303 from Fig. 3 has been completed. The
training could be the process of Fig. 3, or any other suitable training process.
[0070] When the trained model 403 is deployed the earphone 103 comprising the speaker 107
and the microphone 109 is located in or over a user's ear. This means that the microphone
109 will detect leaked external audio and also reflections and distortions of the
acoustic signals from the space created by the user's ear and the earphone 103. For
example, if the earphone 103 is an in ear phone the reflections and distortions can
be from the ear canal that is, at least partly blocked, by the earphone 103.
[0071] In the example deployment shown in Fig. 4 an audio signal 401 is provided as an input
to the trained model 403. The audio signal 401 is denoted as
x[n] in Fig. 4. The audio signal 401 can comprise any suitable type of audio. The audio
signal 401 does not need to be the same as the audio signal 301 that is used for training
the model 303. The audio signal 401 does not need to be a reference signal or contain
any specific parameters. The audio signal 401 can comprise audio content that a user
is listening to, for example it could comprise speech or music or other types of content.
[0072] The trained model 403 provides a reference microphone signal fθ(.) as an output. The
reference microphone signal fθ(.) is an estimation of the signal that should be expected
from a microphone 109.
[0073] The audio signal 401 is also provided to a digital to analogue converter 305 to generate
the analogue audio signal x̃(t). The analogue audio signal x̃(t) is then provided as
an input to the speaker 107.
[0074] The speaker 107 is configured to convert an input electrical signal into an output
acoustic signal 407. The output acoustic signal 407 is denoted ỹ(t) in this example.
The output acoustic signal 407 provides audio content that a user can hear. The output
acoustic signal 407 is dependent upon the audio signal 401. The acoustic signal 407
can then be captured by the microphone 109.
[0075] The microphone 109 captures the acoustic signal 407 and provides a microphone signal
as an output in response to the captured acoustic signal 407. The microphone signal
is provided to an analogue to digital converter 309. The analogue to digital converter
309 is configured to convert the analogue output from the microphone 109 into a digital
signal. This provides an obtained microphone signal.
[0076] The obtained microphone signal can be expressed as:

y[n] = Λ(x[n]) + α r[n] + β v[n]

[0077] where Λ(x[n]) represents the captured waveform of the signal emitted from the speaker
107, r[n] represents the reflections and distortions, and v[n] is the external or
outside noise as measured by the inward-facing microphone 109. In this example the
earphone 103 is an in-ear device and the reflections can come from a user's ear canal
and ear drum. If the earphone 103 is an over the ear device then the reflections would
come from other parts of the user's ear. The distortions can be caused by putting
the speaker 107 under pressure.
[0078] The obtained microphone signal y[n] can be compared to the reference microphone signal
fθ(.) to determine the amount of reflections and external noise 409 in the obtained
microphone signal y[n]. This can give an indication of the quality of the seal between
the earphone and the user's ear.
[0079] Any suitable processing can be used to determine the amount of reflections and external
noise in the obtained microphone signal y[n]. In some examples one or more parameters
indicative of the amount of reflections and noise in the obtained microphone signal
y[n] can be calculated and used to provide an indication of the quality of the seal.
[0080] When the quality of the seal between the earphone 103 and the user's ear is poor
the air leakage will be high. This will result in fewer distortions and reflections,
and therefore in a smaller value for α in the equation for y[n]. Similarly, if the
seal is poor this will increase the amount of external noise that is detected by an
inward facing microphone 109. This will result in a higher value for β in the equation
for y[n]. Therefore, the ratio between α and β gives information about the quality
of the seal between the earphone 103 and the user's ear. Separating α and β from the
obtained microphone signal is not trivial because x[n] and r[n] are correlated.
[0081] The obtained microphone signal y[n] can be compared to the reference microphone signal
fθ(.) from the trained model 403. Subtracting the reference microphone signal fθ(.)
from the obtained microphone signal y[n] gives the resultant signal ỹ[n], which can
be denoted as:

ỹ[n] = y[n] - fθ(x[n]) = e[n] + α r[n] + β v[n]
[0082] where e[n] is the error between the two nonlinear functions fθ(.) and Λ(.). That is,
e[n] gives the error between the trained model 403 of the speaker 107 and microphone
109 and the actual performance of the speaker 107 and microphone 109. The trained
model 403 can be assumed to be a good approximation of the speaker 107 and microphone
109, so the error signal can be ignored. This results in:

ỹ[n] ≈ α r[n] + β v[n]
[0083] The correlation between the resultant signal ỹ[n] and the reference microphone signal
fθ(.) is:

E[ỹ[n] fθ(x[n])] = α E[r[n] fθ(x[n])] + β E[v[n] fθ(x[n])]
[0084] The external noise that is captured by the inward facing microphone 109 will be uncorrelated
with the output of the speaker 107 and the trained model 403 that represents the speaker
107. This means that the term relating to the correlation with the external noise
can be ignored so that:

E[ỹ[n] fθ(x[n])] ≈ α E[r[n] fθ(x[n])]
[0085] This gives the overall reflections and distortions from the user's ear that are captured
by the microphone 109. This can be used to estimate a sealing coefficient K. The sealing
coefficient K can be obtained by normalizing the correlation to give:

K = ( Σ_n ỹ[n] fθ(x[n]) ) / ( Σ_n |fθ(x[n])|² )
[0086] The obtained sealing coefficient K gives information about the overall magnitude of
reflections. Lower values of K indicate a lower sealing quality. For example, a value
of K that is close to zero indicates that there is poor sealing and few reflections,
while a value of K that is close to one indicates that there is good sealing and a
larger amount of reflections.
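A minimal sketch of estimating the sealing coefficient K from the obtained and reference microphone signals. Because the exact normalization is not reproduced here, the sketch assumes one plausible magnitude-based normalization (the correlation divided by the reference signal energy); the disclosure does not fix this exact formula.

```python
import numpy as np

def sealing_coefficient(y_obtained, f_ref):
    """Estimate a sealing coefficient K from the obtained microphone signal
    and the model's reference microphone signal.

    The normalization below (correlation divided by the reference signal
    energy) is an assumed, plausible magnitude-based choice.
    """
    y_res = y_obtained - f_ref        # resultant signal ≈ alpha*r + beta*v
    corr = np.sum(y_res * f_ref)      # correlation with the reference signal
    return corr / np.sum(f_ref ** 2)  # magnitude-based normalization
```

With a simulated reflection correlated with the playback and uncorrelated external noise, a well-sealed case yields a larger K than a leaky one.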
[0087] The sealing coefficient K is obtained using a magnitude-based normalization. This
can provide a more reliable discriminator between poor sealing and good sealing compared
to normalizations where the autocorrelations at zero lag equal 1. This is mainly because
the amplitude of the input signal also changes the magnitude of reflections and distortions.
If a normalization using zero lag is used, for example a Pearson correlation coefficient,
then the magnitude of the input signal does not affect the resulting sealing coefficient
K, which does not give a correct indication of the amount of vibrations in the obtained
microphone signal and therefore would not give a good indication of the sealing quality.
[0088] The reflections r[n] that come from the ear canal and eardrum or other parts of the
user's ear are a function of frequency. The function will change from person to person.
However, the overall pattern of the reflections will be the same for different users
because of the occlusion effect, which is an enhancement of low-frequency components
of sounds in an occluded ear canal. Therefore, in addition to the use of the sealing
coefficient K, other parameters can be used. The other parameters can be selected
so as to address the dependency on the user for estimating the sealing quality.
[0089] A good seal between the earphone 103 and the user's ear will leave traces in the
low frequency bands of the obtained microphone signals due to the occlusion effect.
[0090] These traces in the low frequency bands can preserve some structure of the motion
or body signals of the user 111. For example, they can preserve the structure of heartbeats,
chewing motions, walking motions or other suitable body signals. These low frequency
traces can be distinguished from the acoustic signal played back by the speaker 107
and detected by the microphone 109 and analysed to give further parameters that can
be used as an indication of the quality of the seal.
[0091] In some examples a further parameter that can be used as an indication of the quality of the seal could be a proportion of energy in a low frequency range. Any suitable means or process can be used to determine the proportion of energy in the low frequency range. In some examples the proportion of energy in a low frequency range can be determined by transforming the resultant signal ỹ[n] to obtain energy levels for different frequency points. A fast Fourier transform, or any other suitable type of transform, can be used to transform the signal. The sum of energies at low frequencies can then be divided by the sum of the energies across all frequency ranges. In some examples the low frequency cut off could be 50Hz so that the ratio is given as:

K_LP = Σ_{k ≤ 50 Hz} |Ỹ[k]|² / Σ_k |Ỹ[k]|²

where Ỹ[k] is the FFT of the resultant signal ỹ[n]. The FFT can use any suitable resolution. In some examples the FFT could use 1Hz resolution.
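The low-frequency energy ratio described above can be sketched as follows; the function name and the use of NumPy's real FFT are assumptions, and a one-second frame gives the 1 Hz resolution mentioned in the text:

```python
import numpy as np

def low_frequency_ratio(y, sample_rate, cutoff_hz=50.0):
    """Proportion of the energy of y at or below cutoff_hz.

    Transforms the resultant signal with an FFT, then divides the
    sum of energies at low frequencies by the sum of energies
    across all frequencies, as described in the text.
    """
    spectrum = np.fft.rfft(y)
    energy = np.abs(spectrum) ** 2
    freqs = np.fft.rfftfreq(len(y), d=1.0 / sample_rate)
    return energy[freqs <= cutoff_hz].sum() / energy.sum()

fs = 1000
t = np.arange(fs) / fs                  # one second -> 1 Hz FFT bins
sealed = np.sin(2 * np.pi * 10 * t)     # low-frequency-dominated signal
unsealed = np.sin(2 * np.pi * 200 * t)  # energy above the 50 Hz cut off
```

With these synthetic signals the ratio is near 1 for the low-frequency-dominated case and near 0 when the energy sits above the cut off, mirroring the good-seal versus poor-seal behaviour described in the next paragraph.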
[0092] The ratio K_LP, or any other suitable parameter, gives information about the low-frequency boost caused by a seal. The ratio K_LP will be higher if there is a good seal between the user's ear and the earphone 103 and will be lower if there is a poor seal between the user's ear and the earphone 103. This enables the ratio K_LP to be used as another parameter that indicates the quality of the seal.
[0093] In some examples external artefacts in the ear canal, or other parts of the user's ear, can be removed before the ratio K_LP is used as a discriminator. The external artefacts can originate from the user's body. For example, they could be caused by walking, heartbeats, chewing or any other suitable factor. The artefacts can be conducted to the user's ear canal through bone conduction and amplified within the same low frequency region. If the seal is poor then the artefacts could be caused by external sounds.
[0094] Any suitable means or process can be used to discriminate the artefacts from an increase in the low frequency energy levels caused by the seal. In some examples the output reference signal from the trained model 403 can be used. If the trained model 403 has been trained with the speaker 107 and the microphone 109 in open air, so that they are not positioned within a user's ear, then the output of the trained model 403 represents the signal captured by the microphone 109 when there is no sealing. Therefore a correlation coefficient between the output of the trained model 403 and the signal captured by the microphone 109 can provide another parameter that can be used to estimate the sealing quality.
[0095] Any suitable correlation coefficient can be used. In some examples a low pass filter can be applied to both the output of the trained model 403 and the resultant signal ỹ[n], and then a Pearson coefficient, or other suitable coefficient, can be calculated between them.
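As a sketch of the discriminator in the preceding two paragraphs, both signals can be low-pass filtered and a Pearson coefficient taken between them; the FFT-based filter and the function names here are illustrative assumptions rather than the specific filter used in the disclosure:

```python
import numpy as np

def lowpass(x, sample_rate, cutoff_hz=50.0):
    """Zero out FFT bins above cutoff_hz (one simple low pass filter)."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sample_rate)
    spectrum[freqs > cutoff_hz] = 0.0
    return np.fft.irfft(spectrum, n=len(x))

def low_frequency_correlation(model_output, resultant, sample_rate):
    """Pearson coefficient between the low-pass filtered signals."""
    a = lowpass(model_output, sample_rate)
    b = lowpass(resultant, sample_rate)
    return np.corrcoef(a, b)[0, 1]

fs = 1000
t = np.arange(fs) / fs
base = np.sin(2 * np.pi * 10 * t)
# The filter removes high-frequency differences, so signals that
# share the same low-frequency content correlate strongly.
open_air = base + 0.5 * np.sin(2 * np.pi * 300 * t)
captured = base + 0.5 * np.sin(2 * np.pi * 400 * t)
```

A high coefficient would indicate that the captured signal resembles the open-air model output (poor sealing), while a low coefficient would indicate that the seal has reshaped the low-frequency content.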
[0096] In some examples two or more of the parameters can be used to determine the quality
of the seal. Using more than one parameter can provide a more robust indication of
the quality of the seal. Other parameters and/or combinations of parameters could
be used in other examples.
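One hedged sketch of how two or more parameters might be combined is a nearest-centroid rule over a feature vector of the three parameters discussed above; the centroid values below are invented for illustration and would in practice be derived from measured data such as that shown in Figs. 5 to 7:

```python
import numpy as np

# Hypothetical centroids for (magnitude-based correlation,
# low-frequency correlation with the open-air model output,
# low-frequency energy ratio). Values are illustrative only.
CENTROIDS = {
    "good":    np.array([0.90, 0.20, 0.80]),
    "average": np.array([0.60, 0.50, 0.50]),
    "bad":     np.array([0.20, 0.90, 0.10]),
}

def classify_seal(k_mag, k_lowcorr, k_lp):
    """Label the seal by the nearest centroid in parameter space."""
    features = np.array([k_mag, k_lowcorr, k_lp])
    return min(CENTROIDS, key=lambda lbl: np.linalg.norm(features - CENTROIDS[lbl]))
```

Combining the parameters this way is one simple option; any classifier over the feature vector, discrete or continuous, could serve the same purpose.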
[0097] Figs. 5, 6 and 7 show results obtained for examples of the disclosure. These results
were obtained from experiments performed using twelve participants. The twelve participants
included two females and ten males with an age range from 24 to 50. The participants
were informed about the goal of the experiments and asked to perform several tasks
in their natural way. The experiments were performed in an office environment while
the participants wore an earbud prototype at the usual position.
[0098] The participants positioned the earbud for good, average and bad sealing cases according to the mean reduction of six frequencies. For the respective earbud positions (Good, Average and Bad), each participant was asked to repeat all the situations twice, once in a quiet environment and once with external noise provided by a speaker.
[0099] Three different situations were used to represent daily life usage of earbuds. The
different situations used for these experiments were running, talking and sitting.
For talking, each participant was asked to speak during the recording in their normal
tone. Overall, six different cases (three situations in two different environments)
were performed by the subjects. In total two minutes of audio data from different
users were collected and used for overall evaluation. The system used does not have
a user registration or user-specific training and so all recorded data was used for
the evaluation. Figs. 5 to 7 represent all data points collected during the study.
[0100] Figs. 5 and 6 show a three-dimensional plot of the results obtained from the experiments conducted. Figs. 5 and 6 show the same plot from two different perspectives. The different axes in the plot show the different parameters for indicating seal quality as described above. The x axis shows a magnitude-based correlation, the y axis shows a low frequency correlation, and the z axis shows a low frequency ratio. The different symbols used for the points indicate whether the seal was good, bad or average. Figs. 5 and 6 show that good, bad and average regions can be clearly defined using the respective parameters.
[0101] Fig. 7 shows the t-distributed stochastic neighbour embedding of extracted features using implementations of the disclosure. This shows that the above-mentioned parameters are discriminative for the sealing quality. Also, the overlapping regions between Good-Average and Average-Bad show that continuous, rather than merely discrete, sealing quality estimation is crucial for earphones 103 because it is much more informative than a simple indication of good or bad.
[0102] Examples of the disclosure could be used in different use cases and scenarios. Examples
of the disclosure could be used for long-term continuous monitoring of ear canal sealing
with ear buds. In such use cases the examples of the disclosure enable a measurement
of the air leakage level without using a test or reference signal. This continuous
monitoring will therefore not be intrusive because it will not interfere with audio
content that the user is listening to.
[0103] Examples of the disclosure can also be used to provide information about the quality
of the seal to other applications. The other applications could be non-audio applications
such as face recognition, user authentication, respiration rate monitoring, activity
recognition, or any other suitable type of application that uses sensors in an earphone
103. Having information about the quality of the seal can enable the applications
to control which sensors or sensor combinations are used for the relevant applications.
For example, a respiration rate monitoring application can use any of the inertial
measurement unit, Photoplethysmography sensors, and speaker-microphone pairs together
or separately. If implementations of the disclosure are used the respiration rate
monitoring application can select which of these to use based on the sealing quality
and how this affects the respective sensors.
[0104] In some cases implementations of the disclosure can be used to provide feedback to
active noise cancellation applications to prevent divergence. Active noise cancellation
functions to cancel or remove unwanted noise by introducing an additional, electronically
controlled sound field referred to as anti-noise. The anti-noise is electronically
designed to have the proper pressure amplitude and phase that destructively interferes
with the unwanted noise or disturbance. If the active noise cancellation algorithms
are used in scenarios with high leakage then the adaptive filters will increase the
gain more and more so as to cancel the noise, leading to a possible divergence. This
can be prevented by providing feedback about the seal quality to the active noise
cancellation algorithm.
[0105] In some examples implementations of the disclosure could be used to reduce power consumption of the earphones 103. For instance, they can be used to detect the presence or absence of an ear by detecting whether or not there is a seal. The audio content can then be controlled based on whether the earphone 103 is still positioned next to the user's ear.
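As a minimal sketch of this use case (the player class, threshold value and method names are all illustrative assumptions, not part of the disclosure):

```python
class Player:
    """Minimal stand-in for an audio player (illustrative only)."""
    def __init__(self):
        self.playing = True

    def pause(self):
        self.playing = False

    def resume(self):
        self.playing = True

def update_playback(player, seal_quality, threshold=0.3):
    """Pause when a very low sealing quality suggests ear absence."""
    if seal_quality < threshold:
        player.pause()   # earphone likely removed: save power
    else:
        player.resume()  # seal detected: earphone at the ear

player = Player()
update_playback(player, 0.05)  # poor seal -> playback paused
```

In a real product the threshold, and any hysteresis around it, would be tuned so that brief dips in seal quality do not interrupt playback.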
[0106] Fig. 8 shows example earphones 103 that could be used in examples of the disclosure.
The example earphones 103 comprise earbuds 801. Other types of earphones 103 could
be used in other examples.
[0107] The earbuds 801 of Fig. 8 comprise a housing 803 and an in-ear portion 805. The in-ear portion 805 is sized and shaped to fit into the ear of a user. When the earbuds 801 are in use the in-ear portion 805 is inserted into the ear of the user.
[0108] The housing 803 can be configured to house an apparatus or any other suitable control
means for controlling the earphones 103. In some examples the apparatus could be housed
in a different device such as a mobile phone or other personal electronic device.
An example apparatus is shown in Fig. 9.
[0109] Fig. 9 schematically illustrates an apparatus 901 that can be used to implement examples
of the disclosure. In this example the apparatus 901 comprises a controller 903. The
controller 903 can be a chip or a chip-set. In some examples the controller 903 can
be provided within any suitable device such as earphones or a device such as a smartphone
that can be configured to communicate with the earphones.
[0110] In the example of Fig. 9 the implementation of the controller 903 can be as controller circuitry. The controller 903 can be implemented in hardware alone, can have certain aspects in software including firmware alone, or can be a combination of hardware and software (including firmware).
[0111] As illustrated in Fig. 9 the controller 903 can be implemented using instructions that enable hardware functionality, for example, by using executable instructions of a computer program 909 that may be stored on a computer readable storage medium (disk, memory etc.) to be executed by a general-purpose or special-purpose processor 905.
[0112] The processor 905 is configured to read from and write to the memory 907. The processor
905 can also comprise an output interface via which data and/or commands are output
by the processor 905 and an input interface via which data and/or commands are input
to the processor 905.
[0113] The memory 907 stores a computer program 909 comprising computer program instructions (computer program code) that control the operation of the controller 903 when loaded into the processor 905. The computer program instructions, of the computer program 909, provide the logic and routines that enable the controller 903 to perform the methods illustrated in the accompanying Figs. The processor 905, by reading the memory 907, is able to load and execute the computer program 909.
[0114] The apparatus 901 comprises:
at least one processor 905; and
at least one memory 907 including computer program code 911;
the at least one memory 907 and the computer program code 911 configured to, with
the at least one processor 905, cause the apparatus 901 at least to perform:
comparing 201 a microphone signal obtained from at least one microphone located in
an earphone with a reference microphone signal obtained using a model of the at least
one microphone and at least one speaker so as to obtain a correlation between the
obtained microphone signal and the reference microphone signal;
processing 203 the correlation between the obtained microphone signal and the reference
microphone signal to determine a quality of sealing between the earphone and a user's
ear.
[0115] As illustrated in Fig. 9, the computer program 909 can arrive at the controller 903
via any suitable delivery mechanism 913. The delivery mechanism 913 can be, for example,
a machine readable medium, a computer-readable medium, a non-transitory computer-readable
storage medium, a computer program product, a memory device, a record medium such
as a Compact Disc Read-Only Memory (CD-ROM) or a Digital Versatile Disc (DVD) or a
solid-state memory, an article of manufacture that comprises or tangibly embodies
the computer program 909. The delivery mechanism can be a signal configured to reliably
transfer the computer program 909. The controller 903 can propagate or transmit the
computer program 909 as a computer data signal. In some examples the computer program
909 can be transmitted to the controller 903 using a wireless protocol such as Bluetooth, Bluetooth Low Energy, Bluetooth Smart, 6LoWPan (IPv6 over low power personal area networks), ZigBee, ANT+, near field communication (NFC), radio frequency identification (RFID), wireless local area network (wireless LAN) or any other suitable protocol.
[0116] The computer program 909 comprises computer program instructions for causing an apparatus
901 to perform at least the following or for performing at least the following:
comparing 201 a microphone signal obtained from at least one microphone located in
an earphone with a reference microphone signal obtained using a model of the at least
one microphone and at least one speaker so as to obtain a correlation between the
obtained microphone signal and the reference microphone signal;
processing 203 the correlation between the obtained microphone signal and the reference
microphone signal to determine a quality of sealing between the earphone and a user's
ear.
[0117] The computer program instructions can be comprised in a computer program 909, a non-transitory computer readable medium, a computer program product, or a machine readable medium. In some but not necessarily all examples, the computer program instructions can be distributed over more than one computer program 909.
[0118] Although the memory 907 is illustrated as a single component/circuitry it can be implemented as one or more separate components/circuitry some or all of which can be integrated/removable and/or can provide permanent/semi-permanent/dynamic/cached storage.
[0119] Although the processor 905 is illustrated as a single component/circuitry it can
be implemented as one or more separate components/circuitry some or all of which can
be integrated/removable. The processor 905 can be a single core or multi-core processor.
[0120] References to 'computer-readable storage medium', 'computer program product', 'tangibly
embodied computer program' etc. or a 'controller', 'computer', 'processor' etc. should
be understood to encompass not only computers having different architectures such
as single /multi- processor architectures and sequential (Von Neumann)/parallel architectures
but also specialized circuits such as field-programmable gate arrays (FPGA), application
specific circuits (ASIC), signal processing devices and other processing circuitry.
References to computer program, instructions, code etc. should be understood to encompass
software for a programmable processor or firmware such as, for example, the programmable
content of a hardware device whether instructions for a processor, or configuration
settings for a fixed-function device, gate array or programmable logic device etc.
[0121] As used in this application, the term 'circuitry' may refer to one or more or all
of the following:
- (a) hardware-only circuitry implementations (such as implementations in only analog
and/or digital circuitry) and
- (b) combinations of hardware circuits and software, such as (as applicable):
- (i) a combination of analog and/or digital hardware circuit(s) with software/firmware
and
- (ii) any portions of hardware processor(s) with software (including digital signal
processor(s)), software, and memory or memories that work together to cause an apparatus,
such as a mobile phone or server, to perform various functions and
- (c) hardware circuit(s) and/or processor(s), such as a microprocessor(s) or a portion of a microprocessor(s), that requires software (for example, firmware) for operation, but the software may not be present when it is not needed for operation.
This definition of circuitry applies to all uses of this term in this application,
including in any claims. As a further example, as used in this application, the term
circuitry also covers an implementation of merely a hardware circuit or processor
and its (or their) accompanying software and/or firmware. The term circuitry also
covers, for example and if applicable to the particular claim element, a baseband
integrated circuit for a mobile device or a similar integrated circuit in a server,
a cellular network device, or other computing or network device.
[0122] The blocks illustrated in Fig. 2 can represent steps in a method and/or sections
of code in the computer program 909. The illustration of a particular order to the
blocks does not necessarily imply that there is a required or preferred order for
the blocks and the order and arrangement of the blocks can be varied. Furthermore,
it can be possible for some blocks to be omitted.
[0123] The term 'comprise' is used in this document with an inclusive not an exclusive meaning. That is, any reference to X comprising Y indicates that X may comprise only one Y or may comprise more than one Y. If it is intended to use 'comprise' with an exclusive meaning then it will be made clear in the context by referring to 'comprising only one...' or by using 'consisting'.
[0124] In this description, the wording 'connect', 'couple' and 'communication' and their
derivatives mean operationally connected/coupled/in communication. It should be appreciated
that any number or combination of intervening components can exist (including no intervening
components), i.e., so as to provide direct or indirect connection/coupling/communication.
Any such intervening components can include hardware and/or software components.
[0125] As used herein, the term "determine/determining" (and grammatical variants thereof) can include, not least: calculating, computing, processing, deriving, measuring, investigating, identifying, looking up (for example, looking up in a table, a database or another data structure), ascertaining and the like. Also, "determining" can include receiving (for example, receiving information), accessing (for example, accessing data in a memory), obtaining and the like. Also, "determine/determining" can include resolving, selecting, choosing, establishing, and the like.
[0126] In this description, reference has been made to various examples. The description
of features or functions in relation to an example indicates that those features or
functions are present in that example. The use of the term 'example' or 'for example'
or 'can' or 'may' in the text denotes, whether explicitly stated or not, that such
features or functions are present in at least the described example, whether described
as an example or not, and that they can be, but are not necessarily, present in some
of or all other examples. Thus 'example', 'for example', 'can' or 'may' refers to
a particular instance in a class of examples. A property of the instance can be a
property of only that instance or a property of the class or a property of a sub-class
of the class that includes some but not all of the instances in the class. It is therefore
implicitly disclosed that a feature described with reference to one example but not
with reference to another example, can where possible be used in that other example
as part of a working combination but does not necessarily have to be used in that
other example.
[0127] Although examples have been described in the preceding paragraphs with reference
to various examples, it should be appreciated that modifications to the examples given
can be made without departing from the scope of the claims.
[0128] Features described in the preceding description may be used in combinations other
than the combinations explicitly described above.
[0129] Although functions have been described with reference to certain features, those
functions may be performable by other features whether described or not.
[0130] Although features have been described with reference to certain examples, those features
may also be present in other examples whether described or not.
[0131] The term 'a', 'an' or 'the' is used in this document with an inclusive not an exclusive meaning. That is, any reference to X comprising a/an/the Y indicates that X may comprise only one Y or may comprise more than one Y unless the context clearly indicates the contrary. If it is intended to use 'a', 'an' or 'the' with an exclusive meaning then it will be made clear in the context. In some circumstances the use of 'at least one' or 'one or more' may be used to emphasize an inclusive meaning but the absence of these terms should not be taken to imply any exclusive meaning.
[0132] The presence of a feature (or combination of features) in a claim is a reference
to that feature or (combination of features) itself and also to features that achieve
substantially the same technical effect (equivalent features). The equivalent features
include, for example, features that are variants and achieve substantially the same
result in substantially the same way. The equivalent features include, for example,
features that perform substantially the same function, in substantially the same way
to achieve substantially the same result.
[0133] In this description, reference has been made to various examples using adjectives
or adjectival phrases to describe characteristics of the examples. Such a description
of a characteristic in relation to an example indicates that the characteristic is
present in some examples exactly as described and is present in other examples substantially
as described.
[0134] The above description describes some examples of the present disclosure however those
of ordinary skill in the art will be aware of possible alternative structures and
method features which offer equivalent functionality to the specific examples of such
structures and features described herein above and which for the sake of brevity and
clarity have been omitted from the above description. Nonetheless, the above description
should be read as implicitly including reference to such alternative structures and
method features which provide equivalent functionality unless such alternative structures
or method features are explicitly excluded in the above description of the examples
of the present disclosure.
[0135] Whilst endeavoring in the foregoing specification to draw attention to those features
believed to be of importance it should be understood that the Applicant may seek protection
via the claims in respect of any patentable feature or combination of features hereinbefore
referred to and/or shown in the drawings whether or not emphasis has been placed thereon.