FIELD OF THE INVENTION
[0001] The present invention relates to a computer-implemented method for updating at least
one frequency-domain filter coefficient of an echo canceller having at least one channel
and at least one segment per channel, the filter coefficients of the echo canceller
being updatable in the frequency-domain at a time block. The invention further relates
to an echo canceller configured to execute said method.
BACKGROUND OF THE INVENTION
[0002] Acoustic echo is a major impairment in audio communications, such as videoconferencing,
in-car voice communications, voice interfaces, and human-machine dialogue systems.
Multi-channel acoustic echo (see
"Acoustic multi-channel echo cancellation," EP 2438766 B1 and
"Advances in digital speech transmission," Wiley, 2008) manifests when the sound from a plurality of loudspeakers connected to a terminal
is captured by the microphone connected to said terminal. The terminal can be
a desktop computer, mobile phone, tablet, voice-commanded assistant, dedicated
audioconferencing equipment, hands-free car telephony system, et cetera.
[0003] As said terminal is usually placed within a room, the multiple sound reflections
at the room walls arrive at the microphone at different time instants and intensities,
hence creating a large number of acoustic echoes. The characteristics of said echo
"fingerprint" are sensitive to the location of the microphone and loudspeaker(s), the
room geometry and the objects and persons present therein.
[0004] The echo has a negative impact on communication because participants in a videoconference
hear their voices echoed as they speak, speech interfaces get "drowned" by the sounds
(such as music) played by their loudspeaker system, and persons engaged in a mobile
phone call with a car driver hear their own voices echoed inside the car cabin.
[0005] While mono systems still represent a good number of said terminals, multi-channel
audio systems are increasingly appearing: car sound systems are at least stereo, dedicated
videoconferencing equipment may use several loudspeakers for different participants,
thus creating a more realistic meeting experience, and embedded voice interfaces are
commonly integrated within stereo-sound equipment.
[0006] The present invention focuses on the problem scenario illustrated in Fig. 1, wherein
a plurality of audio sources 10, e. g. audio channels, are played by a plurality of
loudspeakers 20 in a room, and all the sounds in the room are captured by only one
microphone 30. Its extension to more than one microphone (for instance to transmit
stereo sound) is deemed straightforward, as the present invention can be applied to
every microphone signal separately. Henceforth, "effective" loudspeaker refers to
all electroacoustic device(s) that reproduce the same audio source or a linearly filtered
version thereof. For instance, if a stereo sound-reproduction system plays a single-channel
(mono) audio source, the number of effective loudspeakers is one. Another example
is a car stereo system, which may count several (more than two) physical loudspeakers,
but if the car sound system is for instance reproducing stereo music, the number of
effective loudspeakers, e. g. audio channels, is two.
[0007] The signal z(n) captured by the microphone 30 upon undergoing an analogue to digital conversion
can be expressed as follows

$$z(n) = \sum_{i=0}^{C-1} \sum_{l=0}^{L_H-1} h_i(l)\, s_i(n-l) + d(n) + v(n) + p(n) \qquad (1)$$

wherein n is discrete time, C is the number of far-end audio channels, s_i(n) is the far-end
audio source at the ith channel, h_i(l) is the equivalent discrete-time echo path between the
ith effective loudspeaker and the microphone, L_H is the total length in samples of the echo path,
d(n) is the main (or desired) near-end sound signal, and v(n) is the near-end background noise.
The last term p(n) refers to the deviations from the linear echo model in (1), such as those due
to non-linear characteristics of the loudspeaker(s). The main near-end signal d(n) may result
from several sources within the room. Usually d(n) is speech, music, or both, of semantic value
to the far-end listener, hence it is to be transmitted with minimum distortion. On the other hand,
the near-end background noise v(n), assumed to be nearly stationary, carries relevant contextual
information of the near-end acoustic scene.
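By way of illustration only, the linear part of the echo model (1) may be sketched in Python as follows. The function name `microphone_signal`, the array sizes, and the decaying random echo paths are assumptions made for this example and are not taken from the specification; the non-linear term p(n) is left out.

```python
import numpy as np

def microphone_signal(sources, echo_paths, near_end, noise):
    """Linear part of model (1): sum over channels of h_i * s_i, plus d(n) and v(n)."""
    n_samples = len(near_end)
    z = np.asarray(near_end, dtype=float) + np.asarray(noise, dtype=float)
    for s_i, h_i in zip(sources, echo_paths):
        # Full linear convolution, truncated to the observation length.
        z += np.convolve(s_i, h_i)[:n_samples]
    return z

rng = np.random.default_rng(0)
C, L_H, n_samples = 2, 8, 64                      # hypothetical sizes
sources = rng.standard_normal((C, n_samples))     # far-end sources s_i(n)
echo_paths = rng.standard_normal((C, L_H)) * 0.5 ** np.arange(L_H)  # decaying h_i(l)
near_end = np.zeros(n_samples)                    # no near-end talker in this toy case
noise = 1e-3 * rng.standard_normal(n_samples)     # background noise v(n)
z = microphone_signal(sources, echo_paths, near_end, noise)
```

Each sample of `z` is the sum of the C convolved far-end sources plus the near-end terms, which is the quantity the canceller later tries to explain.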
[0008] The objective of any acoustic echo reduction system is threefold:
- 1. to reduce the echo components in the microphone signal z(n) to the extent that they become inaudible,
- 2. to preserve the quality of the main near-end signal d(n) present in the microphone signal z(n) as much as possible, and
- 3. to maintain the perceptual impression of the near-end background noise v(n) present in the microphone signal z(n).
[0009] An acoustic echo canceller (AEC), a residual-echo non-linear processor (NLP), and
a comfort-noise injector (CNI), all governed by an echo control apparatus, are meant
to execute respectively said threefold objective.
[0010] In the following, a short overview of the technological background of the echo reduction
system is given with reference to Figs. 1 and 2.
[0011] A basic multi-channel and multi-segment acoustic echo canceller 200 comprises a plurality
of digital linear transversal filters 210, such that the response related to the
ith channel is built as

wherein S is the number of segments per channel, L is the filter length in each segment,
w_{i,j}(l) corresponds to the filter coefficients of the ith channel and jth segment, and
x_{i,j}(n) is the input signal to filter w_{i,j}(l), defined as

[0012] The concatenation of the filters from all S segments in the ith channel builds the
impulse response of the echo canceller for said ith channel.
[0013] The synthetic echo model (2) is meant to deliver a signal that resembles the echoes
in z(n) related to the ith channel. The global canceller output is built in 220 additively with
the contributions from all C channels as follows

resulting in an output e(n) of reduced echo content.
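By way of illustration only, the time-domain canceller described by (2) to (4) may be sketched in Python as follows. The sketch assumes that segment j of channel i is fed with the far-end signal delayed by j·L samples, so that concatenating the S segment filters yields one S·L-tap response per channel; the function name and the array shapes are hypothetical.

```python
import numpy as np

def canceller_output(z, sources, w, L):
    """Time-domain multi-channel, multi-segment canceller output e(n).

    z       : microphone samples, shape (n_samples,)
    sources : far-end signals s_i(n), shape (C, n_samples)
    w       : filter taps w_{i,j}(l), shape (C, S, L)
    Assumption: segment j sees the far-end signal delayed by j*L samples.
    """
    C, S, _ = w.shape
    n_samples = len(z)
    e = np.array(z, dtype=float)
    for i in range(C):
        # Concatenating the S segments gives the S*L-tap response of channel i.
        h_i = w[i].reshape(S * L)
        e -= np.convolve(sources[i], h_i)[:n_samples]
    return e

rng = np.random.default_rng(1)
C, S, L, n_samples = 2, 2, 4, 50                  # toy sizes
sources = rng.standard_normal((C, n_samples))
w = rng.standard_normal((C, S, L))
# Microphone signal made only of echo through the same taps: the output vanishes.
z = sum(np.convolve(sources[i], w[i].reshape(S * L))[:n_samples] for i in range(C))
e = canceller_output(z, sources, w, L)
```

With perfectly aligned filters and no near-end signal the output is zero; any near-end term added to `z` would pass through unchanged.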
[0014] One criterion for selecting the filter length L is that it should enclose large quasi-stationary
intervals of the far-end signals s_i(n). For instance, most speech phonemes are found within a few tens
of milliseconds, but, in longer ranges, speech usually contains at least two phonemes of different spectral
characteristics. A stereo speech-based echo control for automotive spaces with echo
path lengths in the order of 40 milliseconds could be designed as follows: the echo
canceller would have two channels, i.e. C = 2, and at least two segments per channel, i.e. S ≥ 2,
with a filter length L equivalent to 20 milliseconds' worth of signal samples, hence, for S = 2, with four
transversal filters, C × S = 2 × 2 = 4. Having different lengths for each filter could be possible too.
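The arithmetic of this design example may be sketched as follows. The sampling rate of 16 kHz is an assumption made for illustration (the specification does not state one), and the helper name `canceller_layout` is hypothetical.

```python
def canceller_layout(echo_path_ms, segment_ms, channels, sample_rate_hz):
    """Return (L, S, number_of_filters) for a segmented canceller."""
    L = round(segment_ms * sample_rate_hz / 1000)   # taps per segment
    S = -(-echo_path_ms // segment_ms)              # ceil(echo path / segment)
    return L, S, channels * S

# 40 ms echo path, 20 ms segments, stereo, assumed 16 kHz sampling rate.
print(canceller_layout(40, 20, 2, 16000))  # (320, 2, 4)
```

Under that assumption each of the four transversal filters would hold 320 taps.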
[0015] Because L is usually a large number, the time-domain convolution (2) turns out to be a costly operation.
As an alternative thereto, the response of the canceller can be efficiently computed
in the frequency domain with N-point blocks, such that N is chosen as a power of two for better
computational efficiency. Said frequency-domain block-based convolution is obtained as follows

wherein F is the N-point discrete Fourier transform (DFT) matrix (F^{-1} thus corresponds to the
inverse discrete Fourier transform), and the operation ∘ denotes element-by-element multiplication. The
N-point column vectors W_{i,j} and X_{i,j}, which refer respectively to the frequency-domain weights and
far-end input block of the ith channel and jth segment, are built as follows

wherein T denotes transpose.
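By way of illustration only, the frequency-domain block convolution may be sketched for a single filter as follows. The sketch uses NumPy's FFT in place of the DFT matrix F and checks the known property that, for L taps and an N-sample block, only the last N - L + 1 output samples of the circular convolution coincide with the linear convolution. The function name and the toy sizes are assumptions.

```python
import numpy as np

def block_response(w, x_block):
    """Circular convolution of an N-sample input block with L filter taps via the FFT.

    Returns the N-sample result and the number M = N - L + 1 of valid samples,
    which sit at the end of the result.
    """
    N, L = len(x_block), len(w)
    W = np.fft.fft(np.concatenate([w, np.zeros(N - L)]))  # zero-padded taps
    X = np.fft.fft(x_block)
    u = np.fft.ifft(W * X).real                           # element-by-element product
    return u, N - L + 1

rng = np.random.default_rng(0)
N, L = 16, 5
x = rng.standard_normal(64)
w = rng.standard_normal(L)
n = 40                                    # index of the newest sample in the block
u, M = block_response(w, x[n - N + 1 : n + 1])
reference = np.convolve(x, w)[n - M + 1 : n + 1]   # linear convolution, same instants
```

The first L - 1 elements of `u` are corrupted by wrap-around and are the "invalid" samples that the block processing discards.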
[0016] The operation (5) delivers exactly M = N - L + 1 valid samples of the linear convolution,
said samples being located in the M rightmost elements of vector u. The M valid errors, obtained
as the difference between the microphone signal and the overall canceller response as per (4),
are stored in vector e for further processing

[0017] In order to obtain a canceller response (5) closer to the actual echo in the microphone
signal z(n), the frequency-domain weights W_{i,j} of the ith channel and jth segment are updated
via the fast LMS algorithm (see "Acoustic echo devices and methods," US 2006/0018460 A1) as follows

wherein

contains the updated weights, E is the frequency-domain counterpart of the canceller output e, obtained as follows

the operator * denotes complex conjugate, the vector operator / represents element-by-element
division, |X_{i,j}|^2 is a vector built with the squared magnitude of every complex-valued element in vector
X_{i,j}, ε > 0 prevents division by zero or by values near zero, and µ is the step size.
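By way of illustration only, a normalized frequency-domain update built from the symbols defined above may be sketched as follows. The exact form of (9) is not reproduced here; the sketch assumes the common rule in which the conjugated far-end spectrum times the error is divided, element by element, by the far-end power plus ε. The per-bin toy signal model ignores the circular-convolution effects discussed elsewhere in this description, and all names are hypothetical.

```python
import numpy as np

def fast_lms_update(W, X, E, mu=0.5, eps=1e-8):
    """Normalized frequency-domain update of one filter (all arguments are length-N spectra)."""
    return W + mu * np.conj(X) * E / (np.abs(X) ** 2 + eps)

N = 8
k = np.arange(N)
X = (1.0 + k / N) * np.exp(1j * k)         # far-end spectrum with |X| >= 1
H = np.exp(-0.3 * k) * np.exp(-0.5j * k)   # toy "true" echo-path spectrum
W = np.zeros(N, dtype=complex)
E = H * X - W * X                          # error before the update
W_new = fast_lms_update(W, X, E, mu=1.0)
E_new = H * X - W_new * X                  # error after the update
```

In this interference-free toy case a step size of 1 removes the error in a single update, while a step size of 0 leaves the weights frozen.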
[0018] Because the time-domain counterpart of the new weight vector

should contain zeroes in its rightmost side (6), the new spectral weights (9) can
be further updated by returning said weights to the time domain, resetting (to zero)
the last N - L taps, and returning the result back to the frequency domain. Said time-constraint
operation demands two additional DFT operations, hence increasing the computational
complexity. If desired, this step can be avoided at the price of slower convergence
(see "Unconstrained frequency-domain adaptive filter," IEEE Transactions on Acoustics, Speech
and Signal Processing, October, 1982). As this operation is well known to any person skilled in the art, we use (9) as the generic
representation of the weight update rule, which encompasses the option to perform
(or not) said time-constraint operation.
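By way of illustration only, the time-constraint operation may be sketched as follows, with NumPy's FFT standing in for the two additional DFT operations. The function name is hypothetical.

```python
import numpy as np

def constrain_weights(W, L):
    """Zero the last N - L time-domain taps of a length-N weight spectrum."""
    w = np.fft.ifft(W)      # first additional DFT: back to the time domain
    w[L:] = 0.0             # reset the last N - L taps
    return np.fft.fft(w)    # second additional DFT: return to the frequency domain

rng = np.random.default_rng(0)
N, L = 16, 5
W = np.fft.fft(rng.standard_normal(N))   # unconstrained spectrum: all N taps non-zero
W_c = constrain_weights(W, L)
```

Applying the constraint a second time changes nothing, since the constrained weights already have zeros in their last N - L taps.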
[0019] It is known that the convergence of said adaptive system (9) is assured for a step
size in the range 0 < µ < 2. However, the range 0 < µ ≤ 1 is considered in practice, whose
limits correspond to a frozen update, µ = 0, and the fastest update, µ = 1, in interference-free conditions.
[0020] The generic multi-channel and multi-segment adaptive system (9) has been the subject of
thorough study (see
"Generalized multichannel frequency-domain adaptive filtering: efficient realization
and application to hands-free speech communication," Signal Processing 85, pp. 549-570, 2005,
"Acoustic echo devices and methods," US 2006/0018460 A1,
"Acoustic multi-channel echo cancellation," EP 2438766 B1,
"Method and apparatus for multi-channel audio processing using single-channel components," US 8233632 B1, and
"Subband echo cancellation method for multi-channel audio teleconference and echo
canceller using the same," US 6246760 B1).
[0021] In order for the prior state of the art to succeed in real scenarios, several challenges
are yet to be effectively solved:
- The presence of non-cancellable terms, such as d(n) and v(n), in error E corrupts the update (9), which as a result exhibits unstable behaviour.
Double-talk detectors (DTT) aim to prevent this drawback by detecting when it is safe
to train the adaptive system (see "Enhanced echo cancellation," US 8873740 B2). So far, DTTs have proven neither accurate nor quick enough, especially under severe
double-talk. Hence, robustness against any type of double-talk is yet to be effectively
solved.
- As the global error E is used for the update (9) of every filter, the misalignment
of any filter is propagated to the remaining C × S - 1 filters, and vice versa. Despite being considered intrinsic to adaptive filtering,
this fact harms the canceller's learning abilities. For instance, in web-based audioconferencing,
user motion or a change of the terminal orientation has a larger impact on early echoes.
It is thus desirable to have a mechanism of true filter-independent update.
[0022] Given the previous two drawbacks, attempting to adapt the filters with the canceller
output E according to the state of the art results in unstable and slow convergence, hence
failing to reduce the echo quickly enough to acceptable levels.
[0023] After echo cancellation, the canceller output E may still contain audible echo
because:
- the echo path length L_H is larger than the canceller length, L_H ≫ S × L, hence the late-arriving echoes cannot be cancelled,
- the adaptive canceller is yet to converge to the "steady state", and
- non-linear echo components, characterized by p(n) in (1), cannot be removed by the linear echo canceller (5).
[0024] The canceller output can thus be rewritten in the following additive model

$$E = D + V + P + Q + \sum_{i=0}^{C-1} \sum_{j=0}^{S-1} R_{i,j} \qquad (11)$$

wherein D corresponds to the main near-end signal, V is the near-end background noise,
P corresponds to the non-linear echo components, Q represents the echo beyond the time scope
of the canceller, and R_{i,j} is the residual echo related to the ith channel and jth segment.
[0025] In order to effectively reduce the residual echo terms P, Q, and R_{i,j} to inaudible levels,
a non-linear processing 300 (NLP) stage is to act over the canceller output E, delivering the
echo-free output O. Several strategies are possible thereto, such as filtering, spectral subtraction,
or following principles of psychoacoustics, this last one being the preferred option
when dealing with music signals. Let us consider for the purpose of illustration a
filtering-based approach

$$O = B_{NLP} \circ E \qquad (12)$$

wherein 0 ≤ B_{NLP} ≤ 1 is the frequency response of the NLP filter.
[0026] It is important to remark that the non-linear processing 300 (12), unlike echo cancellation
200, introduces distortion to the main near-end signal
D present in the canceller output
E, which creates a delicate tradeoff between residual echo reduction versus main near-end
signal distortion. The stage of echo cancellation plays a decisive role in said tradeoff:
if the echo canceller is notably misaligned, the NLP filtering must act aggressively
over the residual echo at the risk of damaging the main near-end signal; conversely,
if the canceller manages to remove efficiently most of the echo, the impact of the
NLP filter over the main near-end signal may go unnoticed. Therefore, the accurate
estimation of all residual echo terms P, Q, and R_{i,j} present in the canceller output
E is important for an effective NLP stage.
[0027] Residual echo suppression is a concern of recent prior art, such as in
"Acoustic echo suppression," US 2017/0118326 A1, wherein the activation of the echo suppressor depends on one or more transient or
steady-state parameters. In another recent invention,
"Robust acoustic echo cancellation for loosely paired devices based on semi-blind
multichannel demixing," US 2016/0029120 A1, a semi-blind multichannel source separation is performed to decompose the audio
signals into a near-end source signal and residual echoes based on independent component
analysis, thus requiring more than one microphone to operate.
[0028] The near-end background noise V is distorted by the NLP operation (12) as well. This
fact causes periods of silence and/or sudden changes in the background noise level,
which can be distracting, if not annoying, for the listener. Comfort noise injection
400 (see "Comfort noise generator for echo cancelers," US 5949888 A1) solves this drawback by
filling up the gaps with synthetic noise, hence giving the
illusion of a uniform background noise. The injected synthetic noise must resemble
the actual near-end background noise v(n) in both spectral content and level. The final output
Y of the echo reduction system can be obtained as follows

wherein V̂ is the synthetic replica of the near-end background noise. Accurate spectral estimation
of the near-end background noise V turns out to be important to maintain a pleasant listening experience.
[0029] In the prior art
"Method and apparatus for comfort noise generation in speech communication systems," US 7610197 B2,
the background noise is obtained by accumulating and smoothing over time the spectral
samples E; in the inventions
"Comfort noise generation method and system," US 2011/0228946 A1, and "Enhanced echo cancellation," US 8873740 B2,
the estimation of the background noise is carried out during periods when both far-end
and near-end are inactive. The state of the art may not always work: in the case of audioconferencing
in moderately reverberant rooms, the residual echo beyond the canceller scope
Q may fill up periods of silence in the near-end signal, thus acting de facto as background
noise; this problem is further stressed with gapless audio content such as music,
in which the residual echo can be mistaken for noise in the background.
SUMMARY OF THE INVENTION
[0030] In view of the problems discussed above, it is an object of the invention to provide
an improved acoustic echo canceller that is more stable and yields a faster convergence.
[0031] The problems are solved in a first aspect of the invention by a computer-implemented
method for updating at least one frequency-domain filter coefficient W_{i,j}(k) of an echo
canceller having at least one channel and at least one segment per channel,
the filter coefficients of the echo canceller being updateable in the frequency domain
at a time block m, comprising:
determining a canceller output E_m(k) over the mth time block as the difference between Z_m(k) and a cancelling term based on

determining a canceller error ε(k) over the (m - ℓ)th time block as the difference between Z_{m-ℓ}(k) and a cancelling term based on

determining a look-backward error ε(k) as

wherein Δ(k) is based on

determining a look-forward error ε(k) as

wherein Δ(k) is based on

determining an optimal update step size µ_{i,j,m}(k) from said canceller output E_m(k) over the mth time block, from said canceller error ε(k) over the (m - ℓ)th time block, from said look-backward error ε(k), and from said look-forward error ε(k); and
updating said at least one filter coefficient W_{i,j}(k) by using said optimal update step size µ_{i,j,m}(k);
wherein
- C corresponds to the number of channels i = 0,...,C - 1 of the echo canceller,
- S corresponds to the number of segments j = 0,...,S - 1 per channel of the echo canceller,
- ℓ is an integer different from zero,
- k denotes a frequency bin index,
- ε is a positive value,
- Z_m(k) corresponds to the kth spectral bin of the microphone signal at the mth time block,
- Z_{m-ℓ}(k) corresponds to the kth spectral bin of the microphone signal at the (m - ℓ)th time block,
- X_{i,j,m}(k) corresponds to the kth spectral bin of the far-end signal at the ith channel, jth segment, and mth time block,
- X_{i,j,m-ℓ}(k) corresponds to the kth spectral bin of the far-end signal at the ith channel, jth segment, and (m - ℓ)th time block,
- λ is the overshoot factor, and
- a, b are channel and segment indices, respectively.
[0032] The claimed method yields an optimal step size µ_{i,j,m}(k) that is very accurate thanks to
using the information from the canceller output E_m(k) at the mth block, the canceller error
ε(k) at the (m - ℓ)th block, and the look-forward error ε(k) in addition to the look-backward error
ε(k). Because the integer ℓ can take a positive value, the (m - ℓ)th block is already available,
hence not incurring any delay in the update, that is, the optimal step size µ_{i,j,m}(k) can be
immediately obtained as soon as the mth time block is available. Said filter update with the optimal
step size µ_{i,j,m}(k) as computed with the inventive method yields a faster convergence and a more
effective reduction of echo than echo cancellers of the state of the art.
[0033] The wording "based on", adopted when introducing E_m(k) and ε(k) with (14) and (15),
respectively, refers to additional standard signal processing,
not explicitly included therein for the sake of simplicity in the notation as well
as for being known to anyone skilled in the art, namely, discarding the invalid
samples that result from the circular convolution inherent to the product of two frequency-domain
sequences, such as in said (14) and (15). These arguments can be extended to the definition
of Δ(k) and Δ(k) "based on" (17) and (19), respectively, as well as to terms computed in preferred
embodiments.
[0034] Preferably, the optimal step size µ_{i,j,m}(k) is computed by either

or

wherein Ψ_m(k) is a power spectrum of the non-cancellable components at the mth time block and
Φ_{i,j}(k) is a power spectrum of the misalignment of each i,jth filter, wherein Ψ_m(k) and
Φ_{i,j}(k) are determined by solving the set of linear equations

subject to inequality constraints

wherein

(k) corresponds to the kth power spectral bin of the far-end signal at the ith channel,
jth segment, and mth time block, wherein said signal block is preferably windowed,

(k) corresponds to the kth power spectral bin of the far-end signal at the ith channel,
jth segment, and (m - ℓ)th time block, wherein said signal block is preferably windowed,
Υ(k) corresponds to a power spectrum of the non-cancellable components at the (m - ℓ)th block,
T(k) is the normalized look-backward update, computed as

and T(k) is the normalized look-forward update, computed as

[0035] The set of linear equations with inequality constraints is one of the main foundations
of the inventive method. Because the unknowns Υ(k) and Ψ_m(k) appear in said set of linear
equations in a balanced way, the problem is well conditioned. Removing one of the four equations
would make one of those unknowns less visible, thus significantly deteriorating the conditioning
of the whole problem (as the overall conditioning is bound by the worst of its parts). Because the number
of equations is equal to four, the inventive method is particularly suited to stereo
systems, C = 2, with one segment per channel, S = 1, as this case leads to four unknowns.
[0036] Moreover, the inventive method is especially useful for general multi-channel setups,
e. g. C > 1, that are under frequent near-end activity, since the power spectrum of the non-cancellable
components Ψ_m(k) is obtained with high accuracy, and for the first time optimal continuous adaptation
can be achieved regardless of the presence of near-end components, while in the state
of the art a double-talk detector needs to assess whether there is a near-end signal
in order to temporarily stop the update of all filter coefficients.
[0037] The first formula (20) for the optimal step size is simpler than the second one (21),
as the latter involves the square-root operation; however, formula (21) delivers better
results. Therefore, if the second formula is chosen, its Taylor approximation can
be used, delivering similar performance at much lower computational complexity.
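By way of illustration only, a Taylor approximation of the square root may be sketched as follows. The specification does not state the expansion point or the order of the approximation; the second-order expansion around a fixed point x0 shown here, and the function name, are assumptions.

```python
import math

def sqrt_taylor(x, x0=1.0):
    """Second-order Taylor expansion of sqrt(x) around x0 > 0."""
    r = math.sqrt(x0)        # constant for a fixed x0, can be precomputed once
    d = x - x0
    return r + d / (2.0 * r) - d * d / (8.0 * r ** 3)

approx = sqrt_taylor(1.1)    # 1 + 0.05 - 0.00125 = 1.04875
exact = math.sqrt(1.1)       # about 1.04881
```

Near the expansion point the approximation needs only additions and multiplications per frequency bin, at the cost of an error that grows with the distance from x0.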
[0038] Another problem can arise when at least one of the number of channels or the number
of segments per channel is larger than one, e. g. when the canceller is composed of
several transversal filters: as the global error E_m(k) is used for the update of every filter,
the misalignment of a given filter is propagated
to the remaining filters upon the update, and vice versa, thus slowing down the
convergence of the entire adaptive system. To overcome this drawback, any chosen
r,sth filter can be considered, one at a time, not to be part of the canceller, e. g. its residual echo
is treated as a non-cancellable component, and an additional look-backward error
ε_{r,s}(k) and an additional look-forward error ε_{r,s}(k), involving the off-line update of all
filters except the r,sth one, are computed.
[0039] In order to employ the latter strategy, the invention provides a computer-implemented
method for updating at least one frequency-domain filter coefficient W_{i,j}(k) of the echo
canceller, wherein at least one of the number of channels or the number
of segments per channel is larger than one; and
the optimal step size µ_{i,j,m} is determined from said canceller output E_m(k) over the mth
time block, from said canceller error ε(k) over the (m - ℓ)th time block, from said look-backward error
ε(k), from said look-forward error ε(k), as well as from at least one additional look-backward error
ε_{r,s}(k) and from at least one additional look-forward error ε_{r,s}(k) for at least one
r,sth filter, r and s being indices of the channel and the segment, respectively, wherein

wherein Δ_{r,s}(k) is based on

and

wherein Δ_{r,s}(k) is based on

[0040] Including the information from the additional look-backward ε_{r,s}(k) and look-forward
ε_{r,s}(k) errors allows the inventive method to explicitly accomplish a more accurate estimation
of the optimal step size(s) µ_{r,s,m}(k), as well as to implicitly improve the optimal step size for other
i,jth filters, which translates into faster convergence.
[0041] In this embodiment, the optimal step size µ_{i,j,m}(k) is preferably computed as per either
(20) or (21), wherein the power spectrum Ψ_m(k) of the non-cancellable components and the power
spectrum Φ_{i,j}(k) of the misalignment of each i,jth filter are determined by solving said set
of linear equations (22), (23), (24), (25), with additional linear equations for the at least one
r,sth filter

subject to inequality constraints

wherein
- T_{r,s}(k) is the normalized look-backward update excluding the r,sth filter, computed as

and
- T_{r,s}(k) is the normalized look-forward update excluding the r,sth filter, computed as

[0042] These embodiments are especially useful not only in scenarios with frequent near-end
activity, as discussed previously, but also for multi-channel and multi-segment setups,
e. g. C > 1 or S > 1, as the global learning convergence is substantially improved, in particular
by explicitly improving the estimation of the filter misalignment Φ_{r,s}(k), and hence the
convergence of each r,sth filter, while in the state of the art the global error is spread uniformly
through all filters, which leads to slower convergence.
[0043] Until now, it was assumed that the overshoot factor λ needed to be equal to 1, as this led
to a good trade-off between misalignment reduction
and amplification of the non-cancellable components. However, it has been analytically
found that even better results can be achieved if the overshoot factor
λ is chosen to be greater than one, in the range 1 < λ ≤ 2, and particularly when
λ = 2. At said value the filter misalignment terms in the look-backward ε(k) and look-forward
ε(k) errors have the same expected level as in the errors ε(k) and E_m(k), but the
non-cancellable components turn out exactly two times larger in amplitude,
hence more "visible"; this fact translates into an estimation gain boost of the non-cancellable
components of more than 6 dB. Likewise, an overshoot factor equal to 2 makes in particular
the r,sth filter misalignment appear exactly two times larger in magnitude in both look-backward
ε_{r,s}(k) and look-forward ε_{r,s}(k) errors, hence more "visible," which translates into an
estimation gain boost of said filter misalignment of more than 6 dB.
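The figure of 6 dB follows from expressing an amplitude ratio of two (equivalently, a power ratio of four) in decibels:

```latex
20 \log_{10}(2) \;=\; 10 \log_{10}(4) \;\approx\; 6.02\ \mathrm{dB}
```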
[0045] The filter misalignment terms Φ_{i,j}(k) and the power spectra χ_{i,j,m}(k) employed to evaluate

(k) and

(k) correspond to the same terms used to evaluate the optimal step size(s) (20) and
(21). Consequently, those terms are re-used, hence keeping the computational complexity
low while profiting from the high estimation accuracy of said terms. On the other
hand, the term

(k) involves the filter W_{i,S-1}(k), which is quickly updated at every block by the optimal
mechanism of the inventive method.
[0046] One way to obtain said improved canceller output O_m(k) with the canceller output
E_m(k) while including the above-mentioned terms (38), (39), and (40) is to use a Wiener
filter or any other type known in the state of the art. If the preferred Wiener filter
is to be implemented, the improved canceller output O_m(k) is determined as

wherein
- └ ┘ε denotes low clipping by ε > 0,
- E_m(k) is the canceller output,
- R_m(k) is a power spectrum of the residual echo due to canceller filter misalignment at the mth block,
- Q_m(k) is a power spectrum of the echo beyond the canceller's time scope at the mth block, and
- P_m(k) is a power spectrum of the non-linear echo component(s) at the mth block.
[0047] The spectral powers of the three residual echo terms (38), (39),
and (40) are used for suppressing, e. g. filtering out (41), said residual
echo components otherwise present in the canceller output E_m(k). In some cases, it can be preferred
to compute only one or two of the above-mentioned
three terms because one or two terms might be significantly lower than
the other(s). In order to deliver a listening signal, the improved canceller output
O_m(k) is to be transformed to the time domain. Standard overlap-add processing, well known
to anybody skilled in the art, can be used in order to avoid block effects in the
resulting time-domain output.
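By way of illustration only, a Wiener-style suppression gain with low clipping may be sketched as follows. The exact expression (41) is not reproduced here; the sketch assumes a common rule in which the estimated residual-echo power is subtracted from the output power, the ratio to the output power forms the gain, and the gain is clipped from below. The function name, the default clipping value, and the small constant guarding the division are hypothetical.

```python
import numpy as np

def suppress_residual_echo(E, R, Q, P, floor=0.05):
    """Wiener-style gain with low clipping, applied per frequency bin.

    E       : complex canceller-output spectrum
    R, Q, P : non-negative power spectra of the three residual-echo terms
    floor   : lower clipping value of the gain (hypothetical default)
    """
    power = np.abs(E) ** 2
    gain = (power - (R + Q + P)) / np.maximum(power, 1e-12)
    return np.maximum(gain, floor) * E

E = np.array([1.0 + 0j, 2.0 + 0j, 0.1 + 0j])
R = np.array([0.0, 1.0, 1.0])        # misalignment residual-echo power
Q = np.zeros(3)                      # late-echo power (none in this toy case)
P = np.zeros(3)                      # non-linear echo power (none in this toy case)
O = suppress_residual_echo(E, R, Q, P)   # gains 1.0, 0.75 and the floor 0.05
```

A bin without residual echo passes unchanged, a bin dominated by residual echo is attenuated down to the clipping value, and intermediate bins are scaled in proportion.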
[0048] Preferably, the at least one filter coefficient W_{i,j}(k) is updated based on

[0049] Even though the inventive method governs a multi-channel/-segment echo canceller,
the preferred update (44) resembles a single-filter adaptive system, as the normalization
term involves only the i,jth far-end power spectrum |X_{i,j,m}(k)|^2, in contrast with a standard
multi-filter canceller, whose normalization term involves
the accumulation of all far-end power spectra. This subtle detail, a real novelty
in the art, ultimately implies that the i,jth input data

is effectively normalized, and that the main responsibility of the update lies with
the optimal step size µ_{i,j,m}(k) determined by the inventive method.
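By way of illustration only, the difference between the two normalization terms may be sketched as follows. The exact update (44) is not reproduced here; both functions are assumptions built from the description above, the windowing of invalid taps is omitted, and all names and array shapes are hypothetical.

```python
import numpy as np

def update_filters(W, X, E, mu, eps=1e-8):
    """Update every filter with its own normalization term and step size.

    W, X, mu : arrays of shape (C, S, N) (weights, far-end spectra, step sizes)
    E        : canceller-output spectrum, shape (N,)
    """
    return W + mu * np.conj(X) * E / (np.abs(X) ** 2 + eps)

def update_filters_global_norm(W, X, E, mu, eps=1e-8):
    """Conventional variant: one normalization term shared by all filters."""
    total_power = np.sum(np.abs(X) ** 2, axis=(0, 1))   # accumulate over i and j
    return W + mu * np.conj(X) * E / (total_power + eps)

rng = np.random.default_rng(0)
C, S, N = 2, 2, 8
X = rng.standard_normal((C, S, N)) + 1j * rng.standard_normal((C, S, N))
E = rng.standard_normal(N) + 1j * rng.standard_normal(N)
W = np.zeros((C, S, N), dtype=complex)
mu = np.full((C, S, N), 0.5)
W_own = update_filters(W, X, E, mu)
W_shared = update_filters_global_norm(W, X, E, mu)
```

With the per-filter normalization the size of each correction is decided by the step size alone, whereas the shared normalization shrinks every correction by the power of all other filters' inputs.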
[0050] In a second aspect, the invention provides for an echo canceller having at least
one channel and at least one segment, the echo canceller having filter coefficients
W_{i,j}(k) being updateable in the frequency domain at a time block m, wherein the echo canceller is configured to:
determine a canceller output E_m(k) over the mth time block as the difference between Z_m(k) and a cancelling term based on

determine a canceller error ε(k) over the (m - ℓ)th time block as the difference between Z_{m-ℓ}(k) and a cancelling term based on

determine a look-backward error ε(k) as

wherein Δ(k) is based on

determine a look-forward error ε(k) as

wherein Δ(k) is based on

determine an optimal update step size µ_{i,j,m}(k) from said canceller output E_m(k) over the mth time block, from said canceller error ε(k) over the (m - ℓ)th time block, from said look-backward error ε(k), and from said look-forward error ε(k); and
update said at least one filter coefficient W_{i,j}(k) by using said optimal update step size µ_{i,j,m}(k);
wherein
- C corresponds to the number of channels i = 0,...,C - 1 of the echo canceller,
- S corresponds to the number of segments j = 0,...,S - 1 per channel of the echo canceller,
- ℓ is an integer different from zero,
- k denotes a frequency bin index,
- ε is a positive value,
- Z_m(k) corresponds to the kth spectral bin of the microphone signal at the mth time block,
- Z_{m-ℓ}(k) corresponds to the kth spectral bin of the microphone signal at the (m - ℓ)th time block,
- X_{i,j,m}(k) corresponds to the kth spectral bin of the far-end signal at the ith channel, jth segment, and mth time block,
- X_{i,j,m-ℓ}(k) corresponds to the kth spectral bin of the far-end signal at the ith channel, jth segment, and (m - ℓ)th time block,
- λ is the overshoot factor, and
- a, b are channel and segment indices, respectively.
[0051] All advantages of the embodiments and variants of the method as described above also
hold for the echo canceller.
BRIEF DESCRIPTION OF THE DRAWINGS
[0052] The invention shall now be explained in more detail below on the basis of the preferred
exemplary embodiments thereof with reference to the accompanying drawings, in which:
Fig. 1 illustrates the problem and scenario of multi-channel acoustic echo reduction.
Fig. 2 illustrates the proposed computer-based system to control and update the multi-channel
acoustic echo cancellation, the residual echo suppression, and the comfort noise injection
at every time block.
Fig. 3 illustrates the proposed computer-based apparatus to decompose the echo canceller
output into the non-cancellable components and the residual echo for each channel
and segment.
Fig. 4 discloses a computer-based method to compute the echo canceller errors at two
different time blocks.
Fig. 5 illustrates the impact of the "overshoot" factor in the frequency-domain adaptive
filter.
DETAILED DESCRIPTION OF THE INVENTION
[0053] In several embodiments of the present invention, a method and an apparatus for a multi-channel
acoustic echo canceller 50 are disclosed. The canceller governs a first stage of multi-filter
adaptive echo cancellation 200 that is robust to near-end interferences and has an improved
update for each transversal filter, followed by a second stage of residual echo suppression
300 that reduces the residual echo present in the canceller output to inaudible levels
while preserving the quality of the main near-end signal, followed by a third
and last stage of comfort noise injection 400 that replenishes the suppressed near-end
background noise with synthetic noise resembling said actual near-end background noise
in level and spectrum.
Acoustic Echo Cancellation
[0054] The output (4) of the multi-channel and multi-segment acoustic echo canceller 200
that is the object of the present invention can be obtained either directly in the time domain
(2) or with the frequency-domain mechanism (5). Regardless of the choice, the canceller
50 is trained in the frequency domain.
[0055] In the present invention, the frequency-domain coefficients of the i, jth filter, wherein i and j are channel and segment indices, respectively, are updated in 230 according to the rule

wherein index m denotes the mth time block, that is, vector Em contains the frequency-domain conversion of M time-domain output samples

while Xi,j,m is built as follows

the operation { } corresponds to the standard time-domain windowing to disregard invalid weight taps that result from the inherent circular convolution therein, see explanations after (10), ε > 0 prevents division by zero or values near zero, and µi,j,m is the frequency-selective step size for the ith channel, jth segment, and mth time block. The appropriate selection of the step size(s) µi,j,m for i = 0,···, C - 1, and j = 0,···, S - 1, is important to guarantee stable and fast convergence of the adaptive echo canceller.
[0056] The canceller output at the mth time block can be represented as

wherein the first term Am comprises all non-cancellable components - see (11) for definitions - and the second term corresponds to the residual echoes Ri,j,m yet to be cancelled.
[0059] Before proceeding further, it is worth mentioning that the standard update (9) is a particular case of the proposed update rule (51) when the frequency-domain step size is chosen as

[0060] Said standard step size (57) acknowledges neither the different misalignment levels on each transversal filter Ri,j,m nor the presence of the interfering non-cancellable term Am. As a result, the standard update exhibits slow and unstable convergence in most practical scenarios.
[0061] Computing the optimal frequency-domain step size as per (55) or (56) thus reduces to estimating the power spectral density of the interferences and of the residual echoes. However, none of those C × S + 1 unknown terms is directly measurable. The common belief among experts in the art is that the effective and accurate estimation of the square magnitude of said terms Am and Ri,j,m, required in the evaluation of the optimal step size (55) or (56), is nearly impossible.
[0062] We introduce hereby the following notation: Am(k) represents the kth frequency bin of the frequency-domain vector Am, Em(k) represents the kth frequency bin of the frequency-domain vector Em, and so on. The i, jth residual echo can be written as

wherein Gi,j(k) denotes the misalignment in the i, jth filter. We can assume the following statistical relations

[0063] Based on the previous assumptions, the frequency-domain step as per (55) can be recast as follows

wherein Φi,j(k) and Ψm(k) are the power spectral densities of the misalignment of each i, jth filter and of the non-cancellable term, respectively

and χi,j,m(k) is the expected power spectrum of the i, jth far-end signal at the mth block

[0064] Said expected power spectrum (64) is preferably obtained from the power spectrum of a windowed signal block.
[0065] The method to accurately estimate said terms (62) and (63) disclosed in the present
invention is carried out in 110 according to the reasoning and arguments exposed in
what follows. It is known in the field of machine learning that the objective assessment
of a learning machine, such as the set of adaptive filters of concern, must be performed
on testing data statistically independent from the training data. This important axiom
implies hereby that the weight update performed with the data from the mth time block can only be assessed with data from another time block, and never with the very mth block.
[0066] The immediately preceding m - 1th time block can act as such a "testing" block, hence not incurring any delay because said block is already available. Moreover, because both testing and updating time blocks are consecutive in time, the actual echo path at each one is considered to be the same, which is an important requirement for the inventive method to perform best. Other blocks from different times, i.e. an m - ℓth time block, wherein ℓ is an integer different than zero, could be used as long as the latter requirement is met. For ease of explanation, the m - 1th time block, i.e. ℓ = 1, is used as an example in the following description.
[0067] With reference to the embodiments of Fig. 3 and Fig. 4, in a first stage two errors are computed, namely, the canceller output (i.e. the error) over the mth time block Em in

wherein Zm corresponds to the spectral samples of the microphone signal z(n) at the mth block, the operator { } corresponds to the time-domain windowing upon the filter operation to disregard invalid samples that result from the inherent circular convolution therein, see (8), (10) and (52) for explanations; and
the "look-backward" error ε, in 120B, obtained as follows: the canceller weights are updated in 130 according to the classical frequency-domain adaptive filtering rule (9) with said mth output Em, and the impact thereof is assessed over the m - 1th block, that is,

wherein Zm-1 corresponds to the spectral samples of the microphone signal z(n) at the m - 1th block, and Ωi,j is the i, jth look-backward weight update

[0068] The constant λ in (66) is the "overshoot" factor, to be addressed later.
[0069] A positive outcome from the previous operations is that the non-cancellable term Am can not only be observed in the mth error (65), as outlined in (54), but also in the look-backward error (66). However, the non-cancellable term Am-1 at the m - 1th time block unfortunately appears in the look-backward error (66) as a new unknown in the problem. This situation invites us to compute two additional errors, this time by swapping the training and testing time blocks. Hence, the m - 1th error ε is computed as

and the "look-forward" error ε as

wherein Ωi,j is the i, jth look-forward weight update

[0071] In the prior art document "Method and apparatus for updating filter coefficients of an adaptive echo canceller," EP 2930917 B1, disclosed by the author of the present invention, the evaluation of only three errors
was thought sufficient to estimate the terms (62) and (63) for a single-channel single-segment
echo canceller. However, such a strategy often leads to ill-conditioned problems because
the presence of the unknown terms in the problem model is unbalanced. Including both
the look-backward (77) and the look-forward (78) errors yields a balanced, well-conditioned
estimation problem.
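The four errors can be illustrated for a single frequency bin with the following minimal sketch. It is not the patented rule itself: it assumes that the classical rule (9) takes the usual normalized form, in which every filter moves by λ·conj(X)·E divided by the total far-end power plus ε, and it omits the time-domain windowing. All function and variable names are hypothetical.

```python
def normalized_update(weights, far_end, error, overshoot, eps=1e-12):
    """Return the off-line updated weights for one frequency bin.

    Assumed NLMS-style rule (windowing omitted): each filter moves by
    overshoot * conj(X) * E / (sum |X|^2 + eps).
    """
    power = sum(abs(x) ** 2 for x in far_end) + eps
    return [w + overshoot * x.conjugate() * error / power
            for w, x in zip(weights, far_end)]


def canceller_error(mic, weights, far_end):
    """Microphone bin minus the echo estimate summed over all filters."""
    return mic - sum(w * x for w, x in zip(weights, far_end))


def four_errors(mic_m, mic_prev, x_m, x_prev, weights, overshoot=2.0):
    e_m = canceller_error(mic_m, weights, x_m)            # block m
    e_prev = canceller_error(mic_prev, weights, x_prev)   # block m-1
    # Look-backward: train on block m, test on block m-1.
    w_back = normalized_update(weights, x_m, e_m, overshoot)
    e_back = canceller_error(mic_prev, w_back, x_prev)
    # Look-forward: train on block m-1, test on block m.
    w_fwd = normalized_update(weights, x_prev, e_prev, overshoot)
    e_fwd = canceller_error(mic_m, w_fwd, x_m)
    return e_m, e_prev, e_back, e_fwd


# Noise-free single-filter example: true echo path h, misaligned weight w.
h, w = 0.5 + 0.2j, 0.3 - 0.1j
x_m, x_prev = 1.0 + 1.0j, 2.0 - 0.5j
errors = four_errors(h * x_m, h * x_prev, [x_m], [x_prev], [w])
```

In this noise-free, single-filter case with λ = 2 the off-line update flips the sign of the misalignment without changing its magnitude, so the look-backward error has the same magnitude as the m - 1th error, and the look-forward error the same magnitude as the mth error.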
[0072] In order to get the equations that relate said four errors Em, ε, ε, and ε with the unknown terms required to evaluate the optimal step size(s), said spectral errors (65) and (66) can be recast in terms of the filter misalignment Gi,j(k) as follows

wherein ≡ denotes equivalency.
[0073] Translating the previous equation to power spectral densities finally leads to one of the main advantages of the invention. Based on the uncorrelation assumptions (59) and (60) between the terms involved, it can be concluded

wherein the term Y(k) = E{|Am-1(k)|²} is unknown and not used in the evaluation of the optimal step size(s) (61).
[0074] In order to translate the look-backward (77) and look-forward (78) errors to power spectral densities, the effect of the overshoot factor λ needs to be first clarified. It can be deduced that the reduction of the filter misalignment (62) is ruled by the power factor

[0075] On the other hand, the update equations (77) and (78) yield the amplification of the non-cancellable terms Am and Am-1 proportionally to λ, that is, their power (63) gets amplified by

[0076] This new tradeoff is illustrated in Fig. 5, wherein the solid line corresponds to (81), and the dashed line to (82). An attractive situation appears when λ is set to two (λ = 2): in this "overshoot" case, the expected misalignment level remains the same, f(2) = 1, but the power of the non-cancellable term is amplified by a large factor, namely g(2) = 4.
[0077] Such a value, λ = 2, has never been used as step size in adaptive filtering because the resulting adaptive system would not converge. However, in the scope of this invention, said overshoot off-line update used in (77) and (78) makes the interference term Am, as well as Am-1, appear 6 dB larger in the look-backward error and look-forward error respectively. Henceforth, the value λ = 2 is adopted in 130, but other λ values can also be used as long as rules (81) and (82) are observed.
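The tradeoff can be tabulated with a short sketch. The closed forms used here are assumptions, not quotations of (81) and (82): f(λ) = (1 - λ)² and g(λ) = λ² are the expressions that follow from a normalized update and that agree with the values f(2) = 1 and g(2) = 4 stated above.

```python
import math


def misalignment_factor(lam):
    """Assumed f(lam) = (1 - lam)^2: power factor on the filter misalignment."""
    return (1.0 - lam) ** 2


def interference_gain(lam):
    """Assumed g(lam) = lam^2: power gain on the non-cancellable term."""
    return lam ** 2


for lam in (0.5, 1.0, 1.5, 2.0):
    gain_db = 10.0 * math.log10(interference_gain(lam))
    print(f"lambda={lam}: f={misalignment_factor(lam):.2f}, "
          f"g={interference_gain(lam):.2f} ({gain_db:+.1f} dB)")
```

Under these assumptions the power gain of 4 at λ = 2 corresponds to the 6 dB figure quoted for the interference term.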
[0078] The power spectral densities of the set of spectral errors (77) and (78) can be thus written as

wherein T(k) and T(k) are the normalized update in the look-backward and look-forward errors respectively

[0079] Said four equations (79), (80), (83), and (84) are used to obtain the terms (62) and (63), which are required in the evaluation of the optimal frequency-domain step size(s). Said equations can be written as a system of linear equations subject to inequality constraints. Without loss of generality, we exemplify the solution system over a stereo scenario, C = 2, with one segment per channel, S = 1, in what follows

subject to

[0080] Said inverse linear problem with inequality constraints (87) requires an iterative methodology, such as linear programming. In order to speed up the process (that is, to reduce the number of iterations), a good initial guess is beneficial. The very solution obtained at the previous m - 1th time block can serve as said initialization. The solution to (87) is computed in 150.
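A constrained solve with warm start can be sketched as follows. This is not the solver of the invention: projected gradient descent is swapped in for the linear programming mentioned above, the inequality constraints are assumed to be non-negativity of the unknown power terms, and the small matrix is a made-up stand-in for the system (87). All names are hypothetical.

```python
def solve_nonnegative(matrix, rhs, initial=None, iterations=500):
    """Least-squares solution of matrix @ x = rhs subject to x >= 0.

    Projected gradient descent; `initial` allows a warm start, e.g. with the
    solution found at the previous time block.
    """
    rows, cols = len(matrix), len(matrix[0])
    x = list(initial) if initial is not None else [0.0] * cols
    # The squared Frobenius norm bounds the largest eigenvalue of A^T A,
    # so its reciprocal is a safe (if conservative) step length.
    step = 1.0 / sum(a * a for row in matrix for a in row)
    for _ in range(iterations):
        residual = [sum(matrix[r][c] * x[c] for c in range(cols)) - rhs[r]
                    for r in range(rows)]
        gradient = [sum(matrix[r][c] * residual[r] for r in range(rows))
                    for c in range(cols)]
        # Gradient step followed by projection onto the non-negative orthant.
        x = [max(0.0, xc - step * g) for xc, g in zip(x, gradient)]
    return x


# Hypothetical 4x3 system standing in for the four error-power equations.
A = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0], [1.0, 1.0, 0.0], [1.0, 1.0, 1.0]]
b = [4.0, 5.0, 3.0, 6.0]          # consistent with x = [1, 2, 3]
solution = solve_nonnegative(A, b)
```

Passing the previous block's solution as `initial` shortens the iteration whenever the unknown power terms change slowly between blocks, which is the motivation given above for the warm start.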
[0081] At this point it is important to remark that the m - ℓth time block precedes the mth one, i.e. ℓ > 0 and preferably ℓ = 1. Adopting a negative value, ℓ < 0, would incur an undesired processing delay, as the m - ℓth time block would be yet to come after the mth one. However, that option does not actually differ in essence from said preferred case because both time blocks serve as training and testing data in either of the look-forward/-backward errors: because of this symmetry in the solution system (87), the option ℓ < 0 becomes moot.
[0082] It is important to point out that any sub-matrix in (87) involving only Ψm(k) and Y(k) has full rank, i.e. the solution obtained for the non-cancellable terms is accurate. This proves theoretically that the resulting step size provides "instantaneous" robustness against double-talk, which was one of the major challenges yet to be effectively solved by the state of the art.
[0083] On the other hand, the estimation of the filter misalignment becomes ill-conditioned as the number of filters increases. In practice, the solution accomplished with a normal initialization, such as that from the m - 1th block, outperforms the state of the art. Nonetheless, in order to improve the condition of this problem when the number of channels or the number of segments per channel is large, a second embodiment of the invention is disclosed in what follows.
[0084] Improving the estimation accuracy for the misalignment power of a given r, sth filter requires evaluating at most two additional errors, and including the resulting linear equations into the original system. The strategy behind this methodology is not to treat said r, sth filter as part of the adaptive system, i.e. to consider its misalignment as a non-cancellable "interference" term, and to update off-line all canceller filters but the r, sth one, resulting in the additional look-backward error

wherein Δr,s is the impact of a look-backward filter update except for the (r, s)th filter

and the additional look-forward error

wherein Δr,s is the impact of a look-forward filter update except for the (r, s)th filter

[0085] The additional look-backward error εr,s and look-forward error εr,s, for λ = 2, bring two important additional relations

wherein

[0086] Said four equations (79), (80), (83), (84), in addition to (92) and (93), embody the main algorithmic foundations to obtain the terms (62) and (63). For the sake of simplicity let us consider a one-channel system, C = 1, composed of two segments, S = 2. Since the filter W0,0 of the first segment takes care of the early echoes, which are prone to change and have larger energy than the late ones, it is desired to improve the tracking performance for said filter. For the sake of simplicity, we drop the frequency bin index k whenever convenient, that is

denotes

(k), and so on. The final solution system results as follows

subject to

[0087] The misalignment Φ0,0(k) is observed in the look-backward/-forward errors ε0,0(k) and ε0,0(k) with a gain of 6 dB. It is evident for anyone familiar with the art that the inclusion of said two additional equations, that is, the last two rows in (96), improves the condition of the problem.
[0088] The solution to (96) is computed in 150. Upon said solution, the optimal FD step
size as per (56) is evaluated in 180 as follows

Residual Echo Suppression
[0089] Due to very long echo paths, to an adaptive system in early convergence, or to non-linear
phenomena, echo components in the canceller output (11) may still be audible. The
non-linear processing (NLP) 300 aims to suppress the residual echo components still
present in the canceller output to inaudible levels while preserving the integrity
of the main near-end sounds as much as possible. In order to perform an effective
NLP, accurate estimation of the residual echo in level and spectrum is important.
[0090] The residual echo present in the canceller output (11) is composed of the terms Rm(k), Qm(k) and Pm(k). As said terms are uncorrelated with each other, the estimated power spectrum of the residual echo is built additively in 330 as follows

wherein

(
k)
, 
(
k)
, and

(
k) are the estimated power spectrum of every term in the residual echo respectively.
[0091] The power spectrum of the residual echo due to filter misalignment is readily given by

wherein Φi,j(k) is the outcome of the generic decomposition problem (96).
[0092] The power spectrum of the residual echo due to long acoustic echo paths (LH ≫ S × L) can be defined as follows

wherein Ni,j(k) for j = S, . . . , ∞ represent filters beyond the time scope of the canceller, hence unknown "virtual" filters. Said terms in (100) can be approximated according to the reasoning that follows.
[0093] Echo paths typically exhibit an exponential decay, especially in the last part of their impulse response. The time-domain filter wi,S-1(n), regardless of the channel, is thus assumed to follow the model

wherein τ is the time constant related to said filter, and the level α is not relevant henceforth; the time constant τ can be simply obtained with linear regression over said squared filter weights in logarithmic scale, as pointed out in (101). The power spectrum of said "virtual" filters in (100) can be thus approximated as

wherein Wi,S-1(k) is an existing filter, α ≥ 1 is an integer, and γ = exp(-τL). By substituting (102) into (100), and because Xi,j,m(k) = Xi,j-1,m-1(k), the power spectrum of the residual echo beyond the canceller scope is obtained with the following accumulative formula

wherein all elements involved in said formula are available.
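The regression step can be sketched as follows. The exact form of model (101) is not reproduced here; the sketch assumes that the squared weights of the last segment decay as α·exp(-τn), so that the slope of log w² against n equals -τ. The filter values and all names are synthetic and hypothetical.

```python
import math


def decay_constant(weights):
    """Least-squares slope of log(w[n]^2) against n, returned as tau > 0 for decay.

    Assumes the squared tail weights follow alpha * exp(-tau * n).
    """
    points = [(n, math.log(w * w)) for n, w in enumerate(weights) if w != 0.0]
    count = len(points)
    mean_n = sum(n for n, _ in points) / count
    mean_y = sum(y for _, y in points) / count
    covariance = sum((n - mean_n) * (y - mean_y) for n, y in points)
    variance = sum((n - mean_n) ** 2 for n, _ in points)
    return -covariance / variance


# Synthetic last-segment filter with a known decay, L = 64 taps.
L, tau_true = 64, 0.05
tail = [0.8 * math.exp(-0.5 * tau_true * n) * (-1) ** n for n in range(L)]
tau = decay_constant(tail)
gamma = math.exp(-tau * L)   # per-segment power decay, as gamma = exp(-tau L)
```

Because the regression runs on the logarithm of the squared weights, the sign pattern of the taps and the level α do not affect the estimate of τ.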
[0094] Finally, the non-linear echo components are correlated to the input signals xi(n) in a somewhat unpredictable way. This means that the adaptive filter usually jitters or fluctuates when non-linear echo is present in the training error. Based on the misalignment Φi,j(k) and step size µi,j,m(k) obtained from the decomposition problem (96), the non-linear residual echo is readily obtained as follows

wherein ∝ denotes proportional to.
[0095] Upon the estimated overall residual echo Λm(k), several ways to carry out the NLP operation by 300 are available, such as with the Wiener filter, minimum mean-square error, perceptual enhancement, et cetera. As an example, the Wiener filter can be approximated as follows

wherein └ ┘ε denotes low clipping by ε > 0. Finally, the frequency-domain output of the acoustic echo control system is obtained by

[0096] In order to deliver a listening signal, the frequency-domain output Om(k) is transformed to the time domain. Standard overlap-add processing must be considered to avoid block-effects in the resulting output. Said processing is well known to any person familiar with the art.
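The suppression step can be sketched per frequency bin as follows. The gain expression is an assumption: it uses the common spectral-subtraction approximation of the Wiener gain, 1 - Λm(k)/|Em(k)|², clipped from below, and it applies the gain multiplicatively to the canceller output. The parameter `floor` plays the role of the low-clipping value ε; all names and numbers are hypothetical.

```python
def suppression_gain(error_power, residual_echo_power, floor=0.05):
    """Assumed spectral-subtraction form of the Wiener gain, clipped from below."""
    if error_power <= 0.0:
        return floor
    gain = 1.0 - residual_echo_power / error_power
    return max(floor, gain)


def suppress(error_bins, residual_echo_powers, floor=0.05):
    """Apply the per-bin gain B_m(k) to the canceller output E_m(k)."""
    return [suppression_gain(abs(e) ** 2, r, floor) * e
            for e, r in zip(error_bins, residual_echo_powers)]


# Three bins: little residual echo, moderate, and echo-dominated.
E = [1.0 + 0.0j, 0.0 + 2.0j, 0.5 - 0.5j]
Lam = [0.01, 1.0, 0.6]
O = suppress(E, Lam)
```

The third bin shows why the clipping is needed: the estimated residual echo exceeds the observed power, so the unclipped gain would be negative.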
Comfort Noise Injection
[0097] The echo cancellation 200 along with the NLP operation 300 delivers an echo-free
output (first objective in echo control), that is

[0098] Since the magnitude of the residual echo is significantly lower than that of the main near-end signal, hence Dm(k)Bm(k) ≃ Dm(k), the main near-end signal goes through nearly undistorted upon the NLP (second objective in echo control). We can thus approximate the previous output (107) as

[0099] However, the near-end background noise in (108), i.e. Vm(k)Bm(k), suffers fluctuations in level and spectrum. In order to replenish the background noise in 400 as per (13) or an equivalent technique and deliver the same perceptual impression (third and last objective in echo control), accurate estimation thereof in level and spectrum is important.
[0100] In case of far-end inactivity (an event that is straightforward to detect), hence Bm(k) = 1, the background noise can be estimated from the power spectrum |Om(k)|² during (short) periods of silence. However, the injection of comfort noise is actually required in the opposite case, that is, during far-end activity, hence 0 ≤ Bm(k) < 1. Therefore, being able to track the background noise in any situation, hence from the original canceller output |Em(k)|², represents the main challenge to address hereby.
[0101] The usual approach to background noise estimation is to obtain the power spectral floor from the most recent K canceller outputs

either by averaging selected items within Γm, or with minimum statistics over the set Γm(k), et cetera. Let

{Γm(k)} denote a generic method to obtain said spectral floor from the sets Γm(k).
[0102] The present invention proposes to estimate the power spectrum of the background noise in 430 according to

wherein δ(k) represents the minimum comfort noise for the kth frequency bin, and

(k) contains the K most recent time spectral snapshots of the residual echo power, obtained with (98) during the NLP phase

[0103] The proposed rule (110) involves the computation of a spectral floor "two times", once for the set Γm(k) and then for

(k). Although it demands larger computational complexity, it delivers accurate results. Evaluating only one spectral floor over the difference set Γm(k) -

(k) does not lead in general to the same result, that is

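The difference between the two evaluations can be seen in a small numeric sketch. Rule (110) is not reproduced here; the sketch assumes it takes the spectral floor of the canceller outputs, subtracts the spectral floor of the residual-echo estimates, and limits the result from below by the minimum comfort noise δ(k). The plain minimum stands in for the generic floor estimator, and the numbers are made up.

```python
def spectral_floor(values):
    """Stand-in floor estimator: the minimum over the recent snapshots."""
    return min(values)


def background_noise_power(output_powers, residual_echo_powers, minimum_noise):
    """Floor of the canceller outputs minus floor of the residual-echo
    estimates, never below the minimum comfort-noise level."""
    estimate = spectral_floor(output_powers) - spectral_floor(residual_echo_powers)
    return max(minimum_noise, estimate)


# K = 4 snapshots of one frequency bin (made-up numbers).
gamma_set = [5.0, 3.0, 4.0, 6.0]      # |E_m(k)|^2 over recent blocks
echo_set = [1.0, 2.5, 0.5, 3.0]       # estimated residual-echo power
two_floors = background_noise_power(gamma_set, echo_set, minimum_noise=0.1)
one_floor = spectral_floor([g - e for g, e in zip(gamma_set, echo_set)])
```

With these numbers the two-floor estimate and the single floor over the difference set disagree, because the minima of the two sets occur at different time blocks.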
[0104] The invention is not restricted to the specific embodiments described in detail herein,
but encompasses all variants, combinations and modifications thereof that fall within
the framework of the appended claims.
1. A computer-implemented method for updating at least one frequency-domain filter coefficient Wi,j(k) of an echo canceller (50) having at least one channel and at least one segment per channel, the filter coefficients of the echo canceller (50) being updateable in the frequency domain at a time block m,
characterized by:
determining a canceller output Em(k) over the mth time block as the difference between Zm(k) and a cancelling term based on

determining a canceller error ε(k) over the m - ℓth time block as the difference between Zm-ℓ(k) and a cancelling term based on

determining a look-backward error ε(k) as

wherein Δ(k) is based on

determining a look-forward error ε(k) as

wherein Δ(k) is based on

determining an optimal update step size µi,j,m(k) from said canceller output Em(k) over the mth time block, from said canceller error ε(k) over the m - ℓth time block, from said look-backward error ε(k), and from said look-forward error ε(k); and
updating said at least one filter coefficient Wi,j(k) by using said optimal update step-size µi,j,m(k);
wherein
C corresponds to the number of channels i = 0, ··· , C - 1 of the echo canceller (50),
S corresponds to the number of segments j = 0, ··· , S - 1 per channel of the echo canceller (50),
ℓ is an integer different than zero,
k denotes a frequency bin index,
ε is a positive value,
Zm(k) corresponds to the kth spectral bin of a microphone signal at the mth time block,
Zm-ℓ(k) corresponds to the kth spectral bin of the microphone signal at the m - ℓth time block,
Xi,j,m(k) corresponds to the kth spectral bin of a far-end signal at the ith channel, jth segment, and mth time block,
Xi,j,m-ℓ(k) corresponds to the kth spectral bin of the far-end signal at the ith channel, jth segment, and m - ℓth time block,
λ is an overshoot factor, and
a, b are channel and segment indices, respectively.
2. The method according to claim 1, wherein the optimal step size µi,j,m(k) is computed by either

or

wherein Ψm(k) is a power spectrum of the non-cancellable components at the mth time block and Φi,j(k) is a power spectrum of the misalignment of each i, jth filter, wherein Ψm(k) and Φi,j(k) are determined by solving the set of linear equations

subject to inequality constraints

wherein
χi,j,m(k) corresponds to the kth power spectral bin of the far-end signal at the ith channel, jth segment, and mth time block, wherein said signal block is preferably windowed,
χi,j,m-ℓ(k) corresponds to the kth power spectral bin of the far-end signal at the ith channel, jth segment, and m - ℓth time block, wherein said signal block is preferably windowed,
Y(k) corresponds to the kth power spectral bin of the non-cancellable components at the m - ℓth block,
T(k) is the normalized look-backward update, computed as

and
T(k) is the normalized look-forward update, computed as
3. The method according to claim 1, wherein at least one of the number of channels and the number of segments per channel is larger than one; and
the optimal step size µi,j,m(k) is determined from said canceller output Em(k) over the mth time block, from said canceller error ε(k) over the m - ℓth time block, from said look-backward error ε(k), from said look-forward error ε(k) as well as from at least one additional look-backward error εr,s(k) and from at least one additional look-forward error εr,s(k) for at least one r, sth filter, r and s being indices of the channel and the segment, respectively, wherein

wherein Δr,s(k) is based on

and

wherein Δr,s(k) is based on
4. The method according to claims 2 and 3, wherein the power spectrum Ψm(k) of the non-cancellable components and the power spectrum Φi,j(k) of the misalignment of each i, jth filter are determined by solving said set of linear equations with additional linear equations for the at least one r, sth filter

subject to inequality constraints

wherein
Tr,s(k) is the normalized look-backward update excluding the r, sth filter, computed as

and
Tr,s(k) is the normalized look-forward update excluding the r, sth filter, computed as
5. The method according to any one of the claims 1 to 4, wherein the overshoot factor
λ is in the range 1 < λ ≤ 2, and preferably equals 2.
6. The method according to any one of the claims 1 to 5,
characterized by determining a power spectrum Rm(k) of the residual echo due to filter misalignment at the mth block by means of

and subsequently an improved canceller output Om(k) based on the canceller output Em(k) and on said determined power spectrum Rm(k) of the residual echo due to the filter misalignment.
7. The method according to any one of the claims 1 to 6,
characterized by determining a power spectrum Qm(k) of the echo beyond the canceller's time scope at the mth block by means of the accumulative method

wherein 0 < γ < 1,
and subsequently an improved canceller output Om(k) based on the canceller output Em(k) and on said determined power spectrum Qm(k) of the echo beyond the canceller's time scope.
8. The method according to any one of the claims 1 to 7,
characterized by determining a power spectrum Pm(k) of the non-linear echo components at the mth block by means of

wherein ∝ denotes proportional,
and subsequently an improved canceller output Om(k) based on the canceller output Em(k) and on said determined power spectrum Pm(k) of the non-linear echo components.
9. The method according to any one of the claims 1 to 8, wherein the method determines an improved canceller output Om(k) as

wherein
└ ┘ε denotes low clipping by ε > 0,
Em(k) is the canceller output,
Rm(k) is a power spectrum of the residual echo due to filter misalignment at the mth block,
Qm(k) is a power spectrum of the echo beyond the canceller's time scope at the mth block, and
Pm(k) is a power spectrum of the non-linear echo components at the mth block.
10. The method according to any one of claims 1 to 9, wherein the at least one filter coefficient Wi,j(k) is updated based on
11. An echo canceller (50) having at least one channel and at least one segment per channel, the echo canceller (50) having filter coefficients Wi,j(k) being updateable in the frequency domain at a time block m,
characterised in that the echo canceller is configured to:
determine a canceller output Em(k) over the mth time block as the difference between Zm(k) and a cancelling term based on

determine a canceller error ε(k) over the m - ℓth time block as the difference between Zm-ℓ(k) and a cancelling term based on

determine a look-backward error ε(k) as

wherein Δ(k) is based on

determine a look-forward error ε(k) as

wherein Δ(k) is based on

determine an optimal update step-size µi,j,m(k) from said canceller output Em(k) over the mth time block, from said canceller error ε(k) over the m - ℓth time block, from said look-backward error ε(k), and from said look-forward error ε(k); and
update said at least one filter coefficient Wi,j(k) by using said optimal update step-size µi,j,m(k);
wherein
C corresponds to the number of channels i = 0, ··· , C - 1 of the echo canceller (50),
S corresponds to the number of segments j = 0, ··· , S - 1 per channel of the echo canceller (50),
ℓ is an integer different than zero,
k denotes a frequency bin,
ε is a positive value,
Zm(k) corresponds to the kth spectral bin of a microphone signal at the mth time block,
Zm-ℓ(k) corresponds to the kth spectral bin of the microphone signal at the m - ℓth time block,
Xi,j,m(k) corresponds to the kth spectral bin of a far-end signal at the ith channel, jth segment, and mth time block,
Xi,j,m-ℓ(k) corresponds to the kth spectral bin of the far-end signal at the ith channel, jth segment, and m - ℓth time block,
λ is an overshoot factor, and
a, b are channel and segment indices, respectively.
12. The echo canceller (50) according to claim 11, wherein the echo canceller (50) is further configured to compute the optimal step size µi,j,m(k) by either

or

wherein Ψm(k) is a power spectrum of the non-cancellable components at the mth time block and Φi,j(k) is a power spectrum of the misalignment of each i, jth filter, wherein the echo canceller is configured to determine Ψm(k) and Φi,j(k) by solving the set of linear equations

subject to inequality constraints

wherein
χi,j,m(k) corresponds to the kth power spectral bin of the far-end signal at the ith channel, jth segment, and mth time block, wherein said signal block is preferably windowed,
χi,j,m-ℓ(k) corresponds to the kth power spectral bin of the far-end signal at the ith channel, jth segment, and m - ℓth time block, wherein said signal block is preferably windowed,
Y(k) corresponds to the kth power spectral bin of a power spectrum of the non-cancellable components at the m - ℓth block,
T(k) is the kth frequency bin of the normalized look-backward update, computed as

and
T(k) is the kth frequency bin of the normalized look-forward update, computed as
13. The echo canceller (50) according to claim 11 or 12, wherein the overshoot factor
λ is in the range 1 < λ ≤ 2, and preferably equals 2.
14. The echo canceller (50) according to any one of the claims 11 to 13, wherein the echo canceller (50) is configured to determine an improved canceller output Om(k) as

wherein
└ ┘ε denotes low clipping by ε > 0,
Em(k) is the canceller output,
Rm(k) is a power spectrum of the residual echo due to filter misalignment at the mth block,
Qm(k) is a power spectrum of the echo beyond the canceller's time scope at the mth block, and
Pm(k) is a power spectrum of the non-linear echo components at the mth block.
15. The echo canceller (50) according to any one of claims 11 to 14, wherein the echo canceller (50) is configured to update the at least one filter coefficient Wi,j(k) based on
1. Computer-implementiertes Verfahren zum Aktualisieren zumindest eines Filterkoeffizienten
Wi,j(
k) eines Echokompensators (50) im Frequenzbereich, wobei der Echokompensator zumindest
einen Kanal und zumindest einen Abschnitt pro Kanal hat, und wobei die Filterkoeffizienten
des Echokompensators (50) im Frequenzbereich in einem Zeitblock
m aktualisiert werden können,
gekennzeichnet durch:
Ermitteln einer Kompensatorausgabe Em(k) über den m-ten Zeitblock als Differenz zwischen Zm(k) und einem Kompensationsterm basierend auf

Ermitteln eines Kompensatorfehlers ε(k) über den (m - ℓ)-ten Zeitblock als Differenz zwischen Zm-ℓ(k) und einem Kompensationsterm basierend auf

Ermitteln eines zurückschauenden Fehlers ε(k) als

wobei Δ(k) basiert auf

Ermitteln eines vorausschauenden Fehlers ε(k) als

wobei Δ(k) basiert auf

Ermitteln einer optimalen Aktualisierungsschrittweite µi,j,m(k) aus der genannten Kompensatorausgabe Em(k) über den m-ten Zeitblock, aus dem genannten Kompensatorfehler ε(k) über den (m - ℓ)-ten Zeitblock, aus dem genannten zurückschauenden Fehler ε(k) und aus dem genannten vorausschauenden Fehler ε(k) ; und
Aktualisieren des zumindest einen Filterkoeffizienten Wi,j(k) unter Verwendung dieser optimalen Aktualisierungsschrittweite µi,j,m(k) ;
wobei
C der Anzahl an Kanälen i = 0, ..., C - 1 des Echokompensators (50) entspricht,
S der Anzahl an Abschnitten j = 0, ..., S - 1 pro Kanal des Echokompensators (50) entspricht,
ℓ eine ganze Zahl ungleich 0 ist,
k einen Frequenzklassenindex bezeichnet,
ε ein positiver Wert ist,
Zm(k) der k-ten Spektralklasse eines Mikrofonsignals im m-ten Zeitblock entspricht,
Zm-ℓ(k) der k-ten Spektralklasse des Mikrofonsignals im (m - ℓ)-ten Zeitblock entspricht,
Xi,j,m(k) der k-ten Spektralklasse eines entfernten Signals im i-ten Kanal, j-ten Abschnitt und m-ten Zeitblock entspricht,
Xi,j,m-ℓ(k) der k-ten Spektralklasse des entfernten Signals im i-ten Kanal, j-ten Abschnitt, und (m - ℓ)-ten Zeitblock entspricht,
λ ein Überschreitungsfaktor ist, und
a, b Kanal- bzw. Abschnittsindizes sind.
2. The method according to claim 1, wherein the optimal step-size µi,j,m(k) is computed either as

or as

wherein
Ψm(k) is a power spectrum of the non-cancellable components at the m-th time block and Φi,j(k) is a power spectrum of the misalignment of each i,j-th filter, wherein Ψm(k) and Φi,j(k) are determined by solving the set of linear equations

subject to the inequality constraints

wherein
Xi,j,m(k) corresponds to the k-th power spectral bin of the far-end signal at the i-th channel, j-th segment, and m-th time block, said signal block preferably being windowed,
Xi,j,m-ℓ(k) corresponds to the k-th power spectral bin of the far-end signal at the i-th channel, j-th segment, and (m - ℓ)-th time block, said signal block preferably being windowed,
Y(k) corresponds to the k-th power spectral bin of the non-cancellable components at the (m - ℓ)-th block,
T(k) is the normalized look-backward update, computed as

and
T(k) is the normalized look-forward update, computed as

3. The method according to claim 1, wherein at least one of the number of channels and the number of segments per channel is greater than 1; and
the optimal step-size µi,j,m(k) is determined from said canceller output Em(k) over the m-th time block, from said canceller error ε(k) over the (m - ℓ)-th time block, from said look-backward error ε(k), from said look-forward error ε(k), as well as from at least one additional look-backward error εr,s(k) and from at least one additional look-forward error εr,s(k) for at least one r,s-th filter, wherein r and s are the indices of the channel and the segment, respectively, wherein

wherein Δr,s(k) is based on

and

wherein Δr,s(k) is based on
4. The method according to claims 2 and 3, wherein the power spectrum Ψm(k) of the non-cancellable components and the power spectrum Φi,j(k) of the misalignment of each i,j-th filter are determined by solving said set of linear equations together with the additional linear equations for the at least one r,s-th filter

subject to the inequality constraints

wherein
Tr,s(k) is the normalized look-backward update excluding the r,s-th filter, computed as

and
Tr,s(k) is the normalized look-forward update excluding the r,s-th filter, computed as
5. The method according to any one of claims 1 to 4, wherein the overshoot factor λ is in the range 1 < λ ≤ 2 and is preferably equal to 2.
6. The method according to any one of claims 1 to 5, characterised by determining a power spectrum Rm(k) of the residual echo due to filter misalignment at the m-th block by means of

and subsequently an improved canceller output Om(k) based on the canceller output Em(k) and on said determined power spectrum Rm(k) of the residual echo due to filter misalignment.
7. The method according to any one of claims 1 to 6, characterised by determining a power spectrum Qm(k) of the echo beyond the time scope of the canceller at the m-th block by means of the accumulative method

wherein 0 < γ < 1,
and subsequently an improved canceller output Om(k) based on the canceller output Em(k) and on said determined power spectrum Qm(k) of the echo beyond the time scope of the canceller.
8. The method according to any one of claims 1 to 7, characterised by determining a power spectrum Pm(k) of the non-linear echo components at the m-th block by means of

wherein α denotes a proportionality,
and subsequently an improved canceller output Om(k) based on the canceller output Em(k) and on said determined power spectrum Pm(k) of the non-linear echo components.
9. The method according to any one of claims 1 to 8, wherein the method determines an improved canceller output Om(k) as

wherein
└ ┘ε denotes low clipping by ε > 0,
Em(k) is the canceller output,
Rm(k) is a power spectrum of the residual echo due to filter misalignment at the m-th block,
Qm(k) is a power spectrum of the echo beyond the time scope of the canceller at the m-th block, and
Pm(k) is a power spectrum of the non-linear echo components at the m-th block.
10. The method according to any one of claims 1 to 9, wherein the at least one filter coefficient Wi,j(k) is updated based on
11. An echo canceller (50) having at least one channel and at least one segment per channel, the echo canceller (50) having filter coefficients that are updatable in the frequency domain at a time block m,
characterised in that the echo canceller is configured to:
determine a canceller output Em(k) over the m-th time block as the difference between Zm(k) and a cancelling term based on

determine a canceller error ε(k) over the (m - ℓ)-th time block as the difference between Zm-ℓ(k) and a cancelling term based on

determine a look-backward error ε(k) as

wherein Δ(k) is based on

determine a look-forward error ε(k) as

wherein Δ(k) is based on

determine an optimal update step-size µi,j,m(k) from said canceller output Em(k) over the m-th time block, from said canceller error ε(k) over the (m - ℓ)-th time block, from said look-backward error ε(k), and from said look-forward error ε(k); and
update the at least one filter coefficient Wi,j(k) by using said optimal update step-size µi,j,m(k);
wherein
C corresponds to the number of channels i = 0, ..., C - 1 of the echo canceller (50),
S corresponds to the number of segments j = 0, ..., S - 1 per channel of the echo canceller (50),
ℓ is an integer different from 0,
k denotes a frequency bin index,
ε is a positive value,
Zm(k) corresponds to the k-th spectral bin of a microphone signal at the m-th time block,
Zm-ℓ(k) corresponds to the k-th spectral bin of the microphone signal at the (m - ℓ)-th time block,
Xi,j,m(k) corresponds to the k-th spectral bin of a far-end signal at the i-th channel, j-th segment, and m-th time block,
Xi,j,m-ℓ(k) corresponds to the k-th spectral bin of the far-end signal at the i-th channel, j-th segment, and (m - ℓ)-th time block,
λ is an overshoot factor, and
a, b are channel and segment indices, respectively.
12. The echo canceller (50) according to claim 11, wherein the echo canceller (50) is further configured to compute the optimal step-size µi,j,m(k) either as

or as

wherein
Ψm(k) is a power spectrum of the non-cancellable components at the m-th time block and Φi,j(k) is a power spectrum of a misalignment of each i,j-th filter, wherein the echo canceller is configured to determine Ψm(k) and Φi,j(k) by solving the set of linear equations

subject to the inequality constraints

wherein
Xi,j,m(k) corresponds to the k-th power spectral bin of the far-end signal at the i-th channel, j-th segment, and m-th time block, said signal block preferably being windowed,
Xi,j,m-ℓ(k) corresponds to the k-th power spectral bin of the far-end signal at the i-th channel, j-th segment, and (m - ℓ)-th time block, said signal block preferably being windowed,
Y(k) corresponds to the k-th power spectral bin of the non-cancellable components at the (m - ℓ)-th block,
T(k) is the k-th frequency bin of the normalized look-backward update, computed as

and
T(k) is the k-th frequency bin of the normalized look-forward update, computed as

13. The echo canceller (50) according to claim 11 or 12, wherein the overshoot factor λ is in the range 1 < λ ≤ 2 and is preferably equal to 2.
14. The echo canceller (50) according to any one of claims 11 to 13, wherein the echo canceller (50) is configured to determine an improved canceller output Om(k) as

wherein
└ ┘ε denotes low clipping by ε > 0,
Em(k) is the canceller output,
Rm(k) is a power spectrum of the residual echo due to filter misalignment at the m-th block,
Qm(k) is a power spectrum of the echo beyond the time scope of the canceller at the m-th block, and
Pm(k) is a power spectrum of the non-linear echo components at the m-th block.
15. The echo canceller (50) according to any one of claims 11 to 14, wherein the echo canceller (50) is configured to update the at least one filter coefficient Wi,j(k) based on
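For orientation only, the processing chain recited in claims 1 and 11 (canceller output, then a coefficient update with a per-bin step-size) can be sketched in Python. The sketch rests on assumptions that the claim text reproduced here does not confirm: the cancelling term is taken to be the sum over all channels and segments of Xi,j,m(k)·Wi,j(k), and the update is a generic gradient-type rule Wi,j(k) + µi,j,m(k)·conj(Xi,j,m(k))·Em(k). The function names are hypothetical, and the step-size is passed in as a given value rather than derived from the look-backward and look-forward errors.

```python
from typing import List

Spectrum = List[complex]             # one value per frequency bin k
FilterBank = List[List[Spectrum]]    # indexed [channel i][segment j][bin k]


def canceller_output(z: Spectrum, x: FilterBank, w: FilterBank) -> Spectrum:
    """Return E_m(k) = Z_m(k) - sum_{i,j} X_{i,j,m}(k) * W_{i,j}(k).

    The form of the cancelling term is an assumption made for illustration.
    """
    e = []
    for k, z_k in enumerate(z):
        echo_estimate = 0j
        for x_channel, w_channel in zip(x, w):
            for x_segment, w_segment in zip(x_channel, w_channel):
                echo_estimate += x_segment[k] * w_segment[k]
        e.append(z_k - echo_estimate)
    return e


def update_coefficients(w: FilterBank, x: FilterBank, e: Spectrum,
                        mu: List[List[List[float]]]) -> FilterBank:
    """Return W + mu * conj(X) * E for every channel, segment and bin.

    This is a generic gradient-type update (assumption); mu[i][j][k] is
    supplied by the caller and is not computed here.
    """
    return [
        [
            [
                w[i][j][k] + mu[i][j][k] * x[i][j][k].conjugate() * e[k]
                for k in range(len(e))
            ]
            for j in range(len(w[i]))
        ]
        for i in range(len(w))
    ]
```

In a noise-free toy case with one channel, one segment, and a step-size equal to 1/|X|², a single update of this kind recovers the echo-path coefficient and drives the canceller output to zero.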
1. A computer-implemented method for updating at least one filter coefficient Wi,j(k) of an echo canceller (50) in the frequency domain, the echo canceller (50) having at least one channel and at least one segment per channel, the filter coefficients of the echo canceller (50) being updatable in the frequency domain at a time block m, characterised by:
determining a canceller output Em(k) over the m-th time block as the difference between Zm(k) and a cancelling term based on

determining a canceller error ε(k) over the (m - ℓ)-th time block as the difference between Zm-ℓ(k) and a cancelling term based on

determining a look-backward error ε(k) as

wherein Δ(k) is based on

determining a look-forward error ε(k) as

wherein Δ(k) is based on

determining an optimal update step-size µi,j,m(k) from said canceller output Em(k) over the m-th time block, from said canceller error ε(k) over the (m - ℓ)-th time block, from said look-backward error ε(k), and from said look-forward error ε(k); and
updating said at least one filter coefficient Wi,j(k) by using said optimal update step-size µi,j,m(k);
wherein
C corresponds to the number of channels i = 0, ..., C - 1 of the echo canceller (50),
S corresponds to the number of segments j = 0, ..., S - 1 per channel of the echo canceller (50),
ℓ is an integer different from 0,
k denotes a frequency bin index,
ε is a positive value,
Zm(k) corresponds to the k-th spectral bin of a microphone signal at the m-th time block,
Zm-ℓ(k) corresponds to the k-th spectral bin of the microphone signal at the (m - ℓ)-th time block,
Xi,j,m(k) corresponds to the k-th spectral bin of a far-end signal at the i-th channel, j-th segment, and m-th time block,
Xi,j,m-ℓ(k) corresponds to the k-th spectral bin of the far-end signal at the i-th channel, j-th segment, and (m - ℓ)-th time block,
λ is an overshoot factor, and
a, b are channel and segment indices, respectively.
2. The method according to claim 1, wherein the optimal step-size µi,j,m(k) is computed either as

or as

wherein
Ψm(k) is a power spectrum of the non-cancellable components at the m-th time block and Φi,j(k) is a power spectrum of the misalignment of each i,j-th filter, wherein Ψm(k) and Φi,j(k) are determined by solving the set of linear equations

subject to inequality constraints

wherein
Xi,j,m(k) corresponds to the k-th power spectral bin of the far-end signal at the i-th channel, j-th segment, and m-th time block, said signal block preferably being windowed,
Xi,j,m-ℓ(k) corresponds to the k-th power spectral bin of the far-end signal at the i-th channel, j-th segment, and (m - ℓ)-th time block, said signal block preferably being windowed,
Y(k) corresponds to the k-th power spectral bin of the non-cancellable components at the (m - ℓ)-th block,
T(k) is the normalized look-backward update, computed as

and
T(k) is the normalized look-forward update, computed as

3. The method according to claim 1, wherein at least one of the number of channels and the number of segments per channel is greater than one; and
the optimal step-size µi,j,m(k) is determined from said canceller output Em(k) over the m-th time block, from said canceller error ε(k) over the (m - ℓ)-th time block, from said look-backward error ε(k), and from said look-forward error ε(k), as well as from at least one additional look-backward error εr,s(k) and from at least one additional look-forward error εr,s(k) for at least one r,s-th filter, r and s being indices of the channel and the segment, respectively, wherein

wherein Δr,s(k) is based on

and

wherein Δr,s(k) is based on
4. The method according to claims 2 and 3, wherein the power spectrum Ψm(k) of the non-cancellable components and the power spectrum Φi,j(k) of the misalignment of each i,j-th filter are determined by solving said set of linear equations together with additional linear equations for at least one r,s-th filter

subject to inequality constraints

wherein
Tr,s(k) is the normalized look-backward update excluding the r,s-th filter, computed as

and
Tr,s(k) is the normalized look-forward update excluding the r,s-th filter, computed as
5. The method according to any one of claims 1 to 4, wherein the overshoot factor λ is in the range 1 < λ ≤ 2 and is preferably equal to 2.
6. The method according to any one of claims 1 to 5, characterised by determining a power spectrum Rm(k) of the residual echo due to filter misalignment at the m-th block by means of

and subsequently an improved canceller output Om(k) based on the canceller output Em(k) and on said determined power spectrum Rm(k) of the residual echo due to filter misalignment.
7. The method according to any one of claims 1 to 6, characterised by determining a power spectrum Qm(k) of the echo beyond the time scope of the canceller at the m-th block by means of an accumulative method

wherein 0 < γ < 1,
and subsequently an improved canceller output Om(k) based on the canceller output Em(k) and on said determined power spectrum Qm(k) of the echo beyond the time scope of the canceller.
8. The method according to any one of claims 1 to 7, characterised by determining a power spectrum Pm(k) of the non-linear echo components at the m-th block by means of

wherein α denotes a proportionality,
and subsequently an improved canceller output Om(k) based on the canceller output Em(k) and on said determined power spectrum Pm(k) of the non-linear echo components.
9. The method according to any one of claims 1 to 8, wherein the method determines an improved canceller output Om(k) as

wherein
└ ┘ε denotes low clipping by ε > 0,
Em(k) is the canceller output,
Rm(k) is a power spectrum of the residual echo due to filter misalignment at the m-th block,
Qm(k) is a power spectrum of the echo beyond the time scope of the canceller at the m-th block, and
Pm(k) is a power spectrum of the non-linear echo components at the m-th block.
10. The method according to any one of claims 1 to 9, wherein the at least one filter coefficient Wi,j(k) is updated based on
11. An echo canceller (50) having at least one channel and at least one segment per channel, the echo canceller (50) having filter coefficients Wi,j(k) that are updatable in the frequency domain at a time block m, characterised in that the echo canceller is configured to:
determine a canceller output Em(k) over the m-th time block as the difference between Zm(k) and a cancelling term based on

determine a canceller error ε(k) over the (m - ℓ)-th time block as the difference between Zm-ℓ(k) and a cancelling term based on

determine a look-backward error ε(k) as

wherein Δ(k) is based on

determine a look-forward error ε(k) as

wherein Δ(k) is based on

determine an optimal update step-size µi,j,m(k) from said canceller output Em(k) over the m-th time block, from said canceller error ε(k) over the (m - ℓ)-th time block, from said look-backward error ε(k), and from said look-forward error ε(k); and
update said at least one filter coefficient Wi,j(k) by using said optimal update step-size µi,j,m(k);
wherein
C corresponds to the number of channels i = 0, ..., C - 1 of the echo canceller (50),
S corresponds to the number of segments j = 0, ..., S - 1 per channel of the echo canceller (50),
ℓ is an integer different from 0,
k denotes a frequency bin index,
ε is a positive value,
Zm(k) corresponds to the k-th spectral bin of a microphone signal at the m-th time block,
Zm-ℓ(k) corresponds to the k-th spectral bin of the microphone signal at the (m - ℓ)-th time block,
Xi,j,m(k) corresponds to the k-th spectral bin of a far-end signal at the i-th channel, j-th segment, and m-th time block,
Xi,j,m-ℓ(k) corresponds to the k-th spectral bin of the far-end signal at the i-th channel, j-th segment, and (m - ℓ)-th time block,
λ is an overshoot factor, and
a, b are channel and segment indices, respectively.
12. The echo canceller (50) according to claim 11, wherein the echo canceller (50) is further configured to compute the optimal step-size µi,j,m(k) either as

or as

wherein
Ψm(k) is a power spectrum of the non-cancellable components at the m-th time block and Φi,j(k) is a power spectrum of the misalignment of each i,j-th filter, wherein the echo canceller (50) is configured to determine Ψm(k) and Φi,j(k) by solving the set of linear equations

subject to inequality constraints

wherein
Xi,j,m(k) corresponds to the k-th power spectral bin of the far-end signal at the i-th channel, j-th segment, and m-th time block, said signal block preferably being windowed,
Xi,j,m-ℓ(k) corresponds to the k-th power spectral bin of the far-end signal at the i-th channel, j-th segment, and (m - ℓ)-th time block, said signal block preferably being windowed,
Y(k) corresponds to the k-th power spectral bin of the non-cancellable components at the (m - ℓ)-th block,
T(k) is the k-th frequency bin of the normalized look-backward update, computed as

and
T(k) is the k-th frequency bin of the normalized look-forward update, computed as
13. The echo canceller (50) according to claim 11 or 12, wherein the overshoot factor λ is in the range 1 < λ ≤ 2 and is preferably equal to 2.
14. The echo canceller (50) according to any one of claims 11 to 13, wherein the echo canceller (50) is configured to determine an improved canceller output Om(k) as

wherein
└ ┘ε denotes low clipping by ε > 0,
Em(k) is the canceller output,
Rm(k) is a power spectrum of the residual echo due to filter misalignment at the m-th block,
Qm(k) is a power spectrum of the echo beyond the time scope of the canceller at the m-th block, and
Pm(k) is a power spectrum of the non-linear echo components at the m-th block.
15. The echo canceller (50) according to any one of claims 11 to 14, wherein the echo canceller (50) is configured to update the at least one filter coefficient Wi,j(k) based on
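For orientation only, the low clipping └ ┘ε recited in claims 9 and 14 can be illustrated with a short Python sketch. The gain rule used below, a power-subtraction gain (|Em(k)|² - Rm(k) - Qm(k) - Pm(k)) / |Em(k)|² floored at ε, is an assumption chosen for illustration and is not taken from the claims. The function name `improved_output` and the default value of ε are likewise hypothetical.

```python
from typing import List


def improved_output(e: List[complex], r: List[float], q: List[float],
                    p: List[float], eps: float = 1e-3) -> List[complex]:
    """Apply a floored power-subtraction gain to the canceller output.

    e: canceller output E_m(k), one complex value per bin
    r, q, p: power spectra R_m(k), Q_m(k), P_m(k) of the residual echo terms
    eps: positive floor applied to the gain (the "low clipping" value)

    The gain formula is an illustrative assumption, not the claimed formula.
    """
    if eps <= 0:
        raise ValueError("eps must be positive")
    out = []
    for e_k, r_k, q_k, p_k in zip(e, r, q, p):
        power = abs(e_k) ** 2
        if power == 0.0:
            # Nothing to scale in an empty bin.
            out.append(0j)
            continue
        gain = (power - r_k - q_k - p_k) / power
        # Low clipping: the gain never drops below eps.
        out.append(max(gain, eps) * e_k)
    return out
```

The floor keeps the gain positive when the estimated echo power exceeds the power of the canceller output, so a bin is attenuated rather than inverted or zeroed.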