FIELD OF THE INVENTION
[0001] The present invention relates generally to the field of acoustic echo cancellation
in telecommunications, and particularly to a pseudo spectrum-based acoustic echo canceller
which adaptively cancels echoes arising in hands-free audio and video teleconferencing
and related systems without requiring a state machine or training.
BACKGROUND OF THE INVENTION
[0002] Acoustic echo cancellers and their applications in the field of telecommunication
are well known to those skilled in the art. Many such cancellers and related technologies
have been described in various publications including the following patent documents:
U.S. Patent No. 5,548,642
U.S. Patent No. 5,530,724
U.S. Patent No. 5,506,901
U.S. Patent No. 5,428,562
U.S. Patent No. 5,406,583
U.S. Patent No. 5,394,392
U.S. Patent No. 5,384,806
U.S. Patent No. 5,329,586
U.S. Patent No. 5,206,854
U.S. Patent No. 5,163,044
U.S. Patent No. 5,146,494
U.S. Patent No. 5,016,271
U.S. Patent No. 5,001,701
U.S. Patent No. 4,918,685
U.S. Patent No. 4,817,081
U.S. Patent No. 4,464,545
[0003] A typical acoustic echo canceller currently available uses an adaptive filter
which employs a well-known algorithm such as the Least-Mean-Square (LMS) algorithm.
This algorithm continuously adapts to changes in the placement of both the speaker
and microphone and to changes in loudspeaker volume. For these cancellers, a state
machine is needed to automatically determine each of the four states, i.e., receiving,
transmitting, double-talk, and idle. In addition, in order to cancel the echoes, these
cancellers must be trained, that is, they must "learn" the loudspeaker-to-microphone
acoustic response function for the room they are servicing. Also, the acoustic
compensation length is determined by the length of the filter, which in turn is
determined by the host resource availability.
[0004] Kosaka et al. disclose, in "A Novel Frequency Domain Filtered-X LMS Algorithm For
Active Noise Reduction", IEEE, April 1997, pages 403-406, a novel frequency domain
Filtered-X LMS algorithm. The frequency domain algorithm is able to converge channel
systems by compensating for the coupling between control channels.
[0005] Duttweiler, in U.S. Patent No. 4,562,312, discloses the estimation of delays in
incoming and outgoing signals from a communication circuit. The correlation between
the signals is obtained and the delay between the signals is estimated so that an echo
canceller can be employed to cancel echoes developed in the delay.
OBJECT OF THE INVENTION
[0006] It is an object of the present invention to provide an acoustic echo canceller which
adaptively cancels echo arising in hands-free audio and video teleconferencing systems
and other related systems where echo cancellation is required.
[0007] It is another object of the present invention to provide an acoustic echo canceller
which provides high-quality and low cost full duplex speech communication typical
of dedicated video conferencing systems.
[0008] It is yet another object of the present invention to provide an acoustic echo canceller
which does not require a state machine.
[0009] It is still yet another object of the present invention to provide an acoustic echo
canceller which does not require training.
[0010] It is still yet another object of the present invention to provide an acoustic echo
canceller which continuously adapts to changes in microphone and loudspeaker placement,
loudspeaker volume setting, and the movement of people.
[0011] It is still yet another object of the present invention to provide an acoustic echo
canceller which is independent of any standard.
[0012] It is still yet another object of the present invention to provide an acoustic echo
canceller which can be connected directly to a PC soundcard and an ordinary telephone
set.
SUMMARY OF THE INVENTION
[0013] A microphone array is used together with a block adaptive algorithm to effectively
suppress acoustic echo arising in hands-free voice communication. At the same time,
the system is also capable of suppressing environmental noise.
[0014] The present echo canceller utilizes the principle that the spectrum pattern of human
speech does not change much in the short run. The present echo canceller takes 256
overlap 128 samples in 16 ms intervals, or sample blocks. The power spectra taken
at time 0 and at any other time within the 16 ms interval are generally the same. This
is true even though the waveform of the speech may change over time even in the short
run. The echoes are simply a delayed form of a speech signal. Therefore, in following
the principle described above, the spectrum of the speech signal and the spectrum
of the echo are substantially the same.
[0015] The inputs to the present echo canceller are x(t) and y(t), y(t) representing the
incoming speech signal from a far-end speaker and x(t) representing the combination
of speech signal from a near-end speaker and the echo. The well-known normalized
cross-correlation estimation between x(t) and y(t) is performed to determine the level
of correlation between x(t) and y(t), which is quantitatively represented by the
correlation coefficient C, a value of 1 for C being perfect correlation.
[0016] When the far-end speaker is speaking and the near-end speaker is not speaking,
x(t) comprises only the echo portion, which is essentially a delayed form of y(t).
In that case, there is almost a perfect correlation between x(t) and y(t) and the
C value is near 1. When the near-end speaker is speaking and the far-end speaker is
not speaking, x(t) comprises only the near-end speech signal and the C value is near 0.
When both the near-end and the far-end speakers are speaking simultaneously, the
C value may be between 0 and 1, but is typically near 0 since the two speech signals
will not be highly correlated. Silence also results in a C value near 0, since the
respective noises will not be highly correlated. Certain decisions are based on whether
the C value exceeds certain thresholds.
[0017] Since the echo is generally a delayed y(t), the amount of delay is estimated by
measuring the time shift required to produce the maximum C value. Once the delay is
determined, the two channels of inputs are aligned by time-shifting y(t) to match
x(t). The amplitude of x(t) and y(t) is then normalized by first determining a certain
gain factor, and then multiplying y(t) by the gain factor.
[0018] The processed forms of the inputs x(t) and y(t) are next processed by applying the
well-known Hanning window. They are then transformed into their respective frequency
domain using the well-known fast Fourier transform (FFT) and then to Bark Scales,
Px(b) and Py(b), using the Bark Frequency Warping technique. The transfer function
H(b) is then estimated using the Bark Scales. The transfer function is used to normalize
Py(b), which, in turn together with Px(b), is used to estimate the gain G(b) which
will be used to suppress the echo. Subsequently, the Bark Scales are unwarped and the
gain function is then used to suppress the echo from the input x(t). The well-known
inverse FFT (IFFT) and overlap add are performed to yield an echo-free signal.
BRIEF DESCRIPTION OF THE DRAWINGS
[0019]
FIG. 1 is a functional diagram illustrating the present echo canceller deployed in
a teleconference room setting.
FIG. 2 is a functional block diagram illustrating the circuitry of the present echo
canceller.
FIGS. 3a through 3c are a continuous flow diagram illustrating the echo cancelling
process employed by the present echo canceller.
FIG. 4 is a lookup Table 1 listing values for Gs.
FIG. 5 is a lookup Table 2 listing values for L̅b(b).
FIG. 6 is a lookup Table 3 listing values for Wi.
DETAILED DESCRIPTION OF THE INVENTION
[0020] FIG. 1 illustrates schematically the present echo canceller 1 placed in a telephone
conference system operating in a room 10. The echo canceller 1 is serially connected
to a telecommunications network through incoming line 15 and outgoing line 16. Room
reverberative surfaces 18 define multiple echo paths which depend on room geometry.
Two such echo paths, 20 and 21, are illustrated. Speech originating from a far-end
speaker (speaker not shown) emanating from the room loudspeaker 23 travels along the
echo paths 20 and 21, among other paths, and enters microphone 25 with various time
delays. Speech 31 from the near-end speaker 30 also enters the microphone 25. Both
the speech signal and the echo, denoted as x(t), travel along the line 35 and into
the echo canceller 1. The speech signal from the far-end speaker, denoted as y(t),
which is essentially the same as the echo without the delay, is also an input to
the echo canceller 1 via line 15.
[0021] To optimize the performance of the present echo canceller, a microphone array consisting
of 3 microphones is used instead of a single microphone, and a well-known beam-forming
technique is employed. This arrangement enhances the strength of the near-end speech
signal while reducing the strength of the echo signal from the loudspeaker. This occurs
because the array forms an acoustic beam at the signal direction and a null in the
speaker direction. It has been found that this microphone array significantly enhances
the performance of embodiments of the present invention.
[0022] The present echo canceller utilizes the principle that the spectrum pattern of human
speech does not change much in the short run. The present echo canceller takes 256
overlap 128 samples in 16 ms intervals, or sample blocks. The power spectra of a
speech signal taken at time 0, for instance, and at any other time within the 16 ms
interval are essentially the same. This is true even though the waveform of the speech
may change over time even in the short run. Referring to FIG. 1, the echoes taking the
paths 21 and 20, for instance, are simply delayed forms of y(t). Therefore, in following
the principle described above, the spectrum of the far-end speaker's speech signal
y(t) and the spectrum of the echo taking the paths 21 and 20 are substantially the same.
The following description will make it clearer to those skilled in the art how this
principle is utilized in the present echo canceller to cancel the echo in a manner
which is more effective than the currently-available systems.
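The principle stated in this paragraph can be illustrated with a short numerical sketch.
The following is only an illustration under stated assumptions, not the circuitry of the
canceller: the test signal is a synthetic sum of three steady tones (real speech is only
approximately stationary), the 40-sample delay is arbitrary, and the function names are
hypothetical. It compares the windowed magnitude spectrum of one 256-sample block with
that of a delayed copy, and also compares the two waveforms directly.

```python
import numpy as np

FS = 8000     # sampling rate in Hz, as stated in the description
BLOCK = 256   # samples per block, as stated in the description

def block_magnitude_spectrum(block):
    """Magnitude spectrum of one Hanning-windowed block."""
    window = np.hanning(len(block))
    return np.abs(np.fft.rfft(block * window))

def relative_difference(a, b):
    """Norm of the difference relative to the norm of the first argument."""
    return np.linalg.norm(a - b) / np.linalg.norm(a)

# Assumed test signal: three steady tones standing in for a short speech segment.
n = np.arange(BLOCK + 64)
signal = sum(np.sin(2 * np.pi * f * n / FS) for f in (440.0, 1000.0, 2300.0))

delay = 40                                  # assumed echo delay in samples
original = signal[delay:delay + BLOCK]      # block of the far-end signal
delayed = signal[:BLOCK]                    # the same signal seen `delay` samples later

spectrum_gap = relative_difference(block_magnitude_spectrum(original),
                                   block_magnitude_spectrum(delayed))
waveform_gap = relative_difference(original, delayed)
print(f"spectrum difference: {spectrum_gap:.4f}, waveform difference: {waveform_gap:.4f}")
```

Under these assumptions the waveforms differ substantially while the magnitude spectra
nearly coincide, which is the property the canceller relies on.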
[0023] FIG. 2 illustrates a functional block diagram representing the circuitry for the
echo canceller 1 referred to in FIG. 1. Typically, the circuitry would be implemented
in a DSP chip or a microprocessor, though it can be implemented in other ways which
are known to one skilled in the art. A brief description of the blocks will be given
for FIG. 2. A more detailed flow diagram and description for the echo cancellation
process employed by the circuit of FIG. 2 shall follow thereafter.
[0024] Referring to FIG. 2 in conjunction with FIG. 1, the inputs to the circuit are
x(t) and y(t), y(t) representing the incoming speech signal from the far-end speaker
and x(t) representing the combination of speech signal from the near-end speaker and
the echo. The well-known normalized cross-correlation estimation between x(t) and y(t)
is performed in block 100 to determine the level of correlation between x(t) and y(t),
which is quantitatively represented by the correlation coefficient C, a value of 1 for
C being perfect correlation.
[0025] When the far-end speaker is speaking and the near-end speaker 30 (see FIG. 1) is
not speaking, x(t) comprises only the echo portion, which is essentially a delayed form
of y(t). In that case, there is almost a perfect correlation between x(t) and y(t) and
the C value is near 1. When the near-end speaker 30 is speaking and the far-end speaker
is not speaking, x(t) comprises only the near-end speech signal and the C value is
near 0. When both the near-end speaker 30 and the far-end speaker are speaking
simultaneously, the C value may be between 0 and 1, but is typically near 0 since the
two speech signals will not be highly correlated. Silence also results in a C value
near 0, since the respective noises will not be highly correlated. Certain decisions
are based on whether the C value exceeds certain thresholds.
[0026] Since the echo is generally a delayed y(t), the amount of delay is estimated in
block 120 by measuring the time shift required to produce the maximum C value. Once
the delay is determined, the two channels of inputs are aligned in block 130 by
time-shifting y(t) to match x(t). The amplitude of x(t) and y(t) is then normalized in
block 140 by first determining a certain gain factor, and then multiplying y(t) by the
gain factor in block 145.
[0027] The processed form of the input x(t) is next processed in blocks 150 through 165;
the processed form of the input y(t) is next processed in blocks 151 through 166.
Because both channels are processed in an identical manner which is well known and
understood, only a brief description will be provided. In blocks 150 and 151, the
well-known Hanning window is applied to the processed inputs. They are then transformed
into their respective frequency domain using the well-known fast Fourier transform
(FFT), blocks 155 and 156, and then to Bark Scales, Px(b) and Py(b), using the Bark
Frequency Warping technique in blocks 165 and 166.
[0028] In block 170, the transfer function H(b) is estimated. The transfer function is
then used in block 175 to normalize Py(b), which is then used to estimate the gain
G(b) which will be used to suppress the echo. In block 180, the Bark Scales are unwarped.
The gain function is then used to suppress the echo from the input spectrum in block
185. The well-known inverse FFT (IFFT) is performed in block 190 and the overlap add
in block 195 to yield an echo-free signal.
[0029] Using the flow diagrams of FIGs. 3a through 3c and the circuit diagram of FIG. 2,
the echo cancellation process employed by embodiments of the present invention will
now be described in greater detail.
[0030] Referring now to FIG. 3a, M samples (in this case 256 overlap 128, though other values
are possible) are taken from the inputs x(t) and y(t) in step 205 at 16 ms block
intervals (8 kHz sampling rate). Sometimes a dc component exists with the inputs and
so it is removed, step 210, using a common procedure well known to those skilled in
the art. The next step, 220, is to compute the normalized cross-correlation as
represented by a value C where,

where T denotes the transpose of a vector. A number of C values will result from this
calculation so in step 220, the maximum value representing C, or Cmax, is chosen.
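The search for Cmax can be sketched as follows. This is a minimal sketch under stated
assumptions, not the patented procedure: the expression for C is not reproduced in this
text, so the sketch uses the standard normalized form x^T y / sqrt((x^T x)(y^T y)); the
lag range, the buffer layout, and all function names are hypothetical.

```python
import numpy as np

def remove_dc(block):
    """Subtract the mean so that a constant offset does not bias the correlation."""
    return block - np.mean(block)

def normalized_cross_correlation(x, y):
    """Assumed standard form: x^T y / sqrt((x^T x)(y^T y))."""
    denom = np.sqrt(np.dot(x, x) * np.dot(y, y))
    return 0.0 if denom == 0.0 else float(np.dot(x, y) / denom)

def find_cmax(x_block, y_history, max_lag):
    """Return (Cmax, lag) over lags 0..max_lag.

    y_history must hold at least len(x_block) + max_lag samples of y(t),
    newest sample last (a hypothetical buffer layout).
    """
    m = len(x_block)
    x = remove_dc(x_block)
    best_c, best_lag = -1.0, 0
    for lag in range(max_lag + 1):
        start = len(y_history) - m - lag
        y = remove_dc(y_history[start:start + m])
        c = normalized_cross_correlation(x, y)
        if c > best_c:
            best_c, best_lag = c, lag
    return best_c, best_lag

# Usage: an "echo" that is y(t) attenuated and delayed by 25 samples.
rng = np.random.default_rng(0)
y_hist = rng.standard_normal(256 + 64)
x_echo = 0.5 * y_hist[len(y_hist) - 256 - 25:len(y_hist) - 25]
c_max, lag = find_cmax(x_echo, y_hist, max_lag=64)
print(c_max, lag)
```

The lag at which the maximum occurs is the quantity the next paragraph uses as the delay
estimate; an exhaustive loop over lags is chosen here only for clarity.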
[0031] Once Cmax is found, the amount of delay between the two inputs, or Dn, is estimated
in step 230. A comparison is made in step 232 to determine if Cmax > ρnew, where ρnew
initially has a value of 0. If the condition is met, i.e., Cmax > ρnew, then ρnew is
updated following the formula ρnew = γ Cmax, where a value for γ is empirically chosen
to be 0.8. The delay Dn is then updated based on the most current value of Cmax. On
the other hand, if the condition Cmax > ρnew is not met, then ρnew is updated using
the formula ρnew = γ ρold, where ρold simply represents the previous ρnew, and the
delay Dn from the previous sample block is used. Whether or not the delay Dn is
updated, the two inputs, x(t) and y(t), are aligned by delaying y(t) by the amount
Dn in step 245.
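The threshold-tracking rule of steps 232 through 240 can be written compactly. The
following is a sketch only; the function and argument names are hypothetical, and it
reads the factor in the second update formula as the same γ = 0.8 used in the first.

```python
GAMMA = 0.8  # empirically chosen value given in the description

def update_delay(c_max, candidate_delay, rho, delay):
    """One block of the delay-tracking rule.

    If c_max exceeds the running threshold rho, accept the new delay and set
    rho = GAMMA * c_max. Otherwise keep the previous delay and let the
    threshold decay as rho = GAMMA * rho.
    """
    if c_max > rho:
        return GAMMA * c_max, candidate_delay
    return GAMMA * rho, delay

# Usage over three consecutive blocks, starting from rho = 0.
rho, delay = 0.0, 0
rho, delay = update_delay(0.9, 25, rho, delay)   # accepted: rho = 0.72, delay = 25
rho, delay = update_delay(0.5, 3, rho, delay)    # rejected: rho decays, delay stays 25
rho, delay = update_delay(0.6, 30, rho, delay)   # accepted again: delay = 30
```

Because the threshold decays geometrically when no strong correlation is seen, a weak
but persistent correlation peak can eventually update the delay.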
[0032] It is important to note here that while the updating of the delay Dn is a process
included in the preferred embodiment of the present invention, it is not crucial. For
instance, the present canceller can still function, though not as optimally, even if
steps 232, 234, 236, and 240 were eliminated, and step 245 were to be performed
immediately after 230 using the same Dn each time.
[0033] After the alignment of the inputs in step 245, in step 250, an amplitude normalization
is performed on the inputs using a gain normalization factor, Z, which is initially
set at 1, but which is continually updated in step 269 when the stated condition is
met. In step 255, the well-known Hanning Window is applied and the FFT is computed
as follows:

X(f) = FFT(x) = xr(f) + j xi(f)
Y(f) = FFT(y) = yr(f) + j yi(f)

Coherence estimation is performed in step 257, where the coherence factor, Φ, is computed
as follows:

It can be seen from this formula that if X(f) and Y(f) are coherent, Φ will be near 1,
which indicates that only the echo is present. However, if Φ is near 0, that indicates
either double-talk, only near-end speech, or only silence. The coherence factor, Φ,
is used together with a non-linear energy function described in step 267 (see below)
to further control the echo suppression.
[0034] Thereafter, in step 260, Px and Py are computed as follows:

Px = |xr(f)| + |xi(f)| + ε|xr(f)||xi(f)|
Py = |yr(f)| + |yi(f)| + ε|yr(f)||yi(f)|

where ε is a scaling factor which controls the amount of echo to be suppressed and
is a trade-off between speech quality and echo suppression. In step 265, Px and Py
are converted to Bark Scales Px(b) and Py(b) using the well-known Bark Frequency
Warping technique.
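Steps 260 and 265 can be sketched as below. This is an illustration under stated
assumptions rather than the disclosed implementation: the pseudo power spectrum follows
the expression recited in claim 11, but the value ε = 0.5, the band edges (the commonly
tabulated critical-band edges, which give 18 bands up to the 4 kHz Nyquist frequency),
and the choice to sum rather than average the bins in each band are all assumptions, and
the function names are hypothetical.

```python
import numpy as np

FS = 8000    # sampling rate in Hz
NFFT = 256   # block length

# Assumed critical-band edges in Hz; 19 edges delimit 18 bands.
BARK_EDGES_HZ = np.array([0, 100, 200, 300, 400, 510, 630, 770, 920, 1080,
                          1270, 1480, 1720, 2000, 2320, 2700, 3150, 3700, 4400],
                         dtype=float)

def pseudo_power_spectrum(spectrum, eps):
    """|re| + |im| + eps * |re| * |im| for each frequency bin."""
    re = np.abs(spectrum.real)
    im = np.abs(spectrum.imag)
    return re + im + eps * re * im

def bark_warp(p, fs=FS, nfft=NFFT):
    """Sum the per-bin values of p into the assumed Bark bands."""
    freqs = np.arange(len(p)) * fs / nfft
    band = np.searchsorted(BARK_EDGES_HZ, freqs, side="right") - 1
    band = np.clip(band, 0, len(BARK_EDGES_HZ) - 2)
    return np.bincount(band, weights=p, minlength=len(BARK_EDGES_HZ) - 1)

# Usage on one Hanning-windowed block of noise standing in for x(t).
rng = np.random.default_rng(0)
block = rng.standard_normal(NFFT)
X = np.fft.rfft(block * np.hanning(NFFT))
Px = pseudo_power_spectrum(X, eps=0.5)   # eps = 0.5 is an arbitrary illustrative value
Px_bark = bark_warp(Px)                  # 18 values, one per band
```

The pseudo spectrum avoids the multiplications and square roots of a true power
spectrum, and warping reduces 129 bins to 18 bands for the later gain computation.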
[0035] A non-linear energy computation is performed in step 267 where the energy,
En, is computed as follows:

where L represents the number of Bark frequency bands. In the preferred embodiment,
L = 18 is used.
[0036] In step 269, the gain normalization factor, Z, is updated if the following conditions
are met: Φ > τ and En > Tn. The gain normalization factor, Z, is computed as follows:

where σ < 1. It is important to note that while this is the preferred method, other
gain normalization methods may be used.
[0037] In step 271, it is determined if the condition En < Tn is met. If yes, Tn is
updated in step 273. Tn is computed as follows:

where

is the Tn from the previous run and where V < 1. It is important to note here that the
noise threshold, Tn, is initially estimated during the silence period. It is computed
as follows:

where θ is chosen between the range 1.125 and 1.25.
[0038] In step 275, it is determined whether Φ > τ and En > Tn. In the preferred
embodiments, τ = 0.65, though a different value may be optimal for τ under a different
configuration, e.g., a different microphone set-up. If the condition in step 275 is
met, the transfer function H(b) is updated from its initial value of 1 in step 280.
If the condition in step 275 is not met, then step 285 is performed without updating
H(b). H(b) is calculated as follows:

where α < 1.
In step 285, Py(b) is normalised by H(b) as follows:

A buffer is provided to store M old values of P̃y(b). In step 295, the total echo power
is computed as follows:

where Wi is a weighting fraction whose value depends on the echo path characteristics.
Typical values are listed in Table 3. In step 300, the value of Gs is found by
referring to lookup Table 1 using the current value for Φ.
[0039] In step 310, Rrpr(b) is computed as follows:

where

where γ is a smooth factor with γ < 1 (γ ≈ 0.02) and P̅rpo initially has a value of 0,
and

where if Prpo(b) < 0 then Prpo(b) = 0,
and

In step 315, L(b) is computed as follows:

In step 320, look-up Table 2 is used to find a value for L̅b(b). In step 325, the gain
G(b) is computed as follows:

In step 330, P̅rpo is computed as follows:

After step 330, steps 310, 315, 320, 325 and 330 are repeated for each sample block
of input, each loop producing an updated value for the parameters involved.
[0040] In step 335, G(b) is unwarped to produce G(f). The output spectrum is then computed
in step 340 as follows:

X̃(f) = G(f) X(f)

In step 345 the well-known inverse FFT (IFFT) and overlap add are performed on X̃(f)
to produce an echo-free signal X̃(t).
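The final stage (multiplying the spectrum by the gain, inverse FFT, and overlap add) can
be sketched as follows. This is an illustration under stated assumptions, not the
disclosed implementation: the gain is supplied by a hypothetical callback `gain_fn`
standing in for the unwarped G(f), the window is applied only before the FFT, and a
periodic form of the Hanning window is used so that windows offset by 128 samples sum
exactly to one.

```python
import numpy as np

BLOCK = 256   # samples per block
HOP = 128     # block advance (256 overlap 128)

def suppress_and_resynthesize(x, gain_fn):
    """Window each block, apply a per-bin gain, and overlap-add the result.

    gain_fn(spectrum) must return one real gain per frequency bin; it is a
    hypothetical stand-in for G(f).
    """
    # Periodic Hanning window: w[n] + w[n + HOP] == 1 for every n.
    window = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(BLOCK) / BLOCK)
    out = np.zeros(len(x))
    for start in range(0, len(x) - BLOCK + 1, HOP):
        spectrum = np.fft.rfft(x[start:start + BLOCK] * window)
        spectrum = spectrum * gain_fn(spectrum)   # corresponds to G(f) X(f)
        out[start:start + BLOCK] += np.fft.irfft(spectrum, n=BLOCK)
    return out

# Usage: with a unity gain the interior of the signal is reproduced unchanged.
rng = np.random.default_rng(0)
x = rng.standard_normal(1024)
y_unity = suppress_and_resynthesize(x, lambda s: np.ones(len(s)))
y_half = suppress_and_resynthesize(x, lambda s: np.full(len(s), 0.5))
```

The unity-gain case is a useful sanity check: any distortion heard with G(f) = 1 would
come from the framing itself rather than from the suppression rule.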
[0041] It is very important for one of ordinary skill in the art to understand that many
of the steps and/or components of the preferred embodiment of the echo canceller of
the present invention are included as a way of optimizing the performance of the
canceller, and, therefore, may be substituted or even eliminated in some instances
without negating the function and the purpose of the present invention. In addition,
although the preferred embodiment of the present invention was described in the context
of a teleconferencing system, it is clear that the present echo canceller may be used
in other telecommunications systems where echoes are present in a manner similar to
the scenarios described herein. While one skilled in the art could certainly appreciate
these principles, some examples will be given for illustration purposes.
[0042] For instance, in referring to FIG. 2 and FIG. 3, the cross-correlation estimation
technique employed here may be substituted with other techniques for determining the
correlation between two signals. Also, the amplitude normalisation, the use of the
Hanning Window and Bark Scales, while contributing to the effectiveness of the preferred
embodiment of the present invention, may be eliminated under some circumstances without
completely negating the function of the present invention. The Bark Scales, for
instance, are used in this case as a way of reducing computation time and, therefore,
their elimination may not unduly affect the performance of the present echo canceller.
In addition, although the Hanning Window was found to be optimal in this case, it may
be replaced with other windows. Similarly, while the choice to take 256 overlap 128
samples in 16 ms intervals was found to be optimal in this case, other sample sizes
and intervals may be chosen. The presently disclosed embodiments are, therefore, to be
considered in all respects as illustrative and not restrictive, the scope of the
invention being indicated by the appended claims.
1. An acoustic echo canceller (1) for a telecommunications system adapted for communication
between a far-end speaker and a near-end speaker (30), said system having a microphone
(25) and a loud-speaker (23), said microphone (25) receiving a speech signal from
said near-end speaker (30) and an echo signal of a speech signal from said far-end
speaker emanating from said loud-speaker (23), said echo canceller (1) comprising:
a means adapted for collecting samples of two inputs from said telecommunications
system, x(t) and y(t), said y(t) being said speech signal from said far-end speaker,
said x(t) being a combination of said speech signal from said near-end speaker and
said echo signal of said y(t);
a means (100) adapted for estimating a correlation between the x(t) and y(t);
a means (120) adapted for estimating a delay between the x(t) and y(t);
a means (130) adapted for aligning x(t) and y(t) by time-shifting y(t) by said delay
between x(t) and y(t);
said echo canceller characterized in that it comprises:
a means adapted for transforming the x(t) and time shifted y(t) signals into their
spectrums in a frequency domain;
a means (170) adapted for estimating a transfer function using the spectrum of x(t)
and the spectrum of time-shifted y(t) and adapted for normalising the spectrum of
time-shifted y(t) using the transfer function;
a means (175) adapted for estimating a gain function using the spectrum of x(t) and
the normalised spectrum of time-shifted y(t);
a means adapted for multiplying the spectrum of x(t) by said gain function; and a
means adapted for transforming the resulting spectrum into a time domain signal producing
an echo-free signal.
2. The echo canceller (1) as claimed in claim 1 further comprising a means adapted for
applying a Hanning Window (150, 151).
3. The echo canceller (1) as claimed in claim 1 or 2 further comprising a means (165,
166) adapted for Bark Scale Warping and means (180) adapted for Bark Scale Unwarping.
4. The echo canceller (1) as claimed in any preceding claim wherein 256 samples are collected
overlapping 128 samples in 16 ms block intervals.
5. The echo canceller (1) as claimed in any preceding claim further comprising a means
(195) adapted for overlap add.
6. A method of cancelling an acoustic echo in a telecommunications system adapted for
communication between a far-end speaker and a near-end speaker (30), said system having
a microphone (25) and a loud-speaker (23), said microphone (25) receiving a speech
signal from said near-end speaker (30) and an echo signal of a speech signal from
said far-end speaker emanating from said loud-speaker (23), said method comprising
the steps of:
a) collecting (205) samples of two inputs from said telecommunications system, x(t)
and y(t), said y(t) being said speech signal from said far-end speaker, said x(t)
being a combination of said speech signal from said near-end speaker (30) and said
echo signal of said y(t);
b) estimating (220) a correlation between the x(t) and y(t);
c) estimating (230) a delay between the x(t) and y(t);
d) aligning (245) x(t) and y(t) by time-shifting y(t) by said delay estimated in step
c);
said method characterized in that it comprises the following steps:
e) transforming (255) the x(t) and time-shifted y(t) signals into their spectrums,
in a frequency domain;
f) estimating a transfer function using the spectrum of x(t) and the spectrum of time-shifted
y(t);
g) normalising the spectrum of time-shifted y(t) using the transfer function;
h) estimating (325) a gain function using the spectrum of x(t) and the normalised
spectrum of time-shifted y(t);
i) multiplying (340) the spectrum of x(t) by said gain function; and
j) transforming (345) said resulting spectrum of step i) into a time domain signal
producing an echo-free signal.
7. The method as claimed in claim 6 further comprising the step of applying (255) a Hanning
Window after step d).
8. The method as claimed in claim 6 or 7 further comprising the step of Bark Scale Warping
(265) said spectrums after step e) and Bark Scale Unwarping (335) after step h).
9. The method as claimed in any of claims 6 to 8 wherein said 256 samples are collected
overlapping 128 samples in 16 ms block intervals.
10. The method as claimed in any of claims 6 to 9 further comprising the step of performing
overlap add (345).
11. The acoustic echo canceller (1) as claimed in any of claims 1 to 5, wherein
the means adapted for transforming the signals into their spectrum further comprises
a means (140) adapted for normalizing an amplitude of x(t) and time-shifted y(t);
a means (150) adapted for applying a Hanning Window and transforming said normalised
x(t) signal and normalised time-shifted y(t) signal into a frequency domain where
X(f) = xr(f) + jxi(f) and Y(f)=yr(f)+jyi(f) and where X(f)=FFT(x) and Y(f)=FFT(y);
a means (155, 156) adapted for computing a power spectrum Px and Py where Px = |xr(f)|+|xi(f)| + ε|xr(f)||xi(f)| and
Py = |yr(f)|+|yi(f)|+ ε|yr(f)||yi(f)| where ε is a scaling factor which controls
the amount of echo and noise to be suppressed;
a means (165,166) for Bark-scale warping Px and Py to yield Bark scales Px(b) and Py(b);
the means adapted for estimating the transfer function further comprises
a means (170) adapted for estimating a transfer function H(b) and normalizing Py(b) by said transfer function;
the means for estimating the gain function further comprises
a means (175) adapted for estimating a gain function G(b), said G(b) being calculated from Px(b) and Py(b);
the means adapted for multiplying further comprises
a means (180) adapted for Bark-scale unwarping said G(b) to yield G(f);
a means adapted for multiplying said signal X(f) by said gain function G(f) to yield X̃(f); and
the means adapted for transforming into a time-domain signal further comprises
a means (190) adapted for performing an inverse transform and overlap add (195) to
convert said signal X̃(f) to yield an echo-free signal X̃(t).
12. The acoustic echo canceller as claimed in claim 11 further comprising a means adapted
for updating said delay, said delay being updated based on a value of said correlation
between x(t) and y(t).
13. The acoustic echo canceller as recited in Claim 11 or 12 wherein said H(b) is
calculated as follows:
14. The acoustic echo canceller as recited in any of Claims 11 to 13 wherein

where L̅(b) is a factor which varies with respect to Lb and Lb = Rrpr(b) Rpo(b);
where

where

γ is a smooth factor with γ < 1;

and
P̅rpo initially has a value of 0;

where if Prpo(b) < 0 then Prpo(b) = 0
and

where P̅y(b) is normalised Bark-scale Py(b);
where
Gs is a factor which varies with respect to coherence factor Φ and
1. An acoustic echo canceller (1) for a telecommunications system designed for communication
between a speaker at a far end and a speaker (30) at a near end, the system having
a microphone (25) and a loudspeaker (23), the microphone (25) receiving a speech signal
from the speaker (30) at the near end and an echo signal of a speech signal from the
speaker at the far end emanating from the loudspeaker (23), the echo canceller (1)
comprising:
a means designed for collecting samples of two inputs x(t) and y(t) from the
telecommunications system, y(t) being the speech signal from the speaker at the far
end and x(t) being a combination of the speech signal of the speaker at the near end
and the echo signal of y(t);
a means (100) designed to calculate a correlation between x(t) and y(t);
a means (120) designed to calculate a delay between x(t) and y(t);
a means (130) designed to align x(t) and y(t) by time-shifting by the delay between
x(t) and y(t);
the echo canceller being characterized in that it comprises:
a means designed to transform the x(t) signal and the time-shifted y(t) signal into
their spectra in a frequency domain;
a means (170) designed to calculate a transfer function using the spectrum of x(t)
and the spectrum of the time-shifted y(t), and designed to normalize the spectrum of
the time-shifted y(t) using the transfer function;
a means (175) designed to calculate a gain function using the spectrum of x(t) and
the normalized spectrum of the time-shifted y(t);
a means designed to multiply the spectrum of x(t) by the gain function;
and a means designed to transform the resulting spectrum into a time-domain signal,
producing an echo-free signal.
2. The echo canceller (1) according to claim 1, further comprising a means for applying
a Hanning window (150, 151).
3. The echo canceller (1) according to claim 1 or 2, further comprising a means (165,
166) designed for Bark scale warping and a means (180) designed for Bark scale unwarping.
4. The echo canceller (1) according to any of the preceding claims, wherein 256 samples
are collected overlapping 128 samples in 16 ms block intervals.
5. The echo canceller (1) according to any of the preceding claims, further comprising
a means (195) designed for overlap add.
6. A method for cancelling an acoustic echo in a telecommunication system adapted for
communication between a far-end talker and a near-end talker (30), the system having
a microphone (25) and a loudspeaker (23), the microphone (25) receiving a speech signal
of the near-end talker (30) and an echo signal of the far-end talker emanating from
the loudspeaker (23), the method comprising the steps of:
(a) collecting (205) samples of two inputs, x(t) and y(t), from the telecommunication system,
y(t) being the speech signal of the far-end talker and x(t) being a combination of
the speech signal of the near-end talker (30) and the echo signal of y(t);
(b) calculating (220) a correlation between x(t) and y(t);
(c) calculating (230) a delay between x(t) and y(t);
(d) aligning (245) x(t) and y(t) by time-shifting y(t) by the delay calculated in
step (c);
the method being
characterized in that it comprises the following steps:
(e) transforming (255) the x(t) signal and the time-shifted y(t) signal into their
spectra in a frequency domain;
(f) calculating a transfer function using the spectrum of x(t) and the spectrum of
the time-shifted y(t);
(g) normalizing the spectrum of the time-shifted y(t) using the transfer function;
(h) calculating (325) a gain function using the spectrum of x(t) and the normalized
spectrum of the time-shifted y(t);
(i) multiplying (340) the spectrum of x(t) by the gain function; and
(j) transforming (345) the resulting spectrum of step (i) into a time-domain signal,
producing an echo-free signal.
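Steps (b) to (d) can be illustrated with a minimal sketch. It assumes a plain time-domain cross-correlation searched over non-negative lags; the claim does not say how the correlation or the delay is computed, and the function names and the `max_lag` parameter are hypothetical.

```python
def cross_correlation(x, y, lag):
    """Sum of x[n] * y[n - lag] over the samples where both are defined."""
    return sum(x[n] * y[n - lag] for n in range(lag, len(x)) if n - lag < len(y))


def estimate_delay(x, y, max_lag):
    """Return (lag, peak): the lag in 0..max_lag with the largest correlation."""
    scores = [cross_correlation(x, y, lag) for lag in range(max_lag + 1)]
    lag = max(range(len(scores)), key=scores.__getitem__)
    return lag, scores[lag]


def align(y, lag):
    """Time-shift y by `lag` samples, zero-padding the start and keeping the length."""
    return ([0.0] * lag + list(y))[:len(y)]


# Far-end signal y and a microphone signal x holding a half-amplitude echo
# of y that arrives three samples later.
y = [1.0, -2.0, 3.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
x = [0.0, 0.0, 0.0] + [0.5 * v for v in y[:7]]
delay, peak = estimate_delay(x, y, max_lag=5)
y_aligned = align(y, delay)
```

After alignment the echo in x and the shifted y line up sample for sample, which is what the frequency-domain steps (e) to (j) rely on.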
7. A method according to claim 6, further comprising the step of applying (255)
a Hann window after step (d).
8. A method according to claim 6 or 7, further comprising the step of Bark-scale warping
(265) of the spectra after step (e) and of Bark-scale unwarping (335) after step
(h).
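Bark-scale warping and unwarping can be sketched as below. The claims do not state which frequency-to-Bark mapping is used; this sketch assumes the Zwicker-Terhardt approximation, integer band edges, per-band averaging, an FFT length of 256 and an 8 kHz sampling rate. All function names are hypothetical.

```python
import math


def hz_to_bark(f_hz):
    """Zwicker-Terhardt approximation of the Bark scale (an assumed mapping)."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


def bark_band_of_bins(n_bins, sample_rate_hz, fft_len):
    """Map each FFT bin index to an integer Bark band index."""
    return [int(hz_to_bark(k * sample_rate_hz / fft_len)) for k in range(n_bins)]


def warp(power, bands):
    """Bark warping: average the per-bin power within each band, P(f) -> P(b)."""
    n_bands = max(bands) + 1
    sums, counts = [0.0] * n_bands, [0] * n_bands
    for p, b in zip(power, bands):
        sums[b] += p
        counts[b] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]


def unwarp(per_band, bands):
    """Bark unwarping: copy each band value back to its bins, G(b) -> G(f)."""
    return [per_band[b] for b in bands]


bands = bark_band_of_bins(129, 8000, 256)  # bins 0..128 of a 256-point FFT
flat_b = warp([1.0] * 129, bands)          # a flat spectrum stays flat per band
flat_f = unwarp(flat_b, bands)
```

Working per band rather than per bin reduces the number of values for which a transfer function and a gain have to be tracked.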
9. A method according to any one of claims 6 to 8, wherein 256 samples are collected,
overlapping by 128 samples, at 16 ms block intervals.
10. A method according to any one of claims 6 to 9, further comprising the step of performing
an overlap-add.
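A short sketch shows why a Hann window pairs naturally with a 128-sample overlap-add. It assumes the periodic form of the Hann window and applies no spectral processing between windowing and reconstruction; names are hypothetical.

```python
import math

N, HOP = 256, 128


def hann(n_len):
    """Periodic Hann window: w[n] = 0.5 * (1 - cos(2*pi*n / n_len))."""
    return [0.5 * (1.0 - math.cos(2.0 * math.pi * n / n_len)) for n in range(n_len)]


def overlap_add(frames, hop):
    """Sum frames into one output buffer, each offset by `hop` samples."""
    out = [0.0] * (hop * (len(frames) - 1) + len(frames[0]))
    for i, frame in enumerate(frames):
        for n, v in enumerate(frame):
            out[i * hop + n] += v
    return out


window = hann(N)
signal = [math.sin(0.05 * n) for n in range(N + 3 * HOP)]
windowed = [[w * s for w, s in zip(window, signal[i:i + N])]
            for i in range(0, len(signal) - N + 1, HOP)]
rebuilt = overlap_add(windowed, HOP)
```

Because w[n] + w[n + 128] = 1 for this window, every sample covered by two frames is reconstructed exactly; only the first and last 128 samples, covered by a single frame, remain tapered.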
11. An acoustic echo canceller (1) according to any one of claims 1 to 5, wherein the means
adapted to transform the signals into their spectra further comprises:
a means (140) adapted to normalize an amplitude of x(t) and of the time-shifted
y(t);
a means (150) adapted to apply a Hann window and to transform the normalized
x(t) signal and the normalized time-shifted y(t) signal into a frequency domain,
where X(f) = xr(f) + jxi(f) and Y(f) = yr(f) + jyi(f), and where X(f) = FFT(x) and Y(f) = FFT(y);
a means (155, 156) adapted to calculate power spectra Px and Py, where Px = |xr(f)| + |xi(f)| + ε|xr(f)||xi(f)| and Py = |yr(f)| + |yi(f)| + ε|yr(f)||yi(f)|, ε being a scaling factor which controls the amount of echo
and noise to be suppressed;
a means (165, 166) for Bark-scale warping of Px and Py to obtain Px(b) and Py(b);
wherein the means adapted to calculate the transfer function further
comprises:
a means (170) adapted to calculate a transfer function H(b) and to normalize
Py(b) by the transfer function;
wherein the means for calculating the gain function further comprises:
a means (175) adapted to calculate a gain function G(b),
G(b) being calculated from Px(b) and Py(b);
wherein the means for multiplying further comprises:
a means (180) adapted for Bark-scale unwarping of G(b) to obtain
G(f);
a means adapted to multiply the signal X(f) by the gain function
G(f) to obtain X̃(f); and
wherein the means adapted to transform into a time-domain signal
further comprises:
a means (190) adapted to perform an inverse transform and overlap-add
to convert the signal X̃(f) into an echo-free signal X̃(t).
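The power spectrum formula in this claim, Px = |xr(f)| + |xi(f)| + ε|xr(f)||xi(f)|, can be evaluated per bin as in the sketch below. The formula is taken from the claim; the function name and the value ε = 0.5 are assumptions chosen only for illustration.

```python
def pseudo_power(spectrum, eps):
    """Per-bin value |re| + |im| + eps * |re| * |im| (formula from the claim)."""
    return [abs(c.real) + abs(c.imag) + eps * abs(c.real) * abs(c.imag)
            for c in spectrum]


# Three example bins X(f) = xr(f) + j*xi(f); eps = 0.5 is an illustrative value.
X = [complex(3, -4), complex(0, 2), complex(-1, 0)]
Px = pseudo_power(X, eps=0.5)  # [13.0, 2.0, 1.0]
```

The expression needs only absolute values, additions and one multiplication per bin, and ε weights the cross term between the real and imaginary parts.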
12. An acoustic echo canceller according to claim 11, further comprising a means adapted
to update the delay, the delay being updated based on a value of the correlation
between x(t) and y(t).
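One way to base the delay update on the correlation value is to accept a new delay estimate only when the correlation is strong. The sketch below assumes a normalized correlation and a fixed threshold of 0.5; neither the normalization nor the threshold is given in the claim, and both function names are hypothetical.

```python
import math


def normalized_correlation(x, y_shifted):
    """Correlation of two equal-length frames scaled to the range [-1, 1]."""
    num = sum(a * b for a, b in zip(x, y_shifted))
    den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y_shifted))
    return num / den if den else 0.0


def update_delay(current_delay, candidate_delay, correlation, threshold=0.5):
    """Adopt the candidate delay only when the correlation reaches the threshold."""
    return candidate_delay if correlation >= threshold else current_delay


rho = normalized_correlation([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])  # scaled copy
delay = update_delay(current_delay=10, candidate_delay=12, correlation=rho)
```

With such a gate the delay stays at its previous value while the far end is silent or while near-end speech dominates the microphone signal.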
13. An acoustic echo canceller according to claim 11 or 12, wherein H(b) is calculated
as follows:

where α < 1.
14. An acoustic echo canceller according to any one of claims 11 to 13, wherein

where L̅(b) is a factor which varies with Lb, where Lb = Rrpr(b) Rpo(b);
where

where

γ is a smoothing factor with γ < 1;

and P̅rpo initially has a value of 0;

where, if Prpo(b) < 0, then Prpo(b) = 0;
and

where P̅y(b) is a normalized Bark-scale Py(b);
where Gs is a factor which varies with the coherence factor Φ, where
1. An acoustic echo canceller (1) for a telecommunication system adapted for communication
between a far-end talker and a near-end talker, the system having a microphone
(25) and a loudspeaker (23), the microphone receiving a speech signal of the near-end
talker (30) and an echo signal of a speech signal of the far-end talker
emanating from the loudspeaker (23), the echo canceller (1) comprising:
- means adapted to receive samples of two inputs from the telecommunication system,
x(t) and y(t), y(t) being the speech signal from the far-end talker and x(t) being
a combination of the speech signal of the near-end talker and the echo signal of y(t);
- means (100) adapted to calculate a correlation between x(t) and y(t);
- means (120) adapted to calculate a delay between x(t) and y(t);
- means (130) adapted to align x(t) and y(t) by time-shifting y(t) by
the delay between x(t) and y(t);
the echo canceller being
characterized in that it comprises:
- a means adapted to transform the x(t) signal and the time-shifted y(t) signal into
their spectra in a frequency domain;
- means (170) adapted to calculate a transfer function using the spectrum
of x(t) and the spectrum of the time-shifted y(t), and to normalize the spectrum of
the time-shifted y(t) using the transfer function;
- a means (175) adapted to calculate a gain function using the spectrum of x(t)
and the normalized spectrum of the time-shifted y(t);
- means adapted to multiply the spectrum of x(t) by the gain function;
- a means adapted to transform the resulting spectrum into a time-domain signal,
producing an echo-free signal.
2. An echo canceller according to claim 1, characterized in that it further comprises means (150, 151) adapted to apply a Hann window.
3. An echo canceller according to claim 1 or 2, characterized in that it further comprises a means (165, 166) for Bark-scale warping and means
(180) for Bark-scale unwarping.
4. An echo canceller according to any one of the preceding claims, characterized in that 256 samples are collected, overlapping by 128 samples, at 16 ms
block intervals.
5. An echo canceller according to any one of the preceding claims, characterized in that it further comprises a means (195) adapted for overlap-add.
6. A method for cancelling an acoustic echo in a telecommunication system adapted
for communication between a far-end talker and a near-end talker (30), the
system comprising a microphone (25) and a loudspeaker (23), the microphone (25)
receiving a speech signal of the near-end talker (30) and an echo signal of the
speech signal of the far-end talker emanating from the loudspeaker (23), the method
comprising the steps of:
a) collecting (205) samples of two inputs from the telecommunication system,
x(t) and y(t), y(t) being the speech signal of the far-end talker and
x(t) being a combination of the speech signal of the near-end talker (30)
and the echo signal of y(t);
b) calculating (220) a correlation between x(t) and y(t);
c) calculating (230) a delay between x(t) and y(t);
d) aligning (245) x(t) and y(t) by time-shifting y(t) by the delay calculated in
step c);
the method being
characterized in that it comprises the following steps:
e) transforming (255) the x(t) signal and the time-shifted y(t) signal into their spectra
in a frequency domain;
f) calculating a transfer function using the spectrum of x(t) and the spectrum of
the time-shifted y(t);
g) normalizing the spectrum of the time-shifted y(t) using the
transfer function;
h) calculating (325) a gain function using the spectrum of x(t) and the normalized
spectrum of the time-shifted y(t);
i) multiplying (340) the spectrum of x(t) by the gain function;
j) transforming (345) the resulting spectrum of step i) into a time-domain
signal, producing an echo-free signal.
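Steps f) to i) can be pictured with a rough per-band sketch. The update rule and the gain rule below are hypothetical stand-ins (a recursively smoothed power ratio and a subtraction-style gain clamped to the range 0 to 1); they are not the formulas of the dependent claims, and the values of `alpha` and `floor` are assumptions.

```python
def update_transfer(h_prev, px, py, alpha=0.9, floor=1e-12):
    """Recursively smoothed per-band ratio Px(b) / Py(b) (hypothetical rule)."""
    return [alpha * h + (1.0 - alpha) * (x / max(y, floor))
            for h, x, y in zip(h_prev, px, py)]


def echo_estimate(h, py):
    """Normalize the far-end spectrum by the transfer function: H(b) * Py(b)."""
    return [hb * y for hb, y in zip(h, py)]


def gain(px, echo, floor=1e-12):
    """Subtraction-style gain clamped to [0, 1] (hypothetical rule)."""
    return [min(1.0, max(0.0, 1.0 - e / max(x, floor))) for x, e in zip(px, echo)]


# Band 0: the microphone power is fully explained by the echo estimate.
# Band 1: the far end is silent, so only near-end speech is present.
px = [1.0, 3.0]
py = [2.0, 0.0]
h = [0.5, 0.5]
g = gain(px, echo_estimate(h, py))  # [0.0, 1.0]
```

Multiplying the spectrum of x(t) by such a gain attenuates bands dominated by echo and leaves bands carrying only near-end speech untouched.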
7. A method according to claim 6, characterized in that it further comprises the step of applying (255) a Hann window after step
d).
8. A method according to claim 6 or 7, characterized in that it further comprises the step of Bark-scale warping (265) of the spectra
after step e) and of Bark-scale unwarping (335) after step h).
9. A method according to any one of claims 6 to 8, characterized in that 256 samples are collected, overlapping by 128 samples, at 16 ms
block intervals.
10. A method according to any one of claims 6 to 9, characterized in that it further comprises the step of performing overlap-add (345).
11. An acoustic echo canceller according to any one of claims 1 to 5,
characterized in that the means adapted to transform the signals into their spectra further
comprise:
- a means (140) adapted to normalize an amplitude of x(t) and of the time-shifted
y(t);
- a means (150) adapted to apply a Hann window and to transform the normalized
x(t) signal and the normalized time-shifted y(t) signal into a frequency domain,
where X(f) = xr(f) + jxi(f) and Y(f) = yr(f) + jyi(f), and where X(f) = FFT(x) and Y(f) = FFT(y);
- a means (155, 156) adapted to calculate power spectra Px and Py, where Px = |xr(f)| + |xi(f)| + ε|xr(f)||xi(f)| and Py = |yr(f)| + |yi(f)| + ε|yr(f)||yi(f)|,
where ε is a scaling factor which controls the amount of echo and noise to be
suppressed;
- a means (165, 166) for Bark-scale warping of Px and Py to obtain Px(b) and Py(b);
the means adapted to calculate the transfer function further comprise:
- a means (170) adapted to calculate a transfer function H(b) and to normalize Py(b) by the transfer function;
the means for calculating the gain function further comprise:
- a means (175) adapted to calculate a gain function G(b), G(b) being calculated
from Px(b) and Py(b);
the means adapted to multiply further comprise:
- a means (180) adapted for Bark-scale unwarping of G(b) to
obtain G(f);
- a means adapted to multiply the signal X(f) by the gain function G(f)
to obtain X̃(f); and
the means adapted to transform into a time-domain signal further comprise
a means (190) adapted to perform an inverse transform and an overlap-add
(195) to convert the signal X̃(f) into an echo-free signal X̃(t).
12. An acoustic echo canceller according to claim 11, characterized in that it further comprises a means adapted to update the delay, the delay being
updated based on a value of the correlation between x(t) and y(t).
13. An echo canceller according to claim 11 or 12,
characterized in that H(b) is calculated as follows:
14. An acoustic echo canceller according to any one of claims 11 to 13,
characterized in that G(b) = Rrpr(b) L̅b(b), where
L̅(b) is a factor which varies with Lb = Rrpr(b) Rpo(b);
where

where

γ is a smoothing factor with γ < 1;

and P̅rpo initially has a value of 0;

and

where P̅y(b) is the normalized Bark-scale Py(b);
where Gs is a factor which varies with the coherence factor Φ, and