Acoustic echo canceller - Patent 0895397

(11)	EP 0 895 397 B9

(12)	CORRECTED EUROPEAN PATENT SPECIFICATION
	Note: Bibliography reflects the latest situation

(15)	Correction information:
	Corrected version no 1 (W1 B1)
	Corrections, see Claims

(48)	Corrigendum issued on:
	17.01.2007 Bulletin 2007/03

(45)	Mention of the grant of the patent:
	23.08.2006 Bulletin 2006/34

(21)	Application number: 98306066.6

(22)	Date of filing: 30.07.1998

(51)	International Patent Classification (IPC):
	H04M 9/08 (2006.01)

(54)	Acoustic echo canceller Akustischer Echokompensator Annuleur d'écho acoustique

(84)	Designated Contracting States:
	DE ES FR GB IT

(30)	Priority:
	01.08.1997 SG 9702744

(43)	Date of publication of application:
	03.02.1999 Bulletin 1999/05

(73)	Proprietor: Bitwave PTE Ltd.
	Singapore 117684 (SG)

(72)	Inventor:
	Kok, Hui Siew Spanish Village, Singapore 268848 (SG)

(74)	Representative: Harrison Goddard Foote
	Belgrave Hall, Belgrave Street, Leeds LS2 8DD (GB)

(56)	References cited:

US-A- 4 562 312

US-A- 5 537 647

KOSAKA T ET AL: "A NOVEL FREQUENCY DOMAIN FILTERED-X LMS ALGORITHM FOR ACTIVE NOISE REDUCTION" 1997 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, PLENARY, EXPERT SUMMARIES, SPECIAL, AUDIO, UNDERWATER ACOUSTICS, VL MUNICH, APR. 21 - 24, 1997, vol. 1, 21 April 1997, pages 403-406, XP000789198 INSTITUTE OF ELECTRICAL AND ELECTRONICS ENGINEERS

Note: Within nine months from the publication of the mention of the grant of the European patent, any person may give notice to the European Patent Office of opposition to the European patent granted. Notice of opposition shall be filed in a written reasoned statement. It shall not be deemed to have been filed until the opposition fee has been paid. (Art. 99(1) European Patent Convention).

Description

FIELD OF THE INVENTION

[0001] The present invention relates generally to the field of acoustic echo cancellation in telecommunications, and particularly to a pseudo spectrum-based acoustic echo canceller which adaptively cancels echoes arising in hands-free audio and video teleconferencing and related systems without requiring a state machine or training.

BACKGROUND OF THE INVENTION

[0002] Acoustic echo cancellers and their applications in the field of telecommunication are well known to those skilled in the art. Many such cancellers and related technologies have been described in various publications including the following patent documents:

U.S. Patent No. 5,548,642

U.S. Patent No. 5,530,724

U.S. Patent No. 5,506,901

U.S. Patent No. 5,428,562

U.S. Patent No. 5,406,583

U.S. Patent No. 5,394,392

U.S. Patent No. 5,384,806

U.S. Patent No. 5,329,586

U.S. Patent No. 5,206,854

U.S. Patent No. 5,163,044

U.S. Patent No. 5,146,494

U.S. Patent No. 5,016,271

U.S. Patent No. 5,001,701

U.S. Patent No. 4,918,685

U.S. Patent No. 4,817,081

U.S. Patent No. 4,464,545

[0003] A typical acoustic echo canceller currently available uses an adaptive filter which employs a well-known algorithm such as the Least-Mean-Square (LMS) algorithm. This algorithm continuously adapts to changes in the placement of both the speaker and microphone and to changes in loudspeaker volume. For these cancellers, a state machine is needed to automatically determine each of the four states, i.e., receiving, transmitting, double-talk, and idle. In addition, in order to cancel the echoes, these cancellers must be trained, that is, they must "learn" the loudspeaker-to-microphone acoustic response function for the room they are servicing. Also, the acoustic compensation length is determined by the length of the filter, which in turn is determined by the host resource availability.

[0004] Kosaka et al. disclose, in "A Novel Frequency Domain Filtered-X LMS Algorithm For Active Noise Reduction", 1997 IEEE, April 1997, pages 403-406, a novel frequency domain Filtered-X LMS algorithm. The frequency domain algorithm is able to converge channel systems by compensating for the coupling between control channels.

[0005] Duttweiler, in US Patent No. 4,562,312, discloses the estimation of delays in incoming and outgoing signals from a communication circuit. The correlation between the signals is obtained and the delay between the signals is estimated, so that an echo canceller can be employed to cancel echoes developed in the delay.

OBJECT OF THE INVENTION

[0006] It is an object of the present invention to provide an acoustic echo canceller which adaptively cancels echo arising in hands-free audio and video teleconferencing systems and other related systems where echo cancellation is required.

[0007] It is another object of the present invention to provide an acoustic echo canceller which provides the high-quality, low-cost full duplex speech communication typical of dedicated video conferencing systems.

[0008] It is yet another object of the present invention to provide an acoustic echo canceller which does not require a state machine.

[0009] It is still yet another object of the present invention to provide an acoustic echo canceller which does not require training.

[0010] It is still yet another object of the present invention to provide an acoustic echo canceller which continuously adapts to changes in microphone and loudspeaker placement, loudspeaker volume setting, and the movement of people.

[0011] It is still yet another object of the present invention to provide an acoustic echo canceller which is independent of any standard.

[0012] It is still yet another object of the present invention to provide an acoustic echo canceller which can be connected directly to a PC soundcard and an ordinary telephone set.

SUMMARY OF THE INVENTION

[0013] A microphone array is used together with a block adaptive algorithm to effectively suppress acoustic echo arising in hands-free voice communication. At the same time, the system is also capable of suppressing environmental noise.

[0014] The present echo canceller utilizes the principle that the spectrum pattern of human speech does not change much in the short run. The present echo canceller takes 256 overlap 128 samples in 16 ms intervals, or sample blocks. The power spectrum taken at time 0 and at any time within the 16 ms interval are generally the same. This is true even though the waveform of the speech may change over time even in the short run. The echoes are simply a delayed form of a speech signal. Therefore, following the principle described above, the spectrum of the speech signal and the spectrum of its echo are substantially the same.
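The block arithmetic can be checked with a short sketch. It assumes the 8 kHz sampling rate given later in the description; the function name `split_into_blocks` is illustrative and not taken from the specification. With 128 new samples per block, successive blocks start 128 / 8000 s = 16 ms apart.

```python
import numpy as np

FS = 8000      # sampling rate in Hz, as stated in the detailed description
BLOCK = 256    # samples per block
HOP = 128      # new samples per block ("256 overlap 128")

def split_into_blocks(signal, block=BLOCK, hop=HOP):
    """Return a 2-D array whose rows are overlapping sample blocks."""
    signal = np.asarray(signal, dtype=float)
    n_blocks = 1 + (len(signal) - block) // hop
    return np.stack([signal[i * hop : i * hop + block] for i in range(n_blocks)])

hop_ms = 1000.0 * HOP / FS                  # time between block starts, in ms
blocks = split_into_blocks(np.arange(1024)) # second half of a row equals first half of the next
```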

[0015] The inputs to the present echo canceller are x(t) and y(t), y(t) representing the incoming speech signal from a far-end speaker and x(t) representing the combination of speech signal from a near-end speaker and the echo. The well-known normalized cross-correlation estimation between x(t) and y(t) is performed to determine the level of correlation between x(t) and y(t) which is quantitatively represented by the correlation coefficient C, a value of 1 for C being perfect correlation.

[0016] When the far-end speaker is speaking and the near-end speaker is not speaking, x(t) comprises only the echo portion, which is essentially a delayed form of y(t). In that case, there is almost a perfect correlation between x(t) and y(t) and the C value is near 1. When the near-end speaker is speaking and the far-end speaker is not speaking, x(t) comprises only the near-end speech signal and the C value is near 0. When both the near-end and the far-end speakers are speaking simultaneously, the C value may be between 0 and 1, but is typically near 0 since the two speech signals will not be highly correlated. Silence likewise results in a C value near 0, since the respective noises will not be highly correlated. Certain decisions are based on whether the C value exceeds certain thresholds.

[0017] Since the echo is generally a delayed y(t), the amount of delay is estimated by measuring the time shift required to produce the maximum C value. Once the delay is determined, the two channels of inputs are aligned by time-shifting y(t) to match x(t). The amplitude of x(t) and y(t) is then normalized by first determining a certain gain factor, and then multiplying y(t) by the gain factor.

[0018] The processed forms of the input x(t) and y(t) are next processed by applying the well-known Hanning window. They are then transformed into their respective frequency domain using the well-known fast Fourier transform (FFT) and then to Bark Scales, P_x(b) and P_y(b), using the Bark Frequency Warping technique. The transfer function H(b) is then estimated using the Bark Scales. The transfer function is used to normalize P_y(b), which, in turn together with P_x(b), is used to estimate the gain G(b) which will be used to suppress the echo. Subsequently, the Bark Scales are unwarped and the gain function is then used to suppress the echo from the input x(t). The well-known inverse FFT (IFFT) and overlap add are performed to yield an echo-free signal.

BRIEF DESCRIPTION OF THE DRAWINGS

[0019]

FIG. 1 is a functional diagram illustrating the present echo canceller deployed in a teleconference room setting.

FIG. 2 is a functional block diagram illustrating the circuitry of the present echo canceller.

FIGS. 3a through 3c are a continuous flow diagram illustrating the echo cancelling process employed by the present echo canceller.

FIG. 4 is a lookup Table 1 listing values for G_s.

FIG. 5 is a lookup Table 2 listing values for L̅_b(b).

FIG. 6 is a lookup Table 3 listing values for W_i.

DETAILED DESCRIPTION OF THE INVENTION

[0020] FIG. 1 illustrates schematically the present echo canceller 1 placed in a telephone conference system operating in a room 10. The echo canceller 1 is serially connected to a telecommunications network through incoming line 15 and outgoing line 16. Room reverberative surfaces 18 define multiple echo paths which depend on room geometry. Two such echo paths, 20 and 21, are illustrated. Speech originating from a far-end speaker (speaker not shown) emanating from the room loudspeaker 23 travels along the echo paths 20 and 21, among other paths, and enters microphone 25 with various time delays. Speech 31 from the near-end speaker 30 also enters the microphone 25. Both the speech signal and the echo, denoted as x(t), travel along the line 35 and into the echo canceller 1. The speech signal from the far-end speaker, denoted as y(t), which is essentially the same as the echo without the delay, is also an input to the echo canceller 1 via line 15.

[0021] To optimize the performance of the present echo canceller, a microphone array consisting of 3 microphones is used instead of a single microphone, and a well-known beam-forming technique is employed. This arrangement enhances the strength of the near-end speech signal while reducing the strength of the echo signal from the loudspeaker. This occurs because the array forms an acoustic beam at the signal direction and a null in the speaker direction. It has been found that this microphone array significantly enhances the performance of embodiments of the present invention.

[0022] The present echo canceller utilizes the principle that the spectrum pattern of human speech does not change much in the short run. The present echo canceller takes 256 overlap 128 samples in 16 ms intervals, or sample blocks. The power spectrum of a speech signal taken at time 0, for instance, and at any time within the 16 ms interval are essentially the same. This is true even though the waveform of the speech may change over time even in the short run. Referring to FIG. 1, the echoes taking the paths 21 and 20, for instance, are simply a delayed form of y(t). Therefore, following the principle described above, the spectrum of the far-end speaker's speech signal y(t) and the spectrum of the echo taking the paths 21 and 20 are substantially the same. The following description will make it clearer to those skilled in the art how this principle is utilized in the present echo canceller to cancel the echo in a manner which is more effective than the currently available systems.
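The principle can be illustrated numerically under an idealised assumption: the echo is modelled as a pure circular delay of one block of y(t), so the magnitude spectra agree exactly by the DFT shift theorem. A real echo is a linear delay combined with room filtering, so the agreement would only be approximate. The random test signal and the 40-sample delay are arbitrary choices made for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.standard_normal(256)          # stand-in for one block of far-end speech
echo = np.roll(y, 40)                 # idealised echo: circular delay of 40 samples

mag_y = np.abs(np.fft.rfft(y))
mag_echo = np.abs(np.fft.rfft(echo))

waveforms_differ = not np.allclose(y, echo)   # the waveforms are not the same
spectra_match = np.allclose(mag_y, mag_echo)  # the magnitude spectra are
```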

[0023] FIG. 2 illustrates a functional block diagram representing the circuitry for the echo canceller 1 referred to in FIG. 1. Typically, the circuitry would be implemented in a DSP chip or a microprocessor, though it can be implemented in other ways which are known to one skilled in the art. A brief description of the blocks will be given for FIG. 2. A more detailed flow diagram and description for the echo cancellation process employed by the circuit of FIG. 2 shall follow thereafter.

[0024] Referring to FIG. 2 in conjunction with FIG. 1, the inputs to the circuit are x(t) and y(t), y(t) representing the incoming speech signal from the far-end speaker and x(t) representing the combination of speech signal from the near-end speaker and the echo. The well-known normalized cross-correlation estimation between x(t) and y(t) is performed in block 100 to determine the level of correlation between x(t) and y(t) which is quantitatively represented by the correlation coefficient C, a value of 1 for C being perfect correlation.

[0025] When the far-end speaker is speaking and the near-end speaker 30 (see FIG. 1) is not speaking, x(t) comprises only the echo portion, which is essentially a delayed form of y(t). In that case, there is almost a perfect correlation between x(t) and y(t) and the C value is near 1. When the near-end speaker 30 is speaking and the far-end speaker is not speaking, x(t) comprises only the near-end speech signal and the C value is near 0. When both the near-end speaker 30 and the far-end speaker are speaking simultaneously, the C value may be between 0 and 1, but is typically near 0 since the two speech signals will not be highly correlated. Silence likewise results in a C value near 0, since the respective noises will not be highly correlated. Certain decisions are based on whether the C value exceeds certain thresholds.

[0026] Since the echo is generally a delayed y(t), the amount of delay is estimated in block 120 by measuring the time shift required to produce the maximum C value. Once the delay is determined, the two channels of inputs are aligned in block 130 by time-shifting y(t) to match x(t). The amplitude of the x(t) and y(t) is then normalized in block 140 by first determining a certain gain factor, and then multiplying y(t) by the gain factor in block 145.

[0027] The processed form of the input x(t) is next processed in blocks 150 through 165; the processed form of the input y(t) is next processed in blocks 151 through 166. Because both channels are processed in an identical manner which is well known and understood, only a brief description will be provided. In blocks 150 and 151, the well-known Hanning window is applied to the processed inputs. They are then transformed into their respective frequency domain using the well-known fast Fourier transform (FFT), blocks 155 and 156, and then to Bark Scales, P_x(b) and P_y(b), using the Bark Frequency Warping technique in blocks 165 and 166.
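A minimal sketch of the window and FFT stage for one channel follows, assuming NumPy's symmetric `np.hanning` window and a real-input FFT. The helper name `windowed_spectrum` and the 1 kHz test tone are illustrative only. At 8 kHz with 256-sample blocks the bin spacing is 8000 / 256 = 31.25 Hz.

```python
import numpy as np

def windowed_spectrum(block):
    """Apply a Hann (Hanning) window to one block and return its FFT."""
    block = np.asarray(block, dtype=float)
    window = np.hanning(len(block))
    return np.fft.rfft(block * window)

# 1 kHz tone sampled at 8 kHz, one 256-sample block
n = np.arange(256)
tone = np.sin(2 * np.pi * 1000 * n / 8000)
spectrum = windowed_spectrum(tone)
peak_bin = int(np.argmax(np.abs(spectrum)))   # bin index of the strongest component
```

The peak is expected at bin 1000 / 31.25 = 32.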

[0028] In block 170, the transfer function H(b) is estimated. The transfer function is then used in block 175 to normalize P_y(b), which is then used to estimate the gain G(b) which will be used to suppress the echo. In block 180, the Bark Scales are unwarped. The gain function is then used to suppress the echo from the input spectrum in block 185. The well-known inverse FFT (IFFT) is performed in block 190 and the overlap add in block 195 to yield an echo-free signal.

[0029] Using the flow diagrams of FIGs. 3a through 3c and the circuit diagram of FIG. 2, the echo cancellation process employed by embodiments of the present invention will now be described in greater detail.

[0030] Referring now to FIG. 3a, M samples (in this case 256 overlap 128, though other values are possible) are taken from the inputs x(t) and y(t) in step 205 at 16 ms block intervals (8 kHz sampling rate). A DC component is sometimes present in the inputs, so it is removed in step 210 using a common procedure well known to those skilled in the art. The next step, 220, is to compute the normalized cross-correlation as represented by a value C where,

where T denotes the transpose of a vector. A number of C values will result from this calculation so in step 220, the maximum value representing C, or C_max, is chosen.
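The search for C_max can be sketched as follows, assuming the conventional inner-product form C = x^T y / sqrt((x^T x)(y^T y)) suggested by the reference to the vector transpose. The lag range, the restriction to non-negative delays, and the function names are assumptions made for illustration, not details taken from the specification.

```python
import numpy as np

def normalized_xcorr(x, y):
    """C = x^T y / sqrt((x^T x)(y^T y)); returns 0.0 for an all-zero input."""
    denom = np.sqrt(np.dot(x, x) * np.dot(y, y))
    return float(np.dot(x, y) / denom) if denom > 0 else 0.0

def find_c_max(x, y, max_lag):
    """Try delays 0..max_lag of y against x; return (C_max, best_lag)."""
    best_c, best_lag = -1.0, 0
    for lag in range(max_lag + 1):
        # Compare x from `lag` onwards with the start of y (y delayed by lag).
        n = len(x) - lag
        c = normalized_xcorr(x[lag:], y[:n])
        if c > best_c:
            best_c, best_lag = c, lag
    return best_c, best_lag

rng = np.random.default_rng(1)
y = rng.standard_normal(512)
x = np.concatenate([np.zeros(25), 0.5 * y[:-25]])   # echo only: y delayed by 25, attenuated
c_max, delay = find_c_max(x, y, max_lag=64)
```

In this echo-only case the best lag recovers the 25-sample delay and C_max is close to 1, independent of the attenuation.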

[0031] Once C_max is found, the amount of delay between the two inputs, or D_n, is estimated in step 230. A comparison is made in step 232 to determine if C_max > ρ_new, where ρ_new initially has a value of 0. If the condition is met, i.e., C_max > ρ_new, then ρ_new is updated following the formula ρ_new = γ C_max, where a value for γ is empirically chosen to be 0.8. The delay D_n is then updated based on the most current value of C_max. On the other hand, if the condition C_max > ρ_new is not met, then ρ_new is updated using the formula ρ_new = γ ρ_old, where ρ_old simply represents the previous ρ_new, and the delay D_n from the previous sample block is used. Whether or not the delay D_n is updated, the two inputs, x(t) and y(t), are aligned by delaying y(t) by the amount D_n in step 245.

[0032] It is important to note here that while the updating of the delay D_n is a process included in the preferred embodiment of the present invention, it is not crucial. For instance, the present canceller can still function, though not as optimally, even if steps 232, 234, 236, and 240 were eliminated, and step 245 were to be performed immediately after 230 using the same D_n each time.
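The threshold test on C_max can be written as a small state update. The sketch below follows the rule as described, reading the ρ_new in the comparison as the value carried over from the previous block (named `rho_old` here). The function name and the sequence of example values are hypothetical.

```python
GAMMA = 0.8  # decay factor given in the description

def update_delay(c_max, candidate_delay, rho_old, delay_old, gamma=GAMMA):
    """One block of the threshold test on C_max.

    Returns the new threshold rho_new and the delay D_n to use for alignment.
    """
    if c_max > rho_old:
        # Strong correlation: accept the new delay and reset the threshold.
        return gamma * c_max, candidate_delay
    # Weak correlation: keep the previous delay and let the threshold decay.
    return gamma * rho_old, delay_old

rho, delay = 0.0, 0                       # rho starts at 0 per the description
history = []
for c_max, cand in [(0.9, 25), (0.3, 7), (0.2, 11), (0.95, 26)]:
    rho, delay = update_delay(c_max, cand, rho, delay)
    history.append((rho, delay))
```

With these example values the delay stays at 25 through the two weakly correlated blocks, while the threshold decays from 0.72 to 0.576 to 0.4608, and the delay switches to 26 only when a strongly correlated block arrives.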

[0033] After the alignment of the inputs in step 245, an amplitude normalization is performed on the inputs in step 250 using a gain normalization factor, Z, which is initially set at 1 but which is continually updated in step 269 when the stated condition is met. In step 255, the well-known Hanning Window is applied and the FFT is computed as follows:

X(f) = FFT(x) = x_r(f) + jx_i(f)
Y(f) = FFT(y) = y_r(f) + jy_i(f)

Coherence estimation is performed in step 257, where the coherence factor, Φ, is computed as follows:

It can be seen from this formula that if X(f) and Y(f) are coherent, Φ will be near to 1 which indicates that only the echo is present. However, if Φ is near to 0, that indicates either a double-talk or only near-end speech or only silence. The coherence factor, Φ, is used together with a non-linear energy function described in step 267 (see below) to further control the echo suppression.
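The behaviour described for Φ can be reproduced with a generic normalised inner product of the two spectra. This is a stand-in chosen for illustration and should not be read as the formula used in the specification; it only shares the stated property of approaching 1 for coherent spectra and 0 for unrelated ones. It presumes the inputs have already been time-aligned.

```python
import numpy as np

def coherence_factor(X, Y):
    """Stand-in measure: |sum X conj(Y)|^2 / (sum |X|^2 * sum |Y|^2), in [0, 1]."""
    num = np.abs(np.vdot(Y, X)) ** 2          # vdot conjugates its first argument
    den = np.sum(np.abs(X) ** 2) * np.sum(np.abs(Y) ** 2)
    return float(num / den) if den > 0 else 0.0

rng = np.random.default_rng(3)
Y = np.fft.rfft(rng.standard_normal(256))
X_echo = 0.5 * Y                               # echo only: scaled copy of Y
X_near = np.fft.rfft(rng.standard_normal(256)) # unrelated near-end signal

phi_echo = coherence_factor(X_echo, Y)
phi_near = coherence_factor(X_near, Y)
```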

[0034] Thereafter, in step 260, P_x and P_y are computed as follows:

P_x = |x_r(f)| + |x_i(f)| + ε|x_r(f)||x_i(f)|
P_y = |y_r(f)| + |y_i(f)| + ε|y_r(f)||y_i(f)|

where ε is a scaling factor which controls the amount of echo to be suppressed and is a trade-off between speech quality and echo suppression. In step 265, P_x and P_y are converted to Bark Scales P_x(b) and P_y(b) using the well-known Bark Frequency Warping technique.
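The expression for P_x and P_y recited in claim 11 uses only absolute values and one product per frequency bin. A minimal sketch follows; the value ε = 0.5 is a placeholder, since no numerical value is given in the text, and the function name is illustrative.

```python
import numpy as np

def pseudo_power(spectrum, eps=0.5):
    """P = |re| + |im| + eps * |re| * |im| per frequency bin (form of claim 11).

    eps=0.5 is a placeholder; the source gives no numerical value.
    """
    re = np.abs(np.real(spectrum))
    im = np.abs(np.imag(spectrum))
    return re + im + eps * re * im

X = np.array([3 + 4j, 0 + 0j, -1 + 2j])
P_x = pseudo_power(X, eps=0.5)   # 3+4+0.5*12, 0, 1+2+0.5*2
```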

[0035] A non-linear energy computation is performed in step 267 where the energy, E_n, is computed as follows:

where L represents the number of Bark frequency bands. In the preferred embodiment L = 18 is used.

[0036] In step 269, the gain normalization factor, Z, is updated if the following conditions are met: Φ > τ and E_n > T_n. The gain normalization factor, Z, is computed as follows:
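The figure L = 18 is consistent with the conventional critical-band edges truncated at the 4 kHz Nyquist frequency of an 8 kHz system. The sketch below assumes that band table and assumes that warping sums per-bin power into bands while unwarping copies each band value to its bins. Neither detail is spelled out in the text, and the function names are illustrative.

```python
import numpy as np

# Conventional critical-band edges in Hz, cut off at the 4 kHz Nyquist
# frequency of an 8 kHz system. This table is an assumption.
BARK_EDGES_HZ = [0, 100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270,
                 1480, 1720, 2000, 2320, 2700, 3150, 3700, 4000]

def bark_band_index(n_fft=256, fs=8000):
    """Map each rfft bin to the index of the Bark band that contains it."""
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    idx = np.searchsorted(BARK_EDGES_HZ, freqs, side="right") - 1
    return np.clip(idx, 0, len(BARK_EDGES_HZ) - 2)

def bark_warp(power, band_of_bin):
    """Sum per-bin power into per-band power P(b)."""
    n_bands = int(band_of_bin.max()) + 1
    return np.bincount(band_of_bin, weights=power, minlength=n_bands)

def bark_unwarp(per_band, band_of_bin):
    """Expand a per-band quantity such as G(b) back to one value per bin."""
    return np.asarray(per_band)[band_of_bin]

band_of_bin = bark_band_index()
P_b = bark_warp(np.ones(len(band_of_bin)), band_of_bin)   # bins per band
```

Working with 18 band values instead of 129 bin values is what reduces the computation in the later gain estimation steps.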

where σ < 1. It is important to note that while this is the preferred method other gain normalization methods may be used.

[0037] In step 271, it is determined if the condition E_n < T_n is met. If yes, T_n is updated in step 273. T_n is computed as follows:

where

is the T_n from the previous run where V < 1. It is important to note here that the noise threshold, T_n, is initially estimated during the silence period. It is computed as follows:

where θ is chosen between the range 1.125 and 1.25.

[0038] In step 275, it is determined whether Φ > τ and E_n > T_n. In the preferred embodiments, τ = 0.65, though a different value may be optimal for τ under a different configuration, e.g., a different microphone set-up. If the condition in step 275 is met, the transfer function H(b) is updated from its initial value of 1 in step 280. If the condition in step 275 is not met, then step 285 is performed without updating H(b). H(b) is calculated as follows:

where α < 1
In step 285, P_y(b) is normalised by H(b) as follows:

A buffer is provided to store M old values of P̃_y (b). In step 295, the total echo power is computed as follows:

W_i is a weighting fraction and its value depends on the echo path characteristics. The typical values are listed in Table 3. In step 300, the value of G_s is found by referring to lookup Table 1 using the current value for Φ.

[0039] In step 310, R_rpr(b) is computed as follows:

where

where γ is a smooth factor with γ < 1 (γ ≈ 0.02) and
P̅_rpo initially has a value of 0.
and

where if P_rpo(b) < 0 then P_rpo(b) = 0
and

In step 315, L(b) is computed as follows:

In step 320, a look-up Table 2 is used to find a value for L̅_b(b). In step 325, the gain G(b) is computed as follows:

In step 330, P̅_rpo is computed as follows:

After the step 330, the steps 310, 315, 320, 325 and 330 are repeatedly performed for each sample block of input, each loop producing an updated value for the parameters involved.

[0040] In step 335, G(b) is unwarped to produce G(f). The output spectrum is then computed in step 340 as follows:

X̃(f) = G(f) X(f)

In step 345 the well-known inverse FFT (IFFT) and overlap add are performed on X̃(f) to produce an echo-free signal X̃(t).
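The final stage (scaling the spectrum of x by G(f), inverse FFT, overlap add) can be sketched as below. The periodic Hann window is an assumption chosen so that windows offset by 128 samples sum to exactly 1. `gain_fn` is a hypothetical callback standing in for the gain estimation of steps 310 to 335.

```python
import numpy as np

BLOCK, HOP = 256, 128
# Periodic Hann window: copies shifted by HOP add up to exactly 1.
WINDOW = np.hanning(BLOCK + 1)[:-1]

def suppress_and_resynthesise(x, gain_fn):
    """Window, transform, scale by G(f), inverse transform and overlap-add.

    gain_fn is a hypothetical callback that receives the spectrum of one block
    and returns a real gain per frequency bin.
    """
    x = np.asarray(x, dtype=float)
    out = np.zeros(len(x))
    for start in range(0, len(x) - BLOCK + 1, HOP):
        X = np.fft.rfft(x[start:start + BLOCK] * WINDOW)
        X_tilde = gain_fn(X) * X                      # output spectrum
        out[start:start + BLOCK] += np.fft.irfft(X_tilde, n=BLOCK)
    return out

rng = np.random.default_rng(2)
x = rng.standard_normal(1024)
y_unity = suppress_and_resynthesise(x, lambda X: np.ones(len(X)))
y_half = suppress_and_resynthesise(x, lambda X: 0.5 * np.ones(len(X)))
```

With a gain of 1 in every bin the interior of the signal is reconstructed unchanged, which is a useful sanity check before plugging in a real gain function. The first and last 128 samples are covered by only one window and are therefore attenuated.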

[0041] It is very important for one of ordinary skill in the art to understand that many of the steps and/or components of the preferred embodiment of the echo canceller of embodiments of the present invention are included as a way of optimizing the performance of the canceller, and, therefore, may be substituted or even eliminated in some instances without negating the function and the purpose of the present invention. In addition, although the preferred embodiment of the present invention was described in the context of a teleconferencing system, it is clear that the present echo canceller may be used in other telecommunications systems where echoes are present in a manner similar to the scenarios described herein. While one skilled in the art could certainly appreciate these principles, some examples will be given for illustration purposes.

[0042] For instance, referring to FIG. 2 and FIG. 3, the cross-correlation estimation technique employed here may be substituted with other techniques for determining the correlation between two signals. Also, the amplitude normalisation, the use of the Hanning Window and the Bark Scales, while contributing to the effectiveness of the preferred embodiment of the present invention, may be eliminated under some circumstances without completely negating the function of the present invention. The Bark Scales, for instance, are used in this case as a way of reducing computation time and, therefore, their omission may not unduly affect the performance of the present echo canceller. In addition, although the Hanning Window was found to be optimal in this case, it may be replaced with other windows. Similarly, while the choice to take 256 overlap 128 samples in 16 ms intervals was found to be optimal in this case, other sample sizes and intervals may be chosen. The presently disclosed embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims.

Claims

1. An acoustic echo canceller (1) for a telecommunications system adapted for communication between a far-end speaker and a near-end speaker (30), said system having a microphone (25) and a loud-speaker (23), said microphone (25) receiving a speech signal from said near-end speaker (30) and an echo signal of a speech signal from said far-end speaker emanating from said loud-speaker (23), said echo canceller (1) comprising:

a means adapted for collecting samples of two inputs from said telecommunications system, x(t) and y(t), said y(t) being said speech signal from said far-end speaker, said x(t) being a combination of said speech signal from said near-end speaker and said echo signal of said y(t);

a means (100) adapted for estimating a correlation between the x(t) and y(t);

a means (120) adapted for estimating a delay between the x(t) and y(t);

a means (130) adapted for aligning x(t) and y(t) by time-shifting y(t) by said delay between x(t) and y(t);
said echo canceller characterized in that it comprises:

a means adapted for transforming the x(t) and time shifted y(t) signals into their spectrums in a frequency domain;

a means (170) adapted for estimating a transfer function using the spectrum of x(t) and the spectrum of time-shifted y(t) and adapted for normalising the spectrum of time-shifted y(t) using the transfer function;

a means (175) adapted for estimating a gain function using the spectrum of x(t) and the normalised spectrum of time-shifted y(t);

a means adapted for multiplying the spectrum of x(t) by said gain function; and a means adapted for transforming the resulting spectrum into a time domain signal producing an echo-free signal.

2. The echo canceller (1) as claimed in claim 1 further comprising a means adapted for applying a Hanning Window (150, 151).

3. The echo canceller (1) as claimed in claim 1 or 2 further comprising a means (165, 166) adapted for Bark Scale Warping and means (180) adapted for Bark Scale Unwarping.

4. The echo canceller (1) as claimed in any preceding claim wherein 256 samples are collected overlapping 128 samples in 16 ms block intervals.

5. The echo canceller (1) as claimed in any preceding claim further comprising a means (195) adapted for overlap add.

6. A method of cancelling an acoustic echo in a telecommunications system adapted for communication between a far-end speaker and a near-end speaker (30), said system having a microphone (25) and a loud-speaker (23), said microphone (25) receiving a speech signal from said near-end speaker (30) and an echo signal of a speech signal from said far-end speaker emanating from said loud-speaker (23), said method comprising the steps of:

a) collecting (205) samples of two inputs from said telecommunications system, x(t) and y(t), said y(t) being said speech signal from said far-end speaker, said x(t) being a combination of said speech signal from said near-end speaker (30) and said echo signal of said y(t);

b) estimating (220) a correlation between the x(t) and y(t);

c) estimating (230) a delay between the x(t) and y(t);

d) aligning (245) x(t) and y(t) by time-shifting y(t) by said delay estimated in step c);
said method characterized in that it comprises the following steps:

e) transforming (255) the x(t) and time-shifted y(t) signals into their spectrums, in a frequency domain;

f) estimating a transfer function using the spectrum of x(t) and the spectrum of time-shifted y(t);

g) normalising the spectrum of time-shifted y(t) using the transfer function;

h) estimating (325) a gain function using the spectrum of x(t) and the normalised spectrum of time-shifted y(t);

i) multiplying (340) the spectrum of x(t) by said gain function; and

j) transforming (345) said resulting spectrum of step i) into a time domain signal producing an echo-free signal.

7. The method as claimed in claim 6 further comprising the step of applying (255) a Hanning Window after step d).

8. The method as claimed in claim 6 or 7 further comprising the step of Bark Scale Warping (265) said spectrums after step e) and Bark Scale Unwarping (335) after step h).

9. The method as claimed in any of claims 6 to 8 wherein said 256 samples are collected overlapping 128 samples in 16 ms block intervals.

10. The method as claimed in any of claims 6 to 9 further comprising the step of performing overlap add (345).

11. The acoustic echo canceller (1) as claimed in any of claims 1 to 5, wherein

the means adapted for transforming the signals into their spectrum further comprises

a means (140) adapted for normalizing an amplitude of x(t) and time-shifted y(t);

a means (150) adapted for applying a Hanning Window and transforming said normalised x(t) signal and normalised time-shifted y(t) signal into a frequency domain where X(f) = x_r(f) + jx_i(f) and Y(f)=y_r(f)+jy_i(f) and where X(f)=FFT(x) and Y(f)=FFT(y);

a means (155, 156) adapted for computing a power spectrum P_x and P_y where P_x = |x_r(f)| + |x_i(f)| + ε|x_r(f)||x_i(f)| and P_y = |y_r(f)| + |y_i(f)| + ε|y_r(f)||y_i(f)|, where ε is a scaling factor which controls the amount of echo and noise to be suppressed;

a means (165, 166) for Bark-scale warping P_x and P_y to yield Bark scales P_x(b) and P_y(b);

the means adapted for estimating the transfer function further comprises

a means (170) adapted for estimating a transfer function H(b) and normalizing P_y(b) by said transfer function;

the means for estimating the gain function further comprises

a means (175) adapted for estimating a gain function G(b), said G(b) being calculated from P_x(b) and P_y(b);

the means adapted for multiplying further comprises

a means (180) adapted for Bark-scale unwarping said G(b) to yield G(f);

a means adapted for multiplying said signal X(f) by said gain function G(f) to yield X̃(f); and

the means adapted for transforming into a time-domain signal further comprises

a means (190) adapted for performing an inverse transform and overlap add (195) to convert said signal X̃(f) to yield an echo-free signal X̃(t).

12. The acoustic echo canceller as claimed in claim 13 further comprising a means adapted for updating said delay, said delay being updated based on a value of said correlation between x(t) and y(t).

13. The acoustic echo canceller as recited in Claim 11 or 12 wherein said H(b) is calculated as follows:

14. The acoustic echo canceller as recited in any of Claims 11 to 13 wherein

where L̅_(b) is a factor which varies with respect to L_b and L_b = R_rpr(b)R_po(b);
where

where

γ is a smooth factor with γ < 1;

and P̅_rpo initially has a value of 0;

where if P_rpo(b) < 0 then P_rpo(b) = 0
and

where P̅_y(b) is normalised Bark-scale P_y(b);
where G_s is a factor which varies with respect to coherence factor Φ and

Claims (from the German-language text)

1. An acoustic echo canceller (1) for a telecommunications system designed for communication between a speaker at a far end and a speaker (30) at a near end, the system having a microphone (25) and a loudspeaker (23), the microphone (25) receiving a speech signal from the speaker (30) at the near end and an echo signal of a speech signal from the speaker at the far end that originates from the loudspeaker (23), the echo canceller (1) comprising:

ein Mittel, das zum Sammeln von Proben zweier Eingaben x(t) und y(t) von dem Telekommunikationssystem ausgelegt ist, wobei y(t) das Sprachsignal von dem Sprecher an dem fernen Ende ist und wobei x(t) eine Kombination des Sprachsignals des Sprechers an dem nahen Ende und dem Echosignal von y(t) ist;

ein Mittel (100), das ausgelegt ist, um eine Korrelation zwischen x(t) und y(t) zu berechnen;

ein Mittel (120), das ausgelegt ist, um eine Verzögerung zwischen x(t) und y(t) zu berechnen;

ein Mittel (130), das ausgelegt ist, um x(t) und y(t) durch Zeitversetzung durch die Verzögerung zwischen x(t) und y(t) auszurichten;

wobei der Echokompensator dadurch gekennzeichnet ist, daß er umfaßt:

ein Mittel, das ausgelegt ist, das x(t)-Signal und das zeitversetzte y(t)-Signal in deren Spektren in einem Frequenzbereich zu transformieren;

ein Mittel (170), das ausgelegt ist, eine Transferfunktion unter Verwendung des Spektrums von x(t) und des Spektrums des zeitversetzten y(t) zu berechnen, und das ausgelegt ist, das Spektrum des zeitversetzten y(t) unter Verwendung der Transferfunktion zu normieren;

ein Mittel (175), das ausgelegt ist, um eine Verstärkungsfunktion unter Verwendung des Spektrums x(t) und des normierten Spektrums des zeitversetzten y(t) zu berechnen;

ein Mittel, das ausgelegt ist, das Spektrum von x(t) durch die Verstärkungsfunktion zu vervielfachen;

und ein Mittel, das ausgelegt ist, das resultierende Spektrum in ein Zeitbereichssignal zu transformieren, das ein Echo freies Signal herstellt.
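
For illustration only, the correlation, delay and alignment elements of claim 1 could be sketched as below. The claim does not say how the delay is derived from the correlation; picking the lag with the largest cross-correlation is an assumption, and the names `estimate_delay`, `time_shift` and `max_lag` are hypothetical.

```python
def estimate_delay(x, y, max_lag):
    """Return the lag d in 0..max_lag that maximises sum_n x[n] * y[n - d].

    Assumption: the echo in x is a delayed copy of y, so the
    cross-correlation peaks at the echo delay.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        score = sum(x[n] * y[n - lag] for n in range(lag, len(x)))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def time_shift(y, lag):
    """Delay y by `lag` samples, zero-padding the start and keeping the length."""
    return [0.0] * lag + list(y[:len(y) - lag])
```

With the estimated lag, `time_shift(y, lag)` gives the time-shifted y(t) that the later frequency-domain elements of the claim operate on.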

2. An echo canceller (1) according to claim 1, further comprising means for applying a Hanning window (150, 151).

3. An echo canceller (1) according to claim 1 or 2, further comprising means (165, 166) adapted for Bark-scale warping and means (180) adapted for Bark-scale unwarping.

4. An echo canceller (1) according to any one of the preceding claims, wherein 256 samples are collected, overlapping by 128 samples, in 16 ms block intervals.
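
As a rough sketch of the block structure in claim 4: if the 16 ms interval is read as the time between successive blocks, then 128 new samples arrive per block, which implies a sampling rate of 8 kHz. That reading, and the names `FRAME_LEN`, `HOP` and `split_frames`, are assumptions made for this example.

```python
FRAME_LEN = 256          # samples per block (claim 4)
HOP = 128                # new samples per block, i.e. a 128-sample overlap
BLOCK_INTERVAL_MS = 16   # block interval (claim 4)

# Assumption: one hop of new samples arrives every block interval.
SAMPLE_RATE_HZ = HOP * 1000 // BLOCK_INTERVAL_MS


def split_frames(signal):
    """Split a sample list into 256-sample frames that overlap by 128 samples."""
    last_start = len(signal) - FRAME_LEN
    return [signal[s:s + FRAME_LEN] for s in range(0, last_start + 1, HOP)]
```

Each frame shares its second half with the first half of the next frame, which is the property the overlap-add stage relies on.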

5. An echo canceller (1) according to any one of the preceding claims, further comprising means (195) adapted for overlap-add.

6. A method of cancelling an acoustic echo in a telecommunication system adapted for communication between a far-end speaker and a near-end speaker (30), said system having a microphone (25) and a loudspeaker (23), said microphone (25) receiving a speech signal from the near-end speaker (30) and an echo signal of the far-end speaker emanating from said loudspeaker (23), the method comprising the steps of:

(a) collecting (205) samples of two inputs, x(t) and y(t), from the telecommunication system, y(t) being the speech signal of the far-end speaker and x(t) being a combination of the speech signal from the near-end speaker (30) and the echo signal of y(t);

(b) calculating (220) a correlation between x(t) and y(t);

(c) calculating (230) a delay between x(t) and y(t);

(d) aligning (245) x(t) and y(t) by time-shifting y(t) by the delay calculated in step c);

the method being characterised in that it comprises the following steps:

(e) transforming (255) the x(t) signal and the time-shifted y(t) signal into their spectra in a frequency domain;

(f) calculating a transfer function using the spectrum of x(t) and the spectrum of the time-shifted y(t);

(g) normalising the spectrum of the time-shifted y(t) using the transfer function;

(h) calculating (325) a gain function using the spectrum of x(t) and the normalised spectrum of the time-shifted y(t);

(i) multiplying (340) the spectrum of x(t) by the gain function; and

(j) transforming (345) the resulting spectrum of step i) into a time-domain signal, producing an echo-free signal.

7. A method according to claim 6, further comprising the step of applying (255) a Hanning window after step d).

8. A method according to claim 6 or 7, further comprising the step of Bark-scale warping (265) the spectra after step e) and of Bark-scale unwarping (335) after step h).

9. A method according to any one of claims 6 to 8, wherein 256 samples are collected, overlapping by 128 samples, in 16 ms block intervals.

10. A method according to any one of claims 6 to 9, further comprising the step of performing an overlap-add.

11. An acoustic echo canceller (1) according to any one of claims 1 to 5, wherein the means adapted to transform the signals into their spectra further comprises:

means (140) adapted to normalise an amplitude of x(t) and of the time-shifted y(t);

means (150) adapted to apply a Hanning window and to transform the normalised x(t) signal and the normalised time-shifted y(t) signal into a frequency domain, where X(f) = x_r(f) + jx_i(f) and Y(f) = y_r(f) + jy_i(f), and where X(f) = FFT(x) and Y(f) = FFT(y);

means (155, 156) adapted to calculate a power spectrum P_x and P_y, where P_x = |x_r(f)| + |x_i(f)| + ε|x_r(f)||x_i(f)| and P_y = |y_r(f)| + |y_i(f)| + ε|y_r(f)||y_i(f)|, ε being a scaling factor which controls the amount of echo and noise to be suppressed;

means (165, 166) for Bark-scale warping P_x and P_y to yield P_x(b) and P_y(b);

wherein the means adapted to calculate the transfer function further comprises:

means (170) adapted to calculate a transfer function H(b) and to normalise P_y(b) by the transfer function;

wherein the means for calculating the gain function further comprises:

means (175) adapted to calculate a gain function G(b), G(b) being calculated from P_x(b) and P_y(b);

wherein the means for multiplying further comprises:

means (180) adapted for Bark-scale unwarping of G(b) to yield G(f);

means adapted to multiply the signal X(f) by the gain function G(f) to yield X̃(f); and

wherein the means adapted to transform into a time-domain signal further comprises:

means (190) adapted to perform an inverse transform and overlap-add to convert the signal X̃(f) into an echo-free signal X̃(t).
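
The power spectrum of claim 11 is not the usual squared magnitude: it combines the absolute real and imaginary parts with a cross term weighted by ε. A minimal sketch of that formula follows. The direct DFT stands in for the FFT named in the claim, and the names `dft`, `pseudo_power` and `eps` are hypothetical.

```python
import cmath


def dft(frame):
    """Direct O(n^2) DFT, used here in place of an FFT to stay dependency-free."""
    n = len(frame)
    return [
        sum(frame[k] * cmath.exp(-2j * cmath.pi * f * k / n) for k in range(n))
        for f in range(n)
    ]


def pseudo_power(spectrum, eps):
    """P(f) = |re| + |im| + eps * |re| * |im| for each complex bin."""
    return [
        abs(c.real) + abs(c.imag) + eps * abs(c.real) * abs(c.imag)
        for c in spectrum
    ]
```

For a single bin 3 + 4j and ε = 0.5, this gives 3 + 4 + 0.5 · 12 = 13. The claim states only that ε controls the amount of echo and noise to be suppressed; no particular value is given in this section.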

12. An acoustic echo canceller according to claim 11, further comprising means adapted to update the delay, the delay being updated based on a value of the correlation between x(t) and y(t).

13. An acoustic echo canceller according to claim 11 or 12, wherein H(b) is calculated as follows:

where α < 1.

14. An acoustic echo canceller according to any one of claims 11 to 13, wherein G(b) = R_rpr(b)L̅_b(b),

where L̅_b(b) is a factor which varies with respect to L_b, L_b = R_rpr(b)R_po(b);
where

γ is a smoothing factor with γ < 1;

and P̅_rpo initially has a value of 0;

where if P_rpo(b) < 0 then P_rpo(b) = 0
and

where P̅_y(b) is normalised Bark-scale P_y(b);
where G_s is a factor which varies with respect to coherence factor Φ and

Claims (from the French text, Revendications)

1. An acoustic echo canceller (1) for a telecommunication system adapted for communication between a far-end speaker and a near-end speaker, said system having a microphone (25) and a loudspeaker (23), said microphone receiving a speech signal from the near-end speaker (30) and an echo signal of a speech signal from the far-end speaker emanating from said loudspeaker (23), said echo canceller (1) comprising:

- means adapted to collect samples of two inputs from the telecommunication system, x(t) and y(t), said y(t) being said speech signal from said far-end speaker, said x(t) being a combination of said speech signal of the near-end speaker and said echo signal of y(t);

- means (100) adapted to estimate a correlation between x(t) and y(t);

- means (120) adapted to estimate a delay between x(t) and y(t);

- means (130) adapted to align x(t) and y(t) by time-shifting y(t) by said delay between x(t) and y(t);

said echo canceller being characterised in that it comprises:

- means adapted to transform the x(t) signal and the time-shifted y(t) signal into their spectra in a frequency domain;

- means (170) adapted to estimate a transfer function using the spectrum of x(t) and the spectrum of the time-shifted y(t), and adapted to normalise the spectrum of the time-shifted y(t) using the transfer function;

- means (175) adapted to estimate a gain function using the spectrum of x(t) and the normalised spectrum of the time-shifted y(t);

- means adapted to multiply the spectrum of x(t) by said gain function;

- means adapted to transform the resulting spectrum into a time-domain signal, producing an echo-free signal.

2. An echo canceller according to claim 1, characterised in that it further comprises means adapted to apply a Hanning window (150, 151).

3. An echo canceller according to claim 1 or 2, characterised in that it further comprises means (165, 166) for Bark-scale warping and means (180) for Bark-scale unwarping.

4. An echo canceller according to any one of the preceding claims, characterised in that 256 samples are collected, overlapping by 128 samples, in 16 ms block intervals.

5. An echo canceller according to any one of the preceding claims, characterised in that it further comprises means (195) adapted for overlap-add.
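
To illustrate why a Hanning window pairs naturally with the 256-sample blocks overlapping by 128 samples, the sketch below overlap-adds the windows themselves and shows that they sum to a constant. The periodic form of the window is an assumption, since the exact form is not given in this section, and the names `hann` and `overlap_add` are hypothetical.

```python
import math


def hann(n):
    """Periodic Hann window: w[k] = 0.5 - 0.5 * cos(2*pi*k / n)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / n) for k in range(n)]


def overlap_add(frames, hop):
    """Sum equal-length frames into one signal, each offset by `hop` samples."""
    frame_len = len(frames[0])
    out = [0.0] * (hop * (len(frames) - 1) + frame_len)
    for i, frame in enumerate(frames):
        for k, value in enumerate(frame):
            out[i * hop + k] += value
    return out
```

Overlap-adding several copies of `hann(256)` with a hop of 128 gives a value of 1 everywhere except in the first and last half-frame, so windowed blocks that are left unmodified in the frequency domain would add back up to the original signal.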

6. A method of cancelling an acoustic echo in a telecommunication system adapted for communication between a far-end speaker and a near-end speaker (30), said system comprising a microphone (25) and a loudspeaker (23), said microphone (25) receiving a speech signal from said near-end speaker (30) and an echo signal of the speech signal of the far-end speaker emanating from said loudspeaker (23), said method comprising the steps of:

a) collecting (205) samples of two inputs from the telecommunication system, x(t) and y(t), said y(t) being said speech signal of said far-end speaker, said x(t) being a combination of said speech signal of the near-end speaker (30) and said echo signal of said y(t);

b) estimating (220) a correlation between x(t) and y(t);

c) estimating (230) a delay between x(t) and y(t);

d) aligning (245) x(t) and y(t) by time-shifting y(t) by said delay estimated in step c);

said method being characterised in that it comprises the following steps:

e) transforming (255) the x(t) signal and the time-shifted y(t) signal into their spectra in a frequency domain;

f) estimating a transfer function using the spectrum of x(t) and the spectrum of the time-shifted y(t);

g) normalising the spectrum of the time-shifted y(t) using the transfer function;

h) estimating (325) a gain function using the spectrum of x(t) and the normalised spectrum of the time-shifted y(t);

i) multiplying (340) the spectrum of x(t) by said gain function;

j) transforming (345) said resulting spectrum of step i) into a time-domain signal, producing an echo-free signal.

7. A method according to claim 6, characterised in that it further comprises the step of applying (255) a Hanning window after step d).

8. A method according to claim 6 or 7, characterised in that it further comprises the step of Bark-scale warping (265) said spectra after step e) and of Bark-scale unwarping (335) after step h).

9. A method according to any one of claims 6 to 8, characterised in that 256 samples are collected, overlapping by 128 samples, in 16 ms block intervals.

10. A method according to any one of claims 6 to 9, characterised in that it further comprises the step of performing an overlap-add (345).

11. An acoustic echo canceller according to any one of claims 1 to 5, characterised in that the means adapted to transform the signals into their spectra further comprise:

- means (140) adapted to normalise an amplitude of x(t) and of the time-shifted y(t);

- means (150) adapted to apply a Hanning window and to transform said normalised x(t) signal and the normalised time-shifted y(t) signal into a frequency domain, where X(f) = x_r(f) + jx_i(f) and Y(f) = y_r(f) + jy_i(f), and where X(f) = FFT(x) and Y(f) = FFT(y);

- means (155, 156) adapted to calculate a power spectrum P_x and P_y, where P_x = |x_r(f)| + |x_i(f)| + ε|x_r(f)||x_i(f)| and P_y = |y_r(f)| + |y_i(f)| + ε|y_r(f)||y_i(f)|,

where ε is a scaling factor which controls the amount of echo and noise to be suppressed;

- means (165, 166) for Bark-scale warping P_x and P_y to yield P_x(b) and P_y(b);

wherein the means adapted to estimate the transfer function further comprise:

- means (170) adapted to estimate a transfer function H(b) and to normalise P_y(b) by said transfer function;

wherein the means for estimating the gain function further comprise:

- means (175) adapted to estimate a gain function G(b), said G(b) being calculated from P_x(b) and P_y(b);

wherein the means adapted to multiply further comprise:

- means (180) adapted for Bark-scale unwarping of said G(b) to yield G(f);

- means adapted to multiply said signal X(f) by said gain function G(f) to yield X̃(f); and

wherein the means adapted to transform into a time-domain signal further comprise means (190) adapted to perform an inverse transform and an overlap-add (195) to convert said signal X̃(f) into an echo-free signal X̃(t).

12. An acoustic echo canceller according to claim 11, characterised in that it further comprises means adapted to update said delay, said delay being updated based on a value of said correlation between x(t) and y(t).

13. An echo canceller according to claim 11 or 12, characterised in that said H(b) is calculated as follows:

14. An acoustic echo canceller according to any one of claims 11 to 13, characterised in that G(b) = R_rpr(b)L̅_b(b), where L̅_b(b) is a factor which varies with respect to L_b = R_rpr(b)R_po(b);
where

where

γ is a smoothing factor with γ < 1;

and P̅_rpo initially has a value of 0;

where if P_rpo(b) < 0 then P_rpo(b) = 0
and

where P̅_y(b) is normalised Bark-scale P_y(b);
where G_s is a factor which varies with respect to coherence factor Φ and

Drawing