(19)
(11) EP 2 045 801 A1

(12) EUROPEAN PATENT APPLICATION

(43) Date of publication:
08.04.2009 Bulletin 2009/15

(21) Application number: 07019281.0

(22) Date of filing: 01.10.2007
(51) International Patent Classification (IPC): 
G10L 21/02(2006.01)
H04M 9/08(2006.01)
(84) Designated Contracting States:
AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LI LT LU LV MC MT NL PL PT RO SE SI SK TR
Designated Extension States:
AL BA HR MK RS

(71) Applicant: Harman Becker Automotive Systems GmbH
76307 Karlsbad (DE)

(72) Inventors:
  • Schmidt, Gerhard
    89081 Ulm (DE)
  • Köpf, Hans-Jörg
    73340 Amstetten (DE)
  • Wirsching, Günther
    85072 Eichstätt (DE)

(74) Representative: Grünecker, Kinkeldey, Stockmair & Schwanhäusser Anwaltssozietät 
Leopoldstrasse 4
80802 München (DE)

   


(54) Efficient audio signal processing in the sub-band regime


(57) The present invention relates to a method for audio signal processing, comprising dividing a microphone signal (y(n)) into microphone sub-band signals ysb(n); excising a predetermined number of the microphone sub-band signals ysb(n) for predetermined sub-bands; processing the remaining microphone sub-band signals (ysb,g(n)) to obtain enhanced microphone sub-band signals (ŝsb,g(n)); and reconstructing microphone sub-band signals for the predetermined sub-bands for which microphone sub-band signals were excised, wherein each of the excised microphone sub-band signals is reconstructed by means of enhanced microphone sub-band signals (ŝsb,g(n)) obtained by processing the remaining microphone sub-band signals (ysb,g(n)).




Description

Field of Invention



[0001] The present invention relates to audio signal processing and, in particular, to efficient echo compensation of a speech signal in the sub-band regime.

Background of the invention



[0002] Audio signal processing is in large part concerned with enhancing signals with respect to noise and echoes. Speech signal processing often has to be performed in a noisy background environment. A prominent example is hands-free voice communication in vehicles. Hands-free telephones provide a comfortable and safe means of communication and are of particular use in motor vehicles. In hands-free telephony it is mandatory to suppress noise in order to guarantee intelligible communication. The amplitudes and frequencies of the noise signals vary over time due to, for example, the speed of the vehicle and road noise.

[0003] Of particular importance is the suppression of signals of the remote subscriber which are emitted by the loudspeakers and therefore received again by the microphone(s), since otherwise unpleasant echoes can severely affect the quality and intelligibility of voice conversation. In the worst case, the acoustic feedback can even lead to a complete breakdown of communication, if the acoustic echoes are not significantly attenuated or substantially removed.

[0004] Echo suppression is particularly difficult if the speaker using a microphone to communicate with a remote party is moving, e.g., a driver who turns the steering wheel while talking to a remote party via a hands-free telephone set. In this case, the impulse response of the loudspeaker-room-microphone (LRM) system is time-variant. Residual echoes usually remain in the processed audio signals that are transmitted to the remote party. These residual echoes result, e.g., in so-called echo blips in hands-free telephone systems, which significantly degrade the microphone signal, in particular because of the long delays of current mobile phone connections.

[0005] Several methods for echo compensation have been proposed and implemented in communication systems in recent years. For echo compensation of acoustic signals, adaptive filters are employed that model the transfer function of the LRM system by means of an adaptive finite impulse response (FIR) filter (see, e.g., E. Hänsler and G. Schmidt, "Acoustic Echo and Noise Control - A Practical Approach", John Wiley & Sons, New York, 2004).

[0006] In current echo compensation processing, an adaptive filter models the impulse response of the LRM system to generate an estimate of the echo signal, which is subtracted from the microphone signal. The echo compensation filtering means is usually adapted by the normalized least mean square (NLMS) algorithm.

[0007] In order to contain the need for high-performance computational means to a reasonable level, the signal processing is usually performed in a down-sampled sub-band regime in which the computational complexity, in principle, can be reduced as compared to the full-band processing.

[0008] The higher the down-sampling rate chosen for the sub-band signals that are processed for echo compensation, the more the computational costs can be reduced. In the art, however, the choice of an appropriate down-sampling factor is generally limited by the known problem of aliasing. Hann windows and other filters show different aliasing characteristics. Artifacts increase with increasing down-sampling rate and, moreover, the echo damping becomes insufficient once the down-sampling rate exceeds some threshold.

[0009] Moreover, the frequency response of a Hann window is characterized by a significant overlap of adjacent sub-bands; thus, adjacent pitch trajectories are sometimes hard to separate, although separating them is crucial for speech enhancement. For example, noise in frequency ranges adjacent to frequency ranges dominated by a wanted signal is not sufficiently damped. In order to reduce the overlap, the order of the DFT might be increased (e.g., from the standard N = 256 to N = 512 points of the Fourier transform). The corresponding increase in frequency resolution, however, results in a decrease in the time resolution of the processed audio signal.

[0010] This may give rise to severe problems, since, e.g., the standards of the International Telecommunication Union and the European Telecommunications Standards Institute have to be met by any actual telephone equipment. For a sampling frequency of 11025 Hz, N = 512 results in a time delay that is not tolerable according to the above-mentioned standards.
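As a rough illustration of the numbers above, the sketch below computes only the length of one N-point analysis frame at 11025 Hz. Treating the frame length as a proxy for the processing delay is a simplifying assumption; the actual delay of a filter bank also depends on the hop size and the synthesis stage. The function name is hypothetical.

```python
def frame_duration_ms(n_points: int, fs_hz: float) -> float:
    """Length of an N-point analysis window in milliseconds.

    Assumption: the frame length is used as a simple proxy for the
    block-processing delay (hop size and synthesis are ignored).
    """
    return 1000.0 * n_points / fs_hz


FS = 11025.0  # sampling frequency named in the text
for n in (256, 512):
    print(f"N = {n}: {frame_duration_ms(n, FS):.1f} ms")
```

A 512-point frame alone spans about 46.4 ms, twice the roughly 23.2 ms of a 256-point frame; for comparison, [0070] quotes a 39 ms delay threshold for vehicle cabins.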

[0011] Therefore, despite recent developments and improvements, effective echo compensation and noise reduction in speech signal processing still prove to be a major challenge. It is therefore the problem underlying the present invention to overcome the above-mentioned drawbacks and to provide a system and a method for audio signal processing, in particular suitable for hands-free telecommunication systems, that process audio signals, in particular speech signals, more efficiently in order to enhance their quality.

Description of the Invention



[0012] The above-mentioned problem is solved by the method for audio signal processing, in particular, speech signal processing according to claim 1, the method comprising the steps of
dividing a microphone signal (y(n)) into microphone sub-band signals ysb(n);
excising a predetermined number of the microphone sub-band signals ysb(n) for predetermined sub-bands;
processing the remaining microphone sub-band signals (ysb,g(n)), i.e. the microphone sub-band signals that are not excised in the preceding step, to obtain enhanced microphone sub-band signals (ŝsb,g(n)); and
reconstructing microphone sub-band signals for the predetermined sub-bands for which microphone sub-band signals were excised, wherein each of the excised microphone sub-band signals is reconstructed by means of enhanced microphone sub-band signals (ŝsb,g(n)) obtained by processing the remaining microphone sub-band signals (ysb,g(n)).

[0013] According to the disclosed method, only part of the microphone sub-band signals obtained by a filter bank comprising high-pass, band-pass and low-pass filters is processed for quality enhancement, e.g., noise reduction and echo compensation, which are expensive processing operations. For instance, half of the microphone sub-band signals may be excised, e.g., all microphone sub-band signals for sub-bands with an even or an odd index. Therefore, the overall computational load, i.e., memory demand as well as computational time, can be reduced compared with the art. Moreover, the disclosed method does not increase the cost of manufacture.
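A minimal sketch of the four claimed steps for the case in which every odd-indexed sub-band is excised. It assumes that the output of the analysis filter bank is already available as a complex array with one row per down-sampled time n and one column per sub-band µ (M even). The names `excise_odd` and `process_with_excision` are hypothetical, and the enhancement and reconstruction steps are passed in as callables because their exact form is left open at this point.

```python
import numpy as np


def excise_odd(Y: np.ndarray) -> np.ndarray:
    """Keep only the even-indexed sub-bands (columns 0, 2, 4, ...).

    Y has shape (frames, M): one row per down-sampled time n.
    """
    return Y[:, 0::2]


def process_with_excision(Y, enhance, reconstruct_odd):
    """Hypothetical skeleton of the claimed steps for odd-band excision.

    enhance:         callable applied to the retained (even) sub-bands only,
                     e.g. echo compensation; returns an array of equal shape.
    reconstruct_odd: callable mapping the enhanced even sub-bands to
                     estimates of the excised odd sub-bands, shape (frames, M//2).
    """
    frames, M = Y.shape
    S_even = enhance(excise_odd(Y))      # expensive processing on M/2 bands
    S_odd = reconstruct_odd(S_even)      # fill in the excised bands
    S = np.empty((frames, M), dtype=np.result_type(S_even, S_odd))
    S[:, 0::2] = S_even
    S[:, 1::2] = S_odd
    return S
```

With an identity `enhance` and a neighbour average as `reconstruct_odd`, a constant input passes through unchanged, while only M/2 columns ever reach the enhancement step.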

[0014] Besides excising microphone sub-band signals with alternating indices, microphone sub-band signals above or below a predetermined frequency threshold may be excised. In particular, microphone sub-band signals with alternating indices may be excised only in the frequency range above or below a predetermined frequency threshold. For instance, one might apply the above-described method (comprising excising a predetermined number of microphone sub-band signals) only in a high frequency range (e.g., above 1 kHz, 1.5 kHz or 2 kHz) and use conventional signal processing for all of the obtained microphone sub-band signals in the low frequency range. Thereby, various compromises between saving computational costs and achieving the best possible signal quality can be realized.
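To make the frequency-threshold variant concrete, the sketch below marks which bins would be excised when only odd-indexed sub-bands above a threshold are dropped. It assumes a real-valued microphone signal (so only bins 0 to M/2 are non-redundant) and bin centre frequencies µ·fs/M; the function name and the example values (M = 256, fs = 11025 Hz, 2 kHz threshold) are illustrative.

```python
import numpy as np


def excision_mask(M: int, fs_hz: float, f_threshold_hz: float) -> np.ndarray:
    """True for each non-redundant DFT bin (0..M/2) that is excised.

    Assumption: bin mu is centred at mu * fs / M; odd bins above the
    threshold are excised, all bins below it are processed as usual.
    """
    mu = np.arange(M // 2 + 1)
    centre_hz = mu * fs_hz / M
    return (mu % 2 == 1) & (centre_hz > f_threshold_hz)


mask = excision_mask(256, 11025.0, 2000.0)
print(int(mask.sum()), "of", mask.size, "bins excised")
```

For these values the odd bins 47 to 127 are excised, i.e. 41 of the 129 non-redundant bins, so the saving is smaller than with excision over the whole band in exchange for full processing below 2 kHz.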

[0015] It is expressly noted that the processing of the remaining microphone sub-band signals (ysb,g(n)) to obtain the enhanced microphone sub-band signals (ŝsb,g(n)) can comprise echo compensating of the remaining microphone sub-band signals (ysb,g(n)) and/or noise reduction of the remaining microphone sub-band signals (ysb,g(n)). In particular, the expensive process of echo compensation benefits from excising a predetermined number of microphone sub-band signals.

[0016] Echo compensated microphone sub-band signals can be advantageously further processed for noise reduction. Moreover, the microphone sub-band signals might be de-correlated by a time-invariant de-correlation filtering means (e.g., of the first or second order) or by an adaptive de-correlation means as known in the art in order to improve the convergence speed of the adaptation process of the filter coefficients of the employed echo compensation filtering means.

[0017] The remaining microphone sub-band signals (ysb,g(n)) can be echo compensated by steps comprising
dividing a reference signal x(n) into reference sub-band signals xsb(n);
excising a predetermined number of the reference sub-band signals xsb(n) that is equal to the predetermined number of excised microphone sub-band signals for the same predetermined sub-bands;
adapting filter coefficients of an echo compensating filtering means based on the remaining reference sub-band signals xsb,g(n) (and sub-band error signals); and
filtering the remaining microphone sub-band signals (ysb,g(n)) by means of the adapted filter coefficients in order to reduce echo contributions to the remaining microphone sub-band signals (ysb,g(n)), i.e. subtracting estimated echo contributions from the remaining microphone sub-band signals (ysb,g(n)).
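The echo compensation steps above can be sketched with a per-sub-band NLMS update (the adaptation algorithm named in [0006]) applied only to the retained sub-bands. The sketch assumes that the reference and microphone sub-band signals have already been produced by matching analysis filter banks and reduced to the same retained columns (e.g. `X_full[:, 0::2]` and `Y_full[:, 0::2]`); the function name, the step size `kappa` and the regularization `eps` are illustrative choices, not values from the text.

```python
import numpy as np


def subband_nlms(X, Y, n_taps, kappa=0.5, eps=1e-8):
    """Per-sub-band NLMS echo canceller on the retained sub-bands (sketch).

    X, Y: complex arrays of shape (frames, K) holding the retained reference
          and microphone sub-band signals (e.g. even bands only).
    Returns the error signals E (echo-compensated microphone sub-bands) and
    the final coefficients H, shape (K, n_taps).
    """
    frames, K = X.shape
    H = np.zeros((K, n_taps), dtype=complex)
    xbuf = np.zeros((K, n_taps), dtype=complex)  # x_mu(n), ..., x_mu(n-N+1)
    E = np.zeros_like(Y, dtype=complex)
    for n in range(frames):
        xbuf = np.roll(xbuf, 1, axis=1)           # shift the delay line
        xbuf[:, 0] = X[n]
        d_hat = np.sum(np.conj(H) * xbuf, axis=1)  # echo estimate h^H x per band
        E[n] = Y[n] - d_hat
        norm = np.sum(np.abs(xbuf) ** 2, axis=1) + eps
        # NLMS update: h <- h + kappa * x * conj(e) / ||x||^2
        H += (kappa / norm)[:, None] * xbuf * np.conj(E[n])[:, None]
    return E, H
```

Because the loop runs over K = M/2 retained bands instead of M, the cost of filtering and adaptation is roughly halved.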

[0018] The reference signal x(n) is a signal received by the near party and input to a loudspeaker of a loudspeaker-room system. In particular, the reference signal represents a verbal utterance of the remote party that is transmitted to the near party, e.g., by radio transmission.

[0019] An analysis filter bank similar to the one used for dividing the microphone signal (y(n)) into microphone sub-band signals ysb(n) might be used for obtaining the reference sub-band signals xsb(n). Both filter banks may comprise Hann or Hamming windows. When adapting the employed echo compensating means, it is highly efficient to consider only the reference sub-band signals for the sub-bands that are retained for the microphone sub-band signals. If, for example, microphone sub-band signals with even sub-band indices are retained (not excised), reference sub-band signals with the same even sub-band indices are retained for the adaptation of the echo compensating means.

[0020] The microphone sub-band signals (ysb(n)) and the reference sub-band signals (xsb(n)) may be down-sampled with respect to the microphone signal (y(n)) and the reference signal (x(n)), respectively, by the same down-sampling factor (r). If, e.g., a Hann window is used for the analysis filter banks, the length of the analysis filters has to be equal to the number of sub-bands M, and a down-sampling factor of r = M / 4 is suitable. Finally, the set of reconstructed microphone sub-band signals for the predetermined sub-bands for which microphone sub-band signals were excised and the enhanced microphone sub-band signals (ŝsb,g(n)) are synthesized by a synthesis filter bank similar to the analysis filter bank used to divide the microphone signal into the microphone sub-band signals.

[0021] For a typical DFT-based implementation of the analysis and synthesis filter banks, for example, the lengths of the analysis and synthesis filters are the same and equal to the number of sub-bands M. A down-sampling factor of r = M / 4, in principle, allows for perfect re-synthesis of the microphone sub-band signals.
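The window condition behind this statement can be checked numerically. The sketch assumes that both the analysis and the synthesis filter bank use a periodic Hann window of length M and sums the shifted window products in steady state. For r = M/4 the sum is the constant 1.5, so a fixed scale factor restores the signal; for r = 72 with M = 256 (a value also quoted in [0070]) the sum is not constant. The function name is hypothetical.

```python
import numpy as np


def ola_window_sum(M: int, r: int) -> np.ndarray:
    """Steady-state overlap-add of analysis*synthesis Hann windows at hop r.

    Entry p is the sum of the window product at positions p, p+r, p+2r, ...
    (one period of the overlap-add result).
    """
    k = np.arange(M)
    w = 0.5 - 0.5 * np.cos(2 * np.pi * k / M)   # periodic Hann, length M
    ww = w * w                                   # analysis times synthesis window
    total = np.zeros(r)
    for start in range(0, M, r):
        seg = ww[start:start + r]
        total[:len(seg)] += seg
    return total


for r in (64, 72):
    s = ola_window_sum(256, r)
    print(f"r = {r}: min {s.min():.4f}, max {s.max():.4f}")
```

The constant for r = M/4 follows from writing the squared Hann window as 0.375 - 0.5 cos θ + 0.125 cos 2θ: both cosine terms cancel over four shifts of a quarter period, leaving 4 · 0.375 = 1.5.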

[0022] By down-sampling the microphone sub-band signals with respect to the microphone signal detected by a microphone (sampling rate about 8 kHz, for example) the computational load is significantly reduced as known in the art.

[0023] According to a relatively simple embodiment of the inventive method, the microphone sub-band signals for the predetermined sub-bands for which microphone sub-band signals were excised are reconstructed, at least at one time n, by averaging the remaining microphone sub-band signals that are adjacent in time (n + k, n - k), where n is the discrete time index and k is an integer, k ≥ 1. Thus, the term "adjacent" covers not only the closest neighbours in time but also some finite number of further neighbours. For example, a reconstructed microphone sub-band signal at frequency bin j may be calculated by averaging the enhanced (e.g., echo compensated) retained microphone sub-band signals at frequency bins j + 1 and j - 1. The averaging may apply different weights (interpolation matrices) to the microphone sub-band signals at times n + 1, n and n - 1 (and to further adjacent values, if used).
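A sketch of this simple averaging embodiment for odd-band excision. It assumes that the enhanced retained (even) bands are stored column-wise, estimates each odd bin j from bins j - 1 and j + 1, and then mixes the neighbour means of times n - 1, n and n + 1 with scalar weights. Scalar weights are a simplification of the interpolation matrices mentioned above, the edge frames are handled by repeating the first and last frame, and the default weights are a hypothetical choice.

```python
import numpy as np


def reconstruct_odd_by_averaging(S_even, time_weights=(0.0, 1.0, 0.0)):
    """Estimate excised odd bands from enhanced even bands (simple variant).

    S_even: array (frames, M//2); column i holds band 2i.
    Odd band 2i+1 is the mean of bands 2i and 2i+2 (band M wraps to band 0,
    since DFT bins are periodic). The neighbour means of times n-1, n, n+1
    are then combined with time_weights = (w_prev, w_now, w_next).
    """
    nb_mean = 0.5 * (S_even + np.roll(S_even, -1, axis=1))
    w_prev, w_now, w_next = time_weights
    # Repeat the first and last frame so that every frame has both neighbours.
    padded = np.concatenate([nb_mean[:1], nb_mean, nb_mean[-1:]], axis=0)
    return w_prev * padded[:-2] + w_now * padded[1:-1] + w_next * padded[2:]
```

The default weights use time n only; a symmetric choice such as (0.25, 0.5, 0.25) additionally smooths across adjacent frames.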

[0024] Such averaging is relatively straightforward and may be performed only in the frequency ranges or for the time frames in which the inventive method has been applied. If a predetermined number of microphone sub-band signals is excised over the entire set of sub-bands (µ = 1, ..., M), averaging may also be performed over the entire range of sub-bands. On the other hand, it may be preferred to reconstruct only part of the excised microphone sub-band signals. In principle, reconstruction can be varied according to the actual application. As already mentioned above, one might prefer to apply the herein disclosed method only to relatively high-frequency sub-bands. In this case, averaging is only necessary to reconstruct the microphone sub-band signals that were excised in the high-frequency sub-band range.

[0025] According to a more elaborate embodiment, interpolation is performed based on both the retained microphone sub-band signals at the time n at which microphone sub-band signals are to be reconstructed and the retained microphone sub-band signals adjacent in time. In particular, the excised microphone sub-band signals at time n can be reconstructed by interpolating remaining microphone sub-band signals at time n and remaining microphone sub-band signals adjacent in time (one or more previous and subsequent signal vectors). Accurate reconstruction with tolerable artifacts can thereby be achieved. Details are given in the detailed description below.

[0026] In order to guarantee a significant reduction of the need for computational resources it may be preferred that the above-mentioned interpolation is performed by interpolation matrices which are approximated by their main diagonals and secondary diagonals, respectively (for further details, see again description below).

[0027] The present invention also provides a computer program product, comprising one or more computer readable media having computer-executable instructions for performing the steps of one of the above examples of the herein disclosed method for audio signal processing.

[0028] Furthermore, a signal processing means is provided, comprising
at least one microphone configured to obtain at least one microphone signal (y(n));
an analysis filter bank configured to divide the at least one microphone signal (y(n)) into microphone sub-band signals ysb(n);
a first filtering means configured to excise a predetermined number of the microphone sub-band signals ysb(n) for predetermined sub-bands;
a second filtering means configured to process the remaining microphone sub-band signals (ysb,g(n)), i.e. the microphone sub-band signals that are not excised in the preceding step, to obtain enhanced microphone sub-band signals (ŝsb,g(n));
processing means configured to reconstruct microphone sub-band signals for the predetermined sub-bands for which microphone sub-band signals were excised, wherein each of the excised microphone sub-band signals is reconstructed by means of enhanced microphone sub-band signals (ŝsb,g(n)) obtained by processing the remaining microphone sub-band signals (ysb,g(n)); and
a synthesis filter bank configured to synthesize the reconstructed and enhanced microphone sub-band signals (ŝsb (n)) obtained from the remaining microphone sub-band signals (ysb,g(n)) to obtain an enhanced microphone signal.

[0029] The set of reconstructed and enhanced microphone sub-band signals (ŝsb(n)) is obtained from, and includes, the enhanced microphone sub-band signals (ŝsb,g(n)) obtained from the remaining microphone sub-band signals (ysb,g(n)); see the detailed description below.

[0030] As described above, the microphone sub-band signals of particular sub-bands (odd or even ones) might be excised over the entire sub-band range or only in a particular range (e.g., a high-frequency range). The first filtering means can be part of the analysis filter bank.

[0031] In the signal processing means the second filtering means may, in particular, be an echo compensation filtering means; and the signal processing means may further comprise
an analysis filter bank configured to divide a reference signal into reference sub-band signals xsb(n);
a third filtering means (e.g., being integrated in the analysis filter bank) configured to excise a predetermined number of the reference sub-band signals xsb(n) that is equal to the predetermined number of excised microphone sub-band signals for the same predetermined sub-bands; and in addition
the echo compensation filtering means may be configured to be adapted based on the remaining reference sub-band signals xsb,g(n) (i.e. the filter coefficients of the adaptable echo compensation filtering means are adapted); and
to filter the remaining microphone sub-band signals (ysb,g(n)) by means of adapted filter coefficients.

[0032] The analysis filter banks (for dividing the microphone and reference signals into sub-band signals) are, in particular, configured to down-sample the sub-band signals by a factor r. The synthesis filter bank is, in particular, configured to up-sample the down-sampled reconstructed and enhanced microphone sub-band signals (ŝsb(n)) by the same factor r that was used for the down-sampling.

[0033] According to an example of the signal processing means, the processing means is configured to reconstruct microphone sub-band signals for the predetermined sub-bands for which microphone sub-band signals were excised (excised microphone sub-band signals) at a particular time (n) by interpolating remaining microphone sub-band signals at the particular time (n) and remaining microphone sub-band signals adjacent in time (directly adjacent, i.e., at n ± 1, and/or further apart, e.g., at n ± 2).

[0034] The herein disclosed signal processing means may further comprise a post-filtering means configured to filter the enhanced microphone sub-band signals (ysb,g(n)) in order to reduce background noise and/or residual echoes. The post-filtering means can be an adaptive noise reduction filter as known in the art, in particular, a Wiener filter.

[0035] The signal processing means according to any of the examples above is particularly useful for hands-free communication. Moreover, speech recognition can be improved by the herein disclosed speech signal processing. Thus, a hands-free telephone set or a speech recognition means comprising a signal processing means according to one of the examples described above is provided.

[0036] Additional features and advantages of the present invention will be described with reference to the drawings. In the description, reference is made to the accompanying figures that are meant to illustrate preferred embodiments of the invention. It is understood that such embodiments do not represent the full scope of the invention.

[0037] Figure 1 is a flow chart illustrating basic steps of the inventive method for signal processing comprising the steps of excising a number of microphone sub-band signals before processing for enhancing the quality of the microphone signal and subsequently reconstructing the excised microphone sub-band signals.

[0038] Figure 2 illustrates an example for a realization of the herein disclosed method for signal processing including an echo compensation filter and a post-filtering means for noise reduction.

[0039] In Figure 1 basic steps of the herein disclosed method for signal processing in the sub-band regime are shown. A speech signal representing an utterance by a local speaker is detected by a microphone that generates a microphone signal 1. The microphone signal is filtered by an analysis filter bank 2 comprising low-pass, band pass and high-pass filters in order to obtain microphone sub-band signals. These microphone sub-band signals are subsequently further processed for enhancing the quality, e.g., by echo compensation, dereverberation and/or noise reduction.

[0040] The speech signal might be detected by a microphone array to obtain a number of microphone signals that might be processed by beamforming. In this case, the signal processing described in the following can be applied to each of the microphone signals obtained by the microphones of the microphone array.

[0041] It is noted that the analysis filter bank can be any one known in the art, e.g., a Discrete Fourier Transformation (DFT), Discrete Cosine Transformation (DCT) or Fast Fourier Transformation (FFT) filter bank. However, according to the present invention not all M microphone sub-band signals are used for further processing, where M is, for example, the order of the DFT, DCT or FFT or, in general, the number of channels of the analysis filter bank. Thus, after the microphone signal has been filtered by the analysis filter bank 2, a predetermined number of microphone sub-band signals is excised 3. For instance, all microphone sub-band signals yµ with an odd index µ might be excised, and only the microphone sub-band signals yµ with an even index µ ∈ {0, 2, 4, ..., M - 2} may be retained (halving the memory demand) and processed, e.g., for echo compensation 4.

[0042] Then, the previously excised sub-band signals (e.g., the signals for sub-bands with an odd index, if the microphone sub-band signals for such indices were excised) are reconstructed 5 based on the echo compensated microphone sub-band signals. An enhanced microphone signal is eventually obtained by synthesizing 6 both the echo compensated sub-band signals and the reconstructed sub-band signals.

[0043] In the following, the reconstruction of the previously excised sub-band signals is described for the case that the signals for odd sub-bands are excised (halving the memory demand). As an example, a windowed DFT is used for sub-band filtering by the analysis filter bank. Further, reconstruction is based on one previous and one subsequent sub-band vector, for exemplary purposes only. From the microphone signal y(n), where n is the discrete time index, a vector of length M + 2r (r denotes the down-sampling factor of the sub-band signals) is extracted


where the upper index T denotes the transposition operation. Windowing is performed by


where the diagonal coefficients g0, ..., gM-1 are the coefficients of the 0th prototype filter (see also below), e.g., a Hann window, of the analysis filter bank that is given by



[0044] The analysis filter banks may operate in the frequency (Ω) domain and the ideal frequency response of a prototype low-pass filter may be given by



[0045] The other filters (sub-band index µ = 1, .., M-1) can be obtained by frequency shifting as known to the skilled person (for details see, e.g., E. Hänsler and G. Schmidt, "Acoustic Echo and Noise Control - A Practical Approach", John Wiley & Sons, New York, 2004).
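The frequency-shifting construction can be sketched as follows, assuming a periodic Hann prototype g0 of length M and the usual DFT modulation gµ(k) = g0(k) e^{j2πµk/M}. The check confirms that the magnitude response of each shifted filter peaks at its centre frequency 2πµ/M. Function names are hypothetical.

```python
import numpy as np


def modulated_filter_bank(M: int) -> np.ndarray:
    """Row mu holds g_mu(k) = g_0(k) * exp(j 2 pi mu k / M), k = 0..M-1."""
    k = np.arange(M)
    g0 = 0.5 - 0.5 * np.cos(2 * np.pi * k / M)        # periodic Hann prototype
    mu = np.arange(M)[:, None]
    return g0[None, :] * np.exp(2j * np.pi * mu * k[None, :] / M)


def peak_bins(G: np.ndarray, oversample: int = 8) -> np.ndarray:
    """Grid index of the maximum magnitude response of each filter.

    The response is evaluated on oversample * M frequency points, so filter
    mu should peak at index oversample * mu.
    """
    M = G.shape[1]
    H = np.fft.fft(G, n=oversample * M, axis=1)
    return np.argmax(np.abs(H), axis=1)
```

Only the prototype has to be designed; every other channel is a frequency-shifted copy of it.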

[0046] After padding the window matrix F with M × r zero matrices on the left-hand and right-hand sides (zero padding), F0 = [0M×r F 0M×r], a windowed signal portion of length M is obtained as F0 y(n).
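For reference, the analysis step can be written compactly as below. This is one consistent formulation under stated assumptions, not a reproduction of the equations missing from this text: the extracted vector is ordered newest sample first, F = diag(g0, ..., gM-1), W denotes the M-point DFT matrix (our symbol), and the sub-band vector at down-sampled time n is taken from the full-band samples around time nr.

```latex
\begin{aligned}
\mathbf{y}(n) &= \bigl[\,y(nr+r),\; y(nr+r-1),\; \ldots,\; y(nr-M-r+1)\,\bigr]^{T}
  \qquad \text{(length } M+2r\text{)}\\
\mathbf{F} &= \operatorname{diag}(g_0, \ldots, g_{M-1}), \qquad
\mathbf{F}_0 = \bigl[\,\mathbf{0}_{M\times r}\;\; \mathbf{F}\;\; \mathbf{0}_{M\times r}\,\bigr]\\
[\mathbf{W}]_{\mu,k} &= e^{-j 2\pi \mu k / M}, \qquad \mu, k = 0, \ldots, M-1\\
\mathbf{y}_{\mathrm{sb}}(n) &= \mathbf{W}\,\mathbf{F}_0\,\mathbf{y}(n)
\end{aligned}
```

With this ordering, F0 y(n) picks the windowed samples y(nr), ..., y(nr - M + 1), i.e. the middle M entries of y(n).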

[0047] After transformation, e.g., by a DFT, the actual sub-band vector (at time n) is obtained. The DFT can be formulated by the transformation matrix



[0048] The sub-band signal shall be down-sampled by the factor r and, thus, the down-sampled sub-band signal at time n is obtained by



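The matrix form can be cross-checked against an FFT. The sketch builds F0 and W explicitly for small M, extracts the length-(M + 2r) vector at down-sampled time n (newest sample first, an assumption about the ordering), and confirms that W F0 y(n) equals the FFT of the Hann-windowed frame ending at sample nr. Down-sampling by r simply means that this is evaluated only every r input samples. Names are hypothetical; in practice one would use the FFT rather than the explicit M × M matrix.

```python
import numpy as np


def window_matrices(M: int, r: int):
    """Return F0 = [0_{M x r} F 0_{M x r}], the DFT matrix W and the window g."""
    k = np.arange(M)
    g = 0.5 - 0.5 * np.cos(2 * np.pi * k / M)      # periodic Hann prototype
    Z = np.zeros((M, r))
    F0 = np.hstack([Z, np.diag(g), Z])
    W = np.exp(-2j * np.pi * np.outer(k, k) / M)    # same sign convention as np.fft.fft
    return F0, W, g


def subband_vector(y: np.ndarray, n: int, M: int, r: int) -> np.ndarray:
    """y_sb(n) = W F0 y(n) at down-sampled time n (full-band time t = n * r)."""
    t = n * r
    if t - M - r + 1 < 0 or t + r >= len(y):
        raise ValueError("frame n needs samples outside the signal")
    yvec = y[t + r - np.arange(M + 2 * r)]          # [y(t+r), ..., y(t-M-r+1)]
    F0, W, _ = window_matrices(M, r)
    return W @ (F0 @ yvec)
```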
[0049] By means of the respective window matrices for the previous (n-1) and subsequent (n+1) sub-band vectors

the signal vectors

are obtained. In order to extract odd sub-band vectors only the matrix

is defined to obtain sub-bands for odd sub-band indices



[0050] Similarly, extraction of sub-band signals with even indices results from

with the extraction matrix



[0051] Reconstruction of the odd sub-band vectors is achieved by interpolating even sub-band vectors. At time n, a reconstructed odd sub-band vector ŷsb,u(n) is calculated from the current (time n) even sub-band signal vector ysb,g(n) and the previous (time n - 1) and subsequent (time n + 1) even sub-band vectors

with the interpolation matrices C1, C0, C-1. In principle, averaging by more than two adjacent (in time) signal vectors can be performed, e.g.,



[0052] With the above expression for sub-band signals with even indices one gets for the interpolation with C1, C0, and C-1:



[0053] This expression can concisely be represented by

with Cges = [C1 C0 C-1] and the block diagonal matrix

and the total window matrix



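One way to write out the interpolation of [0051] and the concise form of [0053] is given below. It assumes that the length-(M + 2r) vector y(n) is ordered newest sample first, so that the frames at n + 1, n and n - 1 occupy its first, middle and last M entries; W is the M-point DFT matrix and Pg, Pu are the (M/2 × M) matrices selecting the even and odd rows. The symbols W, Pg, Pu and F±1 are our notation, and the patent's own equations, which are missing here, may differ in detail.

```latex
\begin{aligned}
\mathbf{F}_{1} &= \bigl[\,\mathbf{F}\;\; \mathbf{0}_{M\times 2r}\,\bigr], \quad
\mathbf{F}_{0} = \bigl[\,\mathbf{0}_{M\times r}\;\; \mathbf{F}\;\; \mathbf{0}_{M\times r}\,\bigr], \quad
\mathbf{F}_{-1} = \bigl[\,\mathbf{0}_{M\times 2r}\;\; \mathbf{F}\,\bigr]\\
\mathbf{y}_{\mathrm{sb},g}(n+k) &= \mathbf{P}_g\,\mathbf{W}\,\mathbf{F}_{k}\,\mathbf{y}(n), \quad k \in \{-1, 0, 1\}, \qquad
\mathbf{y}_{\mathrm{sb},u}(n) = \mathbf{P}_u\,\mathbf{W}\,\mathbf{F}_{0}\,\mathbf{y}(n)\\
\hat{\mathbf{y}}_{\mathrm{sb},u}(n) &= \mathbf{C}_{1}\,\mathbf{y}_{\mathrm{sb},g}(n+1)
  + \mathbf{C}_{0}\,\mathbf{y}_{\mathrm{sb},g}(n)
  + \mathbf{C}_{-1}\,\mathbf{y}_{\mathrm{sb},g}(n-1)
  = \mathbf{C}_{\mathrm{ges}}\,\mathbf{B}\,\mathbf{F}_{\mathrm{ges}}\,\mathbf{y}(n)\\
\mathbf{C}_{\mathrm{ges}} &= \bigl[\,\mathbf{C}_{1}\;\; \mathbf{C}_{0}\;\; \mathbf{C}_{-1}\,\bigr], \quad
\mathbf{B} = \operatorname{blockdiag}\bigl(\mathbf{P}_g\mathbf{W},\, \mathbf{P}_g\mathbf{W},\, \mathbf{P}_g\mathbf{W}\bigr), \quad
\mathbf{F}_{\mathrm{ges}} = \begin{bmatrix}\mathbf{F}_{1}\\ \mathbf{F}_{0}\\ \mathbf{F}_{-1}\end{bmatrix}\\
\mathbf{e}(n) &= \bigl(\mathbf{P}_u\,\mathbf{W}\,\mathbf{F}_{0} - \mathbf{C}_{\mathrm{ges}}\,\mathbf{B}\,\mathbf{F}_{\mathrm{ges}}\bigr)\,\mathbf{y}(n)
\end{aligned}
```

The last line is the difference vector whose norm is minimized in [0054]; since it must be small for every y(n), the design reduces to making the bracketed matrix small row by row.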
[0054] In order to find an optimal reconstruction for the previously excised sub-band vector, the L2-norm of the difference vector

has to be minimized. The minimization can be achieved, in a sufficiently good approximation, by determining Cges such that each row of the matrix

has a minimal L2-norm. This can be achieved by means of the Moore-Penrose pseudo-inverse

of the matrix

Thus, Cges can be expressed by

Under the assumption that

is invertible (where the upper index H denotes the Hermitian conjugate, i.e., the conjugate transpose or adjoint matrix), the Moore-Penrose pseudo-inverse can be calculated from



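A numerical sketch of this least-squares design for small M, assuming the frames at n + 1, n and n - 1 occupy the first, middle and last M entries of y(n) (newest sample first) and a periodic Hann window. The target T = Pu W F0 and A = B Fges are built explicitly, and Cges = T A⁺ is computed with NumPy's SVD-based `np.linalg.pinv`. The pseudo-inverse routine is used instead of the closed form because, for the periodic Hann window (whose first coefficient is zero), A is rank-deficient here, so neither A^H A nor A A^H is invertible; the cut-off `rcond=1e-10` discards the numerically zero singular value. The function name is hypothetical.

```python
import numpy as np


def design_interpolation(M: int, r: int):
    """Least-squares interpolation matrices C1, C0, C-1 for odd-band excision.

    Each row of C_ges = T A^+ minimises the L2 norm of the corresponding
    row of T - C_ges A, with T = P_u W F0 and A = B F_ges.
    """
    k = np.arange(M)
    F = np.diag(0.5 - 0.5 * np.cos(2 * np.pi * k / M))  # periodic Hann window
    W = np.exp(-2j * np.pi * np.outer(k, k) / M)         # M-point DFT matrix
    Z1, Z2 = np.zeros((M, r)), np.zeros((M, 2 * r))
    F_next = np.hstack([F, Z2])        # frame n+1 (newest samples)
    F_now = np.hstack([Z1, F, Z1])     # frame n
    F_prev = np.hstack([Z2, F])        # frame n-1
    P_g = np.eye(M)[0::2]              # select even sub-bands
    P_u = np.eye(M)[1::2]              # select odd sub-bands
    A = np.vstack([P_g @ W @ F_next, P_g @ W @ F_now, P_g @ W @ F_prev])
    T = P_u @ W @ F_now
    C_ges = T @ np.linalg.pinv(A, rcond=1e-10)
    K = M // 2
    C1, C0, Cm1 = C_ges[:, :K], C_ges[:, K:2 * K], C_ges[:, 2 * K:]
    rel_err = np.linalg.norm(T - C_ges @ A) / np.linalg.norm(T)
    return C1, C0, Cm1, rel_err
```

For M = 16 and r = 4 (i.e. r = M/4) the relative residual comes out at rounding-error level: with this window and hop, the even bands of frames n - 1, n and n + 1 determine every sample on which the odd bands of frame n depend. This holds for the unprocessed microphone signal; once the even bands have been modified by echo compensation, the interpolated odd bands are only an estimate.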
[0055] It should be noted, however, that a direct use of the interpolation matrices C-1 and C1 demands high computational capacity. Therefore, it is preferred to approximate these matrices C-1 and C1 by their respective main and secondary diagonals.
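The diagonal approximation can be sketched as a band mask. With odd-band excision, row i of an interpolation matrix produces odd band 2i + 1 and column i carries even band 2i, so the main diagonal and the first upper secondary diagonal hold the two nearest even neighbours in frequency. Keeping one secondary diagonal on each side is an illustrative choice; stored sparsely, it reduces each matrix-vector product from about (M/2)² to about 3 · M/2 multiplications. The function name is hypothetical.

```python
import numpy as np


def band_approximation(C: np.ndarray, n_secondary: int = 1) -> np.ndarray:
    """Keep the main diagonal and n_secondary diagonals on each side; zero the rest."""
    rows, cols = np.indices(C.shape)
    return np.where(np.abs(rows - cols) <= n_secondary, C, 0)
```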

[0056] If the output signals after echo compensation and/or other processing for noise reduction, dereverberation, etc., for the sub-bands that are not excised are denoted by ŝµ(n), where µ is the sub-band index, one obtains for all sub-bands (including the reconstruction of the previously excised sub-band vectors):

where Ck(n1, n2) denotes the element in the n1th row and the n2th column of the matrix Ck. This implies that even sub-bands are taken with a delay of one time increment (n-1).
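The one-frame delay follows from the look-ahead: the odd bands of frame n - 1 need the even bands of frame n. A minimal streaming sketch under that reading, assuming odd-band excision and given (M/2 × M/2) matrices C1, C0 and C-1 (full or band-approximated); the class name and its buffering scheme are hypothetical.

```python
from collections import deque

import numpy as np


class DelayedReconstructor:
    """Streams enhanced even sub-band vectors and emits full vectors one frame late.

    At step n it receives s_g(n) and returns the full sub-band vector of
    frame n-1, whose odd bands are C1 s_g(n) + C0 s_g(n-1) + C_-1 s_g(n-2).
    Frame 0 is never emitted because it has no predecessor.
    """

    def __init__(self, C1, C0, Cm1):
        self.C1, self.C0, self.Cm1 = C1, C0, Cm1
        self.buf = deque(maxlen=3)   # holds s_g(n-2), s_g(n-1), s_g(n)

    def push(self, s_even):
        self.buf.append(np.asarray(s_even))
        if len(self.buf) < 3:
            return None              # not enough look-ahead yet
        prev, now, nxt = self.buf    # frames n-2, n-1, n
        odd = self.C1 @ nxt + self.C0 @ now + self.Cm1 @ prev
        K = now.shape[0]
        full = np.empty(2 * K, dtype=np.result_type(now, odd))
        full[0::2] = now             # even bands taken with one frame delay
        full[1::2] = odd
        return full
```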

[0057] It should be noted that reconstruction of a previously excised sub-band signal can be based on more than one preceding and subsequent sub-band signal (n-1 and n+2). In particular, a different number of preceding and subsequent sub-band signals may be used for the interpolation (C00).

[0058] Synthesis of the enhanced microphone sub-band signals thus obtained (step 6 of Figure 1) is performed by means of a synthesis filter bank gµ,syn = [gµ,0,syn, ..., gµ,Nsyn-1,syn]T comprising, e.g., Hann windows, which also up-samples the previously down-sampled sub-band signals again (by the same factor r that was used for the down-sampling).
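A compact analysis/synthesis round trip matching this description, under assumptions: periodic Hann windows of length M for both banks, hop (down-sampling factor) r = M/4, conventional oldest-sample-first framing, and no processing between the banks. With these choices the overlapped window products sum to 1.5, which the synthesis divides out, so interior samples are reconstructed exactly. The function names are hypothetical.

```python
import numpy as np


def hann(M: int) -> np.ndarray:
    """Periodic Hann window of length M."""
    return 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(M) / M)


def analysis(x: np.ndarray, M: int, r: int) -> np.ndarray:
    """Hann-windowed M-point DFT every r samples; shape (frames, M)."""
    w = hann(M)
    starts = range(0, len(x) - M + 1, r)
    return np.array([np.fft.fft(w * x[s:s + M]) for s in starts])


def synthesis(X: np.ndarray, M: int, r: int, length: int) -> np.ndarray:
    """Inverse DFT, Hann synthesis window, overlap-add with hop r (up-sampling)."""
    if 4 * r != M:
        raise ValueError("this sketch assumes r = M / 4")
    w = hann(M)
    y = np.zeros(length)
    for i, frame in enumerate(X):
        y[i * r:i * r + M] += w * np.fft.ifft(frame).real
    return y / 1.5          # sum of overlapped Hann^2 windows for r = M/4
```

Samples within the first and last M - r positions are covered by fewer than four frames and are therefore not restored exactly.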

[0059] Figure 2 shows an example of a realization of the herein disclosed method for signal processing. Consider a telephone conversation between a remote party and a near party that makes use of a hands-free set comprising a loudspeaker and a microphone. A signal from the remote party x(n) (reference signal) is received on the near side. The communication room of the near speaker, e.g., a vehicular compartment, represents a loudspeaker-room-microphone (LRM) system characterized by an impulse response h(n). The microphone of the LRM system is intended to detect a speech signal s(n) of the near side speaker. However, the microphone also detects background noise b(n) and an echo contribution d(n) caused by the loudspeaker output. The microphone signal generated by the microphone is thus given by y(n) = s(n) + b(n) + d(n).

[0060] The signal processing for quality enhancement in the present invention is performed in the sub-band regime. Thus, the microphone signal y(n) is filtered by an analysis filter bank gµ,ana 11 (see above) in order to obtain microphone sub-band signals ysb(n). A particular number of sub-band signals is excised by a filtering means 12, e.g., every second sub-band signal, more particularly, the sub-band signals of the odd sub-bands. The resulting set of remaining sub-band signals ysb,g(n) of the even sub-bands is subjected to filtering for enhancing the signal quality. For example, noise reduction by means of a Wiener filter might be performed.

[0061] In the example shown in Figure 2, the microphone sub-band signals of the even sub-bands ysb,g(n) are filtered by an echo compensation filter

of length N (number of filter coefficients for each sub-band µ) for modeling the impulse response of the LRM system. For the echo compensation filter that estimates the impulse response, in principle, an infinite impulse response (IIR) filter or an adaptable finite impulse response (FIR) filter as known in the art may be used. For stability reasons an FIR filter is preferred. Given a typical sampling frequency of a speech signal, some 256/r to some 1000/r filter coefficients per sub-band are to be employed (with r denoting the down-sampling factor of the sub-band signals).
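The tap counts follow from the lower sampling rate of the sub-band signals: one sub-band coefficient spans r full-band samples, so a full-band echo path of roughly 256 to 1000 samples needs about 256/r to 1000/r coefficients per sub-band. A tiny helper (hypothetical name) evaluates this for the down-sampling factors quoted in [0070].

```python
def subband_taps(r: int, low: int = 256, high: int = 1000) -> tuple:
    """Approximate per-sub-band filter length range, low/r to high/r (rounded)."""
    return max(1, round(low / r)), max(1, round(high / r))


for r in (64, 72):
    print(r, subband_taps(r))
```

For r = 64 this gives roughly 4 to 16 coefficients per sub-band, for r = 72 roughly 4 to 14.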

[0062] In general, typical adaptation methods are iterative methods, e.g., in full band

the normalized least mean square (NLMS) algorithm

with the vector of the reference signal x(n) = [x(n), x(n-1),...,x(n-N +1)]T and the error signal e(n) representing the difference of the microphone signal and the output of the echo compensation filter 13



[0063] The corrector step is adjusted by means of the real number κ.
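A minimal full-band NLMS sketch in the standard form h(n+1) = h(n) + κ e(n) x(n)/‖x(n)‖², with e(n) = y(n) - hᵀ(n) x(n) and x(n) = [x(n), ..., x(n-N+1)]ᵀ as defined above. The small constant `eps` guarding against division by zero, the default κ and the function name are assumptions.

```python
import numpy as np


def nlms_fullband(x, y, N, kappa=0.5, eps=1e-8):
    """Full-band NLMS echo canceller (sketch).

    x: reference (loudspeaker) signal, y: microphone signal, N: filter length.
    Returns the error signal e(n) = y(n) - h^T(n) x(n) and the final coefficients.
    """
    h = np.zeros(N)
    xvec = np.zeros(N)               # [x(n), x(n-1), ..., x(n-N+1)]
    e = np.zeros(len(y))
    for n in range(len(y)):
        xvec = np.roll(xvec, 1)
        xvec[0] = x[n]
        e[n] = y[n] - h @ xvec       # microphone minus echo estimate
        h = h + kappa * e[n] * xvec / (xvec @ xvec + eps)
    return e, h
```

For a noise-free echo generated by a short FIR path and white excitation, the coefficients converge to that path; with speech excitation and background noise, convergence is slower and κ is typically controlled adaptively.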

[0064] Accordingly, in the sub-band regime the normalized least mean square (NLMS) algorithm reads

(where the asterisk denotes the complex conjugate and κsb(n) adjusts the corrector step) with the vector of the reference signal

and


where the upper index H denotes the Hermitian conjugate (conjugate transpose).
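For orientation, a standard way of writing the sub-band NLMS recursion that is consistent with the notation described above (complex conjugate *, Hermitian conjugate H, step size κsb(n)) is given below. It is not a reproduction of the missing formula, and the symbols ĥµ(n), xµ(n) and eµ(n) are our notation for sub-band µ.

```latex
\begin{aligned}
\mathbf{x}_{\mu}(n) &= \bigl[\,x_{\mu}(n),\; x_{\mu}(n-1),\; \ldots,\; x_{\mu}(n-N+1)\,\bigr]^{T}\\
e_{\mu}(n) &= y_{\mu}(n) - \hat{\mathbf{h}}_{\mu}^{H}(n)\,\mathbf{x}_{\mu}(n)\\
\hat{\mathbf{h}}_{\mu}(n+1) &= \hat{\mathbf{h}}_{\mu}(n) + \kappa_{\mathrm{sb}}(n)\,
  \frac{\mathbf{x}_{\mu}(n)\, e_{\mu}^{*}(n)}{\mathbf{x}_{\mu}^{H}(n)\,\mathbf{x}_{\mu}(n)}
\end{aligned}
```

Compared with the full-band form, the only changes are the complex-valued signals and the conjugation of the error in the update.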

[0065] Since the filtering means 12 outputs signals for the even sub-bands only, the echo compensation filter 13 only has to operate on these sub-bands. Therefore, the reference signal x(n) is input to an analysis filter bank 14 similar to the analysis filter bank gµ,ana 11 used for dividing the microphone signal y(n) into sub-bands. From the resulting reference sub-band signals xsb(n), the odd ones are excised as in the case of the microphone sub-band signals. Thus, the echo compensation filter 13 only receives the even reference sub-band signals that are output by the filtering means 15, which excises the other reference sub-band signals. Error signals esb,g(n), which represent the echo compensated microphone sub-band signals, are thus obtained for the even sub-bands only.

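[0066] It should be noted that both the microphone sub-band signals and the reference sub-band signals are down-sampled by a factor r. For example, the spectra of the down-sampled reference sub-band signals are given by

$$X_{\mu}(e^{j\Omega}) = \frac{1}{r} \sum_{m=0}^{r-1} G_{\mu,\mathrm{ana}}\!\left(e^{j(\Omega - 2\pi m)/r}\right) X\!\left(e^{j(\Omega - 2\pi m)/r}\right)$$

for each sub-band µ.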
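The spectrum relation in [0066] states that down-sampling by r overlays r shifted and stretched copies of the filtered spectrum. For finite-length signals an exact DFT analogue holds, W[k] = (1/r) Σ_{i=0}^{r-1} V[k + iL/r], which the following sketch checks numerically. Here v is assumed to stand for a single sub-band signal before decimation, its length L is assumed to be a multiple of r, and the helper names are hypothetical.

```python
import numpy as np

def downsample(v, r):
    """Keep every r-th sample: w(m) = v(m r)."""
    return v[::r]

def predicted_downsampled_dft(v, r):
    """Predict the DFT of downsample(v, r) from the DFT of v by summing
    the r aliased spectral copies V[k + i L/r], i = 0, ..., r-1."""
    L = len(v)
    if L % r:
        raise ValueError("length of v must be a multiple of r")
    V = np.fft.fft(v)
    # row i of the reshaped array holds V[i*L/r : (i+1)*L/r]
    return V.reshape(r, L // r).sum(axis=0) / r
```

Comparing np.fft.fft(downsample(v, r)) with predicted_downsampled_dft(v, r) for a random v confirms that both agree up to rounding errors.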

[0067] The higher the down-sampling factor r is chosen, the more the computational load is reduced. However, because the filter flanks have a finite slope, r = M is an upper limit for the down-sampling factor r (where M is the number of sub-bands, i.e., the number of channels of the analysis filter banks 11 and 14).
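The influence of r and of the excision on the adaptive filter cost can be illustrated with a rough count. The cost model below is a hypothetical simplification: it counts only filtering plus NLMS update (about two complex multiply-accumulates per coefficient) in the active sub-bands, assumes N = ceil(L_fb / r) coefficients per sub-band for a full-band filter length L_fb, and ignores the filter banks, the post-filter and the reconstruction step.

```python
import math

def subband_macs_per_sample(fullband_taps, num_subbands, r, excise_half=False):
    """Hypothetical cost model: complex multiply-accumulates per full-band
    input sample for filtering plus NLMS update in all active sub-bands."""
    taps_per_band = math.ceil(fullband_taps / r)  # N per sub-band
    active = num_subbands // 2 if excise_half else num_subbands
    # each active sub-band is processed once every r full-band samples,
    # costing roughly 2 * N operations (filter output + coefficient update)
    return active * 2 * taps_per_band / r

# M = 256 sub-bands, 1000 full-band coefficients, r = 64
print(subband_macs_per_sample(1000, 256, 64))                    # 128.0
print(subband_macs_per_sample(1000, 256, 64, excise_half=True))  # 64.0
```

Within this simplified model, excising every second sub-band halves the adaptive filter cost, and increasing r (for example from 64 to 72) lowers it further; the overall savings reported in [0070] refer to the complete method and are not derived from this sketch.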

[0068] In the example shown in Figure 2, the error signals esb,g(n) are further processed by a post-filtering means 16, e.g., a Wiener filter, to reduce noise and residual echoes caused by imperfect adaptation of the echo compensation filter 13. The filter characteristics of the Wiener filter are adapted on the basis of the estimated auto power spectral density of the error signals esb,g(n), i.e., of the echo-compensated microphone sub-band signals, and of the perturbation still present in them in the form of background noise and residual echoes (for details see, e.g., E. Hänsler and G. Schmidt, "Acoustic Echo and Noise Control - A Practical Approach", John Wiley & Sons, New York, 2004).
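A common way to realize such a post-filter is a per-sub-band Wiener-type gain of the form 1 - Φ_pp/Φ_ee, where Φ_ee is the estimated auto power spectral density of the error signal and Φ_pp that of the remaining perturbation. The sketch below is one such variant, not necessarily the one used in the described embodiment: the gain floor g_min, the first-order recursive smoothing with factor alpha and both function names are assumptions, and the perturbation estimate is taken as given.

```python
import numpy as np

def smooth_psd(prev_psd, e_frame, alpha=0.9):
    """First-order recursive estimate of the auto power spectral density
    of the error sub-band signals (alpha is an assumed smoothing factor)."""
    return alpha * prev_psd + (1.0 - alpha) * np.abs(e_frame) ** 2

def wiener_gain(err_psd, perturbation_psd, g_min=0.1):
    """Per-sub-band Wiener-type gain 1 - S_pp / S_ee, limited to [g_min, 1].
    g_min is an assumed floor that avoids complete attenuation."""
    gain = 1.0 - perturbation_psd / np.maximum(err_psd, 1e-12)
    return np.clip(gain, g_min, 1.0)
```

The enhanced sub-band signals then follow by multiplying each error sub-band signal with its gain; sub-bands dominated by noise or residual echo are attenuated down to the floor.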

[0069] The enhanced sub-band signals ŝsb,g(n) are passed from the post-filtering means 16 to a processing means 17, which reconstructs sub-band signals for the previously excised odd sub-bands. Reconstruction can be performed as described above with reference to the flow chart of Figure 1. The complete set of sub-band signals s̃sb(n) is input to a synthesis filter bank 18 corresponding to the analysis filter bank gµ,ana 11 used to divide the microphone signal y(n) into the microphone sub-band signals ysb(n). Finally, a full-band enhanced microphone signal s̃(n) is obtained.
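The reconstruction by the processing means 17 can be pictured as a weighted combination of enhanced even sub-band signals that are adjacent in frequency and in time (cf. claims 6 and 7). The sketch below is purely illustrative: the interpolation coefficients of the actual method are not given in this section, so the weight array, its layout and the function name reconstruct_odd are hypothetical placeholders. Restricting each odd sub-band to its two immediate neighbours keeps the operation banded, loosely in the spirit of the diagonal approximation mentioned in claim 8.

```python
import numpy as np

def reconstruct_odd(full, weights):
    """Fill the odd rows of an (M, T) sub-band array from even neighbours.

    weights[d, t] (hypothetical placeholders) multiplies the even neighbour
    at sub-band offset -1 (d = 0) or +1 (d = 1) and at time offset
    -1, 0, +1 (t = 0, 1, 2). Neighbours outside the array are skipped
    without renormalizing the remaining weights.
    """
    M, T = full.shape
    out = full.astype(complex)  # copy; even rows stay unchanged
    for mu in range(1, M, 2):
        for n in range(T):
            acc = 0.0 + 0.0j
            for d, nb in enumerate((mu - 1, mu + 1)):
                if nb >= M:
                    continue
                for t, dn in enumerate((-1, 0, 1)):
                    if 0 <= n + dn < T:
                        acc += weights[d, t] * full[nb, n + dn]
            out[mu, n] = acc
    return out
```

For example, weights of 0.5 at time offset 0 for both neighbours (all other weights zero) simply average the two adjacent even sub-bands at the same time index.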

[0070] For a typical application with M = 256 sub-bands and down-sampling factors of, for example, r = 64 or r = 72, the method of the present invention as described above can reduce both the computational time and the memory demand by about 50 % compared with standard DFT processing. The signal processing time (delay time) is only a few milliseconds longer than that of standard processing by means of polyphase filter banks and, in particular, remains below the 39 ms threshold specified by the GSM standards for vehicle cabins. Moreover, the adaptation velocity of the echo compensation filter 13 differs only slightly from that of methods known in the art.

[0071] The previously discussed embodiments are not intended as limitations but serve as examples illustrating features and advantages of the invention. It is to be understood that some or all of the features described above can also be combined in different ways.


Claims

1. Method for audio signal processing, comprising
dividing a microphone signal (y(n)) into microphone sub-band signals ysb(n);
excising a predetermined number of the microphone sub-band signals ysb(n) for predetermined sub-bands;
processing the remaining microphone sub-band signals (ysb,g(n)) to obtain enhanced microphone sub-band signals (ŝsb,g(n)); and
reconstructing microphone sub-band signals for the predetermined sub-bands for which microphone sub-band signals were excised, wherein each of the excised microphone sub-band signals is reconstructed by means of enhanced microphone sub-band signals (ŝsb,g(n)) obtained by processing the remaining microphone sub-band signals (ysb,g(n)).
 
2. Method according to claim 1, wherein the processing of the remaining microphone sub-band signals (ysb,g(n)) to obtain the enhanced microphone sub-band signals (ŝsb,g(n)) comprises echo compensation and/or noise reduction of the remaining microphone sub-band signals (ysb,g(n)).
 
3. Method according to claim 1 or 2, wherein the remaining microphone sub-band signals (ysb,g(n)) are echo compensated by steps comprising
dividing a reference signal (x(n)) into reference sub-band signals (xsb(n));
excising a predetermined number of the reference sub-band signals (xsb(n)) that is equal to the predetermined number of excised microphone sub-band signals for the same predetermined sub-bands;
adapting filter coefficients of an echo compensating filtering means based on the remaining reference sub-band signals (xsb,g(n)); and
filtering the remaining microphone sub-band signals (ysb,g(n)) by means of the adapted filter coefficients.
 
4. Method according to one of the preceding claims, wherein the predetermined number of sub-bands consists of sub-bands with an odd index or an even index and/or of sub-bands above or below some predetermined frequency.
 
5. Method according to one of the preceding claims, wherein the microphone sub-band signals (ysb(n)) and the reference sub-band signals (xsb,g(n)) are down-sampled with respect to the microphone signal (y(n)) and the reference signal (x(n)), respectively, by the same down-sampling factor (r).
 
6. Method according to one of the preceding claims, wherein at least at one time the microphone sub-band signals for the predetermined sub-bands for which microphone sub-band signals were excised are reconstructed for the predetermined sub-bands by averaging the remaining microphone sub-band signals that are adjacent in time.
 
7. Method according to one of the preceding claims, wherein microphone sub-band signals for the predetermined sub-bands for which microphone sub-band signals were excised are reconstructed at a particular time (n) by interpolation of remaining microphone sub-band signals at the particular time (n) and remaining microphone sub-band signals that are adjacent in time.
 
8. Method according to claim 7, wherein the interpolation is performed by interpolation matrices and wherein the interpolation matrices are approximated by their main diagonals and secondary diagonals, respectively.
 
9. Computer program product, comprising one or more computer readable media having computer-executable instructions for performing the steps of the method according to one of the claims 1 - 8.
 
10. Signal processing means, comprising
at least one microphone configured to obtain at least one microphone signal (y(n));
an analysis filter bank (11) configured to divide the at least one microphone signal (y(n)) into microphone sub-band signals ysb(n);
a first filtering means (12) configured to excise a predetermined number of the microphone sub-band signals ysb(n) for predetermined sub-bands;
a second filtering means (16) configured to process the remaining microphone sub-band signals (ysb,g(n)) to obtain enhanced microphone sub-band signals (ŝsb,g(n));
a processing means (17) configured to reconstruct microphone sub-band signals for the predetermined sub-bands for which microphone sub-band signals were excised by means of enhanced microphone sub-band signals (ŝsb,g(n)) obtained by processing the remaining microphone sub-band signals (ysb,g(n)); and
a synthesis filter bank (18) configured to synthesize the reconstructed and enhanced microphone sub-band signals (ŝsb (n)) obtained from the remaining microphone sub-band signals (ysb,g(n)) to obtain an enhanced microphone signal (s̃(n)).
 
11. The signal processing means according to claim 10, wherein the second filtering means (16) is an echo compensation filtering means; and further comprising
an analysis filter bank (14) configured to divide a reference signal into reference sub-band signals xsb(n);
a third filtering means (15) configured to excise a predetermined number of the reference sub-band signals xsb(n) that is equal to the predetermined number of excised microphone sub-band signals for the same predetermined sub-bands; and wherein
the echo compensation filtering means is configured to be adapted based on the remaining reference sub-band signals xsb,g(n) and to filter the remaining microphone sub-band signals (ysb,g(n)) by means of adapted filter coefficients.
 
12. The signal processing means according to claim 10 or 11, wherein the processing means (17) is configured to reconstruct microphone sub-band signals for the predetermined sub-bands for which microphone sub-band signals were excised at a particular time (n) by interpolation of remaining microphone sub-band signals at the particular time (n) and remaining microphone sub-band signals that are adjacent in time.
 
13. The signal processing means according to one of the claims 10 - 12, further comprising a post-filtering means (17) configured to filter the enhanced microphone sub-band signals (ŝsb,g(n)) in order to reduce background noise and/or residual echoes.
 
14. Hands-free telephone set or speech recognition means comprising a signal processing means according to one of the claims 10 - 13.
 

Cited references

REFERENCES CITED IN THE DESCRIPTION



This list of references cited by the applicant is for the reader's convenience only. It does not form part of the European patent document. Even though great care has been taken in compiling the references, errors or omissions cannot be excluded and the EPO disclaims all liability in this regard.

Non-patent literature cited in the description