Field of Invention
[0001] The present invention relates to audio signal processing and, in particular, to efficient
echo compensation of a speech signal in the sub-band regime.
Background of the Invention
[0002] Audio signal processing in large part concerns the enhancement of signals with respect
to noise and echoes. Speech signal processing often has to be performed in a noisy
background environment. A prominent example is hands-free voice communication in vehicles.
Hands-free telephones provide a comfortable and safe communication system of particular
use in motor vehicles. In the case of hands-free telephones it is mandatory to suppress
noise in order to guarantee the communication. The amplitudes and frequencies of the
noise signals are temporally variable due to, for example, the speed of the vehicle
and road noises.
[0003] Of particular importance is the suppression of signals of the remote subscriber which
are emitted by the loudspeakers and therefore received again by the microphone(s),
since otherwise unpleasant echoes can severely affect the quality and intelligibility
of voice conversation. In the worst case, the acoustic feedback can even lead to a
complete breakdown of communication, if the acoustic echoes are not significantly
attenuated or substantially removed.
[0004] Echo suppression is particularly difficult, if the speaker using a microphone for
communication with a remote communication party is moving as, e.g., a driver using
a hands-free set who steers a wheel while communicating with a remote party by the
hands-free telephone set. In this case, the impulse response of the loudspeaker-room-microphone
(LRM) system is time-variant. Usually residual echoes are still present in the processed
audio signals to be provided to a remote communication party. These residual echoes,
e.g., result in so-called echo blips in hands-free telephone systems thereby deteriorating
the microphone signal significantly, in particular, due to the huge delay of current
mobile phone connections.
[0006] In present echo compensation processing an adaptive filter is used to model the impulse
response of the LRM system to generate an estimate for the echo signal that can be
subtracted from the microphone signal. The adaptation of the echo compensation filtering
means is usually carried out by the normalized least mean square (NLMS) algorithm.
[0007] In order to contain the need for high-performance computational means to a reasonable
level, the signal processing is usually performed in a down-sampled sub-band regime
in which the computational complexity, in principle, can be reduced as compared to
the full-band processing.
[0008] The higher the down-sampling rate selected for the sub-band signals that are processed
for echo compensation, the more the computational costs can be reduced. However,
in the art the choice of an appropriate down-sampling factor is generally limited
by the known problem of aliasing. Hann windows and other filters show different
aliasing characteristics. Artifacts increase with increasing down-sampling rate and,
moreover, the echo damping rate is insufficient when the down-sampling rate exceeds
some threshold.
[0009] Moreover, the frequency response of a Hann window is characterized by a significant
overlap of sub-bands and, thus, adjacent pitch trajectories are sometimes hard to
separate, although this separation is crucial for speech enhancement. For example, noise in
frequency ranges adjacent to a frequency range that is dominated by a wanted signal
is not sufficiently damped. In order to reduce the overlap the order of the DFT might
be increased (e.g., from a standard of N = 256 to N = 512 nodes of the Fourier transform).
The corresponding increase of the frequency resolution results, however, in a decrease
in time resolution of the processed audio signal.
[0010] This may give rise to severe problems, since, e.g., the standards of the International
Telecommunication Union and the European Telecommunication Standards Institute have
to be met by any actual telephone equipment. For a sampling frequency of 11025 Hz,
N = 512 results in a time delay that is not tolerable according to the above mentioned
standards.
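The delay argument can be checked with a short calculation. The sketch below assumes, purely for illustration, that the delay is approximated by the window length N divided by the sampling frequency; the helper name frame_delay_ms is hypothetical, and the 39 ms figure is the GSM threshold quoted later in this description.

```python
# Rough frame-length delay of an N-point DFT window.
# Assumption: delay is approximated by N / fs only (no further processing delay).
def frame_delay_ms(n_points: int, fs_hz: float) -> float:
    """Return the duration of n_points samples at fs_hz, in milliseconds."""
    return 1000.0 * n_points / fs_hz


if __name__ == "__main__":
    for n_points in (256, 512):
        print(n_points, round(frame_delay_ms(n_points, 11025.0), 1), "ms")
```

Under this assumption N = 256 stays well below 39 ms at 11025 Hz, whereas N = 512 exceeds it.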
[0011] Therefore, despite the recent developments and improvements, effective echo compensation
and noise reduction in speech signal processing still proves to be a major challenge.
It is therefore the problem underlying the present invention to overcome the above-mentioned
drawbacks and to provide a system and a method for audio signal processing, in particular,
suitable for hands-free telecommunication systems, that show a more efficient processing
for enhancing the quality of an audio signal, in particular, a speech signal.
Description of the Invention
[0012] The above-mentioned problem is solved by the method for audio signal processing,
in particular, speech signal processing according to claim 1, the method comprising
the steps of
dividing a microphone signal (y(n)) into microphone sub-band signals ysb(n);
excising a predetermined number of the microphone sub-band signals ysb(n) for predetermined
sub-bands;
processing the remaining microphone sub-band signals (ysb,g(n)), i.e. the microphone sub-band
signals that are not excised in the preceding step, to obtain enhanced microphone sub-band
signals (ŝsb,g(n)); and
reconstructing microphone sub-band signals for the predetermined sub-bands for which
microphone sub-band signals were excised, wherein each of the excised microphone sub-band
signals is reconstructed by means of enhanced microphone sub-band signals (ŝsb,g(n))
obtained by processing the remaining microphone sub-band signals (ysb,g(n)).
[0013] According to the disclosed method only part of the microphone sub-band signals that
are obtained by a filter bank comprising high-pass, band pass and low-pass filters
are processed for quality enhancement, e.g., noise reduction and echo compensation
which are expensive processing operations. For instance, half of the microphone sub-band
signals may be excised, e.g., all microphone sub-band signals for sub-bands with an
even or an odd index. Therefore, the overall computational load, memory demand as
well as computational time, can be reduced as compared to the art. Moreover, the disclosed
method does not increase the cost of manufacture.
[0014] Besides excising microphone sub-band signals for alternating indices, microphone
sub-band signals above or below a predetermined frequency threshold may be excised.
In particular, microphone sub-band signals with alternating indices may be excised
for a frequency range of the sub-band signals above or below a predetermined frequency
threshold only. For instance, one might consider applying the above described method
(comprising excising a predetermined number of microphone sub-band signals) only for
a high frequency range (e.g., above some kHz, above 1 kHz, 1.5 kHz or 2 kHz) and using
conventional signal processing for all of the obtained microphone sub-band signals
in a low frequency range. Thereby, a variety of compromises between saving computational
costs and achieving the best signal quality possible can be achieved.
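The selection rule described above can be sketched as follows. This is a minimal illustration under stated assumptions: sub-band µ of an M-channel bank is taken to be centred at µ·fs/M, only the indices 0 to M/2 are considered, even indices are the ones retained, and the function name retained_subbands is hypothetical.

```python
def retained_subbands(num_subbands: int, fs_hz: float, threshold_hz: float) -> list:
    """Indices of sub-bands kept for processing.

    Below the threshold every sub-band is kept (conventional processing);
    at or above it only even indices are kept, the odd ones are excised.
    Assumption: sub-band mu is centred at mu * fs / M.
    """
    kept = []
    for mu in range(num_subbands // 2 + 1):
        centre_hz = mu * fs_hz / num_subbands
        if centre_hz < threshold_hz or mu % 2 == 0:
            kept.append(mu)
    return kept


if __name__ == "__main__":
    print(retained_subbands(16, 8000.0, 2000.0))
```

Raising the threshold keeps more sub-bands and favours signal quality; lowering it excises more sub-bands and favours computational savings.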
[0015] It is expressly noted that the processing of the remaining microphone sub-band signals
(ysb,g(n)) to obtain the enhanced microphone sub-band signals (ŝsb,g(n)) can comprise echo
compensating of the remaining microphone sub-band signals (ysb,g(n)) and/or noise reduction
of the remaining microphone sub-band signals (ysb,g(n)). In particular, the expensive process
of echo compensation benefits from excising a predetermined number of microphone sub-band
signals.
[0016] Echo compensated microphone sub-band signals can be advantageously further processed
for noise reduction. Moreover, the microphone sub-band signals might be de-correlated
by a time-invariant de-correlation filtering means (e.g., of the first or second order)
or by an adaptive de-correlation means as known in the art in order to improve the
convergence speed of the adaptation process of the filter coefficients of the employed
echo compensation filtering means.
[0017] The remaining microphone sub-band signals (ysb,g(n)) can be echo compensated by steps
comprising
dividing a reference signal x(n) into reference sub-band signals xsb(n);
excising a predetermined number of the reference sub-band signals xsb(n) that is equal to
the predetermined number of excised microphone sub-band signals for the same predetermined
sub-bands;
adapting filter coefficients of an echo compensating filtering means based on the
remaining reference sub-band signals xsb,g(n) (and sub-band error signals); and
filtering the remaining microphone sub-band signals (ysb,g(n)) by means of the adapted filter
coefficients for reducing echo contributions to the remaining microphone sub-band signals
(ysb,g(n)), i.e. subtracting estimated echo contributions from the remaining microphone
sub-band signals (ysb,g(n)).
[0018] The reference signal x(n) is a signal received by a near party and input to a loudspeaker
of a loudspeaker-room system. In particular, the reference signal represents a verbal
utterance of the remote party that is transmitted to the near party, e.g., by radio transmission.
[0019] An analysis filter bank for obtaining the reference sub-band signals xsb(n) might be
used that is similar to the one used for dividing the microphone signal (y(n)) into
microphone sub-band signals ysb(n). Both filter banks may comprise Hann or Hamming windows.
When adapting the employed echo compensating means it is highly efficient to only consider
reference sub-band signals for sub-bands that are retained for the microphone sub-band
signals. If, for example, microphone sub-band signals with even sub-band indices are
maintained (not excised), reference sub-band signals with the same even sub-band indices
are maintained for adaptation of the echo compensating means.
[0020] The microphone sub-band signals (ysb(n)) and the reference sub-band signals (xsb,g(n))
may be down-sampled with respect to the microphone signal (y(n)) and the reference
signal (x(n)), respectively, by the same down-sampling factor (r). If, e.g., a Hann
window is used for the analysis filter banks the length of the analysis filters has
to be equal to the number of sub-bands M and a down-sampling factor of r = M / 4 is
suitable. In fact, the set of reconstructed microphone sub-band signals for the predetermined
sub-bands for which microphone sub-band signals were excised and the enhanced microphone
sub-band signals (ŝsb,g(n)) are synthesized by a synthesis filter bank similar to the
analysis filter bank used to divide the microphone signal into the microphone sub-band signals.
[0021] For a typical processing of the analysis and the synthesis filter bank by Discrete
Fourier Transformation (DFT), for example, the lengths of the analysis and the synthesis
filter banks are the same and equal to the number of sub-bands M. A down-sampling
factor of r = M / 4, in principle, allows for perfect re-synthesis of the microphone
sub-band signals.
[0022] By down-sampling the microphone sub-band signals with respect to the microphone signal
detected by a microphone (sampling rate about 8 kHz, for example) the computational
load is significantly reduced as known in the art.
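As a worked example of the figures above, the following sketch computes the down-sampling factor r = M / 4 and the resulting sub-band frame rate. The helper name subband_rate and the choice M = 256 at 8 kHz are illustrative assumptions.

```python
def subband_rate(fs_hz: float, num_subbands: int, overlap_factor: int = 4):
    """Return (r, frame_rate_hz) for a down-sampling factor r = M / overlap_factor."""
    r = num_subbands // overlap_factor
    return r, fs_hz / r


if __name__ == "__main__":
    r, rate = subband_rate(8000.0, 256)
    print("r =", r, "sub-band frame rate =", rate, "Hz")
```

With these values each sub-band signal is updated 125 times per second instead of 8000 times, which is where the saving in adaptive filtering operations comes from.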
[0023] According to a relatively simple embodiment of the inventive method, at least at one
time (n) the microphone sub-band signals for the predetermined sub-bands for which
microphone sub-band signals were excised are reconstructed by averaging the remaining
microphone sub-band signals that are adjacent in time (n + k, n - k), where n is the
discrete time index and k is an integer with k ≥ 1. Thus, the term "adjacent" covers not
only the closest adjacent signals (in time) but also some finite number of neighbors.
For example, a reconstructed microphone sub-band signal at frequency bin j may be calculated
by averaging the enhanced (e.g., echo compensated) remaining microphone sub-band signals at
frequency bins j + 1 and j - 1. Averaging may include different weights (interpolation
matrices) for the microphone sub-band signals at times n + 1, n and n - 1 (and further
adjacent values, when used).
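The frequency-bin averaging mentioned in this paragraph can be sketched as follows. This is a minimal illustration, assuming that odd bins were excised, that equal weights of 0.5 are used, and that bin indices wrap around modulo M as in a DFT; the function name reconstruct_odd_bins is hypothetical.

```python
def reconstruct_odd_bins(even_values, num_subbands: int):
    """Rebuild a full set of M sub-band values at one time index.

    even_values[i] is the enhanced value of sub-band 2 * i. Each excised odd
    bin j is filled with the mean of its neighbours j - 1 and j + 1.
    Assumption: indices wrap modulo M, so bin M - 1 uses bins M - 2 and 0.
    """
    full = [0j] * num_subbands
    for i, value in enumerate(even_values):
        full[2 * i] = value
    for j in range(1, num_subbands, 2):
        left = full[j - 1]
        right = full[(j + 1) % num_subbands]
        full[j] = 0.5 * (left + right)
    return full


if __name__ == "__main__":
    print(reconstruct_odd_bins([1 + 0j, 3 + 0j, 5 + 0j, 7 + 0j], 8))
```

Unequal weights, or additional terms from times n - 1 and n + 1, would replace the fixed factor 0.5 in a more elaborate variant.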
[0024] Such averaging is relatively straightforward and may be performed only in frequency
regimes or for time frames where the inventive method has been applied. If a predetermined
number of microphone sub-band signals is excised over the whole set of sub-bands
(µ = 1, ..., M), averaging may also be performed over the entire range of sub-bands.
On the other hand, it may be preferred to reconstruct a part of the excised microphone
sub-band signals only. In principle, reconstruction can be variably performed according
to the actual application. As already mentioned above one might prefer to apply the
herein disclosed method only to relatively high-frequency sub-bands. In this case
averaging is only necessary in order to reconstruct microphone sub-band signals for
the predetermined sub-bands for which microphone sub-band signals were excised in
the high-frequency sub-band range.
[0025] According to a more elaborate embodiment, interpolation is performed based on both
adjacent and the maintained microphone sub-band signals at the time n at which microphone
sub-band signals shall be reconstructed. In particular, the excised microphone sub-band
signals at time n can be reconstructed by interpolation of remaining microphone sub-band
signals at the time n and remaining microphone sub-band signals adjacent in time (one
or more previous signal vectors and subsequent signal vectors). Accurate reconstruction
with tolerable artifacts can thereby be achieved. Details are given in the detailed
description below.
[0026] In order to guarantee a significant reduction of the need for computational resources
it may be preferred that the above-mentioned interpolation is performed by interpolation
matrices which are approximated by their main diagonals and secondary diagonals, respectively
(for further details, see again description below).
[0027] The present invention also provides a computer program product, comprising one or
more computer readable media having computer-executable instructions for performing
the steps of one of the above examples of the herein disclosed method for audio signal
processing.
[0028] Furthermore, a signal processing means is provided, comprising
at least one microphone configured to obtain at least one microphone signal (y(n));
an analysis filter bank configured to divide the at least one microphone signal (y(n))
into microphone sub-band signals ysb(n);
a first filtering means configured to excise a predetermined number of the microphone
sub-band signals ysb(n) for predetermined sub-bands;
a second filtering means configured to process the remaining microphone sub-band signals
(ysb,g(n)), i.e. the microphone sub-band signals that are not excised by the first filtering
means, to obtain enhanced microphone sub-band signals (ŝsb,g(n));
processing means configured to reconstruct microphone sub-band signals for the predetermined
sub-bands for which microphone sub-band signals were excised, wherein each of the
excised microphone sub-band signals is reconstructed by means of enhanced microphone
sub-band signals (ŝsb,g(n)) obtained by processing the remaining microphone sub-band
signals (ysb,g(n)); and
a synthesis filter bank configured to synthesize the reconstructed and enhanced microphone
sub-band signals (ŝsb(n)) obtained from the remaining microphone sub-band signals
(ysb,g(n)) to obtain an enhanced microphone signal.
[0029] The set of reconstructed and enhanced microphone sub-band signals (ŝsb(n)) is obtained
from (and includes) the enhanced microphone sub-band signals (ŝsb,g(n)) obtained from the
remaining microphone sub-band signals (ysb,g(n)); see detailed description below.
[0030] As described above particular sub-bands (microphone sub-band signals of particular
sub-bands) might be excised (odd or even ones) all over the entire sub-band range
or for a particular range (e.g., a high-frequency range) only. The first filtering
means can be part of the analysis filter bank.
[0031] In the signal processing means the second filtering means may, in particular, be
an echo compensation filtering means; and the signal processing means may further
comprise
an analysis filter bank configured to divide a reference signal into reference sub-band
signals xsb(n);
a third filtering means (e.g., being integrated in the analysis filter bank) configured
to excise a predetermined number of the reference sub-band signals xsb(n) that is equal to
the predetermined number of excised microphone sub-band signals for the same predetermined
sub-bands; and in addition
the echo compensation filtering means may be configured to be adapted based on the
remaining reference sub-band signals xsb,g(n) (i.e. the filter coefficients of the adaptable
echo compensation filtering means are adapted); and
to filter the remaining microphone sub-band signals (ysb,g(n)) by means of adapted filter
coefficients.
[0032] The analysis filter banks (for dividing the microphone and reference signals into
sub-band signals) are, in particular, configured to down-sample the sub-band signals
by a factor r. The synthesis filter bank is, in particular, configured to up-sample
the down-sampled reconstructed and enhanced microphone sub-band signals (ŝsb,g(n)) by the
same factor as the down-sampling factor r.
[0033] According to an example of the signal processing means, the processing means is configured
to reconstruct microphone sub-band signals for the predetermined sub-bands for which
microphone sub-band signals were excised (excised microphone sub-band signals) at
a particular time (n) by interpolation of remaining microphone sub-band signals at
the particular time (n) and remaining microphone sub-band signals adjacent in time
(directly adjacent or at some time n ± 1, and/or n ± 2, etc.).
[0034] Furthermore, the herein disclosed signal processing means may further comprise
a post-filtering means configured to filter the enhanced microphone sub-band signals
(ysb,g(n)) in order to reduce background noise and/or residual echoes. The post-filtering
means can be an adaptive noise reduction filter as known in the art, in particular,
a Wiener filter.
[0035] The signal processing means according to one of the examples above is particularly
useful for hands-free communication. Moreover, speech recognition can be improved
by the herein disclosed speech signal processing. Thus, a hands-free telephone set or a
speech recognition means is provided that comprises a signal processing means according
to one of the examples described above.
[0036] Additional features and advantages of the present invention will be described with
reference to the drawings. In the description, reference is made to the accompanying
figures that are meant to illustrate preferred embodiments of the invention. It is
understood that such embodiments do not represent the full scope of the invention.
[0037] Figure 1 is a flow chart illustrating basic steps of the inventive method for signal
processing comprising the steps of excising a number of microphone sub-band signals
before processing for enhancing the quality of the microphone signal and subsequently
reconstructing the excised microphone sub-band signals.
[0038] Figure 2 illustrates an example for a realization of the herein disclosed method
for signal processing including an echo compensation filter and a post-filtering means
for noise reduction.
[0039] In Figure 1 basic steps of the herein disclosed method for signal processing in the
sub-band regime are shown. A speech signal representing an utterance by a local speaker
is detected by a microphone that generates a microphone signal 1. The microphone signal
is filtered by an analysis filter bank 2 comprising low-pass, band pass and high-pass
filters in order to obtain microphone sub-band signals. These microphone sub-band
signals are subsequently further processed for enhancing the quality, e.g., by echo
compensation, dereverberation and/or noise reduction.
[0040] The speech signal might be detected by a microphone array to obtain a number of microphone
signals that might be processed by beamforming. In this case, the signal processing
described in the following can be applied to each of the microphone signals obtained
by the microphones of the microphone array.
[0041] It is noted that the analysis filter bank can be any one known in the art, e.g.,
a Discrete Fourier Transformation (DFT) or Discrete Cosine Transformation (DCT) filter
or a Fast Fourier Transformation (FFT) filter. However, according to the present invention
not all M microphone sub-band signals are used for the further processing, where M
is the order of a DFT, DCT or FFT, for example, or the channel number of the analysis
filter bank, in general. Thus, after the filtering of the microphone signal by the
analysis filter bank 2 a predetermined number of microphone sub-band signals is excised
3. For instance, all microphone sub-band signals yµ with an odd index might be excised and
only microphone sub-band signals yµ with an even index µ ∈ {0, 2, 4, ..., M - 2} may be
maintained (reducing the memory demand by half) and processed, e.g., for echo
compensation 4.
[0042] Then, the previously excised sub-band signals (signals for sub-bands with an odd index,
for instance, if microphone sub-band signals have been previously excised for such
indices) are reconstructed 5 based on the echo compensated microphone sub-band signals.
An enhanced microphone signal is eventually obtained by synthesizing 6 both the echo
compensated sub-band signals and the reconstructed sub-band signals.
[0043] In the following the process of the reconstruction of the previously excised sub-band
signals is described for the case that the signals for odd sub-bands are excised (resulting
in half of the memory demand). For example, a windowed DFT is used for sub-band filtering
by an analysis filter bank. Further, reconstruction is based on one previous and one
following sub-band vector for exemplary purposes only. From the vector of the microphone
signal y(n), where n is the discrete time index, a vector of some length M + 2r (r
denotes the factor of down-sampling of the sub-band signals) is extracted

where the upper index T denotes the transposition operation. Windowing is performed
by

where the diagonal coefficients $g_0, \ldots, g_{M-1}$ are the coefficients of the 0th
prototype filter (see also below), e.g., a Hann window, of the analysis filter bank
that is given by

[0044] The analysis filter banks may operate in the frequency (Ω) domain and the ideal frequency
response of a prototype low-pass filter may be given by

[0046] After supplementation of the window matrix F with M × r zeros (zero padding) on the
left-hand and right-hand sides, $\mathbf{F}_0 = [\mathbf{0}_{M \times r}\ \mathbf{F}\ \mathbf{0}_{M \times r}]$,
a windowed signal portion of the length M can be obtained by $\mathbf{F}_0\,\mathbf{y}(n)$.
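The effect of the zero-padded window matrix can be sketched in a few lines. The sketch assumes a periodic Hann window, which is one common definition and not necessarily the one used here, and the helper names hann and windowed_frame are hypothetical. Multiplying the segment of length M + 2r by the zero-padded window matrix is equivalent to weighting its central M samples with the window coefficients.

```python
import math


def hann(m: int) -> list:
    """Periodic Hann window of length m (assumed definition)."""
    return [0.5 * (1.0 - math.cos(2.0 * math.pi * k / m)) for k in range(m)]


def windowed_frame(segment, m: int, r: int) -> list:
    """Apply the zero-padded window to a segment of length m + 2 * r.

    The r leading and r trailing samples meet the zero blocks and drop out;
    the central m samples are weighted by the window coefficients.
    """
    if len(segment) != m + 2 * r:
        raise ValueError("segment must have length m + 2 * r")
    g = hann(m)
    return [g[k] * segment[r + k] for k in range(m)]


if __name__ == "__main__":
    print(windowed_frame([1.0] * 12, m=8, r=2))
```

Shifting the non-zero block by r columns to either side would select the samples belonging to the previous and subsequent frames instead of the current one.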
[0047] After transformation, e.g., by a DFT, the actual sub-band vector (at time n) is obtained.
The DFT can be formulated by the transformation matrix

[0048] The sub-band signal shall be down-sampled by the factor r and, thus, the down-sampled
sub-band signal at time n is obtained by

[0049] By means of the respective window matrices for the previous (n-1) and subsequent
(n+1) sub-band vectors

the signal vectors

are obtained. In order to extract odd sub-band vectors only the matrix

is defined to obtain sub-bands for odd sub-band indices

[0050] Similarly extraction of sub-band signals with even indices results from

with the extraction matrix

[0051] Reconstruction of the odd sub-band vectors is achieved by interpolation of even sub-band
vectors. At time n a reconstructed odd sub-band vector ŷsb,u(n) is calculated from the
current (time n) even sub-band signal vector ysb,g(n) and a previous (time n - 1) and a
subsequent (time n + 1) even sub-band vector

with the interpolation matrices C1, C0, C-1. In principle, averaging over more than two
adjacent (in time) signal vectors can be performed, e.g.,

[0052] With the above expression for sub-band signals with even indices one gets for the
interpolation with C1, C0, and C-1:

[0053] This expression can concisely be represented by

with Cges = [C1 C0 C-1] and the block diagonal matrix

and the total window matrix

[0054] In order to find an optimal reconstruction for the previously excised sub-band vector
the L2-norm of the difference vector

has to be minimized. The minimization can readily be achieved in a sufficiently good
approximation by determining Cges such that each row of the matrix

has a minimal L2-norm. This can be achieved by means of the Moore-Penrose pseudo-inverse

of the matrix

Thus, Cges can be expressed by

Under the assumption that

is invertible (where the upper index H denotes the Hermitian conjugate, i.e. the adjoint
matrix) the Moore-Penrose pseudo-inverse can be calculated from

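For orientation, the standard closed forms of the Moore-Penrose pseudo-inverse are recalled here. They are general linear-algebra identities stated for generic matrices A and B, which are symbols introduced only for this illustration, and they are not a reproduction of the specific expression used in this description.

```latex
\mathbf{A}^{+} = \mathbf{A}^{H}\left(\mathbf{A}\mathbf{A}^{H}\right)^{-1}
\quad \text{if } \mathbf{A}\mathbf{A}^{H} \text{ is invertible},
\qquad
\mathbf{A}^{+} = \left(\mathbf{A}^{H}\mathbf{A}\right)^{-1}\mathbf{A}^{H}
\quad \text{if } \mathbf{A}^{H}\mathbf{A} \text{ is invertible},

\mathbf{C} = \mathbf{B}\,\mathbf{A}^{+}
\quad \text{minimizes the } L_2\text{-norm of every row of } \mathbf{C}\mathbf{A} - \mathbf{B}.
```

Which of the two closed forms applies depends on whether the matrix in question has full row rank or full column rank.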
[0055] It should be noted, however, that a direct use of the interpolation matrices C-1 and C1
demands high computing capacity. Therefore, it is preferred to approximate these
matrices C-1 and C1 by their respective main and secondary diagonals.
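The saving from such an approximation can be illustrated with a small sketch. It assumes that "secondary diagonals" refers to the diagonals directly adjacent to the main diagonal, which is an interpretation and not confirmed by the text; the helper names keep_band and banded_matvec are hypothetical.

```python
def keep_band(matrix, half_width: int = 1):
    """Zero every element farther than half_width from the main diagonal."""
    return [
        [value if abs(i - j) <= half_width else 0 for j, value in enumerate(row)]
        for i, row in enumerate(matrix)
    ]


def banded_matvec(matrix, vec, half_width: int = 1):
    """Multiply using only the retained band: O(M) instead of O(M^2) products."""
    out = []
    for i, row in enumerate(matrix):
        lo = max(0, i - half_width)
        hi = min(len(vec) - 1, i + half_width)
        out.append(sum(row[j] * vec[j] for j in range(lo, hi + 1)))
    return out


if __name__ == "__main__":
    c = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    print(keep_band(c))
    print(banded_matvec(c, [1, 1, 1]))
```

For an M × M matrix the banded product needs at most 3M multiplications per vector, compared with M² for the full matrix.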
[0056] If the output signals after echo compensation and/or other processing for noise reduction,
dereverberation, etc., for the sub-bands that are not excised are denoted by ŝµ(n), where
µ is the sub-band index, one obtains for all sub-bands (including the
reconstruction of the previously excised sub-band vectors):

where Ck(n1, n2) denotes the element in the n1-th row and the n2-th column of the matrix
Ck. This implies that even sub-bands are taken with a delay of one time increment (n-1).
[0057] It should be noted that reconstruction of a previously excised sub-band signal can
be based on more than one preceding and subsequent sub-band signal (n-1 and n+2).
In particular, a different number of preceding and subsequent sub-band signals may
be used for the interpolation (C0 ≠ 0).
[0058] Synthesis of the thus obtained enhanced microphone sub-band signals in step
6 of Figure 1 is performed by means of a synthesis filter bank
$\mathbf{g}_{\mu,syn} = [g_{\mu,0,syn}, \ldots, g_{\mu,N_{syn}-1,syn}]^T$
comprising, e.g., Hann windows, which also up-samples the previously down-sampled
sub-band signals again (by the same factor r that was used for the down-sampling).
[0059] Figure 2 shows an example of a realization of the herein disclosed method for signal
processing. Consider a telephone conversation between a remote party and a near party
that makes use of a hands-free set comprising a loudspeaker and a microphone. A signal
from the remote party x(n) (reference signal) is received on the near side. The communication
room of the near speaker, e.g., a vehicular compartment, represents a loudspeaker-room-microphone
(LRM) system characterized by an impulse response h(n). The microphone of the LRM
system is intended to detect a speech signal s(n) of the near side speaker. However,
the microphone also detects background noise b(n) and an echo contribution d(n) caused
by the loudspeaker output. The microphone signal generated by the microphone is thus
given by y(n)= s(n) + b(n) + d(n).
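The signal model can be made concrete with a small simulation. This is a sketch under the assumption that the echo d(n) is the convolution of the reference signal x(n) with a finite impulse response h(n); the function names echo and microphone_signal are hypothetical.

```python
def echo(x, h):
    """Echo contribution d(n) = sum_k h(k) * x(n - k) for a finite response h."""
    return [
        sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
        for n in range(len(x))
    ]


def microphone_signal(s, b, x, h):
    """Microphone signal y(n) = s(n) + b(n) + d(n)."""
    d = echo(x, h)
    return [s[n] + b[n] + d[n] for n in range(len(x))]


if __name__ == "__main__":
    # Unit impulse on the loudspeaker, silent near-end speaker, no noise:
    # the microphone then records the impulse response itself.
    print(microphone_signal([0] * 4, [0] * 4, [1, 0, 0, 0], [0.5, 0.25]))
```

Echo compensation amounts to estimating h(n) from x(n) and y(n) and subtracting the resulting estimate of d(n).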
[0060] The signal processing for quality enhancement in the present invention is performed
in the sub-band regime. Thus, the microphone signal y(n) is filtered by an analysis
filter bank gµ,ana 11 (see above) in order to obtain microphone sub-band signals ysb(n).
A particular number of sub-band signals is excised by a filtering means 12, e.g.,
every second sub-band signal, more particularly, sub-band signals of odd sub-bands.
The resulting set of remaining sub-band signals of even sub-bands ysb,g(n) (even index
of the sub-bands) is subject to filtering for enhancing the signal quality.
For example, noise reduction by means of a Wiener filter might be performed.
[0061] In the example shown in Figure 2, the microphone sub-band signals of even sub-bands
ysb,g(n) are filtered by an echo compensation filter

of length N (number of filter coefficients for each sub-band µ) for modeling the
impulse response of the LRM system. For the echo compensation filter that shall estimate
the impulse response, in principle, an infinite impulse response filter (IIR) or an
adaptable finite impulse response filter (FIR) as known in the art may be used. For
stability reasons an FIR filter is preferred. Given a typical sampling frequency of
a speech signal, about 256/r to about 1000/r filter coefficients are to be employed (with
r denoting the factor of down-sampling of the sub-band signals).
[0062] In general, typical adaptation methods are iterative methods, i.e., in full band

the normalized least mean square (NLMS) algorithm

with the vector of the reference signal
$\mathbf{x}(n) = [x(n), x(n-1), \ldots, x(n-N+1)]^T$ and the error signal e(n) representing
the difference of the microphone signal and the output of the echo compensation filter 13

[0063] The corrector step is adjusted by means of the real number κ.
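A full-band NLMS adaptation can be sketched as follows. The sketch uses the textbook form of the algorithm, in which the coefficient vector is corrected by the step size times e(n) x(n) divided by the squared norm of x(n); whether this matches the exact expression used here is an assumption. The function name nlms, the parameter step (playing the role of κ) and the regularization constant eps are hypothetical.

```python
def nlms(x, y, num_taps: int, step: float = 0.5, eps: float = 1e-8):
    """Textbook NLMS: returns the estimated impulse response and the error signal."""
    h_hat = [0.0] * num_taps
    errors = []
    for n in range(len(x)):
        # Reference vector x(n) = [x(n), x(n-1), ..., x(n-N+1)], zero before start.
        x_vec = [x[n - k] if n - k >= 0 else 0.0 for k in range(num_taps)]
        d_hat = sum(hk * xk for hk, xk in zip(h_hat, x_vec))  # echo estimate
        e = y[n] - d_hat                                      # error signal e(n)
        norm = sum(xk * xk for xk in x_vec) + eps             # eps avoids division by zero
        h_hat = [hk + step * e * xk / norm for hk, xk in zip(h_hat, x_vec)]
        errors.append(e)
    return h_hat, errors


if __name__ == "__main__":
    import random
    rng = random.Random(0)
    x = [rng.uniform(-1.0, 1.0) for _ in range(500)]
    h = [0.5, -0.3, 0.1]
    y = [sum(h[k] * x[n - k] for k in range(3) if n - k >= 0) for n in range(500)]
    print([round(c, 3) for c in nlms(x, y, num_taps=3)[0]])
```

The error signal returned by the function is at the same time the echo compensated output.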
[0064] Accordingly, in the sub-band regime the normalized least mean square (NLMS) algorithm
reads

(where the asterisk denotes the complex conjugate and κsb(n) adjusts the corrector step)
with the vector of the reference signal

and

where the upper index H denotes the Hermitian adjoint (conjugate transpose).
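One adaptation step for a single complex-valued sub-band can be sketched as follows. The sketch assumes the common convention in which the echo estimate is the Hermitian inner product of the coefficient vector with the reference sub-band vector, and the correction is proportional to the reference vector times the complex conjugate of the error; the function name subband_nlms_step and the parameters step and eps are hypothetical.

```python
def subband_nlms_step(h_hat, x_vec, y_val, step: float = 0.5, eps: float = 1e-8):
    """One complex NLMS step for one sub-band.

    h_hat : current complex filter coefficients of this sub-band
    x_vec : current and past complex reference sub-band samples
    y_val : current complex microphone sub-band sample
    Returns the updated coefficients and the error (echo compensated sample).
    """
    d_hat = sum(hk.conjugate() * xk for hk, xk in zip(h_hat, x_vec))
    e = y_val - d_hat
    norm = sum(abs(xk) ** 2 for xk in x_vec) + eps
    new_h = [hk + step * xk * e.conjugate() / norm for hk, xk in zip(h_hat, x_vec)]
    return new_h, e


if __name__ == "__main__":
    h_hat, e = subband_nlms_step([0j, 0j], [1 + 1j, 0.5j], 0.3 - 0.2j)
    print(h_hat, e)
```

When odd sub-bands are excised, such a step only has to be executed for the even sub-band indices, which halves the number of adaptive filters.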
[0065] Since the filtering means 12 only outputs signals for even sub-bands, the echo compensation
filter 13 also has only to operate for such sub-bands. Therefore, the reference signal
x(n) is input in an analysis filter bank 14 similar to the analysis filter bank
gµ,ana 11 used for dividing the microphone signal y(n) into sub-bands. From the resulting
reference sub-band signals xsb(n) odd ones are excised as in the case of the microphone
sub-band signals. Thus, the echo compensation filter 13 only receives even reference
sub-band signals that are output by the filtering means 15 used to excise the other
reference sub-band signals. For even sub-bands only, error signals esb,g(n) are obtained
that represent echo compensated microphone sub-band signals.
[0066] It should be noted that both the microphone sub-band signals and the reference
sub-band signals are down-sampled by a factor r. For example, the spectra of the down-sampled
reference sub-band signals are given by

for each sub-band µ.
[0067] The higher the down-sampling factor r is chosen, the more the computational load is
reduced. However, due to the finite-slope filter flanks, r = M is an upper limit for the
down-sampling factor r (where M is the number of sub-bands, i.e. number of channels of the
analysis filter banks 11 and 14).
[0068] In the example shown in Figure 2, the error signals esb,g(n) are further processed for
noise reduction and reduction of residual echoes due to imperfect adaptation of the echo
compensation filter 13 by a post-filtering means 16, e.g., a Wiener filter. The filter
characteristics of the Wiener filter are adapted on the basis of the estimated auto power
density of the error signals esb,g(n) and the perturbation that is still present in the
error signals esb,g(n), i.e. the echo compensated microphone sub-band signals, in form of
background noise and residual echoes (for details see, e.g.,
E. Hänsler and G. Schmidt, "Acoustic Echo and Noise Control - A Practical Approach",
John Wiley & Sons, New York, 2004).
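A per-sub-band gain of the Wiener type can be sketched as follows. The gain rule (one minus the ratio of the estimated perturbation power to the estimated power of the error signal) and the lower limit of the gain are common choices that are assumed here for illustration; the function name wiener_gain and the default floor value are hypothetical.

```python
def wiener_gain(psd_error: float, psd_perturbation: float, floor: float = 0.1) -> float:
    """Real-valued gain for one sub-band and one frame.

    psd_error        : estimated power of the echo compensated sub-band signal
    psd_perturbation : estimated power of background noise plus residual echo
    floor            : assumed lower limit that avoids over-suppression
    """
    if psd_error <= 0.0:
        return floor
    return max(1.0 - psd_perturbation / psd_error, floor)


if __name__ == "__main__":
    print(wiener_gain(4.0, 1.0))  # speech dominates: little attenuation
    print(wiener_gain(1.0, 2.0))  # perturbation dominates: gain limited by the floor
```

The enhanced sub-band sample is then obtained by multiplying the error signal of that sub-band by the gain.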
[0069] The enhanced sub-band signals ŝsb,g(n) are transferred from the post-filtering means 16
to a processing means 17 for reconstructing sub-band signals for the previously excised
odd sub-bands. Reconstruction can be done as described above with reference to the flow
chart of Figure 1. The complete set of sub-band signals s̃sb(n) is input in a synthesis
filter bank 18 corresponding to the analysis filter bank gµ,ana 11 used for the division
of the microphone signal y(n) into the microphone sub-band signals ysb(n). Eventually, a
full-band enhanced microphone signal s̃(n) is obtained.
[0070] For a typical application including, for example, M = 256 sub-bands and down-sampling
rates of r = 64 and r = 72, the method of the present invention as described above can
reduce both the computational time and the memory demand by about 50 % as compared to
standard DFT processing. The time for signal processing (delay time) is only a few
milliseconds above the time delay of standard processing by means of polyphase filter
banks and, in particular, below the threshold of 39 ms according to the GSM standards for
vehicular cabins. Moreover, the adaptation velocity of the echo compensation filter 13
differs only slightly from that of methods known in the art.
[0071] All previously discussed embodiments are not intended as limitations but serve as
examples illustrating features and advantages of the invention. It is to be understood
that some or all of the above described features can also be combined in different
ways.
Claims
1. Method for audio signal processing, comprising
dividing a microphone signal (y(n)) into microphone sub-band signals (ysb(n));
excising a predetermined number of the microphone sub-band signals (ysb(n)) for predetermined sub-bands;
processing the remaining microphone sub-band signals (ysb,g(n)) to obtain enhanced microphone sub-band signals (ŝsb,g(n)); and
reconstructing microphone sub-band signals for the predetermined sub-bands for which
microphone sub-band signals were excised, wherein each of the excised microphone sub-band
signals is reconstructed by means of enhanced microphone sub-band signals (ŝsb,g(n)) obtained by processing the remaining microphone sub-band signals (ysb,g(n)).
2. Method according to claim 1, wherein the processing of the remaining microphone sub-band
signals (ysb,g(n)) to obtain the enhanced microphone sub-band signals (ŝsb,g(n)) comprises echo compensating and/or noise reduction of the remaining microphone
sub-band signals (ysb,g(n)).
3. Method according to claim 1 or 2, wherein the remaining microphone sub-band signals
(ysb,g(n)) are echo compensated by steps comprising
dividing a reference signal (x(n)) into reference sub-band signals (xsb(n));
excising a predetermined number of the reference sub-band signals (xsb(n)) that is equal to the predetermined number of excised microphone sub-band signals
for the same predetermined sub-bands;
adapting filter coefficients of an echo compensating filtering means based on the
remaining reference sub-band signals (xsb,g(n)); and
filtering the remaining microphone sub-band signals (ysb,g(n)) by means of the adapted filter coefficients.
4. Method according to one of the preceding claims, wherein the predetermined number
of sub-bands consists of sub-bands with an odd index or an even index and/or of sub-bands
above or below some predetermined frequency.
5. Method according to one of the preceding claims, wherein the microphone sub-band signals
(ysb(n)) and the reference sub-band signals (xsb,g(n)) are down-sampled with respect to the microphone signal (y(n)) and the reference
signal (x(n)), respectively, by the same down-sampling factor (r).
6. Method according to one of the preceding claims, wherein at least at one time the
microphone sub-band signals for the predetermined sub-bands for which microphone sub-band
signals were excised are reconstructed for the predetermined sub-bands by averaging
the remaining microphone sub-band signals that are adjacent in time.
7. Method according to one of the preceding claims, wherein microphone sub-band signals
for the predetermined sub-bands for which microphone sub-band signals were excised
are reconstructed at a particular time (n) by interpolation of remaining microphone
sub-band signals at the particular time (n) and remaining microphone sub-band signals
that are adjacent in time.
8. Method according to claim 7, wherein the interpolation is performed by interpolation
matrices and wherein the interpolation matrices are approximated by their main diagonals
and secondary diagonals, respectively.
9. Computer program product, comprising one or more computer readable media having computer-executable
instructions for performing the steps of the method according to one of the claims
1 - 8.
10. Signal processing means, comprising
at least one microphone configured to obtain at least one microphone signal (y(n));
an analysis filter bank (11) configured to divide the at least one microphone signal
(y(n)) into microphone sub-band signals (ysb(n));
a first filtering means (12) configured to excise a predetermined number of the microphone
sub-band signals (ysb(n)) for predetermined sub-bands;
a second filtering means (16) configured to process the remaining microphone sub-band
signals (ysb,g(n)) to obtain enhanced microphone sub-band signals (ŝsb,g(n));
a processing means (17) configured to reconstruct microphone sub-band signals for
the predetermined sub-bands for which microphone sub-band signals were excised by
means of enhanced microphone sub-band signals (ŝsb,g(n)) obtained by processing the remaining microphone sub-band signals (ysb,g(n)); and
a synthesis filter bank (18) configured to synthesize the reconstructed and enhanced
microphone sub-band signals (ŝsb(n)) obtained from the remaining microphone sub-band signals (ysb,g(n)) to obtain an enhanced microphone signal (s̃(n)).
11. The signal processing means according to claim 10, wherein the second filtering means
(16) is an echo compensation filtering means; and further comprising
an analysis filter bank (14) configured to divide a reference signal into reference sub-band
signals (xsb(n));
a third filtering means (15) configured to excise a predetermined number of the reference
sub-band signals (xsb(n)) that is equal to the predetermined number of excised microphone sub-band signals
for the same predetermined sub-bands; and wherein
the echo compensation filtering means is configured to be adapted based on the remaining
reference sub-band signals (xsb,g(n)) and to filter the remaining microphone sub-band signals (ysb,g(n)) by means of adapted filter coefficients.
12. The signal processing means according to claim 10 or 11, wherein the processing means
(17) is configured to reconstruct microphone sub-band signals for the predetermined
sub-bands for which microphone sub-band signals were excised at a particular time
(n) by interpolation of remaining microphone sub-band signals at the particular time
(n) and remaining microphone sub-band signals that are adjacent in time.
13. The signal processing means according to one of the claims 10 - 12, further comprising
a post-filtering means (16) configured to filter the enhanced microphone sub-band
signals (ŝsb,g(n)) in order to reduce background noise and/or residual echoes.
14. Hands-free telephone set or speech recognition means comprising a signal processing
means according to one of the claims 10 - 13.