[0001] The invention relates to a signal processing method and a signal processing device.
It further relates to a training method and a training device.
[0002] In signal processing, noise reduction is commonly important. It is particularly
important for speech enhancement, when a speech signal comprising a certain amount of
noise is processed. For example, when a mobile phone is operated within a car via a
hands-free speaking system, the background noise from the car may add a substantial
amount of noise to the speech signal and thereby decrease its quality. A common approach
to speech enhancement by way of noise reduction is the Wiener filter. Wiener filters are
characterized by the assumption that the signal and the additive noise are stochastic
processes with known spectral characteristics or known autocorrelation and
cross-correlation. They are further characterized by performance criteria such as the
minimum mean-square error, and an optimal filter of this kind may be determined from a
solution based on scalar methods. The goal of the Wiener filter is to filter out, by
statistical means, noise that has corrupted a signal.
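As an illustration of the Wiener filter idea described above, the following minimal sketch computes the classical per-frequency Wiener gain, i.e. the ratio of signal power to the sum of signal and noise power. The function name and the assumption that both powers are known exactly are illustrative only and are not taken from this document.

```python
def wiener_gain(signal_power: float, noise_power: float) -> float:
    """Classical Wiener gain S / (S + N) for one frequency bin.

    Both powers are assumed to be known, which is the idealised
    setting of the Wiener filter; in practice they must be estimated.
    """
    if signal_power < 0 or noise_power < 0:
        raise ValueError("powers must be non-negative")
    total = signal_power + noise_power
    # With no energy at all there is nothing to pass through.
    return signal_power / total if total > 0 else 0.0


if __name__ == "__main__":
    # A bin with four times more signal power than noise power.
    print(wiener_gain(4.0, 1.0))
```

The gain approaches 1 where the signal dominates and 0 where the noise dominates, which is the behaviour a weighting rule for noise reduction is expected to show.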
[0003] Environmental noise degrades both speech quality and intelligibility for voice calls
from mobile phones. Methods for speech enhancement aim at reducing the noise to
a reasonable level while keeping the speech signal as undistorted as possible.
[0004] One approach to achieving this has been to apply a weighting rule to the noisy
speech spectral amplitudes for estimating the clean speech component. The derivation
of the weighting rule may be formulated as an optimization problem using criteria such
as the minimum mean-square error of spectral amplitudes, of log-spectral amplitudes or
of perceptually motivated variants of these. Such approaches have been disclosed in:
- [1] P. Scalart and J.V. Filho, "Speech Enhancement Based on A Priori Signal to Noise Estimation,"
in Proc. of ICASSP'96, Atlanta, GA, May 1996, pp. 629-632.
- [2] Y. Ephraim and D. Malah, "Speech Enhancement Using a Minimum Mean-Square Error Short-Time
Spectral Amplitude Estimator," IEEE Transactions on Acoustics, Speech and Signal Processing,
vol. 32, no. 6, pp. 1109-1121, Dec. 1984.
- [3] Y. Ephraim and D. Malah, "Speech Enhancement Using a Minimum Mean-Square Error Log-Spectral
Amplitude Estimator," IEEE Transactions on Acoustics, Speech and Signal Processing,
vol. 33, no. 2, pp. 443-445, Apr. 1985.
- [4] P.C. Loizou, "Speech Enhancement Based on Perceptually Motivated Bayesian Estimators
of the Magnitude Spectrum," IEEE Transactions on Speech and Audio Processing, vol.
13, no. 5, pp. 857-869, Sept. 2005.
[0005] A further approach is to model the spectra of clean speech and noise using probability
density functions (PDF). The probability density functions of the real and imaginary
parts of the clean speech spectrum may be modelled as Gaussian, as disclosed in
[2, 3], but it has more recently been shown that a Gamma PDF [insert paper 5!] or a
super-Gaussian PDF [insert paper 6] leads to better results.
[0007] It is an object of the invention to create a signal processing method and a signal
processing device which require only a feasible amount of memory space. According to a
further aspect of the invention it is an object to provide a training method and a
training device designed to provide means for enabling a signal enhancement with a
feasible amount of memory space.
[0008] The object is achieved by the features of the independent claims.
[0009] According to a first aspect of the invention a signal processing method and a corresponding
signal processing device are provided. The signal processing method comprises the
step of acquiring an audio signal. It further comprises periodically digitizing
the audio signal resulting in frames of the digitized audio signal. A noisy audio
signal spectrum is determined for each frame of the digitized audio signal. Quantized
a priori and a posteriori signal to noise ratios are determined depending on the noisy
audio signal spectrum for the provided discrete frequencies of each frame. For the
provided discrete frequencies given associated Perceptual scale gain values are determined
dependent on the quantized a priori and a posteriori signal to noise ratios. The given
Perceptual scale gain values may be provided on a Bark scale for respective Bark scale
subbands. The Bark scale is a psychoacoustical scale. It ranges from 1 to 24
and corresponds to the first 24 critical bands of hearing. The successive band edges,
in hertz, are 0, 100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270, 1480, 1720, 2000,
2320, 2700, 3150, 3700, 4400, 5300, 6400, 7700, 9500, 12000 and 15500. The Perceptual
scale gain values may however also be provided on a Mel scale or some other type of
perceptual scale. In general, the given Perceptual scale gain values are provided on a
Perceptual scale for respective Perceptual scale subbands. The respective spectral values of
the noisy audio signal spectrum of the respective frame are multiplied with the determined
respective Perceptual scale gain values resulting in estimated wanted spectrum values.
An estimated digitized wanted signal is determined dependent on the estimated wanted
spectrum values.
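The mapping from a discrete frequency to a Bark scale subband can be sketched as follows, using the band edges listed above. The function names, the FFT length and the half-open interval convention (a frequency equal to an edge belongs to the higher subband) are assumptions made for illustration.

```python
from bisect import bisect_right

# Band edges in hertz as listed in the description (25 edges, 24 subbands).
BARK_EDGES_HZ = [0, 100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270, 1480,
                 1720, 2000, 2320, 2700, 3150, 3700, 4400, 5300, 6400, 7700,
                 9500, 12000, 15500]


def bark_subband(freq_hz):
    """Return the subband index m (1..24), or None outside the scale."""
    if freq_hz < BARK_EDGES_HZ[0] or freq_hz >= BARK_EDGES_HZ[-1]:
        return None
    # Number of edges <= freq_hz equals the 1-based subband index.
    return bisect_right(BARK_EDGES_HZ, freq_hz)


def bin_to_subband(k, n_fft, fs_hz):
    """Subband of DFT bin k for an n_fft-point transform at rate fs_hz."""
    return bark_subband(k * fs_hz / n_fft)


if __name__ == "__main__":
    # Hypothetical setup: 256-point DFT at 8 kHz sampling frequency.
    print([bin_to_subband(k, 256, 8000) for k in (0, 4, 32, 128)])
```

With such a mapping, one gain value per subband can be applied to every discrete frequency that falls within that subband.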
[0010] Depending on the sampling frequency of the digitized audio signal, not all the
subbands of the Perceptual scale may even be used. Even if all subbands of the Perceptual
scale are used, their number is limited to a maximum of 24 in the case of the Bark
scale, so that the number of Perceptual scale gain values is much lower than if gain
values directly associated to each discrete frequency were employed. Therefore
the memory space needed for storing the Perceptual scale gain values is fairly low.
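The saving can be made concrete with a small count of table entries. The figures used here are assumptions for illustration: 36 quantization levels per signal to noise ratio (a range of -15 dB to 20 dB in 1 dB steps), two values of the wanted signal activity detector, 101 discrete frequencies for a one-sided spectrum of a 200-sample frame, and 24 subbands.

```python
def table_entries(n_channels, n_snr_levels=36, n_vad_states=2):
    """Number of stored gain values for a full lookup table.

    One gain is stored per channel (discrete frequency or subband),
    per quantized a priori SNR, per quantized a posteriori SNR and
    per activity-detector state. All defaults are illustrative.
    """
    return n_channels * n_snr_levels * n_snr_levels * n_vad_states


if __name__ == "__main__":
    per_frequency = table_entries(101)  # one gain table per DFT bin
    per_subband = table_entries(24)     # one gain table per subband
    print(per_frequency, per_subband)
```

Under these assumptions the per-subband table holds roughly a quarter of the entries of the per-frequency table.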
[0011] According to a preferred embodiment of the first aspect of the invention the associated
Perceptual scale gain values are determined from an approximating function associated to
the respective quantized a posteriori signal to noise ratio. The approximating function
is dependent on the respective quantized a priori signal to noise ratio. In this way the
amount of memory space needed to store the data for performing the signal processing
may be reduced considerably further.
[0012] According to a further preferred embodiment the approximating function is a polynomial
function. This uses the insight that a polynomial function is typically well-suited
for approximating the Perceptual scale gain values associated to a respective
a posteriori signal to noise ratio. It is particularly advantageous if the approximating
function is a combination of a polynomial function and a saturation level. This uses the
insight that, typically from a given point of the quantized a priori signal to noise
ratio onwards, the Perceptual scale gain values reach a saturation level and may
therefore simply be approximated by the saturation level.
[0013] According to a further preferred embodiment the polynomial function has an order
of between 4 and 12. In this range a reasonable trade-off between performance and
storage requirements is obtained.
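Evaluating such an approximating function can be sketched as follows. The function and parameter names are hypothetical, and clamping below the lower end of the fitted range is an assumption; the description only states that a polynomial is used within a range and a saturation level beyond it.

```python
def approx_gain(xi_db, coeffs, a_db, b_db, h_sat):
    """Evaluate a polynomial-plus-saturation gain approximation.

    xi_db  : quantized a priori SNR in dB
    coeffs : polynomial coefficients, highest order first
    a_db   : lower end of the fitted range (first range parameter)
    b_db   : upper end of the fitted range (second range parameter)
    h_sat  : saturation level used at and above b_db
    """
    if xi_db >= b_db:
        return h_sat
    x = max(xi_db, a_db)  # assumption: hold the value at a_db below the range
    result = 0.0
    for c in coeffs:      # Horner's scheme
        result = result * x + c
    return result


if __name__ == "__main__":
    # Hypothetical first-order example: gain = 0.01 * xi + 0.5 up to 20 dB.
    print(approx_gain(10.0, [0.01, 0.5], -15.0, 20.0, 0.9))
```

Only the coefficients, the two range parameters and the saturation level need to be stored per curve, instead of one gain value per quantization level.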
[0014] According to a further preferred embodiment the estimated digitized wanted signal
is a digitized speech signal and the estimated wanted spectrum is an estimated speech
spectrum. This enables the enhancement of a speech signal.
[0015] According to a further preferred embodiment the quantization of the quantized a priori
signal to noise ratio and/or the quantized a posteriori signal to noise ratio is
on a logarithmic scale. This further limits the necessary memory space.
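A quantization on a logarithmic scale may be sketched as follows. The step size of 1 dB and the range of -15 dB to 20 dB are the preferred values mentioned elsewhere in the description; the function name and the handling of non-positive input are assumptions.

```python
import math


def quantize_snr_db(snr_linear, step_db=1.0, min_db=-15.0, max_db=20.0):
    """Quantize a linear SNR onto a dB grid and clip it to [min_db, max_db]."""
    if snr_linear <= 0:
        # log10 is undefined here; map to the lowest level (an assumption).
        return min_db
    snr_db = 10.0 * math.log10(snr_linear)
    quantized = round(snr_db / step_db) * step_db
    return min(max(quantized, min_db), max_db)


if __name__ == "__main__":
    print([quantize_snr_db(v) for v in (0.0, 1.0, 2.0, 10.0, 1e6)])
```

Because the grid is uniform in dB, a small number of levels covers a wide dynamic range of the linear signal to noise ratio.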
[0016] According to a further preferred embodiment the Perceptual scale gain values are
determined depending on a wanted signal activity detector. This further
enhances the noise reduction and the overall signal quality.
[0017] According to a second aspect of the invention a training method and a corresponding
training device are provided. The method comprises the steps of providing frames
of a digitized audio signal, providing frames of a digitized wanted signal and
providing frames of a digitized noise. Preferably the digitized audio signal, the
digitized wanted signal and the digitized noise are recorded in an environment where
the signal processing method is to be conducted.
[0018] A noisy audio signal spectrum is determined for each frame of the digitized audio
signal. A wanted signal spectrum is determined for each frame of the digitized wanted
signal. A noise spectrum is determined for each frame of the digitized noise. Quantized
a priori and a posteriori signal to noise ratios are determined depending on the noisy
audio signal spectrum for the provided discrete frequencies of each frame and depending
on the wanted signal spectrum for the provided discrete frequencies of each frame.
Gain values for the provided discrete frequencies are determined depending on the
noise spectra and wanted signal spectra associated to the respective discrete frequencies.
The quantized a priori and a posteriori signal to noise ratios of respective discrete
frequencies are associated to the respective gain values for the provided discrete
frequencies. Perceptual scale gain values are determined associated to the quantized
a priori and a posteriori signal to noise ratios of respective discrete frequencies
depending on the respective gain values being associated to the respective discrete
frequencies falling within the respective Perceptual scale subband. They are associated
to the quantized a priori and a posteriori signal to noise ratios of respective discrete
frequencies.
[0019] In this respect advantage is also taken of the relatively low number of Perceptual
scale subbands for determining the Perceptual scale gain values and in that way greatly
reducing the memory space needed for storing the Perceptual scale gain values without
having to accept a subjective loss in the enhancement of the signal when using the
Perceptual scale gain values for the signal processing.
[0020] According to a preferred embodiment of the second aspect of the invention parameters
of an approximating function are determined by curve fitting of Perceptual scale gain
values associated to a respective quantized a posteriori signal to noise ratio. In
this way the memory space can further be reduced. In this respect it is particularly
advantageous if the approximating function is a polynomial function. According to
a further preferred embodiment the approximating function is a polynomial function
and a saturation level. In this respect it is particularly advantageous if the polynomial
function has an order between 4 and 12.
[0021] According to a further preferred embodiment the quantization of the quantized a priori
signal to noise ratio and the quantized a posteriori signal to noise ratio are on
a logarithmic scale.
[0022] According to a further preferred embodiment the estimated digitized wanted signal
is a digitized speech signal and the estimated wanted spectrum is an estimated speech
spectrum. It is in particular advantageous if the Perceptual scale gain values are
determined depending on a wanted signal activity detector. It is also in particular
advantageous if the parameters of the approximating function are determined depending
on a wanted signal activity detector.
[0023] According to a further aspect of the invention a computer program product is provided
comprising a computer readable medium embodying program instructions executable by
a computer in order to conduct the signal processing method according to the first aspect
of the invention.
[0024] According to a further aspect of the invention a computer program product is provided
comprising a computer readable medium embodying program instructions executable by
a computer in order to conduct the training method according to the second aspect of
the invention.
[0025] Exemplary embodiments of the invention are explained in the following with the aid
of schematic drawings. These are as follows:
- Figure 1: a block diagram of a signal processing device,
- Figure 2: a block diagram of a training device,
- Figure 3: a detailed block diagram of the training device,
- Figure 4: a further detailed block diagram of further parts of the training device,
- Figure 5: a further block diagram of further parts of the training device,
- Figure 6: a detailed block diagram of parts of the signal processing device,
- Figures 7A to 7D: diagrams of Perceptual scale gain values,
- Figure 8: a further Perceptual scale gain value diagram,
- Figure 9: a further Perceptual scale gain value diagram,
- Figures 10A and 10B: original gain values,
- Figures 10C and 10D: approximated gain values,
- Figures 10E and 10F: approximation errors,
- Figure 11: segmental SSDRs and
- Figure 12: segmental SSDRs in speech presence.
[0026] Elements of the same design or function that appear in different illustrations are
identified with the same reference characters.
[0027] Figure 1 shows a signal processing device. It comprises a block B1, which is operable
to sense an audio signal A1 y(t) and may be embodied as a microphone. Block B2 comprises
an analog/digital converter ADC, block B3 comprises single sample processing and
block B4 comprises an echo cancellation. The output of block B4 is then a digitized
audio signal A4 yl(n). The audio signal A1 y(t) is periodically digitized resulting in
frames l of the digitized audio signal A4 yl(n). Each frame l therefore comprises a set
of values of the digitized audio signal A4 yl(n). The reference character l for a frame
is also used as an index. The letter n is a placeholder for the index of the respective
value of the digitized audio signal A4 yl(n). The echo cancellation in block B4 may be
accomplished by a preprocessing filter suitable for echo cancellation.
[0028] A block B5 is operable to conduct noise reduction and is described in further detail
with the aid of Figure 6. Further blocks may follow, and a further block B6 comprises
an encoder which may encode the estimated digitized wanted signal A48 x̂l(n), for
example in order to send it via an antenna.
[0029] The signal processing device may be embodied in a cell phone. It may however, for
example, also be part of a hands-free speaking system or be embodied in another
mobile communication device. It may also be embodied in a non-mobile communication
device or in another device known to a person skilled in the art.
[0030] The signal processing device comprises a storage device for storing data and a program
code being run on a processor of the signal processing device during operation of
the signal processing device. The processor may preferably comprise a digital signal
processor (DSP).
[0031] Figure 2 shows a block diagram of a training device. It comprises a block B10
comprising a speech database with the wanted signal and a block B12 comprising a noise
database. The speech database may in general comprise the wanted signal, which is not
limited to being a speech signal. It is preferably a speech signal, but may also be of
a different kind, for example a music signal. The noise database preferably comprises
typical car noise. Such car noise signals may be taken from, for example, the NTT and
NTT-AT databases [[11] NTT-AT Speech Database, "Multi-Lingual Speech Database for
Telephonometry 1994," http://www.ntt-at.com/products_e/speech/index.html, 1994.;
[12] NTT-AT Noise Database, "Ambient Noise Database for Telephonometry 1996,"
http://www.ntt-at.com/products_e/noise-DB/index.html, 1996].
[0032] The speech database may, in the case of speech being the wanted signal, comprise
various utterances spoken by different speakers, in particular male and female.
[0033] Block B14 may comprise an analog/digital converter ADC and the functionality
of single sample processing and echo cancellation. In block B14 frames l of the digitized
wanted signal A34 xl(n) and frames l of a digitized noise A38 nl(n) are determined. Each
frame l preferably has the same given length, for example 200 samples. Preferably each
frame l of a digitized audio signal A4 yl(n) is obtained by summing respective frames l
of the digitized noise A38 nl(n) and the digitized wanted signal A34 xl(n). This may
also be accomplished in a block B16. The block B16 is designed to determine gain values
A26 GVAD(k) and is explained in further detail with the aid of Figure 3 below. A block
B18 is designed to determine Perceptual scale gain values A28.
[0034] A block B20 is provided for determining parameters of an approximating function by
curve fitting and is further described with the aid of Figure 5 below. The determined
parameters are then stored in a block B20, which may be part of a data storage device.
The parameters are then preferably stored in the respective storage device of the
signal processing device for conducting the noise reduction of block B5.
[0035] The respective frames l of the digitized noise A38 nl(n), the digitized wanted signal
A34 xl(n) and the digitized audio signal A4 yl(n) are all subjected to a discrete Fourier
transformation DFT in a block B24. The outputs of the block B24 are then the noise
spectra A40 Nl(k), the wanted signal spectra A36 Xl(k) and the noisy audio signal spectra
A6 Yl(k), each associated to the respective frame l. The letter k represents the
respective discrete frequency.
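The framing and transformation step can be sketched as follows, assuming NumPy is available. Non-overlapping frames without a window function are an assumption made for brevity; the description only gives the frame length of, for example, 200 samples.

```python
import numpy as np


def frame_spectra(signal, frame_len=200):
    """Split a signal into non-overlapping frames and return their spectra.

    Returns an array of shape (n_frames, frame_len // 2 + 1) holding the
    one-sided DFT of each frame; incomplete trailing samples are dropped.
    """
    n_frames = len(signal) // frame_len
    frames = np.asarray(signal[: n_frames * frame_len], dtype=float)
    frames = frames.reshape(n_frames, frame_len)
    return np.fft.rfft(frames, axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal(400)        # stand-in for the wanted signal
    n = 0.1 * rng.standard_normal(400)  # stand-in for the noise
    Y = frame_spectra(x + n)            # noisy audio signal spectra
    print(Y.shape)
```

Since the DFT is linear, the spectrum of the summed frames equals the sum of the wanted signal spectrum and the noise spectrum, which is what makes the per-frequency gains of the training stage well defined.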
[0036] Preferably the squared amplitudes of the noise spectra A40 Nl(k), the wanted signal
spectra A36 Xl(k) and the noisy audio signal spectra A6 Yl(k) are also determined by
taking the respective absolute values and squaring them. In a block B26 respective ideal
gains are then determined by aid of the formula shown in block B28. The letter k is
always a placeholder for the discrete frequency and, depending on the number of samples
associated to the respective frame, may take values from k = 0 up to K-1.
[0037] A block B30 is operable to conduct minimum statistics. The output of block B30
is a noise estimate A18 λ̂Nl(k). Preferably the minimum statistics are conducted by
searching for a minimum value of the respective values of the noisy audio signal spectra
A6 Yl(k) across the provided frames l, always at a given discrete frequency k. In
this way stationary and non-stationary noise may be estimated with relatively high
quality. A block B32 is operable to determine a quantized a priori signal to noise
ratio A14 ξ̃l(k) and a quantized a posteriori signal to noise ratio A16 γ̃l(k). An a
posteriori signal to noise ratio A12 γ̂l(k) is preferably determined by aid of the
formula shown in block B34. The quantized a posteriori signal to noise ratio A16 γ̃l(k)
is then obtained in a block B36 by quantizing the a posteriori signal to noise ratio
A12 γ̂l(k), preferably on a logarithmic scale with each discrete step preferably having
a distance A52 Δ, e.g. 1 dB.
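The following sketch illustrates the idea of a minimum search over frames at a fixed discrete frequency, followed by an a posteriori signal to noise ratio. It is a strong simplification: the sliding window length is hypothetical, no smoothing or bias compensation is applied, and the ratio of noisy power to noise estimate is the common textbook definition, assumed here because the formula of block B34 is not reproduced in the text.

```python
def min_noise_estimate(power_frames, window=3):
    """Per-bin minimum of the noisy power over the last `window` frames.

    power_frames: list of frames, each a list of |Y_l(k)|^2 values.
    Returns one noise estimate per frame and bin.
    """
    estimates = []
    for l in range(len(power_frames)):
        recent = power_frames[max(0, l - window + 1) : l + 1]
        # zip(*recent) groups the values of one bin k across frames.
        estimates.append([min(values) for values in zip(*recent)])
    return estimates


def a_posteriori_snr(power, noise):
    """Assumed definition: noisy power divided by the noise estimate."""
    return [p / n if n > 0 else float("inf") for p, n in zip(power, noise)]


if __name__ == "__main__":
    frames = [[4.0, 9.0], [1.0, 16.0], [2.0, 4.0], [8.0, 8.0]]
    noise = min_noise_estimate(frames)
    print(noise[-1], a_posteriori_snr(frames[-1], noise[-1]))
```

The minimum follows the noise floor between speech bursts, because the noisy power at a given frequency rarely drops below the noise power.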
[0038] An interim a priori signal to noise ratio A54 ζ̂l(k) is preferably obtained by the
formula shown in block B38. The letter w denotes a weighting factor, which may for
example have a value of 0.98. The term max denotes a maximum value function and ensures
that the interim a priori signal to noise ratio A54 ζ̂l(k) is not calculated with a
negative value of the a posteriori signal to noise ratio A12 γ̂l(k), which might occur
due to an error in the noise estimate A18 λ̂Nl(k).
[0039] An a priori signal to noise ratio A10 ξ̂l(k) is also determined in block B38,
preferably by aid of the shown formula, which comprises a maximum value function with a
limitation value A56 ζmin. The limitation value is set such that, if the interim a priori
signal to noise ratio A54 ζ̂l(k) were quantized on a logarithmic scale, it would have a
value of -15 dB.
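The formula of block B38 is not reproduced in the text. The sketch below therefore assumes the well-known decision-directed estimator of reference [2], which matches the ingredients named above: a weighting factor w, a maximum function guarding against negative values, and a lower limit corresponding to -15 dB. The function and argument names are hypothetical.

```python
def a_priori_snr(prev_clean_power, noise_power, gamma, w=0.98, xi_min_db=-15.0):
    """Decision-directed a priori SNR estimate (assumed form).

    prev_clean_power : |X_{l-1}(k)|^2 of the previous frame (or its estimate)
    noise_power      : noise estimate for the current frame, must be > 0
    gamma            : a posteriori SNR of the current frame (linear)
    """
    xi_min = 10.0 ** (xi_min_db / 10.0)
    # max(..., 0) keeps the instantaneous term non-negative even when the
    # noise estimate is too large and gamma falls below 1.
    interim = (w * prev_clean_power / noise_power
               + (1.0 - w) * max(gamma - 1.0, 0.0))
    # The limitation value keeps the result at or above -15 dB.
    return max(interim, xi_min)


if __name__ == "__main__":
    print(a_priori_snr(prev_clean_power=2.0, noise_power=1.0, gamma=3.0))
```

A weighting factor close to 1 makes the estimate follow the previous frame closely, which smooths the gain over time at the cost of a slower reaction to onsets.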
[0040] The a priori signal to noise ratio A10 ξ̂l(k) is then quantized in a block B40 in a
way corresponding to that of block B36, resulting in a quantized a priori signal to
noise ratio A14 ξ̃l(k).
[0041] In a block B42 a wanted signal activity detector VAD is determined. In the preferred
case of a speech signal being the wanted signal a speech absence probability is then
determined. For determining the speech absence probability, the quantized a priori
signal to noise ratio A14 ξ̃l(k) for the respective frame is preferably smoothed and is
then compared with a given threshold representative of wanted signal presence or
absence. The wanted signal activity detector VAD is preferably assigned a value of
either 1 or 0. Preferably a value of 1 represents the presence of the wanted signal and
a value of 0 represents the absence of the wanted signal.
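One possible reading of this detector is sketched below. Averaging the a priori signal to noise ratio over the discrete frequencies of a frame, first-order recursive smoothing across frames, and the values of the smoothing constant and threshold are all assumptions; the description only states that the ratio is smoothed and compared with a threshold.

```python
def wanted_signal_activity(xi_db_per_bin, prev_smoothed_db,
                           threshold_db=0.0, alpha=0.9):
    """Return (vad, smoothed) for one frame.

    xi_db_per_bin    : quantized a priori SNR in dB for each discrete frequency
    prev_smoothed_db : smoothed value carried over from the previous frame
    vad is 1 for wanted signal presence and 0 for absence.
    """
    frame_mean = sum(xi_db_per_bin) / len(xi_db_per_bin)
    smoothed = alpha * prev_smoothed_db + (1.0 - alpha) * frame_mean
    vad = 1 if smoothed > threshold_db else 0
    return vad, smoothed


if __name__ == "__main__":
    state = 0.0
    for frame in ([-15.0, -15.0], [10.0, 20.0], [18.0, 20.0]):
        vad, state = wanted_signal_activity(frame, state)
        print(vad, round(state, 3))
```

The returned smoothed value is fed back in for the next frame, so a single noisy frame does not flip the decision.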
[0042] In a block B44 the respective ideal gain A22, the quantized a posteriori signal to
noise ratio A16 γ̃l(k) and the quantized a priori signal to noise ratio A14 ξ̃l(k) of the
respective frame l for each discrete frequency k are then associated to each other and
preferably buffered in a buffer shown in block B46. Preferably there is a separate
buffer for each value of the wanted signal activity detector VAD. Respective triplets of
the ideal gain A22, the quantized a priori signal to noise ratio A14 ξ̃l(k) and the
quantized a posteriori signal to noise ratio A16 γ̃l(k) are determined in the blocks B24
to B46 for all the discrete frequencies k for all the frames l. A24 refers to the ideal
gain associated to wanted signal absence or presence, respectively.
[0043] In block B46 gain values A26 GVAD(k) associated to the respective quantized a priori
signal to noise ratios A14 ξ̃l(k) and the respective quantized a posteriori signal to
noise ratios A16 γ̃l(k) are then determined for each discrete frequency k and for each
value of the quantized a priori signal to noise ratio A14 ξ̃l(k) and the quantized a
posteriori signal to noise ratio A16 γ̃l(k). Preferably the quantized a priori signal to
noise ratios A14 ξ̃l(k) and the quantized a posteriori signal to noise ratios A16 γ̃l(k)
have a value range between -15 dB and 20 dB with a resolution of 1 dB. The gain values
A26 GVAD(k) are preferably determined by averaging all ideal gains A24 associated to the
respective wanted signal activity detector value, the respective discrete frequency k,
the respective associated quantized a priori signal to noise ratio A14 ξ̃l(k) and the
associated quantized a posteriori signal to noise ratio A16 γ̃l(k). In block B46 a
resulting value range of the gain value A26 GVAD(k) for one given discrete frequency and
one value of the wanted signal activity detector is shown. For all the other discrete
frequencies k, separated by the value of the wanted signal activity detector VAD,
respective gain values A26 GVAD(k) are determined in this way.
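The averaging of buffered ideal gains can be sketched as follows. The record layout (activity value, discrete frequency, quantized a priori and a posteriori ratios, ideal gain) and the function name are hypothetical.

```python
from collections import defaultdict


def average_gains(records):
    """Average ideal gains per (vad, k, xi_db, gamma_db) cell.

    records: iterable of (vad, k, xi_db, gamma_db, ideal_gain) tuples
             collected over all training frames.
    Returns a dict mapping each observed cell to its mean ideal gain.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for vad, k, xi_db, gamma_db, gain in records:
        key = (vad, k, xi_db, gamma_db)
        sums[key] += gain
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


if __name__ == "__main__":
    data = [(1, 5, 3, 4, 0.8), (1, 5, 3, 4, 0.6), (0, 5, 3, 4, 0.1)]
    print(average_gains(data))
```

Cells that never occur in the training data stay absent from the result; how such gaps are filled is not covered by this sketch.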
[0044] In block B18 Perceptual scale gain values A28 are determined. The Perceptual scale is
a psychoacoustical scale. It has up to 24 subbands m and corresponds to the first 24
critical bands of hearing. If fs represents the sampling frequency used to obtain the
digitized audio signal A4 yl(n), the digitized noise A38 nl(n) and the digitized wanted
signal A34 xl(n), it may for example be in the range of 8 kHz.
[0045] Depending on the sampling frequency fs only some of the subbands of the Perceptual
scale may be used. In the case of a sampling frequency of 8 kHz, for example, the first
nineteen subbands m of the Perceptual scale may be used. A "G" directly followed by
brackets with a placeholder for the respective discrete frequency k represents a matrix
of the gain values A26 GVAD(k) associated to the respective discrete frequencies k. The
Perceptual scale gain values A28 for the respective Perceptual scale subbands m are then
determined for all the associated quantized a priori signal to noise ratios A10 ξ̂l(k)
and a posteriori signal to noise ratios A12 γ̂l(k) of respective discrete frequencies k,
dependent on the respective gain values A26 GVAD(k) being associated to the respective
discrete frequencies k falling within the respective Perceptual scale subbands m and
being associated to the respective quantized a priori signal to noise ratios A14 ξ̃l(k)
and the respective a posteriori signal to noise ratios A16 γ̃l(k) of the respective
discrete frequency k. This is preferably achieved by respective averaging of the
respective gain values A26 GVAD(k).
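For one fixed combination of quantized signal to noise ratios and activity value, the averaging over the discrete frequencies of a subband can be sketched as follows. Both dictionaries and the function name are hypothetical.

```python
def subband_gains(bin_gains, bin_to_band):
    """Average per-frequency gains into per-subband gains.

    bin_gains   : {k: gain} for one (xi, gamma, VAD) combination
    bin_to_band : {k: m} mapping each discrete frequency to its subband
    Returns {m: mean gain of all discrete frequencies within subband m}.
    """
    grouped = {}
    for k, gain in bin_gains.items():
        grouped.setdefault(bin_to_band[k], []).append(gain)
    return {m: sum(values) / len(values) for m, values in grouped.items()}


if __name__ == "__main__":
    gains = {0: 0.2, 1: 0.4, 2: 0.9}
    mapping = {0: 1, 1: 1, 2: 2}
    print(subband_gains(gains, mapping))
```

Repeating this for every combination of quantized ratios and every activity value yields the subband gain matrix used by the subsequent curve fitting.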
[0046] A capital G followed by a raised 'Perceptual' with a placeholder behind them represents
the matrix of the Perceptual scale gain values A28 for the respective Perceptual scale
subband m.
[0047] A parameterisation of the Perceptual scale gain values A28 is accomplished in block
B20. In block B20 parameters of an approximating function are determined by curve
fitting of Perceptual scale gain values A28 associated to a respective quantized a
posteriori signal to noise ratio A16 γ̃l(k).
[0048] It is visible in Figures 7A to 7D that such a parameterisation may well be accomplished
by a polynomial function, preferably in a given range from a first range parameter
A44 aγ̂l(k) to a second range parameter A46 bγ̂l(k), and beyond that range by a
saturation level A32 Hsat.
[0049] The polynomial coefficients A30 Cγ̂l(k) are preferably determined in a way known to
the person skilled in the art for curve fitting, in particular by utilizing the
principle of minimizing the least mean square error. The saturation level A32 Hsat may
be determined by searching for the respective maximum of the respective Perceptual
scale gain values A28, which then also determines the second range parameter A46
bγ̂l(k). The curve fitting of the Perceptual scale gain values A28 is then conducted in
the range of values of the quantized a priori signal to noise ratio A14 ξ̃l(k) between
the first range parameter A44 aγ̂l(k) and the second range parameter A46 bγ̂l(k).
[0050] For the range of the quantized a posteriori signal to noise ratio A16 γ̃l(k) having
values of an effective a posteriori signal to noise ratio A58 of, for example, 22 and
higher, a given value may be associated to all the respective Perceptual scale gain
values A28. The same is true for a value of 1 and lower of the quantized a posteriori
signal to noise ratio A16 γ̃l(k).
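The curve fitting of one gain curve over the quantized a priori signal to noise ratio can be sketched with NumPy as follows. Taking the first grid point as the first range parameter and the first occurrence of the maximum as the second range parameter are assumptions, as are the function name and the default order.

```python
import numpy as np


def fit_gain_curve(xi_db, gains, order=4):
    """Least-squares polynomial fit up to the saturation point.

    xi_db : increasing grid of quantized a priori SNR values in dB
    gains : Perceptual scale gain values on that grid
    Returns (coeffs, a_db, b_db, h_sat), coefficients highest order first.
    """
    xi_db = np.asarray(xi_db, dtype=float)
    gains = np.asarray(gains, dtype=float)
    h_sat = float(gains.max())        # saturation level
    b_idx = int(np.argmax(gains))     # first index where the maximum occurs
    a_db, b_db = float(xi_db[0]), float(xi_db[b_idx])
    # Fit only the range [a_db, b_db]; beyond b_db the saturation level applies.
    coeffs = np.polyfit(xi_db[: b_idx + 1], gains[: b_idx + 1], order)
    return coeffs, a_db, b_db, h_sat


if __name__ == "__main__":
    xi = np.arange(-15, 21)
    g = np.minimum(1.0, ((xi + 15) / 25.0) ** 2)  # synthetic gain curve
    coeffs, a_db, b_db, h_sat = fit_gain_curve(xi, g)
    print(a_db, b_db, h_sat, float(np.polyval(coeffs, 0.0)))
```

The fit needs more grid points in the range than the polynomial order; otherwise the least-squares problem is underdetermined.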
[0051] In Figures 7A to 7D one may note that the Perceptual scale gain values A28
in speech absence do not completely suppress the wanted signal. A non-zero weighting
rule value, that is a non-zero Perceptual scale gain value A28, in wanted signal absence
may help to preserve the naturalness of the wanted signal and of the noise, in
particular in the transition from wanted signal presence to wanted signal absence or
vice versa.
[0052] It may also be noted that during a wanted signal pause the Perceptual scale gain
value A28 at the Perceptual subband m = 1 exhibits lower values than at the Perceptual
subband m = 14, indicating that the noise is more strongly suppressed at lower
frequencies than at higher frequencies. The explanation for this is that car noises in
particular are concentrated at low frequencies.
[0053] A capital P stands for a polynomial obtained by the approximation with the respective
polynomial coefficients A30 Cγ̂l(k).
[0054] A capital P followed by brackets with a placeholder for the subband m then represents
the respective polynomial associated to the respective subband m.
[0055] Figure 6 shows in more detail block B5 of Figure 1. A block B50 is operable to conduct
a discrete Fourier transformation DFT of the respective frame l of the digitized audio
signal A4 yl(n). The output of the block B50 is then the respective noisy audio signal
spectrum A6 Yl(k) associated to the respective frame l. In a block B52 the squared
amplitude of the noisy audio signal spectrum A6 Yl(k) for the respective discrete
frequency k is computed by taking its absolute value and squaring it. This is also
conducted for all the other discrete frequencies.
[0056] A block B54 conducts minimum statistics in order to obtain the noise estimate A18
λ̂Nl(k) and is operable in the same way as block B30.
[0057] In a block B56 the quantized a posteriori signal to noise ratio A16 γ̃l(k) and the
quantized a priori signal to noise ratio A14 ξ̃l(k) are obtained. The a posteriori signal
to noise ratio A12 γ̂l(k) is obtained and quantized by applying the formulas of blocks
B34 and B36.
[0058] The quantized a priori signal to noise ratio A14 ξ̃l(k) is obtained by calculating it
from the formulas of block B58, which differ from those of the block B38 in that,
instead of the wanted signal spectrum A36 Xl(k), an estimated wanted signal spectrum A50
X̂l(k) is used, which is recursively obtained by the procedure of the following blocks
within the block B5. In addition to that the quantized a priori signal to noise ratio
A14 ξ̃l(k) is obtained by applying the formula of the block B40.
[0059] In a block B60 the wanted signal activity detector VAD is estimated in the same way
as in block B42. In a block B62 the approximating function for the Perceptual scale
gain values A28 is determined depending on the quantized a posteriori signal to noise
ratio A16 γ̃l(k) and the wanted signal activity detector VAD by retrieving the associated
parameters of the approximating function, preferably the respective polynomial
coefficients A30 Cγ̂l(k) and the respective saturation level A32 Hsat, preferably
together with the first and second range parameters A44 aγ̂l(k), A46 bγ̂l(k).
[0060] In a block B64 the Perceptual scale gain value A28 associated to the actual quantized
a priori signal to noise ratio A14 ξ̃l(k) is then calculated and is then multiplied in a
multiplication place Ml with a respective value of the noisy audio signal spectrum A6
Yl(k). This is done for all the discrete frequencies k of the respective frame l.
After that, in a block B66, the values obtained in this way, representing the estimated
wanted signal spectrum A50 X̂l(k), are subjected to an inverse discrete Fourier
transformation IDFT, which then results in an estimated digitized wanted signal A48
x̂l(n). The input of the block B66 is the estimated wanted signal spectrum A50
X̂l(k) for the respective frame l.
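The multiplication and inverse transformation can be sketched with NumPy as follows. In this simplified form the subband gains are passed in directly, whereas in the method described above they depend on the quantized signal to noise ratios of each discrete frequency; the function and argument names are hypothetical.

```python
import numpy as np


def enhance_frame(frame, band_gains, bin_to_band):
    """Apply one gain per subband to a frame and transform it back.

    frame       : samples of one frame of the digitized audio signal
    band_gains  : {m: gain} for the subbands m used in this frame
    bin_to_band : sequence giving the subband m of each one-sided DFT bin
    """
    spectrum = np.fft.rfft(frame)
    # Expand the per-subband gains to one gain per discrete frequency.
    gains = np.array([band_gains[m] for m in bin_to_band])
    estimated_spectrum = gains * spectrum
    return np.fft.irfft(estimated_spectrum, n=len(frame))


if __name__ == "__main__":
    frame = np.arange(8.0)
    bands = [1, 1, 2, 2, 2]  # 8 samples give 5 one-sided bins
    print(enhance_frame(frame, {1: 1.0, 2: 0.5}, bands))
```

With unity gains the frame is returned unchanged, which is a convenient sanity check for the transform pair.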
[0061] As an example the input data for the training device provided by the blocks B10 and
B12 may be of four different utterances spoken by different speakers, four male and
four female, and 84 car noise signals, taken from for example NTT-AT databases. The
signals are split into two sets of equal size for training and testing. After a combination,
20 x 42 = 840 noisy speech utterances at the sampling frequency of 8 kHz are obtained
for a training and testing session.
[0062] The polynomial function used for approximation purposes preferably has an order between
4 and 12. It may however also have an order higher than 12 if enough memory space
is available.
[0063] The wanted signal activity detector may also be referred to as wanted signal activity
detection. In a particular case it may be the voice activity detector or also a wanted
signal absence probability.
1. Signal processing method comprising the steps of
- acquisition of an audio signal (A1 y(t)),
- periodically digitizing the audio signal (A1 y(t)) resulting in frames (l) of the
digitized audio signal (A4 yl(n)),
- determining a noisy audio signal spectrum (A6 Yl(k)) for each frame (l) of the digitized audio signal (A4 yl(n)),
- determining quantized a priori and a posteriori signal to noise ratios (A14 ξ̃l(k), A16 γ̃l(k)) depending on the noisy audio signal spectrum (A6 Yl(k)) for the provided discrete frequencies (k) of each frame (l),
- determining for the provided discrete frequencies (k) given associated Perceptual
scale gain values (A28) dependent on the quantized a priori and a posteriori signal to
noise ratios (A14 ξ̃l(k), A16 γ̃l(k)), the given Perceptual scale gain values (A28)
being provided on a Perceptual scale for respective Perceptual scale subbands (m),
- multiplying the respective spectral values of the noisy audio signal spectrum (A6
Yl(k)) of the respective frame (l) with the determined respective Perceptual scale gain
values (A28) resulting in estimated wanted spectrum values (A50 X̂l(k)) and
- determining an estimated digitized wanted signal (A48 x̂l(n)) dependent on the estimated wanted spectrum values (A50).
2. Signal processing method according to claim 1 comprising determining the associated
Perceptual scale gain values (A28) from an approximating function associated to the
respective quantized a posteriori signal to noise ratio (A16 γ̃l(k)), the approximating
function being dependent on the respective quantized a priori signal to noise ratio
(A14 ξ̃l(k)).
3. Signal processing method according to claim 2 with the approximating function being
a polynomial function (P).
4. Signal processing method according to claim 3 with the approximating function being
a polynomial function (P) and a saturation level (A32 Hsat).
5. Signal processing method according to one of the claims 3 or 4, with the polynomial
function (P) having an order between four and twelve.
6. Signal processing method according to one of the previous claims, with the quantization
of the quantized a priori signal to noise ratio (A14 ξ̃l(k)) and/or the quantized a posteriori signal to noise ratio (A16 γ̃l(k)) being on a logarithmic scale.
7. Signal processing method according to one of the previous claims, with the estimated
digitized wanted signal (A48 x̂l(n)) being a digitized speech signal and with the estimated wanted spectrum (A50 X̂l(k)) being an estimated speech spectrum.
8. Signal processing method according to one of the previous claims comprising determining
the Perceptual scale gain values (A28) depending on a wanted signal activity detector (VAD).
9. Signal processing device being operable to conduct a signal processing method according
to one of the previous claims.
10. Training method comprising the steps of:
- provision of frames (1) of a digitized audio signal (A4 y1(n)),
- provision of frames (1) of a digitized wanted signal (A34 x1(n)),
- provision of frames (1) of a digitized noise (A38 n1(n)),
- determining a noisy audio signal spectrum (A6 Y1(k)) for each frame (1) of the digitized audio signal (A4 y1(n)),
- determining a wanted signal spectrum (A36 X1(k)) for each frame (1) of the digitized wanted signal (A34 x1(n)),
- determining a noise spectrum (A40 N1(k)) for each frame (1) of the digitized noise (A38 n1(n)),
- determining quantized a priori and a posteriori signal to noise ratios (A14 ξ̃l(k), A16 γ̃l(k)) depending on the noisy audio signal spectrum (A6 Y1(k)) for the provided discrete frequencies (k) of each frame (1) and depending on the wanted signal spectrum (A36 X1(k)) for the provided discrete frequencies (k) of each frame (1),
- determining gain values (A26 GVAD(k)) for the provided discrete frequencies (k) dependent on the noise spectra (A40 N1(k)) and wanted signal spectra (A36 X1(k)) associated to the respective discrete frequencies (k),
- associating the quantized a priori and a posteriori signal to noise ratios (A14 ξ̃l(k), A16 γ̃l(k)) of respective discrete frequencies (k) to the respective gain values (A26 GVAD(k)) for the provided discrete frequencies (k),
- determining Perceptual scale gain values (A28) associated to the quantized a priori and a posteriori signal to noise ratios (A14 ξ̃l(k), A16 γ̃l(k)) of respective discrete frequencies (k) dependent on the respective gain values (A26 GVAD(k)) being associated to the respective discrete frequencies (k) falling within the respective Perceptual scale subband (m) and being associated to the quantized a priori and a posteriori signal to noise ratios (A14 ξ̃l(k), A16 γ̃l(k)) of respective discrete frequencies (k).
11. Training method according to claim 10 comprising determining parameters of an approximating
function by curve fitting of Perceptual scale gain values (A28) associated to a respective
quantized a posteriori signal to noise ratio (A16 γ̃l(k)).
12. Training method according to claim 11 with the approximating function being a polynomial
function (P).
13. Training method according to claim 12 with the approximating function being a polynomial
function (P) and a saturation level (A32 Hsat).
14. Training method according to one of the claims 12 or 13, with the polynomial function
(P) having an order between 4 and 12.
15. A training method according to one of the claims 10 to 14 with the quantization of
the quantized a priori signal to noise ratio (A14 ξ̃l(k)) and the quantized a posteriori signal to noise ratio (A16 γ̃l(k)) being on a logarithmic scale.
16. Training method according to one of the claims 10 to 15 with the estimated digitized
wanted signal (A48 x̂l(n)) being a digitized speech signal and with the estimated wanted spectrum (A50 X̂l(k)) being an estimated speech spectrum.
17. Training method according to one of the claims 10 to 16, comprising determining the
Perceptual scale gain values (A28) depending on a wanted signal activity detector (VAD).
18. Training method according to one of the claims 10 to 17 comprising determining the
parameters of the approximating function depending on a wanted signal activity detector
(VAD).
19. Training method according to one of the claims 10 to 18, comprising determining a
noise estimate for each frame (1) dependent on the respective noisy audio signal spectrum
(A6 Y1(k)), and determining quantized a priori and a posteriori signal to noise ratios
(A14 ξ̃l(k), A16 γ̃l(k)) depending on the noise estimate, the noisy audio signal spectrum (A6 Y1(k)) for the provided discrete frequencies (k) of each frame (1) and depending on
the wanted signal spectrum (A36 X1(k)) for the provided discrete frequencies (k) of each frame (1).
20. Training device being operable to conduct a training method according to one of the
claims 10 to 18.
21. Computer program product comprising a computer readable medium embodying program instructions
executable by a computer in order to conduct a signal processing method according
to one of the claims 1 to 8.
22. Computer program product comprising a computer readable medium embodying program instructions
executable by a computer in order to conduct a training method according to one of the
claims 10 to 18.