Prior Art
[0001] The invention relates to methods for processing noisy speech signals by subtracting
a noise function in accordance with the generic class of the independent claims.
[0002] In chapter 8, titled "Speech Enhancement", of J.R. Deller, J.G. Proakis and J.H.L.
Hansen: "Discrete Time Processing of Speech Signals", Macmillan Publishing Company,
1993, a method for weighting a noise signal in the frequency domain for a speech signal
is presented. The weighting is performed using predefined values that are independent of
time, of frequency, of properties of the speech signal and of properties of the noise
signal. This weighting leads to a reduction of musical tones.
Advantages of the Invention
[0003] The methods for processing speech signals by subtracting a noise function having
the characterising features of the independent claims have the advantage that the
weighting of the noise signal depends either on amplitudes of an acquired signal spectrum
or on a noise type. Because the weighting depends on the acquired signal spectrum, a
dependence on both frequency and time is achieved at the same time. An improved speech
quality is therefore realised by minimising artefacts such as musical tones. Thus, if the
invention is used in mobile phones, better quality phone calls will be the result. Other
applications such as speech recognition and audio recording also benefit from the invention
through a reduction of the artefacts and therefore an improved speech recognition or
an improved audio recording.
[0004] The invention is improved by detecting a noise type of the detected noise spectrum,
for example noise in a car, noise in an office or noise in the street. This knowledge
is used to adapt the noise signal to its noise type. In this way, an improved weighting
of the noise signal is achieved, and it will lead to an improved noise reduction of
the speech signal.
[0005] The features of the dependent claims enable further improvements of the invention.
[0006] It is an advantage to use an upper and a lower limit for the weighting of amplitudes
of the noise signal. Amplitudes of the spectral noise signal exceeding the upper limit are
set to the upper limit, so that this method avoids musical tones. This leaves
a small part of the original noise in the final signal, so that the listener can still
hear what environment the speaker is in. Amplitudes of the spectral noise signal below the
lower limit are set to the lower limit. This is an easy and effective method for improving
the noise signal.
[0007] Furthermore, it is an advantage of the invention to use a signal-to-noise ratio of
the speech signal to determine a weighting of the noise signal. For a very low signal-to-noise
ratio, it is useless to perform a noise reduction: the signal is so weak
compared to the noise that a noise reduction would introduce unwanted audible
effects (musical tones), because the noise signal is mixed up with the speech signal.
If the signal-to-noise ratio is very high, then noise reduction will not be necessary,
since the signal quality is already very high, so that a further reduction of the
noise would not improve the quality of reproduced speech signals. It is an advantage
to use a predefined weighting function for the noise signal for those speech signals
having a signal-to-noise ratio between the lower and the upper limit. In this way,
a signal quality dependent noise reduction is performed.
[0008] It is an advantage to use an envelope of the signal spectrum containing speech and
noise as a lower limit for weighting factors for the spectral noise signal, while
the envelope itself is weighted by predefined factors which stem from listening tests
with test persons. The envelope of the speech signal is used to avoid an excessive
noise reduction where the speech signal has its energy. A frequency in the speech
signal with low energy will have lower weighting than a frequency with high energy.
Thus, the method leads to a signal strength dependent reduction of noise over frequency
and time.
[0009] It is an advantage that a method of acquiring a spectral noise signal is used either
during speech or when no speech is present. If the spectral noise signal is acquired
during speech, it will be an excellent spectral noise signal because it is very close
in time to the speech signal for which it is used for noise reduction. If the spectral
noise signal is estimated during a time interval without speech, the processing of
the spectral noise signal will be straightforward and very easy, since the noise
is not masked by the speech signal.
Drawing
[0010] Exemplary embodiments of the invention are shown in the figures and elucidated in
detail in the description below.
[0011] Figure 1 shows a block diagram of a speech signal processing, figure 2 shows a flow
chart of a noise reduction in speech signals, figure 3 shows a flow chart of weighting
a noise signal depending on the signal-to-noise ratio of a speech signal, figure 4
shows a flow chart of a method for reducing noise in a speech signal using upper and
lower limits for the spectral power of the noise, figure 5 shows an amplitude/frequency
diagram with a noise signal and two limits, figure 6 shows a flow chart of a noise
reduction method using stored noise types and figure 7 shows a flow chart of a noise
reduction method in speech signals using an envelope of the signal spectrum.
Description
[0012] When speaking into a phone, the speech is degraded by background noise. The speech and
the background noise are converted by a microphone of the phone set into electrical
signals. To improve the speech quality for the listener at the receiver, this background
noise has to be removed or at least considerably reduced. If too large an amplitude
is removed from the speech signal, the speech signal will exhibit unwanted
audible effects - so-called musical tones - when reconverted to acoustic signals by
a loudspeaker.
[0013] The musical tones are more disturbing for listeners than noise, since the auditory
recognition system of a human being tries to find an interpretation of those musical
tones whereas noise is easily regarded by a listener as noise and, consequently, the
noise does not interfere with the recognition of the speech as long as the signal
strength of the noise is not too high.
[0014] In mobile communications, frequency bandwidth for transmission and reception of radio
signals is precious and service providers who operate a mobile communication network
want to put more and more users in their allocated bandwidths. Therefore, an effective
speech coding which reduces the necessary bandwidths for transmission considerably
but retains an excellent speech quality is of high importance for designing a successful
transmitter for radio signals.
[0015] Speech coding therefore removes redundancy from the speech signal. The information
per bit is considerably increased. Noise can influence the speech coding procedure
and therefore what the bits containing the speech information will look like. This can
lead to a poor audio quality. The reduction of noise before speech coding sets in
is a precondition for excellent speech quality at the receiver.
[0016] Noise is a stochastic signal and therefore cannot be predicted. There are peaks and
drops in the noise signal which one has to cope with if noise reduction and avoiding
musical tones are the goals. By transforming noise and a speech signal from a time
domain to a frequency domain, it is possible to identify whether a present signal
is speech with noise or only noise. Alternatively, this can be done in the time domain.
[0017] In Fig. 1, a block diagram of a speech signal processing for reducing noise is shown.
A microphone 50 with attached electronics is connected to a transforming unit 51 for
transforming signals from the time domain to a frequency domain, generating a signal
spectrum. The microphone transduces acoustical waves into electrical signals, and the
attached electronics amplify and digitise the electrical signals. The digitised signals
are then fed into the transforming unit 51.
[0018] The signal spectrum is then fed from the transforming unit 51 into a decision unit
52 deciding whether the signal spectrum is noise or noise and speech signals. If it
is a noise signal, the noise signal is then transferred over a first output of the
decision unit 52 to a noise processor 53 weighting the noise signal generating a noise
function. If the signal spectrum consists of noise and speech signals then the signal
spectrum is transferred from the decision unit 52 to an adder 54 subtracting the noise
function coming from the noise processor 53 from the signal spectrum coming from the
decision unit 52 in order to generate speech signals with reduced noise. The signal
spectrum is always transferred to the adder 54, whereas the noise function is only
updated when only a noise signal is present in the signal spectrum.
[0019] These speech signals are then transformed from the frequency domain to the time domain
by a retransforming unit 55. At the output of the retransforming unit 55 the speech
signals with reduced noise in the time domain are ready for further processing.
[0020] The transforming unit 51, the decision unit 52, the noise processor 53, the adder
54 and the retransforming unit 55 are implemented on one processor as different software
programs or functions. Alternatively, more than one processor can be implemented to
perform the above mentioned tasks.
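By way of a non-limiting illustration, the cooperation of the decision unit 52, the noise processor 53 and the adder 54 could be sketched in software as follows. The class name, the callback parameters and the representation of a spectrum as a list of magnitudes are assumptions made for this sketch only, and the flooring of the result at zero is an added assumption that the description does not specify.

```python
from typing import Callable, List

Spectrum = List[float]  # one magnitude per frequency bin (assumed representation)


class NoiseReducer:
    """Hypothetical software arrangement of units 52, 53 and 54 of Fig. 1."""

    def __init__(
        self,
        is_noise_only: Callable[[Spectrum], bool],
        weight_noise: Callable[[Spectrum], Spectrum],
        num_bins: int,
    ) -> None:
        self.is_noise_only = is_noise_only  # stands in for decision unit 52
        self.weight_noise = weight_noise    # stands in for noise processor 53
        self.noise_function: Spectrum = [0.0] * num_bins

    def process(self, spectrum: Spectrum) -> Spectrum:
        # The noise function is only updated when the frame contains noise alone.
        if self.is_noise_only(spectrum):
            self.noise_function = self.weight_noise(spectrum)
        # Adder 54: every spectrum is passed on with the noise function subtracted.
        # Flooring at zero is an assumption to keep magnitudes non-negative.
        return [max(s - n, 0.0) for s, n in zip(spectrum, self.noise_function)]


# Usage with placeholder decision and weighting rules (illustrative values only).
reducer = NoiseReducer(
    is_noise_only=lambda s: sum(s) < 1.5,
    weight_noise=lambda s: [0.5 * a for a in s],
    num_bins=4,
)
reducer.process([0.25, 0.25, 0.25, 0.25])           # noise-only frame, updates the noise function
cleaned = reducer.process([0.25, 1.0, 2.0, 0.25])   # frame containing speech and noise
```

The transforming unit 51 and the retransforming unit 55 are left out of the sketch, so the input and output are both spectra.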
[0021] The noise processor 53 performs one of the following algorithms for processing
the noise signal considering amplitudes of the signal spectrum according to the invention:
a) Amplitudes of the noise signal above a first predefined limit are set equal to
that limit, and amplitudes of the noise signal below a second predefined limit are set
equal to that limit.
b) Amplitudes of the noise signal are multiplied with an envelope of the signal spectrum
if the signal spectrum contains speech signals.
c) Amplitudes of the noise signal are weighted according to a signal-to-noise ratio
of the signal spectrum if the signal spectrum contains speech signals.
The weighting is performed for every sampled frequency.
[0022] In addition, a noise processing is performed using stored noise spectra by comparing
those stored noise spectra with the noise signal.
[0023] In Fig. 2, a flow chart of a method for noise reduction and reduction of the musical
tones of speech signals according to the invention is shown. This noise reduction and all
other noise reduction methods are implemented on a processor, preferably on a processor
already present in a mobile phone. Thus, the noise reduction
methods are implemented in software running on processors.
[0024] In step 1 of the noise reduction method, acoustical waves emitted by a speaker are
converted by a mobile phone into electrical signals using a microphone as a transducer.
The electrical signals are then amplified, filtered and digitised using a signal processing
unit connected to the microphone.
[0025] In step 2 of the noise reduction method, the electrical signals are transformed from
a time domain to a frequency domain in order to generate the signal spectrum.
[0026] In a mobile phone, a processor performs the transformation of the digitised
electrical signals from the time domain into the frequency domain using a Fast Fourier
Transform (FFT). The FFT is a well-known algorithm for processors to perform the transformation
of signals from the time domain into the frequency domain. This transformation consists
of sampling the signal in the time domain and inserting the samples into
a well-known equation to perform the transformation.
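For illustration, the well-known equation referred to above is the discrete Fourier transform, X[k] = sum over t of x[t] * exp(-2*pi*i*k*t/N). The following sketch evaluates it directly; an FFT computes the same values with fewer operations. The function names are chosen for this example only.

```python
import cmath
from typing import List


def dft(samples: List[float]) -> List[complex]:
    """Direct evaluation of the discrete Fourier transform (O(N^2) operations)."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def magnitude_spectrum(samples: List[float]) -> List[float]:
    """Amplitude per frequency bin, as used for the signal spectrum."""
    return [abs(x) for x in dft(samples)]


# One period of a cosine sampled at four points: the energy falls into
# bin 1 and its mirror image, bin 3.
spectrum = magnitude_spectrum([1.0, 0.0, -1.0, 0.0])
```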
[0027] Alternatively, other transform techniques can be used. One widely known technique
is the use of wavelets. Wavelets are mathematical functions cutting up data into different
frequency components and then studying each component with a resolution matched to
its scale. Wavelets perform especially well on discontinuities and spikes in the signals
to be analysed.
[0028] In step 3, it is checked by the processor whether the signal spectrum represents
speech signals with noise or a noise signal. This is done using a Voice Activity Detector
(VAD) algorithm. The VAD algorithm is part of the GSM (Global System for Mobile Communication)
speech coders and it detects whether there is speech activity or not by comparing
a spectral power density of the signal spectrum with predefined values. If the spectral
power density exceeds those predefined values, the VAD decides that speech is present.
Alternatively, a similar algorithm can be implemented on a separate processor which
is useful for distinguishing between speech and background noise because a processor
is fully dedicated to voice activity detection.
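A strongly simplified sketch of such a threshold comparison is given below. It is not the GSM VAD algorithm itself; the function name, the per-bin thresholds and the parameter min_bins are assumptions made for illustration, and the squared magnitude is used as a stand-in for the spectral power density.

```python
from typing import List


def speech_present(
    spectrum: List[float], thresholds: List[float], min_bins: int = 1
) -> bool:
    """Decide that speech is present when the power in at least `min_bins`
    frequency bins exceeds its predefined threshold (assumed decision rule)."""
    exceeded = sum(
        1 for amplitude, limit in zip(spectrum, thresholds)
        if amplitude * amplitude > limit
    )
    return exceeded >= min_bins


# Placeholder thresholds; real values would be tuned for the application.
thresholds = [0.5, 0.5, 0.5]
quiet_frame = speech_present([0.1, 0.2, 0.1], thresholds)
voiced_frame = speech_present([0.1, 1.0, 0.1], thresholds)
```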
[0029] Apart from detecting noise in speech interruptions, it is possible to detect noise
during speech. This is explained below.
[0030] If there is no speech activity, then a background noise signal is updated in predefined
time intervals, for example 480 ms. This saves bandwidth for other mobile phones to
communicate. GSM is a widely used standard for digital cellular mobile communications.
[0031] If the signal spectrum is a noise signal, then in step 10 the noise signal will be
processed using predefined factors stored in the mobile phone. These factors have
been determined using knowledge of the human auditory reception system. Alternatively,
this step can be omitted.
[0032] If the signal spectrum is a speech signal, then, in step 5, a signal-to-noise ratio
of the speech signal depending on the frequency is calculated. For this, the noise
signal is subtracted from the speech signal and then a resulting difference is divided
by the noise signal. This is done for certain frequencies in the acquired spectrum
of the speech signal. The number of those frequencies determines the accuracy and
the complexity of the noise reduction method.
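The calculation of step 5 can be sketched as follows, assuming that both spectra are given as lists of amplitudes at the same frequencies. The function name and the small constant eps, which guards against division by zero, are assumptions that are not part of the description.

```python
from typing import List


def snr_per_frequency(
    signal_spectrum: List[float], noise_spectrum: List[float], eps: float = 1e-12
) -> List[float]:
    """Per-frequency signal-to-noise ratio: (signal - noise) / noise."""
    return [
        (s - n) / max(n, eps)  # eps avoids division by zero (assumption)
        for s, n in zip(signal_spectrum, noise_spectrum)
    ]


# A bin three times as strong as the noise yields a ratio of 2,
# a bin equal to the noise yields a ratio of 0.
snr = snr_per_frequency([3.0, 1.0], [1.0, 1.0])
```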
[0033] In step 4, after step 10 has been performed, the noise signal is weighted by a function
depending on the signal-to-noise ratio of the speech signal, using the result of step 5,
to generate a noise function. Thus, the noise signal is weighted at each frequency for which
the signal-to-noise ratio is determined. Therefore, according to this solution,
the weighting of the noise signal is only possible if speech signals are present.
[0034] In Fig. 3, this method of weighting the noise signal is shown. In step 13, the method
is started. In step 14, the signal-to-noise ratio of the speech signals is compared
with a first predefined limit. If the signal-to-noise ratio is higher than the first
predefined limit, then, in step 15, the spectral noise signal is set to zero, since,
owing to the very high signal-to-noise ratio, the speech signals are of excellent quality,
so that reducing noise would not lead to an audible improvement for
the listener.
[0035] If the signal-to-noise ratio of the speech signals is below the first predefined
limit, then, in step 16, the signal-to-noise ratio of the speech signals is compared
to a second predefined limit. If the signal-to-noise ratio of the speech signals is
above the second predefined limit, then in step 17 the weighting of the noise signal
is performed according to a predefined function depending on the signal-to-noise ratio.
[0036] If the signal-to-noise ratio of the speech signals is below the second predefined
limit, then, in step 18, the signal-to-noise ratio of the speech signals is compared
to a third predefined limit. If the signal-to-noise ratio of the speech signals is above the
third predefined limit, then, in step 19, the noise signal is weighted by a constant weighting factor.
The predefined function of step 17 connects the zero-value weighting appearing in
step 15 with this constant weighting factor in step 19.
[0037] If the signal-to-noise ratio of the speech signal is below or equal to the third predefined
limit, then, in step 20, the signal-to-noise ratio of the speech signals is compared
to a fourth predefined limit. If the signal-to-noise ratio of the speech signals is
above this fourth predefined limit, then, in step 21, the noise signal is weighted
with a predefined function.
[0038] If the signal-to-noise ratio of the speech signals is equal to or below the fourth predefined
limit, then, in step 22, the weighting of the noise signal for the frequencies where
the signal-to-noise ratio is equal to or below the fourth predefined limit is set to
zero. This is done because, at such a low signal-to-noise ratio, the signal is already
so noisy that a reduction of the noise would not lead to an improvement but would
introduce unwanted audible effects. After step 15, 17, 19, 21 or 22, this method ends
in step 23. The predefined function applied in step 21 for weighting the noise signal
linearly connects the constant weighting factor and zero. Alternatively, parabolic
or exponential functions can be implemented for these predefined functions. The predefined
limits are set according to listening tests.
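The decision sequence of Fig. 3 can be summarised as a piecewise weighting function. The sketch below is one possible reading: the numerical limits and the constant factor are placeholders, and the linear shape of the function of step 17 is an assumption (the description states a linear connection only for step 21 and names parabolic or exponential functions as alternatives). All function and parameter names are chosen for this example.

```python
from typing import List, Tuple


def noise_weight(
    snr: float, limits: Tuple[float, float, float, float], constant: float
) -> float:
    """Weighting factor for one frequency, with limits ordered first > second > third > fourth."""
    first, second, third, fourth = limits
    if snr > first:    # step 15: excellent quality, no noise is subtracted
        return 0.0
    if snr > second:   # step 17: connects zero with the constant factor (linear assumed)
        return constant * (first - snr) / (first - second)
    if snr > third:    # step 19: constant weighting factor
        return constant
    if snr > fourth:   # step 21: linear connection between the constant factor and zero
        return constant * (snr - fourth) / (third - fourth)
    return 0.0         # step 22: too noisy, no noise is subtracted


def weight_noise_by_snr(
    noise: List[float],
    snr: List[float],
    limits: Tuple[float, float, float, float],
    constant: float,
) -> List[float]:
    """Noise function: each noise amplitude multiplied by the weight for its frequency."""
    return [n * noise_weight(r, limits, constant) for n, r in zip(noise, snr)]


# Placeholder limits; the description sets the real limits by listening tests.
LIMITS = (20.0, 10.0, 5.0, 0.0)
weights = [noise_weight(r, LIMITS, 1.0) for r in (25.0, 15.0, 7.0, 2.5, -1.0)]
```

With these placeholder values the weight rises from zero to the constant factor as the signal-to-noise ratio falls from the first to the second limit, stays constant down to the third limit, and falls back to zero at the fourth limit.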
[0039] In step 6, the weighted noise signal, that is the noise function, is stored in a
processor. In step 7, the noise function is subtracted from the signal spectrum of
the original speech signals. These are the speech signals before the unweighted noise
signal was subtracted for calculating the signal-to-noise ratio.
[0040] In step 8, the spectrum of the speech signal with reduced noise is transformed from
the frequency domain into the time domain using inverse FFT. In step 9, speech coding
on the speech signal with reduced noise in the time domain is performed.
[0041] In Fig. 4, another method of weighting the noise signal is presented. In step 24,
a speech signal or noise is converted to an electrical signal using a microphone.
[0042] Electronics attached to the microphone amplify and digitise the electrical signal.
[0043] In step 25, the electrical signal is transformed from the time domain to the frequency
domain using FFT, generating the signal spectrum. In step 26, it is detected using
VAD whether speech signals or a noise signal are present. If a noise signal is present,
in step 28, the noise signal is weighted.
[0044] In Fig. 5, the weighting of the spectral noise signal with an amplitude s as a function
of the frequency f is shown. To avoid very high amplitudes of the noise signal, a
limit 11 for the high amplitudes is set. If an amplitude is equal to or above the limit
11, then it is set to this limit 11. This prevents too much noise from being removed from
the speech signal and thereby the appearance of musical tones. An exemplary signal
is added to Fig. 5 to show amplitudes of that signal exceeding the limits in both
directions.
[0045] In addition, an optional lower limit 12 is also added in this method, so that very
low noise amplitudes in the spectrum of the noise which are equal to or below the limit
12 are set to the limit 12. As stated previously, background noise does not necessarily
disturb a listener and it gives the listener an impression that a connection between
him and a speaker is still in existence if only noise is transmitted. Here, the noise
processed with the upper and the lower limit is transmitted. For this method, the
upper limit 11 must be included whereas the lower limit 12 is optional.
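The limiting of Fig. 5 amounts to clamping each noise amplitude, which can be sketched as follows. The function name and the numerical limits are placeholders chosen for this example; the lower limit is modelled as an optional parameter because the description treats it as optional.

```python
from typing import List, Optional


def limit_noise(
    noise_spectrum: List[float], upper: float, lower: Optional[float] = None
) -> List[float]:
    """Set amplitudes at or above `upper` to `upper` and, if a lower limit
    is given, amplitudes at or below `lower` to `lower`."""
    if lower is not None and lower > upper:
        raise ValueError("lower limit must not exceed upper limit")
    limited = []
    for amplitude in noise_spectrum:
        amplitude = min(amplitude, upper)        # upper limit 11
        if lower is not None:
            amplitude = max(amplitude, lower)    # optional lower limit 12
        limited.append(amplitude)
    return limited


# Placeholder limits for illustration.
both_limits = limit_noise([0.1, 0.5, 2.0], upper=1.0, lower=0.2)
upper_only = limit_noise([0.1, 0.5, 2.0], upper=1.0)
```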
[0046] In step 29, the weighted noise signal is stored, so that when a speech signal is
present, the weighted noise signal is subtracted from the speech signal to generate
a speech signal with reduced noise in step 30. In step 31, the speech signal with
reduced noise is transformed from the frequency domain to the time domain using inverse
FFT. In step 32, speech coding on the speech signal with reduced noise in the time
domain is performed.
[0047] In Fig. 6, another method for reducing noise in a speech signal is presented in a
flow chart. In step 33, a speech signal or noise is converted to an electrical signal
using a microphone. Electronics attached to the microphone amplify and digitise
the electrical signal.
[0048] In step 34, the electrical signal is transformed from the time domain to the frequency
domain using FFT. In step 35, it is detected using VAD whether a speech signal or
a noise signal is present. If a spectral noise signal is present, that is, no speech
signal, the noise signal is weighted in step 36.
[0049] The noise signal is weighted by analysing the noise spectrum using stored
noise spectra. If the noise type is detected, for example noise in a car, noise in
an environment where many people are speaking or noise in a street, then the measured
noise can be weighted according to its noise type, optimising both the noise reduction
and the reduction of artefacts (musical tones).
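One conceivable way of comparing the measured noise with stored noise spectra is a nearest-match search, sketched below. The description does not specify the comparison measure; the squared Euclidean distance, the stored spectra and the per-type weighting factors are all assumptions made for this illustration.

```python
from typing import Dict, List

# Hypothetical stored noise spectra and per-type weighting factors.
STORED_SPECTRA: Dict[str, List[float]] = {
    "car": [1.0, 0.5, 0.1],
    "office": [0.2, 0.2, 0.2],
    "street": [0.6, 0.6, 0.6],
}
TYPE_WEIGHTS: Dict[str, float] = {"car": 0.5, "office": 0.25, "street": 0.75}


def classify_noise(noise: List[float], stored: Dict[str, List[float]]) -> str:
    """Return the name of the stored spectrum closest to the measured noise."""
    def distance(a: List[float], b: List[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(stored, key=lambda name: distance(noise, stored[name]))


def weight_by_type(noise: List[float]) -> List[float]:
    """Weight the measured noise with the factor assigned to its detected type."""
    factor = TYPE_WEIGHTS[classify_noise(noise, STORED_SPECTRA)]
    return [factor * amplitude for amplitude in noise]


noise_type = classify_noise([1.0, 0.5, 0.25], STORED_SPECTRA)
noise_function = weight_by_type([1.0, 0.5, 0.25])
```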
[0050] In step 37, the weighted noise signal is stored, so that when a speech signal is
present, the weighted noise signal is subtracted from the speech signal to generate
a speech signal with reduced noise in step 38. In step 39, the speech signal with
reduced noise is transformed from the frequency domain to the time domain using inverse
FFT. In step 40, speech coding on the speech signal with reduced noise in the time
domain is performed.
[0051] In Fig. 7, another method for reducing noise in a speech signal is presented in a
flow chart. In step 41, a speech signal or noise is converted to an electrical signal
using a microphone. Electronics attached to the microphone amplify and digitise
the electrical signal.
[0052] In step 42, the electrical signal is transformed from the time domain to the frequency
domain using FFT. In step 43, it is detected using VAD whether a speech signal or
a noise signal is present. If a spectral noise signal is present, that is, no speech
signal, the noise signal is weighted in step 44 with an envelope of the speech signal.
Thus, the weighting occurs only when a speech signal is present. The envelope of the
speech signal is calculated using FFT, also in step 42. It is practically the same
as the speech signal, so the weighting is performed using the speech signal.
[0053] The envelope is used to avoid a noise reduction of the speech signal which would
lead to unwanted audible effects; that is, too much noise reduction is avoided, especially
at those frequencies where most of the speech signal energy is located. In addition,
the envelope of the speech signal is weighted by factors that are stored already in
the processor. These factors have been found by using listening tests.
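A minimal sketch of an envelope-based weighting is given below. It follows the literal wording that the noise amplitudes are multiplied with the envelope of the signal spectrum, the envelope itself being weighted by stored factors. The moving-maximum envelope, the function names and the factor values are assumptions; the description leaves the exact envelope calculation and the factors open.

```python
from typing import List


def spectral_envelope(spectrum: List[float], half_width: int = 1) -> List[float]:
    """Moving maximum over neighbouring frequency bins (assumed envelope definition)."""
    n = len(spectrum)
    return [
        max(spectrum[max(0, k - half_width):min(n, k + half_width + 1)])
        for k in range(n)
    ]


def envelope_weighted_noise(
    noise: List[float], signal_spectrum: List[float], factors: List[float]
) -> List[float]:
    """Multiply each noise amplitude by the factor-weighted envelope at that frequency."""
    envelope = spectral_envelope(signal_spectrum)
    return [n * f * e for n, f, e in zip(noise, factors, envelope)]


# Placeholder factors; the description derives the real ones from listening tests.
envelope = spectral_envelope([0.0, 2.0, 0.0, 1.0])
noise_function = envelope_weighted_noise(
    [1.0, 1.0, 1.0, 1.0], [0.0, 2.0, 0.0, 1.0], [0.5, 0.5, 0.5, 0.5]
)
```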
[0054] In step 45, the weighted noise signal is stored, so that the weighted noise signal
is subtracted from the present speech signal to generate a speech signal with reduced
noise in step 46. In step 47, the speech signal with reduced noise is transformed
from the frequency domain to the time domain using inverse FFT. In step 48, speech
coding on the speech signal with reduced noise in the time domain is performed.
[0055] Apart from speech coding, the invention is usable for other applications. Speech
recognition demands a signal with good signal-to-noise ratio for a proper recognition
of the speech. Thus, a speech recognition system would considerably benefit from using
the invention.
[0056] Another application is audio recording, either for audio reproduction only or in combination
with video recording. Especially for live recordings suffering heavily from background
noise, a noise reduction algorithm focussing on improving speech and/or music quality
benefits from the invention.
[0057] The noise spectrum is either acquired when no speech signal is present or during
a speech signal. The first solution is straightforward, since the background noise
is transformed from the time domain to the frequency domain to generate the noise
spectrum.
[0058] To acquire the noise spectrum during the speech signal, the speech signal with the
noise is transformed from the time domain to the frequency domain. By analysing
the speech signal, the speech signal itself is then modelled using the processor, and
this model of the speech signal is subtracted from the transformed speech signal to
generate a noise spectrum as the difference between the measured speech signal and
the model. To generate this model of the speech signal, the processor uses stored
knowledge on the spoken language in order to estimate what was said.
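The final subtraction of this approach can be sketched as follows. How the speech model is obtained is outside the scope of the sketch, so the model spectrum is simply passed in; the function name and the flooring of the difference at zero are assumptions.

```python
from typing import List


def noise_from_model(measured: List[float], speech_model: List[float]) -> List[float]:
    """Noise spectrum as the difference between the measured spectrum and
    the modelled speech spectrum, floored at zero (assumption)."""
    return [max(m - s, 0.0) for m, s in zip(measured, speech_model)]


# Illustrative values; the model spectrum would come from the speech modelling step.
noise_estimate = noise_from_model([1.0, 2.5, 0.5], [0.5, 2.0, 1.0])
```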
Claims
1. Method for processing speech signals by subtracting a noise function, whereby the
noise function is determined by measuring a signal containing noise or noise and speech,
transforming the measured signal from a time domain to a frequency domain to generate
a signal spectrum, deriving a noise signal from the signal spectrum, characterised
in that the measured noise signal is weighted by multiplying a function considering
amplitudes of the signal spectrum in order to generate the noise function.
2. Method according to claim 1 wherein the signal spectrum is acquired in speech interruptions.
3. Method according to claim 1 wherein the signal spectrum is acquired during speech
by modelling the speech signals and thereby identifying the noise signal.
4. Method according to claim 2 or 3 wherein amplitudes of the measured noise signal being
above a predefined upper limit are set equal to the upper limit and that amplitudes
of the measured noise signal being below a lower limit are set equal to the lower
limit.
5. Method according to claim 2 or 3 wherein the measured noise signal is weighted by
multiplying the amplitudes of the measured noise signal with an envelope of the signal
spectrum.
6. Method according to claim 2 or 3 wherein the measured noise signal is weighted by
multiplying a function considering amplitudes of a signal-to-noise ratio of the signal
spectrum.
7. Method according to claim 4 wherein the measured noise signal is multiplied with zero
if the signal-to-noise ratio of the signal spectrum is above a first predefined limit,
that the measured noise signal is multiplied with a first weighting function if the
signal-to-noise ratio of the signal spectrum is between the first predefined limit
and a second predefined limit, that the measured noise function is multiplied with
a constant factor if the signal-to-noise ratio of the signal spectrum is between the
second predefined limit and a third predefined limit, that the measured noise signal
is multiplied with a second weighting function if the signal-to-noise ratio of the
signal spectrum is between the third predefined limit and a fourth predefined limit
and that the measured noise function is multiplied with zero if the signal-to-noise
ratio of the signal spectrum is below the fourth predefined limit.
8. Method for processing speech signals by subtracting a noise function, whereby the
noise function is determined by measuring a signal containing noise or noise and speech,
transforming the measured signal from a time domain to a frequency domain generating
a signal spectrum, deriving a noise signal from the signal spectrum, characterised
in that the noise signal is compared with stored noise spectra in order to determine
a fitting noise spectrum being used as the noise function.
9. Method according to claim 1 wherein the signal spectrum containing noise is acquired
in speech interruptions.
10. Method according to claim 1 wherein the signal spectrum containing speech and noise
is acquired during speech by modelling the speech and thereby identifying the noise.