[0001] This invention relates to a noise suppressor for use in suppressing a noise signal
from a speech signal.
[0002] As a rule, a speech signal is subjected to pre-processing before the speech signal
is encoded into a sequence of encoded signals. For example, such pre-processing has
been made to judge either a speech duration or a non-speech duration, in an article
which is contributed by J.F. Lynch, Jr. et al to IEEE and which is entitled "SPEECH/SILENCE
SEGMENTATION FOR REAL-TIME CODING VIA RULE BASED ADAPTIVE ENDPOINT DETECTION" (Proceedings
ICASSP, pages 1348-1351, 1987). In the article, description is made only about detection
between the speech duration and the non-speech duration but is not made about suppressing
a noise signal from the speech signal during the pre-processing. In other words, Lynch
et al. do not consider pre-processing which suppresses the noise signal from the
speech signal. Practically, even when the pre-processing described in the article
is used for suppressing the noise signal from the speech signal, it is difficult to
suppress the noise signal, namely, a non-speech signal within the speech duration.
[0003] On the other hand, spectrum subtraction has been proposed to remove a noise component
from the speech signal in Japanese Unexamined Patent Publication No. Hei 2-278298,
namely, 278298/1990. Thereafter, the speech signal is encoded into a sequence of encoded
signals. With this method, only a noise spectrum which results from the noise component
is subtracted from a spectrum of the speech signal including the noise spectrum, and the
result is produced as a noise-subtracted speech signal. Thus, the noise-subtracted speech
signal might be free from the noise component on the spectrum.
[0004] However, it is to be noted that speech encoding is usually carried out in connection
not only with the spectrum but also with a phase component of the speech signal. This
means that a noise component included in the phase component cannot be removed by
the above-mentioned method.
[0005] Therefore, the spectrum subtraction is disadvantageous in that the noise component
cannot be completely suppressed from the speech signal.
[0006] Moreover, the spectrum subtraction cannot be applied to post-processing which is
carried out after the encoded signal sequence is decoded into a sequence of decoded
signals.
[0007] At any rate, no consideration is made at all about suppressing a noise component
on post-processing, even though noise suppression is necessary after decoding.
[0008] It is an object of this invention to provide a noise suppressor which is capable
of completely suppressing a noise component or signal from a speech signal.
[0009] It is another object of this invention to provide a noise suppressor of the type
described, which can be used either on pre-processing or on post-processing of the
speech signal.
[0010] It is still another object of this invention to provide a noise suppressor of the
type described, which can suppress the noise signal not only within a speech duration
but also within a non-speech duration.
[0011] According to an aspect of this invention, a noise suppressor is operable to carry
out pre-processing in relation to a speech signal. Specifically, the noise suppressor
is supplied with an internal input signal which includes both the speech signal and
a noise signal to produce an output signal substantially free from the noise signal.
The speech signal is specified by a sound source. The noise suppressor comprises feature
parameter calculating means supplied with said internal input signal for calculating
a feature parameter specifying a feature of the speech signal to produce a feature
parameter signal representative of said feature parameter and noise suppressing means
coupled to the feature parameter calculating means for suppressing the noise signal
from the internal input signal to produce the output signal. The noise suppressing
means comprises filter means supplied with the feature parameter signal and the internal
input signal for filtering the internal input signal to produce a filtered signal
which is dependent on the feature parameter and which specifies the sound source,
a suppression unit coupled to the filter means for suppressing the noise signal from
the filtered signal by estimating the noise signal to produce a noise-suppressed signal,
and output means for producing the noise-suppressed signal as the output signal.
[0012] According to another aspect of this invention, a noise suppressor is operable to
carry out post-processing of a speech signal and responds to a feature parameter signal
specifying the speech signal and a sound source signal representative of a sound source
of the speech signal to suppress a noise signal from the speech signal and to produce
an output signal substantially free from the noise signal. The speech signal is divisible
into a speech duration and a non-speech duration. The noise suppressor comprises a
noise suppressing circuit for suppressing the noise signal from the sound source signal
with reference to the feature parameter signal to produce a noise-suppressed signal
and means for producing the noise-suppressed signal as the output signal.
Fig. 1 is a block diagram of a noise suppressor according to a first embodiment of
this invention;
Fig. 2 is a block diagram for use in describing a part of the noise suppressor illustrated
in Fig. 1;
Fig. 3 is a block diagram of a noise suppressor according to a second embodiment of
this invention; and
Fig. 4 is a block diagram for use in describing a part of the noise suppressor illustrated
in Fig. 3.
[0013] Description will first be made of the principle of this invention so as
to facilitate an understanding of this invention. Herein, it is assumed that a speech
signal is given in the form of a sequence of digital speech signals to be subjected
to pre-processing and post-processing to suppress a noise signal from the speech signal.
In addition, the pre-processing is carried out in response to an input signal specified
by the digital speech signal sequence which is not encoded yet while the post-processing
is carried out in response to an input signal specified by the digital speech signal
sequence which is already decoded. Therefore, it is noted that the terms "digital
speech signal sequence" and "input signal" may each be used in two different meanings
hereinafter so as to cover both the pre-processing and the post-processing.
[0014] At any rate, the input signal includes the speech signal (namely, the digital speech
signal sequence) and the noise signal and may be therefore considered as a combination
of the digital speech signal sequence and the noise signal.
[0015] According to this invention, feature parameters are extracted from the input signal
and may be, for example, one or more selected from spectrum parameters representative
of features of a spectrum in the input signal, pitch prediction gains representative
of periodicity of the input signal, and the like. The feature parameters are used
to determine either a speech duration or a non-speech duration by comparing the feature
parameters with a threshold level.
[0016] Briefly, a preliminary sound source signal which specifies a sound source is obtained
by the use of the input signal and the feature parameters on the pre-processing and
the post-processing. Specifically, the preliminary sound source signal appears in
the form of an error signal which is produced on the pre-processing by allowing the
input signal to pass through an inverse filter controlled by the feature parameters.
[0017] On the other hand, on the post-processing, the preliminary sound source signal appears
in the form of a decoder output signal or a sequence of decoded signals which is decoded
by the use of the feature parameters.
[0018] Since the speech signal has an amplitude greater than the noise signal in the preliminary
sound source signal, it is possible to suppress the noise signal alone by comparing
an amplitude of the preliminary sound source signal with a predetermined threshold
level and to therefore attain a noise-suppressed signal. The noise-suppressed signal
is reproduced by the use of the feature parameters into a noise-free output signal
on the pre-processing or is produced as a noise-free decoded signal on the post-processing.
The noise-free output signal may be encoded by an encoder after the pre-processing
while the noise-free decoded signal may be converted into an audio signal after the
post-processing.
[0019] Noise suppression may be carried out only within a selected one of the speech duration
or the non-speech duration or within both the speech duration and the non-speech duration.
Thus, this invention makes it possible to suppress the noise signal on a waveform by the use
of the feature parameters and is applicable to both the pre-processing and the post-processing.
[0020] Referring to Fig. 1, a noise suppressor according to a first embodiment of this invention
is applicable to the pre-processing and is therefore supplied through an input terminal
10 with an input signal IN which includes a speech signal and a noise signal superposed
on the speech signal. As mentioned before, the speech signal is given in the form
of a sequence of digital speech signals. The input signal IN is given to a frame division
circuit 11 and is divided by the frame division circuit 11 into a plurality of frames
each of which has a length of, for example, 40 milliseconds. Each frame is further
subdivided by a subframe division circuit 12 into a plurality of subframes each of
which has a length of, for example, eight milliseconds.
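The frame and subframe division described above can be sketched as follows. This is only an illustration: the 8 kHz sampling rate, the function name split_into_blocks, and the policy of dropping a trailing partial block are assumptions that do not appear in the description.

```python
def split_into_blocks(samples, block_len):
    """Split a list of samples into consecutive blocks of block_len samples.

    A trailing partial block is dropped (an assumed policy).
    """
    n_blocks = len(samples) // block_len
    return [samples[i * block_len:(i + 1) * block_len] for i in range(n_blocks)]


SAMPLE_RATE_HZ = 8000           # assumed; the description states no sampling rate
FRAME_MS, SUBFRAME_MS = 40, 8   # example lengths given in the description

frame_len = SAMPLE_RATE_HZ * FRAME_MS // 1000        # samples per frame
subframe_len = SAMPLE_RATE_HZ * SUBFRAME_MS // 1000  # samples per subframe

signal = list(range(1000))      # stand-in for the input signal IN
frames = split_into_blocks(signal, frame_len)
subframes = [split_into_blocks(frame, subframe_len) for frame in frames]
```

With these example lengths each frame holds exactly five subframes.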
[0021] The input signal IN is divided into the subframes, as mentioned above, and is sent
in the form of a divided input signal sequence x(n) either at every frame or at every
subframe to a feature parameter calculator 15 on one hand and to a noise suppression
circuit 20 on the other hand. Herein, the divided input signal sequence x(n) may be
referred to as an internal input signal.
[0022] In the illustrated example, the feature parameter calculator 15 is supplied with
the internal input signal x(n) at every subframe. The feature parameter calculator
15 at first places a window to extract a piece of the internal input signal x(n) in
relation to each subframe. The window is longer than each subframe length and may
be, for example, 24 milliseconds.
[0023] Thereafter, the feature parameter calculator 15 calculates, as feature parameters,
spectrum parameters indicative of features of a spectrum in the input signal, pitch
prediction gains indicative of periodicity of the speech signal, and an average amplitude
in each subframe. In this event, average power may be calculated in the feature parameter
calculator 15. Such calculations of the feature parameters are known in the art and
will not be described any further. In any event, the feature parameters are produced
as feature parameter signals from the feature parameter calculator 15.
[0024] Herein, it is to be noted that the feature parameter calculator 15 shown in Fig.
1 calculates the spectrum parameters of a predetermined order which may be, for example,
a tenth order. In addition, the following description will be made on the assumption
that linear prediction coefficients a_i are used as the spectrum parameters. Although
such linear prediction coefficients
are calculated by using a well-known LPC analysis, Burg analysis, or the like, it
is assumed in connection with the illustrated example that the Burg analysis is used
to calculate the linear prediction coefficients. The Burg analysis is described in
detail in a book (pages 82 to 87) which is written by Nakamizo et al and which is
titled "Signal Analysis and System Identification" published by Corona Company Ltd,
Tokyo, in 1988. Accordingly, description will be omitted from the instant specification
as regards the Burg analysis.
[0025] Alternatively, the linear prediction coefficients may be also calculated by the use
of a covariance method or a correlation method.
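As an illustration of the correlation method mentioned above (not of the Burg analysis used in the illustrated example), linear prediction coefficients can be obtained from the autocorrelation of a windowed segment by the Levinson-Durbin recursion. The sketch assumes the convention that x(n) is predicted as the sum of a_i x(n-i); the function names are hypothetical, and a second-order example is used only to keep the check small.

```python
def autocorrelation(x, max_lag):
    """Return r[0..max_lag], where r[k] = sum over n of x[n] * x[n - k]."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x)))
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the normal equations for a[1..order], x(n) ~ sum_i a[i] * x(n - i)."""
    a = [0.0] * (order + 1)   # a[0] is unused
    err = r[0]                # prediction error energy at order 0
    for m in range(1, order + 1):
        if err <= 0.0:        # silent or perfectly predicted segment
            break
        acc = r[m] - sum(a[i] * r[m - i] for i in range(1, m))
        k = acc / err         # reflection coefficient of order m
        new_a = a[:]
        new_a[m] = k
        for i in range(1, m):
            new_a[i] = a[i] - k * a[m - i]
        a = new_a
        err *= (1.0 - k * k)
    return a[1:]


# A decaying exponential obeys x(n) = 0.9 * x(n - 1), so a_1 should be
# close to 0.9 and a_2 close to zero.
x = [0.9 ** n for n in range(200)]
coeffs = levinson_durbin(autocorrelation(x, 2), 2)
```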
[0026] As mentioned before, the pitch prediction gains are also calculated in the feature
parameter calculator 15. The pitch prediction gains are represented by P_g and are given by:

where T is a delay time representative of a pitch period; n, a sample number; and
N, a maximum sample number.
[0027] Instead of Equation (1), the pitch prediction gains P_g can be simply calculated
by the use of the following equation:

The average amplitude is represented by R and is given by:

Herein, it is readily possible to implement circuits for calculating the above-mentioned
linear prediction coefficients, the pitch prediction gains P_g, and the average amplitude
R by a combination of conventional circuit elements. Accordingly, specific circuits for
calculating the linear prediction coefficients, the pitch prediction gains P_g, and the
average amplitude R will not be described further.
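A minimal sketch of the two subframe measures, under assumed definitions: the pitch prediction gain is taken as the segment energy divided by the energy remaining after one-tap pitch prediction at the delay T, and the average amplitude as the mean absolute sample value. These are common definitions chosen for illustration and are not necessarily the forms of Equations (1) to (3); the function names and the handling of silent segments are likewise assumptions.

```python
def pitch_prediction_gain(x, start, n_samples, delay):
    """Segment energy over residual energy after one-tap pitch prediction.

    Requires start >= delay so that x[n - delay] is a real past sample.
    """
    seg = range(start, start + n_samples)
    energy = sum(x[n] ** 2 for n in seg)
    cross = sum(x[n] * x[n - delay] for n in seg)
    past = sum(x[n - delay] ** 2 for n in seg)
    if energy == 0 or past == 0:
        return 1.0                      # nothing to predict: no gain (assumed)
    residual = energy - cross * cross / past
    if residual <= 0:
        return float("inf")             # perfectly periodic segment
    return energy / residual


def average_amplitude(x, start, n_samples):
    """Mean absolute value of the samples in the segment."""
    return sum(abs(x[n]) for n in range(start, start + n_samples)) / n_samples


# A signal with a period of four samples is fully predictable at delay 4
# and not at all at delay 1.
x = [1, 0, -1, 0] * 20
gain_at_pitch = pitch_prediction_gain(x, 8, 16, 4)
gain_off_pitch = pitch_prediction_gain(x, 8, 16, 1)
amplitude = average_amplitude(x, 8, 16)
```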
[0028] Thus, the feature parameter calculator 15 supplies a speech detection circuit 25
and the noise suppression circuit 20 with the feature parameter signals representative
of the feature parameters, as mentioned above. In the illustrated example, the speech
detection circuit 25 detects or determines either the speech duration or the non-speech
duration of the speech signal in response to at least one of the feature parameters.
To this end, a wide variety of methods can be applied to determine the speech duration
or the non-speech duration. For example, the illustrated speech detection circuit
25 first smooths the pitch prediction gains P_g and the average amplitude R to obtain
smoothed pitch prediction gains P_g' and a smoothed average amplitude R' and thereafter
compares the smoothed pitch prediction gains P_g' and the smoothed average amplitude R'
with first and second threshold values Th1 and Th2, respectively.
[0029] The above-mentioned smoothing operation of the pitch prediction gains P_g and the
average amplitude R is carried out in accordance with the following equation:
where P is representative of the pitch prediction gains or the average amplitude to
be smoothed; δ is representative of a time constant for smoothing and takes a value
between 0 and 1, both exclusive; and P'_j and P'_{j-1} are representative of smoothed
values at time instants j and j-1.
[0030] As a result of comparison, when the smoothed pitch prediction gains P_g' and the
smoothed average amplitude R' are lower than the first and the second threshold
values Th1 and Th2, respectively, the speech detection circuit 25 judges that the
non-speech duration lasts in the internal input signal x(n). Otherwise, the speech
detection circuit 25 judges that the speech duration lasts in the internal input signal
x(n). Thus, the non-speech and the speech durations are detected by the speech detection
circuit 25. In the example, the first and the second threshold values Th1 and Th2
may be invariable or variable with time.
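The smoothing and threshold comparison can be sketched as below. The recursion P'_j = δ P'_{j-1} + (1 - δ) P is an assumed first-order form consistent with the symbols defined for Equation 4; which term δ weights, the initial smoothed values, and all names and numeric values are assumptions made for illustration.

```python
def smooth(previous, current, delta):
    """Assumed first-order smoothing: delta weights the previous smoothed value."""
    return delta * previous + (1.0 - delta) * current


def detect_speech(gains, amplitudes, th1, th2, delta=0.5):
    """Return per-subframe speech flags and smoothed average amplitudes.

    A subframe is judged non-speech only when both smoothed values are below
    their thresholds; otherwise it is judged speech.
    """
    smoothed_gain, smoothed_amp = gains[0], amplitudes[0]  # assumed initial state
    flags, smoothed_amps = [], []
    for gain, amp in zip(gains, amplitudes):
        smoothed_gain = smooth(smoothed_gain, gain, delta)
        smoothed_amp = smooth(smoothed_amp, amp, delta)
        is_non_speech = smoothed_gain < th1 and smoothed_amp < th2
        flags.append(not is_non_speech)
        smoothed_amps.append(smoothed_amp)
    return flags, smoothed_amps


flags, smoothed_amps = detect_speech(
    gains=[1.0, 1.0, 4.0, 4.0],
    amplitudes=[10.0, 10.0, 100.0, 100.0],
    th1=2.0,
    th2=50.0,
)
```

In this sketch the list of flags plays the role of the detection signal DT and smoothed_amps the role of the values delivered to the memory circuit 30.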
[0031] As mentioned before, the speech detection circuit 25 comprises a calculation circuit
for calculating the smoothed values (namely, the smoothed pitch prediction gains P_g'
and the smoothed average amplitude R') in accordance with Equation 4 and a comparator
unit for comparing the smoothed values with the first and the second threshold values
Th1 and Th2. As a result, the illustrated speech detection circuit 25 can produce
the smoothed average amplitude R' at every frame or at every subframe and a detection
signal DT representative of either the speech or the non-speech duration at every
frame or at every subframe.
[0032] The smoothed average amplitude R' is delivered to a memory circuit 30 while the detection
signal DT is sent to the noise suppression circuit 20.
[0033] Referring to Fig. 2 in addition to Fig. 1, the noise suppression circuit 20 is operable
to suppress the noise signal within at least one of the speech and the non-speech
durations. In Fig. 2, the noise suppression circuit 20 comprises an inverse filter
201 supplied with the internal input signal x(n) from the input terminal 10 through
the frame and the subframe division circuits 11 and 12. The feature parameters a_i
are also supplied from the feature parameter calculator 15 to the inverse filter
201. The inverse filter 201 carries out an inverse filtering operation to produce
an inverse-filtered signal e(n) which may be called a preliminary sound source signal
because the inverse-filtered signal e(n) specifies a sound source. Herein, the inverse-filtered
signal e(n) is given by:

where P represents an order of the inverse filter 201. Thus, the inverse-filtered
signal e(n) is dependent on the feature parameters and specifies the sound source.
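A minimal sketch of the inverse filtering, assuming the conventional linear prediction error form e(n) = x(n) - Σ a_i x(n-i) with i running from 1 to P. The sign convention, the treatment of samples before the start of the segment as zero, and the function name are assumptions.

```python
def inverse_filter(x, a):
    """Return the prediction error e(n) = x(n) - sum_i a[i-1] * x(n - i).

    a holds a_1 .. a_P; samples before the start of x are taken as zero.
    """
    order = len(a)
    e = []
    for n in range(len(x)):
        prediction = sum(a[i] * x[n - 1 - i]
                         for i in range(order) if n - 1 - i >= 0)
        e.append(x[n] - prediction)
    return e


# A sequence that halves at every sample is fully predicted by a_1 = 0.5,
# so only the first sample survives as excitation.
residual = inverse_filter([1.0, 0.5, 0.25, 0.125], [0.5])
```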
[0034] The inverse-filtered signal e(n) includes a speech signal component and a noise signal
component superposed on the speech signal component and appears in the form of a continuous
signal. The inverse filter 201 may be simply called a filter circuit.
[0035] Now, it is to be noted that the inverse-filtered signal e(n) is specified by a comparatively
large amplitude pulse within a portion of the speech signal component appearing in
the speech duration because the speech signal has a pitch. On the other hand, the
inverse-filtered signal e(n) exhibits a comparatively small amplitude within a portion
of the noise signal.
[0036] Accordingly, it is possible to suppress the noise signal by comparing the inverse-filtered
signal e(n) with a threshold level TH1.
[0037] More specifically, the noise suppression circuit 20 illustrated in Fig. 2 comprises
a threshold value calculation circuit 202 supplied with the smoothed average amplitude
R' which is calculated by the speech detection circuit 25 in accordance with Equation
4 and which is memorized into the memory circuit 30. The threshold value calculation
circuit 202 calculates the threshold value TH1 given by:
to produce a threshold value signal representative of the threshold value TH1, where
K2 is greater than zero. Thus, the threshold value TH1 is determined by the smoothed
average amplitude R' memorized in the memory circuit 30.
[0038] The inverse-filtered signal e(n) and the threshold value signal are sent to a suppressor
unit 203 which is also given the detection signal DT from the speech detection circuit
25. The suppressor unit 203 is put into an active state or into an inactive state
in response to the detection signal DT. In this event, the suppressor unit 203 may
suppress the noise signal within at least one of the speech duration and the non-speech
duration. In the illustrated example, it is assumed that the suppressor unit 203 is
put into the active state within the non-speech duration in response to the detection
signal DT, although the suppressor unit 203 may be put into the active state within
the speech duration.
[0039] In addition, the suppressor unit 203 compares the inverse-filtered signal e(n) with
the threshold value signal. The suppressor unit 203 attenuates the inverse-filtered
signal e(n) by a predetermined amount or renders the inverse-filtered signal e(n)
into zero when the inverse-filtered signal e(n) is smaller than the threshold value
TH1. As a result, the suppressor unit 203 produces a noise-suppressed signal e'(n) specified
by:

where K is greater than zero and smaller than unity.
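The threshold calculation and the suppression can be sketched together. The sketch assumes that the threshold is the product K2 · R', that the comparison is made on the absolute value of e(n), and that samples below the threshold are scaled by K; the function name and the numeric values are hypothetical.

```python
def suppress(e, smoothed_avg_amp, active, k2=1.0, k=0.5):
    """Attenuate low-amplitude samples of e(n) when the suppressor is active.

    active reflects the detection signal DT (for example, True within the
    non-speech duration). k2 > 0 scales the threshold and 0 < k < 1 is the
    attenuation factor; setting k to 0 would zero the samples instead.
    """
    if not active:
        return list(e)
    threshold = k2 * smoothed_avg_amp   # assumed form of the threshold TH1
    return [k * s if abs(s) < threshold else s for s in e]


suppressed = suppress([4.0, -0.5, 0.25, -3.0], smoothed_avg_amp=1.0, active=True)
untouched = suppress([4.0, -0.5, 0.25, -3.0], smoothed_avg_amp=1.0, active=False)
```

Large pulses such as pitch excitation pass unchanged, while small samples attributed to noise are attenuated.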
[0040] At any rate, a combination of the threshold value calculation circuit 202 and the
suppressor unit 203 serves to suppress the noise signal included in the inverse-filtered
signal e(n) and to produce the noise-suppressed signal e'(n) and may be collectively
called a noise suppression portion.
[0041] The noise-suppressed signal e'(n) is sent to a reproduction circuit 204 together
with the feature parameters a_i. The reproduction circuit 204 reproduces the noise-suppressed
signal e'(n) into a noise-suppressed speech signal x'(n) with reference to the feature
parameters a_i. In this event, the noise-suppressed speech signal x'(n) is given by:

[0042] The noise-suppressed speech signal x'(n) is delivered through an output terminal 35
of the noise suppression circuit 20 to an encoder (not shown) to be encoded. Thus,
the noise-suppressed speech signal x'(n) is produced during the pre-processing prior
to the encoding. Since the noise-suppression is carried out with reference to the
feature parameters of the input signal IN, a phase component of the noise signal can
also be suppressed in the above-mentioned example.
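The reproduction carried out in the reproduction circuit 204 can be sketched as an all-pole synthesis filter, assuming the recursion x'(n) = e'(n) + Σ a_i x'(n-i), which is the inverse of a prediction error filter of the form e(n) = x(n) - Σ a_i x(n-i). The sign convention, the zero initial state, and the function name are assumptions.

```python
def reproduce(excitation, a):
    """Return x'(n) = e'(n) + sum_i a[i-1] * x'(n - i), with zero initial state."""
    out = []
    for n, sample in enumerate(excitation):
        feedback = sum(a[i] * out[n - 1 - i]
                       for i in range(len(a)) if n - 1 - i >= 0)
        out.append(sample + feedback)
    return out


# A single excitation pulse through a first-order filter with a_1 = 0.5
# yields a sequence that halves at every sample.
speech = reproduce([1.0, 0.0, 0.0, 0.0], [0.5])
```

Because the same coefficients a_i shape the output, samples of e'(n) left unchanged by the suppressor reproduce the corresponding part of the input waveform.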
[0043] Referring to Fig. 3, a noise suppressor (depicted at 40) according to a second embodiment
of this invention is operable to carry out post-processing after decoding. To this
end, the illustrated noise suppressor 40 is connected to a decoder 45 which is supplied
as a decoder input signal or an input signal DIN with feature parameters of a speech
signal and an index signal related to a sound source. The decoder 45 itself may be
similar to that known in the art and produces a sequence of decoded sound source signals
v(n) representative of a sound source together with the feature parameters and the
index signal, in a known manner. The decoded sound source signal sequence v(n) and
the feature parameters and the index signal are sent to the noise suppressor 40.
[0044] In the noise suppressor 40, the decoded sound source signal sequence v(n) is given
to a noise suppression circuit which is depicted at 50 and which is operable in a
manner to be described later in detail. Furthermore, the illustrated noise suppressor
40 comprises a speech detection circuit 25' and a memory circuit 30' which may be
similar to those illustrated in Fig. 1, respectively. From this fact, it is readily
understood that the speech detection circuit 25' is operated in response to the feature
parameters, such as the spectrum parameters, the pitch prediction gains P_g, and the
average amplitude R, to detect either the speech duration or the non-speech
duration. Thus, the speech detection circuit 25' supplies the noise suppression circuit
50 with a detection signal DT' indicative of either the speech duration or the non-speech
duration. As in Fig. 1, the speech detection circuit 25' calculates the smoothed
average amplitude R' which is stored in the memory circuit 30'.
[0045] Referring to Fig. 4 together with Fig. 3, the noise suppression circuit 50 comprises
a threshold calculator 501 supplied with the smoothed average amplitude R' to calculate
a threshold value signal representative of a threshold value TH2, as in the threshold
value calculation circuit 202. The threshold value signal is given to a suppressor
unit 502 together with the detection signal DT'.
[0046] The suppressor unit 502 is put into an active state within at least one of the speech
and the non-speech durations. Herein, it is assumed that the illustrated suppressor
unit 502 becomes active only within the non-speech duration, as in the suppressor
unit 203. In any event, the suppressor unit 502 produces a sequence of noise-suppressed
sound source signals v'(n) given by:

where K is identical with K shown in Equation 7. The threshold value TH2 may be equal
to that of Equation 7.
[0047] Turning back to Fig. 3, the noise-suppressed sound source signals v'(n) are sent
to a speech reproducing circuit 52 which is supplied with the feature parameters from
the decoder 45. The speech reproducing circuit 52 reproduces the noise-suppressed
sound source signals v'(n) into a reproduced speech signal with reference to the feature parameters
in a known manner. The reproduced speech signal is delivered to a loudspeaker or the
like.
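The post-processing chain of Figs. 3 and 4 can be sketched per subframe as follows. The sketch assumes that the threshold TH2 is K2 · R', that the suppressor is active only within the non-speech duration, that the speech reproducing circuit 52 is an all-pole synthesis filter, and that the filter memory is carried across subframes; the function name, argument layout, and numeric values are hypothetical.

```python
def postprocess(subframes, lpc_sets, speech_flags, smoothed_amps, k2=1.0, k=0.5):
    """Suppress noise in decoded source subframes v(n), then reproduce speech.

    subframes     : lists of decoded sound source samples v(n)
    lpc_sets      : coefficients a_1 .. a_P for each subframe
    speech_flags  : True for speech, False for non-speech (detection signal DT')
    smoothed_amps : smoothed average amplitude R' for each subframe
    """
    output = []
    for v, a, is_speech, r_smooth in zip(subframes, lpc_sets,
                                         speech_flags, smoothed_amps):
        if not is_speech:                      # active in non-speech durations
            th2 = k2 * r_smooth                # assumed form of the threshold TH2
            v = [k * s if abs(s) < th2 else s for s in v]
        for sample in v:
            # Past outputs, most recent first; shorter at the very start.
            history = output[-1:-len(a) - 1:-1]
            output.append(sample + sum(c * y for c, y in zip(a, history)))
    return output


reproduced = postprocess(
    subframes=[[1.0, 0.0], [0.25, 0.0]],
    lpc_sets=[[0.5], [0.5]],
    speech_flags=[True, False],
    smoothed_amps=[1.0, 1.0],
)
```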
[0048] Thus, the noise suppressor according to this invention can be used in post-processing
the decoded sound source signals v(n) in the above-mentioned manner.
[0049] While this invention has thus far been described in conjunction with a few embodiments
thereof, it will readily be possible for those skilled in the art to put this invention
into practice in various other manners. For example, the feature parameters need not
be restricted to the linear prediction coefficients but may be any other parameters
known in the art. In addition, it is possible to use parameters other than the
average amplitude and the pitch prediction gains. The speech detection circuit 25
or 25' may be operated in a manner different from that illustrated in Figs. 1 and
3.
[0050] Moreover, the post-processing can be carried out to suppress the noise signal even
when the feature parameters are not transmitted from a transmitter and are not received
by the decoder 45 (Fig. 3). In this case, the speech signal is first reproduced by
a receiver to form a reproduced speech waveform, and feature parameters are thereafter
calculated from the reproduced speech waveform in the manner mentioned in conjunction
with Fig. 1. Thus, the calculated feature parameters can be used to suppress the noise
signal in the above-mentioned manner.
[0051] With this structure, the noise suppression is possible during both the pre-processing
and the post-processing of the speech signal. Moreover, it is also possible to suppress
not only the noise signal appearing within the non-speech duration but also a non-speech
signal superposed on the speech signal appearing within the speech duration. Such
suppression can be accomplished on the waveform.
1. A noise suppressor supplied with an internal input signal which includes both a speech
signal and a noise signal to produce an output signal substantially free from said
noise signal, said speech signal being specified by a sound source, said noise suppressor
comprising feature parameter calculating means supplied with said internal input signal
for calculating a feature parameter specifying a feature of said speech signal to
produce a feature parameter signal representative of said feature parameter, and noise
suppressing means coupled to said feature parameter calculating means for suppressing
said noise signal from said internal input signal to produce said output signal, characterized
in that said noise suppressing means comprises:
filter means supplied with said feature parameter signal and said internal input
signal for filtering said internal input signal to produce a filtered signal which
is dependent on said feature parameter and which specifies said sound source;
a suppression unit coupled to said filter means for suppressing the noise signal
from said filtered signal by estimating said noise signal to produce a noise-suppressed
signal; and
output means for producing said noise-suppressed signal as said output signal.
2. A noise suppressor as claimed in Claim 1, said speech signal being divisible into
a speech duration and a non-speech duration, characterized in that said noise suppressor
further comprises:
speech detection means coupled to said feature parameter calculating means for
detecting said speech and said non-speech durations in response to the feature parameter
signal to produce a detection signal representative of either one of said speech and
said non-speech durations;
average calculation means coupled to said speech detection means for calculating
an average value of either power or an amplitude within said non-speech duration to
produce an average signal representative of said average value;
said noise suppressing means further comprising:
threshold level calculating means for calculating a threshold level from said average
signal to supply said suppression unit with a threshold level signal representative
of said threshold level, to make said suppression unit compare said filtered signal
with said threshold level signal, and to make said suppression unit suppress said
noise signal.
3. A noise suppressor as claimed in Claim 2, characterized in that said suppression unit
is further supplied with said detection signal to be put into an active state within
at least one of said speech and said non-speech durations.
4. A noise suppressor as claimed in Claim 1, 2 or 3, characterized in that said feature
parameter calculating means calculates, as said feature parameter, spectrum parameters
representative of a spectrum of said internal input signal, a pitch period of said
internal input signal, and an average amplitude of said internal input signal.
5. A noise suppressor supplied with an internal input signal which includes both a speech
signal and a noise signal to produce an output signal substantially free from said
noise signal, said internal input signal being divided into a sequence of frames each
of which lasts for a predetermined interval of time, said speech signal being generated
by a sound source and having a spectrum specified by at least one feature parameter
and being divisible into a speech duration and a non-speech duration, said noise suppressor
comprising feature parameter calculating means for calculating said at least one feature
parameter to produce a feature parameter signal representative of said at least one
feature parameter and speech detection means coupled to said feature parameter calculating
means for detecting said speech and said non-speech durations in response to the feature
parameter signal to produce a detection signal representative of either one of said
speech and said non-speech durations, characterized by:
average memory means coupled to said speech detection means for memorizing an average
value of either one of power and an amplitude of said internal input signal within
said non-speech duration to produce an average signal representative of said average
value;
noise suppressing means coupled to said feature parameter calculating means, said
speech detection means, and said average memory means for suppressing said noise
signal with reference to said internal input signal, said feature parameter signal,
said detection signal, and said average signal to produce
said output signal,
said noise suppressing means comprising:
filter means for filtering said internal input signal into a filtered signal which
specifies said sound source;
suppressing means supplied with said filtered signal, said detection signal, and
said average signal for suppressing said noise signal from said filtered signal to
produce a noise-suppressed signal; and
means for producing said noise-suppressed signal as said output signal.
6. A noise suppressor operable in response to a feature parameter signal specifying a
speech signal and a sound source signal representative of a sound source of said speech
signal to suppress a noise signal from said speech signal and to produce an output
signal substantially free from said noise signal, said speech signal being divisible
into a speech duration and a non-speech duration, characterized in that said noise
suppressor comprises:
a noise suppressing circuit for suppressing said noise signal from said sound source
signal with reference to said feature parameter signal to produce a noise-suppressed
signal; and
means for producing said noise-suppressed signal as said output signal.
7. A noise suppressor as claimed in Claim 6, characterized by:
speech detection means supplied with said feature parameter signal for detecting
said speech and said non-speech durations to produce a detection signal representative
of either one of said speech and said non-speech durations; and
average memory means coupled to said speech detection means for memorizing an average
value of either one of power and an amplitude of said speech signal within said non-speech
duration to produce an average signal representative of said average value;
said noise suppressing circuit suppressing said noise signal with reference to
said average signal also.