Technical Field
[0001] The present invention relates to a low bit rate speech signal coder used in applications
such as mobile communication systems that transmit coded speech signals, and speech
recorders.
Background Art
[0002] In the fields of digital mobile communications and speech storage, speech coders
are used which perform coding on speech information at a low bit rate for effective
utilization of radio frequency and recording media. Such conventional technologies
include the CS-ACELP coding system of the ITU-T Recommendation G.729 ("Coding of speech
at 8kbit/s using conjugate-structure algebraic-code-excited linear-prediction(CS-ACELP)")
and the CS-ACELP coding system with DTX (Discontinuous Transmission) control of the
same ITU-T Recommendation G.729 Annex B ("A silence compression scheme for G.729 optimized
for terminals conforming to Recommendation V.70").
[0003] FIG.1 is a block diagram showing a configuration of a coder based on the conventional
CS-ACELP coding system. In FIG.1, LPC analyzer/quantizer 1 performs an LPC (linear
prediction) analysis and quantization on an input speech signal and outputs LPC coefficients
and LPC quantization codes.
[0004] Then, an adaptive excitation signal and a fixed excitation signal extracted from adaptive
excitation codebook 2 and fixed excitation codebook 3 are each multiplied by a gain extracted
from gain codebook 4, added up, and subjected to speech synthesis by LPC synthesis
filter 7. An error signal between the synthesized signal and the input signal
is weighted by perceptual weighting filter 9, and the adaptive excitation code, fixed
excitation code and gain code which minimize the weighted error signal are output
together with the LPC quantization code as coded data. In FIG.1, reference numeral
5 denotes a multiplier, reference numeral 6 denotes an adder and reference numeral
8 denotes a subtractor.
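The search described in [0004] can be illustrated by the following minimal Python sketch. It
is not the G.729 procedure: it assumes tiny hypothetical codebooks, an exhaustive joint search
instead of a sequential one, and it omits the perceptual weighting filter, so the error is the
plain squared error. The names lpc_synthesis and search_codes are introduced here for
illustration only.

```python
from itertools import product


def lpc_synthesis(excitation, lpc):
    """All-pole synthesis filter: s[n] = e[n] - sum_k lpc[k] * s[n-1-k]."""
    out = []
    for n, e in enumerate(excitation):
        acc = e
        for k, a in enumerate(lpc):
            if n - 1 - k >= 0:
                acc -= a * out[n - 1 - k]
        out.append(acc)
    return out


def search_codes(target, lpc, adaptive_cb, fixed_cb, gain_cb):
    """Return (error, adaptive index, fixed index, gain index) with minimum error.

    Assumption: perceptual weighting is omitted, so the error is the squared
    difference between the target frame and the synthesized frame.
    """
    best = None
    for (ia, av), (i_f, fv), (ig, (ga, gf)) in product(
            enumerate(adaptive_cb), enumerate(fixed_cb), enumerate(gain_cb)):
        # Gain-scaled sum of both excitations (multiplier 5 and adder 6 in FIG.1).
        excitation = [ga * a + gf * f for a, f in zip(av, fv)]
        synth = lpc_synthesis(excitation, lpc)
        # Error between input and synthesized signal (subtractor 8 in FIG.1).
        err = sum((t - s) ** 2 for t, s in zip(target, synth))
        if best is None or err < best[0]:
            best = (err, ia, i_f, ig)
    return best
```

The three returned indices correspond to the adaptive excitation code, fixed excitation code
and gain code that would be output together with the LPC quantization code.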
[0005] FIG.2 is a block diagram showing a configuration of a coder based on the conventional
CS-ACELP coding system with DTX control. First, speech/non-speech decision section
11 decides whether the input signal is in a speech segment or a non-speech segment
(segment with only background noise). In the case where speech/non-speech
decision section 11 decides that the input signal is in a speech segment, CS-ACELP
speech coder 12 performs speech coding on the speech segment. CS-ACELP speech coder
12 has a configuration shown in FIG.1.
[0006] On the other hand, in the case where speech/non-speech decision section 11 decides
that the input signal is in a non-speech segment, non-speech segment coder 13 performs
coding. This non-speech segment coder 13 calculates, from the input signal, LPC
coefficients and LPC prediction residual energy similar to those used in coding of the
speech segment, and outputs them as coded data of the non-speech segment.
[0007] DTX controller & multiplexer 14 controls which data is to be sent, multiplexes the
outputs of speech/non-speech decision section 11, CS-ACELP speech coder 12 and
non-speech segment coder 13, and outputs the result as transmission data.
[0008] However, the conventional CS-ACELP coder above performs coding at a bit rate as low
as 8 kbps by exploiting speech-specific redundancy. Therefore, while high-quality coding
is possible in the case where a clean speech signal without superimposed background
noise is input, the conventional CS-ACELP coder involves a problem that the quality of
the decoded signal deteriorates when a speech signal with surrounding background noise
superimposed thereon is input and coded.
[0009] Furthermore, the conventional CS-ACELP coder with DTX control above performs coding
using the CS-ACELP coder only on the speech segment and performs coding on the
non-speech segment (segment with only noise) using a dedicated non-speech segment coder
at a lower bit rate than that of the speech coder, and thereby reduces the average bit
rate for transmission. However, the non-speech segment coder performs coding using a
signal model similar to that of the speech coder, that is, a model which generates a
decoded signal by driving an AR type synthesis filter (LPC synthesis filter) with a
random signal at short intervals (approximately 10 to 50 ms). The conventional CS-ACELP
coder with DTX control therefore involves a problem that the quality of the decoded
signal deteriorates for a speech signal with background noise superimposed thereon, as
in the case of the conventional CS-ACELP coder above.
Disclosure of Invention
[0010] It is an object of the present invention to provide a speech signal coder and decoder
with little deterioration of the quality of a decoded signal also for a speech signal
with background noise superimposed thereon, capable of reducing an average bit rate
necessary for transmission.
[0011] The gist of the present invention is to provide a speech signal coder and decoder
with little deterioration of the quality of the decoded signal even for a speech signal
with background noise superimposed thereon, and capable of reducing the average bit
rate necessary for transmission, by calculating statistical characteristic quantities
of an input signal in a non-speech segment (segment with only noise), storing
information on a noise model that can express the statistical characteristic quantities
of an input noise signal, detecting whether a noise model parameter expressing the
input noise signal has changed or not, and updating the noise model.
Brief Description of Drawings
[0012]
FIG.1 is a block diagram showing a configuration of a conventional speech signal coder;
FIG.2 is a block diagram showing a configuration of a conventional speech signal
coder with DTX control;
FIG.3 is a block diagram showing a configuration of a radio communication system equipped
with a speech signal coder and speech signal decoder according to an embodiment of
the present invention;
FIG.4 is a block diagram showing a configuration of a speech signal coder according
to Embodiment 1 of the present invention;
FIG.5 is a block diagram showing a configuration of a noise signal coder according
to Embodiment 1 of the present invention;
FIG.6 is a block diagram showing a configuration of a speech signal decoder according
to Embodiment 1 of the present invention;
FIG.7 is a block diagram showing a configuration of a noise signal generator of the
speech signal decoder according to Embodiment 1 of the present invention;
FIG.8 is a flow chart showing a processing flow of a speech signal coding method according
to Embodiment 1 of the present invention;
FIG.9 is a flow chart showing a processing flow of a noise signal coding method according
to Embodiment 1 of the present invention;
FIG.10 is a block diagram showing a configuration of a speech signal coder according
to Embodiment 2 of the present invention;
FIG.11 is a block diagram showing a configuration of a speech signal decoder according
to Embodiment 2 of the present invention;
FIG.12 is a flow chart showing a processing flow of a speech signal coding method
according to Embodiment 2 of the present invention;
FIG.13 is a block diagram showing a configuration of a speech signal coder according
to Embodiment 3 of the present invention; and
FIG.14 is a flow chart showing a processing flow of a speech signal coding method
according to Embodiment 3 of the present invention.
Best Mode for Carrying out the Invention
[0013] With reference now to the attached drawings, embodiments of the present invention
will be explained in detail below.
(Embodiment 1)
[0014] FIG.3 is a block diagram showing a configuration of a radio communication apparatus
equipped with a speech signal coder according to Embodiment 1 of the present invention.
[0015] In this radio communication apparatus, a speech signal is converted to an electric
analog signal by speech input apparatus 101 such as a microphone on the transmitting
side and output to A/D converter 102. The analog speech signal is converted to a digital
speech signal by A/D converter 102 and output to speech coder 103. Speech coder 103
performs speech coding processing on the digital speech signal and outputs the coded
information to modulation/demodulation section 104. Modulation/demodulation section
104 digitally modulates the coded speech signal and sends it to radio transmission section
105. Radio transmission section 105 performs predetermined radio transmission processing
on the modulated signal. This signal is sent out via antenna 106.
[0016] On the other hand, on the receiving side of the radio communication apparatus, a
reception signal received from antenna 107 is subjected to predetermined radio reception
processing by radio reception section 108 and sent to modulation/demodulation section
104. Modulation/demodulation section 104 performs demodulation processing on the reception
signal and outputs the demodulated signal to speech decoding section 109. Speech decoding
section 109 performs decoding processing on the demodulated signal, obtains a digital
decoded speech signal and outputs the digital decoded speech signal to D/A converter
110. D/A converter 110 converts the digital decoded speech signal output from speech
decoding section 109 to an analog speech signal and outputs it to speech output apparatus
111 such as a speaker. Finally, speech output apparatus 111 converts the electrical
analog speech signal to speech sound and outputs it.
[0017] Speech coding section 103 shown in FIG.3 has a configuration shown in FIG.4. FIG.4
is a block diagram showing a configuration of the speech coding section according
to Embodiment 1 of the present invention.
[0018] Speech/non-speech decision section 201 decides whether an input signal is in a speech
segment or a non-speech segment (segment with only noise) and outputs
the decision result to DTX controller & multiplexer 204. Speech/non-speech decision
section 201 can be of any type; the decision is generally made using instantaneous
values or amounts of change of a plurality of parameters such as power, spectrum and
pitch period of the input signal.
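As one possible illustration of such a decision, the following Python sketch uses only the
instantaneous frame power compared against a noise floor. This is a deliberately simplified
assumption: the text allows any decision method and mentions spectrum and pitch period as
further parameters, which are not used here. The function names, the noise floor argument and
the 6 dB margin are hypothetical.

```python
import math


def frame_power_db(frame):
    """Mean power of one frame in dB (a small offset avoids log of zero)."""
    power = sum(x * x for x in frame) / len(frame)
    return 10.0 * math.log10(power + 1e-12)


def is_speech(frame, noise_floor_db, margin_db=6.0):
    """Decide 'speech' when the frame power exceeds the noise floor by a margin.

    Assumption: power is the only parameter; a practical decision section would
    typically combine several parameters and their amounts of change.
    """
    return frame_power_db(frame) > noise_floor_db + margin_db
```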
[0019] In the case where the result of decision by speech/non-speech decision section 201
shows that the input signal is in a speech segment, speech coder 202 performs speech
coding on the input signal in the speech segment, which includes the speech signal
and noise signal, and outputs the coded data to DTX controller & multiplexer 204. This
speech coder 202 is a coder for the speech segment and any coder can be used as long
as the coder can perform efficient coding on speech sound.
[0020] On the other hand, in the case where the result of decision by speech/non-speech
decision section 201 shows that the input signal is in a non-speech segment,
noise signal coder 203 performs noise signal coding on the input signal in
the non-speech segment, which includes only a noise signal, and outputs information
on a noise model that expresses the input noise signal and a flag indicating whether
the noise model should be updated or not to DTX controller & multiplexer 204. Finally,
DTX controller & multiplexer 204 controls information to be sent as transmission data
using the outputs from speech/non-speech decision section 201, speech coder 202 and
noise signal coder 203, multiplexes the transmission information and outputs it as
transmission data.
[0021] Noise signal coder 203 in FIG.4 has a configuration shown in FIG.5. FIG.5 is a block
diagram showing a configuration of the noise signal coder according to Embodiment
1 of the present invention.
[0022] Noise signal analysis section 301 performs a signal analysis on a noise signal input
at certain intervals and calculates analysis parameters regarding the noise signal.
The analysis parameters extracted are parameters necessary to express statistical
characteristic quantities of the input signal, such as a short-time spectrum calculated
through FFT (Fast Fourier Transform) on a short-segment signal, input power, LPC
spectrum parameters, etc.
[0023] Then, noise model variation detection section 303 detects whether a noise model parameter
that should express the currently input noise signal has changed from the noise model
parameter retained in noise model storage section 302 or not.
[0024] Here, the noise model parameter refers to information on a noise model that can express
statistical characteristic quantities regarding an input noise signal, for example,
information that expresses statistical characteristic quantities such as average spectrum
of short-time spectra, variance, etc. using a statistical model such as HMM.
[0025] Specifically, noise model variation detection section 303 decides whether the analysis
parameter for the current input signal obtained from noise signal analysis section 301
is appropriate or not as an output of the stored noise model, which expresses preceding
input signals (for example, in the case of an HMM, whether the probability that the
model outputs the analysis parameter for the current input signal is equal to or greater
than a specified value or not). In the case where it is decided that the noise
model parameter that should express the currently input noise signal has changed from
the stored noise model, noise model variation detection section 303 outputs a flag
indicating whether the noise model should be updated or not (noise model updating flag)
and the information to be updated (update information) to noise model updating section 304.
[0026] The external updating enable flag is a flag that instructs from outside whether
updating of the noise model is enabled or not. Updating of the noise model is disabled
in the case where the speech coder of the present invention, which will be described
later, is prevented from sending noise model parameters, for example, during a period
in which coded data of the speech segment is sent.
[0027] Then, when the noise model updating flag indicates updating, noise model updating
section 304 outputs only the updated noise model parameters, or only the parts that have
changed from the noise model parameters previously stored in noise model storage section
302, and at the same time updates noise model storage section 302 using the output
information. On the other hand, when the noise model updating flag indicates no updating,
noise model updating section 304 neither updates the model nor outputs updating information.
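The detection and updating of [0023] to [0027] can be sketched as follows. This sketch rests
on several assumptions that are not in the text: the noise model is a single diagonal Gaussian
over the analysis parameters (standing in for a statistical model such as an HMM), the
"specified value" is a threshold on the average log-likelihood, and the update simply replaces
the mean of every bin that differs by more than a tolerance. The names NoiseModel,
encode_noise_frame, threshold and tol are hypothetical.

```python
import math


class NoiseModel:
    """Stored noise model: per-bin mean and variance of the analysis parameters."""

    def __init__(self, mean, var):
        self.mean = list(mean)
        self.var = list(var)

    def log_likelihood(self, params):
        """Average per-bin Gaussian log-likelihood of the current parameters."""
        total = 0.0
        for x, m, v in zip(params, self.mean, self.var):
            total += -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
        return total / len(params)


def encode_noise_frame(model, params, threshold, update_enabled=True, tol=0.0):
    """Return (updating flag, update information) and update the stored model.

    The flag is False when updating is externally disabled or when the stored
    model still explains the current parameters well enough.
    """
    if not update_enabled or model.log_likelihood(params) >= threshold:
        return False, {}
    changed = {}
    for i, (x, m) in enumerate(zip(params, model.mean)):
        if abs(x - m) > tol:
            changed[i] = x        # only the changed parts are output
            model.mean[i] = x     # storage is updated with the same information
    return bool(changed), changed
```

Because the same update information is applied to the stored model and sent to the decoder,
both sides would keep identical noise models under these assumptions.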
[0028] Next, speech decoding section 109 shown in FIG.3 has a configuration shown in FIG.6.
FIG.6 is a block diagram showing a configuration of the speech decoder according to
Embodiment 1 of the present invention.
[0029] Separator & DTX controller 401 receives, as reception data, the transmission data
obtained by coding the input signal on the coding side, and separates this reception data
into the speech coded data or noise model parameter, the speech/non-speech decision flag
and the noise model updating flag necessary for speech decoding and noise signal generation.
[0030] Then, in the case where the speech/non-speech decision flag indicates the speech
segment, speech decoder 402 performs speech decoding from the speech coded data and
outputs the decoded speech to output switch 404.
[0031] On the other hand, in the case where the speech/non-speech decision flag indicates
the non-speech segment, noise signal generator 403 generates a noise
signal from the noise model parameter and noise model updating flag and outputs the
noise signal to output switch 404. Output switch 404 switches between the output of
speech decoder 402 and the output of noise signal generator 403 according to the
speech/non-speech decision flag and outputs the selected signal as an output signal.
[0032] Noise signal generator 403 in FIG.6 has a configuration shown in FIG.7. FIG.7 is
a block diagram showing a configuration of the noise signal generator of the speech
decoder according to Embodiment 1 of the present invention.
[0033] The noise model updating flag and noise model parameter (in the case of model updating)
output from noise signal coder 203 shown in FIG.5 are input to noise model updating
section 501. In the case where the noise model updating flag indicates updating, noise
model updating section 501 updates the noise model using the input noise model parameter
and the previous noise model parameter retained in noise model storage section 502
and newly stores the updated noise model parameter in noise model storage section
502.
[0034] Noise signal generator 503 generates and outputs a noise signal based on the information
in noise model storage section 502. The noise signal is generated based on the model
information, which expresses statistical characteristic quantities, so that the
generated noise signal is appropriate as an output of the model. For example,
in the case where an HMM is used as the statistical model, noise signal generator 503
stochastically outputs the signal parameters (for example, short-time spectra) necessary
for generation according to the state transition probability, parameter output
probability, etc. and generates and outputs a noise signal based thereupon.
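A minimal Python sketch of stochastic noise generation from stored model information is given
below. It assumes, for illustration only, that the model holds a per-bin mean and variance of
the short-time magnitude spectrum (a single-state simplification of an HMM) and that the phase
of each bin is drawn uniformly at random. The function name and its arguments are hypothetical.

```python
import math
import random


def generate_noise_frame(mean_mag, var_mag, frame_len, rng):
    """Draw one noise frame whose spectrum follows the stored statistics.

    Bin k (starting at 1) contributes a cosine with a Gaussian-distributed
    amplitude and a uniformly distributed phase.
    """
    frame = [0.0] * frame_len
    for k, (m, v) in enumerate(zip(mean_mag, var_mag), start=1):
        amp = max(0.0, rng.gauss(m, math.sqrt(v)))   # stochastic spectral amplitude
        phase = rng.uniform(0.0, 2.0 * math.pi)      # random phase
        for n in range(frame_len):
            frame[n] += amp * math.cos(2.0 * math.pi * k * n / frame_len + phase)
    return frame
```

Passing a seeded generator such as random.Random(0) as rng makes the output reproducible,
which is convenient when checking the sketch.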
[0035] Then, operations of the speech coder and speech decoder in the configurations above
will be explained. FIG.8 is a flow chart showing a processing flow of the speech signal
coding method according to Embodiment 1. In this method, suppose the processing shown
in FIG.8 is repeated for every frame of a certain short segment (for example, approximately
10 to 50 ms).
[0036] First, in step (hereinafter referred to as "ST") 101, speech signals are input frame
by frame. Then, in ST102, a speech/non-speech decision is made on the input signal
and the decision result is output. In the case where the decision result in ST103 is
"speech", in ST104, speech coding processing is performed on the input speech signal and
the coded data is output.
[0037] On the other hand, in the case where the decision result in ST103 is "non-speech",
in ST105, the noise signal coder performs noise signal coding processing on the input
signal and outputs information on a noise model that expresses the input noise signal
and a flag indicating whether the noise model should be updated or not. The coding
processing on a noise signal will be described later.
[0038] Then, in ST106, information to be sent as transmission data is controlled and transmission
information is multiplexed using the output obtained as a result of speech/non-speech
decision, speech coding processing and noise signal coding processing and finally
in ST107, this data is output as transmission data.
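The per-frame flow of FIG.8 can be summarized by the following Python sketch. The speech
decision, speech coder and noise signal coder are passed in as callables because the text
leaves their internals open; the dictionary used as the multiplexed transmission data and all
key names are assumptions made for illustration.

```python
def encode_frame(frame, is_speech, speech_coder, noise_coder):
    """Process one input frame and return the multiplexed transmission data."""
    speech_flag = is_speech(frame)                     # ST102, ST103
    packet = {"speech": speech_flag}
    if speech_flag:
        packet["speech_data"] = speech_coder(frame)    # ST104
    else:
        update_flag, update_info = noise_coder(frame)  # ST105
        packet["noise_update"] = update_flag
        if update_flag:
            # Noise model information is sent only when the model has changed.
            packet["noise_params"] = update_info
    return packet                                      # ST106, ST107
```

In a non-speech frame whose noise model is unchanged, only the two flags are carried, which is
where the reduction of the average bit rate comes from.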
[0039] FIG.9 is a flow chart showing a processing flow of the noise signal coding method
of the speech signal coding method according to this embodiment. According to this
method, the processing shown in FIG.9 is repeated for every frame of a fixed short
segment (for example, approximately 10 to 50 ms).
[0040] In ST201, a noise signal is input frame by frame. Then, in ST202, a signal analysis
is made on the noise signal frame by frame and analysis parameters for the noise signal
are calculated. Then, in ST203, it is detected from the analysis parameters whether
the noise model has changed or not. In the case where it is decided in ST204 that the
noise model has changed, in ST205, a flag indicating that the noise model should be
updated (updating) and the information to be updated (updating information) are output
and in ST206, noise model storage section 302 is updated using the output information.
[0041] On the other hand, in the case where it is decided in ST204 that the noise model
has not changed, in ST207, only a flag indicating that the noise model should not be
updated (no updating) is output. In the case where the external updating enable
flag, which is separately input from the outside, is "disabled" in ST203, it is decided
that the model has not changed and no noise model parameter is sent.
[0042] In this way, according to the noise signal coding method of this embodiment,
by modeling a noise signal with a noise model that can express it with statistical
characteristic quantities, it is possible to generate a decoded signal with little
perceptual deterioration with respect to a background noise signal. Moreover, since there
is no need for faithful coding of input signal waveforms and transmission is performed
only in segments where the noise model parameters corresponding to the input signal have
changed, it is possible to provide low bit rate, highly efficient coding.
[0043] Furthermore, the speech signal coding method according to this embodiment provides
high quality, highly efficient coding even in a background noise environment by performing
coding in a speech segment using a speech coder capable of coding a speech signal
with high quality and performing coding in a non-speech segment using
a noise signal coder with high efficiency and little perceptual deterioration.
(Embodiment 2)
[0044] FIG.10 is a block diagram showing a configuration of a speech signal coding section
according to Embodiment 2 of the present invention.
[0045] In this speech signal coding section 103, speech/noise signal separator 801 separates
an input speech signal into a speech signal and a background noise signal superimposed
on the speech signal. Speech/noise signal separator 801 can be of any type. Several
separation methods are available, such as a method called "spectrum
subtraction", which separates an input signal into a speech signal with the noise signal
suppressed and the noise signal by subtracting a noise spectrum from the input signal
in the frequency domain, and a method of separating speech and noise signals
using input signals from a plurality of signal input devices.
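The spectrum subtraction mentioned above can be sketched in Python as follows. The sketch
operates on magnitude spectra that are assumed to have been computed already, assumes that an
estimate of the noise spectrum is available, and applies a spectral floor so that no bin
becomes negative. The function name, the floor parameter and its default value are hypothetical.

```python
def spectral_subtraction(input_mag, noise_mag, floor=0.05):
    """Split an input magnitude spectrum into speech and noise estimates.

    Each speech bin is the input bin minus the estimated noise bin, but never
    less than a small fraction of the input bin. The noise estimate is what
    remains of the input after the speech estimate is removed.
    """
    speech_mag = []
    noise_part = []
    for x, n in zip(input_mag, noise_mag):
        s = max(x - n, floor * x)
        speech_mag.append(s)
        noise_part.append(x - s)
    return speech_mag, noise_part
```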
[0046] Then, speech/non-speech decision section 802 decides from the speech signal after
the separation obtained from speech/noise signal separator 801 whether the signal
is in a speech segment or a non-speech segment (segment with only noise)
and outputs the decision result to speech coder 803 and DTX controller & multiplexer
805. It is also possible to make this decision using the input signal before separation.
Speech/non-speech decision section 802 can be of any type. This decision is generally
made using instantaneous values or amounts of variation of a plurality of parameters
such as power, spectrum and pitch period of the input signal.
[0047] In the case where the decision result of speech/non-speech decision section 802 shows
"speech", speech coder 803 performs speech signal coding on the speech signal after
the separation obtained from speech/noise signal separator 801 only in the speech
segment and outputs the coded data to DTX controller & multiplexer 805. This speech
coder 803 is a coder for the speech segment and any coder can be used as long as the
coder can perform efficient coding on speech sound.
[0048] On the other hand, noise signal coder 804 performs noise signal coding on the noise
signal after the separation obtained from speech/noise signal separator 801 over the
entire segment and outputs information on the noise model that expresses the input
noise signal and a flag indicating whether the noise model should be updated or not.
Noise signal coder 804 has the configuration shown in FIG.5 explained in Embodiment 1.
[0049] Note that, in the case where the speech/non-speech decision result indicates "speech",
the speech/non-speech decision result flag input to noise signal coder 804 serves
as the noise model updating disable flag in noise signal coder 804 and the model is
not updated.
[0050] Finally, DTX controller & multiplexer 805 controls information to be sent as transmission
data and multiplexes transmission information using the outputs from speech/non-speech
decision section 802, speech coder 803 and noise signal coder 804 and outputs as transmission
data.
[0051] FIG.11 is a block diagram showing a configuration of the speech signal decoder according
to Embodiment 2 of the present invention.
[0052] In the decoder shown in FIG.11, separator & DTX controller 901 receives, as reception
data, the transmission data obtained by coding the input signal on the coding side,
and separates the reception data into the speech coded data or noise model parameter,
the speech/non-speech decision flag and the noise model updating flag necessary for
speech decoding and noise generation.
[0053] Then, in the case where the speech/non-speech decision flag indicates the speech
segment, speech decoder 902 performs speech decoding from the speech coded data and
outputs the decoded speech to speech/noise signal adder 904.
[0054] On the other hand, noise signal generator 903 generates a noise signal from the noise
model parameter and noise model updating flag and outputs the noise signal to speech/noise
signal adder 904. Speech/noise signal adder 904 adds up the output of speech decoder
902 and the output of noise signal generator 903 and outputs as an output signal.
[0055] Then, with reference to FIG.12, a processing flow of the speech signal coding method
according to Embodiment 2 will be explained. In this method, suppose the processing
shown in FIG.12 is repeated for every frame of a certain short segment (for example,
approximately 10 to 50 ms).
[0056] First, in ST301, input signals are input frame by frame. Then, in ST302, an input
speech signal is separated into a speech signal and a background noise signal superimposed
on the speech signal. Then, in ST303, a speech/non-speech decision is made on the
input signal or the speech signal after the separation obtained in ST302 and the decision
result is output (ST304).
[0057] In the case where the decision result is "speech", in ST305, the speech coder performs
speech coding processing on the speech signal after the separation obtained in ST302
and outputs the coded data. Then, on the noise signal after the separation obtained
in ST302, the noise signal coder performs noise signal coding in ST306 and outputs
information on the noise model that expresses the input noise signal and a flag as
to whether the noise model should be updated or not.
[0058] In the case where the speech/non-speech decision result in ST303 is "speech", model
updating is not performed in noise signal coding processing in ST306. Then, in ST307,
information to be sent as transmission data is controlled and transmission information
is multiplexed using the output obtained as a result of the speech/non-speech decision,
speech coding processing and noise signal coding processing and finally in ST308,
this data is output as transmission data.
[0059] In this way, the speech signal coder of this embodiment can perform coding in a speech
segment using the speech coder providing high-quality coding on the speech signal
and perform coding on a noise signal using the noise signal coder of Embodiment 1
with high efficiency and little perceptual deterioration, and therefore can perform
high-quality and high-efficiency coding even in a background noise environment. Furthermore,
by providing a speech/noise signal separator, the speech signal coder of this embodiment
can remove superimposed background noise signals from the speech signal input to the
speech coder, providing higher-quality or higher efficiency coding in the speech segment.
(Embodiment 3)
[0060] FIG.13 is a block diagram showing a configuration of a speech coding section according
to Embodiment 3 of the present invention. The configuration on the decoding side of
this embodiment is the same as the configuration of the speech signal decoder shown
in FIG.6.
[0061] Input signal analyzer 1101 performs a signal analysis on an input signal input for
every certain segment and calculates analysis parameters for the input signal. The
analysis parameters to be extracted include parameters necessary to express statistical
characteristic quantities of the input signal and parameters expressing speech
characteristics. The parameters necessary to express statistical characteristic
quantities include short-time spectra obtained by FFT on a short-segment signal, input
power, LPC spectrum parameters, etc. On the other hand, the parameters expressing speech
characteristics include LPC parameters, input power, pitch period information, etc.
[0062] Then, mode decision section 1104 decides whether the input signal is in a speech
segment or a non-speech segment (segment with only noise) and, in the case of a
non-speech segment, whether the noise model is to be updated and updating information
is to be sent or not. This decision is made on the analysis parameters obtained by input
signal analyzer 1101 using the speech characteristic pattern retained in speech model
storage section 1102 and the noise model parameter retained in noise model storage section
1103.
[0063] Here, speech model storage section 1102 creates and stores speech characteristic
patterns beforehand and the speech characteristic patterns include information such
as the distribution of LPC parameters, input signal power, pitch period information,
etc. in a speech (voiced) segment. Furthermore, the noise model parameters refer to
information on a noise model that can express statistical characteristic quantities
of the input noise signal, for example, information expressing statistical characteristic
quantities such as the average spectrum of short-time spectra, variance, etc.
using a statistical model such as HMM.
[0064] Then, mode decision section 1104 decides whether the statistical analysis parameters
obtained for the current input signal are appropriate or not as an output of the stored
noise model, which expresses signals in the preceding noise segment
(for example, in the case of an HMM, whether the probability of output of an
analysis parameter for the current input signal is equal to or greater than a specified
value) and at the same time decides from the parameters expressing speech characteristics
of the input signal whether the signal is in a speech (voiced) segment or not.
[0065] In the case where mode decision section 1104 decides that the signal is in the speech
segment, speech coder 1105 performs speech coding on the input signal and outputs the
coded data to DTX controller & multiplexer 1107. On the other hand, in the case where
mode decision section 1104 decides that the signal is in the non-speech segment
and that noise model updating information is to be sent, noise model updating section
1106 updates the noise model and outputs the information on the updated noise model
to DTX controller & multiplexer 1107.
[0066] Finally, DTX controller & multiplexer 1107 controls information to be sent as transmission
data and multiplexes transmission information using the outputs from the speech coder
and noise model updating section 1106 and outputs as transmission data.
[0067] Then, with reference to FIG.14, a processing flow of the speech signal coding method
according to this embodiment will be explained. In this method, suppose the processing
shown in FIG.14 is repeated for every frame of a certain short segment (for example,
approximately 10 to 50 ms).
[0068] First, in ST401, input signals are input frame by frame. Then, in ST402, a signal
analysis is made on an input signal input for every certain segment and their analysis
parameters are calculated and output.
[0069] Then, in ST403, it is decided whether a currently input statistical analysis parameter
is appropriate or not as the output from the noise model retained in noise model storage
section 1103 in FIG.13 (ST404). In the case where the decision result shows that the
parameter is not appropriate, that is, the current input signal cannot be expressed
with the noise model currently retained, the process moves on to ST405 and it
is decided from the speech characteristic parameter obtained by analyzing the input
signal whether the signal is in a speech (voiced) segment or not. In the case where
it is decided that the signal is in a speech segment, in ST406, the speech coder performs
speech coding processing and outputs the coded data.
[0070] On the other hand, in the case where it is decided in ST405 that the signal is not
in the speech segment, in ST407, the noise model is updated and information on the
updated noise model is output. In the case where it is decided in ST403 that the current
input can be expressed with the noise model which is currently retained, no processing
is performed and the process moves on to the next step. Then, in ST408, information
to be transmitted as transmission data is controlled and transmission information
is multiplexed using the outputs from the speech coder and noise model updater, and
in ST409 transmission data is output.
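The per-frame flow of ST401 to ST409 can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the method of the embodiment: the single "mean frame power" statistic, the tolerance `tol`, the threshold `speech_threshold`, and all function and class names are hypothetical, and the speech coder is replaced by a stand-in that passes the frame through.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class NoiseModel:
    # Hypothetical one-parameter model; the source does not fix the parameters.
    power: float


def analyze(frame: List[float]) -> float:
    """ST402: compute an analysis parameter (assumed here: mean frame power)."""
    return sum(x * x for x in frame) / len(frame)


def fits_noise_model(power: float, model: NoiseModel, tol: float = 0.5) -> bool:
    """ST403/ST404: can the current frame be expressed by the retained model?"""
    return abs(power - model.power) <= tol * model.power


def is_speech(power: float, speech_threshold: float = 1.0) -> bool:
    """ST405: placeholder speech decision using power alone (an assumption)."""
    return power >= speech_threshold


def process_frame(frame: List[float], model: NoiseModel) -> Tuple[str, Optional[object]]:
    """Return (kind, payload) for one frame; payload is None when nothing is sent."""
    power = analyze(frame)
    if fits_noise_model(power, model):
        return ("no_update", None)        # retained model still describes the input
    if is_speech(power):
        return ("speech", list(frame))    # ST406: stand-in for a real speech coder
    model.power = power                   # ST407: update the retained noise model
    return ("noise_update", power)        # information on the updated model
```

In this sketch, the returned pair is what a multiplexer (ST408) would pack into transmission data; a frame tagged `no_update` carries no payload.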
[0071] As described above, by providing a mode decision section, the speech signal coder
according to this embodiment can make decisions using a variation in statistical characteristic
quantities of an input signal and speech characteristic patterns. Therefore, this
embodiment can make more precise mode decisions and suppress deterioration of quality
due to decision errors.
[0072] The noise signal coder of the present invention adopts a configuration comprising
an analyzer that performs a signal analysis on a noise signal contained in a speech
signal, a storage device that stores information on a noise model expressing the noise
signal, a detector that detects a variation of information on the stored noise model
based on the result of a signal analysis of a current input noise signal and an updater
that updates, when a change of the information on the noise model is detected, information
on the noise model stored by the amount of the variation.
[0073] This configuration allows a noise signal to be modeled with a noise model capable
of expressing it with statistical characteristic quantities, and thereby can generate
a decoded signal with little perceptual deterioration with respect to a background
noise signal. This modeling also eliminates the need for faithful coding of the input
signal waveform, providing low bit rate, highly efficient coding by transmitting information
only in segments where a noise model parameter corresponding to the input signal changes.
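The analyzer, storage device, detector and updater described above can be illustrated with a short sketch. It assumes, purely for illustration, that the statistical characteristic quantities are the frame mean and variance and that a variation is detected when either differs from the stored value by more than a fixed `threshold`; the class and method names are hypothetical.

```python
from typing import List, Optional


class NoiseSignalCoder:
    """Sketch of a noise coder that transmits only changes in its noise model."""

    def __init__(self, initial_stats: List[float], threshold: float = 0.1):
        self.stats = list(initial_stats)  # storage device: [mean, variance]
        self.threshold = threshold        # assumed detection threshold

    @staticmethod
    def analyze(frame: List[float]) -> List[float]:
        """Analyzer: extract the assumed statistics (mean and variance)."""
        n = len(frame)
        mean = sum(frame) / n
        var = sum((x - mean) ** 2 for x in frame) / n
        return [mean, var]

    def encode(self, frame: List[float]) -> Optional[List[float]]:
        """Return the variation to transmit, or None if the model still fits."""
        current = self.analyze(frame)
        delta = [c - s for c, s in zip(current, self.stats)]
        # Detector: is any statistic outside the tolerated range?
        if max(abs(d) for d in delta) <= self.threshold:
            return None
        # Updater: move the stored model by the amount of the variation.
        self.stats = [s + d for s, d in zip(self.stats, delta)]
        return delta
```

While the background noise stays stationary, `encode` returns `None` for every frame, which corresponds to transmitting nothing for that segment.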
[0074] The noise signal coder of the present invention in the above configuration adopts
a configuration with the analyzer extracting statistical characteristic quantities
on the noise signal and the storage device storing information capable of expressing
the statistical characteristic quantities as information on the noise model.
[0075] This configuration provides appropriate modeling of a noise signal and low bit rate,
highly efficient coding.
[0076] The speech signal coder of the present invention adopts a configuration comprising
a speech/non-speech decision section that decides whether an input speech signal is
in a speech segment or a non-speech segment that includes only a noise
signal, a speech coder that performs speech coding on the input speech signal when
the decision result shows that the signal is in a speech segment, the noise signal
coder that performs noise signal coding on the input signal when the decision result
shows that the signal is in a non-speech segment and a multiplexer
that multiplexes the outputs from the speech/non-speech decision section, speech coder
and noise signal coder.
[0077] According to this configuration, the speech coder capable of performing high quality
coding on the speech signal performs coding in a speech segment and the noise signal
coder with high efficiency and little perceptual deterioration performs coding in
a non-speech segment, thus providing high quality and highly efficient
coding even in a background noise environment.
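As a rough illustration of the multiplexer in this configuration, the sketch below packs the speech/non-speech decision, a noise model updating flag and the coded payload into one byte string. The header layout (one flag byte followed by a 16-bit big-endian payload length) is an assumption made for this example and is not specified in the source.

```python
import struct


def multiplex(is_speech: bool, payload: bytes, model_updated: bool = False) -> bytes:
    """Pack one frame: flag byte, 16-bit payload length, then the payload.

    Bit 0 of the flag byte is the speech/non-speech decision and bit 1 is the
    noise model updating flag (hypothetical layout).
    """
    flags = (1 if is_speech else 0) | ((1 if model_updated else 0) << 1)
    return struct.pack(">BH", flags, len(payload)) + payload
```

A non-speech frame whose noise model has not changed is then only the three header bytes, which reflects the bit rate saving described above.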
[0078] The speech signal coder of the present invention adopts a configuration comprising
a speech/noise signal separator that separates an input speech signal into a speech
signal and a background noise signal superimposed on this speech signal, a speech/non-speech
decision section that decides the speech segment or non-speech segment
including only the noise signal from the speech signal obtained from the input speech
signal or the speech/noise signal separator, a speech coder that
performs speech coding on the input speech signal when the decision result indicates
a speech segment, the noise signal coder that performs coding on the background noise
signal obtained from the speech/noise signal separator and a multiplexer
that multiplexes the outputs from the speech/non-speech decision
section, speech coder and noise signal coder.
[0079] According to this configuration, the speech coder capable of performing high quality
coding on the speech signal performs coding in a speech segment and the noise signal
coder with high efficiency and little perceptual deterioration performs coding on
a noise signal, thus providing high quality and highly efficient coding even in a
background noise environment. Furthermore, provision of the speech/noise signal
separator makes it possible to remove superimposed background noise from the speech
signal input to the speech coder, providing high quality, highly efficient coding
on the speech segment.
[0080] The speech signal coder of the present invention adopts a configuration comprising
an analyzer that performs a signal analysis on an input speech signal, a speech model
storage device that stores speech characteristic patterns necessary to decide whether
the input speech signal is a voiced signal or not, a noise model storage device that
stores information on a noise model expressing a noise signal included in the input
speech signal, a mode decision section that decides whether the input speech signal
is in a speech segment or non-speech segment containing only a noise
signal using the outputs of the analyzer, speech model storage device and noise model
storage device and, in the case of the non-speech segment, decides whether
the noise model should be updated or not, a speech coder that performs speech coding
on the input speech signal when the mode decision section decides the speech segment,
a noise model updater that updates the noise model when the mode decision section
decides the non-speech segment and decides that the noise model will
be updated and a multiplexer that multiplexes the outputs from the speech coder and
noise model updater.
[0081] According to this configuration, provision of the mode decision section makes it
possible to make a decision using a variation of statistical characteristic quantities
of the input signal and speech characteristic patterns. Thus, this configuration provides
more precise mode decision and can suppress quality deterioration due to decision
errors.
[0082] The noise signal generator of the present invention adopts a configuration comprising
a noise model updater that updates a noise model as required according to noise model
parameters coded on the input noise signal on the coding side and the noise model
updating flag, a noise model storage device that stores information on the updated
noise model using the output of the noise model updater and a noise signal generator
that generates a noise signal from information on the noise model stored in the noise
model storage device.
[0083] According to this configuration, it is possible to generate a decoded signal with
little perceptual deterioration with respect to a background noise signal.
[0084] The noise signal generator of the present invention in the above configuration adopts
a configuration with the noise model parameters input to the noise model updater and
information stored in the noise model storage device being information capable of
expressing statistical characteristic quantities on the noise signal generated.
[0085] By modeling a noise signal with a noise model capable of expressing with statistical
characteristic quantities, this configuration can generate a decoded signal with little
perceptual deterioration with respect to a background noise signal.
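The decoder-side noise signal generator (noise model updater, noise model storage device and noise signal generator) might be sketched as follows. The Gaussian model with a mean and standard deviation is an assumption chosen for illustration; the source only requires information capable of expressing statistical characteristic quantities. All names are hypothetical.

```python
import random
from typing import List, Optional, Tuple


class NoiseSignalGenerator:
    """Sketch: regenerate background noise from a stored statistical model."""

    def __init__(self, mean: float = 0.0, std: float = 1.0, seed: Optional[int] = None):
        self.model: Tuple[float, float] = (mean, std)  # noise model storage
        self._rng = random.Random(seed)

    def update(self, params: Optional[Tuple[float, float]], update_flag: bool) -> None:
        """Noise model updater: replace the stored model only when flagged."""
        if update_flag and params is not None:
            self.model = params

    def generate(self, n: int) -> List[float]:
        """Noise signal generator: draw n samples from the stored model."""
        mean, std = self.model
        return [self._rng.gauss(mean, std) for _ in range(n)]
```

Because the waveform is drawn from the model rather than decoded sample by sample, the output matches the input noise only in its statistics, not in its exact waveform.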
[0086] The speech signal decoder of the present invention adopts a configuration comprising
a separator that receives a signal including speech data coded on the coding side,
noise model parameter, speech/non-speech decision flag and noise model updating flag
and separates the noise model parameter, speech/non-speech decision flag and noise
model updating flag from the signal, a speech decoder that performs speech decoding
on the speech data when the speech/non-speech decision flag indicates a speech segment,
a noise signal generator that generates a noise signal from the noise model parameter
and noise model updating flag when the speech/non-speech decision flag indicates a
non-speech segment and an output switch that switches between the decoded
speech output from the speech decoder and the noise signal output from the noise signal
generator according to the speech/non-speech decision flag and outputs the result as an output
signal.
[0087] This configuration makes it possible to generate a decoded signal with little perceptual
deterioration with respect to a background noise signal.
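A minimal sketch of the output switch in this decoder configuration is given below. It assumes the separator has already split the received signal into the fields of `ReceivedFrame`; the field names, the single `noise_level` parameter and the injected `noise_source` callable are hypothetical, and the speech decoder is a pass-through stand-in.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ReceivedFrame:
    is_speech: bool                           # speech/non-speech decision flag
    speech_data: Optional[List[float]] = None
    noise_param: Optional[float] = None       # assumed single noise model parameter
    update_flag: bool = False                 # noise model updating flag


class SwitchingDecoder:
    def __init__(self, noise_level: float, frame_len: int,
                 noise_source: Callable[[], float]):
        self.noise_level = noise_level    # stored noise model (assumed: a gain)
        self.frame_len = frame_len
        self.noise_source = noise_source  # returns one unit-level noise sample

    def decode(self, rx: ReceivedFrame) -> List[float]:
        if rx.is_speech:
            # Output switch selects the speech decoder (stand-in: pass-through).
            return list(rx.speech_data or [])
        if rx.update_flag and rx.noise_param is not None:
            self.noise_level = rx.noise_param  # update the stored noise model
        # Output switch selects the noise signal generator.
        return [self.noise_level * self.noise_source() for _ in range(self.frame_len)]
```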
[0088] The speech signal decoder of the present invention adopts a configuration comprising
a separator that receives a signal including speech data coded on the coding side,
noise model parameter, speech/non-speech decision flag and noise model updating flag
and separates the noise model parameter, speech/non-speech decision flag and noise
model updating flag from the signal, a speech decoder that performs speech decoding
on the speech data when the speech/non-speech decision flag indicates a speech segment,
the noise signal generator that generates a noise signal from the noise model parameter
and noise model updating flag when the speech/non-speech decision flag indicates a
non-speech segment and a speech/noise signal adder that adds up the
decoded speech output from the speech decoder and noise signal output from the noise
signal generator.
[0089] This configuration makes it possible to generate a decoded signal with little perceptual
deterioration with respect to a background noise signal. Furthermore, after the coding
side separates a speech signal and a noise signal superimposed thereon, coders suited
to their respective signals perform coding and the decoding side adds up the signals
to generate a decoded signal, thus providing coding of a speech signal in a speech
segment with higher quality.
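In contrast to an output switch, the speech/noise signal adder combines the two decoded signals sample by sample. The helper below is a hypothetical sketch that assumes both frames have the same length and that an inactive speech decoder contributes a frame of zeros.

```python
from typing import List


def add_speech_and_noise(decoded_speech: List[float], noise: List[float]) -> List[float]:
    """Add the decoded speech and the generated noise sample by sample."""
    if len(decoded_speech) != len(noise):
        raise ValueError("frames must have the same length")
    return [s + n for s, n in zip(decoded_speech, noise)]
```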
[0090] The speech signal coding method of the present invention comprises a speech/non-speech
deciding step of deciding whether an input speech signal is in a speech segment or
non-speech segment that includes only a noise signal, a speech coding
step of coding the input speech signal when the decision result shows that the signal
is in a speech segment, a noise signal coding step of performing noise signal coding
on the input signal when the decision result shows that the signal is in a non-speech
segment, and a multiplexing step of multiplexing the outputs from the
speech/non-speech deciding step, speech coding step and noise signal coding step,
and the noise signal coding step comprises an analyzing step of performing a signal
analysis on a noise signal contained in a speech signal, a storing step of storing
information on a noise model expressing the noise signal, a detecting step of detecting
a variation of information on the stored noise model based on the result of a signal
analysis of a current input noise signal and an updating step of updating information
on the noise model stored by the amount of the variation when a change of the information
on the noise model is detected.
[0091] According to this method, the speech coding section capable of performing high quality
coding on the speech signal performs coding in a speech segment and the noise signal
coder of the first embodiment with high efficiency and little perceptual deterioration
performs coding in a non-speech segment, thus providing high quality,
highly efficient coding even in a background noise environment.
[0092] The speech signal coding method of the present invention comprises a speech/noise
signal separating step of separating an input speech signal into a speech signal and
a background noise signal superimposed on this speech signal, a speech/non-speech
deciding step of deciding the speech segment or non-speech segment
that includes only the noise signal from the speech signal obtained from the input speech
signal or the speech/noise signal separating step, a speech coding step of performing
speech coding on the input speech signal when the decision result indicates a speech
segment, a noise signal coding step of performing noise signal coding on the input
signal when the decision result indicates a non-speech segment and
performing coding on the background noise signal obtained from the speech/noise signal
separating step and a multiplexing step of multiplexing the outputs from the speech/non-speech
deciding step, speech coding step and noise signal coding step, and the noise signal
coding step comprises an analyzing step of performing a signal analysis on a noise
signal contained in a speech signal, a storing step of storing information on a noise
model expressing the noise signal, a detecting step of detecting a variation of information
on the stored noise model based on the result of a signal analysis of a current input
noise signal and an updating step of updating, when a variation of the information
on the noise model is detected, information on the stored noise model by the amount
of the variation.
[0093] According to this configuration, the speech coding section capable of performing
high quality coding on the speech signal performs coding in a speech segment and the
noise signal coder of the first embodiment with high efficiency and little perceptual
deterioration performs coding in a non-speech segment, thus providing
high quality and highly efficient coding even in a background noise environment. Furthermore,
provision of the speech/noise signal separating section makes it
possible to remove superimposed background noise from the speech signal input to the
speech coding section, providing high quality, highly efficient coding on the speech
segment.
[0094] The speech signal coding method of the present invention comprises an analyzing step
of performing a signal analysis on an input speech signal, a speech model storing
step of storing speech characteristic patterns necessary to decide whether the input
speech signal is a voiced signal or not, a noise model storing step of storing information
on a noise model expressing a noise signal included in the input speech signal, a
mode deciding step of deciding whether the input speech signal is in a speech segment
or non-speech segment containing only a noise signal using the outputs
of the analyzing section, speech model storing section and noise model storing section
and, when the decision result indicates the non-speech segment, deciding
whether the noise model should be updated or not, a speech coding step of performing
speech coding on the input speech signal when the mode decision section decides the
speech segment, a noise model updating step of updating the noise model when the mode
decision section decides the non-speech segment and decides that the
noise model will be updated, and a multiplexing step of multiplexing the outputs from
the speech coding section and noise model updating section.
[0095] According to this method, provision of the mode decision section allows decisions
to be made using a variation of statistical characteristic quantities and speech characteristic
patterns of the input signal. Thus, this method provides more precise mode decisions
and suppresses quality deterioration due to decision errors.
[0096] The recording medium of the present invention is a mechanically readable medium that
records a program to execute the steps of analyzing statistical characteristic quantities
on an input noise signal, storing information on a noise model expressing the statistical
characteristic quantities on the input noise signal, detecting a variation of the
noise model expressing the input noise signal and updating the noise model and outputting
information on the updated noise model as required.
[0097] As described above, the noise signal coder of the present invention can generate
a decoded signal with little perceptual deterioration with respect to a background
noise signal by modeling a noise signal with a noise model capable of expressing the
noise signal with statistical characteristic quantities. The noise signal coder of
the present invention also eliminates the need for faithful coding for the input signal
waveform, and thus provides low bit rate, highly efficient coding by transmitting
only a segment where a noise model parameter for the input signal changes.
[0098] Furthermore, the speech signal coder of the present invention provides high-quality,
highly efficient coding even in a background noise environment by performing coding
in a speech segment through a speech coder capable of coding a speech signal with
high quality and performing coding in a non-speech segment through
the noise signal coder with high efficiency and little perceptual deterioration.
[0099] This application is based on the Japanese Patent Application No.HEI 11-168545 filed
on June 15, 1999, the entire content of which is expressly incorporated by reference herein.
Industrial Applicability
[0100] The present invention is applicable to a base station apparatus and communication
terminal apparatus in a digital radio communication system.
1. A noise signal coder comprising:
analyzing means for performing a signal analysis on a noise signal contained in a
speech signal;
storing means for storing information on a noise model expressing said noise signal;
detecting means for detecting a variation of information on the stored noise model
based on the signal analysis result of a current input noise signal; and
updating means for updating, when a variation of the information on the noise model
is detected, information on said noise model stored by the amount of the variation.
2. The noise signal coder according to claim 1, wherein the analyzing means extracts
statistical characteristic quantities on the noise signal and the storing means stores
information capable of expressing said statistical characteristic quantities as information
on the noise model.
3. A speech signal coder comprising:
speech/non-speech deciding means for deciding whether an input speech signal is in
a speech segment or non-speech segment that includes only a noise signal;
speech coding means for performing speech coding on said input speech signal when
the decision result indicates the speech segment;
the noise signal coder according to claim 1 or claim 2 that performs noise signal
coding on said input signal when the decision result indicates the non-speech segment; and
multiplexing means for multiplexing the outputs from said speech/non-speech deciding
means, said speech coding means and said noise signal coder.
4. A speech signal coder comprising:
speech/noise signal separating means for separating an input speech signal into a
speech signal and a background noise signal superimposed on this speech signal;
speech/non-speech deciding means for deciding whether a signal is in a speech segment
or non-speech segment including only a noise signal from the speech
signal obtained from said input speech signal or said speech/noise signal separating
means;
speech coding means for performing speech coding on said input speech signal when
the decision result indicates the speech segment;
the noise signal coder according to claim 1 that performs coding on the background
noise signal obtained from said speech/noise signal separating means; and
multiplexing means for multiplexing the outputs from said speech/non-speech deciding
means, said speech coding means and said noise signal coder.
5. A speech signal coder comprising:
analyzing means for performing a signal analysis on an input speech signal;
speech model storing means for storing speech characteristic patterns necessary to
decide whether said input speech signal is a voiced signal or not;
noise model storing means for storing information on a noise model expressing a noise
signal contained in said input speech signal;
mode deciding means for deciding whether said input speech signal is in a speech segment
or non-speech segment containing only a noise signal using the outputs
of said analyzing means, speech model storing means and noise model storing means
and, when the decision result indicates the non-speech segment, deciding
whether the noise model should be updated or not;
speech coding means for performing speech coding on the input speech signal when the
mode deciding means decides the speech segment;
noise model updating means for updating the noise model when said mode deciding means
decides the non-speech segment and decides that the noise model will
be updated; and
multiplexing means for multiplexing the outputs from the speech coding means and noise
model updating means.
6. A base station apparatus equipped with a speech signal coder, said speech signal coder
comprising:
speech/non-speech deciding means for deciding whether an input speech signal is in
a speech segment or non-speech segment that includes only a noise signal;
speech coding means for performing speech coding on said input speech signal when
the decision result indicates the speech segment;
the noise signal coder according to claim 1 or claim 2 that performs noise signal
coding on said input signal when the decision result indicates the non-speech segment; and
multiplexing means for multiplexing the outputs from said speech/non-speech deciding
means, said speech coding means and said noise signal coder.
7. A communication terminal apparatus equipped with a speech signal coder, said speech
signal coder comprising:
speech/non-speech deciding means for deciding whether an input speech signal is in
a speech segment or non-speech segment that includes only a noise signal;
speech coding means for performing speech coding on said input speech signal when
the decision result indicates the speech segment;
the noise signal coder according to claim 1 or claim 2 that performs noise signal
coding on said input signal when the decision result indicates the non-speech segment; and
multiplexing means for multiplexing the outputs from said speech/non-speech deciding
means, said speech coding means and said noise signal coder.
8. A noise signal generator comprising:
noise model updating means for updating a noise model as required according to noise
model parameters coded on an input noise signal on the coding side and a noise model
updating flag;
noise model storing means for storing information on the updated noise model using
the output of said noise model updating means; and
noise signal generating means for generating a noise signal from information on the
noise model stored in said noise model storing means.
9. The noise signal generator according to claim 8, wherein the noise model parameters
input to said noise model updating means and information stored in said noise model
storing means are information capable of expressing statistical characteristic quantities
on the noise signal generated.
10. A speech signal decoder comprising:
separating means for receiving a signal including speech data coded on the coding
side, noise model parameter, speech/non-speech decision flag and noise model updating
flag and separating the noise model parameter, speech/non-speech decision flag and
noise model updating flag from said signal;
speech decoding means for performing speech decoding on said speech data when the
speech/non-speech decision flag indicates a speech segment;
the noise signal generator according to claim 8 that, when said speech/non-speech decision
flag indicates a non-speech segment, generates a noise signal from
said noise model parameter and noise model updating flag; and
output switching means for switching between the decoded speech output from said speech
decoding means and the noise signal output from said noise signal generator according
to said speech/non-speech decision flag and outputting as an output signal.
11. A speech signal decoder comprising:
separating means for receiving a signal including speech data coded on the coding
side, noise model parameter, speech/non-speech decision flag and noise model updating
flag and separating the noise model parameter, speech/non-speech decision flag and
noise model updating flag from said signal;
speech decoding means for performing speech decoding on said speech data when said
speech/non-speech decision flag indicates a speech segment;
the noise signal generator according to claim 8 or claim 9 that, when said speech/non-speech
decision flag indicates a non-speech segment, generates a noise signal
from said noise model parameter and noise model updating flag; and
speech/noise signal adding means for adding up the decoded speech output from said
speech decoding means and the noise signal output from said noise signal generator.
12. A speech signal coding method comprising:
a speech/non-speech deciding step of deciding whether an input speech signal is in
a speech segment or non-speech segment that includes only a noise signal;
a speech coding step of performing speech coding on said input speech signal when
the decision result indicates the speech segment;
a noise signal coding step of performing noise signal coding on said input signal
when the decision result indicates the non-speech segment; and
a multiplexing step of multiplexing the outputs from said speech/non-speech deciding
step, said speech coding step and said noise signal coding step, wherein the noise
signal coding step comprises:
an analyzing step of performing a signal analysis on a noise signal contained in a
speech signal;
a storing step of storing information on a noise model expressing said noise signal;
a detecting step of detecting a variation of information on the stored noise model
based on the result of a signal analysis of a current input noise signal; and
an updating step of updating, when a variation of the information on the noise model
is detected, information on said noise model stored by the amount of said variation.
13. A speech signal coding method comprising:
a speech/noise signal separating step of separating an input speech signal into a
speech signal and a background noise signal superimposed on this speech signal;
a speech/non-speech deciding step of deciding a speech segment or non-speech segment
including only a noise signal from the speech signal obtained from
said input speech signal or said speech/noise signal separating step;
a speech coding step of performing speech coding on said input speech signal when
the decision result indicates a speech segment;
a noise signal coding step of performing coding on a background noise signal obtained
from said speech/noise signal separating step; and
a multiplexing step of multiplexing the outputs from said speech/non-speech deciding
step, said speech coding step and said noise signal coding step, wherein the noise
signal coding step comprises:
an analyzing step of performing a signal analysis on a noise signal contained in a
speech signal;
a storing step of storing information on a noise model expressing said noise signal;
a detecting step of detecting a variation of information on the stored noise model
based on the result of a signal analysis of a current input noise signal; and
an updating step of updating, when a variation of the information on the noise model
is detected, information on said noise model stored by the amount of said variation.
14. A speech signal coding method comprising:
an analyzing step of performing a signal analysis on an input speech signal;
a speech model storing step of storing speech characteristic patterns necessary to
decide whether said input speech signal is a voiced signal or not;
a noise model storing step of storing information on a noise model expressing a noise
signal contained in said input speech signal;
a mode deciding step of deciding whether said input speech signal is in a speech segment
or non-speech segment containing only a noise signal using the outputs
of said analyzing means, speech model storing means and noise model storing means
and, when said decision result indicates the non-speech segment, deciding
whether the noise model should be updated or not;
a speech coding step of performing speech coding on the input speech signal when said
mode deciding means decides the speech segment;
a noise model updating step of updating the noise model when said mode deciding means
decides the non-speech segment and decides that the noise model will
be updated; and
a multiplexing step of multiplexing the outputs from the speech coding means and noise
model updating means.
15. A mechanically readable recording medium that records a program to execute the steps
of:
analyzing statistical characteristic quantities on an input noise signal;
storing information on a noise model expressing the statistical characteristic quantities
on the input noise signal;
detecting a variation of the noise model expressing the input noise signal; and
updating the noise model and outputting information on the updated noise model as
required.