[0001] The present invention relates generally to an arrangement and method for encoding
a discrete-time speech signal using a regular pulse excitation scheme and more specifically
to such an arrangement and method for encoding a speech signal at a low bit rate less
than 16k-bit per second.
[0002] In order to encode a speech signal with a limited number of calculations at a low
bit rate (less than approximately 16k-bit per second), it is a known practice to model
the characteristics of the human vocal tract using a digital filter and further to
represent excitation signals by combining regular pulse sequences. Such a coding scheme
is known as a Regular Pulse Excitation - Long Term Prediction - Linear Predictive
Coder (hereinafter referred to as RPE-LTP), which has been proposed in the CEPT/CCH/GSM
Recommendation 06.10 entitled "GSM Full Rate Speech Transcoding" published by the Conference
of European Postal and Telecommunications Administrations, September 19, 1988 (hereinafter
referred to as Paper 1).
[0003] Before describing the present invention, the regular pulse excitation coding scheme
disclosed in Paper 1 will be described with reference to Fig. 1.
[0004] In Fig. 1, an A/D (analog-to-digital) converted speech signal is applied via an input
terminal 10 to a pre-processing circuit 12 on a frame by frame basis. The speech frame
applied to the circuit 12 is pre-processed to produce an offset-free signal, which
is then passed through a first order pre-emphasis filter. The original speech
signal has been sampled at a rate of 8 kHz. Since the frame length is 20 ms in this
prior art, one frame consists of 160 signal samples. The 160 samples thus obtained
are applied to a short term LPC (Linear Predictive Coding) analysis circuit 14 and
also to a short term analysis filter 16. The 160 samples applied to the short term
LPC analysis circuit 14 are analyzed to determine reflection coefficients of orders 1 to 8,
which represent the spectrum envelope of each frame. The short term LPC analysis circuit
14 further transforms or encodes the reflection coefficients into log area ratios (LAR),
which are applied to the short term analysis filter 16 and a multiplexor 30. The
short term analysis filter 16 decodes the LAR into the reflection coefficients and
obtains 160 samples of a short term residual signal. In the above, the term "short
term analysis" has the same meaning as spectrum envelope analysis. The short term
residual signal outputted from the filter 16 is applied to a subtractor 18 and a
long term analysis circuit 22.
[0005] For the following operations, the long term analysis circuit 22 divides the speech
frame into four sub-frames of 5 ms, each of which consists of 40 samples of the short
term residual signal. Each sub-frame is processed blockwise by the subsequent function
blocks.
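The frame and sub-frame sizes quoted above follow directly from the 8 kHz sampling rate. A minimal sketch of that arithmetic is given below; the function and constant names are illustrative only and do not appear in Paper 1.

```python
SAMPLE_RATE_HZ = 8000   # sampling rate stated in the text
FRAME_MS = 20           # frame length stated in the text
SUBFRAME_MS = 5         # sub-frame length stated in the text


def split_into_subframes(frame):
    """Split one frame of samples into equal-length sub-frames."""
    subframe_len = SAMPLE_RATE_HZ * SUBFRAME_MS // 1000  # 40 samples
    if len(frame) % subframe_len != 0:
        raise ValueError("frame length must be a multiple of the sub-frame length")
    return [frame[i:i + subframe_len] for i in range(0, len(frame), subframe_len)]


# 8000 samples/s * 20 ms = 160 samples per frame (dummy data for illustration)
frame = list(range(SAMPLE_RATE_HZ * FRAME_MS // 1000))
subframes = split_into_subframes(frame)
```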
[0006] The long term analysis circuit 22 produces a long term prediction (LTP) lag and an
LTP gain on the basis of the two signals: the short term residual samples applied
from the circuit 16 and an output sequence from an adder 26. The term "long term analysis"
has the same meaning as pitch analysis, and the LTP lag and the LTP gain respectively
correspond to a pitch period and a pitch gain.
[0007] The subtractor 18 outputs a block of 40 long term residual signal samples by subtracting
the output of a long term analysis filter 20 from the short term residual signal applied
from the filter 16. An excitation pulse calculating circuit 24, using the long term
residual signal samples from the subtractor 18, obtains an RPE grid of an excitation
pulse sequence and an amplitude sequence of an excitation pulse series, which are
encoded and fed to the multiplexor 30 and also to an excitation pulse generator 28.
In connection with an RPE grid, reference should be made to Paper 1.
[0008] The position of the j-th excitation pulse ($m_j$) within a sub-frame is given by the
following equation.

where p denotes a predetermined pulse interval, q an RPE grid, and N the number of
samples within one sub-frame. By expressing the output sequence of the subtractor
18 as $x_j(i)$, the RPE grid q of the excitation pulse sequence is obtained from the following
equation.

where max(q) indicates the maximum value of the right-hand term as the value
of q is varied. The amplitude of the excitation pulse sequence can be determined by quantizing
$x_j(m_j)$.
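Equations (1) and (2) are not reproduced in this text, so the following is only a sketch of prior-art grid selection under stated assumptions: pulse positions are taken as zero-based indices q + j·p, and the grid is chosen by maximum energy of the decimated sub-sequence, which is the criterion used in GSM 06.10. The default values p = 3, four grids and 13 pulses per 40-sample sub-frame are likewise taken from GSM 06.10, not from the passage above, and the function names are hypothetical.

```python
def grid_positions(q, p, n_pulses):
    """Zero-based pulse positions for grid q (assumed form of equation (1))."""
    return [q + j * p for j in range(n_pulses)]


def select_grid_by_energy(residual, p=3, n_grids=4, n_pulses=13):
    """Pick the grid whose decimated sub-sequence has the largest energy."""
    best_q, best_energy = 0, -1.0
    for q in range(n_grids):
        energy = sum(residual[m] ** 2 for m in grid_positions(q, p, n_pulses))
        if energy > best_energy:
            best_q, best_energy = q, energy
    return best_q
```

In this sketch the pulse amplitudes would simply be the residual samples at the selected positions, which is the behaviour the later paragraphs identify as a limitation of the prior art.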
[0009] The excitation pulse generator 28 decodes the signal applied from the circuit 24
to determine an excitation pulse, which is fed to the adder 26. The adder 26 adds
the excitation pulse from the generator 28 and the output sequence of the long term
analysis filter 20, and applies the resultant sum to the filter 20 as well as the
analysis circuit 22. The long term analysis filter 20, utilizing the LTP lag and the
LTP gain, both applied from the circuit 22, filters the output sequence of the adder
26. The output sequence of the filter 20 is fed back to the adder 26 and also applied
to the subtractor 18.
[0010] The multiplexor 30 combines the encoded outputs of the blocks 14, 22 and 24, and
applies the result to a transmission line coupled to an output terminal 32.
[0011] However, the above-mentioned prior art has encountered the difficulty of low quality
of the reconstructed or reproduced speech. This is because the amplitude of each excitation
pulse is determined on the basis of the short term residual signal applied to the
subtractor 18. In other words, according to the prior art, the long term residual
signal outputted from the subtractor 18 is shifted by an RPE grid and then every predetermined
number of samples are quantized.
[0012] Furthermore, the aforesaid prior art has encountered another problem in that the
reproduced speech is degraded by quantizing distortion. This results from the fact
that the number of quantizing bits is insufficient at a bit rate on the order of 13k
bps.
[0013] EP-A-0 374 941 (document according to Article 54(3) EPC) discloses a communication
system for improving the speech quality by calculating excitation multi-pulses
by means of an encoder which encodes a sequence of digital speech signals, classified
into a voiced sound and an unvoiced sound, into a sequence of output signals by the
use of a spectrum parameter and pitch parameters at every frame. A judging circuit
judges whether the digital speech signals are classified into the voiced sound or
the unvoiced sound in order to produce a judged signal representative of a result
of judging. A processing unit processes the digital speech signals in accordance with
the judged signal to selectively produce a first set of primary sound source signals
and a second set of secondary sound source signals. The first set is produced when the
judged signal represents the voiced sound and is representative of locations and amplitudes of
a first set of excitation multi-pulses calculated at every frame. The second set of
secondary sound source signals is produced when the judged signal represents the
unvoiced sound and is representative of the amplitudes of a second set of excitation
multi-pulses each of which is located at intervals of a preselected number of
samples.
[0014] The document ICASSP 87, Dallas, Texas, 6th - 9th April 1987, vol. 2, pages 968-971,
IEEE, New York, US; A. Fukui et al: "Implementation of a multi-pulse speech codec
with pitch prediction on a single chip floating-point signal processor" discloses
a multi-pulse codec system with pitch prediction which divides a discrete-time
speech signal and extracts a plurality of parameters from the divided speech signals.
The parameters are used to generate a signal and to generate an impulse response
function signal, which in turn is used to generate an autocorrelation function
signal and a cross-correlation function signal.
[0015] It is an object of the present invention to provide an arrangement for encoding a
discrete-time speech signal using a regular pulse excitation scheme.
[0016] It is another object of the present invention to provide an arrangement for encoding a
discrete-time speech signal at a low bit rate less than 16k-bit per second through
the use of a regular pulse excitation scheme.
[0017] Another object of the present invention is to provide a method for encoding a discrete-time
speech signal at a low bit rate less than 16k-bit per second using a regular pulse
excitation scheme.
[0018] These objects are solved with the features of the claims.
[0019] In brief, the arrangement comprises a pre-processing circuit provided to receive
a discrete-time speech signal, which is then divided into a plurality of frames. A
parameter extracting circuit is coupled to the pre-processing circuit and extracts
a plurality of parameters therefrom. An impulse response calculating circuit is coupled
to receive the plurality of parameters from the parameter extracting circuit, and
generates an impulse response function signal using the plurality of parameters. An
autocorrelation function circuit is coupled to receive the impulse response signal
and generates an autocorrelation function signal using the signal applied. A cross-correlation
function circuit generates a cross-correlation function signal using the discrete-time
speech signal and the impulse response function signal. A grid signal generator receives
the output of the cross-correlation function calculating circuit, and outputs a grid
signal indicative of a location of a first excitation pulse within one frame. A pulse
amplitude calculating circuit receives the autocorrelation function signal, the cross-correlation
function signal and the grid signal, and determines an amplitude sequence of excitation
pulses within one frame.
[0020] One aspect of this invention takes the form of an arrangement for encoding a speech
signal using a regular pulse excitation scheme, as set out in the appended claims.
[0021] Another aspect of this invention takes the form of a method for encoding a speech
signal using a regular pulse excitation scheme, as set out in the appended claims.
[0022] The features and advantages of the present invention will become more clearly appreciated
from the following description taken in conjunction with the accompanying drawings
in which like elements are denoted by like reference numerals and in which:
Fig. 1 is a block diagram illustrating a known RPE scheme, the drawing having been
referred to in the opening paragraphs of this specification;
Fig. 2 is a block diagram showing a first embodiment of this invention;
Figs. 3 and 4 are block diagrams each showing in detail a block of Fig. 2; and
Fig. 5 is a block diagram showing a second embodiment of this invention.
[0023] The present invention is characterized by algorithms for calculating an amplitude
of each of the excitation pulses. It should be noted that the location of the excitation
pulse can be determined in accordance with the prior art disclosed in Paper 1. The
above mentioned algorithms will be discussed below.
[0024] According to a so-called RPE coding scheme, the location of a j-th excitation pulse
within a frame can be specified by equation (1). For the convenience of description,
equation (1) is again shown as equation (3).

The algorithm for obtaining the RPE grid q will be described later.
[0025] An excitation pulse sequence d(n) can be represented by

where n denotes a given time within one frame, $g_i$ an amplitude of an excitation pulse
located at a position $m_i$, $\delta(n, m_i)$ the Kronecker delta function, which assumes 1
in the case of $n = m_i$ and 0 in the case of $n \ne m_i$, and K represents the number of
pulses within one frame.
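From the definitions just given, equation (4) presumably has the form d(n) = Σ g_i·δ(n, m_i), summed over the K pulses; this form is an assumption, since the equation itself is not reproduced here. A minimal sketch under that assumption, using zero-based sample indices and a hypothetical function name:

```python
def excitation_sequence(amplitudes, positions, frame_len):
    """Build d(n): zero everywhere except amplitude g_i at position m_i."""
    d = [0.0] * frame_len
    for g_i, m_i in zip(amplitudes, positions):
        d[m_i] += g_i  # the Kronecker delta selects n == m_i
    return d


d = excitation_sequence([1.5, -2.0], [0, 3], 5)
```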
[0026] Fig. 3 shows a synthesis filter 122 which comprises two digital filters 310 and 320
coupled in series. The filter 310 includes an adder 322, a coefficient weighting circuit
324 and a delay 326. Similarly, the filter 320 includes an adder 328, a coefficient
weighting circuit 330 and a delay 332. The synthesis filter 122 forms part of the
arrangement shown in Fig. 2, and will again be referred to later. Consequently, the
detailed description of Fig. 3 will be postponed.
[0027] The filter 310 is a long term prediction filter whose output represents a pitch structure,
while the filter 320 is a short term prediction filter whose output represents spectrum
envelope characteristics. For simplifying the description, it will be assumed that
the filter 310 is of a first order type. The synthesis filter 122 is supplied with
the excitation pulse series and outputs a reconstructed signal sequence x'(n) in accordance
with the following equation:

where β denotes an LTP gain representative of tap coefficients of the long term filter
310, and Md an LTP lag indicative of a pitch period of an incoming speech signal. Further,
in equation (5), $x_d(n)$ denotes an output signal of the filter 310, Np a prediction order
of the short term prediction filter 320, and $a_i$ (1 ≤ i ≤ Np) a prediction coefficient of
the filter 320 ($a_i$ corresponds to LAR in Fig. 3). β and Md can be obtained in accordance
with the prior art techniques disclosed in Paper 1. As an alternative, β and Md can be
determined from the peak amplitude of the autocorrelation function sequence of an input
speech signal and the position of said peak. The algorithms by which this can be achieved have
been disclosed in the document entitled "Adaptive predictive coding of speech signals"
by B.S. Atal et al., pages 1973 to 1986, The Bell System Technical Journal, October
1970 (referred to as Paper 2).
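Equation (5) is not reproduced in this text. The sketch below assumes the usual cascade form suggested by the definitions above: x_d(n) = d(n) + β·x_d(n − Md) for the first order long term filter 310, followed by x'(n) = x_d(n) + Σ a_i·x'(n − i) for the short term filter 320. The sign convention of a_i, the zero initial state and the function name are assumptions.

```python
def synthesize(d, beta, lag, a):
    """Run an excitation d(n) through a first order LTP filter and an all-pole STP filter.

    beta: LTP gain, lag: LTP lag Md (>= 1), a: prediction coefficients a_1..a_Np.
    Both filters start from zero state in this sketch.
    """
    x_d = []    # output of the long term filter (310)
    x_rec = []  # reconstructed signal x'(n), output of the short term filter (320)
    for n, d_n in enumerate(d):
        past = x_d[n - lag] if n >= lag else 0.0
        x_d.append(d_n + beta * past)
        acc = x_d[n]
        for i, a_i in enumerate(a, start=1):
            if n >= i:
                acc += a_i * x_rec[n - i]
        x_rec.append(acc)
    return x_rec
```

With a single unit pulse as excitation, the returned sequence is the impulse response h(i) of the cascade referred to in the following paragraph.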
[0028] By defining the impulse response of the synthesis filter 122 as h(i) (0 ≤ i ≤ M-1
(M is the number of continuous samples)), the reconstructed signal x'(n) is given
by:

where the symbol * denotes convolution. Further, the weighted square error J
between the input speech signal x(n) and the reproduced signal x'(n) within
one frame can be represented by:

where N denotes the number of samples within one frame and w(n) a weighting function.
The weighting function w(n) implements weighting on a frequency axis, and the Z transform
W(Z) thereof is given by:

where $a_i$ represents a prediction parameter of the synthesis filter 122 and r is a constant
(0 ≤ r ≤ 1) which determines the frequency characteristics of W(Z). In more detail,
when r = 1, W(Z) = 1, which means that the frequency characteristic is
flat. On the other hand, when r = 0, W(Z) represents the inverse frequency characteristic
of the synthesis filter 122. It follows that the characteristics of W(Z) can be changed
by the value of r. The reason why W(Z) is determined depending upon the frequency
characteristics of the synthesis filter 122, as shown in equation (8), is that
an auditory masking effect is utilized. In more detail, at a portion where
the power of the spectrum of the input speech signal is large (for example in the
vicinity of a formant), even if the difference or error between the spectra of the input
and reconstructed signals is somewhat large, such error is hardly perceived by
the ear.
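Equation (8) is not reproduced in this text. A form that matches both limiting cases described above is W(Z) = (1 − Σ a_i·Z^(−i)) / (1 − Σ a_i·r^i·Z^(−i)); it is assumed here, together with the sign convention of a_i. The sketch applies that filter as a difference equation; the function name is hypothetical.

```python
def weighting_filter(x, a, r):
    """Apply the assumed W(Z) to x(n).

    y(n) = x(n) - sum_i a_i * x(n - i) + sum_i a_i * r**i * y(n - i)
    """
    y = []
    for n in range(len(x)):
        acc = x[n]
        for i, a_i in enumerate(a, start=1):
            if n >= i:
                acc -= a_i * x[n - i]             # numerator: inverse short term filter
                acc += a_i * (r ** i) * y[n - i]  # denominator: bandwidth-expanded filter
        y.append(acc)
    return y
```

For r = 1 the numerator and denominator cancel and the input passes unchanged; for r = 0 only the numerator remains, so the filter reduces to the inverse short term filter.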
[0029] Algorithms for calculating an excitation pulse series which minimizes the weighted
square error J shown in equation (7) will be discussed below. Equation
(7) is rewritten as follows.

where the term x'(n) * w(n) can be modified according to the following equation.
Thus by putting

and by performing a Z transform on both sides of equation (10), we obtain:

Further, X'(Z) can be expressed as follows:

where D(Z) represents the Z transform of the excitation pulse series given by equation
(4), and H(Z) the Z transform of the impulse response of the synthesis filter
122. Substituting equation (12) into equation (11) gives:

By setting $H_w(Z) = H(Z) \cdot W(Z)$ and then applying an inverse Z transform to equation
(13), we obtain

where $h_w(n)$ denotes the inverse Z transform of $H_w(Z)$ and indicates the impulse
response of a cascade-coupled filter comprising the synthesis
filter and the weighting circuit. By substituting equation (4) into equation (14), we
obtain

where K represents the number of pulses within one frame. By substituting equations
(10) and (15) into equation (9), we obtain

Thus, equation (7) can be rewritten into equation (16).
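The display equations of this derivation are not reproduced in the text. A chain that is consistent with the surrounding definitions might read as follows; the summation limits, the grouping of the two definitions under equation (10) and the notation $x'_w(n)$ are assumptions rather than the original typography.

```latex
\begin{align}
J &= \sum_{n=1}^{N} \bigl[\, x(n) * w(n) - x'(n) * w(n) \,\bigr]^2 \tag{9} \\
x_w(n) &= x(n) * w(n), \qquad x'_w(n) = x'(n) * w(n) \tag{10} \\
X'_w(Z) &= X'(Z)\, W(Z) \tag{11} \\
X'(Z) &= D(Z)\, H(Z) \tag{12} \\
X'_w(Z) &= D(Z)\, H(Z)\, W(Z) \tag{13} \\
x'_w(n) &= d(n) * h_w(n) \tag{14} \\
x'_w(n) &= \sum_{i=1}^{K} g_i\, h_w(n - m_i) \tag{15} \\
J &= \sum_{n=1}^{N} \Bigl[\, x_w(n) - \sum_{i=1}^{K} g_i\, h_w(n - m_i) \Bigr]^2 \tag{16}
\end{align}
```

Step (15) follows from (14) because convolving $h_w(n)$ with a sum of weighted Kronecker deltas simply shifts and scales $h_w(n)$.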
[0030] The following equation can be obtained by partially differentiating equation (16)
with respect to $g_k$ and then setting the result to zero, where $g_k$ is the amplitude
of the excitation pulse that minimizes equation (16).

where $\phi_{xh}(\cdot)$ represents a cross-correlation function sequence computed from
$x_w(n)$ and $h_w(n)$, and $\phi_{hh}(\cdot)$ represents an autocorrelation function
sequence of $h_w(n)$. These two sequences are represented by the following equations (18)
and (19). $\phi_{hh}(\cdot)$ is referred to as a covariance function in the art of speech
signal processing.
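Equations (18) and (19) are not reproduced in this text. The sketch below assumes the customary definitions, $\phi_{xh}(-m_k) = \sum_n x_w(n)\,h_w(n - m_k)$ and $\phi_{hh}(m_i, m_k) = \sum_n h_w(n - m_i)\,h_w(n - m_k)$, with sums over one frame, zero-based indices and a causal, finite-length $h_w(n)$. Function names are hypothetical.

```python
def cross_correlation(x_w, h_w, m):
    """phi_xh(-m): correlate the weighted signal with h_w shifted to position m."""
    return sum(x_w[n] * h_w[n - m]
               for n in range(m, len(x_w)) if n - m < len(h_w))


def covariance(h_w, m_i, m_k, frame_len):
    """phi_hh(m_i, m_k): overlap of two shifted copies of h_w inside one frame."""
    first, last = min(m_i, m_k), max(m_i, m_k)
    return sum(h_w[n - m_i] * h_w[n - m_k]
               for n in range(last, frame_len) if n - first < len(h_w))
```

Near the end of the frame the shifted responses are truncated, so `covariance(h_w, m, m, frame_len)` depends on m. That dependence is what the stationarity assumption in the next paragraph removes.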


[0031] As will be understood from equation (17), the amplitude $g_k$ of each of the
excitation pulses is a function of the location $m_k$ of the corresponding excitation
pulse. This means that the most desirable amplitude $g_k$ at a given pulse position $m_k$
can be computed. If an incoming speech signal sequence is assumed stationary, then
the covariance function $\phi_{hh}(m_i, m_k)$ can be represented by the following equation (20).

[0032] This equation indicates that, under the above-mentioned assumption, $\phi_{hh}(m_i, m_k)$
is equal to an autocorrelation function $R_{hh}(\cdot)$ which depends on a delay
$|m_i - m_k|$.
[0033] $R_{hh}(|m_i - m_k|)$ in equation (20) can be represented as follows:

Consequently, equation (17) can be modified using equation (21) as follows:

[0034] The value of RPE grid q is calculated using the cross-correlation function obtained
by equation (18). That is to say, the RPE grid q can be determined so as to satisfy
the following equation.

where max(q) indicates the maximum value of the right-hand term as the value
of q is varied. Merely by way of example, the RPE grid q can assume the values 0, 1, 2
and 3 in the prior art disclosed in Paper 1.
[0035] According to the present invention, an amplitude sequence of the excitation signal
can be precisely obtained using equation (22), and hence a high quality reproduced
voice can be realized.
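Equations (21) and (22) are not reproduced in this text. The sketch below assumes $R_{hh}(\tau) = \sum_n h_w(n)\,h_w(n + \tau)$ and that equation (22) expresses each amplitude as $g_k = [\phi_{xh}(-m_k) - \sum_{i \ne k} g_i R_{hh}(|m_i - m_k|)] / R_{hh}(0)$. The text does not say how these coupled equations are solved; repeated in-place sweeps (Gauss-Seidel iteration) are used here purely for illustration, and all names are hypothetical.

```python
def autocorrelation(h_w, lag):
    """R_hh(lag) of a finite impulse response; zero once lag exceeds its length."""
    return sum(h_w[n] * h_w[n + lag] for n in range(len(h_w) - lag))


def pulse_amplitudes(phi_xh, positions, h_w, sweeps=50):
    """Solve the assumed form of equation (22) for all pulse amplitudes.

    phi_xh[m] holds the cross-correlation value for position m (zero-based).
    Requires R_hh(0) > 0, i.e. a non-zero impulse response.
    """
    span = max(positions) - min(positions)
    r = [autocorrelation(h_w, lag) for lag in range(span + 1)]
    g = [0.0] * len(positions)
    for _ in range(sweeps):
        for k, m_k in enumerate(positions):
            coupling = sum(g[i] * r[abs(m_i - m_k)]
                           for i, m_i in enumerate(positions) if i != k)
            g[k] = (phi_xh[m_k] - coupling) / r[0]
    return g
```

When the impulse response is shorter than the pulse interval, the coupling terms vanish and each amplitude reduces to $\phi_{xh}(-m_k) / R_{hh}(0)$.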
[0036] A first embodiment of this invention will be discussed with reference to Figs. 2
to 4.
[0037] As previously mentioned in connection with Fig. 1, an A/D (analog-to-digital) converted
speech signal is applied via an input terminal 110 to a pre-processing circuit 112
on a frame by frame basis. The pre-processing circuit 112 can be configured in the
same manner as the circuit 12 of Fig. 1. The speech frame applied to the circuit 112
is pre-processed to produce an offset-free signal, which is then passed through a first
order pre-emphasis filter. The original speech signal to be applied to the input terminal
110 has been sampled at a predetermined rate such as 8 kHz. If, merely by way of example,
the frame length is 20 ms as in the prior art, one frame
consists of 160 signal samples. The samples thus obtained are applied to a short term
LPC (Linear Predictive Coding) analysis circuit 114 and also to a long term (pitch)
analysis circuit 116.
[0038] The one frame samples, applied to the short term LPC analysis circuit 114, are analyzed
to determine predetermined orders of reflection coefficients (LAR(i)) (i=1···8) in
the same manner as disclosed in Paper 1. The reflection coefficients represent a spectrum
envelope of each frame. An LAR coding circuit 118 is supplied with the LAR(i)s and
transforms or encodes them into log area ratios (coded-LAR(i)) based on predetermined
quantizing levels (quantizing bits), and then applies them to a multiplexor 300. Further,
the LAR coding circuit 118 decodes the coded-LAR(i)s, applies the decoded LAR'(i)
to an impulse response calculating circuit 120 as well as a synthesis filter 122.
[0039] The long term analysis circuit 116 receives the samples of one frame from the pre-processing
circuit 112 and calculates the LTP lag Md and the LTP gain β in accordance with the algorithms
disclosed in the above-mentioned Paper 2. Md and β are fed to a long term (pitch) coding circuit
124, which encodes them and applies the coded-Md and coded-β to the multiplexor
300. Further, the long term coding circuit 124 decodes the coded-Md and the coded-β
into Md' and β', respectively. The decoded LTP lag (Md') and the decoded LTP gain
(β') are applied to the impulse response calculating circuit 120 and also to the synthesis
filter 122.
[0040] As shown in Fig. 4, the impulse response calculating circuit 120 comprises an impulse
generator 400, a long term prediction (LTP) filter 402 and a short term prediction
(STP) filter 404, which are coupled in series. The LTP filter 402 includes an adder
406, a coefficient weighting circuit 408 and a delay circuit 410. Similarly, the STP
filter 404 includes an adder 412, a coefficient weighting circuit 414 and a delay
circuit 416. The operation of each of the filters 402 and 404 is known to those skilled in
the art, and hence a detailed description thereof will be omitted. The decoded Md'
and β' are applied to the coefficient weighting circuit 408, while the decoded LAR'(i)
is applied to the coefficient weighting circuit 414.
[0041] The impulse response calculating circuit 120 determines an impulse response of a
predetermined number of samples and applies the output $h_w(n)$ to an autocorrelation
function calculating circuit 126 and a cross-correlation
function calculating circuit 128.
[0042] The circuit 126 calculates an autocorrelation function $R_{hh}(|m_i - m_k|)$
according to equation (21), and applies the result to a pulse amplitude calculating
circuit 132.
[0043] A subtractor 134, coupled to the pre-processing circuit 112 and the synthesis filter
122, subtracts the output sequence of the filter 122 from the speech signal sequence
x(n), and applies the resultant difference to a weighting circuit 136. The synthesis
filter 122 has already stored one frame of a response signal sequence, which is obtained
by driving the filter with the excitation pulses of the frame preceding the present frame
and then letting the response continue into the present frame with the excitation signal
set to zero. This is based on the following consideration. If the effective length
of the impulse response of the synthesis filter is assumed to be at most about two frames,
the speech signal sequence of the present frame can be expressed as the sum of two
sequences: the response of the synthesis filter to the excitation pulses of the preceding
frame, carried over into the present frame with the excitation signal set to zero, and
the output signal sequence of the synthesis filter driven by the
excitation pulse sequence of the present frame.
[0044] The weighting circuit 136 is supplied with the parameter LAR'(i) from the LAR coding
circuit 118, and calculates the weighting function w(n) such that its Z transform
satisfies equation (8). Alternatively, another frequency weighting scheme can be
used. The weighting circuit 136 convolves the difference from the subtractor 134 with
the function w(n), and applies its output $x_w(n)$ to the cross-correlation function
calculating circuit 128. This circuit 128 is further supplied
with the impulse response $h_w(n)$, and calculates the cross-correlation function
$\phi_{xh}(-m_k)$ (where $1 \le m_k \le N$), which is applied to an RPE grid selector 130
and also to the pulse amplitude calculating circuit 132.
[0045] The grid selector 130 determines or selects a grid q, using the cross-correlation
function $\phi_{xh}(-m_k)$, according to equation (23) and applies the selected grid to
the pulse amplitude calculating circuit 132. The circuit 132 is synchronously supplied
with the above-mentioned three outputs (viz., the autocorrelation function
$R_{hh}(|m_i - m_k|)$, the cross-correlation function $\phi_{xh}(-m_k)$ and the selected
grid q), and determines an amplitude of each of the excitation
pulses within one frame. In other words, the circuit 132 determines a so-called amplitude
sequence of the excitation pulses in one frame.
[0046] A pulse coding circuit 137 receives the output sequence of the circuit 132 and encodes
the selected grid q and the amplitude sequence $g_k$ of the excitation pulses using
normalizing coefficients, and applies the encoded
information to the multiplexor 300. The normalizing coefficients are also encoded
within the pulse coding circuit 137 and applied to the multiplexor 300. The circuit
137 further decodes the encoded data (viz., the grid, the amplitude sequence and
the normalizing coefficients) and applies them to a pulse sequence generator 138. The
decoded grid and the decoded amplitude sequence are respectively denoted by q' and
$g_k'$. The operation of the pulse coding circuit 137 has been disclosed in the above-mentioned
Paper 1.
[0047] The pulse sequence generator 138 outputs an excitation pulse sequence of one frame
using $g_k'$ and $m_k'$, which pulse sequence has an amplitude $g_k'$ at a position $m_k'$.
[0048] The synthesis filter 122 receives the excitation pulse sequence, and also receives
the coefficients LAR'(i) and the pitch information (Md' and β') from the circuits
118 and 124, respectively. It should be noted that the synthesis filter 122 converts
LAR'(i) into a prediction parameter $a_i$ (1 ≤ i ≤ Np) by means of a well known method.
The filter 122 appends one frame of zeros to the excitation signal
applied thereto and determines a response
signal sequence x'(n) over the resulting two frames. The sequence x'(n) can be represented
by:

This equation is identical to equation (5). The excitation signal d(n) represents
the output pulse signal generated by the pulse sequence generator 138 when 1 ≤ n ≤ N,
and a series of all zeros when (N + 1) ≤ n ≤ 2N. The subtractor
134 receives x'(n) obtained using equation (24) (wherein N + 1 ≤ n ≤ 2N).
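The two-frame response described above can be sketched as follows. Equation (24) is not reproduced in this text, so the filter below uses the same assumed cascade form as for equation (5) (first order long term filter followed by an all-pole short term filter), starts from zero state, and uses hypothetical function names.

```python
def synthesis_filter(d, beta, lag, a):
    """Assumed cascade: x_d(n) = d(n) + beta*x_d(n-lag); x'(n) = x_d(n) + sum a_i*x'(n-i)."""
    x_d, x_rec = [], []
    for n, d_n in enumerate(d):
        x_d.append(d_n + (beta * x_d[n - lag] if n >= lag else 0.0))
        acc = x_d[n]
        for i, a_i in enumerate(a, start=1):
            if n >= i:
                acc += a_i * x_rec[n - i]
        x_rec.append(acc)
    return x_rec


def carry_over_response(pulses, beta, lag, a):
    """Drive the filter with one frame of pulses plus one frame of zeros.

    Returns the second frame (N+1 <= n <= 2N in the text's numbering), i.e. the
    part of the response that spills into the following frame.
    """
    n = len(pulses)
    return synthesis_filter(list(pulses) + [0.0] * n, beta, lag, a)[n:]


def target_for_present_frame(x, carry_over):
    """Difference formed by the subtractor 134 before weighting."""
    return [x_n - c_n for x_n, c_n in zip(x, carry_over)]
```

Removing the carried-over response first means that only the contribution of the present frame's own pulses remains to be matched.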
[0049] The multiplexor 300 combines the outputs of the circuits 137, 118 and 124, which
are applied to a transmission line via an output terminal 302.
[0050] A second embodiment of this invention will be discussed with reference to Figs. 3
to 5. The arrangement of Fig. 5 differs from that of Fig. 2 in that the former arrangement
further includes a switch 500, a decision circuit 502, a gate 504 and a section 506.
This section 506 is arranged in exactly the same manner as the arrangement of a section
508, although the functions of the two sections 506 and 508 are slightly different.
For the convenience of description, each of the blocks 120', 126', 128', 130', 132'
and 136' in the section 506 bears the same reference numeral as the counterpart in
the section 508 but has a prime for the purposes of differentiation. The section 508
operates in the same manner as described above and hence further descriptions thereof
will be omitted for simplicity. Similarly, where the blocks included in
the section 506 operate in the same manner as their counterparts in the section 508,
their operations may not be described, for simplicity.
[0051] The impulse response calculating circuit 120' in the section 506 receives the decoded
LAR'(i) at the coefficient weighting circuit 414 (Fig. 4), determines an impulse
response of a predetermined number of samples, and applies the output $h_w'(n)$ to the
autocorrelation function calculating circuit 126' as well as the cross-correlation
function calculating circuit 128'. This means that the circuit 120' utilizes only
the short term prediction filter 404. It should be noted that, as shown in Fig. 5, the
line provided for the pitch information (Md' and β') is not coupled to the block 120',
so that the long term prediction filter 402 is disabled.
[0052] The autocorrelation function calculating circuit 126' calculates an autocorrelation
function $R_{hh}'(|m_i - m_k|)$ according to equation (21), and applies the result to the
pulse amplitude calculating circuit 132'. The weighting circuit 136' operates in the same
manner as its counterpart 136, and applies its output $x_w(n)$ to the cross-correlation
function calculating circuit 128'. This circuit 128'
is further supplied with the impulse response $h_w'(n)$, and calculates the cross-correlation
function $\phi_{xh}'(-m_k)$ (where $1 \le m_k \le N$), which is applied to the RPE grid
selector 130' and also to the pulse amplitude
calculating circuit 132'.
[0053] The grid selector 130' determines or selects a grid q', using the cross-correlation
function $\phi_{xh}'(-m_k)$, according to equation (23) and applies the selected grid q'
to the pulse amplitude calculating circuit 132'.
[0054] The circuit 132' is synchronously supplied with the above-mentioned three outputs
(viz., the autocorrelation function $R_{hh}'(|m_i - m_k|)$, the cross-correlation function
$\phi_{xh}'(-m_k)$ and the selected grid q'), and determines an amplitude of each of the
excitation pulses within one frame.
[0055] The decision circuit 502 is coupled to the circuits 132 and 132' to be supplied with
their outputs: the autocorrelation functions $R_{hh}(|m_i - m_k|)$ and
$R_{hh}'(|m_i - m_k|)$, the cross-correlation functions $\phi_{xh}(-m_k)$ and
$\phi_{xh}'(-m_k)$, and the selected grids q and q'. The decision circuit 502 determines
the power or energy J of an error signal between the incoming and reconstructed signals,
according to the following equation (25), for each of the two excitation pulse series
which are obtained at the sections 508 and 506.

Equation (25) can be obtained by substituting equations (15) and (22) into equation
(9). In equation (25), $R_{xx}(0)$ represents the power or energy of the output $x_w(n)$
of the weighting circuit 136 (or 136').
[0056] Alternatively, the error signal energy can approximately be obtained using the following
equation (26) instead of equation (25).

Equation (26) utilizes an error of the cross-correlation function, which can be obtained
by calculating the excitation pulse series.
[0057] The decision circuit 502 compares the two kinds of power or energy: one obtained
from the parameters of the section 508 (referred to as Jo) and the other
obtained from the parameters of the section 506 (referred to as Jo'). In
the event of Jo'<Jo, the decision circuit 502 determines that the excitation pulse
series obtained through the section 506 is preferable to that obtained
through the section 508. In this case, the decision circuit 502 instructs the switch
500 to relay the output of the section 506 to the pulse coding circuit 137. Further,
the decision circuit 502 opens the gate 504, allowing the coded information (coded-LAR(i),
coded-Md and coded-β) to be applied to the multiplexor 300. In this case, the gate
504 attaches a predetermined code to the coded-Md and coded-β. Conversely, in the event
of Jo'>Jo, the decision circuit 502 causes the switch 500 to relay the output of the
section 508 to the circuit 137, and opens the gate 504 to pass the above-mentioned
coded information therethrough.
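The comparison made by the decision circuit 502 can be sketched as below. Equation (25) is not reproduced in this text; expanding equation (16) and using the amplitude condition of equation (22) suggests J = R_xx(0) − Σ g_k·φ_xh(−m_k), which is assumed here. The text does not say what happens when Jo' equals Jo; the sketch arbitrarily keeps the section 508 in that case. Function names and return labels are hypothetical.

```python
def error_energy(r_xx0, amplitudes, phi_xh_at_pulses):
    """Assumed form of equation (25): J = R_xx(0) - sum_k g_k * phi_xh(-m_k)."""
    return r_xx0 - sum(g * phi for g, phi in zip(amplitudes, phi_xh_at_pulses))


def choose_section(j_with_pitch, j_without_pitch):
    """Return '506' (short term filter only) if it gives the smaller error, else '508'.

    A tie falls back to '508'; that choice is an assumption of this sketch.
    """
    return "506" if j_without_pitch < j_with_pitch else "508"
```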
[0058] As shown, the two sections 506 and 508 are separately provided in the second embodiment.
However, this invention is not limited to such an arrangement. That is to say, the
impulse response calculating circuit 120 can be adapted to calculate both of the
above-mentioned functions $h_w(n)$ and $h_w'(n)$. In this case the circuit 120 generates
$h_w'(n)$ by setting to zero the parameters
Md' and β' which are applied to the coefficient weighting circuit 408. It goes without
saying that $h_w(n)$ may be calculated first and $h_w'(n)$ thereafter, or vice versa;
the same applies to the other blocks in which
two kinds of computation are implemented.
[0059] In the above-mentioned embodiments, various calculations can be carried out on a
sub-frame basis as in the prior art.
[0060] The second embodiment can be modified such that the pitch gain β is compared with
a predetermined threshold. If the pitch gain β is less than the threshold, then the
pitch gain β is set to zero. This means that the excitation pulses are generated
using the spectrum parameters only. It is understood that this modification no longer
requires the provision of the decision circuit 502 or the calculation of equations
(25) and (26). This variation can therefore reduce the number of operations.
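This modification amounts to a single comparison, sketched below. The text gives no threshold value, so the threshold is left as a parameter, and the function name is hypothetical.

```python
def gate_pitch_gain(beta, threshold):
    """Return 0.0 when the pitch gain is below the threshold, else the gain itself."""
    return 0.0 if beta < threshold else beta
```

With the gain forced to zero, the long term filter passes the excitation unchanged, so only the spectrum parameters shape the impulse response.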
[0061] While the foregoing description describes only two embodiments of the present invention,
the various alternatives and modifications possible without departing from the scope
of the present invention, which is limited only by the appended claims, will be apparent
to those skilled in the art.
Claims for the following Contracting State(s): DE, FR, GB
1. An arrangement for encoding a speech signal using a regular pulse excitation scheme,
comprising:
first means (112, 114, 116) for being supplied with a discrete-time speech signal
and for dividing said discrete-time speech signal into a plurality of frames;
second means (118, 124) for extracting a plurality of parameters from each of said
frames supplied by said first means;
synthesis means (122) for generating a signal using said plurality of parameters and
a sequence of excitation pulses;
third means (120) for generating an impulse response function signal using said plurality
of parameters;
fourth means (126) for generating an autocorrelation function signal using said impulse
response signal;
fifth means (128) for generating a cross-correlation function signal using said impulse
response function signal and a weighted difference between one of said frames of discrete-time
speech signal and one frame of said signal generated by said synthesis means;
sixth means (130) for generating a grid signal indicative of a location of a first
excitation pulse within one frame using said cross-correlation function signal; and
seventh means (132) for receiving said autocorrelation function signal, said cross-correlation
function signal and said grid signal, said seventh means determining an amplitude
sequence of excitation pulses within one frame;
characterized in that
said third means (120) comprises:
an impulse generator (400) for generating an impulse;
a long term prediction filter (402) receiving said impulse as well as second and third
parameters being respectively representative of a pitch period and a pitch gain; and
a short term prediction filter (404) being coupled in series with said long term prediction
filter (402) and receiving one or more first parameters representative of a spectrum
envelope and the output of said long term prediction filter (402).
2. An arrangement as claimed in claim 1, wherein said second means (118, 124) comprises:
eighth means extracting said first parameters representative of a spectrum envelope
from each of said frames supplied by said first means, encoding the first parameters,
decoding the encoded first parameters and obtaining the decoded first parameters;
and
ninth means extracting said second and third parameters from each of said frames supplied
by said first means, said second and third parameters being respectively representative
of a pitch period and a pitch gain, said ninth means decoding the coded second and
third parameters and obtaining the decoded second and third parameters,
wherein the decoded first, second and third parameters are applied to said third means
(120).
3. A method for encoding a speech signal using a regular pulse excitation scheme, comprising
the steps of:
(a) receiving a discrete-time speech signal and dividing said discrete-time speech
signal into a plurality of frames;
(b) extracting a plurality of parameters from each of said frames of said discrete-time
speech signal;
(c) generating a signal using said plurality of parameters and a sequence of excitation
pulses;
(d) generating an impulse response function signal using said plurality of parameters;
(e) generating an autocorrelation function signal using said impulse response signal;
(f) generating a cross-correlation function signal using said impulse response function
signal and a weighted difference between one of said frames of discrete-time speech
signal and one frame of said signal;
(g) generating a grid signal indicative of a location of a first excitation pulse
within one frame using said cross-correlation function signal; and
(h) receiving said autocorrelation function signal, said cross-correlation function
signal and said grid signal, and determining an amplitude sequence of excitation pulses
within one frame; and
(i) wherein said step (d) comprises the steps of:
generating an impulse;
receiving said impulse as well as second and third parameters being respectively representative
of a pitch period and a pitch gain, and generating an output representative of a pitch
structure; and
receiving first parameters representative of a spectrum envelope and said output representative
of a pitch structure, and generating an output representative of spectrum envelope
characteristics.
4. A method as claimed in claim 3, wherein said step (b) comprises the steps of:
extracting said first parameters representative of a spectrum envelope from each of
said frames of said discrete-time speech signal and encoding the first parameters,
decoding the encoded first parameters and obtaining the decoded first parameters;
and
extracting said second and third parameters from each of said frames of said discrete-time
speech signal wherein said second and third parameters are respectively representative
of a pitch period and a pitch gain, and decoding the coded second and third parameters
and obtaining the decoded second and third parameters,
wherein the decoded first, second and third parameters correspond to said plurality
of parameters in said step (d).
Claims for the following Contracting State(s): BE, NL, SE
1. An arrangement for encoding a speech signal using a regular pulse excitation scheme,
comprising:
first means (112, 114, 116) for being supplied with a discrete-time speech signal
and for dividing said discrete-time speech signal into a plurality of frames;
second means (118, 124) for extracting a plurality of parameters from each of said
frames supplied by said first means;
synthesis means (122) for generating a signal using said plurality of parameters and
a sequence of excitation pulses;
third means (120) for generating an impulse response function signal using said plurality
of parameters;
fourth means (126) for generating an autocorrelation function signal using said impulse
response signal; and
fifth means (128) for generating a cross-correlation function signal using said impulse
response function signal and a weighted difference between one of said frames of discrete-time
speech signal and one frame of said signal generated by said synthesis means;
characterized by:
sixth means (130) for generating a grid signal indicative of a location of a first
excitation pulse within one frame using said cross-correlation function signal; and
seventh means (132) for receiving said autocorrelation function signal, said cross-correlation
function signal and said grid signal, said seventh means determining an amplitude
sequence of excitation pulses within one frame.
2. An arrangement as claimed in claim 1, wherein said second means (118, 124) comprises:
eighth means extracting one or more first parameters representative of a spectrum
envelope from each of said frames supplied by said first means, encoding the first
parameters, decoding the encoded first parameters and obtaining the decoded first
parameters; and
ninth means extracting second and third parameters from each of said frames supplied
by said first means, said second and third parameters being respectively representative
of a pitch period and a pitch gain, said ninth means decoding the coded second and
third parameters and obtaining the decoded second and third parameters,
wherein the decoded first, second and third parameters are applied to said third means
(120).
3. An arrangement as claimed in claim 2, wherein said third means (120) comprises:
an impulse generator (400) for generating an impulse;
a long term prediction filter (402) receiving said impulse as well as said second
and third parameters; and
a short term prediction filter (404) being coupled in series with said long term prediction
filter (402) and receiving said first parameters and the output of said long term
prediction filter.
4. A method for encoding a speech signal using a regular pulse excitation scheme, comprising
the steps of:
(a) receiving a discrete-time speech signal and dividing said discrete-time speech
signal into a plurality of frames;
(b) extracting a plurality of parameters from each of said frames of said discrete-time
speech signal;
(c) generating a signal using said plurality of parameters and a sequence of excitation
pulses;
(d) generating an impulse response function signal using said plurality of parameters;
(e) generating an autocorrelation function signal using said impulse response signal;
and
(f) generating a cross-correlation function signal using said impulse response function
signal and a weighted difference between one of said frames of discrete-time speech
signal and one frame of said signal;
characterized by:
(g) generating a grid signal indicative of a location of a first excitation pulse
within one frame using said cross-correlation function signal; and
(h) receiving said autocorrelation function signal, said cross-correlation function
signal and said grid signal, and determining an amplitude sequence of excitation pulses
within one frame.
5. A method as claimed in claim 4, wherein said step (b) comprises the steps of:
extracting one or more first parameters representative of a spectrum envelope from
each of said frames of said discrete-time speech signal and encoding the first parameters,
decoding the encoded first parameters and obtaining the decoded first parameters;
and
extracting second and third parameters from each of said frames of said discrete-time
speech signal wherein said second and third parameters are respectively representative
of a pitch period and a pitch gain, and decoding the coded second and third parameters
and obtaining the decoded second and third parameters,
wherein the decoded first, second and third parameters correspond to said plurality
of parameters in said step (d).
6. A method as claimed in claim 5, wherein said step (d) comprises the steps of:
generating an impulse;
receiving said impulse as well as said second and third parameters, and generating
an output representative of a pitch structure; and
receiving said first parameters and said output representative of a pitch structure,
and generating an output representative of spectrum envelope characteristics.
Patentansprüche für folgende(n) Vertragsstaat(en): DE, FR, GB
1. Anordnung zur Codierung eines Sprachsignals unter Verwendung eines Regulär-Pulsanregungsschemas
mit:
einer ersten Einrichtung (112, 114, 116), die dazu bestimmt ist, mit einem Diskretzeit-Sprachsignal
versorgt zu werden und das Diskretzeit-Sprachsignal in mehrere Rahmen zu teilen;
einer zweiten Einrichtung (118, 124) zum Extrahieren mehrerer Parameter aus jedem
der von der ersten Einrichtung übergebenen Rahmen;
Syntheseeinrichtung (122) zum Erzeugen eines Signals unter Verwendung der mehreren
Parameter und einer Folge von Anregungsimpulsen;
einer dritten Einrichtung (120) zum Erzeugen eines Impulsantwortfunktionssignals unter
Verwendung der mehreren Parameter;
einer vierten Einrichtung (126) zum Erzeugen eines Autokorrelationsfunktionssignals
unter Verwendung des Impulsantwortfunktionssignals;
einer fünften Einrichtung (128) zum Erzeugen eines Kreuzkorrelationsfunktionssignals
unter Verwendung des Impulsantwortfunktionssignals und einer gewichteten Differenz
zwischen einem der Rahmen des Diskretzeit-Sprachsignals und einem Rahmen des durch
die Syntheseeinrichtung erzeugten Signals;
einer sechsten Einrichtung (130) zum Erzeugen eines Rastersignals, das die Lage eines
ersten Anregungsimpulses innerhalb eines Rahmens anzeigt, unter Verwendung des Kreuzkorrelationsfunktionssignals;
und
einer siebenten Einrichtung (132) zum Empfangen des Autokorrelationsfunktionssignals,
des Kreuzkorrelationsfunktionssignals und des Rastersignals, wobei die siebente Einrichtung
eine Amplitudenfolge der Anregungsimpulse innerhalb eines Rahmens bestimmt;
dadurch gekennzeichnet, daß die dritte Einrichtung (120) aufweist:
einen Impulsgenerator (400) zum Erzeugen eines Impulses;
ein Langzeit-Prädiktionsfilter (402), das den Impuls sowie zweite und dritte Parameter
empfängt, die jeweils eine Tonhöhenperiode bzw. eine Tonhöhenverstärkung darstellen;
und
ein Kurzzeit-Prädiktionsfilter (404), das in Reihe mit dem Langzeit-Prädiktionsfilter
(402) geschaltet ist und einen oder mehrere erste Parameter empfängt, die eine spektrale
Hüllkurve und das Ausgangssignal des Langzeit-Prädiktionsfilters (402) darstellen.
2. Anordnung nach Anspruch 1, wobei die zweite Einrichtung (118, 124) aufweist:
eine achte Einrichtung, die die ersten Parameter, die eine spektrale Hüllkurve darstellen,
aus jedem der von der ersten Einrichtung übergebenen Rahmen extrahiert, die ersten
Parameter codiert, die codierten ersten Parameter decodiert und die decodierten ersten
Parameter erzeugt; und
eine neunte Einrichtung, die die zweiten und dritten Parameter aus jedem der von der
ersten Einrichtung übergebenen Rahmen extrahiert, wobei die zweiten und dritten Parameter
jeweils eine Tonhöhenperiode bzw. eine Tonhöhenverstärkung darstellen, wobei die neunte
Einrichtung die codierten zweiten und dritten Parameter decodiert und die decodierten
zweiten und dritten Parameter erzeugt,
wobei die decodierten ersten, zweiten und dritten Parameter an die dritte Einrichtung
(120) übergeben werden.
3. Verfahren zum Codieren eines Sprachsignals unter Verwendung eines Regulär-Pulsanregungsschemas
mit den Schritten:
(a) Empfangen eines Diskretzeit-Sprachsignals und Teilen des Diskretzeit-Sprachsignals
in mehrere Rahmen;
(b) Extrahieren mehrerer Parameter aus jedem der Rahmen des Diskretzeit-Sprachsignals;
(c) Erzeugen eines Signals unter Verwendung der mehreren Parameter und einer Folge
von Anregungsimpulsen;
(d) Erzeugen eines Impulsantwortfunktionssignals unter Verwendung der mehreren Parameter;
(e) Erzeugen eines Autokorrelationsfunktionssignals unter Verwendung des Impulsantwortsignals;
(f) Erzeugen eines Kreuzkorrelationsfunktionssignals unter Verwendung des Impulsantwortfunktionssignals
und einer gewichteten Differenz zwischen einem der Rahmen des Diskretzeit-Sprachsignals
und einem Rahmen des Signals;
(g) Erzeugen eines Rastersignals, das die Lage eines ersten Anregungsimpulses innerhalb
eines Rahmens kennzeichnet, unter Verwendung des Kreuzkorrelationsfunktionssignals;
und
(h) Empfangen des Autokorrelationsfunktionssignals, des Kreuzkorrelationsfunktionssignals
und des Rastersignals und Bestimmen einer Amplitudenfolge der Anregungsimpulse innerhalb
eines Rahmens; und
(i) wobei der Schritt (d) die Schritte aufweist:
Erzeugen eines Impulses;
Empfangen des Impulses sowie zweiter und dritter Parameter, die jeweils eine Tonhöhenperiode
bzw. eine Tonhöhenverstärkung darstellen, und Erzeugen eines Ausgangssignals, das
eine Tonhöhenstruktur darstellt; und
Empfangen erster Parameter, die eine spektrale Hüllkurve darstellen, und des Ausgangssignals,
das eine Tonhöhenstruktur darstellt, und Erzeugen eines Ausgangssignals, das eine
spektrale Hüllkurvencharakteristik darstellt.
4. Verfahren nach Anspruch 3, wobei der Schritt (b) die Schritte aufweist:
Extrahieren der ersten Parameter, die eine spektrale Hüllkurve darstellen, aus jedem
der Rahmen des Diskretzeit-Sprachsignals und Codieren der ersten Parameter, Decodieren
der codierten ersten Parameter und Erzeugen der decodierten ersten Parameter; und
Extrahieren der zweiten und dritten Parameter aus jedem der Rahmen des Diskretzeit-Sprachsignals,
wobei die zweiten und dritten Parameter jeweils eine Tonhöhenperiode bzw. eine Tonhöhenverstärkung
darstellen, und Decodieren der codierten zweiten und dritten Parameter und Erzeugen
der decodierten zweiten und dritten Parameter,
wobei die decodierten ersten, zweiten und dritten Parameter den mehreren Parametern
in Schritt (d) entsprechen.
Patentansprüche für folgende(n) Vertragsstaat(en): BE, NL, SE
1. Anordnung zur Codierung eines Sprachsignals unter Verwendung eines Regulär-Pulsanregungsschemas
mit:
einer ersten Einrichtung (112, 114, 116), die dazu bestimmt ist, mit einem Diskretzeit-Sprachsignal
versorgt zu werden und das Diskretzeit-Sprachsignal in mehrere Rahmen zu teilen;
einer zweiten Einrichtung (118, 124) zum Extrahieren mehrerer Parameter aus jedem
der von der ersten Einrichtung übergebenen Rahmen;
Syntheseeinrichtung (122) zum Erzeugen eines Signals unter Verwendung der mehreren
Parameter und einer Folge von Anregungsimpulsen;
einer dritten Einrichtung (120) zum Erzeugen eines Impulsantwortfunktionssignals unter
Verwendung der mehreren Parameter;
einer vierten Einrichtung (126) zum Erzeugen eines Autokorrelationsfunktionssignals
unter Verwendung des Impulsantwortfunktionssignals; und
einer fünften Einrichtung (128) zum Erzeugen eines Kreuzkorrelationsfunktionssignals
unter Verwendung des Impulsantwortfunktionssignals und einer gewichteten Differenz
zwischen einem der Rahmen des Diskretzeit-Sprachsignals und einem Rahmen des durch
die Syntheseeinrichtung erzeugten Signals;
gekennzeichnet durch:
eine sechste Einrichtung (130) zum Erzeugen eines Rastersignals, das die Lage eines
ersten Anregungsimpulses innerhalb eines Rahmens anzeigt, unter Verwendung des Kreuzkorrelationsfunktionssignals;
und
eine siebente Einrichtung (132) zum Empfangen des Autokorrelationsfunktionssignals,
des Kreuzkorrelationsfunktionssignals und des Rastersignals, wobei die siebente Einrichtung
eine Amplitudenfolge der Anregungsimpulse innerhalb eines Rahmens bestimmt.
2. Anordnung nach Anspruch 1, wobei die zweite Einrichtung (118, 124) aufweist:
eine achte Einrichtung, die einen oder mehrere erste Parameter, die eine spektrale
Hüllkurve darstellen, aus jedem der von der ersten Einrichtung übergebenen Rahmen
extrahiert, die ersten Parameter codiert, die codierten ersten Parameter decodiert
und die decodierten ersten Parameter erzeugt; und
eine neunte Einrichtung, die zweite und dritte Parameter aus jedem der von der ersten
Einrichtung übergebenen Rahmen extrahiert, wobei die zweiten und dritten Parameter
jeweils eine Tonhöhenperiode bzw. eine Tonhöhenverstärkung darstellen, wobei die neunte
Einrichtung die codierten zweiten und dritten Parameter decodiert und die decodierten
zweiten und dritten Parameter erzeugt,
wobei die decodierten ersten, zweiten und dritten Parameter an die dritte Einrichtung
(120) übergeben werden.
3. Anordnung nach Anspruch 2, wobei die dritte Einrichtung (120) aufweist:
einen Impulsgenerator (400) zum Erzeugen eines Impulses;
ein Langzeit-Prädiktionsfilter (402), das den Impuls sowie die zweiten und dritten
Parameter empfängt; und
ein Kurzzeit-Prädiktionsfilter (404), das in Reihe mit dem Langzeit-Prädiktionsfilter
(402) geschaltet ist und die ersten Parameter und das Ausgangssignal des Langzeit-Prädiktionsfilters
empfängt.
4. Verfahren zum Codieren eines Sprachsignals unter Verwendung eines Regulär-Pulsanregungsschemas
mit den Schritten:
(a) Empfangen eines Diskretzeit-Sprachsignals und Teilen des Diskretzeit-Sprachsignals
in mehrere Rahmen;
(b) Extrahieren mehrerer Parameter aus jedem der Rahmen des Diskretzeit-Sprachsignals;
(c) Erzeugen eines Signals unter Verwendung der mehreren Parameter und einer Folge
von Anregungsimpulsen;
(d) Erzeugen eines Impulsantwortfunktionssignals unter Verwendung der mehreren Parameter;
(e) Erzeugen eines Autokorrelationsfunktionssignals unter Verwendung des Impulsantwortsignals;
und
(f) Erzeugen eines Kreuzkorrelationsfunktionssignals unter Verwendung des Impulsantwortfunktionssignals
und einer gewichteten Differenz zwischen einem der Rahmen des Diskretzeit-Sprachsignals
und einem Rahmen des Signals;
gekennzeichnet durch:
(g) Erzeugen eines Rastersignals, das die Lage eines ersten Anregungsimpulses innerhalb
eines Rahmens kennzeichnet, unter Verwendung des Kreuzkorrelationsfunktionssignals;
und
(h) Empfangen des Autokorrelationsfunktionssignals, des Kreuzkorrelationsfunktionssignals
und des Rastersignals und Bestimmen einer Amplitudenfolge der Anregungsimpulse innerhalb
eines Rahmens.
5. Verfahren nach Anspruch 4, wobei der Schritt (b) die Schritte aufweist:
Extrahieren eines oder mehrerer erster Parameter, die eine spektrale Hüllkurve darstellen,
aus jedem der Rahmen des Diskretzeit-Sprachsignals und Codieren der ersten Parameter,
Decodieren der codierten ersten Parameter und Erzeugen der decodierten ersten Parameter;
und
Extrahieren von zweiten und dritten Parametern aus jedem der Rahmen des Diskretzeit-Sprachsignals,
wobei die zweiten und dritten Parameter jeweils eine Tonhöhenperiode bzw. eine Tonhöhenverstärkung
darstellen, und Decodieren der codierten zweiten und dritten Parameter und Erzeugen
der decodierten zweiten und dritten Parameter,
wobei die decodierten ersten, zweiten und dritten Parameter den mehreren Parametern
in Schritt (d) entsprechen.
6. Verfahren nach Anspruch 5, wobei der Schritt (d) die Schritte aufweist:
Erzeugen eines Impulses;
Empfangen des Impulses sowie der zweiten und dritten Parameter und Erzeugen eines
Ausgangssignals, das eine Tonhöhenstruktur darstellt; und
Empfangen der ersten Parameter und des Ausgangssignals, das eine Tonhöhenstruktur
darstellt, und Erzeugen eines Ausgangssignals, das eine spektrale Hüllkurvencharakteristik
darstellt.
Revendications pour l'(les) Etat(s) contractant(s) suivant(s): DE, FR, GB
1. Dispositif pour coder un signal vocal en utilisant une méthode d'excitation régulière
d'impulsion, comprenant :
des premiers moyens (112, 114, 116) servant à être délivrés avec un signal vocal à
temps discret et à diviser ledit signal vocal à temps discret en plusieurs trames
;
des deuxièmes moyens (118, 124) servant à extraire plusieurs paramètres depuis chacune
desdites trames délivrées par lesdits premiers moyens ;
des moyens de synthèse (122) servant à produire un signal en utilisant lesdits plusieurs
paramètres et une séquence d'impulsions d'excitation ;
des troisièmes moyens (120) servant à produire un signal à fonction de réponse impulsionnelle
en utilisant lesdits plusieurs paramètres ;
des quatrièmes moyens (126) servant à produire un signal à fonction d'autocorrélation
en utilisant ledit signal à fonction de réponse impulsionnelle;
des cinquièmes moyens (128) servant à produire un signal à fonction d'intercorrélation
en utilisant ledit signal à fonction de réponse impulsionnelle et une différence pondérée
entre l'une desdites trames de signal vocal à temps discret et une trame dudit signal
produit par lesdits moyens de synthèse ;
des sixièmes moyens (130) servant à produire un signal de grille indiquant un emplacement
d'une première impulsion d'excitation dans une trame en utilisant ledit signal à fonction
d'intercorrélation ; et
des septièmes moyens (132) servant à recevoir ledit signal à fonction d'autocorrélation,
ledit signal à fonction d'intercorrélation et ledit signal de grille, lesdits septièmes
moyens déterminant une séquence d'amplitude d'impulsions d'excitation dans une trame
;
caractérisé en ce que
lesdits troisièmes moyens (120) comprennent :
un générateur d'impulsion (400) pour produire une impulsion ;
un filtre de prédiction à long terme (402) recevant ladite impulsion ainsi que des
deuxièmes et troisièmes paramètres représentant respectivement une période de pas
et un gain de pas ; et
un filtre de prédiction à court terme (404) connecté en série audit filtre de prédiction
à long terme (402) et recevant un ou plusieurs premiers paramètres représentant une
enveloppe de spectre et la sortie dudit filtre de prédiction à long terme (402).
2. Dispositif selon la revendication 1, dans lequel lesdits deuxièmes moyens (118, 124)
comprennent :
des huitièmes moyens extrayant lesdits premiers paramètres représentant une enveloppe
de spectre depuis chacune desdites trames délivrées par lesdits premiers moyens, codant
les premiers paramètres, décodant les premiers paramètres codés et obtenant les premiers
paramètres décodés ; et
des neuvièmes moyens extrayant lesdits deuxièmes et troisièmes paramètres depuis chacune
desdites trames délivrées par lesdits premiers moyens, lesdits deuxièmes et troisièmes
paramètres représentant respectivement une période de pas et un gain de pas, lesdits
neuvièmes moyens décodant les deuxièmes et troisièmes paramètres codés et obtenant
les deuxièmes et troisièmes paramètres décodés,
dans lequel les premiers, deuxièmes et troisièmes paramètres décodés sont appliqués
auxdits troisièmes moyens (120).
3. Procédé pour coder un signal vocal en utilisant une méthode d'excitation régulière
d'impulsion, comprenant les étapes consistant à :
(a) recevoir un signal vocal à temps discret et à diviser ledit signal vocal à temps
discret en plusieurs trames ;
(b) extraire plusieurs paramètres depuis chacune desdites trames dudit signal vocal
à temps discret ;
(c) produire un signal en utilisant lesdits plusieurs paramètres et une séquence d'impulsions
d'excitation ;
(d) produire un signal à fonction de réponse impulsionnelle en utilisant lesdits plusieurs
paramètres ;
(e) produire un signal à fonction d'autocorrélation en utilisant ledit signal à réponse
impulsionnelle ;
(f) produire un signal à fonction d'intercorrélation en utilisant ledit signal à fonction
de réponse impulsionnelle et une différence pondérée entre l'une desdites trames de
signal vocal à temps discret et une trame dudit signal ;
(g) produire un signal de grille indiquant un emplacement d'une première impulsion
d'excitation dans une trame en utilisant ledit signal à fonction d'intercorrélation
; et à
(h) recevoir ledit signal à fonction d'autocorrélation, ledit signal à fonction d'intercorrélation
et ledit signal de grille, et déterminer une séquence d'amplitude d'impulsions d'excitation
dans une trame ; et
(i) dans lequel ladite étape (d) comprend les étapes consistant à :
produire une impulsion ;
recevoir ladite impulsion ainsi que les deuxièmes et troisièmes paramètres représentant
respectivement une période de pas et un gain de pas, et à produire une sortie représentant
une structure de pas ; et à
recevoir les premiers paramètres représentant une enveloppe de spectre et ladite
sortie représentant une structure de pas, et à produire une sortie représentant des
caractéristiques d'enveloppe de spectre.
4. Procédé selon la revendication 3, dans lequel ladite étape (b) comprend les étapes
consistant à :
extraire lesdits premiers paramètres représentant une enveloppe de spectre depuis
chacune desdites trames dudit signal vocal à temps discret et à coder les premiers
paramètres, à décoder les premiers paramètres codés et à obtenir les premiers paramètres
décodés ; et à
extraire lesdits deuxièmes et troisièmes paramètres depuis chacune desdites trames
dudit signal vocal à temps discret dans lequel lesdits deuxièmes et troisièmes paramètres
représentent respectivement une période de pas et un gain de pas, et à décoder les
deuxièmes et troisièmes paramètres codés et à obtenir les deuxièmes et troisièmes
paramètres décodés,
dans lequel les premiers, deuxièmes et troisièmes paramètres décodés correspondent
auxdits plusieurs paramètres dans ladite étape (d).
Revendications pour l'(les) Etat(s) contractant(s) suivant(s): BE, NL, SE
1. Dispositif pour coder un signal vocal en utilisant une méthode d'excitation régulière
d'impulsion, comprenant :
des premiers moyens (112, 114, 116) servant à être délivrés avec un signal vocal à
temps discret et à diviser ledit signal vocal à temps discret en plusieurs trames
;
des deuxièmes moyens (118, 124) servant à extraire plusieurs paramètres depuis chacune
desdites trames délivrées par lesdits premiers moyens ;
des moyens de synthèse (122) servant à produire un signal en utilisant lesdits plusieurs
paramètres et une séquence d'impulsions d'excitation ;
des troisièmes moyens (120) servant à produire un signal à fonction de réponse impulsionnelle
en utilisant lesdits plusieurs paramètres ;
des quatrièmes moyens (126) servant à produire un signal à fonction d'autocorrélation
en utilisant ledit signal à fonction de réponse impulsionnelle;
des cinquièmes moyens (128) servant à produire un signal à fonction d'intercorrélation
en utilisant ledit signal à fonction de réponse impulsionnelle et une différence pondérée
entre l'une desdites trames de signal vocal à temps discret et une trame dudit signal
produit par lesdits moyens de synthèse ;
caractérisé par
des sixièmes moyens (130) servant à produire un signal de grille indiquant un emplacement
d'une première impulsion d'excitation dans une trame en utilisant ledit signal à fonction
d'intercorrélation ; et par
des septièmes moyens (132) servant à recevoir ledit signal à fonction d'autocorrélation,
ledit signal à fonction d'intercorrélation et ledit signal de grille, lesdits septièmes
moyens déterminant une séquence d'amplitude d'impulsions d'excitation dans une trame.
2. Dispositif selon la revendication 1, dans lequel lesdits deuxièmes moyens (118, 124)
comprennent :
des huitièmes moyens extrayant un ou plusieurs premiers paramètres représentant une
enveloppe de spectre depuis chacune desdites trames délivrées par lesdits premiers
moyens, codant les premiers paramètres, décodant les premiers paramètres codés et
obtenant les premiers paramètres décodés ; et
des neuvièmes moyens extrayant les deuxièmes et troisièmes paramètres depuis chacune
desdites trames délivrées par lesdits premiers moyens, lesdits deuxièmes et troisièmes
paramètres représentant respectivement une période de pas et un gain de pas, lesdits
neuvièmes moyens décodant les deuxièmes et troisièmes paramètres codés et obtenant
les deuxièmes et troisièmes paramètres décodés,
dans lequel les premiers, deuxièmes et troisièmes paramètres décodés sont appliqués
auxdits troisièmes moyens (120).
3. Dispositif selon la revendication 2, dans lequel lesdits troisièmes moyens (120) comprennent
:
un générateur d'impulsion (400) pour produire une impulsion ;
un filtre de prédiction à long terme (402) recevant ladite impulsion ainsi que lesdits
deuxièmes et troisièmes paramètres ; et
un filtre de prédiction à court terme (404) connecté en série audit filtre de prédiction
à long terme (402) et recevant lesdits premiers paramètres et la sortie dudit filtre
de prédiction à long terme (402).
4. Procédé pour coder un signal vocal en utilisant une méthode d'excitation régulière
d'impulsion, comprenant les étapes consistant à :
(a) recevoir un signal vocal à temps discret et à diviser ledit signal vocal à temps
discret en plusieurs trames ;
(b) extraire plusieurs paramètres depuis chacune desdites trames dudit signal vocal
à temps discret ;
(c) produire un signal en utilisant lesdits plusieurs paramètres et une séquence d'impulsions
d'excitation ;
(d) produire un signal à fonction de réponse impulsionnelle en utilisant lesdits plusieurs
paramètres ;
(e) produire un signal à fonction d'autocorrélation en utilisant ledit signal à réponse
impulsionnelle ; et à
(f) produire un signal à fonction d'intercorrélation en utilisant ledit signal à fonction
de réponse impulsionnelle et une différence pondérée entre l'une desdites trames de
signal vocal à temps discret et une trame dudit signal ;
caractérisé par le fait de
(g) produire un signal de grille indiquant un emplacement d'une première impulsion
d'excitation dans une trame en utilisant ledit signal à fonction d'intercorrélation
; et de
(h) recevoir ledit signal à fonction d'autocorrélation, ledit signal à fonction d'intercorrélation
et ledit signal de grille, et de déterminer une séquence d'amplitude d'impulsions
d'excitation dans une trame.
5. Procédé selon la revendication 4, dans lequel ladite étape (b) comprend les étapes
consistant à :
extraire un ou plusieurs premiers paramètres représentant une enveloppe de spectre
depuis chacune desdites trames dudit signal vocal à temps discret et à coder les premiers
paramètres, à décoder les premiers paramètres codés et à obtenir les premiers paramètres
décodés ; et à
extraire lesdits deuxièmes et troisièmes paramètres depuis chacune desdites trames
dudit signal vocal à temps discret dans lequel lesdits deuxièmes et troisièmes paramètres
représentent respectivement une période de pas et un gain de pas, et à décoder les
deuxièmes et troisièmes paramètres codés et à obtenir les deuxièmes et troisièmes
paramètres décodés,
dans lequel les premiers, deuxièmes et troisièmes paramètres décodés correspondent
auxdits plusieurs paramètres dans ladite étape (d).
6. Procédé selon la revendication 5, dans lequel ladite étape (d) comprend les étapes
consistant à :
produire une impulsion ;
recevoir ladite impulsion ainsi que lesdits deuxièmes et troisièmes paramètres, et
à produire une sortie représentant une structure de pas ; et à
recevoir lesdits premiers paramètres et ladite sortie représentant une structure de
pas, et à produire une sortie représentant des caractéristiques d'enveloppe de spectre.