TECHNICAL FIELD
[0001] The present disclosure relates to a technique for digitally encoding a sound signal,
for example a speech or audio signal, in view of transmitting or storing, and synthesizing
this sound signal. An encoder converts the sound signal into a digital bit-stream
using a bit-budget. A decoder or synthesizer then operates on the transmitted or stored
bit-stream and converts it back to the sound signal. The encoder and decoder/synthesizer
are commonly known as a codec.
[0002] More specifically, but not exclusively, the present disclosure relates to a method and
device for efficiently distributing the bit-budget in a codec.
BACKGROUND
[0003] One of the best techniques for encoding sound at low bit rates is Code-Excited
Linear Prediction (CELP) coding. In CELP coding, the sound signal is sampled and the
sampled sound signal is processed in successive blocks of L samples usually called frames,
where L is a predetermined number corresponding typically to 20 ms. The main principle behind
CELP is called "Analysis-by-Synthesis", where possible decoder outputs are synthesized during
the encoding process and then compared to the original sound signal. This search minimizes a
mean-squared error between the input sound signal and the synthesized sound signal in a
perceptually weighted domain.
[0004] In CELP-based coding, the sound signal is typically synthesized by filtering an excitation
through an all-pole digital filter 1/A(z), often called the synthesis filter. The filter A(z) is
estimated by means of Linear Prediction (LP) and represents short-term correlations
between sound signal samples. The LP filter coefficients are usually calculated once
per frame. In CELP codecs, the frame is further divided into several (usually two
(2) to five (5)) sub-frames to encode the excitation, which is typically composed of
two portions searched sequentially. Their respective gains may then be jointly quantized.
In the following description, the number of sub-frames is denoted as N and the index of a
particular sub-frame is denoted as n, where n = 0, ..., N-1.
[0005] The first portion of the excitation is usually selected from an adaptive codebook.
The adaptive codebook excitation portion exploits the quasi-periodicity (or long-term
correlations) of a voiced speech signal by searching the past excitation for the segment
most similar to the segment being currently encoded. The adaptive codebook excitation
portion is described by an adaptive codebook index, i.e. a delay parameter corresponding
to a pitch period, and an appropriate adaptive codebook gain, both sent to the decoder
or stored to reconstruct the same excitation as in the encoder.
[0006] The second portion of the excitation is usually an innovation signal selected from
an innovation codebook. The innovation signal models the evolution (difference) between
the previous speech segment and the currently encoded segment. The second portion
of the excitation is described by an index of a codevector selected from the innovation
codebook, and by an innovation codebook gain (these are also referred to as the fixed
codebook index and the fixed codebook gain).
[0007] In order to improve the coding efficiency, recent codecs such as, for example, G.718
as described in Reference [1] and EVS as described in Reference [2], are based on
classification of the input sound signal. Based on the signal characteristics, basic
CELP coding is expanded into several different coding modes. Consequently, the classification
needs to be transmitted to the decoder or stored as signaling information. Another
item of signaling information that is usually efficient to transmit is, for example, audio
bandwidth information.
[0008] Thus, in a CELP codec, so-called CELP "core module" parts may include:
- The LP filter coefficients;
- The adaptive codebook;
- The innovation (fixed) codebook; and
- The adaptive and innovation codebook gains.
[0009] Most recent CELP codecs are based on a constant bit rate (CBR) principle. In CBR
codecs a bit-budget to encode a given frame is constant during the encoding, regardless
of the sound signal content or network characteristics. In order to obtain the best
possible quality at a given constant bit rate, the bit-budget is carefully distributed
among the different coding parts. In practice, the bit-budget per coding part at a
given bit rate is usually fixed and stored in codec ROM tables. However, when the
number of bit rates supported by a codec increases, the length of the ROM tables proportionally
increases and the search within these tables becomes less efficient.
[0010] The problem of large ROM tables is even more significant in complex codecs where
the bit-budget allocated to the CELP core module might fluctuate even at codec constant
bit rate. For example, in a complex multi-module codec where the bit-budget at a constant
bit rate is allocated between different modules based on, for example, a number of
input audio channels, network feedback, audio bandwidth, input signal characteristics,
etc., the codec total bit-budget is distributed among the CELP core module and other
different modules. Examples of such other modules may comprise, but are
not limited to, a bandwidth extension (BWE) module, a stereo module, a frame error concealment
(FEC) module, etc., which are collectively referred to in the present description as
"supplementary codec modules". It is usually advantageous to keep the allocated bit-budget
per supplementary module variable based on signal characteristics or network feedback. Also, the supplementary
codec modules can be adaptively switched on and off. This variability usually does
not cause problems for encoding supplementary modules as the number of parameters
in these modules is usually small. However, the fluctuating bit-budget allocated to
supplementary codec modules results in a fluctuating bit-budget allocated to the relatively
complex CELP core module.
[0011] In practice, the bit-budget allocated to the CELP core module at a given bit rate
is usually obtained by reducing the codec total bit-budget by the bit-budget allocated
to all active supplementary codec modules, which may include a codec signaling bit-budget.
Consequently, the bit-budget allocated to the CELP core module can fluctuate over a
relatively large span between a minimum and a maximum bit rate, with a granularity as small as
1 bit per frame (i.e. 0.05 kbps at a frame length of 20 ms).
[0012] Dedicating ROM table entries for all possible CELP core module bit rates is obviously
inefficient. Therefore, there is a need for a more efficient and flexible distribution
of the bit-budget among the different modules with fine bit rate granularity based
on a limited number of intermediate bit rates.
US2010/241425A1 discloses a technique for digitally encoding a sound signal. It relates to a method
for encoding transition frames and frames following the transition in a sound signal,
in order to reduce the error propagation at the decoder in case of frame erasure and/or
to enhance coding efficiency mainly at the beginning of voiced segments. In particular,
the method replaces the adaptive codebook by a codebook of glottal impulse shapes
in transition frames and in frames following the transition.
SUMMARY
[0013] According to a first aspect, the present disclosure is concerned with a method for
encoding a sound signal according to claim 1.
[0014] According to a second aspect, there is provided a device for encoding a sound signal
according to claim 5.
[0015] The foregoing and other objects, advantages and features of the method and device
will become more apparent upon reading of the following non-restrictive description
of illustrative embodiments thereof, given by way of example only with reference to
the accompanying drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
[0016] In the appended drawings:
Figure 1 is a schematic block diagram of a stereo sound processing and communication
system depicting a possible context of implementation of the bit-budget allocating
method and device as disclosed in the following description;
Figure 2 is a block diagram illustrating concurrently a bit-budget allocating method
and device of the present disclosure; and
Figure 3 is a simplified block diagram of an example configuration of hardware components
forming the bit-budget allocating method and device of the present disclosure.
DETAILED DESCRIPTION
[0017] Figure 1 is a schematic block diagram of a stereo sound processing and communication
system 100 depicting a possible context of implementation of the bit-budget allocating
method and device as disclosed in the following description. It should be noted that
the presented bit-budget allocating method and device are not limited to stereo, but
can be used also in multi-channel coding or mono coding.
[0018] The stereo sound processing and communication system 100 of Figure 1 supports transmission
of a stereo sound signal across a communication link 101. The communication link 101
may comprise, for example, a wire or an optical fiber link. Alternatively, the communication
link 101 may comprise at least in part a radio frequency link. The radio frequency
link often supports multiple, simultaneous communications requiring shared bandwidth
resources such as may be found with cellular telephony. Although not shown, the communication
link 101 may be replaced by a storage device in a single device implementation of
the processing and communication system 100 that records and stores the encoded stereo
sound signal for later playback.
[0019] Still referring to Figure 1, for example a pair of microphones 102 and 122 produces
the left 103 and right 123 channels of an original analog stereo sound signal detected.
As indicated in the foregoing description, the sound signal may comprise, in particular
but not exclusively, speech and/or audio.
[0020] The left 103 and right 123 channels of the original analog sound signal are supplied
to an analog-to-digital (A/D) converter 104 for converting them into left 105 and
right 125 channels of an original digital stereo sound signal. The left 105 and right
125 channels of the original digital stereo sound signal may also be recorded and
supplied from a storage device (not shown).
[0021] A stereo sound encoder 106 encodes the left 105 and right 125 channels of the digital
stereo sound signal thereby producing a set of encoding parameters that are multiplexed
under the form of a bit-stream 107 delivered to an optional error-correcting encoder
108. The optional error-correcting encoder 108, when present, adds redundancy to the
binary representation of the encoding parameters in the bit-stream 107 before transmitting
the resulting bit-stream 111 over the communication link 101.
[0022] On the receiver side, an optional error-correcting decoder 109 utilizes the above
mentioned redundant information in the received digital bit-stream 111 to detect and
correct errors that may have occurred during transmission over the communication link
101, producing a bit-stream 112 with received encoding parameters. A stereo sound
decoder 110 converts the received encoding parameters in the bit-stream 112 for creating
synthesized left 113 and right 133 channels of the digital stereo sound signal. The
left 113 and right 133 channels of the digital stereo sound signal reconstructed in
the stereo sound decoder 110 are converted to synthesized left 114 and right 134 channels
of the analog stereo sound signal in a digital-to-analog (D/A) converter 115.
[0023] The synthesized left 114 and right 134 channels of the analog stereo sound signal
are respectively played back in a pair of loudspeaker units 116 and 136 (the pair
of loudspeaker units 116 and 136 can obviously be replaced by a headphone). Alternatively,
the left 113 and right 133 channels of the digital stereo sound signal from the stereo
sound decoder 110 may also be supplied to and recorded in a storage device (not shown).
[0024] As a non-limitative example, the bit-budget allocating method and device according
to the present disclosure can be implemented in the sound encoder 106 and decoder
110 of Figure 1. It should be noted that Figure 1 can be extended to cover the case
of multi-channel and/or scene-based audio and/or independent streams encoding and
decoding (e.g. surround and high order ambisonics).
[0025] Figure 2 is a block diagram illustrating concurrently the bit-budget allocating method
200 and device 250 according to the present disclosure.
[0026] Here, it should be noted that the bit-budget allocating method 200 and device 250
operate on a frame by frame basis and the following description is related to one
of the successive frames of the sound signal being encoded, unless otherwise stated.
[0027] In Figure 2, CELP core module encoding whose bit-budget fluctuates from frame to
frame as a result of a fluctuating number of bits used for encoding the supplementary
codec modules is considered. Also, the distribution of bit-budget among the different
CELP core module parts is symmetrically done at the encoder 106 and the decoder 110
and is based on the bit-budget allocated to encoding of the CELP core module.
[0028] The following description presents a non-restrictive example of implementation in
an EVS-based codec using the Generic Coding mode. The EVS-based codec is a codec based
on the EVS standard as described in Reference [2], with modifications to permit other
CELP-core bit rates or codec improvements. The EVS-based codec in this disclosure
is used within a coding framework using supplementary coding modules such as metadata,
stereo or multi-channel coding (this is referred to hereinafter as Extended EVS codec).
Principles similar to those as described in the present disclosure can be applied
to other coding modes (e.g. Voiced Coding, Transition Coding, Inactive Coding, ...)
within the EVS-based codec. Moreover, similar principles can be implemented in any
other codec different from EVS and using a coding scheme other than CELP.
Operation 201
[0029] Referring to Figure 2, a total bit-budget b_total is allocated to the codec for each
successive frame of the sound signal. In case of CBR, this codec total bit-budget b_total is
constant. It is also possible to use the bit-budget allocating method 200 and
device 250 in variable bit rate codecs wherein the codec total bit-budget b_total could
vary from frame to frame (as is the case with the Extended EVS codec).
Operations 202
[0030] In operations 202, counters 252 determine (count) the number of bits (bit-budget)
b_supplementary used for encoding the supplementary codec modules and the number of bits
(bit-budget) b_codec_signaling (not shown) for transmitting codec signaling to the decoder.
[0031] Supplementary codec modules may comprise a stereo module, a Frame-Erasure concealment
(FEC) module, a BandWidth Extension (BWE) module, metadata coding module, etc. In
the following illustrative embodiment, the supplementary modules comprise a stereo
module and a BWE module. Of course, different or additional supplementary codec modules
could be used.
Stereo Module
[0032] A codec may be designed to support encoding of more than one input audio channel.
In case of two audio channels, a mono (single channel) codec may be extended by a
stereo module to form a stereo codec. The stereo module then forms one of the supplementary
codec modules. A stereo codec can be implemented using several different stereo encoding
techniques. As non-limitative examples, the use of two stereo encoding techniques
that can be efficiently used at low bit rates is discussed hereinafter. Obviously,
other stereo encoding techniques can be implemented.
[0033] A first stereo encoding technique is called parametric stereo. Parametric stereo
encodes two audio channels as a mono signal using a common mono codec plus a certain
amount of stereo side information (corresponding to stereo parameters) which represents
a stereo image. The two input audio channels are down-mixed into a mono signal, and
the stereo parameters are then computed usually in transform domain, for example in
the Discrete Fourier Transform (DFT) domain, and are related to so-called binaural
or interchannel cues. The binaural cues (See Reference [5]) comprise Interaural Level
Difference (ILD), Interaural Time Difference (ITD) and Interaural Correlation (IC).
Depending on the signal characteristics, stereo scene configuration, etc., some or
all binaural cues are encoded and transmitted to the decoder. Information about what
cues are encoded is sent as signaling information, which is usually part of the stereo
side information. A particular binaural cue can be also quantized using different
encoding techniques which results in a variable number of bits being used. Then, in
addition to the quantized binaural cues, the stereo side information may contain,
usually at medium and higher bit rates, a quantized residual signal that results from
the down-mixing. The residual signal can be encoded using an entropy encoding technique,
e.g. an arithmetic encoder. Consequently, the number of bits used for encoding the
residual signal can fluctuate significantly from frame to frame.
[0034] Another stereo encoding technique is a technique operating in time-domain. This stereo
encoding technique mixes the two input audio channels into so-called primary channel
and secondary channel. For example, following the method described in Reference [6],
time-domain mixing can be based on a mixing factor, which determines respective contributions
of the two input audio channels upon production of the primary channel and the secondary
channel. The mixing factor is derived from several metrics, e.g. normalized correlations
of the input channels with respect to a mono signal or a long-term correlation difference
between the two input channels. The primary channel can be encoded by a common mono
codec while the secondary channel can be encoded by a lower bit rate codec. The secondary
channel encoding may exploit coherence between the primary and secondary channels
and might reuse some parameters from the primary channel. Consequently, the number
of bits used for encoding the primary channel and the secondary channel can fluctuate
significantly from frame to frame based on channel similarities and encoding modes
of the respective channels.
[0035] Stereo encoding techniques are otherwise known to those of ordinary skill in the
art and, therefore, will not be further described in the present specification. Although
stereo was described by way of example of supplementary coding modules, the disclosed
method can be used in a 3D audio coding framework including ambisonics (scene-based
audio), multichannel (channel-based audio), or objects plus metadata (object-based
audio). Supplementary modules may also comprise any of these techniques.
BWE Module
[0036] In most of the recent speech codecs, including wideband (WB) or super wideband (SWB)
codecs, the input signal is processed in blocks (frames) while employing frequency
band-split processing. A lower frequency band is usually encoded using the CELP model
and covers frequencies up to a cut-off frequency. Then the higher frequency band is
efficiently encoded or estimated separately by a BWE technique in order to cover the
rest of the encoded spectrum. The cut-off frequency between the two bands is a design
parameter of each codec. For example, in the EVS codec as described in Reference [2],
the cut-off frequency depends upon the operational mode and bit rate of the codec.
In particular, the lower frequency band extends up to 6.4 kHz at bit rates of 7.2
- 13.2 kbps or up to 8 kHz at bit rates of 16.4 - 64 kbps. A BWE then further extends
the audio bandwidth for WB (up to 8 kHz), SWB (up to 14.4 or 16 kHz), or Full Band
(FB, up to 20 kHz) encoding.
[0037] The idea behind BWE is to exploit the intrinsic correlation between the lower and
higher frequency bands and take advantage of the higher perceptual tolerance to encoding
distortions in higher frequencies compared to lower frequencies. Consequently, the
number of bits used for the higher band BWE encoding is usually very low compared
to the lower band CELP encoding, or even zero. For example, in the EVS codec as described
in Reference [2], a BWE where no bit-budget is transmitted (a so-called blind BWE)
is used at bit rates of 7.2 - 8.0 kbps while a BWE with some bit-budget (a so-called
guided BWE) is used at bit rates of 9.6 - 64 kbps. The exact bit-budget of a guided
BWE is dependent on the actual codec bit rate.
[0038] In the following description guided BWE is considered, which forms one of the supplementary
codec modules. The number of bits used for the higher band BWE encoding can fluctuate
from frame to frame and is much lower (typically 1 - 3 kbps) than the number of bits
used for the lower band CELP encoding.
[0039] Again, BWE is otherwise known to those of ordinary skill in the art and, therefore,
will not be further described in the present specification.
Codec signaling
[0040] The bit-stream, usually at its beginning, contains codec signaling bits. These bits
(codec signaling bit-budget) usually represent very high level codec parameters, for
example codec configuration or information about the nature of the supplementary codec
modules that are encoded. In case of a multi-channel codec, these bits can represent
for example a number of encoded (transport) channels and/or codec format (scene based
or object based, etc.). In case of stereo encoding, these bits can represent for example
the stereo encoding technique being used. Another example of codec parameter that
can be sent using codec signaling bits is an audio signal bandwidth.
[0041] Again, codec signaling is otherwise known to those of ordinary skill in the art and,
therefore, will not be further described in the present specification. Also, a counter
(not shown) can be used for counting the number of bits (bit-budget) used for codec
signaling.
Operation 204
[0042] Referring back to Figure 2, in operation 204, a subtractor 254 subtracts the bit-budget
b_supplementary for encoding the supplementary codec modules and the bit-budget
b_codec_signaling for transmitting codec signaling from the codec total bit-budget b_total
to obtain a bit-budget b_core of the CELP core module, using the following relation:
$$b_{core} = b_{total} - b_{supplementary} - b_{codec\_signaling}$$
[0043] As explained above, the number of bits b_supplementary for encoding the supplementary
codec modules and the bit-budget b_codec_signaling for transmitting codec signaling to the
decoder fluctuate from frame to frame and, therefore, the bit-budget b_core of the CELP core
module also fluctuates from frame to frame.
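As a non-limitative sketch, operation 204 could be written as below. The function name, the guard against a negative result, and the numeric values in the usage lines are assumptions made for illustration only; they are not taken from the EVS-based codec.

```python
def core_bit_budget(b_total: int, b_supplementary: int, b_codec_signaling: int) -> int:
    """Bits left for the CELP core module in one frame (operation 204)."""
    b_core = b_total - b_supplementary - b_codec_signaling
    if b_core < 0:
        # Assumed behavior: treat an over-spent frame as a configuration error.
        raise ValueError("supplementary modules and signaling exceed the total bit-budget")
    return b_core


# Hypothetical frames: 328 bits in total (16.4 kbps at 20 ms), with a
# supplementary bit-budget that changes from one frame to the next.
print(core_bit_budget(328, 60, 4))  # 264
print(core_bit_budget(328, 75, 4))  # 249
```

The two calls show how a change in the supplementary bit-budget alone moves the CELP core module bit-budget, even though the codec total bit-budget is unchanged.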
Operation 205
[0044] In operation 205, a counter 255 counts the number of bits (bit-budget) b_signaling for
transmitting CELP core module signaling to the decoder. CELP core module signaling
may comprise, for example, audio bandwidth, CELP encoder type, sharpening flag, etc.
Operation 206
[0045] In operation 206, a subtractor 256 subtracts the bit-budget b_signaling for transmitting
CELP core module signaling from the CELP core module bit-budget b_core to find a bit-budget
b_2 for encoding the CELP core module parts, using the following relation:
$$b_2 = b_{core} - b_{signaling}$$
Operation 207
[0046] In operation 207, an intermediate bit rate selector 257 comprises a calculator which
converts the bit-budget b_2 into a CELP core module bit rate by dividing the number of bits b_2
by the duration of a frame. The selector 257 finds an intermediate bit rate based
on the CELP core module bit rate.
[0047] A small number of candidate intermediate bit rates is used. In an example of implementation
within the EVS-based codec, the following fifteen (15) bit rates may be considered
as candidate intermediate bit rates: 5.00 kbps, 6.15 kbps, 7.20 kbps, 8.00 kbps, 9.60
kbps, 11.60 kbps, 13.20 kbps, 14.80 kbps, 16.40 kbps, 19.40 kbps, 22.60 kbps, 24.40
kbps, 32.00 kbps, 48.00 kbps, and 64.00 kbps. Of course, it is possible to use a number
of candidate intermediate bit rates different from fifteen (15) and also to use candidate
intermediate bit rates of different values.
[0048] In the same example of implementation, within the EVS-based codec, the found intermediate
bit rate is the nearest higher candidate intermediate bit rate to the CELP core module
bit rate. For example, for a 9.00 kbps CELP core module bit rate the found intermediate
bit rate would be 9.60 kbps when using the candidate intermediate bit rates listed
in the previous paragraph.
[0049] In another example of implementation, the found intermediate bit rate is the nearest
lower candidate intermediate bit rate to the CELP core module bit rate. Using the
same example, for a 9.00 kbps CELP core module bit rate the found intermediate bit
rate would be 8.00 kbps when using the candidate intermediate bit rates listed in
the previous paragraph.
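Operations 206 and 207 can be illustrated by the following sketch, which uses the fifteen candidate intermediate bit rates listed above and a 20 ms frame. The function names, the `nearest` parameter, the clamping applied when the bit rate falls outside the candidate range, and the example bit counts are assumptions of this sketch rather than features of the described codec.

```python
from bisect import bisect_left, bisect_right

FRAME_MS = 20  # frame duration assumed throughout this sketch

# Candidate intermediate bit rates of the EVS-based example, in bit/s.
CANDIDATE_RATES_BPS = [5000, 6150, 7200, 8000, 9600, 11600, 13200, 14800,
                       16400, 19400, 22600, 24400, 32000, 48000, 64000]


def core_parts_budget(b_core: int, b_signaling: int) -> int:
    """Operation 206: bits left for the CELP core module parts."""
    return b_core - b_signaling


def bits_to_rate_bps(bits: int, frame_ms: int = FRAME_MS) -> int:
    """Operation 207, first step: number of bits divided by the frame duration."""
    return bits * 1000 // frame_ms


def select_intermediate_rate(rate_bps: int, nearest: str = "higher") -> int:
    """Operation 207, second step: nearest higher or lower candidate bit rate."""
    if nearest == "higher":
        i = bisect_left(CANDIDATE_RATES_BPS, rate_bps)
        i = min(i, len(CANDIDATE_RATES_BPS) - 1)  # assumed clamp above 64 kbps
    elif nearest == "lower":
        i = bisect_right(CANDIDATE_RATES_BPS, rate_bps) - 1
        i = max(i, 0)  # assumed clamp below 5 kbps
    else:
        raise ValueError("nearest must be 'higher' or 'lower'")
    return CANDIDATE_RATES_BPS[i]


b_2 = core_parts_budget(b_core=186, b_signaling=6)  # hypothetical bit counts
rate = bits_to_rate_bps(b_2)                        # 9000 bit/s, i.e. 9.00 kbps
print(select_intermediate_rate(rate, "higher"))     # 9600
print(select_intermediate_rate(rate, "lower"))      # 8000
```

The two printed values correspond to the 9.00 kbps examples given above for the nearest higher and nearest lower candidate intermediate bit rates.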
Operation 208
[0050] In operation 208, ROM tables 258 store, for each candidate intermediate bit rate,
respective, pre-determined bit-budgets for encoding first parts of the CELP core module.
As a non-limitative example, the CELP core module first parts for which bit-budgets
are stored in the ROM tables 258 may comprise the LP filter coefficients, the adaptive
codebook, the adaptive codebook gain, and the innovation codebook gain. In this implementation,
no bit-budget for encoding the innovation codebook is stored in the ROM tables 258.
[0051] In other words, when one of the candidate intermediate bit rates is selected by the
selector 257, the associated bit-budgets stored in the ROM tables 258 are allocated
to encoding of the above identified CELP core module first parts (the LP filter coefficients,
the adaptive codebook, the adaptive codebook gain, and the innovation codebook gain).
However, in the described implementation, no bit-budget for encoding the innovation
codebook is stored in the ROM tables 258.
[0052] The following Table 1 is an example of ROM table 258 storing, for each candidate
intermediate bit rate, a respective bit-budget (number of bits) b_LPC for encoding the LP
filter coefficients. The right column identifies the candidate intermediate bit rates while
the left column indicates the respective bit-budgets (number of bits) b_LPC. For simplicity,
the bit-budget for encoding the LP filter coefficients is a single value per frame, although
it could be a sum of several bit-budget values when more than one LP analysis is performed
in a current frame (for example a mid-frame and an end-frame LP analysis).

[0053] The following Table 2 is an example of ROM table 258 storing, for each candidate
intermediate bit rate, respective bit-budgets (number of bits) b_ACBn for encoding the
adaptive codebook. The right column identifies the candidate intermediate bit rates while
the left column indicates the respective bit-budgets (number of bits) b_ACBn. As the adaptive
codebook is searched in every sub-frame n, N bit-budgets b_ACBn (one per sub-frame) are
obtained for every candidate intermediate bit rate, N representing the number of sub-frames
in a frame. It should be noted that the bit-budgets b_ACBn may be different in different
sub-frames. Specifically, Table 2 is an example of ROM table 258 storing bit-budgets b_ACBn
in the EVS-based codec using the above defined fifteen (15) candidate intermediate
bit rates.

[0054] It should be noted that, in the example using the EVS-based codec, four (4) bit-budgets
b_ACBn per intermediate bit rate are stored at lower bit rates where the frame of 20 ms
is composed of four (4) sub-frames (N=4) and five (5) bit-budgets b_ACBn per intermediate
bit rate are stored at higher bit rates where the frame of 20 ms is composed of five (5)
sub-frames (N=5). Referring to Table 2, for a CELP core module bit rate of 9.00 kbps
corresponding to an intermediate bit rate of 9.60 kbps, the bit-budgets b_ACBn in the
individual sub-frames are 9, 6, 9, and 6 bits, respectively.
[0055] The following Table 3 is an example of ROM table 258 storing, for each candidate
intermediate bit rate, respective bit-budgets (number of bits) b_Gn for encoding the adaptive
codebook gain and the innovation codebook gain. In the example below, the adaptive codebook
gain and the innovation codebook gain are quantized using a vector quantizer and thus
represented as only one quantization index. The right column identifies the candidate
intermediate bit rates while the left column indicates the respective bit-budgets (number
of bits) b_Gn. As can be seen from Table 3, there is one bit-budget b_Gn for every sub-frame
n of a frame. Accordingly, N bit-budgets b_Gn are stored for every candidate intermediate
bit rate, N representing the number of sub-frames in a frame. It should be noted that,
depending on the gain quantizer and size of the quantization table being used, the
bit-budgets b_Gn may be different in different sub-frames.

[0056] In the same manner, a bit-budget for quantizing other CELP core module first parts
(if they are present) can be stored in the ROM tables 258 for each candidate intermediate
bit rate. An example could be a flag of an adaptive codebook low-pass filtering (one
bit per sub-frame). Therefore, a bit-budget associated to all CELP core module parts
(first parts) except the innovation codebook can be stored in the ROM tables 258
for each candidate intermediate bit rate while a certain bit-budget b_4 still remains
available.
Operation 209
[0057] In operation 209, a bit-budget allocator 259 allocates for encoding the above mentioned
CELP core module first parts (the LP filter coefficients, the adaptive codebook, the
adaptive and innovation codebook gains, etc.) the bit-budgets stored in the ROM tables
258 and associated to the intermediate bit rate selected by the selector 257.
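The ROM look-up of operations 208 and 209 might be organized as in the following sketch. The class, dictionary and function names are hypothetical. Only the adaptive codebook bit-budgets (9, 6, 9, 6) of the 9.60 kbps entry come from the present description; the LP filter and gain values are placeholders, since they are not reproduced here.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class FirstPartBudgets:
    """Bit-budgets of the CELP core module first parts for one intermediate bit rate."""
    b_lpc: int                # LP filter coefficients, per frame
    b_acb: Tuple[int, ...]    # adaptive codebook, one value per sub-frame
    b_gain: Tuple[int, ...]   # jointly quantized gains, one value per sub-frame

    def total(self) -> int:
        return self.b_lpc + sum(self.b_acb) + sum(self.b_gain)


# Keyed by intermediate bit rate in bit/s. One entry is shown; a complete table
# would hold one entry per candidate intermediate bit rate.
ROM_TABLE: Dict[int, FirstPartBudgets] = {
    9600: FirstPartBudgets(
        b_lpc=30,               # PLACEHOLDER value, not from the description
        b_acb=(9, 6, 9, 6),     # from the 9.60 kbps example in the description
        b_gain=(5, 5, 5, 5),    # PLACEHOLDER values, not from the description
    ),
}


def allocate_first_parts(intermediate_rate_bps: int) -> FirstPartBudgets:
    """Operation 209: fetch the stored bit-budgets for the selected bit rate."""
    try:
        return ROM_TABLE[intermediate_rate_bps]
    except KeyError:
        raise ValueError(f"no ROM entry for {intermediate_rate_bps} bit/s") from None


budgets = allocate_first_parts(9600)
print(budgets.b_acb)    # (9, 6, 9, 6)
print(budgets.total())  # 80 with the placeholder values above
```

Because the table is indexed by the small set of candidate intermediate bit rates rather than by every possible CELP core module bit rate, its size does not grow with the 1-bit granularity of the core bit-budget.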
Operation 210
[0058] In operation 210, a subtractor 260 subtracts from the bit-budget b_2 (a) the bit-budget
b_LPC for encoding the LP filter coefficients associated to the candidate intermediate
bit rate selected by the selector 257, (b) the sum of the bit-budgets b_ACBn of the
N sub-frames associated to the selected candidate intermediate bit rate, (c) the sum
of the bit-budgets b_Gn for quantizing the adaptive and innovation codebook gains of the
N sub-frames associated to the selected candidate intermediate bit rate, and (d) the
bit-budget, associated to the selected intermediate bit rate, for encoding other CELP
core module first parts (if they are present), to find a remaining bit-budget (number
of bits) b_4 still available for encoding the innovation codebook (second CELP core module
part). For that purpose, the following relation can be used by the subtractor 260:
$$b_4 = b_2 - b_{LPC} - \sum_{n=0}^{N-1} b_{ACBn} - \sum_{n=0}^{N-1} b_{Gn} - \dots$$
where the trailing dots stand for the bit-budgets of any other CELP core module first parts
of item (d).
Operation 211
[0059] In operation 211, a FCB bit allocator 261 distributes the remaining bit-budget b_4
for encoding the innovation codebook (Fixed CodeBook (FCB); second CELP core module
part) between the N sub-frames of the current frame. Specifically, the bit-budget b_4
is divided into bit-budgets b_FCBn allocated to the various sub-frames n. For example,
this can be done by an iterative procedure which divides the bit-budget b_4 between the
N sub-frames as equally as possible.
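The iterative procedure is not spelled out at this point, so the following is only one possible reading of operations 210 and 211, given as a sketch. It assumes that each sub-frame must use one of a discrete set of FCB sizes; the set `FCB_SIZES` below is an illustrative subset, and the function names, the `b_other` parameter and the LP filter and gain values in the usage lines are hypothetical.

```python
from typing import List, Sequence

# ASSUMPTION: illustrative subset of permitted FCB sizes (bits per sub-frame);
# an actual FCB configuration table would contain more entries.
FCB_SIZES = (20, 24, 26, 28)


def remaining_fcb_budget(b_2: int, b_lpc: int, b_acb: Sequence[int],
                         b_gain: Sequence[int], b_other: int = 0) -> int:
    """Operation 210: bits still available for the innovation codebook."""
    return b_2 - b_lpc - sum(b_acb) - sum(b_gain) - b_other


def split_fcb_budget(b_4: int, n_subframes: int,
                     sizes: Sequence[int] = FCB_SIZES) -> List[int]:
    """Operation 211: near-equal split of b_4 over the sub-frames."""
    sizes = sorted(sizes)
    # Largest codebook size that every sub-frame can afford.
    base = -1
    for i, size in enumerate(sizes):
        if size * n_subframes <= b_4:
            base = i
    if base < 0:
        raise ValueError("bit-budget too small for the smallest codebook")
    index = [base] * n_subframes
    left = b_4 - sizes[base] * n_subframes
    # Single pass from the first sub-frame: move up one size where affordable.
    for n in range(n_subframes):
        if index[n] + 1 < len(sizes):
            step = sizes[index[n] + 1] - sizes[index[n]]
            if step <= left:
                index[n] += 1
                left -= step
    return [sizes[i] for i in index]


# Hypothetical frame: b_2 = 180 bits, placeholder LP filter and gain budgets.
b_4 = remaining_fcb_budget(180, b_lpc=30, b_acb=(9, 6, 9, 6), b_gain=(5, 5, 5, 5))
print(b_4)                        # 100
print(split_fcb_budget(b_4, 4))   # [26, 26, 24, 24]
print(split_fcb_budget(106, 4))   # [28, 26, 26, 26]
```

Any bits that cannot be spent on a one-size increase are left unused in this sketch, which is the price of keeping the per-sub-frame codebooks as similar as possible.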
[0060] In other non-limitative implementations, the FCB bit allocator 261 can be designed
by assuming at least one of the following requirements:
- I. According to a first requirement not covered by the claims, in case the bit-budget
b4 cannot be distributed equally between all the sub-frames, a highest possible (i.e.
a larger) bit-budget is allocated to the first sub-frame. As an example, if b4 = 106 bits, the FCB bit-budget per 4 sub-frames is allocated as 28-26-26-26 bits.
- II. According to a second requirement not covered by the claims, if there are more
bits available to potentially increase other sub-frame FCB codebooks, the FCB bit-budget
(number of bits) allocated to at least one next sub-frames after the first sub-frame
(or at least one sub-frame following the first sub-frame) is increased. As an example,
if b4 = 108 bits, the FCB bit-budget per 4 sub-frames is allocated as 28-28-26-26 bits.
In an additional example, if b4 = 110 bits, the FCB bit-budget per 4 sub-frames is allocated as 28-28-28-26 bits.
- III. According to a third requirement not covered by the claims, the bit-budget b4 is
not necessarily distributed as equally as possible between all the sub-frames but is
instead used as fully as possible. As an example, if b4 = 87 bits, the FCB bit-budget
per 4 sub-frames is allocated as 26-20-20-20 bits rather than e.g. 24-20-20-20 bits or
20-20-20-24 bits when requirement III is not considered. In another example, if b4 = 91
bits, the FCB bit-budget per 4 sub-frames is allocated as 26-24-20-20 bits while e.g.
20-24-24-20 bits would be allocated if requirement III is not considered. Consequently,
in both examples, only 1 bit remains unused when requirement III is considered while
3 bits remain unused otherwise.
Requirement III allows the FCB bit allocator 261 to select two non-consecutive lines
from an FCB configuration table, for example Table 4 herein below. As a non-limitative
example, consider b4 = 87 bits. The FCB bit allocator 261 first chooses line 6 from
Table 4 for all sub-frames to configure the FCB search (this results in a 20-20-20-20
bit-budget allocation). Then requirement I changes the allocation such that lines 6 and
7 (24-20-20-20 bits) are employed, and requirement III finally selects lines 6 and 8
(26-20-20-20 bits) from the FCB configuration table (Table 4).
Below is Table 4, an example of the FCB configuration table (taken from EVS, Reference
[2]):

where the first column corresponds to the number of FCB codebook bits and the fourth
column corresponds to the number of FCB pulses per sub-frame. It should be noted that,
in the example above for b4 = 87 bits, there is no 22-bit codebook; the FCB bit allocator
261 thus selects two non-consecutive lines from the FCB configuration table, resulting
in the 26-20-20-20 FCB bit-budget allocation.
- IV. According to a fourth requirement covered by the claims, in case the bit-budget
cannot be equally distributed between all the sub-frames when encoding using a Transition
Coding (TC) mode (See Reference [2]), the largest possible (larger) bit-budget is
allocated to the sub-frame using a glottal-impulse-shape codebook. As an example,
if b4 = 122 bits and the glottal-impulse-shape codebook is used in the third sub-frame,
the FCB bit-budget per 4 sub-frames is allocated as 30-30-32-30 bits.
- V. According to a fifth requirement not covered by the claims, if, after applying
requirement IV, there are more bits available to potentially increase another FCB
codebook in a TC mode frame, the FCB bit-budget (number of bits) allocated to the
last sub-frame is increased. As an example, if b4 = 116 bits and the glottal-impulse-shape codebook is used in the second sub-frame,
the FCB bit-budget per 4 sub-frames is allocated as 28-30-28-30 bits. The idea behind
this requirement is to better build the part of the excitation after the onset/transition
event which is perceptually more important than the part of excitation before it.
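As a non-limitative illustration, requirements I to V can be sketched in Python as shown below. This is not the procedure of the disclosure itself: the function names, the pass-by-pass upgrade strategy and the list `FCB_SIZES` are assumptions made for the example, and `FCB_SIZES` contains only the codebook sizes that appear in the numerical examples above rather than a complete FCB configuration table.

```python
# Hypothetical subset of FCB codebook sizes (bits per sub-frame); only the
# sizes that appear in the examples above are listed, not a full table.
FCB_SIZES = [20, 24, 26, 28, 30, 32]


def allocate_fcb_bits(b4, n_subframes, order=None, sizes=FCB_SIZES):
    """Split b4 bits between sub-frames using only sizes from the table.

    `order` lists sub-frame indices by priority for receiving extra bits
    (default: first sub-frame first, as in requirements I and II).
    Returns (allocation, unused_bits).
    """
    if order is None:
        order = list(range(n_subframes))
    # Highest table size that every sub-frame can receive.
    fitting = [i for i, s in enumerate(sizes) if s * n_subframes <= b4]
    if not fitting:
        raise ValueError("b4 is too small for the smallest codebook")
    line = [fitting[-1]] * n_subframes
    remaining = b4 - sizes[fitting[-1]] * n_subframes
    # Repeated passes: each pass moves a sub-frame up by one table line when
    # the remaining bits allow it. Second and later passes are what let a
    # sub-frame end up on a non-consecutive line (requirement III).
    upgraded = True
    while upgraded:
        upgraded = False
        for n in order:
            if line[n] + 1 < len(sizes):
                cost = sizes[line[n] + 1] - sizes[line[n]]
                if cost <= remaining:
                    line[n] += 1
                    remaining -= cost
                    upgraded = True
    return [sizes[i] for i in line], remaining


def tc_order(n_subframes, glottal_subframe):
    """Priority for TC frames: the glottal sub-frame, then the last sub-frame
    (requirements IV and V); the order of the others is an assumption."""
    order = [glottal_subframe]
    for n in [n_subframes - 1] + list(range(n_subframes)):
        if n not in order:
            order.append(n)
    return order


generic = allocate_fcb_bits(87, 4)                     # requirement III example
transition = allocate_fcb_bits(116, 4, tc_order(4, 1)) # requirement V example
```

Under these assumptions the sketch reproduces the allocations quoted in the examples above; sub-frame indices are zero-based, so the second sub-frame is index 1.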
[0061] A glottal-impulse-shape codebook may consist of quantized normalized shapes of truncated
glottal impulses placed at specific positions as described in Section 5.2.3.2.1 (Glottal
pulse codebook search) of Reference [2]. The codebook search then comprises selection
of the best shape and the best position. For example, glottal impulse shapes can be
represented by codevectors containing only one non-zero element corresponding to candidate
impulse positions. Once selected, the position codevector is convolved with the impulse
response of a shaping filter.
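As a non-limitative illustration of the last step, the Python sketch below builds a position codevector with a single non-zero element and convolves it with a shaping-filter impulse response. The sub-frame length, the selected position and the impulse response values are made up for the example and are not taken from Reference [2].

```python
def convolve_truncated(position_codevector, shape_response):
    """Convolve a position codevector with a shaping-filter impulse response,
    keeping only the samples that fall inside the sub-frame."""
    length = len(position_codevector)
    out = [0.0] * length
    for i, amp in enumerate(position_codevector):
        if amp == 0.0:
            continue
        for j, h in enumerate(shape_response):
            if i + j < length:
                out[i + j] += amp * h
    return out


subframe_length = 8            # illustrative; real sub-frames are longer
position = 3                   # selected impulse position (hypothetical)
codevector = [0.0] * subframe_length
codevector[position] = 1.0     # the single non-zero element
shape = [0.5, 1.0, -0.25]      # made-up shaping-filter impulse response

excitation = convolve_truncated(codevector, shape)
```

Because the codevector has only one non-zero element, the convolution simply places a copy of the impulse response at the selected position.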
[0063] where function SWAP() swaps/interchanges the two input values. The function fcb_table()
then selects the corresponding line of the FCB (fixed or innovation codebook) configuration
table (as defined above) and returns the number of bits needed for encoding the selected
FCB (fixed or innovation codebook).
Operation 212
[0064] A counter 262 determines the sum of the bit-budgets (number of bits)
bFCBn allocated to the
N various sub-frames for encoding the innovation codebook (Fixed CodeBook (FCB); second
CELP core module part).

Operation 213
[0065] In operation 213, a subtractor 263 determines the number of bits
b5 remaining after encoding of the innovation codebook, using the following relation:

$$b_5 = b_4 - \sum_{n} b_{FCBn}$$
[0066] Ideally, after encoding of the innovation codebook, the number of remaining bits
b5 is equal to zero. However, it may not be possible to achieve this result because
the granularity of the innovation codebook index is greater than 1 (usually 2-3 bits).
Consequently, a small number of bits often remain unemployed after encoding of the
innovation codebook.
Operation 214
[0067] In operation 214, a bit allocator 264 assigns the unemployed bit-budget (number of
bits) b5 to increase the bit-budget of one of the CELP core module parts (CELP core module
first parts) other than the innovation codebook. For example, the unemployed bit-budget
b5 can be used to increase the bit-budget
bLPC obtained from the ROM tables 258, using the following relation:

[0068] The unemployed bit-budget
b5 may also be used to increase the bit-budget of other CELP core module first parts,
for example the bit-budgets
bACBn or
bGn. Also, the unemployed bit-budget
b5, when greater than 1 bit, can be redistributed between two or even more CELP core
module first parts. Alternatively, the unemployed bit-budget
b5 can be used to transmit FEC information (if not already counted in the supplementary
codec modules), for example a signal class (See Reference [2]).
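As a non-limitative illustration, operations 212 to 214 can be sketched in Python as follows. The FCB allocation is the one from the b4 = 87 example above; the LPC bit-budget value and all variable and function names are hypothetical.

```python
def leftover_after_fcb(b4, b_fcb):
    """Operations 212 and 213: sum the per-sub-frame FCB budgets and return
    the bits left over from b4."""
    used = sum(b_fcb)
    if used > b4:
        raise ValueError("FCB allocation exceeds b4")
    return b4 - used


b4 = 87
b_fcb = [26, 20, 20, 20]          # allocation from the b4 = 87 example
b5 = leftover_after_fcb(b4, b_fcb)

b_lpc = 36                         # hypothetical LPC budget read from the ROM tables
b_lpc_increased = b_lpc + b5       # operation 214: the spare bit goes to the LPC part
```

In this example one bit is left unemployed after the FCB allocation and is reassigned to the LPC quantizer.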
High bit rate CELP
[0069] Traditional CELP has limitations of scalability and complexity when it is used at
high bit rates. To overcome these limitations, the CELP model can be extended by a
special transform-domain codebook as described in References [3] and [4]. In contrast
to traditional CELP where the excitation is composed from the adaptive and the innovation
excitation contributions only, the extended model introduces a third part of the excitation,
namely a transform-domain excitation contribution. The additional transform-domain
codebook usually comprises a pre-emphasis filter, a time-domain to frequency-domain
transformation, a vector quantizer, and a transform-domain gain. In the extended model,
a substantial number (at least tens) of bits is assigned to the vector quantizer in
every sub-frame.
[0070] In high bit rate CELP, the bit-budget is allocated to the CELP core module parts using
the procedure described above. Following this procedure, the sum of the bit-budgets
bFCBn for encoding the innovation codebook in the
N sub-frames should equal or approach the bit-budget
b4. In high bit rate CELP, the bit-budgets
bFCBn are usually modest, and the number of unemployed bits
b5 is relatively high and is used to encode the transform-domain codebook parameters.
[0071] First, the sum of the bit-budgets
bTDGn for encoding the transform-domain gain in the
N sub-frames and possibly the bit-budget of other transform-domain codebook parameters,
except the bit-budget for the vector quantizer, are subtracted from the unemployed
bit-budget
b5, using the following relation:

$$b_7 = b_5 - \sum_{n} b_{TDGn}$$
[0072] Then, the remaining bit-budget (number of bits)
b7 is allocated to the vector quantizer within the transform-domain codebook and distributed
among all sub-frames. The bit-budget (number of bits) by sub-frame of the vector quantizer
is denoted as
bVQn. Depending on the vector quantizer being used (for example an AVQ quantizer as used
in EVS), the quantizer does not consume all of the allocated bit-budget
bVQn leaving a small variable number of bits available in each sub-frame. These bits are
floating bits employed in the following sub-frame within the same frame. For a better
effectiveness of the transform-domain codebook, a slightly higher (larger) bit-budget
(number of bits) is allocated to the vector quantizer in the first sub-frame. An example
of implementation is given in the following pseudocode:

[0073] where ⌊x⌋ denotes the largest integer less than or equal to x and
N is the number of sub-frames in one frame. The bit-budget (number of bits)
b7 is distributed equally between all the sub-frames, while the bit-budget for the first
sub-frame may be slightly increased by up to
N-1 bits. Consequently, in high bit rate CELP, there are no remaining bits after this
operation.
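As a non-limitative illustration, the distribution described above can be sketched in Python as follows. The numerical values of b5 and of the transform-domain gain bit-budgets are hypothetical, and the sketch does not model the floating bits carried from one sub-frame to the next.

```python
def distribute_vq_bits(b5, b_tdg, n_subframes):
    """Return the per-sub-frame vector-quantizer bit-budgets.

    b5    -- bits left after encoding the innovation codebook
    b_tdg -- per-sub-frame transform-domain gain bit-budgets
    """
    b7 = b5 - sum(b_tdg)
    if b7 < 0:
        raise ValueError("not enough bits for the transform-domain gains")
    base = b7 // n_subframes                  # floor(b7 / N)
    b_vq = [base] * n_subframes
    b_vq[0] += b7 - base * n_subframes        # up to N-1 extra bits
    return b_vq


# Hypothetical values: 250 unemployed bits, 7-bit gain per sub-frame.
b_vq = distribute_vq_bits(250, [7, 7, 7, 7], 4)
```

With these values b7 = 222 bits, each sub-frame receives 55 bits and the first sub-frame receives the 2 bits left by the integer division, so that no bit remains unallocated.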
Other aspects related to the extended EVS codec
[0074] In many instances, there is more than one alternative for encoding a given CELP
core module part. In complex codecs like EVS, several different techniques are available
for encoding a given CELP core module part, and the selection of one technique is usually
made on the basis of the CELP core module bit rate (the core module bit rate corresponds
to the bit-budget bcore of the CELP core module multiplied by the number of frames per
second). An example is gain quantization, where there are three (3) different techniques
available in the EVS codec as described in Reference [2], Generic Coding (GC) mode:
- a vector quantizer based on sub-frame prediction (GQ1; used at core bit rates equal
to or below 8.0 kbps);
- a memory-less vector quantizer of adaptive and innovation gains (GQ2; used at core
bit rates higher than 8 kbps and lower than or equal to 32 kbps); and
- two scalar quantizers (GQ3; used at core bit rates higher than 32 kbps).
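As a non-limitative illustration, the three bit rate ranges listed above can be written as a simple selection function in Python. The function name and the string labels are chosen for the example only.

```python
def select_gain_quantizer(core_bit_rate_kbps):
    """Pick the GC-mode gain quantizer from the CELP core bit rate (kbps)."""
    if core_bit_rate_kbps <= 8.0:
        return "GQ1"   # vector quantizer based on sub-frame prediction
    if core_bit_rate_kbps <= 32.0:
        return "GQ2"   # memory-less vector quantizer
    return "GQ3"       # two scalar quantizers


quantizers = [select_gain_quantizer(r) for r in (7.2, 13.3, 32.0, 32.45)]
```

Since the core bit rate can change from frame to frame, such a selection would be evaluated once per frame.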
[0075] Also, at a constant codec total bit rate
btotal, different techniques for encoding and quantizing a given CELP core module part can
be switched on a frame by frame basis depending on the CELP core module bit rate.
An example is parametric stereo coding mode at 48 kbps, in which different gain quantizers
(See Reference [2]) are used in different frames as shown in Table 5 below:
Table 5: Example usage of different gain quantizers in the extended EVS codec with fluctuating core bit rate
| frame # | k | k+1 | k+2 | k+3 | k+4 | k+5 | k+6 |
| core bit rate | 35.20 kbps | 38.05 kbps | 31.35 kbps | 32.00 kbps | 32.45 kbps | 34.30 kbps | 33.60 kbps |
| gain quantizer | GQ3 | GQ3 | GQ2 | GQ2 | GQ3 | GQ3 | GQ3 |
[0076] It is also interesting to note that there can be different bit-budget allocations
for a given CELP core module bit rate depending on the codec configuration. As an
example, encoding of the primary channel in EVS-based TD stereo coding mode works,
in a first scenario, at a total codec bit rate of 16.4 kbps and, in a second scenario,
at a total codec bit rate of 24.4 kbps. It can happen in both scenarios that the
CELP core module bit rate is the same even though the total codec bit rate is different.
However, a different codec configuration can lead to a different bit-budget distribution.
[0077] In the EVS-based stereo framework, the difference in codec configuration between 16.4
kbps and 24.4 kbps is related to the CELP core internal sampling rate, which
is 12.8 kHz at 16.4 kbps and 16 kHz at 24.4 kbps. Thus CELP core module
coding with four (4) and five (5) sub-frames, respectively, is employed and a corresponding
bit-budget distribution is used. The differences between the two total codec bit rates
are shown below (a single value in a table cell corresponds to one parameter
per frame, while several values correspond to one parameter per sub-frame).
Table 6: Bit-budget comparison for same core bit rate at two different total bit rates.
| total bit rate | 16.4 kbps | 24.4 kbps |
| core bit rate | 13.30 kbps | 13.30 kbps |
| core module part | bit-budget [bits] | bit-budget [bits] |
| Signaling | 7 | 9 |
| LPCQ | 36 | 42 |
| | 5 | 5 |
| ACBQ | 10+6+10+6 | 10+6+10+6+6 |
| FCBQ | 43+36+36+36 | 26+26+26+26+26 |
| GQ | 5 | 5 |
| | 6+6+6+6 | 6+6+6+6+6 |
| ACB low-pass filtering flag | 1+1+1+1 | 1+1+1+1+1 |
| FEC | 2 | 2 |
| Total | 266 | 266 |
[0078] Accordingly, the above table shows that there can be different bit-budget distributions
for the same core bit rate at different codec total bit rates.
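As a non-limitative check, the totals of Table 6 can be verified with the short Python sketch below. The rows are listed in table order without their labels, and the value of 50 frames per second is an assumption corresponding to 20 ms frames.

```python
FRAMES_PER_SECOND = 50          # assumption: 20 ms frames
CORE_BIT_RATE_BPS = 13300       # 13.30 kbps core bit rate from Table 6

# Rows of Table 6 in order; several values are per-sub-frame budgets.
column_16k4 = [[7], [36], [5], [10, 6, 10, 6], [43, 36, 36, 36],
               [5], [6, 6, 6, 6], [1, 1, 1, 1], [2]]
column_24k4 = [[9], [42], [5], [10, 6, 10, 6, 6], [26] * 5,
               [5], [6] * 5, [1] * 5, [2]]


def total_bits(column):
    """Sum every per-frame and per-sub-frame budget of one table column."""
    return sum(sum(row) for row in column)


core_bits_per_frame = CORE_BIT_RATE_BPS // FRAMES_PER_SECOND
totals = (total_bits(column_16k4), total_bits(column_24k4))
```

Both columns add up to the same per-frame core bit-budget even though the individual budgets, and the number of sub-frames, differ.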
Encoder process flow
[0079] When the supplementary codec modules comprise a stereo module and a BWE module,
the flow of the encoder process may be as follows:
- Stereo side (or secondary channel) information is encoded and the bit-budget allocated
thereto is subtracted from the codec total bit-budget. Codec signaling bits are also
subtracted from the total bit-budget.
- The bit-budget for encoding the BWE supplementary module is then set based on the
codec total bit-budget minus the stereo module and codec signaling bit-budgets.
- The BWE bit-budget is subtracted from the codec total bit-budget minus the "stereo
supplementary module" and "codec signaling" bit-budgets.
- The above-described procedure for allocating the core module bit-budget is performed.
- CELP core module is encoded.
- BWE supplementary module is encoded.
Decoder
[0080] The CELP core module bit rate is not directly signaled in the bit-stream but is computed
at the decoder based on the bit-budgets of the supplementary codec modules. In the
example of implementation comprising stereo and BWE supplementary modules, the following
procedure could be followed:
- Codec signaling is written/read to/from the bit-stream.
- Stereo side (or secondary channel) information is written/read to/from the bit-stream.
The bit-budget for coding the stereo side information fluctuates and depends on the
stereo side signaling and on the technique used for coding. Basically, (a) in parametric
stereo, the arithmetic coder and the stereo side signaling determine when to stop
the writing/reading of the stereo side information, while (b) in time-domain stereo
coding, the mixing factor and the coding mode determine the bit-budget of the stereo side
information.
- The bit-budgets for codec signaling and the stereo side information are subtracted
from the codec total bit-budget.
- Then, the bit-budget for the BWE supplementary module is also subtracted from the
codec total bit-budget. The BWE bit-budget granularity is usually small: a) there
is only one bit rate per audio bandwidth (WB/SWB/FB) and the bandwidth information
is transmitted as part of the codec signaling in the bit-stream, or b) the bit-budget
for a particular bandwidth may have a certain granularity and the BWE bit-budget is
determined from the codec total bit-budget minus the stereo module bit-budget. In
an illustrative embodiment, for instance the SWB time-domain BWE may have a bit rate
of 0.95 kbps, 1.6 kbps or 2.8 kbps depending on the codec total bit rate minus the
stereo module bit rate.
[0081] What remains is the CELP core bit-budget bcore, which is an input parameter to the
bit-budget allocation procedure described in the foregoing description. The same
allocation procedure is called at the CELP encoder (just after preprocessing) and at
the CELP decoder (at the beginning of CELP frame decoding).
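As a non-limitative illustration, the derivation of the CELP core bit-budget from the other bit-budgets can be sketched in Python as follows. The function names, the 48 kbps total bit rate, the codec signaling and stereo bit-budgets, and the value of 50 frames per second (20 ms frames) are assumptions made for the example; only the 1.6 kbps BWE bit rate is taken from the text above.

```python
FRAMES_PER_SECOND = 50  # assumption: 20 ms frames


def bits_per_frame(bit_rate_bps):
    """Convert a bit rate in bits per second to a per-frame bit-budget."""
    return bit_rate_bps // FRAMES_PER_SECOND


def celp_core_bit_budget(b_total, b_signaling, b_stereo, b_bwe):
    """Subtract the codec signaling, stereo side information and BWE budgets
    from the codec total bit-budget; what remains is b_core."""
    b_core = b_total - b_signaling - b_stereo - b_bwe
    if b_core <= 0:
        raise ValueError("no bits left for the CELP core module")
    return b_core


b_total = bits_per_frame(48000)   # 960 bits per frame at 48 kbps
b_bwe = bits_per_frame(1600)      # 32 bits per frame for a 1.6 kbps SWB BWE
# Hypothetical signaling and stereo side information budgets for one frame.
b_core = celp_core_bit_budget(b_total, b_signaling=4, b_stereo=120, b_bwe=b_bwe)
```

Because the encoder and the decoder perform the same subtraction on the same values, both obtain the same bcore without it being signaled in the bit-stream.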
[0083] Figure 3 is a simplified block diagram of an example configuration of hardware components
forming the bit-budget allocating device and implementing the bit-budget allocating
method.
[0084] The bit-budget allocating device may be implemented as a part of a mobile terminal,
as a part of a portable media player, or in any similar device. The bit-budget allocating
device (identified as 300 in Figure 3) comprises an input 302, an output 304, a processor
306 and a memory 308.
[0085] The input 302 is configured to receive for example the codec total bit-budget
btotal (Figure 2). The output 304 is configured to supply the various allocated bit-budgets.
The input 302 and the output 304 may be implemented in a common module, for example
a serial input/output device.
[0086] The processor 306 is operatively connected to the input 302, to the output 304, and
to the memory 308. The processor 306 is realized as one or more processors for executing
code instructions in support of the functions of the various modules of the bit-budget
allocating device of Figure 2.
[0087] The memory 308 may comprise a non-transient memory for storing code instructions
executable by the processor 306, specifically a processor-readable memory comprising
non-transitory instructions that, when executed, cause a processor to implement the
operations and modules of the bit-budget allocating method and device of Figure 2.
The memory 308 may also comprise a random access memory or buffer(s) to store intermediate
processing data from the various functions performed by the processor 306.
[0088] Those of ordinary skill in the art will realize that the description of the bit-budget
allocating method and device are illustrative only and are not intended to be in any
way limiting. Other embodiments will readily suggest themselves to such persons with
ordinary skill in the art having the benefit of the present disclosure. Furthermore,
the disclosed bit-budget allocating method and device may be customized to offer valuable
solutions to existing needs and problems related to allocation or distribution of
bit-budget.
[0089] In the interest of clarity, not all of the routine features of the implementations
of the bit-budget allocating method and device are shown and described. It will, of
course, be appreciated that in the development of any such actual implementation of
the bit-budget allocating method and device, numerous implementation-specific decisions
may need to be made in order to achieve the developer's specific goals, such as compliance
with application-, system-, network- and business-related constraints, and that these
specific goals will vary from one implementation to another and from one developer
to another. Moreover, it will be appreciated that a development effort might be complex
and time-consuming, but would nevertheless be a routine undertaking of engineering
for those of ordinary skill in the field of sound processing having the benefit of
the present disclosure.
[0090] In accordance with the present disclosure, the modules, processing operations, and/or
data structures described herein may be implemented using various types of operating
systems, computing platforms, network devices, computer programs, and/or general purpose
machines. In addition, those of ordinary skill in the art will recognize that devices
of a less general purpose nature, such as hardwired devices, field programmable gate
arrays (FPGAs), application specific integrated circuits (ASICs), or the like, may
also be used. Where a method comprising a series of operations and sub-operations
is implemented by a processor, computer or a machine and those operations and sub-operations
may be stored as a series of non-transitory code instructions readable by the processor,
computer or machine, they may be stored on a tangible and/or non-transient medium.
[0091] Modules of the bit-budget allocating method and device as described herein may comprise
software, firmware, hardware, or any combination(s) of software, firmware, or hardware
suitable for the purposes described herein.
REFERENCES
[0092] The following references are referred to in the present specification.
- [1] ITU-T Recommendation G.718: "Frame error robust narrowband and wideband embedded variable
bit-rate coding of speech and audio from 8-32 kbps," 2008.
- [2] 3GPP Spec. TS 26.445: "Codec for Enhanced Voice Services (EVS). Detailed Algorithmic
Description," v.12.0.0, Sep. 2014.
- [3] B. Bessette, "Flexible and scalable combined innovation codebook for use in CELP coder
and decoder," US Patent 9,053,705, June 2015.
- [4] V. Eksler, "Transform-Domain Codebook in a CELP Coder and Decoder," US Patent Publication
2012/0290295, November 2012, and US Patent 8,825,475, September 2014.
- [5] F. Baumgarte, C. Faller, "Binaural cue coding - Part I: Psychoacoustic fundamentals
and design principles," IEEE Trans. Speech Audio Processing, vol. 11, pp. 509-519,
Nov. 2003.
- [6] Tommy Vaillancourt, "Method and system using a long-term correlation difference between
left and right channels for time domain down mixing a stereo sound signal into primary
and secondary channels," PCT Application WO2017/049397A1.
1. A method for encoding a sound signal using a CELP core module and supplementary
codec modules of a sound signal encoder (106), comprising:
allocating (202) to the supplementary codec modules a bit-budget (bsupplementary) fluctuating based on the sound signal;
subtracting (204), from a codec total bit-budget (btotal), the supplementary codec modules bit-budget (bsupplementary) to determine a fluctuating CELP core module bit-budget (bcore);
allocating the fluctuating CELP core module bit-budget (bcore) to a plurality of first parts and a second part of the CELP core module,
comprising, in a frame of the sound signal comprising sub-frames:
allocating (209) to the first CELP core module parts respective bit-budgets from the
fluctuating CELP core module bit-budget;
allocating (211) to the second CELP core module part a bit-budget remaining from the
fluctuating CELP core module bit-budget after allocating (209) the respective bit-budgets
to the first CELP core module parts, wherein the second CELP core module part is a CELP
fixed codebook, wherein the CELP core module uses, in one sub-frame of the frame of the
sound signal, a glottal-impulse-shape codebook, and wherein allocating the second CELP
core module part bit-budget comprises, when the second CELP core module part bit-budget
cannot be equally distributed between the sub-frames of the frame, distributing (211)
the second CELP core module part bit-budget between the sub-frames of the frame by
(a) allocating a highest equal number of bits from the second CELP core module part
bit-budget to the sub-frames of the frame and (b) allocating, to the sub-frame using
the glottal-impulse-shape codebook, bits remaining from the second CELP core module
part bit-budget after allocating the highest equal number of bits to the sub-frames
of the frame;
encoding the supplementary codec modules using the supplementary codec modules bit-budget
(bsupplementary);
encoding the first CELP core module parts using the respective first CELP core module
parts bit-budgets; and
encoding the second CELP core module part using the second CELP core module part
bit-budget distributed between the sub-frames of the frame.
2. The method for encoding a sound signal according to claim 1, wherein:
with the frame of the sound signal comprising four sub-frames, the glottal-impulse-shape
codebook used in the third sub-frame of the frame, and the second CELP core module part
bit-budget equal to 122 bits, the numbers of bits allocated (211) to the four sub-frames
are respectively 30-30-32-30.
3. The method for encoding a sound signal according to claim 1 or 2, wherein allocating
respective bit-budgets to the first CELP core module parts comprises allocating (209)
to the first CELP core module parts respective bit-budgets assigned to the first
CELP core module parts by bit-budget allocation tables (258).
4. The method for encoding a sound signal according to any one of claims 1 to 3, comprising:
allocating a bit-budget to codec signaling; and
subtracting (204), from the codec total bit-budget (btotal), both the codec signaling bit-budget and the supplementary codec modules bit-budget
(bsupplementary) to determine the fluctuating CELP core module bit-budget (bcore).
5. A device for encoding a sound signal using a CELP core module and supplementary
codec modules of a sound signal encoder (106), comprising:
an allocator (252) of bit-budget to the supplementary codec modules, configured to
allocate to the supplementary codec modules a bit-budget (bsupplementary) fluctuating based on the sound signal;
a subtractor (254) configured to subtract, from a codec total bit-budget
(btotal), the supplementary codec modules bit-budget (bsupplementary) to determine a fluctuating CELP core module bit-budget (bcore);
a device for allocating the fluctuating CELP core module bit-budget (bcore) to a plurality of first parts and a second part of the CELP core module,
comprising, for a frame of the sound signal comprising sub-frames:
a first allocator (259) configured to allocate to the first CELP core module parts
respective bit-budgets from the fluctuating CELP core module bit-budget; and
a second allocator (261) configured to allocate to the second CELP core module part
a bit-budget remaining from the fluctuating CELP core module bit-budget after
allocating (259) the respective bit-budgets to the first CELP core module parts,
wherein the second CELP core module part is a CELP fixed codebook, wherein the CELP
core module uses, in one sub-frame of the frame of the sound signal, a
glottal-impulse-shape codebook, and wherein the second allocator (261), when the second
CELP core module part bit-budget cannot be equally distributed between the sub-frames
of the frame, is configured to distribute the second CELP core module part bit-budget
between the sub-frames of the frame by (a) allocating a highest equal number of bits
from the second CELP core module part bit-budget to the sub-frames of the frame and
(b) allocating, to the sub-frame using the glottal-impulse-shape codebook, bits
remaining from the second CELP core module part bit-budget after allocating the
highest equal number of bits to the sub-frames of the frame;
means for encoding the supplementary codec modules using the supplementary
codec modules bit-budget (bsupplementary); and
a CELP core module encoder for encoding the first CELP core module parts using the
respective first CELP core module parts bit-budgets, and the second CELP core module
part using the second CELP core module part bit-budget distributed between the
sub-frames of the frame.
6. The device for encoding a sound signal according to claim 5, wherein:
with the frame of the sound signal comprising four sub-frames, the glottal-impulse-shape
codebook used in the third sub-frame of the frame, and the second CELP core module part
bit-budget equal to 122 bits, the second allocator (261) is configured to allocate to
the four sub-frames numbers of bits respectively equal to 30-30-32-30.
7. The device for encoding a sound signal according to claim 5 or 6, wherein the first
allocator (259) is configured to allocate to the first CELP core module parts respective
bit-budgets assigned to the first CELP core module parts by bit-budget allocation tables
(258).
8. The device for encoding a sound signal according to any one of claims 5 to 7, comprising:
an allocator of codec signaling bit-budget, configured to allocate a bit-budget to
codec signaling;
wherein the subtractor (254) is configured to subtract the supplementary codec modules bit-budget
(bsupplementary) and the codec signaling bit-budget from the codec total bit-budget (btotal) to determine the fluctuating CELP core module bit-budget (bcore).
1. A method for encoding a sound signal using a CELP core module and supplementary
codec modules of a sound signal encoder (106), comprising:
allocating (202) to the supplementary codec modules a bit-budget (bsupplementary) fluctuating based on the sound signal;
subtracting (204), from a codec total bit-budget (btotal), the supplementary codec modules bit-budget (bsupplementary) to determine a fluctuating CELP core module bit-budget (bcore);
allocating the fluctuating CELP core module bit-budget (bcore) to a plurality of first parts and a second part of the CELP core module,
comprising, in a frame of the sound signal comprising sub-frames:
allocating (209) to the first CELP core module parts respective bit-budgets
from the fluctuating CELP core module bit-budget;
allocating (211) to the second CELP core module part a bit-budget
remaining from the fluctuating CELP core module bit-budget after allocating (209)
to the first CELP core module parts said respective bit-budgets, wherein
the second CELP core module part is a CELP fixed codebook,
wherein the CELP core module uses, in one sub-frame of the frame of the sound
signal, a glottal-impulse-shape codebook, and wherein allocating
the second CELP core module part bit-budget comprises, when the second
CELP core module part bit-budget cannot be equally distributed
between the sub-frames of the frame, distributing (211) the second
CELP core module part bit-budget between the sub-frames of the frame by (a) allocating
a highest equal number of bits from the second CELP core module part
bit-budget to the sub-frames of the frame and (b) allocating, to the sub-frame using
the glottal-impulse-shape codebook, bits remaining from the second CELP core
module part bit-budget after allocating the highest equal number
of bits to the sub-frames of the frame;
encoding the supplementary codec modules using the supplementary codec modules
bit-budget (bsupplementary);
encoding the first CELP core module parts using the respective first CELP core
module parts bit-budgets; and
encoding the second CELP core module part using the second CELP core module part
bit-budget distributed between the sub-frames of the frame.
2. The method for encoding a sound signal according to claim 1, wherein:
with the frame of the sound signal comprising four sub-frames, the
glottal-impulse-shape codebook used in the third sub-frame of the frame, and the
second CELP core module part bit-budget equal to 122 bits, the numbers
of bits allocated (211) to the four sub-frames are respectively 30-30-32-30.
3. The method for encoding a sound signal according to claim 1 or 2, wherein allocating
respective bit-budgets to the first CELP core module parts comprises
allocating (209) to the first CELP core module parts respective bit-budgets
assigned to the first CELP core module parts by bit-budget
allocation tables (258).
4. The method for encoding a sound signal according to any one of claims 1 to
3, comprising:
allocating a bit-budget to codec signaling; and
subtracting (204), from the codec total bit-budget (btotal), both the codec signaling bit-budget and the supplementary codec modules
bit-budget (bsupplementary) to determine the fluctuating CELP core module bit-budget (bcore).
5. A device for encoding a sound signal using a CELP core module and supplementary
codec modules of a sound signal encoder (106), comprising:
an allocator (252) of bit-budget to the supplementary codec modules,
configured to allocate to the supplementary codec modules a bit-budget
(bsupplementary) fluctuating based on the sound signal;
a subtractor (254) configured to subtract, from a codec total bit-budget
(btotal), the supplementary codec modules bit-budget (bsupplementary) to determine a fluctuating CELP core module bit-budget (bcore);
a device for allocating the fluctuating CELP core module bit-budget (bcore) to a plurality of first parts and a second part of the CELP core module,
comprising, for a frame of the sound signal comprising sub-frames:
a first allocator (259) configured to allocate to the first
CELP core module parts respective bit-budgets from the fluctuating
CELP core module bit-budget; and
a second allocator (261) configured to allocate to the second
CELP core module part a bit-budget remaining from the fluctuating CELP core
module bit-budget after allocating (259) to the first CELP core module
parts said respective bit-budgets, wherein the second CELP core
module part is a CELP fixed codebook, wherein the CELP core module uses,
in one sub-frame of the frame of the sound signal, a glottal-impulse-shape
codebook, and wherein the second allocator (261), when the second
CELP core module part bit-budget cannot be equally distributed
between the sub-frames of the frame, is configured to distribute the second
CELP core module part bit-budget between the sub-frames of the frame by (a) allocating
a highest equal number of bits from the second CELP core module part
bit-budget to the sub-frames of the frame and (b) allocating, to the sub-frame using
the glottal-impulse-shape codebook, bits remaining from the second CELP core
module part bit-budget after allocating the highest equal number
of bits to the sub-frames of the frame;
means for encoding the supplementary codec modules using the supplementary
codec modules bit-budget (bsupplementary); and
a CELP core module encoder for encoding the first CELP core module parts
using the respective first CELP core module parts bit-budgets,
and the second CELP core module part using the second CELP core module
part bit-budget distributed between the sub-frames of the frame.
6. A device for encoding a sound signal according to claim 5, wherein:
with the frame of the sound signal comprising four sub-frames, the glottal-impulse-shape
codebook used in the third sub-frame of the frame, and the CELP core module second
part bit-budget equal to 122 bits, the second allocator (261) is configured to allocate
to the four sub-frames numbers of bits equal to 30-30-32-30, respectively.
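By way of non-limitative illustration, the distribution recited in claims 5 and 6
may be sketched as follows. This is a minimal sketch only, not the claimed
implementation: the function name distribute_fcb_bits and its parameter names are
hypothetical and do not appear in the disclosure, and any restriction of the
per-sub-frame budgets to sizes supported by an actual fixed codebook is not modelled.

```python
def distribute_fcb_bits(fcb_budget, num_subframes, glottal_subframe):
    """Split a fixed-codebook bit-budget between the sub-frames of a frame.

    Hypothetical helper (names are assumptions, not from the disclosure).
    (a) Every sub-frame receives the highest equal number of bits.
    (b) Any bits left over go to the sub-frame that uses the
        glottal-impulse-shape codebook (0-based index glottal_subframe).
    """
    if num_subframes <= 0:
        raise ValueError("num_subframes must be positive")
    if fcb_budget < 0:
        raise ValueError("fcb_budget must be non-negative")
    if not 0 <= glottal_subframe < num_subframes:
        raise ValueError("glottal_subframe is out of range")

    # Step (a): highest equal share, plus the bits that cannot be shared equally.
    equal_share, leftover = divmod(fcb_budget, num_subframes)
    budgets = [equal_share] * num_subframes

    # Step (b): the leftover bits go to the glottal-shape sub-frame.
    budgets[glottal_subframe] += leftover
    return budgets


# Example of claim 6: 122 bits, four sub-frames, glottal-shape codebook
# used in the third sub-frame (index 2).
print(distribute_fcb_bits(122, 4, glottal_subframe=2))  # [30, 30, 32, 30]
```

When the budget divides evenly between the sub-frames, the leftover is zero and step
(b) changes nothing, which matches the condition in claim 5 that the special
distribution applies only when an equal distribution is not possible.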
7. A device for encoding a sound signal according to claim 5 or 6, wherein the first
allocator (259) is configured to allocate to the CELP core module first parts the respective
bit-budgets assigned to the CELP core module first parts by bit-budget allocation tables
(258).
8. A device for encoding a sound signal according to any one of claims 5 to 7, comprising:
an allocator of a codec signaling bit-budget, configured to allocate a bit-budget to
codec signaling;
wherein the subtractor (254) is configured to subtract the supplementary codec modules
bit-budget (bsupplementary) and the codec signaling bit-budget from the codec total
bit-budget (btotal) to determine the fluctuating CELP core module bit-budget (bcore).
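By way of non-limitative illustration, the subtraction recited in claims 4 and 8,
followed by the allocation of the remaining bit-budget to the second part, may be
sketched as follows. This is a sketch under stated assumptions: the function names,
the dictionary keys and all numeric values are hypothetical and chosen only to show
the arithmetic; in the claims the first-part budgets come from bit-budget allocation
tables (258), which are not reproduced here.

```python
def celp_core_budget(b_total, b_signaling, b_supplementary):
    """Return b_core = b_total - b_signaling - b_supplementary.

    Hypothetical helper; the names mirror the symbols used in the claims.
    """
    b_core = b_total - b_signaling - b_supplementary
    if b_core < 0:
        raise ValueError("signaling and supplementary budgets exceed b_total")
    return b_core


def second_part_budget(b_core, first_part_budgets):
    """Return the bits left for the second part (the CELP fixed codebook)
    once every first part has received its respective bit-budget."""
    remaining = b_core - sum(first_part_budgets.values())
    if remaining < 0:
        raise ValueError("first-part budgets exceed b_core")
    return remaining


# Illustrative values only (assumptions, not taken from the disclosure).
b_core = celp_core_budget(b_total=328, b_signaling=5, b_supplementary=40)
first_parts = {"lp_filter": 40, "adaptive_codebook": 30, "gains": 24}

print(b_core)                                   # 283
print(second_part_budget(b_core, first_parts))  # 189
```

Because the supplementary codec modules bit-budget fluctuates from frame to frame,
b_core and the remaining fixed-codebook budget fluctuate with it, so in this sketch
both functions would be evaluated once per frame.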