FIELD OF THE INVENTION
[0001] The present invention relates to transcoding.
BACKGROUND OF THE INVENTION
[0002] In recent decades, networks of different types have been developed, such as mobile
GSM, UMTS, CDMA and IP, providing alternatives to the 'classical' circuit switched
network. The interconnection of all these networks leads to an interoperability problem
regarding transmission of speech. Indeed, non-compatible speech standards have been
adopted in the different networks, although most of the codecs at medium rate (5-16.5
kbit/s for narrowband codecs, 5-25 kbit/s for wideband codecs) are based on the same
model, Code Excited Linear Prediction (CELP). The simplest method to provide inter-connectivity
consists of decoding one codec standard compressed bitstream A and re-encoding it
into the other codec standard bitstream B. This conventional method is called tandem
transcoding. It suffers from several problems such as complexity, delay and degradation
of speech.
[0003] Recently, so-called 'smart transcoding' solutions have been proposed, which are based
on the fact that the different standards are based on the CELP principle. They aim
at reducing the complexity of the transcoding as many functions at encoder B can be
skipped, decreasing the delay and enhancing the quality or at least getting the same
quality as with the normal transcoding. The basic idea is to use redundancy on the
standard to avoid computing parameters that have already been computed. Reference
is made to Figure 1, which shows the principle of smart transcoding. When transcoding from
a bitstream format of codec A into a bitstream format of codec B, bitstream A is first
decoded in decoder A. The obtained decoded signal is then encoded into target format
B (bitstream B) by encoder B. In case both codecs are CELP codecs, bitstreams A and
B transmit a similar set of parameters, such as Linear Prediction Coding (LPC) coefficients,
pitch delays, fixed codebook indexes and fixed and adaptive gains. The key idea of
smart transcoding consists of avoiding the computation of parameters already available.
An intelligent mapping and quantization of the parameters available in bitstream A
into bitstream B parameters allow the skipping of many functions and hence reduce
the computation load of the transcoding. As depicted in Figure 1, only a partial decoding
is necessary to extract the parameters from bitstream A. Their mapping as well as
a partial encoding then builds the accurate bitstream B.
[0005] One of the possible parameters mapped between speech codecs in transcoding is the
Linear Prediction Coefficients vector (LPC). The mapping of the LPC coefficients is
relatively straightforward when the speech codecs are applied to the signal at the
same sampling frequency. A transposition of the LPCs from decoder A to encoder B leads
to good quality and reduction of complexity as shown in the above-mentioned article.
However, such a solution cannot be applied when codecs A and B employ different sampling
frequencies. In that case if the LPC filters of codec A and B model signals of different
sampling frequencies, it leads to a different number of coefficients and different
meanings of the LPC coefficients. Existing solutions that provide mapping of LPC parameters
for smart transcoding purposes are only based on the mapping of the LPC filter at
the same sampling frequency (e.g. narrowband signal at 8 kHz).
BRIEF DESCRIPTION OF THE INVENTION
[0006] An object of the present invention is thus to provide a method and an apparatus for
implementing the method so as to solve the above problem or at least to alleviate
it. The objects of the invention are achieved by a method, a computer program product,
an apparatus and a module which are characterized by what is stated in the independent
claims. The preferred embodiments of the invention are disclosed in the dependent
claims.
[0007] The invention is based on recognizing the problem and on the realization that in
transcoding between two codec formats employing different sampling frequencies, the
LPC coefficients of the LPC filter of the target codec format can be estimated by
applying a modification on the sampling frequency of the extracted LPC coefficients.
[0008] An advantage of the method and apparatus of the invention is that it enables smart
transcoding between two codec formats employing different sampling frequencies.
BRIEF DESCRIPTION OF THE DRAWINGS
[0009] In the following the invention will be described in greater detail by means of preferred
embodiments with reference to the accompanying drawings, in which
Figure 1 is a block diagram showing the principle of smart transcoding; and
Figure 2 is a block diagram of an embodiment.
DETAILED DESCRIPTION OF THE INVENTION
[0010] The following embodiments are exemplary. Although the specification may refer to
"an", "one", or "some" embodiment(s) in several locations, this does not necessarily
mean that each such reference is to the same embodiment(s), or that the feature
only applies to a single embodiment. Single features of different embodiments may
also be combined to provide other embodiments. The present invention is applicable
to any communication system or any combination of different communication systems
such as GSM (Global System for Mobile Communications), WCDMA (Wideband Code Division
Multiple Access), WLAN (Wireless Local Area Network), UMTS (Universal Mobile Telecommunications
System), CDMA and/or IP (Internet Protocol) standard, or any other suitable standard/non-standard
communication means. The communication system may be a fixed communication system
or a wireless communication system or a communication system utilizing both fixed
networks and wireless networks. The protocols used and the specifications of communication
systems, especially in wireless communication, develop rapidly. Such a development
may require extra changes to an embodiment. Therefore, all terms and expressions should
be interpreted broadly and they are intended to illustrate, not to restrict, the embodiment.
In the following, different embodiments will be described using, as an example, a system
architecture to which the embodiments may be applied, without restricting the embodiment
to such an architecture, however.
[0011] Notation and environment: in the following, two codecs A and B based on an LPC analysis
at sampling frequencies Fs(A) and Fs(B), respectively, are considered. The CELP codec
family is a subset of such codecs. A transcoding scheme where a signal sA(t) at sampling
frequency Fs(A) is encoded by encoder A is considered. In a 'classical' transcoding scheme
(i.e. without smart transcoding), the signal is decoded into a PCM signal, which is resampled
to the sampling frequency Fs(B) to give a signal sB(t), and signal sB(t) is then encoded
by encoder B.
[0012] The LPC analysis within encoder A provides an autoregressive (AR) model of signal
sA(t), so that an approximation of the signal sA(t) is given by:

$$\hat{s}_A(t) = -\sum_{i=1}^{M} a_i \, s_A(t-i)$$

[0013] In this example the LPC filter

$$A(z) = \sum_{i=0}^{M} a_i z^{-i}$$

with a0 = 1 is considered.
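As a hedged illustration of the AR model, the following Python sketch applies an LPC filter with a0 = 1 to a signal and derives the corresponding prediction. The function names and the sign convention (the prediction is minus the weighted sum of past samples) are assumptions made for illustration only; a particular codec may define its coefficients with the opposite sign.

```python
def lpc_residual(signal, a):
    """Filter `signal` with A(z) = sum_{i=0..M} a[i] * z^-i, where a[0] == 1.

    Returns e(t) = sum_i a[i] * s(t - i); samples before t = 0 are taken as zero.
    """
    if a[0] != 1:
        raise ValueError("a[0] must be 1")
    residual = []
    for t in range(len(signal)):
        acc = 0.0
        for i, coeff in enumerate(a):
            if t - i >= 0:
                acc += coeff * signal[t - i]
        residual.append(acc)
    return residual


def lpc_predict(signal, a):
    """AR prediction s_hat(t) = -sum_{i=1..M} a[i] * s(t - i) = s(t) - e(t)."""
    return [s - e for s, e in zip(signal, lpc_residual(signal, a))]
```

For a signal that the model fits exactly, the residual vanishes after the first sample and the prediction reproduces the signal.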
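[0014] Similarly, encoder B provides an AR estimate of sB(t) through its LPC analysis:

$$\hat{s}_B(t) = -\sum_{i=1}^{N} b_i \, s_B(t-i)$$

[0015] In that case the LPC filter is:

$$B(z) = \sum_{i=0}^{N} b_i z^{-i}, \quad b_0 = 1$$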
[0016] Taking into account these notations, the invention generally deals with the process
of finding an estimate of filter B(z) when LPC filter A(z) is known. Let B̂(z) denote the
filter obtained from A(z). The coefficients of the constructed filter B̂(z) can be mapped
into encoder B in a similar way as in smart transcoding based on LPC mapping, thus avoiding
the computation of the LPC coefficients within encoder B and accordingly saving computation
load. Filters A(z) and B(z) are AR models of two signals with different sampling frequencies.
To obtain the coefficients of filter B̂(z), coefficients ai need to be extrapolated if
N > M or interpolated if N < M (M and N being the orders of A(z) and B(z), respectively).
The interpolation/extrapolation can be seen as a modification of the sampling frequency
of signal [ai], i=0...M (or alternatively [ai], i=1...M) from sampling frequency Fs(A)
to sampling frequency Fs(B). Thus, according to an embodiment, finding the coefficients
of B̂(z) that approximate filter B(z) can be done through the following steps:
- 1. Extracting LPC coefficients [ai], i=1...M from bitstream A in decoder A
- 2. Applying a modification of the sampling frequency on LPC coefficients [ai], i=1...M, thus obtaining coefficients b̂i
- 3. Mapping coefficients b̂i in encoder B for quantization and for computation of the rest of the coefficients
of encoder B (in CELP codecs e.g. pitch, gains, fixed codebook).
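The middle step of the list above can be sketched in Python as follows. This is only a minimal illustration: linear interpolation is used as a stand-in for the modification of the sampling frequency, the last coefficient is held when the output grid runs past the input, and the function name and the use of integer frequencies in Hz are assumptions. Steps 1 and 3 are codec-specific and are therefore left outside the sketch.

```python
def modify_sampling_frequency(coeffs, fs_source_hz, fs_target_hz):
    """Resample an LPC coefficient vector from fs_source_hz to fs_target_hz.

    Illustrative only: uses linear interpolation and yields
    len(coeffs) * fs_target_hz // fs_source_hz output coefficients.
    """
    n_out = (len(coeffs) * fs_target_hz) // fs_source_hz
    last = len(coeffs) - 1
    out = []
    for k in range(n_out):
        pos = k * fs_source_hz / fs_target_hz  # position on the source grid
        lo = min(int(pos), last)
        hi = min(lo + 1, last)                 # hold the last value at the edge
        frac = pos - lo
        out.append((1.0 - frac) * coeffs[lo] + frac * coeffs[hi])
    return out


# Step 1 (assumed done elsewhere): coefficients a_1..a_M extracted from bitstream A.
a = [-1.2, 0.8, -0.5, 0.3, -0.2, 0.1, -0.05, 0.02, -0.01, 0.005]  # made-up values
# Step 2: modification of the sampling frequency, here 8 kHz to 12.8 kHz.
b_hat = modify_sampling_frequency(a, 8000, 12800)
# Step 3 (not shown): b_hat is handed to encoder B for quantization.
```

With M = 10 coefficients and the frequencies above, the sketch yields M*Fs(B)/Fs(A) = 16 coefficients.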
[0017] In step 2 above it is alternatively possible to apply the modification of the sampling
frequency to LPC coefficients [ai], i=0...M. In that case the target LPC filter is preferably
forced to have b0 = 1. According to an embodiment, modification of the sampling frequency
comprises up-sampling the extracted linear predictive coding coefficients when the sampling
frequency of the target codec (B) format is higher than the sampling frequency of the
source codec (A) format. According to an embodiment, the up-sampling factor is equal to
the ratio of the sampling frequency of the target codec format to the sampling frequency
of the source codec format. According to an embodiment, modification of the sampling
frequency comprises down-sampling the extracted linear predictive coding coefficients
when the sampling frequency of the target codec format is lower than the sampling
frequency of the source codec format. According to an embodiment, the down-sampling
factor is equal to the ratio of the sampling frequency of the target codec format
to the sampling frequency of the source codec format. Accordingly, when applying a
modification of the sampling frequency to LPC coefficients [ai], i=1...M from sampling
frequency Fs(A) to sampling frequency Fs(B), M*Fs(B)/Fs(A) coefficients b̂i are obtained.
According to an embodiment, the number of coefficients b̂i can be further adjusted to
the number of coefficients of target LPC filter B(z) if necessary. For instance, if
M*Fs(B)/Fs(A) > N, the number of coefficients b̂i can be restricted to N, and if
M*Fs(B)/Fs(A) < N, N - M*Fs(B)/Fs(A) zeros can be added to the vector [b̂i].
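The adjustment of the number of coefficients described above could be sketched in Python as follows. The function name is hypothetical, and the sketch assumes that the zeros are appended at the end of the vector, which the text does not specify.

```python
def adjust_coefficient_count(b_hat, n):
    """Fit the resampled coefficient vector to the N coefficients of filter B.

    If there are more than n coefficients, only the first n are kept;
    if there are fewer, zeros are appended (assumed here: at the end).
    """
    if len(b_hat) >= n:
        return list(b_hat[:n])
    return list(b_hat) + [0.0] * (n - len(b_hat))
```

For example, a vector of 15 coefficients adjusted to N = 16 gains one trailing zero, whereas a vector of 20 coefficients is cut to its first 16 entries.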
[0018] Figure 2 is a block diagram of an apparatus according to an embodiment. Different
modules or units 10, 20 and 30 of the apparatus may be implemented in one or more
physical or logical entities. Figure 2 is a simplified diagram that only shows some
elements and functional entities relevant to understanding the various embodiments
described here and whose implementation may differ from what is shown. The connections
shown in Figure 2 are logical connections; the actual physical connections may be
different. In the example shown bitstream A of codec format A enters decoder A 10.
Decoder A 10 may be a plain decoder or a codec unit, for example. In Decoder A 10
bitstream A is partially decoded by extracting at least LPC coefficients from bitstream
A. Other parameters, such as pitch delays, fixed codebook indexes, and fixed and adaptive
gains, may also be extracted. The LPC coefficients and possible other extracted parameters
as well as the partially decoded bitstream (signal) are further transmitted to a frequency
modification unit 30. The frequency modification unit 30 applies a modification of
the sampling frequency to the LPC coefficients according to the embodiments described
above. According to an embodiment, the partially decoded bitstream (signal) is up-sampled
or down-sampled from the sampling frequency employed by source codec format A to the
sampling frequency employed by target codec format B. This is also preferably done
in the frequency modification unit 30. The modified LPC coefficients and possible
other parameters as well as the modified signal are then transmitted to encoder B
20. Encoder B 20 may be a plain encoder or a codec unit, for example. In encoder B
20 the modified LPC coefficients are mapped into LPC coefficients of codec format
B and the partially decoded bitstream is encoded into a bitstream of codec format
B using the mapped LPC coefficients. It should be noted that the partial decoding and
encoding, i.e. the extraction of LPC coefficients in decoder A and the mapping of parameters
and encoding in encoder B, can be performed in a similar manner as in existing transcoding
solutions. Therefore, they need not be discussed in more detail here. It should
be further noted that not only existing mapping schemes but also any future mapping
schemes may be utilized.
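The data flow between units 10, 30 and 20 may be pictured with the following Python sketch. The class name, field names and call signatures are hypothetical and chosen only for illustration; the three units are modelled as plain callables so that any decoder, frequency modification unit or encoder could be plugged in.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Lpc = List[float]
Signal = List[float]


@dataclass
class Transcoder:
    """Logical wiring of Figure 2; all three callables are placeholders."""
    decoder_a: Callable[[bytes], Tuple[Lpc, Signal]]                  # unit 10
    frequency_modifier: Callable[[Lpc, Signal], Tuple[Lpc, Signal]]   # unit 30
    encoder_b: Callable[[Lpc, Signal], bytes]                         # unit 20

    def transcode(self, bitstream_a: bytes) -> bytes:
        # Unit 10: partial decoding, extracting at least the LPC coefficients.
        lpc, signal = self.decoder_a(bitstream_a)
        # Unit 30: modify the sampling frequency of coefficients and signal.
        lpc_mod, signal_mod = self.frequency_modifier(lpc, signal)
        # Unit 20: map the modified coefficients and encode into format B.
        return self.encoder_b(lpc_mod, signal_mod)
```

Because the units are only logically separate, the same wiring applies whether they run in one device or in several.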
[0019] The modification of the sampling frequency (step 2 above) can be implemented in many
different ways. The actual performance of smart transcoding depends on the way the modification
of the sampling frequency is done. One possible problem related to up-sampling and
down-sampling is smoothing that may appear in either the low or the high frequencies
of the vector [b̂i]. Therefore, it is preferable to enhance the obtained [b̂i] by properly
resynthesizing the lower or higher frequencies. In order to achieve this, a separate
step may be used before the mapping step 3 above, in which an appropriate property of
filter B̂(z), such as (but not restricted to) the frequency response in the low and high
frequencies, is assured.
[0020] The following will now describe in more detail an implementation example according
to an embodiment. The example presents transcoding between AMR 12.2 kbit/s and AMR-WB
23.05 kbit/s codecs. It should be noted, however, that the use of the invention is
not restricted to any particular codec format or standard or a particular mode of
a given codec format. For example, the following codec formats could be used in connection
with the invention: Full Rate (FR), Half Rate (HR), Enhanced Full Rate (EFR), Adaptive
Multi-Rate (AMR), Adaptive Multi Rate WideBand (AMR-WB), Adaptive Multi Rate WideBand
plus, G.723.1, G.729, G.729.1, Enhanced Variable Rate Codec (EVRC), Variable-Rate
Multi-Mode Wideband (VMR-WB) and Speex.
[0021] In the example the source codec format is AMR and the target codec format is AMR-WB.
The AMR codec processes signals at a sampling frequency of Fs(A) = 8 kHz and provides
an LPC analysis with 10 coefficients. The AMR-WB codec operates on a 16 kHz signal and
its LPC analysis is done on a signal of Fs(B) = 12.8 kHz (a down-sampling is applied
within the encoding). The LPC filter of the AMR-WB has 16 coefficients. In this case
M*Fs(B)/Fs(A) = N. Thus, the correct number of LPC coefficients may be obtained directly
by the modification of the sampling frequency. The modification of the sampling frequency
may be done in two phases such that first an up-sampling of vector [ai], i=0...M by
a factor of 3 is applied. A low-pass filter is preferably applied to the up-sampled
signal to avoid aliasing. A down-sampling by a factor of 2 is then performed. Thus the
total up-sampling factor is 3/2. It has to be noted that, considering Fs(B) and Fs(A),
the factors of the up-sampling and down-sampling could have been 8 and 5, respectively.
However, the factors 3 and 2 can lead to better performance, as the smoothing applied
to the low or high frequencies is less significant. Considering that i=0...M, b0 is set
to 1 and thus the resulting number of coefficients is 1 + 10*3/2 = 16.
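The two-phase modification can be illustrated with the following Python sketch: zero-stuffing by the up-sampling factor, low-pass filtering, and keeping every second sample. Several points are assumptions rather than part of the described embodiment: the Hamming-windowed sinc filter with 25 taps, the function names, the made-up coefficient values, and the reading in which the ten coefficients a1...a10 are resampled and b0 = 1 is prepended so that the count matches 1 + 10*3/2 = 16.

```python
import math


def lowpass_fir(cutoff, num_taps):
    """Hamming-windowed sinc low-pass; cutoff is a fraction of Nyquist (0..1]."""
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        sinc = cutoff if x == 0 else math.sin(math.pi * cutoff * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(sinc * window)
    return taps


def resample_rational(values, up, down, num_taps=25):
    """Up-sample by `up`, low-pass filter, then down-sample by `down`."""
    # Phase 1: insert up - 1 zeros after every input value.
    stuffed = []
    for v in values:
        stuffed.append(v)
        stuffed.extend([0.0] * (up - 1))
    # Low-pass with gain `up` to restore the amplitude lost by zero-stuffing.
    taps = [up * h for h in lowpass_fir(min(1.0 / up, 1.0 / down), num_taps)]
    half = (num_taps - 1) // 2
    filtered = []
    for n in range(len(stuffed)):
        acc = 0.0
        for k, h in enumerate(taps):
            j = n + half - k  # centred convolution, no net delay
            if 0 <= j < len(stuffed):
                acc += h * stuffed[j]
        filtered.append(acc)
    # Phase 2: keep every `down`-th sample.
    return filtered[::down]


# Made-up AMR coefficients a_0..a_10 (a_0 = 1), for illustration only.
a = [1.0, -1.2, 0.8, -0.5, 0.3, -0.2, 0.1, -0.05, 0.02, -0.01, 0.005]
b_hat = [1.0] + resample_rational(a[1:], up=3, down=2)  # b_0 forced to 1
```

The result has 16 entries, matching the number of LPC coefficients stated for AMR-WB. A production implementation would more likely use a polyphase resampler with a filter tuned to the codec pair.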
[0022] Additionally, in the exemplary embodiment, the obtained [b̂i] can be enhanced through
the following exemplary processing: the zeros of filter B̂(z) are modified by taking
into account the zeros of filter A(z) and of an additional estimated filter B̂1(z). Such
an operation makes it possible to avoid the smoothing in the down-sampling phase of
the above example, which tends to reduce the number of zeros of the LPC analysis. B̂(z)
is preferably designed so that smoothing is applied to the high frequencies and no
smoothing to the low frequencies. B̂1(z) is designed conversely, with high smoothing
in the low frequencies and low smoothing in the high frequencies. A(z) presents information
and zeros only in the low frequencies (since LPC filter A(z) models an 8 kHz signal and
has no zeros above 4 kHz). Accordingly, with A(z) and B̂1(z), we consider two additional
filters which apply a correction to the zeros of B̂(z) in the low and high frequencies,
respectively. This permits an accurate estimation of B(z), providing good performance
of the smart transcoding based on mapping of the LPC.
[0023] In the above-described detailed example, an up-sampling was applied to the LPC coefficients
because of transcoding from 8 kHz AMR codec format to 12.8 kHz AMR-WB codec format.
Transcoding from e.g. AMR-WB codec format to AMR codec format can be arranged in a
similar manner but by applying down-sampling to the LPC coefficients instead of up-sampling.
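The factors for either direction follow from the ratio of the two sampling frequencies, as the following Python sketch shows. The function names are hypothetical, and the frequencies are expressed as integers in Hz so that the ratio stays exact.

```python
from fractions import Fraction


def resampling_factors(fs_source_hz, fs_target_hz):
    """Return (up, down) such that up / down == fs_target_hz / fs_source_hz."""
    ratio = Fraction(fs_target_hz, fs_source_hz)
    return ratio.numerator, ratio.denominator


def expected_coefficient_count(m, fs_source_hz, fs_target_hz):
    """Number of coefficients M * Fs(B) / Fs(A) after the modification."""
    return Fraction(m * fs_target_hz, fs_source_hz)


amr_to_amrwb = resampling_factors(8000, 12800)   # up-sampling direction
amrwb_to_amr = resampling_factors(12800, 8000)   # down-sampling direction
```

For AMR to AMR-WB the exact factors are 8 and 5, and for the reverse direction 5 and 8, so that 16 AMR-WB coefficients correspond to 16*5/8 = 10 AMR coefficients.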
[0024] An apparatus according to an embodiment, such as the one shown in Figure 2, may be
implemented as one unit (e.g. a transcoding unit) or as two or more separate units
that are configured to implement the functionality of the various embodiments described.
Here the term 'unit' refers generally to a physical or logical entity, such as a physical
device or a part thereof or a software routine. For example, units 10, 20 and 30 may
be physically separate units or implemented as one entity.
[0025] An apparatus according to any one of the embodiments can be implemented by means
of a computer or corresponding digital signal processing equipment with suitable software
therein, for example. Such a computer or digital signal processing equipment preferably
comprises at least a working memory (RAM) providing storage area used for arithmetical
operations and a central processing unit (CPU), such as a general-purpose digital
signal processor (DSP). The CPU may comprise a set of registers, an arithmetic logic
unit, and a control unit. The control unit is controlled by a sequence of program
instructions transferred to the CPU from the RAM. The control unit may contain a number
of microinstructions for basic operations. The implementation of microinstructions
may vary depending on the CPU design. The program instructions may be coded by a programming
language, which may be a high-level programming language, such as C, Java, etc., or
a low-level programming language, such as a machine language, or an assembler. The
computer may also have an operating system which may provide system services to a
computer program written with the program instructions. It is also possible to use
a specific integrated circuit or circuits, or corresponding components and devices
for implementing the functionality according to any one of the embodiments.
[0026] The invention can be implemented in existing system elements, such as various communication
system elements, or by using separate dedicated elements or devices in a centralized
or distributed manner. An example of such a system element is a media gateway or an
internet protocol telephony gateway. Present elements for communication systems typically
comprise processors and memory that can be utilized in the functions according to
the embodiments. Thus, all modifications and configurations required for implementing
an embodiment in existing devices may be performed as software routines, which may
be implemented as added or updated software routines. If the functionality of the
embodiments is implemented by software, such software can be provided as a computer
program product comprising computer program code which, when run on a computer, causes
the computer or corresponding arrangement to perform the functionality according to
the invention as described above. Such a computer program code can be stored on a
computer readable medium, such as suitable memory means, e.g. a flash memory or a
disc memory, from which it is loadable to the unit or units executing the program
code. In addition, such a computer program code implementing the invention can be
loaded to the unit or units executing the computer program code via a suitable data
network, for example, and it can replace or update a possibly existing program code.
[0027] A frequency modification unit 30 may be implemented as a module for interfacing between
two codec formats. Such a module may be a physical device, a part of a physical device
or a software module, for example. According to an embodiment, such a module is configured
to modify the sampling frequency of extracted linear predictive coding coefficients
according to the various embodiments described. For this purpose the module may comprise
an up/down-sampling unit. Further, such a module is configured to receive the linear
predictive coding coefficients extracted from a bitstream from a decoder and to send
the linear predictive coding coefficients obtained from the modification of the sampling
frequency to an encoder. For this purpose the module may comprise e.g. suitable input
and output terminals and receiving and sending units in connection thereto.
[0028] It will be obvious to a person skilled in the art that, as the technology advances,
the inventive concept can be implemented in various ways. The invention and its embodiments
are not limited to the examples described above but may vary within the scope of the
claims.
1. A method for transcoding, the method comprising:
partially decoding a first bitstream of a first codec format by extracting at least
linear predictive coding coefficients from the first bitstream;
mapping the extracted linear predictive coding coefficients into linear predictive
coding coefficients of a second codec format; and
encoding the partially decoded first bitstream into a second bitstream of a second
codec format using the mapped linear predictive coding coefficients, characterized in that the first and second codec formats employ different sampling frequencies and in that the method comprises:
modifying the sampling frequency of the extracted linear predictive coding coefficients
before the mapping of the extracted linear predictive coding coefficients.
2. A method according to claim 1,
characterized in that the modifying of the sampling frequency of the extracted linear predictive coding
coefficients comprises:
up-sampling the extracted linear predictive coding coefficients when the sampling
frequency of the second codec format is higher than the sampling frequency of the
first codec format.
3. A method according to claim 2, characterized in that the up-sampling factor is equal to the ratio of the sampling frequency of the second
codec format to the sampling frequency of the first codec format.
4. A method according to claim 1, 2 or 3,
characterized in that the modifying of the sampling frequency of the extracted linear predictive coding
coefficients comprises:
down-sampling the extracted linear predictive coding coefficients when the sampling
frequency of the second codec format is lower than the sampling frequency of the first
codec format.
5. A method according to claim 4, characterized in that the down-sampling factor is equal to the ratio of the sampling frequency of the second
codec format to the sampling frequency of the first codec format.
6. A method according to any one of claims 1 to 5, characterized in that the method comprises up-sampling or down-sampling the partially decoded first bitstream
from the sampling frequency employed by the first codec format to the sampling frequency
employed by the second codec format before encoding.
7. A method according to any one of claims 1 to 6, characterized in that the method comprises adjusting the number of linear predictive coding coefficients
after modifying the sampling frequency of the extracted linear predictive coding coefficients
to the number of coefficients required for encoding the partially decoded first bitstream
into a second bitstream of a second codec format.
8. A method according to any one of claims 1 to 7, characterized in that the first and/or the second codec format is selected from the following: Full Rate,
Half Rate, Enhanced Full Rate, Adaptive Multi-Rate, Adaptive Multi Rate WideBand,
Adaptive Multi Rate WideBand plus, G.723.1, G.729, G.729.1, Enhanced Variable Rate
Codec, Variable-Rate Multi-Mode Wideband and Speex.
9. A method according to any one of claims 1 to 8, characterized in that the first and second codec formats are Adaptive Multi-Rate employing a sampling frequency
of 8 kHz and Adaptive Multi Rate WideBand employing a sampling frequency of 12.8 kHz.
10. A computer program product comprising computer program code, wherein the execution
of the program code in a computer causes the computer to carry out the steps of the
method according to any one of claims 1 to 9.
11. An apparatus for transcoding comprising:
means for partially decoding a first bitstream of a first codec format by extracting
at least linear predictive coding coefficients from the first bitstream;
means for mapping the extracted linear predictive coding coefficients into linear
predictive coding coefficients of a second codec format; and
means for encoding the partially decoded first bitstream into a second bitstream of
a second codec format using the mapped linear predictive coding coefficients, characterized in that the apparatus comprises means for modifying the sampling frequency of the extracted
linear predictive coding coefficients before the mapping of the extracted linear predictive
coding coefficients.
12. An apparatus according to claim 11, characterized in that the means for modifying the sampling frequency of the extracted linear predictive
coding coefficients is arranged to up-sample the extracted linear predictive coding
coefficients when the sampling frequency of the second codec format is higher than
the sampling frequency of the first codec format.
13. An apparatus according to claim 12, characterized in that the up-sampling factor is equal to the ratio of the sampling frequency of the second
codec format to the sampling frequency of the first codec format.
14. An apparatus according to claim 11, 12 or 13, characterized in that the means for modifying the sampling frequency of the extracted linear predictive
coding coefficients is arranged to down-sample the extracted linear predictive coding
coefficients when the sampling frequency of the second codec format is lower than
the sampling frequency of the first codec format.
15. An apparatus according to claim 14, characterized in that the down-sampling factor is equal to the ratio of the sampling frequency of the second
codec format to the sampling frequency of the first codec format.
16. An apparatus according to any one of claims 11 to 15, characterized in that the apparatus comprises means for up-sampling or down-sampling the partially decoded
first bitstream from the sampling frequency employed by the first codec format to
the sampling frequency employed by the second codec format before encoding.
17. An apparatus according to any one of claims 11 to 16, characterized in that the apparatus comprises means for adjusting the number of linear predictive coding
coefficients after the modification of the sampling frequency of the extracted linear
predictive coding coefficients to the number of coefficients required for encoding
the partially decoded first bitstream into a second bitstream of a second codec format.
18. An apparatus according to any one of claims 11 to 17, characterized in that the first and/or the second codec format is selected from the following: Full Rate,
Half Rate, Enhanced Full Rate, Adaptive Multi-Rate, Adaptive Multi Rate WideBand,
Adaptive Multi Rate WideBand plus, G.723.1, G.729, G.729.1, Enhanced Variable Rate
Codec, Variable-Rate Multi-Mode Wideband and Speex.
19. An apparatus according to any one of claims 11 to 18, characterized in that the first and second codec formats are Adaptive Multi-Rate employing a sampling frequency
of 8 kHz and Adaptive Multi Rate WideBand employing a sampling frequency of 12.8 kHz.
20. A module for interfacing between codec formats,
characterized in that the module comprises:
means for receiving from a decoder linear predictive coding coefficients extracted
from a bitstream;
means for modifying the sampling frequency of the extracted linear predictive coding
coefficients; and
means for sending the linear predictive coding coefficients obtained from the modification
of the sampling frequency to an encoder.
21. A module according to claim 20,
characterized in that the module comprises:
means for receiving from the decoder a partially decoded bitstream from which at least
the linear predictive coding coefficients have been extracted;
means for up-sampling or down-sampling the partially decoded bitstream from the sampling
frequency employed by the decoder to the sampling frequency employed by the encoder;
and
means for sending the up- or down-sampled partially decoded bitstream to the encoder.
22. A module according to claim 20 or 21, characterized in that the module is a module for a gateway.
23. A module according to claim 22, characterized in that the gateway is a media gateway or an internet protocol telephony gateway.