[Technical Field]
[0001] The present invention relates to an audio reproducing device which decodes an encoded
audio signal and reproduces the decoded audio signal.
[Background Art]
[0002] There is a conventional audio reproducing device which receives a low-band audio
signal and bandwidth extension information, and generates an extended high-band audio
signal by using a spectral band replication (hereinafter, referred to as SBR) technique.
SBR reconstructs the high-band of the received signal by predicting the high-band
with reference to side information included in the bandwidth extension information.
Here, only a small amount of side information is necessary; thus, SBR enhances
the sound quality of the encoded audio signal at low bit rates.
[0003] Two types of SBR are defined, which are high-quality SBR (hereinafter, referred to
as HQ-SBR) and low-power SBR (hereinafter, referred to as LP-SBR).
[0004] HQ-SBR performs complex arithmetic for overall processing of sub-band analysis, high-band
generation, and sub-band synthesis. Thus, HQ-SBR is suitable for enhancing sound quality,
but requires a large amount of computation.
[0005] LP-SBR performs real number operations instead of the complex arithmetic of HQ-SBR.
LP-SBR is designed to reduce the aliasing distortion generated by the real number operations.
Thus, LP-SBR is capable of significantly reducing the amount of computation, and achieving,
at low bit rates, the sound quality equivalent to that of HQ-SBR. It is known that
LP-SBR requires only approximately half the amount of processing that is required
in HQ-SBR (See Non-Patent Literature (NPL) 1).
[0006] SBR is used in combination with Advanced Audio Coding (AAC), and the combined configuration
is referred to as the High-Efficiency AAC (HE-AAC) profile. In combination with AAC, it
is known that AAC+LP-SBR requires only approximately 70% of the processing amount
that is required in AAC+HQ-SBR (see NPL 1).
[0007] There is also a conventional reproducing device which receives a monaural audio signal
and stereo information, and performs stereo processing on the monaural audio signal
based on the stereo information to generate a stereo audio signal. The stereo processing
is known as Parametric Stereo (hereinafter, referred to as PS), and used in combination
with SBR. PS commonly uses a complex Quadrature Mirror Filter (QMF) with SBR for stereo
processing (see NPL 2).
[0008] It is known that PS is used in combination with AAC and SBR, and the combined configuration
is referred to as HE-AACv2 profile. PS needs to be used in combination with HQ-SBR
which uses the complex QMF (see Non-Patent Literatures 2 and 3). When there is no
PS data, AAC may be used in combination with either HQ-SBR or LP-SBR.
[0009] The HE-AAC profile and HE-AACv2 profile have a concept of levels. The higher
the level, the greater the variety of signal types that can be decoded. Examples of the types
here include maximum sampling frequency or maximum number of channels of an encoded
input audio signal, and maximum sampling frequency of a decoded output audio signal
(see NPL 3).
[Citation List]
[Non Patent Literature]
[0010]
[NPL 1] Mitsutoshi HATORI, "One segment broadcasting textbook", Impress Japan, June 15, 2005
[NPL 2] Toshiyuki NOMURA, "Latest trends and applications of MPEG audio", [online], University
of the Ryukyus Computing and Networking center publication No. 5, April 2008, [searched
on September 17, 2008], Internet <URL: http://www.cc.u-ryukyu-ac.jp/news/kouhoujNo5/2-5.pdf>
[NPL 3] ISO/IEC 14496-3:2005/FDAM2, "Information technology-Coding of audio-visual objects-Part
3: Audio, AMENDMENT 2: Audio Lossless Coding (ALS), new audio profiles and BSAC extensions",
August 2005
[Summary of Invention]
[Technical Problem]
[0011] However, in order for the conventional techniques of decoding encoded audio signals
to comply with HE-AACv2 profile and higher levels, HQ-SBR needs to be used which requires
a large amount of computation. As a result, for example, in the case where an encoded
input audio signal is a multi-channel signal, computation amount (processing amount)
significantly increases. Furthermore, attempts to solve the problem by using the conventional
techniques result in generating abnormal sounds in the decoded audio signal. Details
are described below.
[0012] According to NPL 3, as described above, in the case where the technique of decoding
encoded audio signals complies with HE-AACv2 profile and there is PS data, processing
needs to be performed in combination with HQ-SBR. However, in the case where there
is no PS data, processing may be performed in combination with either HQ-SBR or LP-SBR.
[0013] For example, in view of NPL 3, for preventing the increase in the computation amount,
such a method can be considered which switches SBR depending on the state of the stream
to be decoded. More specifically, when HQ-SBR needs to be used, that is, when there
is PS data, HQ-SBR is used. In other cases, that is, when there is no PS data, LP-SBR
is used for reducing the increase in the computation amount.
[0014] Here, in the case where a stream includes a plurality of pieces of normal PS data but
a piece of PS data is missing at some point in the stream, HQ-SBR is switched to LP-SBR.
Alternatively, LP-SBR is switched to HQ-SBR in the following case: a stream includes
PS data without any missing pieces, but SBR and stereo processing cannot be executed
because the SBR header has not yet been obtained; then the state changes to one in which
the SBR header is obtained.
[0015] As described earlier, HQ-SBR performs complex arithmetic for QMF filtering, and LP-SBR
performs real number operations for QMF filtering. Thus, HQ-SBR and LP-SBR have different
formats of delay information, which does not allow HQ-SBR and LP-SBR to share the
delay information of the QMF filtering. As a result, delay information of the QMF
filtering becomes discontinuous at the time of switching of SBR, thereby generating
abnormal sounds.
[0016] In FIG. 7, (a) shows an output audio signal for a single channel in the case where
SBR is switched at times t0 and t2. It is shown that abnormal sounds are generated
during the periods between t0 and t1 and between t2 and t3 because delay information
cannot be used due to the switching of SBR (in FIG. 7, (b) shows a normal audio signal).
In such a manner, attempts to prevent the increase in computation amount by switching
SBR result in generating abnormal sounds at the time of switching of SBR.
[0017] The present invention has been conceived in order to solve the problem, and has an
object to provide an audio reproducing device and an audio reproducing method which
prevent occurrence of abnormal sounds without significantly increasing the computation
amount even when an encoded input audio signal is a multi-channel signal.
[Solution to Problem]
[0018] In order to solve the problem, an audio reproducing device according to an aspect
of the present invention is an audio reproducing device which reproduces a stream
including a basic codec that is an encoded audio signal. The audio reproducing device
includes: a stream separating unit which separates, on a frame basis, the stream into
the basic codec and bandwidth extension information that is used for extending a band
of the basic codec; a basic codec information analyzing unit which analyzes the basic
codec separated by the stream separating unit, to generate analysis information indicating
a type of the basic codec; a basic codec decoding unit which decodes the basic codec
in accordance with the analysis information generated by the basic codec information
analyzing unit, to generate a decoded basic codec signal; a first bandwidth extension
processing unit which executes first processing which extends, by using the bandwidth
extension information, a frequency band of the decoded basic codec signal generated
by the basic codec decoding unit; a second bandwidth extension processing unit which
executes second processing which extends, by using the bandwidth extension information,
the frequency band of the decoded basic codec signal generated by the basic codec
decoding unit, the second processing being executed with an accuracy higher than an
accuracy of the first processing; and a switching unit which switches between the
first bandwidth extension processing unit and the second bandwidth extension processing
unit based on the analysis information.
[0019] According to the structure, two separate processing operations having different
processing amounts are switched based on the analysis information indicating the type of the basic codec.
As a result, more appropriate processing can be selected. Thus, for example, even
when an input encoded audio signal is a multi-channel signal, the computation amount
(processing amount) does not significantly increase. In addition, processing is switched
based on the analysis information; and thus, processing is not switched while the
type of the basic codec is the same. As a result, it is possible to prevent abnormal
sounds which may occur at the time of switching of processing.
[0020] It may also be that the stream separating unit separates, on the frame basis, the
stream into the basic codec, the bandwidth extension information, and stereo extension
information that is used for performing stereo processing on the basic codec, and
that the audio reproducing device further includes: a stereo extension processing
unit which performs, by using the stereo extension information, stereo processing
on the decoded basic codec signal having the frequency band extended by the second
bandwidth extension processing unit.
[0021] Accordingly, when the basic codec is a monaural audio signal, proper stereo processing
can be performed.
[0022] It may also be that the basic codec information analyzing unit analyzes the basic
codec separated by the stream separating unit, to generate analysis information including
at least one of channel information and sampling frequency information, the channel
information indicating the number of channels of the basic codec, the sampling frequency
information indicating a sampling frequency of the basic codec, and that the switching
unit determines at least one of (i) whether the number of channels indicated by the
channel information is greater than a predetermined first threshold and (ii) whether
the sampling frequency indicated by the sampling frequency information is greater
than a predetermined second threshold, and selects the first bandwidth extension processing
unit when at least one of the following is determined: (i) the number of channels
is greater than the predetermined first threshold and (ii) the sampling frequency
is greater than the predetermined second threshold.
[0023] According to the structure, in the case where the basic codec has a large number
of channels, that is, where the basic codec is multi-channel, a first processing is
selected which requires less processing amount but produces lower accuracy. As a result,
it is possible to prevent processing amount from significantly increasing compared
to a single channel signal. Alternatively, when the sampling frequency of the basic
codec is high, the first processing is also selected which requires less processing
amount but produces lower accuracy. Thus, similarly, it is possible to prevent the
processing amount from significantly increasing compared to the case where the basic
codec with lower sampling frequency is processed.
[0024] It may also be that the audio reproducing device further includes a buffer which
stores stereo extension information of a first frame, wherein the stereo extension
processing unit performs stereo processing on a decoded basic codec signal of a second
frame by using the stereo extension information stored in the buffer, the second frame
being a frame after the first frame and being a frame in which the stereo extension
information is missing.
[0025] Accordingly, stereo extension information used for stereo processing is stored in
a buffer, and the stereo extension information stored in the buffer is used when stereo
extension information cannot be obtained. Thus, even when a stream includes a frame
in which stereo extension data is missing, stereo processing can be properly performed
on the frame.
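The buffering described above can be sketched as follows. This is a minimal illustration, not part of the specification; the class and method names are hypothetical, and the stereo extension information is modeled simply as any non-None value.

```python
class StereoExtensionBuffer:
    """Buffer holding the most recently obtained stereo extension information.

    A minimal sketch of the buffer described above; names are illustrative
    and not taken from the specification.
    """

    def __init__(self):
        # Stereo extension info of the latest frame that carried it (first frame).
        self._last_info = None

    def info_for_frame(self, stereo_info):
        """Return the current frame's stereo extension information; when it is
        missing (None), fall back to the copy buffered from an earlier frame."""
        if stereo_info is not None:
            self._last_info = stereo_info  # frame has PS data: store it
            return stereo_info
        return self._last_info             # second frame, data missing: reuse it
```

With this fallback, a frame whose stereo extension data is missing is still processed with the parameters of the most recent valid frame rather than dropping stereo processing entirely.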
[0026] It also may be that the second bandwidth extension processing unit generates a high-frequency
component signal from the decoded basic codec signal by using the bandwidth extension
information, the stereo extension processing unit performs, by using the stereo extension
information, stereo processing on the decoded basic codec signal and the high-frequency
component signal generated by the second bandwidth extension processing unit, to generate
a decoded basic codec signal and a high-frequency component signal for a first channel
and a decoded basic codec signal and a high-frequency component signal for a second
channel, and the second bandwidth extension processing unit further includes a band
synthesis filter for synthesizing the high-frequency component signal and the decoded
basic codec signal that have been generated, and synthesizes bands of the second channel
by using delay information that is stored in the band synthesis filter of the first
channel, as delay information stored in the band synthesis filter of the second channel,
when the stereo extension information is missing.
[0027] According to the structure, even when delay information only for a single channel
is obtained, the obtained delay information is used as delay information for the other
channel. As a result, bands of the respective signals of two channels can be properly
synthesized.
[0028] It may also be that the basic codec is an audio signal encoded according to Advanced
Audio Coding (AAC) scheme, the bandwidth extension information is Spectral Band Replication
(SBR) information generated according to SBR scheme, the stereo extension information
is Parametric Stereo (PS) information generated according to PS scheme, the first
bandwidth extension processing unit extends a frequency band of the decoded basic
codec signal according to Low Power-SBR (LP-SBR) scheme, and the second bandwidth
extension processing unit extends a frequency band of the decoded basic codec signal
according to High Quality-SBR (HQ-SBR) scheme.
[0029] The present invention may be implemented not only as an audio reproducing device,
but also as an audio reproducing method which includes processing units of the audio
reproducing device as steps. The present invention may be also implemented as a program
causing a computer to execute these steps. Furthermore, the present invention may
be implemented as a computer-readable recording medium, such as a Compact Disc-Read
Only Memory (CD-ROM), which records the program therein, and as information, data,
or signals indicating the program. Such program, information, data, and signals may
be distributed over a communication network such as the Internet.
[0030] In addition, part or all of the elements included in each audio reproducing device
above may be in the form of a single system large scale integration (LSI). The system
LSI is an ultra-multifunctional LSI which is produced by integrating a plurality of
constitutional units on a single chip. More specifically, the system LSI is a computer
system including, for example, a microprocessor, a ROM, and a Random Access Memory
(RAM).
[Advantageous Effects of Invention]
[0031] According to the present invention, it is possible to prevent occurrence of abnormal
sounds without significantly increasing the computation amount even when an encoded
input audio signal is a multi-channel signal.
[Brief Description of Drawings]
[0032]
[FIG. 1]
FIG. 1 is a block diagram illustrating an example of a structure of an audio reproducing
device according to Embodiment 1.
[FIG. 2]
FIG. 2 is a flowchart of an example of operations of the audio reproducing device
according to Embodiment 1.
[FIG. 3]
FIG. 3 is a flowchart of a specific example of operations of a switching unit according
to Embodiment 1.
[FIG. 4]
FIG. 4 is a diagram illustrating an example of an input stream which includes stereo
extension data.
[FIG. 5]
FIG. 5 is a diagram illustrating an example of an input stream which does not include
stereo extension data.
[FIG. 6]
FIG. 6 is a diagram illustrating an example of an input stream including a frame in
which stereo extension data is missing.
[FIG. 7]
FIG. 7 is a diagram illustrating an example of waveforms of output audio signals.
[FIG. 8]
FIG. 8 is a block diagram illustrating an example of a structure of an audio reproducing
device according to Embodiment 2.
[FIG. 9]
FIG. 9 is a flowchart of an example of operations of a stereo extension processing
unit according to Embodiment 2.
[FIG. 10]
FIG. 10 is a diagram illustrating an example of waveforms of stereo audio signals
to be output.
[FIG. 11]
FIG. 11 is an external view of an example of an audio reproducing apparatus incorporating
an audio reproducing device according to the present invention.
[Description of Embodiments]
[0033] Hereinafter, embodiments of an audio reproducing device according to the present
invention will be described with reference to the drawings.
(Embodiment 1)
[0034] An audio reproducing device according to Embodiment 1 is characterized by switching
between two types of bandwidth extension processing having different characteristics,
based on an analysis result of the basic codec, regardless of the validity of stereo
extension information used for performing stereo processing on a monaural audio signal.
The two types of bandwidth extension processing are: processing which requires a larger
processing amount but produces higher accuracy, that is, processing for outputting an
audio signal with excellent sound quality; and processing which requires a smaller
processing amount but produces lower accuracy.
[0035] FIG. 1 is a block diagram illustrating an example of a structure of an audio reproducing
device 100 according to Embodiment 1. The audio reproducing device 100 in FIG. 1 includes:
a stream separating unit 101; a basic codec analyzing unit 102; a basic codec decoding
unit 103; a bandwidth extension data analyzing unit 104; a stereo extension data analyzing
unit 105; a first bandwidth extension processing unit 106; a second bandwidth extension
processing unit 107; a stereo extension processing unit 108; and a switching unit
109.
[0036] The stream separating unit 101 separates an input stream into basic codec, bandwidth
extension data, and stereo extension data. When an input stream includes no stereo
extension data, the stream separating unit 101 separates the stream into basic codec
and bandwidth extension data. The stream separating unit 101 then transmits the separated
basic codec to the basic codec analyzing unit 102, transmits the bandwidth extension
data to the bandwidth extension data analyzing unit 104, and transmits the stereo
extension data to the stereo extension data analyzing unit 105.
[0037] Here, the stream input to the audio reproducing device 100 is, for example, a stream
having HE-AACv2 profile. The basic codec is an encoded audio signal, and is, for example,
an audio signal encoded in accordance with AAC scheme. The bandwidth extension data
is data used for extending bandwidth of the basic codec, and is, for example, SBR
data. The stereo extension data is data used for performing stereo processing on a
monaural audio signal, and is, for example, PS data.
[0038] The basic codec analyzing unit 102 generates basic codec analysis information by
analyzing the basic codec transmitted from the stream separating unit 101. The basic
codec analysis information includes, for example, channel information representing
the number of channels (CH) of the basic codec, and sampling frequency information
representing the sampling frequency (FS) of the basic codec. The basic codec analyzing
unit 102 transmits the generated basic codec analysis information to the basic codec
decoding unit 103. Of the basic codec analysis information, the basic codec analyzing
unit 102 also transmits the channel information and the sampling frequency information
to the switching unit 109.
[0039] The basic codec decoding unit 103 decodes the basic codec by using the basic codec
analysis information transmitted from the basic codec analyzing unit 102, and generates
a decoded basic codec signal. The basic codec decoding unit 103 then transmits the
decoded basic codec signal to the switching unit 109.
[0040] The bandwidth extension data analyzing unit 104 analyzes the bandwidth extension
data transmitted from the stream separating unit 101 to generate bandwidth extension
information, and transmits the generated bandwidth extension information to the switching
unit 109. The bandwidth extension information includes, for example, side information
used for prediction for reconstruction of high band of the decoded basic codec signal
using the SBR technique.
[0041] The stereo extension data analyzing unit 105 analyzes the stereo extension data transmitted
from the stream separating unit 101 to generate stereo extension information, and
transmits the generated stereo extension information to the stereo extension processing
unit 108. The stereo extension information is, for example, information used for performing
stereo extension processing (also referred to as stereo processing) on a monaural
audio signal using the PS technique.
[0042] The first bandwidth extension processing unit 106 extends the frequency band of the
decoded basic codec signal by using the bandwidth extension information transmitted
from the switching unit 109 to output an audio signal. More specifically, the first
bandwidth extension processing unit 106 predicts and generates high frequency components
by using the bandwidth extension information, and synthesizes the bands of the generated
high frequency component signal and the decoded basic codec signal to output an audio
signal.
[0043] Here, the first bandwidth extension processing unit 106 has an advantage over the
second bandwidth extension processing unit 107 in that the first bandwidth extension
processing unit 106 requires a smaller processing amount for processing the same signal.
However, the sound quality of the audio signal output by the first bandwidth extension
processing unit 106 is lower than that of the audio signal output by the second bandwidth
extension processing unit 107. The first bandwidth extension processing unit 106 performs,
for example, bandwidth extension based on the LP-SBR scheme.
[0044] The second bandwidth extension processing unit 107 extends the frequency band of
the decoded basic codec signal by using the bandwidth extension information transmitted
from the switching unit 109 to output an audio signal. More specifically, the second
bandwidth extension processing unit 107 predicts and generates high frequency components
by using the bandwidth extension information, and synthesizes the bands of the generated
high frequency component signal and the decoded basic codec signal to output an audio
signal.
[0045] Here, the sound quality of the audio signal output by the second bandwidth extension
processing unit 107 is higher than that of the audio signal output by the first bandwidth
extension processing unit 106. However, the second bandwidth extension processing
unit 107 requires a processing amount larger than that of the first bandwidth extension
processing unit 106. The second bandwidth extension processing unit 107 performs,
for example, bandwidth extension based on the HQ-SBR scheme.
[0046] Generally, when encoding an audio signal (that is, when generating basic codec),
high frequency components are removed to reduce encoding amount. Thus, the decoded
basic codec signal is an audio signal mainly including low frequency components. The
bandwidth extension performed by the first bandwidth extension processing unit 106
and the second bandwidth extension processing unit 107 is processing in which the
removed high frequency components are predicted and generated by using bandwidth extension
information.
[0047] More specifically, the first bandwidth extension processing unit 106 and the second
bandwidth extension processing unit 107 each includes a band synthesis filter. The
first and second bandwidth extension processing units 106 and 107 reconstruct an output
audio signal that is close to an original sound by synthesizing the bands of the decoded
basic codec signal generated by the basic codec decoding unit 103 and the high frequency
component signal reconstructed based on the decoded basic codec signal by using the
bandwidth extension information.
[0048] The stereo extension processing unit 108 uses stereo extension information transmitted
from the stereo extension data analyzing unit 105 to perform stereo processing on
the monaural audio signal having a frequency band extended by the second bandwidth
extension processing unit 107. More specifically, the stereo extension processing
unit 108 performs, by using the stereo extension information, stereo processing on
the decoded basic codec signal that is a monaural audio signal and the high frequency
component signal generated by the second bandwidth extension processing unit 107,
to generate a decoded basic codec signal and a high frequency component signal for
left (L) channel, and a decoded basic codec signal and a high frequency component
signal for right (R) channel. The stereo extension processing unit 108 performs, for
example, stereo processing based on the PS scheme. Here, the stereo extension processing
unit 108 has to be used in combination with the second bandwidth extension processing
unit 107. In other words, the stereo extension processing unit 108 shares the complex
QMF with the second bandwidth extension processing unit 107.
[0049] The second bandwidth extension processing unit 107 synthesizes the bands of the generated
L-channel signals and the bands of the generated R-channel signals. In the band synthesis
processing of the second bandwidth extension processing unit 107, when an input stream
includes a frame in which stereo extension data is missing, the delay information
of the L channel is copied to the delay information of the R channel. When stereo
extension data is obtained, band synthesis of R channel is performed using the delay
information of the L channel copied for a previous frame, as the delay information
of R channel. The delay information of the L channel is information held over frames
in the band synthesis filter of the band synthesis processing.
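The copy of delay information from the L channel to the R channel described above can be sketched as follows. This is a hedged illustration only: the delay information is modeled here as a plain list of filter-state samples, whereas the real band synthesis filter state is more elaborate, and the function name is hypothetical.

```python
import copy

def delay_for_r_channel(l_delay, r_delay, stereo_data_missing):
    """Choose the delay information to use for R-channel band synthesis.

    A sketch of the copy described above; `l_delay` and `r_delay` stand in
    for the states held over frames in the band synthesis filters.
    """
    if stereo_data_missing:
        # Stereo extension data is missing in this frame: copy the
        # L-channel delay information so that, when stereo extension data
        # is obtained again, R-channel synthesis resumes from a state
        # consistent with the L channel.
        return copy.deepcopy(l_delay)
    # Normal case: the R channel keeps its own delay information.
    return r_delay
```

The deep copy matters: the R channel must not alias the L-channel filter state, since both filters continue to update independently once stereo extension data is available again.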
[0050] The switching unit 109 determines whether the outputs of the basic codec decoding
unit 103 and the bandwidth extension data analyzing unit 104 are connected to terminal
A or terminal B, based on the number of channels CH and the sampling frequency FS
transmitted from the basic codec analyzing unit 102. The determination procedure will
be specifically described later with reference to FIG. 3. The switching unit 109 transmits
the decoded basic codec signal transmitted from the basic codec decoding unit 103
and the bandwidth extension information transmitted from the bandwidth extension data
analyzing unit 104, to the first bandwidth extension processing unit 106 or the second
bandwidth extension processing unit 107 depending on the determination result.
[0051] As described above, the audio reproducing device 100 according to Embodiment 1 includes
the switching unit 109 which selects one of two types of bandwidth extension processing
having different characteristics, based on the analysis result of the basic codec.
The two types are: a first processing which requires a smaller processing amount but
produces lower sound quality; and a second processing which requires a larger processing
amount but produces higher sound quality.
[0052] Next, operations of the audio reproducing device 100 according to Embodiment 1 are
described.
[0053] FIG. 2 is a flowchart of the operations of the audio reproducing device 100 according
to Embodiment 1. The following operations are performed on a frame basis.
[0054] First, the stream separating unit 101 separates an input stream into basic codec,
bandwidth extension data, and stereo extension data (S101). The basic codec is transmitted
to the basic codec analyzing unit 102. The bandwidth extension data is transmitted
to the bandwidth extension data analyzing unit 104. The stereo extension data is transmitted
to the stereo extension data analyzing unit 105.
[0055] Next, each piece of separated data is analyzed (S102). More specifically, the basic codec
analyzing unit 102 analyzes the basic codec to generate basic codec analysis information.
The bandwidth extension data analyzing unit 104 analyzes the bandwidth extension data
to generate bandwidth extension information. The stereo extension data analyzing unit
105 analyzes the stereo extension data to generate stereo extension information. In
the case where stereo extension information cannot be generated, such as the case
where stereo extension data is missing, the stereo extension data analyzing unit 105
transmits, to the stereo extension processing unit 108, information indicating that
there is no stereo extension information.
[0056] Next, the basic codec decoding unit 103 decodes the basic codec in accordance with
the basic codec analysis information (S103). The decoded basic codec signal is transmitted
to the switching unit 109.
[0057] The switching unit 109 determines the connection destination of the transmission
path of the decoded basic codec signal based on the basic codec analysis information,
and switches between the terminal A and the terminal B based on the determination
result (S104). For example, the switching unit 109 refers to the channel information
included in the basic codec analysis information, and selects the terminal A when
the number of channels CH of the basic codec is greater than a predetermined threshold.
Alternatively, the switching unit 109 refers to the sampling frequency information
included in the basic codec analysis information, and selects the terminal A when
the sampling frequency FS of the basic codec is higher than a predetermined
threshold. In other cases, the switching unit 109 selects the terminal B.
[0058] When the terminal A is selected ("A" in S105), the decoded basic codec signal and
the bandwidth extension information are transmitted to the first bandwidth extension
processing unit 106. The first bandwidth extension processing unit 106 extends the
frequency band of the decoded basic codec signal to generate an output audio signal
(S106). The first bandwidth extension processing unit 106 executes processing in accordance
with the LP-SBR scheme or the like, which requires a smaller processing amount but generates
an audio signal with lower sound quality.
[0059] When the terminal B is selected ("B" in S105), the decoded basic codec signal and
the bandwidth extension information are transmitted to the second bandwidth extension
processing unit 107. The second bandwidth extension processing unit 107 extends the
frequency band of the decoded basic codec signal to generate an output audio signal
(S107). The second bandwidth extension processing unit 107 executes processing in
accordance with the HQ-SBR scheme or the like, which requires a larger processing amount
but generates an audio signal with higher sound quality.
[0060] Here, when there is stereo extension information, the stereo extension processing
unit 108 performs stereo processing on the decoded basic codec signal (monaural audio
signal) having the frequency band extended by the second bandwidth extension processing
unit 107.
[0061] Lastly, the audio signal generated by the first bandwidth extension processing unit
106 or the second bandwidth extension processing unit 107 is output (S108).
[0062] In such a manner, it is possible to generate an output audio signal that is close
to an original sound by predicting and reconstructing the high frequency components
of the decoded basic codec signal. Here, processing is selected based on the basic
codec analysis information representing the type of the basic codec. Accordingly,
for example, in the case where processing amount is increased due to multi-channel
or higher sampling frequency, it is possible to prevent an increase in the processing
amount by selecting the first bandwidth extension processing unit 106 which requires
less processing amount.
[0063] Next, reference is made to a specific example of the determination processing of
the connection destination (S104).
[0064] FIG. 3 is a flowchart of a specific example of the operations of the switching unit
109 according to Embodiment 1.
[0065] First, it is determined whether or not the number of channels CH and the sampling
frequency FS of an input basic codec meets a predetermined condition (S201). Here,
it is determined whether the CH is 1 and also the FS is at most equal to 24 kHz.
[0066] In the case where the number of channels CH is 2 or more, or the sampling frequency
FS is higher than 24 kHz (No in S201), the transmission path is connected to the terminal
A, and the input bandwidth extension information and the decoded basic codec signal
are transmitted to the first bandwidth extension processing unit 106 (S202). In the
case where the number of channels CH is 1, and the sampling frequency FS is 24 kHz
or lower (Yes in S201), the transmission path is connected to the terminal B, and
the input bandwidth extension information and the decoded basic codec signal are transmitted
to the second bandwidth extension processing unit 107 (S203).
[0067] In the following, reference is made to the operations of the audio reproducing device
100 according to Embodiment 1 with a specific example of a stream.
[0068] FIG. 4 is a diagram illustrating an example of an input stream which includes stereo
extension data.
[0069] For example, when the audio reproducing device 100 receives a stream as shown in
FIG. 4, the basic codec analyzing unit 102 analyzes the basic codec, and transmits
the number of channels CH (=1) and the sampling frequency information FS (=24 kHz)
to the switching unit 109. Since the condition shown in FIG. 3 is met (Yes in S201),
the switching unit 109 connects the transmission path to the terminal B, and transmits
the decoded basic codec signal and the bandwidth extension information to the second
bandwidth extension processing unit 107 (S203). The second bandwidth extension processing
unit 107 extends the band of the decoded basic codec signal transmitted from the switching
unit 109, by using the bandwidth extension information. Here, at the same time, the
stereo extension processing unit 108 performs stereo extension processing by using
the stereo extension information, and outputs the stereo audio signal.
[0070] As shown in FIG. 4, when the stereo extension data is included, the number of channels
CH is 1. The stereo extension data is information used for performing stereo processing
on a monaural audio signal. When the number of channels is 1, it represents that the
decoded basic codec signal is a monaural audio signal.
[0071] FIG. 5 is a diagram illustrating an example of an input stream which includes no
stereo extension data. When the audio reproducing device 100 receives a stream as
shown in FIG. 5, the basic codec analyzing unit 102 analyzes the basic codec, and
transmits the number of channels CH (= 5.1) and the sampling frequency information
FS (= 24 kHz) to the switching unit 109. Since the condition shown in FIG. 3 is not
met (No in S201), the switching unit 109 connects the transmission path to the terminal
A, and transmits the decoded basic codec signal and the bandwidth extension information
to the first bandwidth extension processing unit 106 (S202). The first bandwidth extension
processing unit 106 extends the band of the decoded basic codec signal transmitted
from the switching unit 109, by using the bandwidth extension information, and outputs
an audio signal.
[0072] Next, reference is made to the case where the audio reproducing device 100 receives
a stream in which stereo extension data is missing in a frame at some point within
the stream, and stereo extension data reappears in the subsequent frames.
[0073] FIG. 6 is a diagram illustrating an example of an input stream including a frame
in which stereo extension data is missing. As shown in FIG. 6, stereo extension data
is included in frames 201 and 203, but stereo extension data is missing in a frame
202. However, there is no change in the basic codec analysis information generated
by the analysis of the basic codec included in the frames 201, 202, and 203. More
specifically, each of the number of channels CH of the basic codec of the frames 201,
202, 203 is 1, and the sampling frequency is 24 kHz.
[0074] Thus, the switching unit 109 determines that each frame meets the condition shown
in FIG. 3 (Yes in S201), and connects the transmission path to the terminal B (S203).
The second bandwidth extension processing unit 107 performs bandwidth extension on
each frame.
[0075] FIG. 7 is a diagram illustrating an example of waveforms of output audio signals.
In FIG. 7, (a) shows a conventional waveform of an output audio signal of the case
where HQ-SBR is switched to LP-SBR at time t0 and LP-SBR is switched to HQ-SBR at
time t2 due to the missing PS data of the frame 202. Conventionally, such switching
of the processing causes abnormal sounds because delay information is not available
during the periods between times t0 and t1 and between times t2 and t3.
[0076] On the other hand, as described above, the audio reproducing device 100 according
to Embodiment 1 determines the first bandwidth extension processing unit 106 or the
second bandwidth extension processing unit 107 for performing processing, independently
of the existence of the stereo extension data within a stream. More specifically,
in the case where the respective frames have the same analysis information of the
basic codec, the same processing unit is used for extending the band of the decoded
basic codec signal of each frame. Thus, discontinuity of the delay information does
not occur, thereby preventing abnormal sounds as shown in FIG. 7(b).
[0077] As described above, in the audio reproducing device 100 according to Embodiment 1,
the second bandwidth extension processing unit 107 performs bandwidth extension on
a stream including stereo extension data (that is, the stream having CH = 1); and
thus, it is possible to perform stereo extension processing without any problems.
Furthermore, the first bandwidth extension processing unit 106 performs bandwidth
extension on a stream that is multi-channel and includes no stereo extension data;
and thus, it is possible to reduce processing amount (computation amount).
[0078] As a result, for example, it is possible to reproduce an audio signal with which
a stream having HE-AACv2 profile properly decoded, without increasing the computation
amount required when reproducing a multi-channel audio signal. Here, it is possible
to reproduce audio signals without abnormal sounds even in the case where no PS data
is input and then PS data is input.
(Embodiment 2)
[0079] An audio reproducing device according to Embodiment 2 includes a buffer for storing
stereo extension information. For example, when there is missing stereo extension
data under the influences of broadcast receiving, stereo processing is performed by
using the stereo extension information stored in the buffer.
[0080] FIG. 8 is a block diagram illustrating an example of a structure of an audio reproducing
device 300 according to Embodiment 2. The audio reproducing device 300 shown in FIG.
8 differs from the audio reproducing device 100 shown in FIG. 1 in that a stereo extension
processing unit 308 is included instead of the stereo extension processing unit 108,
and that a buffer 310 is further included. In the following, only the differences
from Embodiment 1 are described, and the descriptions of the same points are omitted.
[0081] In addition to the processing performed by the stereo extension processing unit 108,
the stereo extension processing unit 308 stores, in the buffer 310, stereo extension
information used for the stereo processing. More specifically, the stereo extension
processing unit 308 performs stereo processing on the decoded basic codec signal having
the frequency band extended by the second bandwidth extension processing unit 107,
by using the stereo extension information transmitted from the stereo extension data
analyzing unit 105. The stereo extension information used here is stored in the buffer
310. For example, each time stereo extension information is obtained, the stereo extension
processing unit 308 updates the stereo extension information stored in the buffer
310.
[0082] Furthermore, in the case where there is no stereo extension information such as the
case where the stereo extension information in a frame is missing, the stereo extension
processing unit 308 reads stereo extension information from the buffer 310, and performs
stereo processing on the decoded basic codec signal (monaural audio signal) of the
frame by using the read stereo extension information.
[0083] The buffer 310 stores the stereo extension information transmitted from the stereo
extension data analyzing unit 105. The buffer 310 not only stores newest stereo extension
information, but also may store a plurality pieces of stereo extension information.
In the case where the buffer 310 stores a plurality pieces of stereo extension information,
the stereo extension processing unit 308, for example, refers to the basic codec extension
information and uses the stereo extension information used for the stereo processing
of a previous decoded basic codec signal similar to the current decoded basic codec
signal.
[0084] As described, the audio reproducing device 300 according to Embodiment 1 includes
the buffer 310 for storing stereo extension information. In the case where there is
no stereo extension information, the audio reproducing device 300 performs stereo
processing on the decoded basic codec signal by using the stereo extension information
stored in the buffer 310.
[0085] Next, of the operations of the audio reproducing device 300 according to Embodiment
2, the operations of the stereo extension processing unit 308 are described. The audio
reproducing device 300 decodes input streams in accordance with the flowcharts shown
in FIG. 2 and FIG. 3. The stereo extension processing unit 308 according to Embodiment
2 performs processing when the second bandwidth extension processing unit 107 performs
bandwidth extension (S107).
[0086] FIG. 9 is a flowchart of the operations of the stereo extension processing unit 308
according to Embodiment 2.
[0087] First, the stereo extension processing unit 308 determines whether or not a stream
includes stereo extension data, that is, whether or not stereo extension information
is transmitted from the stereo extension data analyzing unit 105 (S301). In the case
where the stereo extension information is transmitted (Yes in S301), stereo extension
processing is performed by using the stereo extension information (S302). The stereo
extension processing unit 308 further stores the stereo extension information used
here (S303).
[0088] In the case where the stereo extension information is not transmitted (No in S301),
it is determined whether or not stereo extension processing has been performed for
decoding previous frames (S304). In the case where the stereo extension processing
has been performed (Yes in S304), stereo extension processing is performed by using
the stereo extension information stored when decoding previous frames (S305). In the
case where no stereo extension processing has been performed (No in S304), the processing
ends here.
[0089] In such a manner, the stereo extension processing unit 308 according to Embodiment
2 stores, in the buffer 310, the stereo extension information used for decoding previous
frames. When stereo extension data is missing in a subsequent frame, stereo processing
is performed on the decoded basic codec signal by using the stereo extension information
stored in the buffer 310.
[0090] In the following, reference is made to the operations performed by the audio reproducing
device 300 according to Embodiment 2 when a stream as shown in FIG. 6 is input.
[0091] According to Embodiment 2, in the case where a stream is input which includes a frame
in which stereo extension data is missing at some point within the stream as shown
in FIG. 6, the switching unit 109 connects the transmission path to the terminal B
because all of the frames 201 to 203 has a single channel and sampling frequency FS
that is equal to or higher than 24 kHz. The decoded basic codec signal and bandwidth
extension information are transmitted to the second bandwidth extension processing
unit 107. In such a manner, the bandwidth extension processing on all of the frames
201 to 203 is performed by the second bandwidth extension processing unit 107, which
allows continuity of delay information.
[0092] FIG. 10 is a diagram illustrating an example of waveforms of output stereo audio
signals. Conventionally, stereo extension processing is not performed during a period
of the frame in which stereo extension data is missing(period between t4 and t5).
As shown in (a) in FIG. 10, R-channel audio signal is not output, which gives a listener
a feeling of strangeness. In order to overcome such a strange feeling, and properly
output R-channel audio signal as shown in (b) in FIG. 10, the stereo extension processing
unit 308 performs the following operations.
[0093] Since the frame 201 includes stereo extension data (Yes in S301), the stereo extension
processing unit 308 performs stereo extension processing (S302), and stores the stereo
extension information used here (S303).
[0094] Next, the frame 202 in which stereo extension data is missing is input. Since stereo
extension data is missing in the frame 202 (No in S301) and the stereo extension processing
is performed at the time of decoding of the frame 201 (Yes in S304), the stereo extension
processing unit 308 performs stereo extension processing on the frame 202 by using
the stereo extension information of the frame 201.
[0095] Subsequently, the next frame 203 with stereo extension data is input. Since the frame
203 includes stereo extension data (Yes S301), the stereo extension processing unit
308 performs stereo extension processing on the frame 203 by using the stereo extension
information extracted from the frame 203 (S302).
[0096] In such a manner, the audio reproducing device 300 according to Embodiment 2 is capable
of keeping continuity of an output sound, and also performing stereo extension even
on a frame in which stereo extension data is missing.
[0097] As a result, for example, it is possible to reproduce an audio signal with which
a stream having HE-AACv2 profile properly decoded, without increasing the computation
amount required when reproducing a multi-channel audio signal. Here, it is possible
to reproduce audio signals without abnormal sounds even in the case where no PS data
is input and then PS data is input. Alternatively, in the case where PS data is input
in a preceding frame and then no PS data is input in a subsequent frame, a stereo
audio signal can be reproduced by using the previous PS data.
[0098] FIG. 11 is an external view of an example of an audio reproducing apparatus incorporating
an audio reproducing device according to the present invention. FIG. 11 illustrates
a recording medium 401, an audio reproducing apparatus 402, and earphones 403.
[0099] The recording medium 401 is a recording medium which is capable of recording compressed
audio streams. FIG. 11 shows the recording medium 401 as a medium, such as a secure
digital (SD) card, removable from an apparatus; however, the recording medium 401
may also be implemented as an optical disk, a hard disk drive (HDD) incorporated in
the apparatus, or the like.
[0100] The audio reproducing apparatus 402 is an apparatus which reproduces compressed audio
streams, and includes at least one of the audio reproducing devices 100 and 300 according
to Embodiments 1 and 2.
[0101] The earphones 403 are loud speaker apparatus which output audio signals output from
the audio reproducing apparatus 402 to outside. FIG. 11 illustrates earphones which
are inserted into the ears of a user; however, the earphones may be headphones which
are put on the head of the user, or desktop loudspeakers.
[0102] According to such structure of the audio reproducing apparatus 402, it is possible
to obtain an output audio signal without causing abnormal sounds even when a stream
includes a frame in which stereo extension data is missing.
[0103] The audio reproducing device and the audio reproducing method according to the present
invention have been described based on the embodiments; however, the present invention
is not limited to these embodiments. Those skilled in the art will readily appreciate
that many modifications are possible in the exemplary embodiments without materially
departing from the novel teachings and advantages of this invention. Accordingly,
all such modifications are intended to be included within the scope of this invention.
[0104] For example, the switching unit 109 makes the determination based on the determination
condition that the number of channels is 1 and the sampling frequency is 24 kHz or
lower; however, the determination condition is not limitative. For example, it may
be that the switching unit 109 determines to use the second bandwidth extension processing
unit 107 (connect to the terminal B) only when the number of channels is two or less.
In this case, when a stream having the basic codec with 1 or 2 channels is input,
bandwidth extension is performed by the second bandwidth extension processing unit
107 which generates higher sound quality but requires larger processing amount.
[0105] In the case where a stream of 3 or more channels is input, it is possible to perform
bandwidth extension by using the first bandwidth extension processing unit 106 which
requires less processing amount but generates lower sound quality, to reduce the overall
processing amount. In such a manner, it is possible to provide high-quality sound
even for multi-channel processing as long as the processing capability and memory
resources permit.
[0106] The present invention may be implemented not only as an audio reproducing device
and an audio reproducing method as described above, but also as a program causing
a computer to execute an audio reproducing method according to the embodiments. The
present invention may also be implemented as a recording medium, such as a computer
readable CD-ROM, which stores the program. Furthermore, the present invention may
be implemented as information, data, or a signal indicating the program. Such program,
information, data, and signal may be distributed over a communication network such
as the Internet.
[0107] Furthermore, portion or all of the constituent elements of the audio reproducing
device according to the present invention may be structured as a single system LSI.
The system LSI is a super multi-functional LSI manufactured by integrating a plurality
of structural units onto a single chip. Specifically, it is a computer system including
a microprocessor, a ROM, a RAM, and the like.
[Industrial Applicability]
[0108] The present invention prevents a significant increase in processing amount, and also
prevents occurrence of abnormal sounds. The present invention may be used for, for
example, an audio reproducing device. For example, the present invention may be used
for an audio reproducing apparatus, such as a portable music player, which has limited
processor capability and limited memory resources.
[Reference Signs List]
[0109]
100, 300 |
Audio reproducing device |
101 |
Stream separating unit |
102 |
Basic codec analyzing unit |
103 |
Basic codec decoding unit |
104 |
Bandwidth extension data analyzing unit |
105 |
stereo extension data analyzing unit |
106 |
First bandwidth extension processing unit |
107 |
Second bandwidth extension processing unit |
108, 308 |
Stereo extension processing unit |
109 |
Switching unit |
201, 202, 203 |
Frame |
310 |
Buffer |
401 |
Recording medium |
402 |
Audio reproducing apparatus |
403 |
Earphones |
1. An audio reproducing device which reproduces a stream including a basic codec that
is an encoded audio signal, said audio reproducing device comprising:
a stream separating unit configured to separate, on a frame basis, the stream into
the basic codec and bandwidth extension information that is used for extending a band
of the basic codec;
a basic codec information analyzing unit configured to analyze the basic codec separated
by said stream separating unit, to generate analysis information indicating a type
of the basic codec;
a basic codec decoding unit configured to decode the basic codec in accordance with
the analysis information generated by said basic codec information analyzing unit,
to generate a decoded basic codec signal;
a first bandwidth extension processing unit configured to execute first processing
which extends, by using the bandwidth extension information, a frequency band of the
decoded basic codec signal generated by said basic codec decoding unit;
a second bandwidth extension processing unit configured to execute second processing
which extends, by using the bandwidth extension information, the frequency band of
the decoded basic codec signal generated by said basic codec decoding unit, the second
processing being executed with an accuracy higher than an accuracy of the first processing;
and
a switching unit configured to switch between said first bandwidth extension processing
unit and said second bandwidth extension processing unit based on the analysis information,
wherein said first bandwidth extension processing unit is configured to execute the
first processing by performing real number operations, and
said second bandwidth extension processing unit is configured to execute the second
processing by performing complex arithmetic.
2. The audio reproducing device according to Claim 1,
wherein said stream separating unit is configured to separate, on the frame basis,
the stream into the basic codec, the bandwidth extension information, and stereo extension
information that is used for performing stereo processing on the basic codec,
said audio reproducing device further comprises:
a stereo extension processing unit configured to perform, by using the stereo extension
information, stereo processing on the decoded basic codec signal having the frequency
band extended by said second bandwidth extension processing unit.
3. The audio reproducing device according to Claim 2,
wherein said basic codec information analyzing unit is configured to analyze the basic
codec separated by said stream separating unit, to generate analysis information including
at least one of channel information and sampling frequency information, the channel
information indicating the number of channels of the basic codec, the sampling frequency
information indicating a sampling frequency of the basic codec, and
said switching unit is configured to determine at least one of (i) whether the number
of channels indicated by the channel information is greater than a predetermined first
threshold and (ii) whether the sampling frequency indicated by the sampling frequency
information is greater than a predetermined second threshold, and select said first
bandwidth extension processing unit when at least one of the following is determined:
(i) the number of channels is greater than the predetermined first threshold and (ii)
the sampling frequency is greater than the predetermined second threshold.
4. The audio reproducing device according to Claim 2 or Claim 3, further comprising
a buffer which stores stereo extension information of a first frame,
wherein said stereo extension processing unit is configured to perform stereo processing
on a decoded basic codec signal of a second frame by using the stereo extension information
stored in said buffer, the second frame being a frame after the first frame and being
a frame in which the stereo extension information is missing.
5. The audio reproducing device according to Claim 2 or Claim 3, wherein said second
bandwidth extension processing unit is configured to generate a high-frequency component
signal from the decoded basic codec signal by using the bandwidth extension information,
said stereo extension processing unit is configured to perform, by using the stereo
extension information, stereo processing on the decoded basic codec signal and the
high-frequency component signal generated by said second bandwidth extension processing
unit, to generate a decoded basic codec signal and a high-frequency component signal
for a first channel and a decoded basic codec signal and a high-frequency component
signal for a second channel, and
said second bandwidth extension processing unit further includes a band synthesis
filter for synthesizing the high-frequency component signal and the decoded basic
codec signal that have been generated, and is configured to synthesize bands of the
second channel by using delay information that is stored in said band synthesis filter
of the first channel, as delay information stored in said band synthesis filter of
the second channel, when the stereo extension information is missing.
6. The audio reproducing device according to one of Claims 2 to 5, wherein the basic
codec is an audio signal encoded according to Advanced Audio Coding (AAC) scheme,
the bandwidth extension information is Spectral Band Replication (SBR) information
generated according to SBR scheme,
the stereo extension information is Parametric Stereo (PS) information generated according
to PS scheme,
said first bandwidth extension processing unit is configured to extend a frequency
band of the decoded basic codec signal according to Low Power-SBR (LP-SBR) scheme,
and
said second bandwidth extension processing unit is configured to extend a frequency
band of the decoded basic codec signal according to High Quality-SBR (HQ-SBR) scheme.
7. An audio reproducing apparatus comprising the audio reproducing device according to
one of Claims 1 to 6.
8. An audio reproducing method which reproducing a stream including a basic codec that
is an encoded audio signal, said audio reproducing method comprising:
separating, on a frame basis, the stream into the basic codec and bandwidth extension
information that is used for extending a band of the basic codec;
analyzing the basic codec separated in said separating, to generate analysis information
indicating a type of the basic codec;
decoding the basic codec in accordance with the analysis information generated in
said analyzing, to generate a decoded basic codec signal;
switching between first processing and second processing based on the analysis information,
the second processing being executed with an accuracy higher than an accuracy of the
first processing;
executing, when the first processing is selected in said switching, the first processing
which extends, by using the bandwidth extension information, a frequency band of the
decoded basic codec signal generated in said decoding; and
executing, when the second processing is selected in said switching, the second processing
which extends, by using the bandwidth extension information, the frequency band of
the decoded basic codec signal generated in said decoding,
wherein real number operations are performed in the first processing, and
complex arithmetic is performed in the second processing.
9. An integrated circuit which reproduces a stream including a basic codec that is an
encoded audio signal, said integrated circuit comprising:
a stream separating unit configured to separate, on a frame basis, the stream into
the basic codec and bandwidth extension information that is used for extending a band
of the basic codec;
a basic codec information analyzing unit configured to analyze the basic codec separated
by said stream separating unit, to generate analysis information indicating a type
of the basic codec;
a basic codec decoding unit configured to decode the basic codec in accordance with
the analysis information generated by said basic codec information analyzing unit,
to generate a decoded basic codec signal;
a first bandwidth extension processing unit configured to execute first processing
which extends, by using the bandwidth extension information, a frequency band of the
decoded basic codec signal generated by said basic codec decoding unit;
a second bandwidth extension processing unit configured to execute second processing
which extends, by using the bandwidth extension information, the frequency band of
the decoded basic codec signal generated by said basic codec decoding unit, the second
processing being executed with an accuracy higher than an accuracy of the first processing;
and
a switching unit configured to switch between said first bandwidth extension processing
unit and said second bandwidth extension processing unit based on the analysis information,
wherein said first bandwidth extension processing unit is configured to execute the
first processing by performing real number operations, and
said second bandwidth extension processing unit is configured to execute the second
processing by performing complex arithmetic.
10. A program causing a computer to execute an audio reproducing method for reproducing
a stream including a basic codec that is an encoded audio signal, said program comprising:
separating, on a frame basis, the stream into the basic codec and bandwidth extension
information that is used for extending a band of the basic codec;
analyzing the basic codec separated in said separating, to generate analysis information
indicating a type of the basic codec;
decoding the basic codec in accordance with the analysis information generated in
said analyzing, to generate a decoded basic codec signal;
switching between first processing and second processing based on the analysis information,
the second processing being executed with an accuracy higher than an accuracy of the
first processing;
executing, when the first processing is selected in said switching, the first processing
which extends, by using the bandwidth extension information, a frequency band of the
decoded basic codec signal generated in said decoding; and
executing, when the second processing is selected in said switching, the second processing
which extends, by using the bandwidth extension information, the frequency band of
the decoded basic codec signal generated in said decoding,
wherein real number operations are performed in the first processing, and
complex arithmetic is performed in the second processing.