[0001] The present invention generally relates to a frequency analysis of an acoustic or
speech waveform in an acoustic or speech recognition, an acoustic or speech synthesis,
an automatic dictation of music score, a so-called "karaoke" (i.e. music accompaniment
playing) point scoring, a machine diagnosis and the like by use of a computer, and
more particularly to a pitch detection apparatus for and a pitch detection method
of detecting a pitch of a fundamental wave of an acoustic wave.
[0002] As a practical method of frequency analysis for an acoustic or speech wave and
a vibration wave, a method using the FFT (Fast Fourier Transform) is known. However,
the FFT gives an accurate spectrum analysis result only for the harmonics of the
waveform whose cycle corresponds to the observation interval (time length) L. Thus,
for extracting the pitch (cycle) of the fundamental wave of the acoustic wave, which
has a relatively fine frequency distribution, the FFT is insufficient in resolution
and accuracy with respect to the frequency.
[0003] Accordingly, as a technique for detecting the pitch of the acoustic wave, there is
a method of determining the pitch, by filtering the acoustic wave by a pitch filter
and detecting the zero-cross timing of the filtered signal (i.e. the timing when the
filtered signal crosses the amplitude 0 level). As a central frequency of this pitch
filter, the average pitch, which is estimated for each frame frequency sectioned by
a window function having a predetermined width, is used.
[0004] In this way, according to the above explained method of detecting the pitch, on condition
that the central frequency of the pitch filter is appropriately estimated, the pitch
detection of the acoustic wave can be performed with relatively high accuracy.
[0005] In the technical field of the pitch detection apparatus used for the speech recognition,
the acoustic synthesis, the automatic dictation of music score, the karaoke point
scoring, the machine diagnosis and the like, it is demanded to improve the detection
accuracy and to simplify the calculating process and device.
[0006] However, according to the above explained method of detecting the pitch, since the
pitch detection accuracy depends upon the reliability of the estimation of the central
frequency of the pitch filter, a complicated process such as a Cepstrum method is
necessary for the correct estimation. As a result, the calculating process and device
for the pitch detection apparatus become complicated, which is a problem in practice.
[0007] It is therefore an object of the present invention to provide a pitch detection apparatus
and a pitch detection method, which can improve the detection accuracy and which can
simplify the calculating process and the apparatus construction.
[0008] The above object of the present invention can be achieved by a pitch detection apparatus,
to which an acoustic waveform is inputted, for detecting a pitch of a fundamental
wave of the acoustic waveform. The pitch detection apparatus is provided with: an
orthogonal function component output device for taking, out of orthogonal function
components for every cycle which compose the acoustic waveform, a plurality of orthogonal
function components one after another in an order from one orthogonal function component
having a greater energy contribution for the acoustic waveform than other orthogonal
function components, and outputting the taken out orthogonal function components;
and a pitch extract device for extracting as a pitch one of the outputted orthogonal
function components, on the basis of a mutual relationship between cycles of the outputted
orthogonal function components.
[0009] According to the present invention, when the acoustic waveform is inputted, out of
the orthogonal function components for every cycle which compose the acoustic waveform,
a plurality of orthogonal function components are taken out one after another, in
the order from the greater energy contribution for the acoustic waveform, and are
outputted by the orthogonal function component output device. Therefore, by taking
out the orthogonal function components in an appropriate number in accordance with
the characteristic of the acoustic waveform whose pitch is to be detected, the fundamental
wave and its harmonics are included in the outputted orthogonal function components.
Here, the fundamental wave and its harmonics have certain relationships in their cycles
(or frequencies) with each other. Namely, the cycle of the fundamental wave is an integer
multiple of that of its harmonic, and the cycle of one harmonic has an integer ratio
relationship with that of another harmonic of the same fundamental wave. Consequently,
one of the outputted orthogonal function components can be extracted as a pitch by
the pitch extract device, on the basis of such a mutual relationship between the cycles
of the outputted orthogonal function components. In this way, by use of a relatively
simple calculation process and a relatively simple apparatus construction, the detection
accuracy in the pitch detection can be improved. As a result, the acoustic or speech
recognition apparatus, the acoustic or speech synthesis apparatus, the apparatus for
automatically dictating a music score, the karaoke point scoring apparatus, the machine
diagnosis apparatus and the like, each of which has a high accuracy, can be produced
with a relatively low cost by use of the pitch detection apparatus of the present
invention.
[0010] In one aspect of the present invention, the orthogonal function component output
device stops taking out and outputting when a ratio of an energy of a sum signal,
which is obtained by re-synthesizing the taken out orthogonal function components,
with respect to an energy of the acoustic waveform exceeds a predetermined value.
[0011] According to this aspect, when the ratio of the energy of the sum signal to the energy
of the acoustic waveform exceeds a predetermined value e.g. 99%, the orthogonal function
component output device stops taking out and outputting. As a result, an appropriate
number of the orthogonal function components, which include the fundamental wave and
its harmonics and which are enough for the pitch extract device to extract the pitch
correctly on the basis of the mutual relationship between the cycles, can be automatically
outputted from the orthogonal function component output device. Thus, it is possible
to efficiently prevent the orthogonal function component output device from performing
the unnecessary and surplus taking out operations. This feature is convenient in the
practical cases.
[0012] In this aspect of the present invention, it is preferable that the orthogonal function
component output device stops taking out and outputting when finishing taking out
and outputting a predetermined number of orthogonal function components before the
ratio exceeds the predetermined value.
[0013] In this preferable case, even before the ratio exceeds the predetermined value, when
the orthogonal function component output device finishes taking out and outputting
a predetermined number of orthogonal function components e.g. 10 components, the orthogonal
function component output device stops taking out and outputting. Thus, it is possible
to efficiently prevent the orthogonal function component output device from performing
the unnecessary and surplus taking out operations, in such a case where the energy
ratio would not exceed the predetermined value e.g. 99% due to the characteristic
of the acoustic waveform even if the taking out operation is performed a large
number of times. This feature is very convenient in the practical cases.
[0014] In another aspect of the present invention, the pitch extract device extracts as
the pitch one of the outputted orthogonal function components, which has the longest
cycle.
[0015] According to this aspect, one of the outputted orthogonal function components, which
has the longest cycle, is extracted as the pitch, by the pitch extract device. Here,
as for the relationship between the cycles of the fundamental wave and its harmonics
which are included in the components taken out by the orthogonal function component
output device, the cycle of the fundamental wave is an integer multiple of that of its harmonic.
Consequently, the component of the fundamental wave can be correctly extracted in
most practical cases as having the longest cycle among the taken out components. In
this way, the pitch can be detected by use of a relatively simple calculation process
and a relatively simple apparatus construction, according to the present invention.
[0016] In another aspect of the present invention, the pitch extract device discriminates
some of the outputted orthogonal function components, each of which is an odd order
harmonic of another orthogonal function component, and extracts as the pitch one of
the discriminated orthogonal function components, which has the longest cycle.
[0017] According to this aspect, at first, some of the outputted orthogonal function components,
each of which is the odd order harmonic of another orthogonal function component,
are discriminated by the pitch extract device. Here, as for the relationship between
the cycles of the fundamental wave and its harmonics which are included in the components
taken out by the orthogonal function component output device, the cycle of the fundamental
wave is an integer multiple of that of its harmonic. Further, the energy of a harmonic of
a sub-harmonic of the fundamental wave (the sub-harmonic having a cycle double that of the
fundamental wave), which is occasionally included in the components taken out by the
orthogonal function component output device, is much smaller than the energy of the harmonic
of the fundamental wave. Thus, the harmonics of the sub-harmonic are hardly or not
at all included in the appropriate number of components taken out by the orthogonal
function component output device. Consequently, even if the sub-harmonic, whose cycle
is longer than that of the fundamental wave, is included in the components outputted
from the orthogonal function component output device, the odd order harmonic relationship
is satisfied only by the fundamental wave and its harmonics. Finally, the component
of the fundamental wave can be correctly extracted as one of the discriminated orthogonal
function components, which has the longest cycle, by the pitch extract device. Consequently,
it is possible to perform the pitch detection with a high accuracy according to the
present invention.
[0018] In another aspect of the present invention, the pitch extract device discriminates
a plurality of groups of the outputted orthogonal function components, whose cycles
have an integer ratio relationship with each other in each of the discriminated groups,
selects one of the discriminated groups, which includes the largest number of the
orthogonal function components, and estimates as the pitch one of the orthogonal function
components having a cycle, which is an integer multiple of that of each of the orthogonal
function components in the selected group.
[0019] According to this aspect, at first, a plurality of groups of the outputted orthogonal
function components, whose cycles have an integer ratio relationship with each other
in each of the discriminated groups, are discriminated by the pitch extract device.
Then, one of the discriminated groups, which includes the largest number of the orthogonal
function components, is selected by the pitch extract device. Here, even in a case
where the fundamental wave, the double tone or the like does not exist in the components
outputted by the orthogonal function component output device, due to the frequency
band limitation and the like at the time of detecting the acoustic wave, the integer
ratio relationship still exists between one harmonic and another harmonic of the same
fundamental wave, since the cycle of the fundamental wave is an integer multiple of
the cycle of each of its harmonics. Further, the energy of the harmonic of
the sub-harmonic of the fundamental wave, which is occasionally included in the components
taken out by the orthogonal function component output device, is much smaller than
the energy of the harmonic of the fundamental wave. Thus, the number of the harmonics
of the sub-harmonic included in the appropriate number of components taken out
is supposed to be less than the number of the harmonics of the fundamental wave. Consequently,
even if the fundamental wave, the double tone and the like are not taken out and outputted
by the orthogonal function component output device, the group (set) of the harmonics
of the fundamental wave can be identified by discriminating the groups of the components
whose cycles have the integer ratio relationship, and by selecting the group which
includes the largest number of the components. Finally, one of the orthogonal function
components having a cycle, which is an integer multiple of that of each of the orthogonal
function components in the selected group, can be estimated as the pitch. In this
way, the component of the fundamental wave can be estimated as the pitch even when it
does not exist in the components taken out by the orthogonal function component output
device. Consequently, it is possible to perform the pitch detection with a high accuracy
according to the present invention.
[0020] In another aspect of the present invention, the orthogonal function component output
device: firstly, outputs a first orthogonal function component which has the greatest
energy contribution for the acoustic waveform and obtains a first residual waveform
by subtracting the first orthogonal function component from the acoustic waveform;
secondly, outputs a second orthogonal function component which has the greatest energy
contribution for the first residual waveform and obtains a second residual waveform
by subtracting the second orthogonal function component from the first residual waveform;
and thirdly, outputs an n-th (n: natural number more than 2) orthogonal function component
which has the greatest energy contribution for the (n-1)-th residual waveform and obtains
an n-th residual waveform by subtracting the n-th orthogonal function component from the
(n-1)-th residual waveform.
[0021] According to this aspect, at first, the first orthogonal function component which
has the greatest energy contribution for the acoustic waveform is outputted. At this
time, the first residual waveform is obtained by subtracting the first orthogonal
function component from the acoustic waveform. Next, the second orthogonal function
component which has the greatest energy contribution for the first residual waveform
is outputted. At this time, the second residual waveform is obtained by subtracting
the second orthogonal function component from the first residual waveform. Then, in
the same manner as for the first and second residual waveforms, the n-th orthogonal
function component which has the greatest energy contribution for the (n-1)-th residual
waveform is outputted. At this time, the n-th residual waveform is obtained by subtracting
the n-th orthogonal function component from the (n-1)-th residual waveform. Therefore,
the orthogonal function component output device can
efficiently take out the orthogonal function components one after another, in the
order from the greater energy contribution for the acoustic waveform, and can efficiently
output the taken out orthogonal function components. Consequently, the pitch detection
can be performed very efficiently as a whole according to the present invention.
[0022] In another aspect of the present invention, the pitch detection apparatus is further
provided with an electroacoustic transducer for converting the acoustic waveform to
an electric signal and outputting the electric signal to the orthogonal function
component output device, the orthogonal function component output device taking out
the orthogonal function components based on the electric signal outputted from the
electroacoustic transducer.
[0023] According to this aspect, the acoustic waveform is inputted to the electroacoustic
transducer, is converted to the electric signal, and is outputted to the orthogonal
function component output device. Then, the orthogonal function components based on
the electric signal are taken out by the orthogonal function component output device.
[0024] The above object of the present invention can be also achieved by a pitch detection
method of detecting a pitch of a fundamental wave of an acoustic waveform. The pitch
detection method is provided with the steps of: taking, out of orthogonal function
components for every cycle which compose the acoustic waveform, a plurality of orthogonal
function components one after another in an order from one orthogonal function component
having a greater energy contribution for the acoustic waveform than other orthogonal
function components, and outputting the taken out orthogonal function components;
and extracting as a pitch one of the outputted orthogonal function components, on
the basis of a mutual relationship between cycles of the outputted orthogonal function
components.
[0025] According to the pitch detection method of the present invention, the same action and
effect as those obtained by the aforementioned pitch detection apparatus of the
present invention can be also obtained in the same manner. Thus, by use of a relatively
simple calculation process and a relatively simple apparatus construction, the detection
accuracy in the pitch detection can be improved according to the present invention.
[0026] The nature, utility, and further features of this invention will be more clearly
apparent from the following detailed description with respect to preferred embodiments
of the invention when read in conjunction with the accompanying drawings briefly described
below.
FIG. 1 is a block diagram of a pitch detection apparatus as an embodiment of the present
invention;
FIG. 2 is a diagram for explaining a frequency component analyzed by the GHA in the
embodiment of FIG. 1;
FIG. 3 is a flow chart showing an operation of the first embodiment;
FIG. 4 is a table 1 showing a concrete example of a pitch detection by the first embodiment;
FIG. 5 is a flow chart showing an operation of the second embodiment;
FIG. 6 is a flow chart showing an operation of the third embodiment;
FIG. 7 is a table 2 for explaining the operation of the third embodiment;
FIG. 8 is a table 3 for explaining the operation of the third embodiment;
FIG. 9 is a table 4 showing a concrete example of a pitch detection by the third embodiment;
FIG. 10 is a flow chart showing an operation of the fourth embodiment;
FIG. 11 is a table 5 for explaining the operation of the fourth embodiment;
FIG. 12 is a table 6 for explaining the operation of the fourth embodiment;
FIG. 13 is a table 7 for explaining the operation of the fourth embodiment;
FIG. 14 is a table 8 showing a concrete example of a pitch detection by the fourth
embodiment; and
FIG. 15 is a chart showing waveforms related to the pitch detection by the third embodiment.
[0027] Referring to the accompanying drawings, embodiments of the present invention will
be now explained.
(1) First Embodiment:
[0028] FIG. 1 shows a pitch detection apparatus as a first embodiment of the present invention.
[0029] In FIG. 1, a pitch detection apparatus 1 is provided with an electroacoustic transducer
2, an f (frequency) spectrum analyzing unit 3, a pitch extracting unit 4 and a memory
unit 5. The pitch detection apparatus 1 is adapted to be installed in an acoustic
or speech recognition apparatus, an acoustic or speech synthesis apparatus, an apparatus
for automatically dictating a music score, a karaoke point scoring apparatus, a machine
diagnosis apparatus and the like, as an apparatus for detecting the pitch of an acoustic
or voice (e. g. speech) waveform. The electroacoustic transducer 2 is constructed
to convert the acoustic wave inputted thereto to an electric signal, and is equipped
with a microphone, for example.
[0030] The f spectrum analyzing unit 3 is constructed to perform a frequency spectrum analysis
by means of the GHA (General Harmonic Analysis) with respect to an acoustic waveform
indicated by the electric signal from the electroacoustic transducer 2. Here, the
operation of the f spectrum analyzing unit 3 is more concretely explained.
[0031] The f spectrum analyzing unit 3 performs operations [I] to [III] as follows.
[I] At first, it calculates a Fourier coefficient S(f), which is a Sine coefficient,
and a Fourier coefficient C(f), which is a Cosine coefficient, by use of following
expressions (1) and (2) respectively, from a continuous signal x0(t), which is observed in an observation interval L having a predetermined time width.


wherein
T: cycle of each frequency component,
f(s): frequency,
T = 1/f(s), nT ≦ L,
n: integer number indicating how many cycles are included in the observation interval L,
nT: integer number,
L: observation interval (time length).
As for the width of the observation interval L, it is experimentally found by the
inventors that, for example, a value of 10 to 20 ms is appropriate in practical cases
in order to extract the pitch of the acoustic or speech wave. Therefore, for the pitch
extraction of the acoustic wave, assuming that the sampling frequency is 48,000 Hz,
for example, the width (time length) of the observation interval L is preferably set
to be equivalent to 512 samples (approximately 10.7 ms).
More concretely, as shown in FIG. 2, assuming that the width (time length) of the
observation interval L is equivalent to 512 samples, for example, with respect to
n = 1, the Fourier coefficients are respectively calculated as for 256 different values
of T i.e. T = 512 (= L), 511 (= L - 1 × 1), 510 (= L - 1 × 2), ..., 258 (= L/2 + 1
× 2), 257 (=L/2 + 1 × 1). With respect to n = 2, the Fourier coefficients are respectively
calculated as for 256 different values of T i.e. T = 256 (= L/2), 255.5 (= L/2 - 0.5
× 1), 255 (= L/2 - 0.5 × 2), ..., 129 (= L/4 + 0.5 × 2), 128.5 (=L/4 + 0.5 × 1). In
the same manner, the Fourier coefficients are calculated for each integer number
n and each width of the observation interval L which can satisfy the condition of
the sampling number 512.
In this way, according to the present embodiment, the result of the Fourier analysis
can be obtained for frequencies which are very fine with respect to the fundamental
frequency corresponding to the observation interval L. This feature of the present
embodiment is much more advantageous for improving the accuracy of the pitch detection
than the aforementioned technique by means of the FFT, in which only the result for
the rough frequencies i.e. 1, 2, 3, ..., 256 (L/2) times the fundamental frequency
can be obtained.
[II] Next, a frequency f1 at which an energy E(f), expressed by the following expression
(3), of a residual difference ε(t, f), expressed by the following expression (4), is
minimized in the observation interval L, and the Fourier coefficients S(f1) and C(f1)
at this frequency, are obtained on the basis of the calculated coefficients.


[III] Next, a signal x1(t) expressed by the following expression (5) is obtained, which
indicates the residual component left when the frequency component of the frequency f1
obtained in the above [II] (at which the energy is minimized) is removed from the
original signal x0(t). This signal x1(t) is treated as a new original signal, and the
above described calculations in the process [I] to the process [III] are repeated.
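Expressions (1) to (5) are not reproduced in the text above. A form consistent with the surrounding definitions is sketched below; the integration range 0 to nT and the factor 2/(nT) are assumptions taken from the usual formulation of the generalized harmonic analysis, not from this text.

```latex
\begin{align}
S(f) &= \frac{2}{nT}\int_{0}^{nT} x_0(t)\,\sin(2\pi f t)\,dt \tag{1}\\
C(f) &= \frac{2}{nT}\int_{0}^{nT} x_0(t)\,\cos(2\pi f t)\,dt \tag{2}\\
E(f) &= \int_{0}^{nT} \varepsilon(t,f)^{2}\,dt \tag{3}\\
\varepsilon(t,f) &= x_0(t) - S(f)\sin(2\pi f t) - C(f)\cos(2\pi f t) \tag{4}\\
x_1(t) &= x_0(t) - S(f_1)\sin(2\pi f_1 t) - C(f_1)\cos(2\pi f_1 t) \tag{5}
\end{align}
```

With f = 1/T, the sine and cosine terms are orthogonal over the n whole cycles contained in the range 0 to nT, so under these assumptions the coefficients given by (1) and (2) are also the least-squares coefficients that minimize E(f) for that frequency.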

[0032] As described above, the f spectrum analyzing unit 3 performs the f spectrum analysis
by means of the GHA with respect to the waveforms indicated by the electric signal
from the electroacoustic transducer 2.
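The operations [I] to [III] can be illustrated by the following minimal Python sketch for a sampled signal. It is not the implementation of the f spectrum analyzing unit 3. The function names, the limit n_max, the rule that nT runs over the integers from L/2 + 1 to L for every n (a generalization of the n = 1 and n = 2 cases of FIG. 2), and the use of the mean residual energy over the nT window for comparing candidates are all assumptions made for illustration.

```python
import math


def candidate_cycles(L, n):
    """Cycles T (in samples) for n cycles per window, nT integer, L/2 < nT <= L.

    For L = 512 this gives 512, 511, ..., 257 (n = 1) and 256, 255.5, ..., 128.5
    (n = 2) as in FIG. 2; applying the same rule to larger n is an assumption.
    """
    return [m / n for m in range(L, L // 2, -1)]


def gha_step(x, n_max=4):
    """Perform operations [I] to [III] once and return (T1, S1, C1, x1)."""
    L = len(x)
    best = None
    for n in range(1, n_max + 1):
        for m in range(L, L // 2, -1):          # m = nT, window length in samples
            T = m / n
            w = 2.0 * math.pi / T
            s = [math.sin(w * t) for t in range(m)]
            c = [math.cos(w * t) for t in range(m)]
            # [I] Discrete counterparts of the coefficients S(f) and C(f).
            S = 2.0 / m * sum(x[t] * s[t] for t in range(m))
            C = 2.0 / m * sum(x[t] * c[t] for t in range(m))
            # [II] Residual energy; divided by m (assumption) so that windows
            # of different lengths can be compared.
            E = sum((x[t] - S * s[t] - C * c[t]) ** 2 for t in range(m)) / m
            if best is None or E < best[0]:
                best = (E, T, S, C)
    _, T1, S1, C1 = best
    w1 = 2.0 * math.pi / T1
    # [III] Remove the selected component over the whole observation interval.
    x1 = [x[t] - S1 * math.sin(w1 * t) - C1 * math.cos(w1 * t) for t in range(L)]
    return T1, S1, C1, x1


def gha(x, count, n_max=4):
    """Repeat gha_step, treating each residual as the new original signal."""
    components = []
    residual = list(x)
    for _ in range(count):
        T, S, C, residual = gha_step(residual, n_max)
        components.append((T, S, C))
    return components, residual
```

Under these assumptions the candidate grid is much finer than that of a 512-point FFT. At a sampling frequency of 48,000 Hz, for instance, the n = 2 candidates T = 226.5 and T = 226 samples correspond to about 211.92 Hz and 212.39 Hz, whereas the FFT frequencies are spaced 93.75 Hz apart.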
[0033] In FIG. 1 again, the pitch extracting unit 4 is constructed to extract a pitch in
such a manner as explained later with reference to FIG. 3, from the N frequency components
(N: natural number) obtained by the f spectrum analyzing unit 3 in the above described
manner.
[0034] The memory unit 5 may be equipped with an IC (Integrated Circuit) memory, a magnetic
disk, an optical disc and the like, and is constructed to store the pitch extracted
by the pitch extracting unit 4 for each observation interval. By connecting those
pitches extracted for each observation interval over the whole interval, the change
of the pitch over time can be described.
[0035] The pitch detection apparatus 1 may be constructed such that the extracted pitch
is outputted as a Sine wave from a speaker as the occasion demands.
[0036] Here, the pitch extracting operation in the first embodiment will be explained with
reference to a flow chart of FIG. 3.
[0037] In FIG. 3, at first, the result of the GHA analysis is read out from the f spectrum
analyzing unit 3 (step S1). Then, k components are taken out in the order from the
component having the larger amplitude to the component having the smaller amplitude
(step S2). The value k is predetermined as an appropriate value which can be experimentally
determined on the basis of the properties of the objective acoustic wave which is
an object for the pitch detection. Then, one of the taken out k components, which
has the longest cycle, is determined as the pitch (step S3). Here, by virtue of the
basic properties of the fundamental wave and its harmonics, in most practical cases
the frequency component which has the longest cycle among the frequency components
read out from the GHA result coincides with the fundamental wave of the pertinent
acoustic wave. Thus, the pitch extracting operation of the first embodiment
is based on such basic properties of the fundamental wave and its harmonics.
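The steps S2 and S3 can be sketched in Python as follows. The representation of each GHA component as a (cycle, amplitude) pair, the function name and the sample values are assumptions made for illustration only.

```python
def extract_pitch_longest_cycle(components, k):
    """Return the cycle chosen as the pitch.

    components: list of (cycle, amplitude) pairs from the GHA result
                (hypothetical representation).
    k:          number of components to take out (step S2).
    """
    # Step S2: take out the k components with the largest amplitudes.
    strongest = sorted(components, key=lambda c: c[1], reverse=True)[:k]
    # Step S3: the taken-out component with the longest cycle is the pitch.
    return max(strongest, key=lambda c: c[0])[0]


# Illustrative values only: a fundamental, two harmonics and a weak
# long-cycle component that is excluded because its amplitude is small.
components = [(113.25, 0.8), (226.5, 1.0), (75.5, 0.5), (400.0, 0.01)]
pitch_cycle = extract_pitch_longest_cycle(components, k=3)
```

In this illustration the choice of k matters: with k = 4 the weak component having the cycle 400.0 would also be taken out and would be selected as the pitch, which is why k is to be set in accordance with the properties of the objective acoustic wave.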
[0038] An example of the pitch detection according to the first embodiment, in which the
pitch extraction is executed with respect to a speech (voice) signal of a male vocal,
is explained. In this example, the sampling frequency is 48,000 Hz, and the observation
interval for the GHA is 1024 points (equivalent to 21.3 ms). The Sine waves extracted
by the GHA are indicated in a table 1 of FIG. 4. In the table 1, the component number
"n/6" (n = 1, 2, 3) in a left hand column indicates that its power is the nth largest
among the extracted 6 components. In this example, only 6 components, a number
determined in advance by the user, are taken out from the result analyzed
by the GHA in descending order of power. When the pitch is extracted from this
analysis result obtained by the GHA, the component indicated by the "component number
1/6" (i.e. the signal component having the frequency in the vicinity of 212 Hz), which
is the Sine wave having the lowest frequency (i.e. the longest cycle) indicated by an arrow
in the figure, is extracted as the pitch. In this way, the pitch of the fundamental
tone can be correctly detected by use of a rather simple hardware and software construction
according to the first embodiment.
(2) Second Embodiment:
[0039] The construction of a second embodiment of the present invention is similar to that
of the above described first embodiment of FIG. 1 except that the pitch extracting
unit of the second embodiment performs the pitch extracting operation as follows.
[0040] The pitch extracting operation of the pitch extracting unit in the second embodiment
is explained with reference to a flow chart of FIG. 5. In FIG. 5, the steps same as
those in FIG. 3 carry the same reference numerals.
[0041] In FIG. 5, a step S20, which is a step for reading out the GHA result from the f
spectrum analyzing unit, is performed as follows.
[0042] Namely, whereas a constant number of components (i.e. k components) are simply read
and taken out in the first embodiment (the step S2 in FIG. 3), in the second embodiment
the reading out operation is repeated until the ratio of the sum of the energy of the
taken out components to the energy of the input signal reaches a predetermined value e.g.
99%, and is then stopped. However, in order to avoid an undesirable case where the ratio
does not reach the predetermined value however many times the reading out operation is
repeated, if the ratio does not reach the predetermined value after the reading out
operation has been performed a predetermined number of times, the reading out operation
is stopped at that time (step S20). In case of performing the GHA and the pitch extraction
in real time in this way, it is possible to prevent the f spectrum analyzing unit from
performing the unnecessary GHA since the GHA is stopped when the ratio reaches the
predetermined value, which is advantageous.
[0043] More concretely, in the step S20, at first, the sum of the energy Eo of the discrete
time waveform is obtained in a certain time period, which is the object of the process
(step S21). As the initialization, a count value i for counting the number of the
taken out components is set to "0" (step S22). Next, the count value i is compared
with a standard number i_s (step S23). The standard number i_s indicates the maximum
number of components to be taken out, and is set in accordance with the properties of
the objective acoustic wave. Here, if it is not i > i_s (NO), the sum of the energy Ei
of the 1st Sine wave to the i-th Sine wave obtained by the GHA is calculated (step S24).
Further, the energy ratio Ei/Eo is compared with a standard ratio Es (step S25). The
standard ratio Es indicates the ratio at which the taking out operation is intended to
be stopped. For example, the standard ratio Es is set to 99%. Here, if it is not Ei/Eo
> Es (NO), the count value i is incremented by one (step S26), and the flow returns
to the above mentioned step S23. At the step S23, if it is i > i_s (YES), the flow
branches to a step S27, where the present count value i is set to be the taking-out
number N for the components. If it is Ei/Eo > Es at the step S25 (YES), the flow also
branches to the step S27, and the present count value i is set to be the taking-out
number N for the components. Next, at a step S28, numbers are assigned to the Sine waves
obtained by the GHA in the order from the wave having the longer cycle to the wave having
the shorter cycle. More concretely, numbers T1, T2, ..., TN are respectively assigned to
the cycles of the Sine waves obtained by the GHA in this order.
[0044] As described above, the reading out operation of the GHA result in the step S20 as
a whole is completed.
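The flow of the steps S22 to S28 can be sketched in Python as follows. The function name, the representation of each GHA component as a (cycle, energy) pair and the sample values are assumptions for illustration; the cap of N at the number of available components is likewise an added safeguard, not part of the flow chart as described.

```python
def read_out_components(components, total_energy, max_count, stop_ratio=0.99):
    """Return the cycles T1 >= T2 >= ... >= TN of the taken-out components.

    components:   list of (cycle, energy) pairs in GHA extraction order
                  (hypothetical representation).
    total_energy: energy Eo of the input waveform (step S21).
    max_count:    standard number i_s.
    stop_ratio:   standard ratio Es.
    """
    i = 0                                                 # step S22
    while i <= max_count:                                 # step S23: leave when i > i_s
        energy_i = sum(e for _, e in components[:i])      # step S24
        if energy_i / total_energy > stop_ratio:          # step S25
            break
        i += 1                                            # step S26
    n = min(i, len(components))                           # step S27 (capped, assumption)
    # Step S28: arrange the cycles from the longest to the shortest.
    return sorted((c for c, _ in components[:n]), reverse=True)


# Illustrative energies only; they sum to 99.8 out of a total of 100.
components = [(113.25, 50.0), (226.5, 30.0), (75.5, 15.0), (56.625, 4.5), (45.3, 0.3)]
cycles = read_out_components(components, total_energy=100.0, max_count=10)
pitch_cycle = cycles[0]                                   # step S3: longest cycle
```

Here the loop stops after four components, because their energy sum of 99.5 exceeds 99% of the total. With max_count=2 the count limit ends the loop first, and the count value at that moment (3) becomes N, following the flow as described.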
[0045] Next, the flow proceeds to the step S3, and, in the same manner as the first embodiment
of FIG. 3, one of the read out components whose cycle is the longest is determined
as the pitch, and the process is ended (step S3).
[0046] According to the second embodiment, at the time of reading and taking out the GHA
result, the reading and taking out operation is repeated until the energy ratio reaches
the predetermined value, and if the energy ratio does not reach the predetermined value
after the operation has been performed the predetermined number of times, the operation
is stopped at that time. Accordingly, it is possible to keep the accuracy of the pitch
detection high, while preventing an undesirable practical case where the unnecessary
reading and taking out operation is repeatedly performed.
(3) Third Embodiment:
[0047] The construction of a third embodiment of the present invention is similar to that
of the above described first embodiment of FIG. 1 except that the pitch extracting
unit of the third embodiment performs the pitch extracting operation as follows.
[0048] The pitch extracting operation of the pitch extracting unit in the third embodiment
is explained with reference to a flow chart of FIG. 6. In FIG. 6, the steps same as
those in FIG. 5 carry the same reference numerals.
[0049] In FIG. 6, until the step S20, the operation of the third embodiment is the same
as that of the second embodiment of FIG. 5. Even if this step S20 is replaced by the steps
S1 and S2 of the first embodiment of FIG. 3, the third embodiment still functions
efficiently, as will be clearly understood from the following explanations.
[0050] Although one of the taken out components, which has the longest cycle, is simply
determined as the pitch according to the first and second embodiments (the step S3
in FIG. 3 and FIG. 5), the third embodiment is characterized in that the components
in a harmonic relationship are first identified, and the fundamental wave is found from
the characteristic in the arrangement of the judged components in the harmonic relationship,
in the steps at and after the step S31.
[0051] More concretely, in FIG. 6, as the initialization, a count value j for judging the
harmonic relationship one after another is set to "1" (step S31). Then, the count
value j is compared with the value N obtained at the step S20 (step S32). Here, if
it is not j ≧ N (NO), a count value k for judging the harmonic relationship one after
another is set to "j + 1" (step S33), and it is judged whether or not k > N (step
S34). Here, if it is k > N (YES), the count value j is incremented by one (step S35),
and the flow returns to the step S32. On the other hand, if it is not k > N at the
step S34 (NO), an integer closest to a ratio of the cycle Tj to the cycle Tk obtained
in the step S20, i.e., the ratio "Tj/Tk", is set as an integer Ijk (step S36). Tj (Tk)
indicates the cycle at the jth (kth) position in the arrangement of the cycles arranged
by the step S28 (in step S20).
Then, it is judged whether or not the absolute value of "Ijk - Tj/Tk" is less than
a predetermined micro standard value ε (step S37). Here, if it is not less (NO), the
count value k is incremented by one (step S38), and the flow returns to the step S34.
On the other hand, if it is less (step S37: YES), it is further judged whether or
not the integer Ijk is an odd number (step S39). Here, if it is not the odd number
(NO), the count value k is incremented by one (step S38), and the flow returns to
the step S34. On the other hand, if it is the odd number (step S39: YES), the cycle
Tj is determined as the pitch (step S40), and the process is ended. On the other hand,
if it is j ≧ N at the step S32 (YES), the pitch cannot be found due to a measurement
error or an abnormal condition (step S41), and the process is ended. At this step S41,
it is preferable to display or output a message informing the user that the pitch could
not be found, so as to inform the user of the detection accuracy of the pitch detection.
Further, in place of or in addition to such a display or output of the message, the
longest cycle T1 may be determined as the pitch at the step S41.
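The search of the steps S31 to S41 can be sketched as follows in Python. This is only an illustrative sketch under stated assumptions: the function name find_pitch_cycle is hypothetical, the cycles are assumed to be given in samples and already arranged from the longest to the shortest, and the example cycles other than T1 = 482.5 are hypothetical values that do not reproduce the table 2.

```python
def find_pitch_cycle(cycles, eps=0.1):
    """Return the cycle judged to be the pitch, or None if none is found.

    cycles: cycles T1..TN in samples, arranged from the longest to the shortest.
    eps:    micro standard value (permissible deviation from an integer ratio).
    """
    n = len(cycles)
    for j in range(n - 1):             # count value j (steps S31, S32, S35)
        for k in range(j + 1, n):      # count value k (steps S33, S34, S38)
            ratio = cycles[j] / cycles[k]
            nearest = round(ratio)     # integer Ijk (step S36)
            is_harmonic = abs(nearest - ratio) < eps   # step S37
            is_odd = nearest % 2 == 1                  # step S39
            if is_harmonic and is_odd:
                return cycles[j]       # step S40
    return None                        # step S41: the pitch could not be found


# T1 = 482.5 is taken from the text; the other cycles are hypothetical.
cycles = [482.5, 241.0, 150.0, 110.0, 80.4]
pitch = find_pitch_cycle(cycles, eps=0.1)
print(pitch)  # 241.0
```

With these values the pair (T1, T2) has a ratio close to 2 and the pair (T1, T5) has a ratio close to 6, so T1 is passed over, while the pair (T2, T5) has a ratio close to 3 and T2 is returned. A caller that prefers the fallback mentioned for the step S41 may use cycles[0] when None is returned.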
[0052] In a table 2 of FIG. 7, an example of the arrangement of the cycles T1 to T5 of the
components read out by the GHA, which are arranged in the order from the longer cycle,
in case of fs = 48,000 Hz and N = 5, is shown. In the table 2, the values of the frequencies
f1 to f5 corresponding to the cycles T1 to T5 are listed, and the values each indicating
how many times the frequency of each component is as compared with that of the fundamental
wave, as for the components which are the harmonics or sub-harmonics of the fundamental
wave, are also listed. In the table 2, for example, the value 99.48 Hz of the frequency
f1 is obtained as f1 = fs/T1 = 48,000/482.5 = 99.48 Hz. In this example, the count
values j and k in the flow chart of FIG. 6 move as shown in a table 3 of FIG. 8 (from
the upper line to the lower line in the table 3). As a result, (j, k) = (1, 2), (1,
3), (1, 5) and (2, 5) are judged to be in the harmonic relationship since the absolute
value of "Ijk - Tj/Tk" is small enough in case of ε = 0.1. On the other hand, as
for the cases of other combinations (j, k), it is judged that they are not in the
harmonic relationship as shown in the table 3.
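The conversion from a cycle expressed in samples to a frequency, f = fs/T, can be checked with the short Python sketch below. The function name cycle_to_frequency is a hypothetical name used only for this illustration; the values fs = 48,000 Hz and T1 = 482.5 are those given in the text.

```python
def cycle_to_frequency(cycle_in_samples, sampling_frequency=48_000.0):
    """Convert a cycle measured in samples to a frequency in Hz (f = fs / T)."""
    return sampling_frequency / cycle_in_samples


f1 = cycle_to_frequency(482.5)
print(f"{f1:.2f} Hz")  # 99.48 Hz
```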
[0053] In this way, by introducing the micro standard value ε indicating the permissible
value of the measurement error, even if the ratio of the read out cycles of the two
orthogonal function components is not exactly an integer ratio due to the measurement
condition, the accuracy of the GHA, etc., it is still possible to find out the harmonic
relationship between them.
[0054] Here, the cycle Tj corresponding to the count value j, which simultaneously satisfies
two conditions, i.e. the first condition that (j, k) has this harmonic relationship,
and the second condition that the integer closest to "Tj/Tk" is the odd number, is
judged to be the pitch. Even if the sub-harmonic is read out by the GHA since it has
a considerable power, the harmonics of the sub-harmonic are unlikely to be read out
since they are by far smaller than the harmonics of the fundamental wave. Thus, even
if the sub-harmonic whose frequency is lower than that of the fundamental wave is
included, the second condition cannot be satisfied by the sub-harmonic, since the
integer closest to "Tj/Tk" becomes the even number in case of the sub-harmonic because
of the basic relationship between the fundamental wave and the sub-harmonic. Namely,
this second condition is satisfied only in the case of the fundamental wave.
[0055] Therefore, in this example, the cycle T2 corresponding to (j, k) = (2, 5) which satisfies
those two conditions, is judged to be the pitch. On the other hand, the cycle T1 corresponding
to (j, k) = (1, 2), (1, 3), (1, 5), which is the sub-harmonic of the fundamental wave
is not judged to be the pitch since the integer closest to "Tj/Tk" is the even number.
In this example, the actual process proceeds from the upper line to the lower
line in the table 3 one by one, and is stopped when the cycle T2 is judged to be the
pitch at (j, k) = (2, 5).
[0056] FIG. 15 shows a waveform W1 of the original signal and a waveform W3 of the pitch
extracted by the third embodiment. Between those waveforms in FIG. 15, a waveform
W2 is shown which is obtained by re-synthesizing 6 main components read out by the
GHA.
[0057] In FIG. 15, it is recognized that the original waveform W1 resembles the waveform
W2 obtained by re-synthesizing the 6 main frequency components read out from this
original waveform W1. This is because the contribution of the 6 main frequency components
is dominant in the original waveform W1. On the other hand, the component having
the longest cycle in the waveforms W1 and W2, which seems to be the fundamental wave
since it can be identified as the longest cycle easily by the human eye, is not judged
to be the pitch according to the third embodiment, as understood from the waveform W3.
[0058] As described above, according to the third embodiment, the fundamental wave
and its harmonics existing in the components read out by the GHA are identified (judged)
by use of such a property that the fundamental wave has a cycle which is an integer multiple
of that of the harmonic. Thus, even if there exists a noise which has a frequency
lower than the fundamental wave or the sub-harmonic which has a frequency half of
the fundamental wave, erroneous detection of the noise or the sub-harmonic as the
fundamental wave can be effectively prevented. As a result, pitch detection that is more
accurate and more reliable than in the first and second embodiments can be performed
by the third embodiment.
[0059] An example of the pitch detection with respect to the acoustic signal obtained by
measuring a piano sound (A4) at a studio by the third embodiment is explained. In
this example, the sampling frequency is 48,000 Hz, and the observation interval of the
GHA is 1024 points (equivalent to 21.3 ms). The Sine waves extracted by the GHA are
listed in a table 4 of FIG. 9. In the table 4, the component number "n/6" (n = 1,
..., 6) in a left hand column indicates that its power is the nth largest among the
extracted 6 components. In this example, 6 components are taken out from the result
analyzed by the GHA as a result of truncating the GHA analysis by use of the standard
"power of the extracted signal components / power of the original signal ≧ 99%". If
the pitch is extracted from this analysis result obtained by the GHA, the component
indicated by the "component number 3/6" (i.e. the signal component having the frequency
at the vicinity of 440 Hz) indicated by an arrow in the figure, is extracted as the
pitch without the influence of the sub-harmonic (i.e. the signal component having
the frequency at the vicinity of 220 Hz) which has the large amplitude and low frequency.
In this manner, the pitch of the fundamental tone can be correctly detected according
to the third embodiment.
(4) Fourth Embodiment:
[0060] The construction of a fourth embodiment of the present invention is similar to that
of the above described first embodiment of FIG. 1 except that the pitch extracting
unit of the fourth embodiment performs the pitch extracting operation as follows.
[0061] The pitch extracting operation of the pitch extracting unit in the fourth embodiment
is explained with reference to a flow chart of FIG. 10. In FIG. 10, steps that are the
same as those in FIG. 5 carry the same reference numerals.
[0062] In FIG. 10, until the step S20, the operation of the fourth embodiment is the same
as that of the second embodiment of FIG. 5. If this step S20 is replaced by the steps
S1 and S2 of the first embodiment of FIG. 1, the fourth embodiment still functions
efficiently as clearly understood from the following explanations.
[0063] Although the components in the harmonic relationship are firstly found and then the
fundamental wave is found out from the arrangement and the characteristic of the components
in the harmonic relationship according to the third embodiment, the fourth embodiment
is characterized in that the fundamental wave can be found by considering the natures
of the harmonics even if there exist only harmonics and there does not exist the fundamental
wave itself in the original signal, since the fundamental wave is lost by the frequency
band limitation and the like.
[0064] More concretely, in FIG. 10, as the initialization, a count value Lmax indicating
the maximum number of possible combinations of the count values j and k for judging
the harmonic relationship one after another is set to "0" (step S51). Then, the count
value j is set to "1" (step S52). Then, the count value j is compared with the value
N obtained at the step S20 (step S53). Here, if it is not j ≧ N (NO), a count value
L indicating the number of possible combinations of the count values j and k is set
to "0" (step S54), and the count value k is set to "j + 1" (step S55). Then, it is
judged whether or not k > N (step S56). Here, if it is k > N (YES), the count value
j is incremented by one (step S57), and the flow returns to the step S53. On the other
hand, if it is not k > N at the step S56 (NO), a count value l indicating the order
of harmonics which are presently considered, is set to "1" (step S58). Then, the count
value l is compared with a predetermined number H, which indicates up to which order
the harmonics are to be considered for the pitch detection (step S59). If the predetermined
number H is set to, for example, "10", it is enough to consider the components having
energies which can be practically measured. Therefore, the predetermined number H
may be set to a value not more than 10 by the user in accordance with the required
pitch extraction accuracy and the object for the pitch detection. At this step S59,
if l > H (YES), the count value k is incremented by one (step S60), and the flow returns
to the step S56. On the other hand, if it is not l > H at the step S59 (NO), a count
value m indicating the order of the presently considered harmonics is set to "l +
1" (step S61), and it is judged whether or not m > H (step S62). Here, if it is not
m > H (NO), it is judged whether or not an absolute value of "(Tj/Tk)/(m/l) - 1" is
less than the predetermined micro standard value ε (step S63). Here, if it is not less
(NO), the count value m is incremented by one (step S64), and the flow returns to
the step S62. At the step S63, if it is less (YES), the count value L is incremented
by one (step S65), the count value m is also incremented by one (step S64), and the
flow returns to the step S62. On the other hand, if it is m > H at the step S62 (YES),
it is judged whether or not Lmax < L (step S66). Here, if it is Lmax < L (YES),
Lmax is set to L, jmax is set to j, and lmax is set to l (step S67). Then, the
count value l is incremented by one (step S68), and the flow returns to the step S59.
On the other hand, at the step S66, if it is not Lmax < L (NO), the flow directly
proceeds to S68 from the step S66, and after the count value l is incremented by one,
the flow returns to the step S59.
[0065] At the step S53, if it is j ≧ N (YES), the flow branches to a step S69, where it
is judged whether or not Lmax = 0. If it is not Lmax = 0 (NO), it is judged that the
pitch is Tjmax × lmax (Tjmax: the cycle Tj for j = jmax, lmax: the value of l, both
stored at the step S67) (step S70) and the process is ended. On the other hand, at the
step S69, if it is Lmax = 0 (YES), the pitch cannot be found (step S71), and the process
is ended. At this step S71, it is preferable to display or output a message informing
the user that the pitch could not be found, so as to inform the user of the detection
accuracy of the pitch detection. Further, in place of or in addition to such a display
or output of the message, the longest cycle T1 may be determined as the pitch at the
step S71.
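The idea of the fourth embodiment can be illustrated by the following Python sketch. It is a sketch under assumptions and does not reproduce the step order of FIG. 10: for each pair of count values j and l it counts how many shorter cycles Tk satisfy |(Tj/Tk)/(m/l) - 1| < ε for some order m with l < m ≦ H, keeps the pair with the largest count (the earlier, i.e. smaller, j being kept in case of a tie), and returns Tj × l. The function name find_missing_fundamental, the value ε = 0.005 and the example cycles are hypothetical and are not taken from the tables.

```python
def find_missing_fundamental(cycles, max_order=10, eps=0.005):
    """Return the estimated pitch cycle Tj * l, or None if no combination fits.

    cycles:    cycles T1..TN in samples, arranged from the longest to the shortest.
    max_order: predetermined number H (highest harmonic order considered).
    eps:       micro standard value for the relative deviation (assumed value).
    """
    n = len(cycles)
    best_count, best_j, best_l = 0, None, None      # Lmax, jmax, lmax
    for j in range(n - 1):
        for l in range(1, max_order + 1):           # assumed order of Tj
            count = 0                                # number of combinations L
            for k in range(j + 1, n):
                ratio = cycles[j] / cycles[k]
                for m in range(l + 1, max_order + 1):   # assumed order of Tk
                    if abs(ratio / (m / l) - 1.0) < eps:
                        count += 1
            if count > best_count:                   # strict: first maximum is kept
                best_count, best_j, best_l = count, j, l
    if best_count == 0:
        return None                                  # the pitch could not be found
    return cycles[best_j] * best_l


# Hypothetical input: only the 4th to 7th harmonics of a 100 Hz tone
# sampled at 48,000 Hz (cycles of 480/4, 480/5, 480/6 and 480/7 samples).
cycles = [120.0, 96.0, 80.0, 480.0 / 7.0]
pitch_cycle = find_missing_fundamental(cycles)
print(pitch_cycle, 48_000.0 / pitch_cycle)  # 480.0 100.0
```

For this input the pair j = 1, l = 4 matches the three ratios 5/4, 6/4 and 7/4, which is more than any other pair, so the fundamental cycle of 480 samples is recovered although no component with that cycle is present. The relative tolerance has to be small here because, for H = 10, the candidate ratios m/l lie close to one another.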
[0066] An example of the pitch detection by the fourth embodiment with respect to an
acoustic signal, whose fundamental tone is 100 Hz and into which the harmonics are mixed,
is explained. In this example, the sampling frequency is 48,000 Hz, and the observation
interval of the GHA is 1024 points (equivalent to 21.3 ms). In a table 5 of FIG. 11,
an example of the arrangement of the cycles T1 to T5 of the components read out by
the GHA, which are arranged in the order from the longer cycle in case of N = 5, is
shown. In the table 5, the values of the frequencies f1 to f5 corresponding to the
cycles T1 to T5 are listed, and the values indicating how many times the frequency
of each component is as compared with that of the fundamental wave, as for the components
which are the harmonics or sub-harmonics of the fundamental wave, are also listed.
In this example, the GHA analysis is truncated by use of the standard "power of the
extracted signal components / power of the original signal ≧ 99%". In this example,
the count values j and k in the flow chart of FIG. 10 move as shown in a table 6 of
FIG. 12 (from the upper line to the lower line in the table 6). In the table 6, as
for the "(b) m/l", the count value l is moved from 1 to H as shown in a table 7 of
FIG. 13, while the count value m is moved from l + 1 to H, so as to select the combination
corresponding to the value closest to the value of "(a) Tj/Tk". In this example, it
is set as H = 10. In the table 6 of FIG. 12, the sign "-" indicates that there is
no combination of l and m which satisfies the condition that the absolute value of
"(a)/(b) - 1" is less than the predetermined standard value ε.
[0067] A table 8 of FIG. 14 is obtained by re-constructing the table 6 of FIG. 12 by treating
the count value l as the main parameter. In the table 8, the count values j and l are
fixed, and the number of possible combinations of m is found one by one in the order
from the smaller l. In this example, the largest number is obtained in the case of
j = 2 and l = 4. Namely, it is determined as Lmax = 3 and lmax = 4. In the present
embodiment, if the numbers of possible combinations are the same as each other, the
priority is given to the one which corresponds to the smaller j (i.e. lower in the
frequency). As a result, in the present example, Tj × l = T2 × 4 = 481.2. Thus, the
frequency corresponding to the pitch is determined to be 99.75 Hz although the component
with this frequency does not exist in the components read out by the GHA, as shown in
the table 5 of FIG. 11.
[0068] In this manner, according to the fourth embodiment, although the fundamental tone
as well as its double tone and triple tone are lost, the pitch of the fundamental tone
can still be correctly detected.
[0069] As described above, according to the fourth embodiment, even if the pitch is lost
in the original signal due to the influence of the frequency band limitation and the
like, it is still possible to find the pitch. As a result, the pitch detection more
accurate and more reliable than the first to third embodiments, can be performed by
the fourth embodiment.
[0070] In the above described embodiments, the pitch detection apparatus is constructed
such that the acoustic signal is obtained from the electroacoustic transducer. However,
by replacing the electroacoustic transducer in this construction by a device for generating
an acoustic wave signal, the present embodiments can efficiently function in the same
manner.
[0071] The invention may be embodied in other specific forms without departing from the
spirit or essential characteristics thereof. The present embodiments are therefore
to be considered in all respects as illustrative and not restrictive, the scope of
the invention being indicated by the appended claims rather than by the foregoing
description and all changes which come within the meaning and range of equivalency
of the claims are therefore intended to be embraced therein.