Pitch detection apparatus and method for acoustic waveform

(11)	EP 0 762 380 A2

(12)	EUROPEAN PATENT APPLICATION

(43)	Date of publication:
	12.03.1997 Bulletin 1997/11

(21)	Application number: 96306416.7

(22)	Date of filing: 04.09.1996

(51)	International Patent Classification (IPC)⁶: G10G 7/02

(84)	Designated Contracting States:
	DE FR GB

(30)	Priority: 04.09.1995 JP 226896/95

(71)	Applicant: PIONEER ELECTRONIC CORPORATION
	Meguro-ku Tokyo-to (JP)

(72)	Inventors:
	Terada, Takahiko, c/o Pioneer Electronic Corp, Tsurugashima-shi, Saitama-ken (JP)
	Fukuda, Hiroaki, c/o Kogakuin University, Hachioji-shi, Tokyo-to, 192 (JP)
	Tohyama, Mikio, c/o Kogakuin University, Hachioji-shi, Tokyo-to, 192 (JP)
	Hirata, Yoshimutsu, Hachioji-shi, Tokyo-to, 192 (JP)

(74)	Representative: Brunner, Michael John et al
	GILL JENNINGS & EVERY, Broadgate House, 7 Eldon Street, London EC2M 7LH (GB)

(54)	Pitch detection apparatus and method for acoustic waveform

(57) An acoustic waveform is input to a pitch detection apparatus (1), which detects the pitch of the fundamental wave of the acoustic waveform. The pitch detection apparatus (1) comprises: an orthogonal function component output device (3) which, from among the orthogonal function components of the individual cycles that compose the acoustic waveform, takes out a plurality of components one after another in descending order of their energy contribution to the acoustic waveform, and outputs the taken-out components; and a pitch extract device (4) which extracts one of the output components as the pitch on the basis of the mutual relationship between the cycles of the output components.

Description

[0001] The present invention generally relates to the computer-based frequency analysis of acoustic or speech waveforms, as used in acoustic or speech recognition, acoustic or speech synthesis, automatic dictation of music scores, so-called "karaoke" (music accompaniment) scoring, machine diagnosis and the like, and more particularly to a pitch detection apparatus and a pitch detection method for detecting the pitch of the fundamental wave of an acoustic wave.

[0002] A well-known practical method for the frequency analysis of acoustic, speech and vibration waveforms uses the FFT (Fast Fourier Transform). With the FFT, however, accurate spectral results are obtained only for the harmonics of the frequency 1/L corresponding to the observation interval (time length) L, i.e. only at the frequencies k/L (k = 1, 2, 3, ...). The frequency resolution and accuracy of the FFT are therefore insufficient for extracting the pitch (cycle) of the fundamental wave of an acoustic wave, which requires a relatively fine frequency distribution.
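To make the resolution limit concrete: an FFT of L samples is evaluated only at multiples of fs/L. The following sketch (NumPy assumed; the helper names are illustrative, not from the source) reports the bin spacing for the 512- and 1024-sample intervals used later in this description and the distance from a 212 Hz tone to the nearest bin.

```python
import numpy as np


def fft_bin_frequencies(sample_rate_hz: float, num_samples: int) -> np.ndarray:
    """Frequencies (Hz) at which a real FFT of num_samples points is evaluated.

    These are the harmonics k / L of the observation interval L = num_samples / sample_rate_hz.
    """
    return np.fft.rfftfreq(num_samples, d=1.0 / sample_rate_hz)


def nearest_bin_error_hz(freq_hz: float, sample_rate_hz: float, num_samples: int) -> float:
    """Distance from freq_hz to the closest FFT bin frequency."""
    bins = fft_bin_frequencies(sample_rate_hz, num_samples)
    return float(np.min(np.abs(bins - freq_hz)))


if __name__ == "__main__":
    fs = 48_000
    for n in (512, 1024):
        spacing = fft_bin_frequencies(fs, n)[1]
        print(n, spacing, nearest_bin_error_hz(212.0, fs, n))
```

With fs = 48,000 Hz the bins are 93.75 Hz apart for 512 samples and 46.875 Hz apart for 1024 samples, so a 212 Hz fundamental lies more than 20 Hz from the nearest bin in both cases.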

[0003] Accordingly, one known technique for detecting the pitch of an acoustic wave filters the acoustic wave with a pitch filter and determines the pitch from the zero-crossing timing of the filtered signal (i.e. the times at which the filtered signal crosses the zero-amplitude level). The central frequency of the pitch filter is set to the average pitch estimated for each frame cut out by a window function of predetermined width.
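A minimal sketch of this prior-art scheme, assuming NumPy. The text does not specify the pitch filter, so a second-order band-pass biquad (the "Audio EQ Cookbook" form) is used as a stand-in, with its centre frequency f0 playing the role of the estimated average pitch. Rising zero crossings are located by linear interpolation, and their mean spacing gives the cycle. The function names and the Q value are illustrative assumptions.

```python
import math

import numpy as np


def bandpass_biquad(x, fs, f0, q=2.0):
    """Second-order band-pass 'pitch filter' centred on f0 (0 dB peak gain biquad, stand-in)."""
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    b0, b1, b2 = alpha, 0.0, -alpha
    a0, a1, a2 = 1.0 + alpha, -2.0 * math.cos(w0), 1.0 - alpha
    y = np.zeros(len(x))
    x1 = x2 = y1 = y2 = 0.0
    for i, xi in enumerate(x):  # direct form I recursion
        yi = (b0 * xi + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2) / a0
        x2, x1 = x1, xi
        y2, y1 = y1, yi
        y[i] = yi
    return y


def zero_cross_pitch(y, fs):
    """Pitch (Hz) from the mean spacing of rising zero crossings, or None if fewer than two."""
    idx = np.nonzero((y[:-1] < 0.0) & (y[1:] >= 0.0))[0]
    if len(idx) < 2:
        return None
    # fractional crossing position between samples i and i + 1
    t = idx + y[idx] / (y[idx] - y[idx + 1])
    return fs / float(np.mean(np.diff(t)))


if __name__ == "__main__":
    fs = 48_000
    t = np.arange(int(0.1 * fs)) / fs
    x = sum(np.sin(2 * np.pi * 200.0 * k * t) / k for k in range(1, 6))  # 200 Hz plus harmonics
    y = bandpass_biquad(x, fs, f0=190.0)  # centre frequency = estimated average pitch
    print(zero_cross_pitch(y[fs // 50:], fs))  # skip the first 20 ms of filter transient
```

The example should print a value very close to 200. If f0 were estimated near 400 Hz instead, the second harmonic would dominate the filter output and the detector would report it, which is the dependence on the central-frequency estimate discussed below.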

[0004] With this method, provided that the central frequency of the pitch filter is appropriately estimated, the pitch of the acoustic wave can be detected with relatively high accuracy.

[0005] In the technical field of pitch detection apparatuses used for speech recognition, acoustic synthesis, automatic dictation of music scores, karaoke scoring, machine diagnosis and the like, there is a demand both to improve the detection accuracy and to simplify the calculation process and the device.

[0006] However, in the above method the pitch detection accuracy depends on how reliably the central frequency of the pitch filter is estimated, and a complicated process such as the cepstrum method is needed to estimate it correctly. As a result, the calculation process and the device of the pitch detection apparatus become complicated, which is a practical problem.

[0007] It is therefore an object of the present invention to provide a pitch detection apparatus and a pitch detection method which can improve the detection accuracy while simplifying the calculation process and the apparatus construction.

[0008] The above object of the present invention can be achieved by a pitch detection apparatus to which an acoustic waveform is input and which detects the pitch of the fundamental wave of the acoustic waveform. The pitch detection apparatus is provided with: an orthogonal function component output device which, from among the orthogonal function components of the individual cycles that compose the acoustic waveform, takes out a plurality of components one after another in descending order of their energy contribution to the acoustic waveform, and outputs the taken-out components; and a pitch extract device which extracts one of the output components as the pitch on the basis of the mutual relationship between the cycles of the output components.

[0009] According to the present invention, when the acoustic waveform is input, the orthogonal function component output device takes out, from among the orthogonal function components of the individual cycles that compose the acoustic waveform, a plurality of components one after another in descending order of their energy contribution to the acoustic waveform, and outputs them. Therefore, if an appropriate number of components is taken out in accordance with the characteristics of the acoustic waveform whose pitch is to be detected, the output components include the fundamental wave and its harmonics. The fundamental wave and its harmonics have definite relationships between their cycles (or frequencies): the cycle of the fundamental wave is an integer multiple of the cycle of each of its harmonics, and the cycles of any two harmonics of the same fundamental wave are in a ratio of integers. Consequently, the pitch extract device can extract one of the output components as the pitch on the basis of this mutual relationship between the cycles of the output components. In this way, the pitch detection accuracy can be improved with a relatively simple calculation process and a relatively simple apparatus construction. As a result, highly accurate acoustic or speech recognition apparatuses, acoustic or speech synthesis apparatuses, apparatuses for automatic dictation of music scores, karaoke scoring apparatuses, machine diagnosis apparatuses and the like can be produced at relatively low cost by using the pitch detection apparatus of the present invention.
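The integer relationship between cycles can be checked numerically. A minimal sketch in Python; the function name and the relative tolerance tol are illustrative assumptions, needed because measured cycles are never exact integer multiples.

```python
def harmonic_number(t_long: float, t_short: float, tol: float = 0.03):
    """Return m if t_long is approximately m times t_short (m >= 1), otherwise None.

    A fundamental of cycle T and its m-th harmonic of cycle T/m satisfy t_long / t_short == m.
    tol is a relative tolerance on the ratio (an assumed value, not taken from the text).
    """
    if t_short <= 0.0 or t_long < t_short:
        return None
    ratio = t_long / t_short
    m = round(ratio)
    return m if abs(ratio - m) <= tol * m else None


if __name__ == "__main__":
    print(harmonic_number(1 / 200, 1 / 600))  # prints 3: 600 Hz is the third harmonic of 200 Hz
    print(harmonic_number(1 / 200, 1 / 450))  # prints None: 450 Hz is not a harmonic of 200 Hz
```

The tolerance is a design choice: too tight and analysis errors break genuine relationships, too loose and unrelated components start to look harmonic.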

[0010] In one aspect of the present invention, the orthogonal function component output device stops the taking-out and outputting when the ratio of the energy of a sum signal, obtained by re-synthesizing the taken-out orthogonal function components, to the energy of the acoustic waveform exceeds a predetermined value.

[0011] According to this aspect, the orthogonal function component output device stops the taking-out and outputting when the ratio of the energy of the sum signal to the energy of the acoustic waveform exceeds a predetermined value, e.g. 99 %. As a result, the device automatically outputs an appropriate number of orthogonal function components, which include the fundamental wave and its harmonics and which are sufficient for the pitch extract device to extract the pitch correctly on the basis of the mutual relationship between their cycles. Unnecessary, surplus taking-out operations are thereby efficiently avoided, which is convenient in practice.

[0012] In this aspect of the present invention, it is preferable that the orthogonal function component output device also stops the taking-out and outputting once it has taken out and output a predetermined number of orthogonal function components, even if the ratio has not yet exceeded the predetermined value.

[0013] In this preferable case, the orthogonal function component output device stops once it has taken out and output a predetermined number of components, e.g. 10 components, even before the ratio exceeds the predetermined value. This efficiently prevents unnecessary, surplus taking-out operations in cases where, owing to the characteristics of the acoustic waveform, the energy ratio would not exceed the predetermined value (e.g. 99 %) even after a large number of taking-out operations. This feature is very convenient in practice.

[0014] In another aspect of the present invention, the pitch extract device extracts as the pitch one of the outputted orthogonal function components, which has the longest cycle.

[0015] According to this aspect, the pitch extract device extracts as the pitch the output component having the longest cycle. Among the fundamental wave and its harmonics contained in the components taken out by the orthogonal function component output device, the cycle of the fundamental wave is an integer multiple of the cycle of each harmonic, so the fundamental wave has the longest cycle. Consequently, in most practical cases the fundamental wave component can be correctly extracted as the taken-out component having the longest cycle. In this way, the pitch can be detected with a relatively simple calculation process and a relatively simple apparatus construction.

[0016] In another aspect of the present invention, the pitch extract device discriminates some of the outputted orthogonal function components, each of which is an odd order harmonic of another orthogonal function component, and extracts as the pitch one of the discriminated orthogonal function components, which has the longest cycle.

[0017] According to this aspect, the pitch extract device first discriminates those output components each of which is an odd-order harmonic of another output component. Among the fundamental wave and its harmonics contained in the taken-out components, the cycle of the fundamental wave is an integer multiple of the cycle of each harmonic. A sub-harmonic of the fundamental wave (a component whose cycle is twice that of the fundamental wave) is occasionally included in the taken-out components, but the energy of the harmonics of that sub-harmonic is much smaller than the energy of the harmonics of the fundamental wave. Thus, the harmonics of the sub-harmonic are rarely, if ever, included among the appropriate number of components taken out by the orthogonal function component output device. Consequently, even if the output components include the sub-harmonic, whose cycle is longer than that of the fundamental wave, an odd-order harmonic relationship is formed only by the fundamental wave and its harmonics: the cycle 2T₀ of the sub-harmonic is an even multiple (2m) of the cycle T₀/m of every harmonic of the fundamental wave. The pitch extract device can therefore correctly extract the fundamental wave component as the discriminated component having the longest cycle, so that pitch detection with high accuracy is possible.
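The text does not spell out the matching rule, so the following Python sketch rests on one interpretation, labelled as an assumption: every pair of components whose cycle ratio is close to an odd integer m ≥ 3 is marked (both members), and the longest marked cycle is returned. The function name and tolerance are illustrative.

```python
def odd_harmonic_pitch(cycles, tol=0.03):
    """Sketch of the odd-order-harmonic rule of [0016]/[0017] (interpretation assumed).

    Every pair of components whose cycle ratio is close to an odd integer m >= 3 is marked;
    the longest cycle among the marked components is returned (None if nothing is marked).
    """
    marked = set()
    for i, t_long in enumerate(cycles):
        for j, t_short in enumerate(cycles):
            if t_long <= t_short:  # also skips i == j
                continue
            ratio = t_long / t_short
            m = round(ratio)
            if m >= 3 and m % 2 == 1 and abs(ratio - m) <= tol * m:
                marked.update((i, j))
    return max((cycles[k] for k in marked), default=None)


if __name__ == "__main__":
    # fundamental 200 Hz, harmonics 400/600/800 Hz, plus a 100 Hz sub-harmonic
    freqs = [100.0, 200.0, 400.0, 600.0, 800.0]
    cycle = odd_harmonic_pitch([1.0 / f for f in freqs])
    print(1.0 / cycle)  # about 200 Hz: the sub-harmonic has only even-order relations
```

With the same input, simply taking the longest cycle would return the 100 Hz sub-harmonic. The sketch relies on at least one odd harmonic (3rd, 5th, ...) of the fundamental being present among the components.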

[0018] In another aspect of the present invention, the pitch extract device discriminates a plurality of groups of the output orthogonal function components such that, within each group, the cycles of the components are in integer ratio relationships with each other; selects the discriminated group that includes the largest number of components; and estimates as the pitch an orthogonal function component having a cycle which is an integer multiple of the cycle of each component in the selected group.

[0019] According to this aspect, the pitch extract device first discriminates a plurality of groups of the output orthogonal function components, within each of which the cycles are in integer ratio relationships, and then selects the group that includes the largest number of components. Even when the fundamental wave, the double tone and the like are absent from the output components, for example because of band limitation when the acoustic wave is picked up, the remaining harmonics of the same fundamental wave are still in integer ratio relationships with one another, since the cycle of the fundamental wave is an integer multiple of the cycle of each harmonic. Further, the energy of the harmonics of a sub-harmonic of the fundamental wave, which is occasionally included in the taken-out components, is much smaller than that of the harmonics of the fundamental wave, so the taken-out components are expected to contain fewer harmonics of the sub-harmonic than harmonics of the fundamental wave. Consequently, even if the fundamental wave and the double tone are not taken out and output, the set of harmonics of the fundamental wave can be identified by discriminating groups of components whose cycles are in integer ratio relationships and selecting the group with the largest number of components. Finally, a component whose cycle is an integer multiple of the cycle of every component in the selected group is estimated as the pitch.
In this way, the fundamental wave component can be estimated as the pitch even though it does not exist among the components taken out by the orthogonal function component output device, so that pitch detection with high accuracy is possible.
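One way to realise this grouping, sketched in Python under stated assumptions (the grouping procedure, the function name and the parameters max_mult and tol are illustrative, not from the source): each candidate common cycle T0 is a small integer multiple of some component's cycle, the group for T0 holds every component whose cycle divides T0 within a relative tolerance, and the largest group wins, with ties resolved toward the shortest T0.

```python
def estimate_missing_fundamental(freqs_hz, max_mult=4, tol=0.02):
    """Sketch of [0018]/[0019] (interpretation assumed); freqs_hz must be non-empty.

    Returns (estimated pitch in Hz, frequencies of the selected group).
    """
    cycles = [1.0 / f for f in freqs_hz]

    def divides(t0, c):
        # True if t0 is approximately an integer multiple k >= 1 of cycle c
        q = t0 / c
        k = round(q)
        return k >= 1 and abs(q - k) <= tol * k

    best_key, best_t0, best_group = None, None, []
    for t in cycles:
        for m in range(1, max_mult + 1):
            t0 = m * t  # candidate common cycle
            group = [c for c in cycles if divides(t0, c)]
            key = (len(group), -t0)  # most members first, then the shortest common cycle
            if best_key is None or key > best_key:
                best_key, best_t0, best_group = key, t0, group
    return 1.0 / best_t0, [1.0 / c for c in best_group]


if __name__ == "__main__":
    # 400/600/800/1000 Hz are harmonics 2..5 of an absent 200 Hz fundamental;
    # 730 Hz is an unrelated component (illustrative data only).
    pitch, members = estimate_missing_fundamental([400.0, 600.0, 800.0, 1000.0, 730.0])
    print(round(pitch, 3), [round(f) for f in members])
```

This should report 200 Hz with the four harmonics as the group. max_mult and tol must stay small: with a 3 % tolerance, a common cycle of 15 ms would also accept the 730 Hz line (15 ms × 730 Hz = 10.95, close to 11).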

[0020] In another aspect of the present invention, the orthogonal function component output device: first, outputs a first orthogonal function component having the greatest energy contribution to the acoustic waveform, and obtains a first residual waveform by subtracting the first component from the acoustic waveform; second, outputs a second orthogonal function component having the greatest energy contribution to the first residual waveform, and obtains a second residual waveform by subtracting the second component from the first residual waveform; and in general, outputs an n-th orthogonal function component (n: natural number greater than 2) having the greatest energy contribution to the (n-1)-th residual waveform, and obtains an n-th residual waveform by subtracting the n-th component from the (n-1)-th residual waveform.

[0021] According to this aspect, the first orthogonal function component, which has the greatest energy contribution to the acoustic waveform, is output first, and the first residual waveform is obtained by subtracting it from the acoustic waveform. Next, the second orthogonal function component, which has the greatest energy contribution to the first residual waveform, is output, and the second residual waveform is obtained by subtracting it from the first residual waveform. In the same manner, the n-th orthogonal function component, which has the greatest energy contribution to the (n-1)-th residual waveform, is output, and the n-th residual waveform is obtained by subtracting it from the (n-1)-th residual waveform. The orthogonal function component output device can therefore efficiently take out and output the components one after another in descending order of their energy contribution to the acoustic waveform, so that the pitch detection as a whole can be performed very efficiently.

[0022] In another aspect of the present invention, the pitch detection apparatus is further provided with an electroacoustic transducer for converting the acoustic waveform into an electric signal and outputting the electric signal to the orthogonal function component output device, the orthogonal function component output device taking out the orthogonal function components on the basis of the electric signal output from the electroacoustic transducer.

[0023] According to this aspect, the acoustic waveform is input to the electroacoustic transducer, converted into the electric signal, and output to the orthogonal function component output device, which then takes out the orthogonal function components on the basis of the electric signal.

[0024] The above object of the present invention can also be achieved by a pitch detection method of detecting the pitch of the fundamental wave of an acoustic waveform. The method comprises the steps of: taking out, from among the orthogonal function components of the individual cycles that compose the acoustic waveform, a plurality of components one after another in descending order of their energy contribution to the acoustic waveform, and outputting the taken-out components; and extracting one of the output components as the pitch on the basis of the mutual relationship between the cycles of the output components.

[0025] The pitch detection method of the present invention provides the same actions and effects as the pitch detection apparatus of the present invention described above. Thus, the pitch detection accuracy can be improved with a relatively simple calculation process and a relatively simple apparatus construction.

[0026] The nature, utility, and further features of this invention will be more clearly apparent from the following detailed description with respect to preferred embodiments of the invention when read in conjunction with the accompanying drawings briefly described below.

FIG. 1 is a block diagram of a pitch detection apparatus as an embodiment of the present invention;

FIG. 2 is a diagram for explaining a frequency component analyzed by the GHA in the embodiment of FIG. 1;

FIG. 3 is a flow chart showing an operation of the first embodiment;

FIG. 4 is Table 1, showing a concrete example of pitch detection by the first embodiment;

FIG. 5 is a flow chart showing an operation of the second embodiment;

FIG. 6 is a flow chart showing an operation of the third embodiment;

FIG. 7 is Table 2, for explaining the operation of the third embodiment;

FIG. 8 is Table 3, for explaining the operation of the third embodiment;

FIG. 9 is Table 4, showing a concrete example of pitch detection by the third embodiment;

FIG. 10 is a flow chart showing an operation of the fourth embodiment;

FIG. 11 is Table 5, for explaining the operation of the fourth embodiment;

FIG. 12 is Table 6, for explaining the operation of the fourth embodiment;

FIG. 13 is Table 7, for explaining the operation of the fourth embodiment;

FIG. 14 is Table 8, showing a concrete example of pitch detection by the fourth embodiment; and

FIG. 15 is a chart showing waveforms related to the pitch detection by the third embodiment.

[0027] Embodiments of the present invention will now be explained with reference to the accompanying drawings.

(1) First Embodiment

[0028] FIG. 1 shows a pitch detection apparatus as a first embodiment of the present invention.

[0029] In FIG. 1, a pitch detection apparatus 1 is provided with an electroacoustic transducer 2, an f (frequency) spectrum analyzing unit 3, a pitch extracting unit 4 and a memory unit 5. The pitch detection apparatus 1 is adapted to be installed in an acoustic or speech recognition apparatus, an acoustic or speech synthesis apparatus, an apparatus for automatically dictating a music score, a karaoke point scoring apparatus, a machine diagnosis apparatus and the like, as an apparatus for detecting the pitch of an acoustic or voice (e. g. speech) waveform. The electroacoustic transducer 2 is constructed to convert the acoustic wave inputted thereto to an electric signal, and is equipped with a microphone, for example.

[0030] The f spectrum analyzing unit 3 performs a frequency spectrum analysis by means of the GHA (General Harmonic Analysis) on the acoustic waveform represented by the electric signal from the electroacoustic transducer 2. The operation of the f spectrum analyzing unit 3 is explained in more detail below.

[0031] The f spectrum analyzing unit 3 performs the following operations [I] to [III].

[I] First, from a continuous signal x₀(t) observed in an observation interval L of predetermined time width, it calculates the Fourier sine coefficient S(f) and the Fourier cosine coefficient C(f) by the following expressions (1) and (2), respectively.

wherein

T: cycle (period) of each frequency component,
f: frequency, with T = 1/f and nT ≤ L,
n: integer indicating how many cycles are included in the observation interval L,
nT: an integer (number of samples),
L: observation interval (time length).

As for the width of the observation interval L, the inventors have found experimentally that a value of 10 to 20 ms, for example, is appropriate in practice for extracting the pitch of an acoustic or speech wave. Therefore, assuming a sampling frequency of 48,000 Hz, for example, the width (time length) of the observation interval L is preferably set to 512 samples (about 10.7 ms).
More concretely, as shown in FIG. 2, assuming that the observation interval L is 512 samples long, for n = 1 the Fourier coefficients are calculated for 256 different values of T, i.e. T = 512 (= L), 511 (= L - 1 × 1), 510 (= L - 1 × 2), ..., 258 (= L/2 + 1 × 2), 257 (= L/2 + 1 × 1). For n = 2, they are calculated for 256 different values of T, i.e. T = 256 (= L/2), 255.5 (= L/2 - 0.5 × 1), 255 (= L/2 - 0.5 × 2), ..., 129 (= L/4 + 0.5 × 2), 128.5 (= L/4 + 0.5 × 1). In the same manner, the Fourier coefficients are calculated for each integer n and each cycle T for which nT is an integer number of samples within the 512-sample observation interval.
In this way, the present embodiment obtains Fourier analysis results at frequencies spaced much more finely than the fundamental analysis frequency 1/L. This is much more advantageous for the accuracy of pitch detection than the FFT technique described above, which yields results only at the coarse frequencies 1, 2, 3, ..., 256 (= L/2) times the fundamental analysis frequency.

[II] Next, on the basis of the calculated coefficients, the frequency f1 at which the energy E(f) (expression (3)) of the residual ε(t, f) (expression (4)) is minimized over the observation interval L is obtained, together with the corresponding Fourier coefficients S(f1) and C(f1).

[III] Next, the signal x₁(t) given by expression (5), i.e. the residual obtained by removing the component of frequency f1 found in [II] from the original signal x₀(t), is treated as a new original signal, and the calculations of processes [I] to [III] are repeated.
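The candidate cycles of step [I] follow a simple pattern for n = 1 and n = 2: nT runs over the integers L, L - 1, ..., L/2 + 1 samples, so T = (L - k)/n. The Python sketch below enumerates them. Extending the rule to n > 2 is an assumption, since the text only says "in the same manner", and the function name is illustrative.

```python
from fractions import Fraction


def candidate_cycles(window_samples: int, n: int):
    """Candidate cycles T (in samples) for a cycle count n, following the pattern of FIG. 2.

    For n = 1 and n = 2 the text steps n*T through the integers L, L-1, ..., L/2 + 1, i.e.
    T = (L - k) / n for k = 0 .. L/2 - 1 (256 values for L = 512). Applying the same rule to
    other n is an assumption. Fractions keep half-sample cycles such as 255.5 exact.
    """
    return [Fraction(window_samples - k, n) for k in range(window_samples // 2)]


if __name__ == "__main__":
    fs = 48_000
    for n in (1, 2):
        cycles = candidate_cycles(512, n)
        print(n, len(cycles), float(cycles[0]), float(cycles[1]), float(cycles[-1]))
        # frequency range covered, in Hz (f = fs / T)
        print("   ", fs / float(cycles[0]), "to", round(fs / float(cycles[-1]), 2))
```

For L = 512 at 48,000 Hz, n = 1 covers about 93.75 to 186.8 Hz and n = 2 about 187.5 to 373.5 Hz. Near 93.75 Hz adjacent candidates are only about 0.18 Hz apart, compared with the 93.75 Hz spacing of a 512-point FFT.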

[0032] As described above, the f spectrum analyzing unit 3 performs the f spectrum analysis by means of the GHA on the waveform represented by the electric signal from the electroacoustic transducer 2.
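A compact Python sketch of the iteration [I] to [III], assuming NumPy and one uniformly sampled observation interval. Expressions (1) to (5) are not reproduced in this text, so two substitutions are made and should be read as assumptions: S(f) and C(f) are fitted by least squares for each candidate frequency (which minimises E(f) exactly for that f), and E(f) is taken as the mean square of the residual ε(t, f) = x₀(t) - S(f) sin 2πft - C(f) cos 2πft over the interval. The 0.5 Hz candidate grid is also an assumption; the candidate cycles of FIG. 2 could be used instead.

```python
import numpy as np


def gha_sketch(x, fs, freqs, num_components):
    """Greedy GHA-style decomposition (sketch of steps [I]-[III]; hypothetical helper).

    x: one observation interval of samples; fs: sampling rate in Hz;
    freqs: candidate frequencies in Hz (a grid, assumed here).
    Returns (frequency_hz, amplitude) pairs in extraction order.
    """
    x = np.asarray(x, dtype=float).copy()
    n = len(x)
    t = np.arange(n) / fs
    phase = 2.0 * np.pi * np.outer(freqs, t)  # shape (F, n)
    sin_b, cos_b = np.sin(phase), np.cos(phase)
    # entries of the 2x2 least-squares (normal equation) matrix for each candidate frequency
    ss = np.sum(sin_b * sin_b, axis=1)
    cc = np.sum(cos_b * cos_b, axis=1)
    sc = np.sum(sin_b * cos_b, axis=1)
    det = ss * cc - sc * sc
    components = []
    for _ in range(num_components):
        xs, xc = sin_b @ x, cos_b @ x
        s = (cc * xs - sc * xc) / det  # S(f), least-squares fit
        c = (ss * xc - sc * xs) / det  # C(f), least-squares fit
        energy = (x @ x - (s * xs + c * xc)) / n  # E(f): mean square of the residual
        k = int(np.argmin(energy))  # [II]: f1 minimises E(f)
        components.append((float(freqs[k]), float(np.hypot(s[k], c[k]))))
        x = x - s[k] * sin_b[k] - c[k] * cos_b[k]  # [III]: residual becomes the new x0
    return components


if __name__ == "__main__":
    fs, n = 48_000, 1024
    t = np.arange(n) / fs
    x = (np.sin(2 * np.pi * 187.5 * t)
         + 0.6 * np.sin(2 * np.pi * 375.0 * t + 0.3)
         + 0.3 * np.sin(2 * np.pi * 562.5 * t + 1.0))
    for f, a in gha_sketch(x, fs, np.arange(50.0, 1200.0, 0.5), 3):
        print(f"{f:7.1f} Hz  amplitude {a:.2f}")
```

The components come out in descending order of energy, which is the order the pitch extracting unit relies on. Each single-sinusoid fit can be pulled by a few hertz by neighbouring components, so the printed frequencies should be close to, but not necessarily exactly, 187.5, 375 and 562.5 Hz.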

[0033] In FIG. 1 again, the pitch extracting unit 4 is constructed to extract a pitch in such a manner as explained later with reference to FIG. 3, from the N frequency components (N: natural number) obtained by the f spectrum analyzing unit 3 in the above described manner.

[0034] The memory unit 5 may comprise an IC (Integrated Circuit) memory, a magnetic disk, an optical disc or the like, and stores the pitch extracted by the pitch extracting unit 4 for each observation interval. By connecting the pitches extracted for successive observation intervals over the whole signal, the change of the pitch over time can be described.

[0035] The pitch detection apparatus 1 may also be constructed to output the extracted pitch from a speaker as a sine wave, as the occasion demands.

[0036] Here, the pitch extracting operation in the first embodiment will be explained with reference to a flow chart of FIG. 3.

[0037] In FIG. 3, first, the result of the GHA analysis is read out from the f spectrum analyzing unit 3 (step S1). Then, k components are taken out in descending order of amplitude (step S2). The value k is predetermined; an appropriate value can be determined experimentally on the basis of the properties of the acoustic wave that is the object of the pitch detection. Then, the one of the k taken-out components that has the longest cycle is determined as the pitch (step S3). Owing to the basic properties of the fundamental wave and its harmonics, in most practical cases the frequency component with the longest cycle among the components read out from the GHA result coincides with the fundamental wave of the acoustic wave concerned. The pitch extracting operation of the first embodiment is based on these basic properties.
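Steps S2 and S3 reduce to a sort and a minimum. A minimal Python sketch; the helper name and the (frequency, amplitude) input format are assumptions about how the GHA result is handed over.

```python
def first_embodiment_pitch(components, k):
    """Sketch of steps S2 and S3 in FIG. 3 (hypothetical helper).

    components: (frequency_hz, amplitude) pairs read out from the GHA result (step S1).
    Keeps the k largest-amplitude components (step S2) and returns the frequency of the one
    with the longest cycle, i.e. the lowest frequency (step S3).
    """
    top_k = sorted(components, key=lambda item: item[1], reverse=True)[:k]
    return min(freq for freq, _ in top_k)


if __name__ == "__main__":
    # illustrative values only (not the data of Table 1)
    comps = [(424.0, 0.9), (212.0, 0.7), (636.0, 0.5), (848.0, 0.3), (106.0, 0.05)]
    print(first_embodiment_pitch(comps, k=4))  # prints 212.0
    print(first_embodiment_pitch(comps, k=5))  # prints 106.0: a weak low component now wins
```

The second call shows why k is set experimentally for the objective acoustic wave: one extra low-level component can change the result.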

[0038] An example of pitch detection according to the first embodiment, in which the pitch is extracted from the speech (voice) signal of a male vocalist, is now explained. In this example, the sampling frequency is 48,000 Hz and the observation interval for the GHA is 1024 points (equivalent to 21.3 ms). The sine waves extracted by the GHA are listed in Table 1 of FIG. 4. In Table 1, the component number "n/6" (n = 1, 2, ..., 6) in the left-hand column indicates that the power of the component is the n-th largest among the 6 extracted components. In this example, only 6 components, a number determined in advance by the user, are taken out from the GHA result in descending order of power. When the pitch is extracted from this GHA result, the component with "component number 1/6" (i.e. the signal component with a frequency in the vicinity of 212 Hz), which is the sine wave having the lowest frequency (i.e. the longest cycle) and is indicated by an arrow in the figure, is extracted as the pitch. In this way, the pitch of the fundamental tone can be correctly detected with rather simple hardware and software according to the first embodiment.

(2) Second Embodiment:

[0039] The construction of a second embodiment of the present invention is similar to that of the first embodiment of FIG. 1 described above, except that its pitch extracting unit performs the pitch extracting operation as follows.

[0040] The pitch extracting operation of the pitch extracting unit in the second embodiment is explained with reference to the flow chart of FIG. 5. In FIG. 5, steps identical to those in FIG. 3 carry the same reference numerals.

[0041] In FIG. 5, step S20, in which the GHA result is read out from the f spectrum analyzing unit, is performed as follows.

[0042] Whereas a fixed number of components (k components) is simply read out and taken out in the first embodiment (step S2 in FIG. 3), in the second embodiment the read-out operation is repeated until the ratio of the sum of the energies of the taken-out components to the energy of the input signal reaches a predetermined value, e.g. 99 %, and is then stopped. However, to avoid the undesirable case in which the ratio never reaches the predetermined value however often the read-out is repeated, the read-out is also stopped if the ratio has not reached the predetermined value after the operation has been performed a predetermined number of times (step S20). When the GHA and the pitch extraction are performed in real time in this way, the f spectrum analyzing unit is advantageously prevented from performing unnecessary GHA, since the GHA is stopped as soon as the ratio reaches the predetermined value.

[0043] More concretely, in step S20, first, the total energy Eo of the discrete-time waveform in the time period to be processed is obtained (step S21). As initialization, a count value i for counting the number of taken-out components is set to "0" (step S22). Next, the count value i is compared with a standard number i_s (step S23). The standard number i_s indicates the maximum number of components to be taken out and is set in accordance with the properties of the objective acoustic wave. If i > i_s does not hold (NO), the sum Ei of the energies of the 1st to i-th sine waves obtained by the GHA is calculated (step S24). Further, the energy ratio Ei/Eo is compared with a standard ratio Es, which indicates the ratio at which the taking-out operation is to be stopped, e.g. 99 % (step S25). If Ei/Eo > Es does not hold (NO), the count value i is incremented by one (step S26), and the flow returns to step S23. If i > i_s holds at step S23 (YES), or if Ei/Eo > Es holds at step S25 (YES), the flow branches to step S27, where the present count value i is set as the number N of components taken out (step S27). Next, numbers are assigned to the sine waves obtained by the GHA in order from the longest cycle to the shortest; that is, the cycles of the sine waves are labelled T₁, T₂, ..., T_N in this order (step S28).
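A minimal Python sketch of step S20, assuming the GHA hands over (cycle, energy) pairs in extraction order. The helper name is hypothetical, and the exact counter handling of FIG. 5 (whether i_s or i_s + 1 components can be taken) is not reproduced; here i_s is simply the maximum number of components.

```python
def read_out_components(components, total_energy, es=0.99, i_s=10):
    """Sketch of step S20 in FIG. 5 (hypothetical helper).

    components: (cycle, energy) pairs in GHA extraction order, largest energy first.
    total_energy: Eo, the energy of the processed waveform (step S21).
    Components are accumulated until Ei / Eo exceeds Es (step S25) or i_s components have
    been taken (step S23); the kept components are returned longest cycle first, so that
    result[0] is T1 (step S28) and also the pitch chosen in step S3.
    """
    kept, e_i = [], 0.0
    for cycle, energy in components:
        if len(kept) >= i_s:
            break
        kept.append((cycle, energy))
        e_i += energy
        if e_i / total_energy > es:
            break
    return sorted(kept, key=lambda item: item[0], reverse=True)


if __name__ == "__main__":
    # illustrative energies only; cycles in seconds
    comps = [(1 / 212, 0.5), (1 / 424, 0.3), (1 / 636, 0.15), (1 / 848, 0.045), (1 / 1060, 0.005)]
    kept = read_out_components(comps, total_energy=1.0)
    print(len(kept), 1.0 / kept[0][0])  # 4 components kept; pitch about 212 Hz
```

In the example the cumulative ratio first exceeds 99 % after the fourth component, so the fifth is never read out.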

[0044] This completes step S20 as a whole, i.e. the read-out of the GHA result.

[0045] Next, the flow proceeds to step S3, where, in the same manner as in the first embodiment of FIG. 3, the read-out component whose cycle is the longest is determined as the pitch, and the process ends (step S3).

[0046] According to the second embodiment, when the GHA result is read out, the read-out operation is repeated until the energy ratio reaches the predetermined value, up to a predetermined number of times; if the energy ratio has not reached the predetermined value after the predetermined number of repetitions, the read-out operation is stopped at that point. Accordingly, the accuracy of the pitch detection can be kept high, while unnecessary repetition of the read-out operation is avoided in practice.

(3) Third Embodiment:

[0047] The construction of a third embodiment of the present invention is similar to that of the above-described first embodiment of FIG. 1, except that the pitch extracting unit of the third embodiment performs the pitch extracting operation as follows.

[0048] The pitch extracting operation of the pitch extracting unit in the third embodiment is explained with reference to the flow chart of FIG. 6. In FIG. 6, steps identical to those in FIG. 5 carry the same reference numerals.

[0049] In FIG. 6, up to step S20, the operation of the third embodiment is the same as that of the second embodiment of FIG. 5. Even if step S20 is replaced by steps S1 and S2 of the first embodiment (FIG. 3), the third embodiment still functions effectively, as will be clear from the following explanation.

[0050] Whereas in the first and second embodiments the taken-out component with the longest cycle is simply determined as the pitch (step S3 in FIG. 3 and FIG. 5), the third embodiment is characterized in that, in step S31 and the subsequent steps, the components in a harmonic relationship are first identified, and the fundamental wave is then found from the characteristic arrangement of those components.

[0051] More concretely, in FIG. 6, as initialization, a count value j for judging the harmonic relationship one pair after another is set to "1" (step S31). Then, the count value j is compared with the value N obtained at step S20 (step S32). If j ≧ N does not hold (NO), a count value k is set to "j + 1" (step S33), and it is judged whether or not k > N (step S34). If k > N (YES), the count value j is incremented by one (step S35), and the flow returns to step S32. If k > N does not hold at step S34 (NO), the integer closest to the ratio of the cycle Tj to the cycle Tk obtained in step S20, i.e. the ratio "Tj/Tk", is set as an integer Ijk (step S36). Here, Tj (Tk) denotes the cycle at the j^th (k^th) position in the arrangement of cycles produced by step S28 (within step S20). Then, it is judged whether or not the absolute value of "Ijk - Tj/Tk" is less than a predetermined micro standard value ε (step S37). If it is not (NO), the count value k is incremented by one (step S38), and the flow returns to step S34. If it is (step S37: YES), it is further judged whether or not the integer Ijk is an odd number (step S39). If Ijk is not odd (NO), the count value k is incremented by one (step S38), and the flow returns to step S34. If Ijk is odd (step S39: YES), the cycle Tj is determined as the pitch (step S40), and the process ends. If j ≧ N at step S32 (YES), the pitch could not be found because of a measurement error or an abnormal condition (step S41), and the process ends. At step S41, it is preferable to display or output a message indicating that the pitch could not be found, so as to inform the user of the detection accuracy of the pitch detection.
Further, in place of or in addition to such a message, the longest cycle T1 may be determined as the pitch at step S41.
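The search of steps S31 to S41 can be sketched in Python as follows. This is a minimal sketch that assumes the cycles are already available as numbers (in samples); the function name is hypothetical, the default ε = 0.1 is taken from the example of Table 3, and the function returns None where the flow chart reaches step S41.

```python
from typing import Optional, Sequence


def odd_harmonic_pitch(cycles: Sequence[float], eps: float = 0.1) -> Optional[float]:
    """Hypothetical sketch of steps S31-S41 of FIG. 6.

    cycles: cycles of the taken-out components (any order; sorted here as in S28).
    eps: the micro standard value, i.e. the tolerance on |Ijk - Tj/Tk|.
    Returns the cycle judged to be the pitch, or None if no pitch is found.
    """
    T = sorted(cycles, reverse=True)              # T1 >= T2 >= ... >= TN
    n = len(T)
    for j in range(n - 1):                        # S31/S32/S35: j = 1 .. N-1 (1-based)
        for k in range(j + 1, n):                 # S33/S34/S38: k = j+1 .. N
            ratio = T[j] / T[k]                   # Tj/Tk >= 1
            i_jk = round(ratio)                   # nearest integer Ijk (S36)
            if abs(i_jk - ratio) < eps and i_jk % 2 == 1:   # S37 and S39
                return T[j]                       # S40: Tj is the pitch
    return None                                   # S41: pitch not found
```

With hypothetical cycles of 480 (a 100 Hz sub-harmonic at fs = 48,000 Hz), 240 (a 200 Hz fundamental), 120 and 80 (its second and third harmonics), every pair starting at 480 gives an even integer (2, 4, 6), so 480 is rejected, and the pair (240, 80) gives 3, so 240 is returned. Taken literally, the flow would also accept Ijk = 1, i.e. two nearly identical cycles.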

[0052] Table 2 of FIG. 7 shows an example arrangement of the cycles T1 to T5 of the components read out by the GHA, arranged in order from the longest cycle, for fs = 48,000 Hz and N = 5. Table 2 also lists the frequencies f1 to f5 corresponding to the cycles T1 to T5 and, for components that are harmonics or sub-harmonics of the fundamental wave, the multiple of the fundamental frequency that each represents. In Table 2, for example, f1 = fs/T1 = 48,000/482.5 = 99.48 Hz. In this example, the count values j and k in the flow chart of FIG. 6 move as shown in Table 3 of FIG. 8 (from the top row to the bottom row of Table 3). As a result, with ε = 0.1, the pairs (j, k) = (1, 2), (1, 3), (1, 5) and (2, 5) are judged to be in a harmonic relationship, since the absolute value of "Ijk - Tj/Tk" is sufficiently small. All other combinations (j, k) are judged not to be in a harmonic relationship, as shown in Table 3.

[0053] By introducing the micro standard value ε, which represents the permissible measurement error, the harmonic relationship between two orthogonal function components can still be found even if the ratio of their read-out cycles is not exactly an integer because of the measurement conditions, the accuracy of the GHA, etc.

[0054] Here, the cycle Tj for a count value j that simultaneously satisfies two conditions, namely the first condition that (j, k) is in this harmonic relationship and the second condition that the integer closest to "Tj/Tk" is odd, is judged to be the pitch. Even if a sub-harmonic is read out by the GHA because it has considerable power, the harmonics of that sub-harmonic are unlikely to be read out, since they are far smaller than the harmonics of the fundamental wave. Consequently, even if a sub-harmonic whose frequency is lower than that of the fundamental wave is included, it cannot satisfy the second condition: because of the basic relationship between the fundamental wave and the sub-harmonic, the integer closest to "Tj/Tk" is even when Tj is the cycle of the sub-harmonic and Tk that of a harmonic of the fundamental wave. In other words, the second condition is satisfied only by the fundamental wave.
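The parity argument above can be made explicit. Assuming, as the text does, an ideal sub-harmonic at exactly half the fundamental frequency, write T_0 for the cycle of the fundamental wave, T_s = 2T_0 for that of the sub-harmonic, and T_n = T_0/n for that of the n-th harmonic (symbols introduced here for illustration only):

```latex
\frac{T_s}{T_n} = \frac{2T_0}{T_0/n} = 2n \quad (\text{even for every } n),
\qquad
\frac{T_0}{T_n} = \frac{T_0}{T_0/n} = n \quad (\text{odd whenever } n \text{ is odd}).
```

Hence an odd nearest-integer ratio can arise with Tj = T_0 and Tk an odd-order harmonic, but not with Tj = T_s and Tk a harmonic of the fundamental wave. This relies on the harmonics of the sub-harmonic itself not being read out, as stated above.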

[0055] Therefore, in this example, the cycle T2, corresponding to (j, k) = (2, 5), which satisfies both conditions, is judged to be the pitch. On the other hand, the cycle T1, corresponding to (j, k) = (1, 2), (1, 3) and (1, 5), which is a sub-harmonic of the fundamental wave, is not judged to be the pitch, since the integer closest to "Tj/Tk" is even. In this example, the actual process proceeds row by row from the top to the bottom of Table 3, and stops when the cycle T2 is judged to be the pitch at (j, k) = (2, 5).

[0056] FIG. 15 shows a waveform W1 of the original signal and a waveform W3 of the pitch extracted by the third embodiment. Between these two, FIG. 15 shows a waveform W2 obtained by re-synthesizing the 6 main components read out by the GHA.

[0057] As FIG. 15 shows, the original waveform W1 resembles the waveform W2 obtained by re-synthesizing the 6 main frequency components read out from W1, because the contribution of these 6 main components is dominant in W1. On the other hand, the component with the longest cycle in the waveforms W1 and W2, which appears to be the fundamental wave because the human eye easily identifies it as the longest cycle, is not judged to be the pitch by the third embodiment, as can be seen from the waveform W3.

[0058] As described above, according to the third embodiment, the fundamental wave and its harmonics among the components read out by the GHA are identified (judged) by using the property that the cycle of the fundamental wave is an integer multiple of that of each harmonic. Thus, even if there is noise with a frequency lower than that of the fundamental wave, or a sub-harmonic with half the frequency of the fundamental wave, erroneous detection of the noise or the sub-harmonic as the fundamental wave can be effectively prevented. As a result, the third embodiment can perform pitch detection that is more accurate and more reliable than that of the first and second embodiments.

[0059] An example of pitch detection by the third embodiment on an acoustic signal obtained by measuring a piano sound (A4) in a studio is now explained. In this example, the sampling frequency is 48,000 Hz and the observation interval of the GHA is 1024 points (equivalent to 21.3 ms). The Sine waves extracted by the GHA are listed in Table 4 of FIG. 9. In Table 4, the component number "n/6" (n = 1, ..., 6) in the left-hand column indicates that the power of the component is the n^th largest among the 6 extracted components. In this example, 6 components are taken out, because the GHA analysis is truncated using the criterion "power of the extracted signal components / power of the original signal ≧ 99%". When the pitch is extracted from this GHA result, the component with "component number 3/6" (i.e. the signal component with a frequency in the vicinity of 440 Hz), indicated by an arrow in the figure, is extracted as the pitch without being affected by the sub-harmonic (i.e. the signal component with a frequency in the vicinity of 220 Hz), which has a large amplitude and a low frequency. In this manner, the pitch of the fundamental tone is correctly detected by the third embodiment.

(4) Fourth Embodiment:

[0060] The construction of a fourth embodiment of the present invention is similar to that of the above-described first embodiment of FIG. 1, except that the pitch extracting unit of the fourth embodiment performs the pitch extracting operation as follows.

[0061] The pitch extracting operation of the pitch extracting unit in the fourth embodiment is explained with reference to the flow chart of FIG. 10. In FIG. 10, steps identical to those in FIG. 5 carry the same reference numerals.

[0062] In FIG. 10, up to step S20, the operation of the fourth embodiment is the same as that of the second embodiment of FIG. 5. Even if step S20 is replaced by steps S1 and S2 of the first embodiment (FIG. 3), the fourth embodiment still functions effectively, as will be clear from the following explanation.

[0063] Whereas the third embodiment first finds the components in a harmonic relationship and then finds the fundamental wave from the arrangement and characteristics of those components, the fourth embodiment is characterized in that the fundamental wave can be found by taking the nature of the harmonics into account, even if the original signal contains only harmonics and not the fundamental wave itself, for example because the fundamental wave has been lost through frequency band limitation or the like.

[0064] More concretely, in FIG. 10, as initialization, a count value Lmax indicating the maximum number of matching combinations found so far is set to "0" (step S51). Then, the count value j is set to "1" (step S52) and compared with the value N obtained at step S20 (step S53). If j ≧ N does not hold (NO), a count value L indicating the number of matching combinations is set to "0" (step S54), and the count value k is set to "j + 1" (step S55). Then, it is judged whether or not k > N (step S56). If k > N (YES), the count value j is incremented by one (step S57), and the flow returns to step S53. If k > N does not hold at step S56 (NO), a count value l indicating the harmonic order presently assumed for the component Tj is set to "1" (step S58). Then, the count value l is compared with a predetermined number H, which indicates up to which order harmonics are considered for the pitch detection (step S59). Setting H to, for example, "10" is sufficient to cover the components whose energies can be measured in practice; the user may therefore set H to a value of 10 or less in accordance with the required pitch extraction accuracy and the object of the pitch detection. If l > H at step S59 (YES), the count value k is incremented by one (step S60), and the flow returns to step S56. If l > H does not hold at step S59 (NO), a count value m indicating the harmonic order presently considered for the component Tk is set to "l + 1" (step S61), and it is judged whether or not m > H (step S62). If m > H does not hold (NO), it is judged whether or not the absolute value of "(Tj/Tk)/(m/l) - 1" is less than the predetermined micro standard value ε (step S63).
If it is not (NO), the count value m is incremented by one (step S64), and the flow returns to step S62. If it is (step S63: YES), the count value L is incremented by one (step S65), the count value m is also incremented by one (step S64), and the flow returns to step S62. If m > H at step S62 (YES), it is judged whether or not Lmax < L (step S66). If Lmax < L (YES), Lmax is set to L, jmax to j, and lmax to l (step S67). Then, the count value l is incremented by one (step S68), and the flow returns to step S59. If Lmax < L does not hold at step S66 (NO), the flow proceeds directly from step S66 to step S68, where l is incremented by one, and then returns to step S59.

[0065] If j ≧ N at step S53 (YES), the flow branches to step S69, where it is judged whether or not Lmax = 0. If Lmax = 0 does not hold (NO), the pitch is judged to be Tjmax × lmax (step S70), where Tjmax is the cycle Tj for j = jmax and lmax is the value of l stored at step S67, and the process ends. If Lmax = 0 at step S69 (YES), the pitch could not be found (step S71), and the process ends. At step S71, it is preferable to display or output a message indicating that the pitch could not be found, so as to inform the user of the detection accuracy of the pitch detection. Further, in place of or in addition to such a message, the longest cycle T1 may be determined as the pitch at step S71.
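The search of FIG. 10 can be sketched in Python as follows, under stated assumptions. The sketch counts the matches L separately for each fixed pair (j, l) over all k and m, following the description of Table 8 (counting combinations of m with j and l fixed); it does not reproduce the exact loop nesting of steps S51 to S68. The function name, the argument names and the tolerance used below are hypothetical. As in step S66, Lmax is replaced only when L is strictly larger, so ties keep the earlier case (smaller j, then smaller l).

```python
from typing import Optional, Sequence


def estimate_missing_fundamental(cycles: Sequence[float], eps: float,
                                 H: int = 10) -> Optional[float]:
    """Hypothetical sketch of the fourth-embodiment search (FIG. 10).

    cycles: cycles of the taken-out components, in samples.
    eps: the micro standard value (relative tolerance of step S63).
    H: highest harmonic order considered (step S59).
    Returns Tjmax * lmax (step S70), or None if Lmax stays 0 (step S71).
    """
    T = sorted(cycles, reverse=True)             # T1 >= T2 >= ... >= TN
    n = len(T)
    L_max, j_max, l_max = 0, -1, 0
    for j in range(n - 1):                       # T[j] is assumed to be the l-th harmonic
        for l in range(1, H + 1):
            L = 0                                # matches for this (j, l) pair
            for k in range(j + 1, n):            # shorter cycles: candidate m-th harmonics
                ratio = T[j] / T[k]
                for m in range(l + 1, H + 1):
                    if abs(ratio / (m / l) - 1.0) < eps:   # step S63
                        L += 1
            if L > L_max:                        # steps S66/S67
                L_max, j_max, l_max = L, j, l
    if L_max == 0:
        return None
    return T[j_max] * l_max                      # fundamental cycle, in samples
```

As a hypothetical usage example, take a 100 Hz fundamental at fs = 48,000 Hz (cycle 480 samples) in which only the 4th to 7th harmonics remain (cycles 120, 96, 80 and 480/7), plus a non-harmonic component with cycle 1000. With eps = 0.03, the best case is the component with cycle 120 taken as the 4th harmonic (L = 3), so the sketch returns 480, i.e. 100 Hz, even though no component has that cycle.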

[0066] An example of pitch detection by the fourth embodiment on an acoustic signal whose fundamental tone is 100 Hz and in which harmonics are mixed is now explained. In this example, the sampling frequency is 48,000 Hz and the observation interval of the GHA is 1024 points (equivalent to 21.3 ms). Table 5 of FIG. 11 shows an example arrangement of the cycles T1 to T5 of the components read out by the GHA, arranged in order from the longest cycle, for N = 5. Table 5 also lists the frequencies f1 to f5 corresponding to the cycles T1 to T5 and, for components that are harmonics or sub-harmonics of the fundamental wave, the multiple of the fundamental frequency that each represents. In this example, the GHA analysis is truncated using the criterion "power of the extracted signal components / power of the original signal ≧ 99%". The count values j and k in the flow chart of FIG. 10 move as shown in Table 6 of FIG. 12 (from the top row to the bottom row of Table 6). In column "(b) m/l" of Table 6, the count value l is moved from 1 to H as shown in Table 7 of FIG. 13, while the count value m is moved from l + 1 to H, so as to select the combination whose value is closest to the value in column "(a) Tj/Tk". In this example, H = 10. In Table 6 of FIG. 12, the sign "-" indicates that no combination of l and m satisfies the condition that the absolute value of "(a)/(b) - 1" is less than the predetermined micro standard value ε.

[0067] Table 8 of FIG. 14 is obtained by reconstructing Table 6 of FIG. 12 with the count value l as the main parameter. In Table 8, with the count values j and l fixed, the number of matching values of m is counted case by case in order of increasing l, and the case with the largest count is selected. In this example, this is the case j = 2 and l = 4; namely, it is determined that Lmax = 3 and lmax = 4 (with jmax = 2). In the present embodiment, if two cases have the same number of matching combinations, priority is given to the one with the smaller j (i.e. the lower frequency). As a result, in the present example, Tjmax × lmax = T2 × 4 = 481.2. Thus, the frequency corresponding to the pitch is determined to be 99.75 Hz, although no component with this frequency exists among the components read out by the GHA, as shown in Table 5 of FIG. 11.
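The arithmetic of this example can be written out with fs = 48,000 Hz. The value of T2 is not stated separately in the text; it is simply implied by T2 × 4 = 481.2 samples:

```latex
T_{j_{\max}}\, l_{\max} = T_2 \times 4 = 481.2\ \text{samples}
\;\Rightarrow\; T_2 = 120.3\ \text{samples},\quad
f_2 = \frac{f_s}{T_2} = \frac{48\,000}{120.3} \approx 399.0\ \text{Hz},
\qquad
f_0 = \frac{f_s}{T_2 \times 4} = \frac{48\,000}{481.2} \approx 99.75\ \text{Hz} = \frac{f_2}{4}.
```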

[0068] In this manner, according to the fourth embodiment, the pitch of the fundamental tone can still be correctly detected even though the fundamental tone, as well as its second or third harmonic, is missing.

[0069] As described above, according to the fourth embodiment, the pitch can still be found even if the fundamental wave is missing from the original signal because of frequency band limitation or the like. As a result, the fourth embodiment can perform pitch detection that is more accurate and more reliable than that of the first to third embodiments.

[0070] In the embodiments described above, the pitch detection apparatus obtains the acoustic signal from the electroacoustic transducer. However, the present embodiments function in the same manner if the electroacoustic transducer is replaced by a device that generates an acoustic wave signal.

[0071] The invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein.

Claims

1. A pitch detection apparatus (1), to which an acoustic waveform is inputted, for detecting a pitch of a fundamental wave of the acoustic waveform, characterized in that said pitch detection apparatus (1) comprises:

an orthogonal function component output means (3) for taking, out of orthogonal function components for every cycle which compose the acoustic waveform, a plurality of orthogonal function components one after another in an order from one orthogonal function component having a greater energy contribution for the acoustic waveform than other orthogonal function components, and outputting the taken out orthogonal function components; and

a pitch extract means (4) for extracting as a pitch one of the outputted orthogonal function components, on the basis of a mutual relationship between cycles of the outputted orthogonal function components.

2. An apparatus (1) according to claim 1, characterized in that said orthogonal function component output means (3) stops taking out and outputting when a ratio of an energy of a sum signal, which is obtained by re-synthesizing the taken out orthogonal function components, with respect to an energy of the acoustic waveform exceeds a predetermined value.

3. An apparatus (1) according to claim 2, characterized in that said orthogonal function component output means (3) stops taking out and outputting when finishing taking out and outputting a predetermined number of orthogonal function components before the ratio exceeds the predetermined value.

4. An apparatus (1) according to any one of claims 1 to 3, characterized in that said pitch extract means (4) extracts as the pitch one of the outputted orthogonal function components, which has the longest cycle.

5. An apparatus (1) according to any one of claims 1 to 3, characterized in that said pitch extract means (4) discriminates some of the outputted orthogonal function components, each of which is an odd order harmonic of another orthogonal function component, and extracts as the pitch one of the discriminated orthogonal function components, which has the longest cycle.

6. An apparatus (1) according to any one of claims 1 to 3, characterized in that said pitch extract means (4) discriminates a plurality of groups of the outputted orthogonal function components, whose cycles have an integer ratio relationship with each other in each of the discriminated groups, selects one of the discriminated groups, which includes the largest number of the orthogonal function components, and estimates as the pitch one of the orthogonal function components having a cycle which is an integer multiple of that of each of the orthogonal function components in the selected group.

7. An apparatus (1) according to any one of claims 1 to 6, characterized in that said orthogonal function component output means (3): firstly, outputs a first orthogonal function component which has the greatest energy contribution for the acoustic waveform and obtains a first residual waveform by subtracting the first orthogonal function component from the acoustic waveform; secondly, outputs a second orthogonal function component which has the greatest energy contribution for the first residual waveform and obtains a second residual waveform by subtracting the second orthogonal function component from the first residual waveform; and thirdly, outputs an n^th (n: natural number more than 2) orthogonal function component which has the greatest energy contribution for the (n-1)^th residual waveform and obtains an n^th residual waveform by subtracting the n^th orthogonal function component from the (n-1)^th residual waveform.

8. An apparatus (1) according to any one of claims 1 to 7, characterized in that said apparatus further comprises an electroacoustic transducer for converting the acoustic waveform to an electric signal and outputting the electric signal to said orthogonal function component output means, and said orthogonal function component output means takes out the orthogonal function components based on the electric signal outputted from said electroacoustic transducer.

9. A pitch detection method of detecting a pitch of a fundamental wave of an acoustic waveform, characterized in that said method comprises the steps of:

taking, out of orthogonal function components for every cycle which compose the acoustic waveform, a plurality of orthogonal function components one after another in an order from one orthogonal function component having a greater energy contribution for the acoustic waveform than other orthogonal function components, and outputting the taken out orthogonal function components; and

extracting as a pitch one of the outputted orthogonal function components, on the basis of a mutual relationship between cycles of the outputted orthogonal function components.

Drawing