Apparatus for modifying the time scale modification of speech

(19)

(11)	EP 0 702 354 A1

(12)	EUROPEAN PATENT APPLICATION

(43)	Date of publication:
	20.03.1996 Bulletin 1996/12

(21)	Application number: 95306302.1

(22)	Date of filing: 08.09.1995

(51)	International Patent Classification (IPC)⁶: G10L 5/06, G10L 7/08, G10L 9/06, G10L 9/18

(84)	Designated Contracting States:
	DE FR GB

(30)	Priority:
	14.09.1994 JP 220131/94
	14.09.1994 JP 220132/94
	25.10.1994 JP 260206/94

(71)	Applicant: MATSUSHITA ELECTRIC INDUSTRIAL CO., LTD.
	Kadoma-shi, Osaka-fu 571 (JP)

(72)	Inventors:
	Norimatsu, Takeshi
	Kadoma-shi, Osaka-fu 571 (JP)
	Misaki, Masayuki
	Higashinada-ku, Kobe-shi, Hyogo-ken 658 (JP)
	Watanabe, Koji
	Hirakata-shi, Osaka-fu 573 (JP)
	Ueno, Norikazu
	Kadoma-shi, Osaka-fu 571 (JP)
	Sato, Kazuhiko
	Hirakata-shi, Osaka-fu 573 (JP)

(74)	Representative: Crawford, Andrew Birkby et al
	A.A. THORNTON & CO. Northumberland House 303-306 High Holborn London WC1V 7LE (GB)

(54)	Apparatus for modifying the time scale modification of speech

(57) In a speech judging section, a speech portion and a speechless portion of an acoustic signal are judged. Data of the acoustic signal are stored in a buffer memory. A memory control section controls writing of the data judged to be the speech portion in the speech judging section into the buffer memory, and reading of the data from the buffer memory. A time-scale modification section determines a time scale modification speed depending on an amount of residual storage data which have not been read out from the buffer memory, and modifies time scale of the acoustic signal depending on the time scale modification speed.

Description

BACKGROUND OF THE INVENTION

1. Field of the Invention

[0001] The present invention relates to a speech time scale modification apparatus capable of varying a reproduction speed without changing pitch of an acoustic signal mainly of speech, and more particularly to a speech time scale modification apparatus used for variable speed reproduction of an acoustic signal in a video tape recorder (VTR) or a language learning system.

2. Description of the Prior Art

[0002] Various audio-visual (AV) products have recently been introduced, broadcasting and information networks have spread and advanced, and our society is flooded with AV information. Many products with functions for viewing and listening to AV information efficiently and easily are being developed. In particular, as a representative example of a function for listening efficiently to audio information consisting mainly of speech, the fast playback function of a cassette tape recorder or a telephone answering machine is known. That is, reproducing normally recorded speech at, for example, a double speed makes it possible to listen to it in half the time. As a similar function for AV information, a variable speed reproducing function of the VTR is known. This makes it possible to see and hear the AV information in a short time by reproducing the recorded pictures and sounds at a high speed while keeping them synchronized.

[0003] A language learning system also provides a function for varying the reproduction speed of speech. For a beginner, the natural-speed speech of a native speaker is very hard to follow. By reproducing the natural-speed speech at a low speed, the system makes the foreign language easier to hear, and the language learning effect is enhanced.

[0004] Incidentally, when the reproduction speed of speech recorded on a tape is varied, that is, when the speech is reproduced at a high speed or a low speed, the pitch of the reproduced speech usually changes as well, and the reproduced speech is very hard to hear. For example, the pitch is higher when reproduced at the high speed, and lower when reproduced at the low speed. Therefore, such systems generally process the speech so that the pitch does not change during variable speed reproduction.

[0005] As a most general method for varying only the reproduction speed without changing the pitch, for example, a method is proposed by G. Fairbanks, W.L. Everitt, R.P. Jaeger "Method for time or frequency compression-expansion of speech" in Jae S. Lim "Speech enhancement" pp. 302-307 published by Prentice-Hall Inc in 1983. This is an example in which a rotary head and a tape are used. In this example, a data reading speed is changed according to a desired speed. Since a rotating speed of the rotary head and a running speed of the tape are different, the data in a quantity corresponding to a difference between the rotating speed and the running speed are regularly discarded or duplicated. For example, when reproducing at a double speed, first the data are read out at twice the recording speed. In this state, the pitch is twice as high, and hence the data are discarded at a rate of 1/2. Finally, the remaining data are reproduced in the same period as in recording, so that only the reproduction speed is doubled without changing the pitch. In the above method, however, quality of the speech deteriorates significantly when reproduced at a variable speed because the data are discarded or duplicated. Accordingly, a speech time scale modification apparatus improved in sound quality has also been proposed, and a basic concept is proposed, for example, by R.J. Scott and S.E. Gerber "Pitch-synchronous time compression of speech" in Jae S. Lim "Speech enhancement" pp. 308-310 published by Prentice-Hall Inc in 1983. Herein, pitch portions of speech signals are extracted, and the time axis is compressed by regularly omitting repeated waveform portions.

[0006] In the above method, however, the reproducing speed is fixed, and as the reproducing speed is further from the recording speed, it is harder to hear the speech. In particular, in a case of viewing the pictures of the VTR or the like slowly or quickly, when the reproduction speed of the tape is changed, the reproduction speed of the speech also changes along with the pictures and it is very hard to hear the speech in the conventional speech time scale modification apparatus.

SUMMARY OF THE INVENTION

[0007] In view of the above, a primary object of the invention is to present a speech time scale modification apparatus which, when playing back an audio signal containing speech from a recording medium at a playback speed different from a recording speed, reproduces the speech at a speed close to the recording speed by sequentially changing a reproducing speed of a speech portion, within a range between the playback speed and the recording speed, depending on a quantity of a speechless portion in the audio signal, thereby making it possible to reproduce the speech at a clearly recognizable quality. It is another object of the invention to realize a speech time scale modification apparatus which, when playing back at the same speed as the recording speed, makes rapid speech easy to hear by properly slowing the speech to a speed below the recording speed depending on the quantity of the speechless portion. It is still another object of the invention to realize a speech time scale modification apparatus which, when playing back at a lower speed than the recording speed, reproduces the speech at a speed close to the recording speed, by properly changing an expanding ratio of the speechless portion and an expanding ratio of the speech portion to thereby obtain a clearly recognizable speech.

[0008] To achieve the objects, the invention provides a speech time scale modification apparatus capable of notably improving clarity of the speech in variable speed reproduction, by detecting the speechless portion of an acoustic signal being read out from a recording medium, and compressing or expanding the speechless portion, and sequentially changing the compressing or expanding ratio of speech portion depending on the quantity of the speechless portion.

[0009] Accordingly, in one aspect of the invention, a speech time scale modification apparatus comprises a recording and reproducing section for reproducing an acoustic signal recorded in a recording medium at a reproduction speed higher than a recording speed, a speech judging section for judging a speechless portion and a speech portion of the acoustic signal, a buffer memory for storing data of the reproduced acoustic signal, a write control section for controlling a write address of the buffer memory so as to write the data of the acoustic signal judged to be the speech portion in the speech judging section into the buffer memory, a read control section for controlling reading of the data from the buffer memory and a read address of the buffer memory, a residual storage data amount monitor section for monitoring a residual storage data amount in the buffer memory from a current write address of the buffer memory and a current read address of the buffer memory, an adaptive speed control section for determining a modification speed of the data depending on the residual storage data amount obtained from the residual storage data amount monitor section, and a time scale compressing section for compressing time scale of the acoustic signal depending on the modification speed determined in the adaptive speed control section.

[0010] In another aspect of the present invention, a speech time scale modification apparatus comprises a recording and reproducing section for reproducing an acoustic signal recorded in a recording medium at the same speed as a recording speed, a speech judging section for judging a speechless portion and a speech portion of the acoustic signal, a buffer memory for storing data of the acoustic signal, a write control section for controlling a write address of the buffer memory so as to write the data of the acoustic signal judged to be the speech portion in the speech judging section into the buffer memory, a read control section for controlling reading of the data from the buffer memory and a read address of the buffer memory, a residual storage data amount monitor section for monitoring a residual storage data amount in the buffer memory from a current write address of the buffer memory and a current read address of the buffer memory, an adaptive speed control section for determining a modification speed depending on the residual storage data amount from the residual storage data amount monitor section, and a time scale expanding section for expanding time scale of the acoustic signal depending on the modification speed determined in the adaptive speed control section.

[0011] In a further aspect of the invention, a speech time scale modification apparatus comprises a recording and reproducing section for reproducing an acoustic signal recorded in a recording medium at a reproduction speed lower than a recording speed, a speech judging section for judging a speechless portion and a speech portion of the acoustic signal, an input buffer for storing data of the acoustic signal, a time scale expanding section for expanding time scale of the data of the acoustic signal of the input buffer by independently setting a time scale expanding ratio to the speechless portion and a time scale expanding ratio to the speech portion from a judging result of the speech judging section, an output buffer for storing output data of the time scale expanding section, a residual storage data amount monitor section for monitoring a residual storage data amount being stored in the output buffer, and an expanding ratio control section for determining an expanding ratio of time scale modification of the speech portion and the speechless portion depending on the residual storage data amount.

BRIEF DESCRIPTION OF THE DRAWINGS

[0012] Fig. 1 is a block diagram showing a constitution of a speech time scale modification apparatus in a first embodiment of the invention.

[0013] Fig. 2 (a) and Fig. 2 (b) are explanatory diagrams explaining measuring methods of residual storage data amounts in the first embodiment.

[0014] Fig. 3 (a) is an explanatory diagram of a speed setting method by a linear rule of an adaptive speed control section in the first embodiment.

[0015] Fig. 3 (b) is an explanatory diagram of a speed setting method by a nonlinear rule of the adaptive speed control section in the first embodiment.

[0016] Fig. 3 (c) is an explanatory diagram of a speed setting method by a staircase rule of the adaptive speed control section.

[0017] Fig. 4 is a circuit diagram of a time scale control section in the first embodiment.

[0018] Fig. 5 (a) shows a data row before processing data in the time scale control section in the first embodiment.

[0019] Fig. 5 (b) shows a data row after processing the data in the time scale control section in the first embodiment.

[0020] Fig. 6 is a flow chart showing another operation of a write control section in the first embodiment.

[0021] Fig. 7 is a block diagram showing a constitution of a speech time scale modification apparatus in a second embodiment of the invention.

[0022] Fig. 8 (a) is an explanatory diagram of a speed setting method by a linear rule of an adaptive speed control section in the second embodiment.

[0023] Fig. 8 (b) is an explanatory diagram of a speed setting method by a nonlinear rule of the adaptive speed control section in the second embodiment.

[0024] Fig. 8 (c) is an explanatory diagram of a speed setting method by a staircase rule of the adaptive speed control section in the second embodiment of the invention.

[0025] Fig. 9 is a circuit diagram of a time scale control section in the second embodiment.

[0026] Fig. 10 (a) shows a data row before processing data in the time scale control section in the second embodiment.

[0027] Fig. 10 (b) shows a data row after processing the data in the time scale control section in the second embodiment.

[0028] Fig. 11 is a flow chart showing another operation of a write control section in the second embodiment.

[0029] Fig. 12 is a block diagram showing a constitution of a speech time scale modification apparatus in a third embodiment of the invention.

[0030] Fig. 13 (a) is an explanatory diagram of a first expanding ratio setting table of an expanding ratio determining section in the third embodiment of the invention.

[0031] Fig. 13 (b) is an explanatory diagram of a second expanding ratio setting table of the expanding ratio determining section.

[0032] Fig. 14 (a), (b), (c) are principle diagrams showing operations of a time scale expanding section in the third embodiment.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

[0033] An outline of a first embodiment of the invention is described below. The first embodiment relates to a speech time scale modification apparatus capable of sequentially changing the speech speed to a speed below a reproduction speed, depending on a quantity of a speechless portion, when reproducing an audio signal recorded in a recording medium at a higher speed than a recording speed. First, a speech portion and the speechless portion are detected in the audio signal being read out at a high speed, and only the speech portion is written into a buffer memory having a specific capacity. The data is output continuously while undergoing speed modification. At this time, since the writing speed into the buffer memory differs from the reading speed out of it, the modification speed is altered as appropriate, on the basis of the memory remainder in the buffer memory, so as to avoid overflow or underflow of the buffer memory. As a result, even in high speed reproduction, it is possible to reproduce the audio signal at a speed below the reproduction speed depending on the quantity of the speechless portion.

[0034] Referring now to the drawings, the first embodiment is described in detail below. Fig. 1 is a block diagram showing a constitution of a speech time scale modification apparatus in the first embodiment.

[0035] First, an acoustic signal is read out from a recording and reproducing section 101 at a speed of M (≧1) times the recording speed. Hereinafter, a speed means a speed relative to the recording speed, which corresponds to M=1. Herein, supposing a sampling period of recording in the recording and reproducing section 101 to be T, the acoustic signal reproduced at M times speed from the recording and reproducing section 101 is sequentially converted into a digital signal series at a sampling period T/M in an A/D converter 102. The digital signal series is fed into a speech judging section 103, and the speech portion and the speechless portion of the digital signal series are judged. This speech judgement is done, for example, as follows. Supposing a sample value series of the digital signal series to be s_i, a series of N sample values is judged to be the speech portion when a formula (1) is satisfied, and the speechless portion when the formula (1) is not satisfied. Herein, Pth is a predetermined threshold value for judgement between the speech portion and the speechless portion.
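Formula (1) is not reproduced in this text. The following is a minimal sketch of one common form of such a judgement, assuming that formula (1) compares the mean-square power of the N samples with Pth. That assumption, and the names `is_speech` and `p_th`, are illustrative and are not taken from the specification.

```python
def is_speech(frame, p_th):
    """Judge a frame of N sample values s_i as speech or speechless.

    Assumption: the frame is speech when its mean-square power
    (1/N) * sum(s_i ** 2) exceeds the threshold p_th.
    """
    if not frame:
        return False  # an empty frame carries no speech
    power = sum(s * s for s in frame) / len(frame)
    return power > p_th


loud = [100, -100, 100, -100]   # mean-square power 10000
quiet = [1, -1, 0, 1]           # mean-square power 0.75
print(is_speech(loud, 5000), is_speech(quiet, 5000))
```

A larger N smooths the judgement at the cost of slower reaction at the boundaries between speech and speechless portions.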

[0036] Supposing a pointer (hereinafter called a write pointer) indicating an address for storing next data on a buffer memory 105 to be Pw, when the sample value series is judged to be the speech portion by the formula (1), the sample value series is sequentially stored at the address in the buffer memory 105 indicated by the write pointer Pw by a write control section 104, and Pw is incremented. When the sample value series is judged to be the speechless portion, on the other hand, the write control section 104 stops storing the sample value series in the buffer memory 105. In this way, only data of speech portions are accumulated in the buffer memory 105.

[0037] The sample value series is judged herein to be the speech portion when the formula (1) is satisfied, and the speechless portion when the formula (1) is not satisfied, but a short sample value series judged to be speechless that immediately precedes or follows a sample value series satisfying the formula (1) may be included in the speech portion.

[0038] In a read control section 106, the data in the buffer memory 105 is read out sequentially in the period T, and sent into a time scale control section 109. Herein, a pointer (hereinafter a read pointer) indicating an address of next data on the buffer memory 105 to be read out is supposed to be Pr. In a residual storage data amount monitor section 107, by using the relative positions of the write pointer Pw and the read pointer Pr, a residual storage data amount not read out yet from the buffer memory 105 is measured sequentially. Fig. 2 (a) and Fig. 2 (b) are explanatory diagrams explaining measuring methods of the residual storage data amount, and there are two cases, Fig. 2 (a) and Fig. 2 (b), depending on the relative positions of the write pointer and the read pointer. In Fig. 2 (a) and Fig. 2 (b), supposing a start address of the buffer memory to be a_0, and an end address to be a_{n-1} (a_{n-1} > a_0), a residual storage data amount Z not read out yet is shown in shaded areas in Fig. 2 (a) and Fig. 2 (b), and is calculated as follows.

This is equivalent to handling the buffer memory 105 as a so-called cyclic memory. Usually, to read out and output the data from the buffer memory, the write pointer Pw must be ahead of the read pointer Pr on the cyclic memory, and therefore if Pw and Pr overlap (Pw = Pr), the read control section 106 stops reading out the data, and the read pointer Pr maintains the address at this time. In the overlapped state of Pw and Pr, two cases are considered, that is, Pr catches up with Pw in Fig. 2 (a), and Pw catches up with Pr in Fig. 2 (b). In the latter case, actually, the residual storage data amount corresponds to a capacity of the buffer memory 105, that is, Z = n, but in this case also the residual storage data amount Z is reset to 0.
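The calculation of Z is not reproduced in this text. A minimal sketch of the two cases of Fig. 2, assuming a cyclic buffer of n addresses starting at a_0 and treating the overlapped state Pw = Pr as Z = 0 as described above (the function name `residual_amount` is hypothetical):

```python
def residual_amount(pw, pr, a0, n):
    """Residual storage data amount Z in a cyclic buffer.

    pw, pr: current write and read pointers (addresses)
    a0:     start address of the buffer
    n:      number of addresses, so the end address is a0 + n - 1
    """
    if not (a0 <= pw < a0 + n and a0 <= pr < a0 + n):
        raise ValueError("pointer outside the buffer")
    if pw >= pr:
        # Case of Fig. 2 (a): the write pointer is ahead without wrap-around.
        # Pw == Pr yields 0, matching the reset described in the text.
        return pw - pr
    # Case of Fig. 2 (b): the write pointer has wrapped past the end address.
    return n - (pr - pw)


print(residual_amount(10, 4, 0, 16))   # write pointer ahead of read pointer
print(residual_amount(2, 12, 0, 16))   # write pointer wrapped around
```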

[0039] On the basis of a value of the residual storage data amount Z obtained in the residual storage data amount monitor section 107, in an adaptive speed control section 108, the speed of the time scale modification is set to a slow speed as close to the recording speed as possible when the residual storage data amount is small, or to a properly fast speed so that the write pointer Pw may not catch up with the read pointer Pr when the residual storage data amount is abundant. The operation of the adaptive speed control section 108 is explained below in a case of reproducing at a double (M = 2) speed of the recording speed from the recording and reproducing section 101. Herein, a maximum value of the modification speed is 2 same as the reproduction speed, and a minimum value of the modification speed is 1 same as the recording speed. Fig. 3 (a), (b), and (c) show a relation between the residual storage data amount and the modification speed, and these are rules for setting the modification speed. Fig. 3 (a) shows a rule of linear correspondence between the residual storage data amount and the modification speed. In this case, the modification speed V is calculated in the following formula.

Fig. 3 (b) shows an example of a rule of nonlinear correspondence between the residual storage data amount and the modification speed. Approximating the nonlinear correspondence by a quadratic curve, the modification speed V is calculated as follows.

In the case of Fig. 3 (a), the modification speed can be changed smoothly depending on increment or decrement of the residual storage data amount, while it is a feature of Fig. 3 (b) that the speed is stabilized near the recording speed 1 until the data is accumulated to a certain extent in the buffer memory 105.

[0040] Fig. 3 (c) relates to an example of defining the nonlinear correspondence on a staircase profile, and the modification speed V is calculated as follows.

A rule shown in Fig. 3 (c) can realize nearly the same control as the rule in Fig. 3 (b) in a smaller quantity of calculation and circuit scale.
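The formulas for the three rules are not reproduced in this text. The sketch below shows one plausible form of each, assuming Z ranges from 0 to the buffer capacity n and V ranges from 1 to 2 as stated above. The exact curves of Fig. 3, the number of steps, and all function names are assumptions.

```python
def speed_linear(z, n, v_min=1.0, v_max=2.0):
    """Linear rule in the manner of Fig. 3 (a): V grows in proportion to Z."""
    return v_min + (v_max - v_min) * z / n


def speed_quadratic(z, n, v_min=1.0, v_max=2.0):
    """Quadratic rule in the manner of Fig. 3 (b): V stays near v_min while Z is small."""
    return v_min + (v_max - v_min) * (z / n) ** 2


def speed_staircase(z, n, steps=4, v_min=1.0, v_max=2.0):
    """Staircase rule in the manner of Fig. 3 (c): V takes one of `steps` levels (steps >= 2)."""
    level = min(int(steps * z / n), steps - 1)  # cap so that Z == n maps to the top level
    return v_min + (v_max - v_min) * level / (steps - 1)


for z in (0, 50, 100):
    print(z, speed_linear(z, 100), speed_quadratic(z, 100), speed_staircase(z, 100))
```

The staircase version needs only a comparison per level, which is consistent with the remark that it requires less calculation and a smaller circuit than the quadratic rule.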

[0041] In this way, by determining the modification speed on the basis of the rules in Fig. 3 (a), Fig. 3 (b), or Fig. 3 (c), the modification speed can be set at an easy-to-hear speed close to the recording speed 1 for an input signal including more than a specified quantity of speechless portion, even in a signal reproduced at a double speed, or set at the maximum modification speed 2 if signals without the speechless portion are reproduced, so that data missing does not occur. Herein, the maximum value of the modification speed is 2 and the minimum value is 1, but the same rules can be applied if the maximum value is smaller than 2 (for example, 1.8) and the minimum value is greater than 1 (for example, 1.5). However, when the maximum value is set smaller than 2, if signals without the speechless portion are reproduced continuously at twice the recording speed, not all data can be read out and part of the data must be discarded. This corresponds to the case where Pw catches up with Pr in Fig. 2 (b), and it can be resolved by resetting the residual storage data amount to 0 as mentioned above and discarding the data accumulated so far, which corresponds to the capacity of the buffer memory. Supposing, for example, the capacity of the buffer memory to be 256k bits and handling 8-bit data per sample at 10 kHz sampling, speech data of 32k points (about 3.2 sec) is discarded. With this setting, although part of the data is discarded depending on the quantity of the speechless portion, almost all data can be reproduced stably at a slow easy-to-hear speed, by suppressing the maximum value of the modification speed.
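The figures in this example can be checked with a short calculation. The sketch takes k as 1000, which is an assumption; with k = 1024 the result is 32768 samples, or about 3.28 sec. The function name `discarded_span` is hypothetical.

```python
def discarded_span(capacity_bits, bits_per_sample, sampling_rate_hz):
    """Number of samples held by a full buffer and their duration in seconds."""
    samples = capacity_bits // bits_per_sample
    seconds = samples / sampling_rate_hz
    return samples, seconds


samples, seconds = discarded_span(256_000, 8, 10_000)
print(samples, seconds)
```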

[0042] The value of the modification speed V determined in the adaptive speed control section 108 is sent out into a time scale compressing section 109, and the time scale modification is set depending on the modification speed V. Fig. 4 is a block diagram showing a detailed constitution of the time scale compressing section 109. In Fig. 4, reference numeral 401 denotes a control circuit for controlling the time scale compressing section, reference numeral 402 denotes a changeover circuit for changing over, according to a command from the control circuit, between a cross fade processing section in which weighted addition is performed and a non-processing section, reference numeral 403 denotes a latch circuit for temporarily holding the data, and reference numeral 404 denotes a cross fade circuit for weighted addition processing. Other sections are the same as those of the same names in Fig. 1 and are hence identified with the same reference numerals. Referring to Fig. 4, operation of the time scale compressing section 109 is described below.

[0043] The control circuit 401 first determines a cross fade section length K and a non-processing section length S in order to realize the modification speed V. Herein, the cross fade section length K is fixed, but K may be made variable depending on the modification speed V. Fig. 5 (a) and Fig. 5 (b) are schematic diagrams for explaining the time scale modification processing; Fig. 5 (a) shows a data row before processing the data, and Fig. 5 (b) shows a data row after processing the data. The portion corresponding to the cross fade section length K of the data in Fig. 5 (b) is the result of cross fade processing of data A and data B. To realize the modification speed V, the length S should be determined so that the data length (K + S) after time scale processing is 1/V of the total length (2K + S) of the data A, B, C before processing. The non-processing section length S is determined by the following expression.
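The expression itself is not reproduced in this text, but it follows directly from the length condition stated above. A short derivation, assuming 1 < V ≦ 2 and assuming that the result is the expression (6) referred to later:

```latex
\frac{2K + S}{V} = K + S
\;\Longrightarrow\;
2K + S = VK + VS
\;\Longrightarrow\;
S = \frac{(2 - V)\,K}{V - 1}, \qquad 1 < V \le 2 .
```

For example, K = 100 and V = 1.5 give S = 100, so that 300 input samples become 200 output samples. At V = 2 the non-processing section vanishes (S = 0), and as V approaches 1 the length S grows without bound, which corresponds to performing no cross fading at all.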

[0044] Cross fade processing is explained supposing that the read pointer Pr indicates the beginning of the data row A of Fig. 5 (a). The control circuit 401 changes over the changeover circuit 402 to the cross fade processing side, and instructs the read control section 106 to read out the data indicated by the read pointer Pr. The data is fed to and held in the latch circuit 403. The control circuit 401 then instructs the read control section 106 to read out the data at the address Pr + K, which is K samples ahead, and this data is put directly into the cross fade circuit 404. The cross fade circuit 404 executes weighted addition by using the data indicated by the read pointer Pr and the data at the address Pr + K. Herein, the data row A in Fig. 5 (a) is supposed to be d(0), d(1), ..., d(K-1), and the data row B to be d(K), d(K+1), ..., d(2K-1). Supposing the monotonically increasing weighting function to be w₁(t) (where 0 ≦ w₁(t) ≦ 1, t = 0, 1, ..., K-1), and the monotonically decreasing weighting function to be w₂(t) = 1 - w₁(t), the value c(t) after weighted addition is obtained by the following equation.

Thereafter, the read pointer Pr is incremented, and the control circuit 401 repeats the same processing K times in succession. After all of the cross fade processing of the data rows A and B in Fig. 5 (a) is completed, the value of Pr + K at that moment is set as the read pointer. When the cross fade processing is over, the control circuit 401 changes over the changeover circuit 402 to the non-processing side, and data of the length S determined by the expression (6) is read out from the buffer memory 105 and put directly into a D/A converter 110. Thereafter, by alternately repeating output of the data of the length K after the cross fade processing and the data of the length S, the time scale modification giving the modification speed V is realized. When the modification speed set in the adaptive speed control section 108 is changed at a certain point, the non-processing section length is updated by the expression (6), and similar processing is continued thereafter, thereby varying the modification speed as desired.
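The cross fade equation is not reproduced in this text. The sketch below illustrates the alternating cross fade and pass-through processing under two assumptions: the falling weight w₂ is applied to the data row A and the rising weight w₁ to the data row B, that is, c(t) = w₂(t)·d(t) + w₁(t)·d(t + K), and w₁ is a linear ramp. The function name `compress` is hypothetical.

```python
def compress(data, k, s):
    """Time-scale compress `data` by cross fading K-sample rows A and B
    and then copying S samples (row C) unchanged, repeatedly.

    Each cycle consumes 2K + S input samples and emits K + S output samples.
    A tail shorter than one full cycle is dropped in this sketch.
    """
    out = []
    pr = 0  # read pointer
    while pr + 2 * k + s <= len(data):
        for t in range(k):
            w1 = t / k       # rising weight for row B (assumed linear ramp)
            w2 = 1.0 - w1    # falling weight for row A
            out.append(w2 * data[pr + t] + w1 * data[pr + k + t])
        pr += 2 * k                    # rows A and B are both consumed
        out.extend(data[pr:pr + s])    # non-processing section, copied as-is
        pr += s
    return out


signal = [1.0] * 600
result = compress(signal, k=100, s=100)
print(len(signal) / len(result))  # ratio of input length to output length
```

With K = 100 and S = 100 every 300 input samples yield 200 output samples, so the ratio of input to output length is 1.5, which is the modification speed V for that choice of S.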

[0045] The data row thus processed by time scale modification is finally converted into an analog signal at the period T in the D/A converter 110, thereby obtaining an audio signal whose speed changes adaptively below the reproducing speed M at the same pitch as in recording.

[0046] According to the first embodiment described so far, since the apparatus for time scale modification of speech comprises the speech judging section 103, the residual storage data amount monitor section 107 for measuring the residual storage data amount from the relative positions of the write pointer and the read pointer, and the adaptive speed control section 108 for determining the speed of time scale modification depending on the residual storage data amount, the modification speed is controlled to be gradually slower when the residual storage data amount is small and gradually faster when the residual storage data amount is large, so that the audio signal reproduced at a high speed may be heard at a slow speed below the reproducing speed depending on the quantity of the speechless portion contained therein, with almost no missing of information. Besides, since the apparatus comprises the time scale compressing section 109 for modifying the time scale at a desired modification speed by adjusting the cross fade section length and the non-processing section length, time scale modification of high quality is realized, and in particular when the cross fade section length is fixed at a preset value, an arbitrary speed of time scale modification is achieved only by changing the length of the non-processing section, so that the speech time scale modification apparatus can be realized in a very simple constitution. In particular, in a recording and reproducing section accompanied by images such as the VTR, for example, the images can be reproduced at double speed while only the sound is reproduced at a slow speed of less than the double speed, and hence its effect is great.

[0047] Incidentally, in the first embodiment, the operation of the write control section 104 may be done as follows. Fig. 6 is a flow chart showing another operation of the write control section. Referring now to Fig. 6, this other operation of the write control section is described below.

[0048] The write control section 104 sequentially takes in the values of the residual storage data amount Z measured by the residual storage data amount monitor section 107 (S601), and compares them with the preset threshold value Zth (S602). If Z is greater than Zth, that is, if there is a sufficient residual storage data amount, it is judged from the result of the speech judging section 103 whether the present input data is speech or speechless (S603), the data is written into the buffer memory 105 only in the case of the speech portion (S604), and the write pointer Pw is incremented (S605). If the judging condition at S602 is not satisfied, that is, if the residual storage data amount is insufficient, the data is written into the buffer memory 105 regardless of the speech judgement, and the write pointer Pw is incremented. This series of processing ensures that, in the case of a signal containing a large speechless portion, the read pointer Pr does not catch up with the write pointer Pw in Fig. 2 (a), that is, the residual storage data amount does not become 0.
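The flow of steps S601 to S605 can be sketched as a single decision per input sample. This is a minimal illustration assuming a list used as a cyclic buffer; the function name `write_step` and its argument names are hypothetical.

```python
def write_step(sample, speech, z, z_th, buffer, pw):
    """One pass of the write control flow; returns the updated write pointer.

    sample: present input data
    speech: judgement from the speech judging section (True for speech)
    z:      residual storage data amount Z (S601)
    z_th:   preset threshold Zth
    """
    if z > z_th and not speech:
        # S602 satisfied and S603 says speechless: the sample is not stored.
        return pw
    # Either the sample is speech, or the buffer is running low (Z <= Zth),
    # in which case the sample is stored regardless of the judgement.
    buffer[pw] = sample               # S604
    return (pw + 1) % len(buffer)     # S605, with wrap-around for the cyclic buffer


buf = [0] * 4
pw = write_step(7, False, 10, 5, buf, 0)  # enough data, speechless: skipped
pw = write_step(8, True, 10, 5, buf, pw)  # speech: stored
pw = write_step(9, False, 2, 5, buf, pw)  # buffer low: stored although speechless
print(buf, pw)
```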

[0049] In this way, by comprising the write control section for accumulating all data in the buffer memory when the residual storage data amount is less than a preset value, the residual storage data amount does not become 0, and the reproduced sound is prevented from being interrupted (being in a mute state), thereby realizing a speech time scale modification apparatus capable of reproducing naturally without any sense of strangeness.

[0050] In the first embodiment as explained above, analog signals are recorded in the recording and reproducing section 101, but the embodiment may be realized similarly when handling digital signals. In this case, the digital signals of the sampling period T are directly fed into the speech judging section 103, and the same processing is carried out thereafter, so that the signals adaptively modified in time scale are output.

[0051] An outline of a second embodiment of the invention is described below. This embodiment relates to a speech time scale modification apparatus in which, when reading out the sound signal recorded on a recording medium at the same speed as a recording speed, the time scale is changed so that the speed is properly set below the recording speed depending on the quantity of the speechless portion, which is particularly effective in making fast speech easier to hear. Fig. 7 is a block diagram showing a constitution of a speech time scale modification apparatus in the second embodiment. An operation of the second embodiment is specifically described below.

[0052] The acoustic signal recorded in a recording and reproducing section 101 is reproduced at the same speed (M=1) as the recording speed (=1), and is converted into a digital signal in a sampling period T in an A/D converter 102. This digital signal is sequentially fed into a speech judging section 103 to judge a speech or speechless portion, and only the signal judged to be a speech portion is written into a buffer memory 105 while a write control section 104 controls the pointer Pw of the address to be written in. A read control section 106 reads out the data sequentially from the buffer memory 105 and sends them out into a time scale expanding section 702, while controlling a read pointer Pr. In a residual storage data amount monitor section 107, the residual storage data amount Z not yet read out is measured from the current read pointer Pr and the current write pointer Pw. So far, the operation is the same as in the first embodiment, except that the value of the reproduction speed M is different.

[0053] On the basis of the value of the residual storage data amount Z obtained in the residual storage data amount monitor section 107, an adaptive speed control section 701 sets the speed of time scale modification to a speed slower than the recording speed 1 when the residual storage data amount is small, or to a speed close to the recording speed 1 when the residual storage data amount is large, so that the write pointer Pw does not catch up with the read pointer Pr. The operation of the adaptive speed control section 701 is explained below for a reproduction speed M=1 from the recording and reproducing section 101. Herein, the maximum value of the modification speed is supposed to be 1, the same as the reproducing speed, and the minimum value to be V_o (where 0 < V_o < 1). Fig. 8 (a), Fig. 8 (b), and Fig. 8 (c) show the relation between the residual storage data amount and the corresponding modification speed, and present rules for setting the modification speed. Fig. 8 (a) shows a rule of linear correspondence between the residual storage data amount and the modification speed. In this case, the modification speed V is calculated in the following formula.

[0054] Fig. 8 (b) shows an example of a rule of nonlinear correspondence between the residual storage data amount and the modification speed. By corresponding by quadratic curve, the modification speed V can be calculated in the following formula.

In the case of Fig. 8 (a), the modification speed can be smoothly changed depending on increment or decrement of the residual storage data amount, while in the case of Fig. 8 (b), it is stabilized nearly at the recording speed 1 until the data are accumulated to a certain extent in the buffer memory 105.

[0055] Fig. 8 (c) shows a case of staircase definition of the nonlinear correspondence, and the modification speed V can be calculated as follows.

The rule shown in Fig. 8 (c) can realize nearly same control as in the rule in Fig. 8 (b) in a smaller quantity of operation and circuit scale.
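
As an illustration of how such rules map the residual storage data amount Z to a modification speed V between V_o and 1, the linear and staircase rules might be sketched as below. The exact formulas are not reproduced in this text, so everything here is an assumption: the buffer capacity `z_max`, the function names, and the threshold and speed tables are hypothetical, and only the end points (V_o for an empty buffer, 1 for a full one) follow the description.

```python
from bisect import bisect_right


def speed_linear(z, z_max, v_o):
    """Linear rule: V rises from v_o (buffer empty) to 1.0 (buffer full)."""
    r = min(max(z / z_max, 0.0), 1.0)   # clamp the fill ratio to [0, 1]
    return v_o + (1.0 - v_o) * r


def speed_staircase(z, thresholds, speeds):
    """Staircase rule: pick one of len(thresholds) + 1 preset speeds.

    thresholds must be sorted in ascending order; speeds[i] applies while
    z lies below thresholds[i], and speeds[-1] applies above the last one.
    """
    return speeds[bisect_right(thresholds, z)]
```

A lookup of this kind needs only comparisons, which matches the remark that the staircase rule reduces the quantity of operation.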

[0056] By thus determining the modification speed on the basis of the corresponding rules in Fig. 8 (a), Fig. 8 (b), and Fig. 8 (c), for signals reproduced at the recording speed, a slow speed down to V_o, below the recording speed, may be realized when the input signal contains more than a specified quantity of speechless portions. When signals not containing the speechless portion continue, the maximum modification speed 1 is set, so that data missing does not occur.

[0057] The value of the modification speed V determined in the adaptive speed control section 701 is sent out into the time scale expanding section 702, and the time scale is modified depending on the modification speed V.

[0058] Fig. 9 is a block diagram showing a detailed description of the time scale expanding section 702. In Fig. 9, reference numeral 901 is a control circuit for controlling the entire time scale expanding section, reference numeral 902 is a changeover circuit for changing over cross fade processing section or non-processing section for weighting and adding according to the command from the control circuit, reference numeral 903 is a latch circuit for temporarily holding the data, and reference numeral 904 is a cross fade circuit for weighting addition processing, and other sections are same as those in the same names in Fig. 1 and are hence identified with same reference numerals. Referring to Fig. 9, an operation of the time scale expanding section 702 is described below.

[0059] The control circuit 901 first determines the cross fade section length K and the non-processing section length S in order to realize the modification speed V. Herein, the cross fade section length is fixed value K, but the value of K may be variable depending on the modification speed V.

[0060] Fig. 10 (a) and Fig. 10 (b) are schematic diagrams for explaining the time scale modification processing; Fig. 10 (a) shows the data before processing, and Fig. 10 (b) shows the data after processing. In addition, the portion of length K enclosed between data row A and data row B is the data row obtained by cross fade processing of the data row A and the data row B.

[0061] To realize the modification speed V, the length S should be determined so that 1/V of the length (2K + S) of the total of the data rows before processing A, B, C may be the data length (3K + S) after time scale processing. The non-processing section length S is determined in the following expression.
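
The condition stated above fixes S once K and V are given. The following is a reconstruction from that condition alone, offered as an assumption because the original expression (11) is not reproduced in this text:

$$\frac{2K + S}{V} = 3K + S \quad\Longrightarrow\quad S = \frac{K\,(3V - 2)}{1 - V}$$

For example, with K = 100 and V = 0.8 this gives S = 200, so that 2K + S = 400 input samples become 3K + S = 500 output samples, and 400/500 = 0.8. Under this reading S is non-negative only for 2/3 ≦ V < 1.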

[0062] Supposing the read pointer Pr indicates a beginning of the data row A of Fig. 10 (a), the cross fade processing is explained. The cross fade processing comprises three processes.

[0063] The first process is explained. Fig. 11 is a flow chart showing part of the cross fade processing. First, referring to the modification speed V, the control circuit 901 changes over the changeover circuit 902 to the non-processing side (S1101). It then commands the read control section 106 to read out the data indicated by the read pointer Pr (S1102). The read data is put into the D/A converter 110 without being processed (S1103). Finally the read pointer Pr is incremented (S1104). The same operation is repeated until data row A is processed completely.

[0064] The second process is explained. The control circuit 901 commands the read control section 106 so that the read pointer Pr may indicate the beginning data of the data row A. The control circuit 901 changes over the changeover circuit 902 to the cross fade processing side, and commands the read control section 106 to read out the data indicated by the pointer Pr. The data are fed to and held in the latch circuit 903. The control circuit 901 then commands the read control section 106 to read out the data at address Pr+K, K samples ahead, and the data are directly put into the cross fade circuit 904. The cross fade circuit executes weighted addition by using these two sets of data. Herein, the data row A in Fig. 10 (a) is supposed to be d(0), d(1), ..., d(K-1), and the data row B to be d(K), d(K+1), ..., d(2K-1). Supposing the monotonously increasing weighting function to be w₁(t) (where 0 ≦ w₁(t) ≦ 1, t = 0, 1, ..., K-1), and the monotonously decreasing weighting function to be w₂(t) = 1 - w₁(t), the value c(t) after weighted addition is obtained in the following equation.

Thereafter, the read pointer Pr is increased, and the control circuit 901 repeats same processing K times continuously, and after completion of all cross fade processing of the data rows A and B in Fig. 10(a), the value of Pr + K at that moment is set at the read pointer.

[0065] A third process is explained. At an end of the second process, the read pointer Pr indicates the beginning of the data row B, and the same processing on the data row in the first process is conducted on the data row B. More specifically, the control circuit 901 changes over the changeover circuit 902 to the non-processing side. It also commands the read control section 106 to read out the data indicated by the read pointer Pr. The read data is put into the D/A converter 110 directly without being processed. Finally, the read pointer Pr is increased. This series of processing is repeated on the data row B.
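
The three processes can be sketched for one pair of data rows as follows. This is a minimal illustration, not the circuit of Fig. 9. The function name `expand_block`, the linear weights, and the pairing of the increasing weight w1 with data row A and the decreasing weight w2 with data row B are assumptions; the pairing is chosen so that the cross fade section starts close to the beginning of B and ends close to the end of A, and therefore joins smoothly to the unprocessed rows on both sides.

```python
def expand_block(d, K):
    """Expand 2K input samples (rows A and B) to 3K output samples.

    Output order: row A unchanged, cross fade of A and B, row B unchanged.
    """
    if len(d) != 2 * K:
        raise ValueError("expected exactly 2K samples")
    a, b = d[:K], d[K:]

    out = list(a)                       # first process: row A as is
    for t in range(K):                  # second process: cross fade
        w1 = (t + 1) / (K + 1)          # increasing weight (assumed linear)
        w2 = 1.0 - w1                   # decreasing weight
        out.append(w1 * a[t] + w2 * b[t])
    out.extend(b)                       # third process: row B as is
    return out
```

The S non-processed samples that follow are passed through unchanged, so each cycle turns 2K + S input samples into 3K + S output samples.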

[0066] When the cross fade processing is over, the control circuit 901 changes over the changeover circuit 902 to non-processing side, and the number of data corresponding to the length S determined in formula (11) is read out from the buffer memory 105, and directly put into the D/A converter 110.

[0067] Thereafter, by alternately repeating the cross fade processing of length 3K and output of non-processed data in length S, the modification of time scale for giving the modification speed V is realized. When the modification speed set in the adaptive speed control section 701 is changed at a certain point, the non-processing section length is changed in formula (11), and the same process is continued, so that the modification speed may be changed whenever desired.

[0068] The data row thus modified in time scale is finally converted into an analog signal in the period T by the D/A converter 110, thereby obtaining the acoustic signal adaptively changing over the speed below the recording speed 1 at the same pitch as when recording.

[0069] In the second embodiment, the operation of the write control section 104 in Fig. 7 can be changed to the flow chart in Fig. 6 same as in the first embodiment.

[0070] According to the second embodiment, as described herein, comprising the speech judging section 103, the residual storage data amount monitor section 107, and the adaptive speed control section 701 for determining the speed of time scale modification depending on the residual storage data amount, by controlling at a speed close to the reproducing speed 1 when the residual storage data amount is much, and at a slow speed below 1 gradually when the residual storage data amount is less, the sound signal reproduced at the recording speed can be heard at a slow speed below the recording speed depending on the quantity of the speechless portion contained therein. It is particularly effective for hearing sound signal of fast speech.

[0071] In the second embodiment, analog signals are recorded in the recording and reproducing section 101, but it may be realized similarly in the case of digital signals. In this case, the digital signals of the sampling period T are directly fed into the speech judging section 103, and the same processing is carried out thereafter, so that the signals adaptively modified in time scale are output.

[0072] An outline of a third embodiment of the invention is described below. In this embodiment relating to a speech time scale modification apparatus, when reproducing the acoustic signal at a slower speed than the recording speed, a larger expanding ratio than in the speech portion is set in the speechless portion in the input signal depending on the degree of accumulation of data to be output, and the speech portion is changed to a speed as close to the recording speed as possible, so that the ease of hearing of sound in low speed reproduction is enhanced.

[0073] Fig. 12 is a block diagram showing a constitution of the speech time scale modification apparatus in the third embodiment. Its operation is described in detail below while referring to Fig. 12.

[0074] First, acoustic signals are read out from a recording and reproducing section 1201 at a speed of M times (0 < M < 1) the recording speed. Supposing the sampling period in recording in the recording and reproducing section 1201 to be T, the acoustic signals reproduced at M times speed from the recording and reproducing section 1201 are sequentially changed into digital signal series at sampling period T/M by the A/D converter, and written into an input buffer 1203.

[0075] The data being read out from the input buffer 1203 is fed into the speech judging section 1204, where the sample value row is judged to be the speech portion or the speechless portion. The speech or speechless judgement may be done under the condition in the formula (1) explained in the first embodiment. On the basis of the judgement, a time scale expanding section 1206 performs time scale expansion on the data being read out from the input buffer 1203, and issues the result to an output buffer 1208. At this time, the residual storage data not yet issued to a D/A converter 1211 is monitored at every specific time in a residual storage data monitor section 1209, and depending on the remainder, an expanding ratio determining section 1210 determines an expanding ratio Es applied in the speechless portion and an expanding ratio Ev applied in the speech portion. Fig. 13 (a) and Fig. 13 (b) are explanatory diagrams showing a setting method of the expanding ratio in the expanding ratio determining section 1210. The example in Fig. 13 (a) is a case in which the residual storage data and the expanding ratio correspond by a linear function, which prevents the output buffer 1208 from becoming empty by increasing the expanding ratio when the residual storage data Z obtained in the residual storage data monitor section 1209 is small, that is, when the output buffer 1208 is nearly empty. In this case, the expanding ratios Es, Ev for the speechless portion and the speech portion are obtained in formulas (13) and (14) respectively.

Herein the expanding ratio of the speechless portion is larger than the expanding ratio of the speech portion because it is intended to prevent the output buffer 1208 from being empty even if the expanding ratio of the speech portion is lowered. In the example in Fig. 13 (b), the expanding ratio of the speech portion is 1.0 so far as the residual storage data is not 0, that is, the speech portion is reproduced at the same speed as the recording speed. The expanding ratio of the speechless portion corresponds by a quadratic function. In this case, the expanding ratios Es, Ev of the speechless portion and the speech portion are expressed in formulas (15) and (16) respectively.

In this case, if the expanding ratio in the speech portion is fixed at 1, the residual storage data in the output buffer 1208 decreases rapidly when the speech portion continues, and hence the expanding ratio in the speechless portion is set generally larger, so that the data may be easily accumulated in the output buffer. Although expanding the time scale can prevent the output buffer 1208 from being empty, an excessively large expanding ratio may cause the data to exceed the capacity of the output buffer, so that the continuity of the output signals cannot be maintained. Hence, as the residual storage data increases, the expanding ratio is kept low.
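
A linear rule of the kind described for Fig. 13 (a) might be sketched as follows. Formulas (13) and (14) are not reproduced in this text, so this is only an assumed form: the output buffer capacity `z_max`, the maximum speechless ratio `es_max`, and the function name are hypothetical. The end points are chosen so that Es stays at 1/M or more and Ev stays between 1.0 and 1/M, both falling as the residual storage data Z grows.

```python
def expanding_ratios_linear(z, z_max, m, es_max):
    """Return (Es, Ev) for residual data z under an assumed linear rule.

    z = 0      -> Es = es_max, Ev = 1/M  (buffer nearly empty, expand most)
    z = z_max  -> Es = 1/M,    Ev = 1.0  (buffer nearly full, expand least)
    """
    r = min(max(z / z_max, 0.0), 1.0)        # clamp the fill ratio to [0, 1]
    inv_m = 1.0 / m
    es = es_max - (es_max - inv_m) * r       # speechless portion
    ev = inv_m - (inv_m - 1.0) * r           # speech portion
    return es, ev
```

For M = 0.5 and `es_max` = 4.0, a half-full buffer gives Es = 3.0 and Ev = 1.5.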

[0076] Thus, the expanding ratio determining section 1210 determines the expanding ratios Ev, Es of speech and speechless portions in every specific period according to the rule shown in Fig. 13, and sends them to the time scale expanding section 1206. In the time scale expanding section 1206, on the basis of the expanding ratios, the time scale is expanded at the expanding ratio Ev of speech portion in the speech portion and the expanding ratio Es of speechless portion in the speechless portion.

[0077] Fig. 14 (a), (b), (c) are schematic diagrams showing the operation of the time scale expanding section 1206 in an example of reproducing the recording medium at 2/3 times (M=2/3) of the recording speed.

[0078] Fig. 14 (a) shows the time series of input signals in recording, and Fig. 14 (b) shows a signal row when reproducing the sound from the recording medium at a reproducing speed of M=2/3. In Fig. 14 (c), blocks 1, 2, 3 are the speechless portions, and blocks 4, 5, 6 are the speech portions, and the signal row after processing is shown, at the expanding ratio Ev of speech portion of 1.0 as given by the expanding ratio determining section 1210, and the expanding ratio Es of speechless portion of 2.0. Herein, in the judged speechless portions (blocks 1, 2, 3), as shown in the second embodiment, the time scale modification of expanding ratio 2.0 is realized by inserting the cross fade processing section in formula (12), and the data is accumulated in the output buffer 1208. In the judged speech portions (blocks 4, 5, 6), since the expanding ratio is 1.0, the data is directly accumulated in the output buffer 1208. When the expanding ratios obtained from the expanding ratio determining section 1210 are changed, the expanding ratio is set again in the time scale expanding section 1206, and the time scale expanding process as shown in Fig. 14 (c) is continued.

[0079] In this way, by properly setting again the expanding ratio while monitoring the quantity of data accumulated in the output buffer 1208, and absorbing the excess or shortage of time of the output data in the output buffer, the expanding ratio can be set independently for the speechless portion and the speech portion even if the rate of the speechless portion in the signal cannot be expected.
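
The balance that the output buffer has to absorb can be made explicit with a short calculation. It rests on two assumptions not stated in this passage: the D/A converter 1211 outputs samples at the recording sampling period T, and a fraction p of the input samples is judged speechless. Input then arrives at M samples per period T, each input sample yields p·Es + (1 − p)·Ev output samples on average, and the output buffer neither fills nor drains when

$$M\,\bigl(p\,E_s + (1 - p)\,E_v\bigr) = 1 \quad\Longleftrightarrow\quad p\,E_s + (1 - p)\,E_v = \frac{1}{M}$$

In the example of Fig. 14, taking the six blocks to be of equal length gives p = 1/2, and with Es = 2.0 and Ev = 1.0 the left side is 1.5, which equals 1/M for M = 2/3. When p is smaller than this, the left side falls below 1/M and the buffer drains, which is the shortage that raising the expanding ratios corrects.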

[0080] Thus, according to the third embodiment, by independently setting the time scale expanding ratio in the speech portion and the speechless portion depending on the residual storage data, setting the expanding ratio of speech portion at 1/M when the residual storage data is less than a predetermined quantity to prevent the output signal from being interrupted, and controlling the expanding ratio so that the speech portion may be close to the sound speed in recording as far as possible, easy-to-hear reproduced sound without feel of strangeness can be obtained even if the reproducing speed from the recording medium is slow.

[0081] In the third embodiment, analog signals are recorded in the recording and reproducing section 1201, but it may be realized similarly in the case of digital signals. In this case, the digital signals of the sampling period T are directly fed into the input buffer 1203, and the same processing as in the third embodiment is carried out thereafter, so that the signals adaptively modified in time scale are output.

Claims

1. A speech time scale modification apparatus comprising a speech judging section for judging a speech portion and a speechless portion of an acoustic signal, a buffer memory for storing data of the acoustic signal, a memory control section for controlling writing of the data judged to be the speech portion in the speech judging section into the buffer memory and reading of the data from the buffer memory, and a time scale modification section for determining a time scale modification speed depending on an amount of residual storage data which have not been read out from the buffer memory, and modifying time scale of the acoustic signal depending on the time scale modification speed.

2. A speech time scale modification apparatus comprising a recording and reproducing section for reproducing an acoustic signal stored in a recording medium at a reproduction speed of M (M is a real number and more than one) times a recording speed, a speech judging section for judging a speech portion and a speechless portion of the acoustic signal, a buffer memory for storing data of the acoustic signal, a write control section for controlling a write address of the buffer memory so as to write the data of the acoustic signal judged to be the speech portion in the speech judging section into the buffer memory, a read control section for controlling reading of the data from the buffer memory and a read address of the buffer memory, a residual storage data amount monitor section for monitoring a residual storage data amount in the buffer memory from a current write address of the buffer memory and a current read address of the buffer memory, an adaptive speed control section for determining a modification speed of the data depending on the residual storage data amount obtained from the residual storage data amount monitor section, and a time scale compressing section for compressing time scale of the acoustic signal depending on the modification speed determined in the adaptive speed control section.

3. A speech time scale modification apparatus of claim 2, wherein the adaptive speed control section determines the modification speed in proportion to the residual storage data amount in the buffer memory, by defining the modification speed below the reproduction speed and above the recording speed.

4. A speech time scale modification apparatus of claim 2, wherein the adaptive speed control section determines the modification speed on a basis of a modification rule corresponding nonlinearly to the residual storage data amount, by defining the modification speed below the reproduction speed and above the recording speed.

5. A speech time scale modification apparatus of claim 2, wherein the time scale compressing section adjusts the time scale according to the modification speed determined in the adaptive speed control section, by adjusting a length of a cross fade processing section for adding products of a sample value row in a specific number of adjacent pieces respectively multiplied by a monotonously decreasing weighting coefficient and multiplied by a monotonously increasing weighting coefficient, and a length of a non-processing section for issuing the data directly, and issuing the length of the cross fade processing section and the length of the non-processing section alternately.

6. A speech time scale modification apparatus of claim 2, wherein the write control section controls the write address so as to store only the data judged to be the speech portion in the speech judging section into the buffer memory when the residual storage data amount is more than a specific quantity in the residual storage data amount monitor section, and to store all data into the buffer memory, regardless of judgement of the speech judging section when the residual storage data amount is less than the specific quantity in the residual storage data amount monitor section.

7. A speech time scale modification apparatus comprising a recording and reproducing section for reproducing an acoustic signal recorded in a recording medium at a reproduction speed same as a recording speed, a speech judging section for judging a speechless portion and a speech portion of the acoustic signal, a buffer memory for storing data of the acoustic signal, a write control section for controlling a write address of the buffer memory so as to write the data of the acoustic signal judged to be the speech portion in the speech judging section into the buffer memory, a read control section for controlling reading of the data from the buffer memory and a read address of the buffer memory, a residual storage data amount monitor section for monitoring a residual storage data amount in the buffer memory from a current write address of the buffer memory and a current read address of the buffer memory, an adaptive speed control section for determining a modification speed of the data depending on the residual storage data amount obtained from the residual storage data amount monitor section, and a time scale expanding section for expanding time scale of the acoustic signal depending on the modification speed determined in the adaptive speed control section.

8. A speech time scale modification apparatus of claim 7, wherein the adaptive speed control section determines the modification speed in proportion to the residual storage data amount in the buffer memory, by defining the modification speed below the reproduction speed and above the recording speed.

9. A speech time scale modification apparatus of claim 7, wherein the adaptive speed control section determines the modification speed on a basis of a modification rule corresponding nonlinearly to the residual storage data amount, by defining the modification speed below the reproduction speed and above the recording speed of the recording medium.

10. A speech time scale modification apparatus of claim 7, wherein the time scale expanding section adjusts the time scale according to the modification speed determined in the adaptive speed control section, by adjusting a length of a section D linking in a sequence of A-C-B of sample value sections A, B of a specific number of adjacent pieces, A being followed by B, and a cross fade processing section C obtained by products of sample value sections in a specific number of adjacent pieces respectively multiplied by a monotonously decreasing weighting coefficient and multiplied by a monotonously increasing weighting coefficient, and a length of a non-processing section E for issuing the data directly, and issuing the section D and the non-processing section E alternately.

11. A speech time scale modification apparatus of claim 7, wherein the write control section controls the write address so as to store only the data judged to be the speech portion in the speech judging section into the buffer memory when the residual storage data amount is more than a specific quantity in the residual storage data amount monitor section, and to store all data into the buffer memory, regardless of judgement of the speech judging section when the residual storage data amount is less than the specific quantity in the residual storage data amount monitor section.

12. A speech time scale modification apparatus comprising a recording and reproducing section for reproducing an acoustic signal recorded in a recording medium at a reproduction speed of M ( M is a real number, 0 < M < 1) times a recording speed, an input buffer for storing data of the acoustic signal, a speech judging section for judging a speechless portion and a speech portion of the acoustic signal, a time scale expanding section for expanding time scale of the data of the acoustic signal of the input buffer by independently setting a time scale expanding ratio to the speechless portion and a time scale expanding ratio to the speech portion from a judging result of the speech judging section, an output buffer for storing output data of the time scale expanding section, a residual storage data amount monitor section for monitoring a residual storage data amount of the output data stored in the output buffer, and expanding ratio control section for determining an expanding ratio of time scale modification of the speech portion and the speechless portion depending on the residual storage data amount obtained from the residual storage data amount monitor section.

13. A speech time scale modification apparatus of claim 12, wherein the expanding ratio control section determines the expanding ratio of time scale modification of the speechless portion at 1/M or more, and the expanding ratio of time scale modification of the speech portion in a range of 1.0 or more and 1/M or less, depending on the residual storage data amount.

14. A speech time scale modification apparatus of claim 12, wherein the expanding ratio control section determines the expanding ratio of time scale modification of the speech portion at 1/M when the residual storage data amount is below a specified value, or at a fixed value otherwise, and the expanding ratio of time scale modification of the speechless portion in a range of 1/M or more, depending on the residual storage data amount.
