BACKGROUND OF THE INVENTION
1. Field of the Invention
[0001] The present invention relates to a speech time scale modification apparatus capable
of varying the reproduction speed of an acoustic signal, mainly speech, without changing
its pitch, and more particularly to a speech time scale modification apparatus used
for variable speed reproduction of an acoustic signal in a video tape recorder (VTR)
or a language learning system.
2. Description of the Prior Art
[0002] Many audio and visual (AV) products have recently been introduced, broadcasting
and information networks have spread and advanced, and society is now flooded with
AV information. Many products with functions for viewing and listening to AV information
efficiently and easily are being developed. In particular, a representative example
of a function for listening efficiently to audio information, mainly speech, is the
fast-listening playback function of a cassette tape recorder or an automatic answering
telephone. That is, reproducing normally recorded speech at, for example, double speed
allows it to be heard in half the time. As a similar function for AV information,
the variable speed reproducing function of the VTR is known. This allows AV information
to be viewed and heard in a short time by reproducing the recorded pictures and sounds
at a high speed in synchronism.
[0003] Language learning systems also provide a function for varying the reproduction
speed of speech. For a beginner, speech by a native speaker at natural speed is very
hard to follow. By reproducing such natural speed speech at a low speed, the system
makes the foreign language easier to hear and enhances the language learning effect.
[0004] Incidentally, when speech recorded on a tape is reproduced at a high or low speed,
the pitch of the reproduced speech usually changes as well, which makes the speech
very hard to hear: at a high speed the pitch is raised, and at a low speed the pitch
is lowered. Therefore, such systems generally process the speech so that its pitch
does not change during variable speed reproduction.
[0005] The most general method for varying only the reproduction speed without changing
the pitch is, for example, the method proposed by G. Fairbanks, W.L. Everitt and R.P.
Jaeger, "Method for time or frequency compression-expansion of speech", in Jae S. Lim,
"Speech Enhancement", pp. 302-307, Prentice-Hall Inc., 1983. This example uses a rotary
head and a tape, and the data reading speed is changed according to the desired speed.
Since the rotating speed of the rotary head and the running speed of the tape differ,
data in a quantity corresponding to the difference between the rotating speed and
the running speed are regularly discarded or duplicated. For example, when reproducing
at double speed, the data are first read out at twice the recording rate. In this
state the pitch is doubled, and hence half of the data are discarded. Finally, the
remaining data are reproduced at the same period as in recording, so that only the
reproduction speed is doubled and the pitch is unchanged. In the above method, however,
the quality of the speech deteriorates significantly in variable speed reproduction
because data are discarded or duplicated. Accordingly, a speech time scale modification
apparatus with improved sound quality has also been proposed; its basic concept is
described, for example, by R.J. Scott and S.E. Gerber, "Pitch-synchronous tone compression
of speech", in Jae S. Lim, "Speech Enhancement", pp. 308-310, Prentice-Hall Inc.,
1983. Here, the pitch periods of the speech signal are extracted, and the time axis
is compressed by regularly omitting repeated waveform portions.
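The fixed discard approach described above can be illustrated in the digital domain. The following Python sketch is a simplification and not the rotary-head apparatus itself: it assumes the signal has already been sampled, and the segment length `segment` and the function name `discard_compress` are hypothetical. From every input window of `segment × speed` samples it keeps the first `segment` samples and discards the rest, so the duration shrinks by the factor `speed` while each kept waveform keeps its original period, and hence its pitch.

```python
from typing import List, Sequence


def discard_compress(samples: Sequence[float], speed: float, segment: int) -> List[float]:
    """Shorten a sampled signal by periodically discarding data (illustrative sketch)."""
    if speed < 1.0:
        raise ValueError("this sketch only compresses (speed >= 1)")
    if segment < 1:
        raise ValueError("segment must be at least 1 sample")
    out: List[float] = []
    window = segment * speed  # input samples spanned by one kept segment
    pos = 0.0
    while int(pos) + segment <= len(samples):
        start = int(pos)
        out.extend(samples[start:start + segment])  # keep this segment unchanged
        pos += window                               # skip the rest of the window
    return out
```

Because the cuts fall at arbitrary points of the waveform, a discontinuity can appear at every junction, which is the kind of quality degradation noted above.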
[0006] In the above method, however, the reproduction speed is fixed, and the further
the reproduction speed departs from the recording speed, the harder the speech is
to hear. In particular, when pictures of a VTR or the like are viewed slowly or quickly
by changing the tape reproduction speed, the reproduction speed of the speech changes
along with the pictures, and with the conventional speech time scale modification
apparatus the speech is very hard to hear.
SUMMARY OF THE INVENTION
[0007] In view of the above, a primary object of the invention is to provide a speech time
scale modification apparatus which, when an audio signal containing speech is played
back from a recording medium at a playback speed different from the recording speed,
reproduces the speech at a speed close to the recording speed by sequentially changing
the reproducing speed of the speech portion, within the range between the playback
speed and the recording speed, depending on the quantity of the speechless portion
in the audio signal, thereby reproducing the speech at a clearly recognizable quality.
Another object of the invention is to realize a speech time scale modification apparatus
which, when playing back at the same speed as the recording speed, makes rapid speech
easy to hear by changing the speech to a speed slower than the recording speed as
appropriate, depending on the quantity of the speechless portion. Still another object
of the invention is to realize a speech time scale modification apparatus which, when
playing back at a lower speed than the recording speed, reproduces the speech at a
speed close to the recording speed by appropriately changing the expanding ratio of
the speechless portion and the expanding ratio of the speech portion, thereby obtaining
clearly recognizable speech.
[0008] To achieve these objects, the invention provides a speech time scale modification
apparatus capable of notably improving the clarity of speech in variable speed reproduction
by detecting the speechless portion of an acoustic signal read out from a recording
medium, compressing or expanding the speechless portion, and sequentially changing
the compressing or expanding ratio of the speech portion depending on the quantity
of the speechless portion.
[0009] Accordingly, in one aspect of the invention, a speech time scale modification apparatus
comprises a recording and reproducing section for reproducing an acoustic signal recorded
in a recording medium at a reproduction speed higher than a recording speed, a speech
judging section for judging a speechless portion and a speech portion of the acoustic
signal, a buffer memory for storing data of the reproduced acoustic signal, a write
control section for controlling a write address of the buffer memory so as to write
the data of the acoustic signal judged to be the speech portion in the speech judging
section into the buffer memory, a read control section for controlling reading of
the data from the buffer memory and a read address of the buffer memory, a residual
storage data amount monitor section for monitoring a residual storage data amount
in the buffer memory from a current write address of the buffer memory and a current
read address of the buffer memory, an adaptive speed control section for determining
a modification speed of the data depending on the residual storage data amount obtained
from the residual storage data amount monitor section, and a time scale compressing
section for compressing time scale of the acoustic signal depending on the modification
speed determined in the adaptive speed control section.
[0010] In another aspect of the present invention, a speech time scale modification apparatus
comprises a recording and reproducing section for reproducing an acoustic signal recorded
in a recording medium at the same speed as a recording speed, a speech judging section
for judging a speechless portion and a speech portion of the acoustic signal, a buffer
memory for storing data of the acoustic signal, a write control section for controlling
a write address of the buffer memory so as to write the data of the acoustic signal
judged to be the speech portion in the speech judging section into the buffer memory,
a read control section for controlling reading of the data from the buffer memory
and a read address of the buffer memory, a residual storage data amount monitor section
for monitoring a residual storage data amount in the buffer memory from a current
write address of the buffer memory and a current read address of the buffer memory,
an adaptive speed control section for determining a modification speed depending on
the residual storage data amount from the residual storage data amount monitor section,
and a time scale expanding section for expanding time scale of the acoustic signal
depending on the modification speed determined in the adaptive speed control section.
[0011] In a further aspect of the invention, a speech time scale modification apparatus
comprises a recording and reproducing section for reproducing an acoustic signal recorded
in a recording medium at a reproduction speed lower than a recording speed, a speech
judging section for judging a speechless portion and a speech portion of the acoustic
signal, an input buffer for storing data of the acoustic signal, a time scale expanding
section for expanding time scale of the data of the acoustic signal of the input buffer
by independently setting a time scale expanding ratio to the speechless portion and
a time scale expanding ratio to the speech portion from a judging result of the speech
judging section, an output buffer for storing output data of the time scale expanding
section, a residual storage data amount monitor section for monitoring a residual
storage data amount being stored in the output buffer, and an expanding ratio control
section for determining an expanding ratio of time scale modification of the speech
portion and the speechless portion depending on the residual storage data amount.
BRIEF DESCRIPTION OF THE DRAWINGS
[0012] Fig. 1 is a block diagram showing a constitution of a speech time scale modification
apparatus in a first embodiment of the invention.
[0013] Fig. 2 (a) and Fig. 2 (b) are explanatory diagrams explaining measuring methods of
residual storage data amounts in the first embodiment.
[0014] Fig. 3 (a) is an explanatory diagram of a speed setting method by a linear rule of
an adaptive speed control section in the first embodiment.
[0015] Fig. 3 (b) is an explanatory diagram of a speed setting method by a nonlinear rule
of the adaptive speed control section in the first embodiment.
[0016] Fig. 3 (c) is an explanatory diagram of a speed setting method by a staircase rule
of the adaptive speed control section.
[0017] Fig. 4 is a circuit diagram of a time scale control section in the first embodiment.
[0018] Fig. 5 (a) shows a data row before processing data in the time scale control section
in the first embodiment.
[0019] Fig. 5 (b) shows a data row after processing the data in the time scale control section
in the first embodiment.
[0020] Fig. 6 is a flow chart showing another operation of a write control section in the
first embodiment.
[0021] Fig. 7 is a block diagram showing a constitution of a speech time scale modification
apparatus in a second embodiment of the invention.
[0022] Fig. 8 (a) is an explanatory diagram of a speed setting method by a linear rule of
an adaptive speed control section in the second embodiment.
[0023] Fig. 8 (b) is an explanatory diagram of a speed setting method by a nonlinear rule
of the adaptive speed control section in the second embodiment.
[0024] Fig. 8 (c) is an explanatory diagram of a speed setting method by a staircase rule
of the adaptive speed control section in the second embodiment of the invention.
[0025] Fig. 9 is a circuit diagram of a time scale control section in the second embodiment.
[0026] Fig. 10 (a) shows a data row before processing data in the time scale control section
in the second embodiment.
[0027] Fig. 10 (b) shows a data row after processing the data in the time scale control
section in the second embodiment.
[0028] Fig. 11 is a flow chart showing another operation of a write control section in the
second embodiment.
[0029] Fig. 12 is a block diagram showing a constitution of a speech time scale modification
apparatus in a third embodiment of the invention.
[0030] Fig. 13 (a) is an explanatory diagram of a first expanding ratio setting table of
an expanding ratio determining section in the third embodiment of the invention.
[0031] Fig. 13 (b) is an explanatory diagram of a second expanding ratio setting table of
the expanding ratio determining section.
[0032] Fig. 14 (a), (b), (c) are principle diagrams showing operations of a time scale expanding
section in the third embodiment.
DESCRIPTION OF THE PREFERRED EMBODIMENTS
[0033] An outline of a first embodiment of the invention is described below. The first embodiment
relates to a speech time scale modification apparatus capable of sequentially changing
the speech to a speed below the reproduction speed, depending on the quantity of the
speechless portion, when an audio signal recorded on a recording medium is reproduced
at a higher speed than the recording speed. First, the speech portion and the speechless
portion of the audio signal read out at high speed are detected, and only the speech
portion is written into a buffer memory having a specific capacity. Data are output
continuously while the speed modification is processed. Since the writing speed and
the reading speed of the buffer memory differ, the modification speed is adjusted
on the basis of the amount of data remaining in the buffer memory so as to prevent
overflow or underflow of the buffer memory. As a result, even in high speed reproduction,
the audio signal can be reproduced at a speed below the reproduction speed depending
on the quantity of the speechless portion.
[0034] Referring now to the drawings, the first embodiment is described in detail below.
Fig. 1 is a block diagram showing a constitution of a speech time scale modification
apparatus in the first embodiment.
[0035] First, an acoustic signal is read out from a recording and reproducing section 101
at M (≧1) times the recording speed. Hereinafter, a speed denotes the speed relative
to the recording speed (M = 1). Supposing the sampling period of recording in the
recording and reproducing section 101 to be T, the acoustic signal reproduced at M
times speed from the recording and reproducing section 101 is sequentially converted
into a digital signal series with a sampling period T/M in an A/D converter 102. The
digital signal series is fed into a speech judging section 103, which judges the speech
portion and the speechless portion of the series. This judgement is made, for example,
as follows. Supposing the sample values of the digital signal series to be $s_i$,
a series of N sample values is judged to be the speech portion when formula (1) is
satisfied, and the speechless portion when formula (1) is not satisfied. Here, $P_{th}$
is a predetermined threshold value for discriminating between the speech portion and
the speechless portion.

[0036] Let Pw denote a pointer (hereinafter, the write pointer) indicating the address of
the buffer memory 105 at which the next data are to be stored. When a sample value
series is judged by formula (1) to be the speech portion, a write control section
104 sequentially stores it at the address of the buffer memory 105 indicated by the
write pointer Pw and increments Pw. When the sample value series is judged to be the
speechless portion, on the contrary, the write control section 104 stops storing it
in the buffer memory 105. In this way, only the data of speech portions are accumulated
in the buffer memory 105.
[0037] Here the sample value series is judged to be the speech portion when formula (1)
is satisfied, and the speechless portion when formula (1) is not satisfied; however,
a short sample value series judged to be speechless that immediately precedes or follows
a sample value series satisfying formula (1) may be included in the speech portion.
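The inclusion of short speechless runs adjacent to speech, sometimes called a hangover, can be sketched on per-block judgement flags. The parameter `hangover` (the number of neighbouring blocks absorbed on each side) and the function name `extend_speech` are hypothetical choices not specified in the text.

```python
from typing import List


def extend_speech(flags: List[bool], hangover: int) -> List[bool]:
    """Mark up to `hangover` blocks before and after each speech block as speech."""
    if hangover < 0:
        raise ValueError("hangover must be non-negative")
    out = list(flags)
    for i, is_speech in enumerate(flags):
        if is_speech:
            lo = max(0, i - hangover)
            hi = min(len(flags), i + hangover + 1)
            for j in range(lo, hi):
                out[j] = True  # absorb the neighbouring speechless block
    return out
```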
[0038] In a read control section 106, the data in the buffer memory 105 are read out sequentially
at the period T and sent to a time scale control section 109. Here, let Pr denote
a pointer (hereinafter, the read pointer) indicating the address of the next data
to be read out from the buffer memory 105. A residual storage data amount monitor
section 107 sequentially measures, from the positions of the write pointer Pw and
the read pointer Pr, the residual storage data amount not yet read out from the buffer
memory 105. Fig. 2 (a) and Fig. 2 (b) are explanatory diagrams of methods of measuring
the residual storage data amount; there are two cases, Fig. 2 (a) and Fig. 2 (b),
depending on the relative positions of the write pointer and the read pointer. In
Fig. 2 (a) and Fig. 2 (b), supposing the start address of the buffer memory to be
$a_0$ and the end address to be $a_{n-1}$ ($a_{n-1} > a_0$), the residual storage
data amount Z not yet read out is shown by the shaded areas in Fig. 2 (a) and Fig.
2 (b), and is calculated as follows.

This is equivalent to handling the buffer memory 105 as a so-called cyclic memory.
Normally, to read out and output data from the buffer memory, the write pointer Pw
must be ahead of the read pointer Pr on the cyclic memory; therefore, if Pw and Pr
overlap (Pw = Pr), the read control section 106 stops reading out data, and the read
pointer Pr holds its current address. Two overlapped states of Pw and Pr are possible:
Pr catches up with Pw in Fig. 2 (a), and Pw catches up with Pr in Fig. 2 (b). In the
latter case, the residual storage data amount actually equals the capacity of the
buffer memory 105, that is, i = n, but in this case too the residual storage data
amount Z is reset to 0.
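Since the expression for Z is not reproduced here, the sketch below uses ordinary cyclic-buffer arithmetic, assuming $a_0 = 0$ so that addresses run from 0 to n−1: Z = Pw − Pr when Pw ≥ Pr, and Z = n − Pr + Pw otherwise. Following the text, the overlap Pw = Pr is treated as Z = 0 in both cases, so when the write pointer catches up with the read pointer the unread contents are effectively discarded. The class name `CyclicBuffer` is hypothetical.

```python
from typing import List, Optional


class CyclicBuffer:
    """Buffer memory handled as a cyclic memory with addresses 0 .. n-1 (assumes a_0 = 0)."""

    def __init__(self, n: int) -> None:
        if n < 2:
            raise ValueError("capacity must be at least 2")
        self.n = n
        self.mem: List[float] = [0.0] * n
        self.pw = 0  # write pointer Pw: address for the next write
        self.pr = 0  # read pointer Pr: address of the next read

    def residual(self) -> int:
        """Residual storage data amount Z (samples written but not yet read)."""
        if self.pw >= self.pr:
            return self.pw - self.pr       # no wrap-around between Pr and Pw
        return self.n - self.pr + self.pw  # Pw has wrapped past the end address

    def write(self, x: float) -> None:
        """Store one sample and advance Pw. If Pw lands on Pr, Z reads as 0 and
        the unread contents are treated as discarded, as described in the text."""
        self.mem[self.pw] = x
        self.pw = (self.pw + 1) % self.n

    def read(self) -> Optional[float]:
        """Read one sample and advance Pr; return None when Pw == Pr (reading stops)."""
        if self.pw == self.pr:
            return None
        x = self.mem[self.pr]
        self.pr = (self.pr + 1) % self.n
        return x
```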
[0039] On the basis of the value of the residual storage data amount Z obtained by the residual
storage data amount monitor section 107, an adaptive speed control section 108 sets
the speed of the time scale modification to a slow speed as close to the recording
speed as possible when the residual storage data amount is small, or to a suitably
fast speed, so that the write pointer Pw does not catch up with the read pointer Pr,
when the residual storage data amount is large. The operation of the adaptive speed
control section 108 is explained below for the case of reproducing at double (M = 2)
the recording speed from the recording and reproducing section 101. Here, the maximum
value of the modification speed is 2, the same as the reproduction speed, and the
minimum value of the modification speed is 1, the same as the recording speed. Fig.
3 (a), (b) and (c) show relations between the residual storage data amount and the
modification speed; these are rules for setting the modification speed. Fig. 3 (a)
shows a rule of linear correspondence between the residual storage data amount and
the modification speed. In this case, the modification speed V is calculated by the
following formula.

Fig. 3 (b) shows an example of a rule of nonlinear correspondence between the residual
storage data amount and the modification speed. Expressing the nonlinear correspondence
by a quadratic curve, the modification speed V is calculated as follows.

In the case of Fig. 3 (a), the modification speed changes smoothly with increases or
decreases of the residual storage data amount, whereas a feature of Fig. 3 (b) is
that the speed stays stable near the recording speed 1 until a certain amount of data
has accumulated in the buffer memory 105.
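The formulas for the rules of Fig. 3 (a) and Fig. 3 (b) are not reproduced here. A plausible form, assumed only for illustration, normalizes Z by a hypothetical full-scale value `z_max` (for example the buffer capacity) and maps it onto the range from the minimum speed 1 to the maximum speed 2, linearly for Fig. 3 (a) and quadratically for Fig. 3 (b). The quadratic form stays close to the minimum speed while Z is small, which matches the behaviour described above.

```python
def _ratio(z: float, z_max: float) -> float:
    """Clamp Z / z_max into [0, 1]."""
    if z_max <= 0:
        raise ValueError("z_max must be positive")
    return min(max(z / z_max, 0.0), 1.0)


def linear_speed(z: float, z_max: float, v_min: float = 1.0, v_max: float = 2.0) -> float:
    """Assumed Fig. 3 (a)-style rule: V grows linearly with Z."""
    return v_min + (v_max - v_min) * _ratio(z, z_max)


def quadratic_speed(z: float, z_max: float, v_min: float = 1.0, v_max: float = 2.0) -> float:
    """Assumed Fig. 3 (b)-style rule: V grows with the square of Z / z_max."""
    r = _ratio(z, z_max)
    return v_min + (v_max - v_min) * r * r
```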
[0040] Fig. 3 (c) shows an example of defining the nonlinear correspondence as a staircase
profile, and the modification speed V is calculated as follows.

The rule shown in Fig. 3 (c) can realize nearly the same control as the rule in Fig.
3 (b) with a smaller amount of calculation and a smaller circuit scale.
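A staircase rule needs only comparisons and a table lookup, which is why it is cheaper than evaluating a curve. In the sketch below the threshold and speed tables are hypothetical values chosen to roughly follow a quadratic curve, with Z expressed in percent of buffer capacity; `bisect_right` from the Python standard library counts how many thresholds are at or below Z.

```python
from bisect import bisect_right
from typing import Sequence


def staircase_speed(z: float, thresholds: Sequence[float], speeds: Sequence[float]) -> float:
    """Return speeds[i], where i is the number of thresholds that are <= z."""
    if len(speeds) != len(thresholds) + 1:
        raise ValueError("need exactly one more speed than thresholds")
    return speeds[bisect_right(thresholds, z)]


# Hypothetical table: Z in percent of buffer capacity, speeds between 1 and 2.
THRESHOLDS = [50, 75, 90]
SPEEDS = [1.0, 1.25, 1.5, 2.0]
```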
[0041] By determining the modification speed on the basis of the rule in Fig. 3 (a), Fig.
3 (b) or Fig. 3 (c) in this way, even for a signal reproduced at double speed, the
modification speed can be set to an easy-to-hear speed close to the recording speed
1 for an input signal containing more than a specified quantity of speechless portion,
or to the maximum modification speed 2 when signals without speechless portions are
reproduced, so that no data are lost. Here the maximum value of the modification speed
is 2 and the minimum value is 1, but the same rules can be applied if the maximum
value is smaller than 2 (for example, 1.8) and the minimum value is greater than 1
(for example, 1.5). However, when the maximum value is set smaller than 2 and signals
without speechless portions are reproduced continuously at twice the recording speed,
not all data can be read out, and part of the data must be discarded. This corresponds
to the case where Pw catches up with Pr in Fig. 2 (b), and it can be resolved by resetting
the residual storage data amount to 0 as mentioned above, thereby discarding the data
accumulated so far, which amount to the capacity of the buffer memory. Supposing,
for example, that the capacity of the buffer memory is 256k bits and that 8-bit data
per sample are handled at 10 kHz sampling, speech data of 32k samples (about 3.2 sec)
are discarded. With this setting, although part of the data is discarded depending
on the quantity of the speechless portion, most of the data can be reproduced stably
at a slow, easy-to-hear speed by suppressing the maximum value of the modification
speed.
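The discarded duration in the example above follows directly from the stated figures, reading "k" as 1000:

```latex
\frac{256\times10^{3}\ \text{bit}}{8\ \text{bit/sample}} = 32\times10^{3}\ \text{samples},
\qquad
\frac{32\times10^{3}\ \text{samples}}{10\times10^{3}\ \text{samples/s}} = 3.2\ \text{s}.
```

If "k" is read as 1024 instead, the buffer holds 32,768 samples, or about 3.28 s, which is still consistent with the approximate figure given.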
[0042] The value of the modification speed V determined in the adaptive speed control section
108 is sent to a time scale compressing section 109, and the time scale modification
is set according to the modification speed V. Fig. 4 is a block diagram showing the
detailed constitution of the time scale compressing section 109. In Fig. 4, reference
numeral 401 denotes a control circuit for controlling the time scale compressing section;
reference numeral 402 denotes a changeover circuit for switching, according to a command
from the control circuit, between a cross fade processing section, in which weighted
addition is performed, and a non-processing section; reference numeral 403 denotes
a latch circuit for temporarily holding data; and reference numeral 404 denotes a
cross fade circuit for weighted addition processing. Other sections are the same as
those with the same names in Fig. 1 and are identified by the same reference numerals.
The operation of the time scale compressing section 109 is described below with reference
to Fig. 4.
[0043] The control circuit 401 first determines a cross fade section length K and a non-processing
section length S for realizing the modification speed V. Here the cross fade section
length K is fixed, but K may be varied depending on the modification speed V. Fig.
5 (a) and Fig. 5 (b) are schematic diagrams for explaining the time scale modification
processing: Fig. 5 (a) shows a data row before processing, and Fig. 5 (b) shows the
data row after processing. The portion of Fig. 5 (b) corresponding to the cross fade
section length K shows the cross fade processing of data A and data B. To realize
the modification speed V, the length S should be determined so that 1/V of the total
length (2K + S) of data A, B and C before processing equals the data length (K + S)
after time scale processing. The non-processing section length S is determined by
the following expression.
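Expression (6) is not reproduced in this text, but it follows from the condition just stated. Setting (2K + S)/V = K + S and solving for S gives S = K(2 − V)/(V − 1), valid for 1 < V ≤ 2: at V = 2 the non-processing section vanishes, and as V approaches 1 it grows without bound (no compression). A minimal sketch of this derivation, with a hypothetical function name and rounding to whole samples:

```python
def non_processing_length(k: int, v: float) -> int:
    """Non-processing section length S for cross fade length K and speed V.

    Derived from (2K + S) / V = K + S, i.e. S = K (2 - V) / (V - 1).
    """
    if not 1.0 < v <= 2.0:
        raise ValueError("compression requires 1 < V <= 2")
    return round(k * (2.0 - v) / (v - 1.0))
```

For example, with K = 100 and V = 1.25 the sketch gives S = 300, and indeed (2·100 + 300)/1.25 = 400 = 100 + 300.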

[0044] The cross fade processing is explained supposing that the read pointer Pr indicates
the beginning of the data row A in Fig. 5 (a). The control circuit 401 switches the
changeover circuit 402 to the cross fade processing side and instructs the read control
section 106 to read out the data indicated by the read pointer Pr. The data are fed
to and held in the latch circuit 403. The control circuit 401 then instructs the read
control section 106 to read out the data at the address Pr + K, that is, K samples
ahead, and these data are put directly into the cross fade circuit 404. The cross
fade circuit 404 executes weighted addition of the data indicated by the read pointer
Pr and the data at the address Pr + K. Here, let the data row A in Fig. 5 (a) be d(0),
d(1), ..., d(K-1), and the data row B be d(K), d(K+1), ..., d(2K-1). Supposing the
monotonically increasing weighting function to be w₁(t) (where 0 ≦ w₁(t) ≦ 1, t =
0, 1, ..., K-1) and the monotonically decreasing weighting function to be w₂(t) =
1 - w₁(t), the value c(t) after weighted addition is obtained by the following equation.

Thereafter, the read pointer Pr is incremented and the control circuit 401 repeats
the same processing K times; after the cross fade processing of the data rows A and
B in Fig. 5 (a) is complete, the value Pr + K at that moment is set as the read pointer.
When the cross fade processing is over, the control circuit 401 switches the changeover
circuit 402 to the non-processing side, and data of the length S determined by expression
(6) are read out from the buffer memory 105 and put directly into a D/A converter
110. Thereafter, by alternately repeating the output of cross-faded data of the length
K and data of the length S, time scale modification giving the modification speed
V is realized. When the modification speed set in the adaptive speed control section
108 changes at some point, the non-processing section length is changed according
to expression (6) and the same processing continues, so that the modification speed
can be varied as desired.
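The alternation of K-sample cross fades and S-sample copies can be sketched on an array instead of the buffer memory 105. The cross fade equation is not reproduced in this text; the sketch assumes c(t) = w₂(t)·d(t) + w₁(t)·d(K + t), so that the output fades from data A into data B and then continues with data C, and it assumes a linear ramp for w₁(t). The function name is hypothetical, V is held constant, and in the apparatus the samples would come from the read control section rather than a list.

```python
from typing import List, Sequence


def crossfade_compress(data: Sequence[float], k: int, v: float) -> List[float]:
    """Compress the time scale by the factor v using cross fades of fixed length k."""
    if k < 1:
        raise ValueError("cross fade length must be at least 1")
    if not 1.0 < v <= 2.0:
        raise ValueError("this sketch requires 1 < V <= 2")
    s = round(k * (2.0 - v) / (v - 1.0))  # non-processing length from (2K+S)/V = K+S
    out: List[float] = []
    p = 0  # plays the role of the read pointer Pr
    while p + 2 * k + s <= len(data):
        for t in range(k):
            w1 = (t + 1) / (k + 1)  # assumed linear, monotonically increasing weight
            w2 = 1.0 - w1           # monotonically decreasing weight
            out.append(w2 * data[p + t] + w1 * data[p + k + t])  # fade A into B
        p += 2 * k                  # Pr advanced K times, then set to Pr + K
        out.extend(data[p:p + s])   # non-processing section C copied unchanged
        p += s
    return out
```

Each cycle consumes 2K + S input samples and emits K + S output samples; for a 1,000-sample input with K = 100 and V = 1.5, three complete cycles consume 900 samples and produce 600.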
[0045] The data row thus modified in time scale is finally converted into an analog signal
at the period T in the D/A converter 110, thereby obtaining an audio signal whose
speed changes adaptively below the reproducing speed M at the same pitch as in recording.
[0046] According to the first embodiment described so far, the speech time scale modification
apparatus comprises the speech judging section 103, the residual storage data amount
monitor section 107 for measuring the residual storage data amount from the positions
of the write pointer and the read pointer, and the adaptive speed control section
108 for determining the speed of time scale modification depending on the residual
storage data amount. The modification speed is therefore controlled to become gradually
slower when the residual storage data amount is small and gradually faster when it
is large, so that an audio signal reproduced at high speed can be heard at a slower
speed than the reproducing speed, depending on the quantity of speechless portion
it contains, with almost no loss of information. Moreover, since the apparatus comprises
the time scale compressing section 109, which modifies the time scale at a desired
modification speed by adjusting the cross fade section length and the non-processing
section length, high quality time scale modification is realized; in particular,
when the cross fade section length is fixed at a preset value, any time scale modification
speed is achieved merely by changing the non-processing section length, so that the
speech time scale modification apparatus can be realized with a very simple constitution.
This is especially effective in a recording and reproducing section accompanied by
images, such as a VTR: for example, the images can be reproduced at double speed while
only the sound is reproduced at a slower speed of less than double speed.
[0047] Incidentally, in the first embodiment, the write control section 104 may also operate
as follows. Fig. 6 is a flow chart showing another operation of the write control
section, which is described below with reference to Fig. 6.
[0048] The write control section 104 sequentially takes in the value of the residual storage
data amount Z measured by the residual storage data amount monitor section 107 (S601)
and compares it with a preset threshold value Zth (S602). If Z is greater than Zth,
that is, if there is enough residual storage data, it is judged from the result of
the speech judging section 103 whether the present input data are speech or speechless
(S603); the data are written into the buffer memory 105 only in the case of a speech
portion (S604), and the write pointer Pw is incremented (S605). If the condition at
S602 is not satisfied, that is, if there is not enough residual storage data, the
data are written into the buffer memory 105 regardless of the speech judgement, and
the write pointer Pw is incremented. Specifically, this series of processing ensures
that, for a signal containing many speechless portions, the read pointer Pr does not
catch up with the write pointer Pw as in Fig. 2 (a), that is, the residual storage
data amount does not become 0.
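The flow of Fig. 6 can be sketched as follows. For brevity the sketch uses the length of a `collections.deque` as a stand-in for the residual storage data amount Z computed from the pointers; the function name `write_control` and the (sample, is_speech) input format are hypothetical.

```python
from collections import deque
from typing import Deque, Iterable, Tuple


def write_control(samples: Iterable[Tuple[float, bool]], buffer: Deque[float], z_th: int) -> None:
    """Write (sample, is_speech) pairs into buffer following the Fig. 6 flow."""
    for x, is_speech in samples:
        z = len(buffer)           # S601: take in Z (stand-in for pointer arithmetic)
        if z > z_th:              # S602: enough residual storage data?
            if is_speech:         # S603: speech portion?
                buffer.append(x)  # S604/S605: write and advance Pw
        else:
            buffer.append(x)      # too little data: write regardless of judgement
```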
[0049] In this way, with a write control section that accumulates all data in the buffer
memory when the residual storage data amount is less than a preset value, the residual
storage data amount does not become 0 and the reproduced sound is prevented from being
interrupted (muted), thereby realizing a speech time scale modification apparatus
capable of natural reproduction without any sense of unnaturalness.
[0050] In the first embodiment, analog signals are recorded in the recording and reproducing
section 101, but the apparatus may be realized similarly when digital signals are
handled. In this case, digital signals of the sampling period T are fed directly into
the speech judging section 103, and the same processing is carried out thereafter,
so that signals adaptively modified in time scale are output.
[0051] An outline of a second embodiment of the invention is described below. This embodiment
relates to a speech time scale modification apparatus which, when a sound signal recorded
on a recording medium is read out at the same speed as the recording speed, changes
the time scale so that the speed becomes suitably lower than the recording speed depending
on the quantity of the speechless portion; it is particularly effective in making
fast speech easier to hear. Fig. 7 is a block diagram showing the constitution of
the speech time scale modification apparatus in the second embodiment. The operation
of the second embodiment is described specifically below.
[0052] The acoustic signal recorded in a recording and reproducing section 101 is reproduced
at the same speed (M = 1) as the recording speed (= 1) and converted into a digital
signal with a sampling period T in an A/D converter 102. This digital signal is sequentially
fed into a speech judging section 103, which judges the speech or speechless portion,
and only the signal judged to be the speech portion is written into a buffer memory
105 while a write control section 104 controls the write address pointer Pw. A read
control section 106 reads out the data sequentially from the buffer memory 105 and
sends them to a time scale expanding section 702 while controlling a read pointer
Pr. In a residual storage data amount monitor section 107, the residual storage data
amount Z not yet read out is measured from the current read pointer Pr and the current
write pointer Pw. Up to this point, the operation is the same as in the first embodiment
except for the value of the reproduction speed M.
[0053] On the basis of the value of the residual storage data amount Z obtained by the residual
storage data amount monitor section 107, an adaptive speed control section 701 sets
the speed of time scale modification to a speed slower than the recording speed 1
when the residual storage data amount is small, or to a speed suitably close to the
recording speed 1, so that the write pointer Pw does not catch up with the read pointer
Pr, when the residual storage data amount is large. The operation of the adaptive
speed control section 701 is explained below for the case of a reproduction speed
M = 1 from the recording and reproducing section 101. Here, the maximum value of the
modification speed is supposed to be 1, the same as the reproducing speed, and the
minimum value to be $V_o$ (where $0 < V_o < 1$). Fig. 8 (a), Fig. 8 (b) and Fig. 8
(c) show relations between the residual storage data amount and the corresponding
modification speed, and present rules for setting the modification speed. Fig. 8 (a)
shows a rule of linear correspondence between the residual storage data amount and
the modification speed. In this case, the modification speed V is calculated by the
following formula.

[0054] Fig. 8 (b) shows an example of a rule of nonlinear correspondence between the residual
storage data amount and the modification speed. Using a quadratic curve for the
correspondence, the modification speed V can be calculated by the following formula.

In the case of Fig. 8 (a), the modification speed changes smoothly as the residual storage
data amount increases or decreases, while in the case of Fig. 8 (b), it is stabilized
nearly at the recording speed 1 until the data have accumulated to a certain extent in the
buffer memory 105.
[0055] Fig. 8 (c) shows a staircase definition of the nonlinear correspondence, and the
modification speed V can be calculated as follows.

The rule shown in Fig. 8 (c) can realize nearly the same control as the rule in Fig. 8 (b)
with less computation and a smaller circuit scale.
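The linear rule of Fig. 8 (a) and the staircase rule of Fig. 8 (c) can be sketched as follows. The formulas themselves are not reproduced in this text, so the parameterization is an assumption: Z is normalized by a hypothetical full-scale value `z_max`, the speed runs from Vo at Z = 0 up to the maximum 1, and the staircase thresholds in `STEPS` are illustrative. The quadratic rule of Fig. 8 (b) is omitted because its exact curve cannot be recovered from the description.

```python
def speed_linear(z: float, z_max: float, v_o: float) -> float:
    """Fig. 8 (a)-style rule: V grows linearly from Vo (Z = 0) to 1 (Z >= z_max)."""
    x = min(max(z / z_max, 0.0), 1.0)   # normalized residual storage data amount
    return v_o + (1.0 - v_o) * x


def speed_staircase(z: float, steps: list) -> float:
    """Fig. 8 (c)-style rule: piecewise-constant speed.

    steps holds (threshold, speed) pairs sorted by threshold; the speed paired
    with the last threshold not exceeding z is used (the first pair is the floor).
    """
    speed = steps[0][1]
    for threshold, v in steps:
        if z >= threshold:
            speed = v
    return speed


# Hypothetical staircase: Vo = 0.7 at the bottom, speed 1 above 750 samples.
STEPS = [(0, 0.7), (250, 0.8), (500, 0.9), (750, 1.0)]
```

With Vo = 0.7 and `z_max` = 1000 (both hypothetical), Z = 500 gives V of about 0.85 under the linear rule and exactly 0.9 under the staircase, which only needs comparisons rather than multiplications.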
[0056] By thus determining the modification speed on the basis of the correspondence rules
in Fig. 8 (a), Fig. 8 (b), and Fig. 8 (c), a signal reproduced at single speed that
contains more than a specified quantity of speechless portions can be output at a slow
speed, down to Vo, below the recording speed. When signals containing no speechless
portion continue, the maximum modification speed 1 is set, so that no data are lost.
[0057] The value of the modification speed V determined in the adaptive speed control section
701 is sent to the time scale expanding section 702, and the time scale is modified
according to the modification speed V.
[0058] Fig. 9 is a block diagram showing the detailed constitution of the time scale expanding
section 702. In Fig. 9, reference numeral 901 is a control circuit for controlling the
entire time scale expanding section, reference numeral 902 is a changeover circuit for
switching, according to commands from the control circuit, between the cross fade
processing side for weighted addition and the non-processing side, reference numeral 903
is a latch circuit for temporarily holding data, and reference numeral 904 is a cross
fade circuit for weighted addition processing. The other sections are the same as the
identically named sections in Fig. 1 and are identified with the same reference
numerals. The operation of the time scale expanding section 702 is described below with
reference to Fig. 9.
[0059] To realize the modification speed V, the control circuit 901 first determines the
cross fade section length K and the non-processing section length S. Here, the cross fade
section length is a fixed value K, but K may also be varied depending on the
modification speed V.
[0060] Figs. 10 (a) and 10 (b) are schematic diagrams for explaining the time scale
modification processing: Fig. 10 (a) shows the data before processing, and Fig. 10 (b)
shows the data after processing. In Fig. 10 (b), the portion of length K between data
row A and data row B is the data row obtained by cross fade processing of data row A and
data row B.
[0061] To realize the modification speed V, the length S is determined so that the total
length (2K + S) of the data rows A, B, C before processing, divided by V, equals the data
length (3K + S) after time scale processing, that is, (2K + S)/V = 3K + S. Solving for S
gives the non-processing section length:

$$S = \frac{K(3V - 2)}{1 - V} \qquad (11)$$
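Formula (11), S = K(3V - 2)/(1 - V), can be checked numerically with a short helper; the function name is hypothetical. Because S must be non-negative, the relation (2K + S)/V = 3K + S only has a valid solution for 2/3 <= V < 1 with this A-C-B pattern, and at V = 1 no cross fade is needed at all, so the sketch rejects speeds outside that range. S is rounded to a whole number of samples, so the realized speed is approximate.

```python
def non_processing_length(v: float, k: int) -> int:
    """Formula (11): S = K(3V - 2)/(1 - V), solved from (2K + S)/V = 3K + S."""
    if not 2.0 / 3.0 <= v < 1.0:
        raise ValueError("need 2/3 <= V < 1; at V = 1 the signal passes through unchanged")
    return round(k * (3.0 * v - 2.0) / (1.0 - v))
```

For example, with K = 100 and V = 0.8, S = 200: the 400 input samples (2K + S) become 500 output samples (3K + S), and 400/500 = 0.8.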

[0062] The cross fade processing is explained below, assuming that the read pointer Pr
indicates the beginning of data row A in Fig. 10 (a). The cross fade processing comprises
three processes.
[0063] The first process is explained. Fig. 11 is a flow chart showing part of the cross
fade processing. First, referring to the modification speed V, the control circuit 901
switches the changeover circuit 902 to the non-processing side (S1101). Next, it commands
the read control section 106 to read out the data indicated by the read pointer Pr
(S1102). The read data are fed to the D/A converter 110 without being processed (S1103).
Finally, the read pointer Pr is incremented (S1104). The same operation is repeated until
data row A has been processed completely.
[0064] The second process is explained. The control circuit 901 commands the read control
section 106 so that the read pointer Pr indicates the first data of data row A. The
control circuit 901 switches the changeover circuit 902 to the cross fade processing side
and commands the read control section 106 to read out the data indicated by the pointer
Pr; these data are fed to and held in the latch circuit 903. The control circuit 901 then
commands the read control section 106 to read out the data at address Pr+K, K samples
ahead, and these data are fed directly into the cross fade circuit 904. The cross fade
circuit 904 executes weighted addition using these two sets of data. Here, data row A in
Fig. 10 (a) is denoted d(0), d(1), ..., d(K-1), and data row B is denoted d(K), d(K+1),
..., d(2K-1). Denoting the monotonically increasing weighting function by w₁(t) (where
0 ≦ w₁(t) ≦ 1, t = 0, 1, ..., K-1) and the monotonically decreasing weighting function by
w₂(t) = 1 - w₁(t), the value c(t) after weighted addition is obtained by the following
equation.

Thereafter, the read pointer Pr is incremented, and the control circuit 901 repeats the
same processing K times in succession. After the cross fade processing of data rows A and
B in Fig. 10 (a) is complete, the value Pr + K at that moment is set in the read pointer.
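The weighted addition performed by the cross fade circuit 904 can be sketched as below. Formula (12) is not reproduced in this text, so two details are assumptions: the linear weight w₁(t) = (t + 1)/(K + 1), and the assignment of the increasing weight to data row A and the decreasing weight to data row B. The latter follows from the output order A, C, B: for the joins to be smooth, C should start close to d(K), which naturally follows the end of A, and end close to d(K-1), which naturally precedes the start of B. In the hardware, `a[t]` corresponds to the value held in the latch circuit 903 and `b[t]` to the value read from address Pr + K.

```python
def cross_fade(a: list, b: list) -> list:
    """Cross fade C of data row A = d(0..K-1) and data row B = d(K..2K-1).

    Assumed weights: w1(t) = (t + 1)/(K + 1), increasing, applied to A;
    w2(t) = 1 - w1(t), decreasing, applied to B.
    """
    k = len(a)
    if len(b) != k:
        raise ValueError("data rows A and B must both hold K samples")
    c = []
    for t in range(k):
        w1 = (t + 1) / (k + 1)          # monotonically increasing weight
        w2 = 1.0 - w1                   # monotonically decreasing weight
        c.append(w1 * a[t] + w2 * b[t])
    return c
```

Choosing (t + 1)/(K + 1) rather than t/(K - 1) keeps both weights strictly inside (0, 1), so neither endpoint of C simply duplicates a sample of A or B.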
[0065] The third process is explained. At the end of the second process, the read pointer Pr
indicates the beginning of data row B, and the same processing that was applied to data
row A in the first process is applied to data row B. More specifically, the control
circuit 901 switches the changeover circuit 902 to the non-processing side and commands
the read control section 106 to read out the data indicated by the read pointer Pr. The
read data are fed directly to the D/A converter 110 without being processed. Finally, the
read pointer Pr is incremented. This series of processing is repeated over data row B.
[0066] When the cross fade processing is over, the control circuit 901 switches the
changeover circuit 902 to the non-processing side, and S samples, S being the length
determined by formula (11), are read out from the buffer memory 105 and fed directly to
the D/A converter 110.
[0067] Thereafter, by alternately repeating the cross fade processing, which outputs length
3K, and the output of non-processed data of length S, time scale modification at the
modification speed V is realized. When the modification speed set in the adaptive speed
control section 701 changes at some point, the non-processing section length is
recalculated by formula (11) and the same process is continued, so that the modification
speed can be changed at any time.
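Putting the three processes and the non-processing section together, the alternating pattern A, C, B, then S unprocessed samples, can be sketched over an in-memory list instead of the buffer memory 105. The function name, the list-based input, the linear weights w₁(t) = (t + 1)/(K + 1), and the choice of giving the increasing weight to data row A are assumptions; S is computed from formula (11) as S = K(3V - 2)/(1 - V). Any tail shorter than one full 2K block is passed through unmodified.

```python
def expand(samples: list, v: float, k: int) -> list:
    """Time scale expansion at modification speed v (2/3 <= v < 1).

    Each cycle reads A (K samples), B (K samples) and S further samples, and
    outputs A, C, B and the S samples: 3K + S samples out for 2K + S read.
    """
    if not 2.0 / 3.0 <= v < 1.0:
        raise ValueError("need 2/3 <= V < 1")
    s = round(k * (3.0 * v - 2.0) / (1.0 - v))      # formula (11)
    out = []
    pr = 0                                           # read pointer Pr
    while pr + 2 * k <= len(samples):
        a = samples[pr:pr + k]
        b = samples[pr + k:pr + 2 * k]
        out.extend(a)                                # first process: A as is
        for t in range(k):                           # second process: cross fade C
            w1 = (t + 1) / (k + 1)                   # assumed increasing weight on A
            out.append(w1 * a[t] + (1.0 - w1) * b[t])
        out.extend(b)                                # third process: B as is
        out.extend(samples[pr + 2 * k:pr + 2 * k + s])  # non-processing section S
        pr += 2 * k + s
    out.extend(samples[pr:])                         # short tail: pass through
    return out
```

With K = 100 and V = 0.8, each cycle reads 400 input samples and emits 500 output samples, giving the expansion ratio 1/V = 1.25. Changing V between cycles only changes `s`, which mirrors the recalculation by formula (11) described above.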
[0068] The data row thus modified in time scale is finally converted into an analog signal
with the period T by the D/A converter 110, yielding an acoustic signal whose speed
changes adaptively at or below the recording speed 1 while keeping the pitch it had when
recorded.
[0069] In the second embodiment, the write control section 104 in Fig. 7 may also operate
according to the flow chart in Fig. 6, as in the first embodiment.
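The Fig. 6 behaviour referred to here is summarized in claims 6 and 11: data judged speechless are discarded only while the buffer already holds more than a specific quantity, and all data are stored otherwise. A one-line decision function captures it; the function name and the threshold value are hypothetical, and treating Z equal to the threshold as the "store all" case is an arbitrary choice, since the text does not cover equality.

```python
def should_store(is_speech: bool, residual: int, threshold: int) -> bool:
    """Write control rule of claims 6 and 11 (threshold is hypothetical).

    Above the threshold only speech is stored, so speechless data are dropped;
    at or below it, all data are stored regardless of the speech judgement.
    """
    return is_speech or residual <= threshold
```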
[0070] As described herein, the second embodiment comprises the speech judging section 103,
the residual storage data amount monitor section 107, and the adaptive speed control
section 701, which determines the speed of time scale modification depending on the
residual storage data amount. By controlling the speed to be close to the reproduction
speed 1 when the residual storage data amount is large, and gradually slower, below 1,
when it is small, the sound signal reproduced at the recording speed can be heard at a
slow speed below the recording speed, depending on the quantity of speechless portions it
contains. This is particularly effective for hearing fast speech.
[0071] In the second embodiment, analog signals are recorded in the recording and reproducing
section 101, but the embodiment can be realized similarly with digital signals. In that
case, the digital signals with the sampling period T are fed directly into the speech
judging section 103, and the same processing is carried out thereafter, so that signals
adaptively modified in time scale are output.
[0072] An outline of a third embodiment of the invention is described below. In this
embodiment, relating to a speech time scale modification apparatus, when the acoustic
signal is reproduced at a speed slower than the recording speed, the speechless portions
of the input signal are given a larger expanding ratio than the speech portions,
depending on the amount of accumulated data waiting to be output, and the speech portions
are reproduced at a speed as close to the recording speed as possible. This makes sound
easier to hear in low speed reproduction.
[0073] Fig. 12 is a block diagram showing the constitution of the speech time scale
modification apparatus in the third embodiment. Its operation is described in detail below
with reference to Fig. 12.
[0074] First, acoustic signals are read out from a recording and reproducing section 1201 at
M times (0 < M < 1) the recording speed. Letting the sampling period in recording in the
recording and reproducing section 1201 be T, the acoustic signals reproduced at M times
speed from the recording and reproducing section 1201 are converted sequentially into a
digital signal series with sampling period T/M by the A/D converter and written into an
input buffer 1203.
[0075] The data read out from the input buffer 1203 are fed into the speech judging section
1204, where the sample value row is judged to be a speech portion or a speechless
portion. This judgement may use the condition of formula (1) explained in the first
embodiment. On the basis of the judgement, a time scale expanding section 1206 performs
time scale expansion on the data read out from the input buffer 1203 and outputs the
result to an output buffer 1208. Meanwhile, a residual storage data monitor section 1209
monitors, at specific time intervals, the residual storage data not yet output to a D/A
converter 1211, and depending on this remainder, an expanding ratio determining section
1210 determines an expanding ratio Es for the speechless portion and an expanding ratio
Ev for the speech portion. Fig. 13 (a) and Fig. 13 (b) are explanatory diagrams showing
methods of setting the expanding ratios in the expanding ratio determining section 1210.
The example in Fig. 13 (a) relates the residual storage data to the expanding ratio by a
linear function; it keeps the output buffer 1208 from becoming empty by increasing the
expanding ratio when the residual storage data Z obtained in the residual storage data
monitor section 1209 is small, that is, when the output buffer 1208 is nearly empty. In
this case, the expanding ratios Es and Ev for the speechless portion and the speech
portion are obtained by formulas (13) and (14), respectively.


Here, the expanding ratio of the speechless portion is made larger than that of the speech
portion so that the output buffer 1208 does not become empty even when the expanding ratio
of the speech portion is lowered. In the example in Fig. 13 (b), the expanding ratio of
the speech portion is 1.0 as long as the residual storage data is not 0, that is, the
speech portion is reproduced at the same speed as the recording speed. For the speechless
portion, the expanding ratio follows a quadratic function of the residual storage data. In
this case, the expanding ratios Es and Ev of the speechless and speech portions are
expressed by formulas (15) and (16), respectively.


In this case, because the expanding ratio of the speech portion is fixed at 1, the residual
storage data in the output buffer 1208 decrease rapidly while a speech portion continues;
hence the expanding ratio of the speechless portion is set larger overall, so that data
accumulate easily in the output buffer. Expanding the time scale prevents the output
buffer 1208 from becoming empty, but an excessively large expanding ratio may exceed the
capacity of the output buffer, in which case the continuity of the output signals cannot
be maintained. Hence, the expanding ratio is kept lower as the residual storage data
increase.
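The two setting methods of Fig. 13 can be sketched as functions of the residual storage data Z. Formulas (13) to (16) are not reproduced in this text, so the curves below are assumptions chosen only to respect the stated constraints: Es never falls below 1/M, Ev stays between 1.0 and 1/M, both ratios decrease as Z grows, and in the Fig. 13 (b)-style rule Ev is 1.0 unless the buffer is empty. The parameters `z_max` (full-scale Z) and `es_max` (the speechless ratio for an empty buffer) and both function names are hypothetical.

```python
def ratios_linear(z: float, z_max: float, m: float, es_max: float):
    """Fig. 13 (a)-style rule (assumed form): both ratios fall linearly with Z."""
    inv_m = 1.0 / m
    if es_max < inv_m:
        raise ValueError("es_max must be at least 1/M")
    x = min(max(z / z_max, 0.0), 1.0)      # normalized residual storage data
    ev = inv_m - (inv_m - 1.0) * x         # 1/M when empty, 1.0 when full
    es = es_max - (es_max - inv_m) * x     # es_max when empty, 1/M when full
    return es, ev


def ratios_quadratic(z: float, z_max: float, m: float, es_max: float):
    """Fig. 13 (b)-style rule (assumed form): Ev = 1.0 unless Z is 0; Es quadratic."""
    inv_m = 1.0 / m
    if es_max < inv_m:
        raise ValueError("es_max must be at least 1/M")
    x = min(max(z / z_max, 0.0), 1.0)
    ev = inv_m if z <= 0 else 1.0          # fall back to 1/M only when empty
    es = inv_m + (es_max - inv_m) * (1.0 - x) ** 2
    return es, ev
```

For M = 2/3 and a hypothetical `es_max` of 3.0, both rules give Es = 3.0 and Ev = 1.5 (= 1/M) for an empty output buffer, and Es = 1.5 with Ev = 1.0 once Z reaches `z_max`.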
[0076] Thus, the expanding ratio determining section 1210 determines the expanding ratios Ev
and Es of the speech and speechless portions at specific intervals according to the rule
shown in Fig. 13 and sends them to the time scale expanding section 1206. On the basis of
these expanding ratios, the time scale expanding section 1206 expands the time scale at
the expanding ratio Ev in the speech portion and at the expanding ratio Es in the
speechless portion.
[0077] Fig. 14 (a), (b), and (c) are schematic diagrams showing the operation of the time
scale expanding section 1206 in an example in which the recording medium is reproduced at
2/3 of the recording speed (M=2/3).
[0078] Fig. 14 (a) shows the time series of input signals in recording, and Fig. 14 (b)
shows the signal row when the sound is reproduced from the recording medium at a
reproduction speed of M=2/3. Fig. 14 (c) shows the signal row after processing, in which
blocks 1, 2, 3 are speechless portions and blocks 4, 5, 6 are speech portions, with the
expanding ratio Ev of the speech portion set to 1.0 and the expanding ratio Es of the
speechless portion set to 2.0 by the expanding ratio determining section 1210. In the
portions judged speechless (blocks 1, 2, 3), time scale modification with an expanding
ratio of 2.0 is realized, as shown in the second embodiment, by inserting cross fade
processing sections according to formula (12), and the data are accumulated in the
output buffer 1208. In the portions judged to be speech (blocks 4, 5, 6), the expanding
ratio is 1.0, so the data are accumulated directly in the output buffer 1208. When the
expanding ratios obtained from the expanding ratio determining section 1210 change, they
are set again in the time scale expanding section 1206, and the time scale expanding
process shown in Fig. 14 (c) continues.
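The arithmetic behind this example can be made explicit with a small bookkeeping sketch. Assuming a steady state with no processing latency, a block of n recorded samples arrives over n·T/M seconds, during which the D/A converter 1211 consumes n/M output samples, while the time scale expanding section 1206 produces E·n samples for that block. The net change in the output buffer 1208 is therefore (E - 1/M)·n: with M = 2/3, a speechless block at Es = 2.0 adds 0.5n samples and a speech block at Ev = 1.0 removes 0.5n. The function name, the block length of 100 samples, and the ordering of all speechless blocks before all speech blocks are hypothetical; exact fractions avoid rounding noise.

```python
from fractions import Fraction


def buffer_levels(blocks, m, ev, es, start=0):
    """Track output-buffer occupancy (in samples) after each block.

    blocks: sequence of (n, is_speech) pairs, n = recorded samples in the block.
    """
    level = Fraction(start)
    levels = []
    for n, is_speech in blocks:
        e = ev if is_speech else es
        produced = e * n        # samples written by the expanding section
        consumed = n / m        # samples played by the D/A while the block arrives
        level += produced - consumed
        if level < 0:
            raise RuntimeError("output buffer ran empty: output would be interrupted")
        levels.append(level)
    return levels


m = Fraction(2, 3)
blocks = [(100, False)] * 3 + [(100, True)] * 3
print([int(x) for x in buffer_levels(blocks, m, ev=Fraction(1), es=Fraction(2))])
# [50, 100, 150, 100, 50, 0]
```

The surplus built up by the speechless blocks is exactly what lets the speech blocks play at the recorded speed; any ratio E above 1/M fills the buffer and any ratio below 1/M drains it, which is why the speechless ratio is kept at 1/M or more.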
[0079] In this way, by resetting the expanding ratios as appropriate while monitoring the
quantity of data accumulated in the output buffer 1208, and by absorbing any excess or
shortage of output data time in the output buffer, the expanding ratios can be set
independently for the speechless portion and the speech portion even when the proportion
of speechless portions in the signal cannot be predicted.
[0080] Thus, according to the third embodiment, the time scale expanding ratios in the speech
portion and the speechless portion are set independently depending on the residual
storage data. The expanding ratio of the speech portion is set at 1/M when the residual
storage data fall below a predetermined quantity, to prevent the output signal from being
interrupted, and is otherwise controlled so that speech is reproduced as close to the
recorded speed as possible. As a result, easy-to-hear reproduced sound without an
unnatural feel can be obtained even when the reproduction speed from the recording medium
is slow.
[0081] In the third embodiment, analog signals are recorded in the recording and reproducing
section 1201, but the embodiment can be realized similarly with digital signals. In that
case, the digital signals with the sampling period T are fed directly into the input
buffer 1203, and the same processing as in the third embodiment is carried out
thereafter, so that signals adaptively modified in time scale are output.
1. A speech time scale modification apparatus comprising a speech judging section for
judging a speech portion and a speechless portion of an acoustic signal, a buffer
memory for storing data of the acoustic signal, a memory control section for controlling
writing of the data judged to be the speech portion in the speech judging section
into the buffer memory and reading of the data from the buffer memory, and a time
scale modification section for determining a time scale modification speed depending
on an amount of residual storage data which have not been read out from the buffer
memory, and modifying time scale of the acoustic signal depending on the time scale
modification speed.
2. A speech time scale modification apparatus comprising a recording and reproducing
section for reproducing an acoustic signal stored in a recording medium at a reproduction
speed of M (M is a real number greater than 1) times a recording speed, a speech
judging section for judging a speech portion and a speechless portion of the acoustic
signal, a buffer memory for storing data of the acoustic signal, a write control section
for controlling a write address of the buffer memory so as to write the data of the
acoustic signal judged to be the speech portion in the speech judging section into
the buffer memory, a read control section for controlling reading of the data from
the buffer memory and a read address of the buffer memory, a residual storage data
amount monitor section for monitoring a residual storage data amount in the buffer
memory from a current write address of the buffer memory and a current read address
of the buffer memory, an adaptive speed control section for determining a modification
speed of the data depending on the residual storage data amount obtained from the
residual storage data amount monitor section, and a time scale compressing section
for compressing time scale of the acoustic signal depending on the modification speed
determined in the adaptive speed control section.
3. A speech time scale modification apparatus of claim 2, wherein the adaptive speed
control section determines the modification speed in proportion to the residual storage
data amount in the buffer memory, by defining the modification speed below the reproduction
speed and above the recording speed.
4. A speech time scale modification apparatus of claim 2, wherein the adaptive speed
control section determines the modification speed on a basis of a modification rule
corresponding nonlinearly to the residual storage data amount, by defining the modification
speed below the reproduction speed and above the recording speed.
5. A speech time scale modification apparatus of claim 2, wherein the time scale compressing
section adjusts the time scale according to the modification speed determined in the
adaptive speed control section by adjusting a length of a cross fade processing section,
which adds products of sample value rows of a specific number of adjacent samples
multiplied respectively by a monotonically decreasing weighting coefficient and by a
monotonically increasing weighting coefficient, and a length of a non-processing section
for issuing the data directly, and issues the cross fade processing section and the
non-processing section alternately.
6. A speech time scale modification apparatus of claim 2, wherein the write control section
controls the write address so as to store only the data judged to be the speech portion
in the speech judging section into the buffer memory when the residual storage data
amount is more than a specific quantity in the residual storage data amount monitor
section, and to store all data into the buffer memory, regardless of judgement of
the speech judging section when the residual storage data amount is less than the
specific quantity in the residual storage data amount monitor section.
7. A speech time scale modification apparatus comprising a recording and reproducing
section for reproducing an acoustic signal recorded in a recording medium at a reproduction
speed same as a recording speed, a speech judging section for judging a speechless
portion and a speech portion of the acoustic signal, a buffer memory for storing data
of the acoustic signal, a write control section for controlling a write address of
the buffer memory so as to write the data of the acoustic signal judged to be the
speech portion in the speech judging section into the buffer memory, a read control
section for controlling reading of the data from the buffer memory and a read address
of the buffer memory, a residual storage data amount monitor section for monitoring
a residual storage data amount in the buffer memory from a current write address of
the buffer memory and a current read address of the buffer memory, an adaptive speed
control section for determining a modification speed of the data depending on the
residual storage data amount obtained from the residual storage data amount monitor
section, and a time scale expanding section for expanding time scale of the acoustic
signal depending on the modification speed determined in the adaptive speed control
section.
8. A speech time scale modification apparatus of claim 7, wherein the adaptive speed
control section determines the modification speed in proportion to the residual storage
data amount in the buffer memory, by defining the modification speed below the reproduction
speed and above the recording speed.
9. A speech time scale modification apparatus of claim 7, wherein the adaptive speed
control section determines the modification speed on a basis of a modification rule
corresponding nonlinearly to the residual storage data amount, by defining the modification
speed below the reproduction speed and above the recording speed of the recording
medium.
10. A speech time scale modification apparatus of claim 7, wherein the time scale expanding
section adjusts the time scale according to the modification speed determined in the
adaptive speed control section by adjusting a length of a section D, which links in the
sequence A-C-B sample value sections A and B, each of a specific number of adjacent
samples with A followed by B, and a cross fade processing section C obtained by adding
products of sample value sections of a specific number of adjacent samples multiplied
respectively by a monotonically decreasing weighting coefficient and by a monotonically
increasing weighting coefficient, and a length of a non-processing section E for issuing
the data directly, and issues the section D and the non-processing section E alternately.
11. A speech time scale modification apparatus of claim 7, wherein the write control section
controls the write address so as to store only the data judged to be the speech portion
in the speech judging section into the buffer memory when the residual storage data
amount is more than a specific quantity in the residual storage data amount monitor
section, and to store all data into the buffer memory, regardless of judgement of
the speech judging section when the residual storage data amount is less than the
specific quantity in the residual storage data amount monitor section.
12. A speech time scale modification apparatus comprising a recording and reproducing
section for reproducing an acoustic signal recorded in a recording medium at a reproduction
speed of M (M is a real number, 0 < M < 1) times a recording speed, an input buffer
for storing data of the acoustic signal, a speech judging section for judging a speechless
portion and a speech portion of the acoustic signal, a time scale expanding section
for expanding time scale of the data of the acoustic signal of the input buffer by
independently setting a time scale expanding ratio to the speechless portion and a
time scale expanding ratio to the speech portion from a judging result of the speech
judging section, an output buffer for storing output data of the time scale expanding
section, a residual storage data amount monitor section for monitoring a residual
storage data amount of the output data stored in the output buffer, and an expanding
ratio control section for determining an expanding ratio of time scale modification
of the speech portion and the speechless portion depending on the residual storage
data amount obtained from the residual storage data amount monitor section.
13. A speech time scale modification apparatus of claim 12, wherein the expanding ratio
control section determines the expanding ratio of time scale modification of the speechless
portion at 1/M or more, and the expanding ratio of time scale modification of the
speech portion in a range of 1.0 or more and 1/M or less, depending on the residual
storage data amount.
14. A speech time scale modification apparatus of claim 12, wherein the expanding ratio
control section determines the expanding ratio of time scale modification of the speech
portion at 1/M when the residual storage data amount is below a specified value, or
at a fixed value otherwise, and the expanding ratio of time scale modification of
the speechless portion in a range of 1/M or more, depending on the residual storage
data amount.