FIELD OF THE INVENTION
[0001] The invention relates to a method for manipulating an audio equivalent signal, comprising
positioning of a chain of mutually overlapping time windows with respect to the audio
equivalent signal, as based on periodicity measurements on said audio equivalent signal,
and wherein a positional displacement between adjacent windows substantially corresponds
to a principal period of said periodicity, and synthesizing an audio output signal
by chained superposition of segment signals, each deriving from the audio equivalent
signal through weighting with the associated window function.
[0002] Such a method has been described in EP-A-363 233. The known method is used during speech
synthesis for changing the prosody or pitch of synthesised speech, or to change the
duration of stretches of speech. The known method uses voice marks determined manually
for placing the windows. It is preferred that such a manipulation method can be performed
automatically, is robust against noise, and retains a high audio quality for the output
signal.
[0003] The inventors of the present invention have realized that the manipulation of the
duration can be used in various situations where there are external constraints to
the total length of a self-contained unit of speech, which constraints may specify
both the maximum and the minimum duration of such unit.
SUMMARY OF THE INVENTION
[0004] Accordingly, amongst other things, it is an object of the present invention to position
the manipulated audio equivalent signal in a predetermined time length that differs
from the original length, while on the one hand filling the interval more or less
completely, and on the other hand keeping the impression of the eventual representation
as natural as possible.
[0005] Now, according to one of its aspects, the object is realized in that the invention
is characterized by manipulating a duration of said output signal through systematically
repeating, maintaining, and/or suppressing said segment signals, to a resulting predetermined
overall length that differs from a corresponding duration of said audio equivalent
signal.
[0006] An advantage of the method of positioning windows according to the junior reference
is that it can be machine-executed without any window-to-window human control being
necessary. Furthermore, it has been found that the duration can be changed by a factor
between 2 and ½ without seriously impairing the intelligibility of the speech. For lesser
degrees of duration manipulation, such as + or - 30%, not only does the intelligibility
remain very good, but the natural quality of the speech is also maintained,
and a listener would hardly perceive the change of duration as unnatural. A prerequisite
to applying the method is that the pitch can indeed be measured, which for human speech
is a problem for which various solutions are known. Situations where the duration of speech should
be manipulated are various, such as in post-synchronizing of movies or other video
representative material, adapting a speech explanation or other matter to physical
motion of objects, such as the closing instant of a door, and many other instances.
In movies, actor utterances should preferably coincide with their facial motions,
or at least with their movements in general. Typical time scales of the total
duration of the utterance are 0.3 to several seconds. Within this short time frame, the prior
art had not succeeded in manipulating the duration while also preserving naturalness. On
a much longer time scale, the length of a pause can be manipulated, as is often
done by human interpreters. If the available time is known beforehand, sometimes a
different verbalization can be used, but all these methods require specialized human
skills. The present method is easily applicable and just requires the setting of a
speed-up or slow-down percentage. Of course, the present invention can also be used
for amending durations longer than the seconds range.
[0007] In itself the automatic placement of overlapping windows is used in the non pre-published
European Patent EP-B-0 527 527 for adjusting prosody during speech synthesis. The
article "Simple pitch-dependent algorithm for high-quality speech rate changing",
E.P. Neuburg, Journal of the Acoustical Society of America, Vol. 63, No. 2, February
1978, pages 624-625 describes a cut-and-splice method for speeding up or slowing down
speech by removing or, respectively, repeating a stretch of the speech signal whose
length is equal to the pitch period. WO-A- 8 303 483 describes a system for replacing
an original dialogue recorded at the time of shooting a picture by a similar signal
recorded in the studio at a higher quality. The relative timing of the original recording
is kept by comparing both signals on a frame-basis and keeping or repeating a frame
of the studio recording depending on how well the frames match.
[0008] The invention also relates to an apparatus for executing the method and to a storage
medium containing a representation of an audio equivalent signal. The invention allows
the available space for a unit of speech (sentence, partial sentence, exclamation,
or other) to be filled almost completely.
[0009] A particular application is Compact Disc Interactive, especially so in a multi-language
environment. Editing CD-I is by itself a complicated task. Sizing the duration of
speech utterances may now be performed by the machine for relieving the program editor
from this tedium. By itself, CD-I is a well-published storage medium with associated
development platform, the storage itself being an extension from Compact Disc Audio.
[0010] Various advantageous aspects of the invention are recited in dependent Claims.
BRIEF DESCRIPTION OF THE FIGURES
[0011] These and other advantages will be described with reference to a preferred embodiment
which is shown in a number of Figures, of which:
Figure 1 shows editing of a CD-I program for storage on a CD-I disc.
The following Figures especially show the technology of the junior reference:
Figures 2a,b,c show speech signals with windows placed according to the invention;
Figure 3 shows an apparatus for changing the pitch and/or duration of a signal;
Figure 4 shows multiplication means and window function value selection means for
use in an apparatus for changing the pitch and/or duration of a signal;
Figure 5 shows window position selection means for implementing the invention;
Figure 6 shows a subsystem for combining several segment signals.
DESCRIPTION OF A PREFERRED EMBODIMENT
[0012] As is commonly understood, the audio or speech equivalent signal may be direct analog
speech, or it may be speech that is stored as a sequence of codes on the basis of which
synthetic speech is generated. The length of the various windows may be non-uniform,
and in a particular embodiment, the length of each window may be substantially equal
to a local actual pitch period length. The window function has a uniform shape,
which means that it scales linearly with the width of the window;
in general, there may be an appreciable variation between the widths
of successive windows. The systematic character of the repeating, maintaining, or
suppressing implies that there is a certain prescription for the sequence of window
positions: first, the manipulation is restricted to either repeating or suppressing, each possibly
in combination with maintaining; and furthermore, the repeating or suppressing
is done under control of an actual or emulated recurrent cycle.
Examples are:
each third window is repeated once, the others are maintained;
of each five successive windows, #2 and #4 are suppressed;
at each next window, a count is incremented by a particular amount and overflow controls
actual suppression or repetition.
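The three example prescriptions above can be sketched as index schedules. This is only an illustrative sketch: windows are numbered from 0, and the function names, the integer counter and its `modulus` parameter are assumptions made for the illustration, not part of the described method.

```python
from typing import List


def repeat_every_third(n_windows: int) -> List[int]:
    """Each third window is repeated once; the others are maintained."""
    out: List[int] = []
    for i in range(n_windows):
        out.append(i)
        if i % 3 == 2:          # windows 2, 5, 8, ... are every third window
            out.append(i)       # repeat it once
    return out


def suppress_2_and_4_of_5(n_windows: int) -> List[int]:
    """Of each five successive windows, #2 and #4 (counted from 1) are suppressed."""
    return [i for i in range(n_windows) if i % 5 not in (1, 3)]


def accumulator_schedule(n_windows: int, increment: int, modulus: int,
                         repeat: bool) -> List[int]:
    """A count is incremented at each window; overflow past `modulus`
    triggers one repetition (repeat=True) or one suppression (repeat=False).
    Assumes 0 <= increment <= modulus."""
    out: List[int] = []
    count = 0
    for i in range(n_windows):
        count += increment
        overflow = count >= modulus
        if overflow:
            count -= modulus
        if overflow and not repeat:
            continue            # suppress this window
        out.append(i)
        if overflow and repeat:
            out.append(i)       # repeat this window once
    return out
```

With `increment=3` and `modulus=10`, for instance, three windows in every ten are repeated, which would lengthen the signal by roughly 30% if every window spans one pitch period.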
[0013] It is noted that the systematic character need not be completely uniform.
For example, in post-synchronization of a movie, it could be advantageous to amend
the time durations of various parts of a sentence somewhat differently from each other,
as long as the natural character of the resulting speech is retained. In particular,
the movement of a face while speaking could to a certain extent be followed
by the dynamics of the audio speech. Also, different sentences at various places in
the post-synchronized material may now be given a uniform pitch with respect to each other.
[0014] The different representations in parallel may be in different languages; it has been
found that the same sentence, translated into another language, may have a different
length, counted for example as a number of syllables: in particular, the German language
led to a longer duration than English and French.
[0015] Other, in particular exotic, languages may lead to even more extreme situations.
Similar situations may distinguish child voices from adult voices.
[0016] In Figure 1, for a three-language CD-I track, the pictorial material 200 is shown
with accompanying speech representations in French (202), German (204) and English
(206) before editing. It is intended to give each language representation (among which
a user may choose) exactly the same duration as the pictorial material (movie, animation,
etcetera). As shown, on line 202 a single window is suppressed, and on line 204 five
windows are suppressed. On line 206, six windows are repeated once (crosses). The result
after editing is not shown. It has been found that analysis of the results can prove
infringement. Especially the occurrence of the repeated windows is well traceable.
Moreover, the substantially equal lengths of the various representations, together
with the high subjective quality of the rendering, are a clear indication of the use
of the present technology.
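How many windows must be repeated or suppressed to fit a given length can be estimated as sketched below. The sketch assumes a constant pitch period and that each repeated or suppressed window changes the duration by exactly one period; the durations and the 8 ms period are hypothetical values, chosen only so that the counts match those drawn in Figure 1.

```python
def windows_to_adjust(original_ms: int, target_ms: int, period_ms: int) -> int:
    """Net number of windows to repeat (positive) or suppress (negative),
    assuming every window shifts the total duration by one pitch period."""
    return round((target_ms - original_ms) / period_ms)


# Hypothetical durations before editing, for illustration only.
target_ms = 2000                      # length of the pictorial material
tracks_ms = {"French": 2008, "German": 2040, "English": 1952}
plan = {lang: windows_to_adjust(duration, target_ms, 8)
        for lang, duration in tracks_ms.items()}
```

In practice the pitch period varies along the utterance, so the count would have to be obtained from the actual window positions rather than from a mean period.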
[0017] In certain situations, apart from changing the duration per se, the slowing down
or speeding up may lend the speech a certain character, such as nervous (fast) or majestic (slow).
Such use is also sometimes advantageous. Changing the duration of the audio equivalent
signal may be combined with changing the pitch. The two types of manipulation may
both act in the same direction, for example in that both effectively shorten the duration.
In other circumstances, they could to some degree compensate each other's effects, so that
the change in duration would be smaller or even zero. The change of duration may
follow a time-varying pattern, whereby the overall change of duration is the
integral or sum of the elementary changes of duration.
DESCRIPTION OF A PREFERRED TECHNOLOGY
[0018] Hereinafter, a description of the preferred technology according to the junior reference
is given.
[0019] Figures 2a, 2b and 2c show speech signals with marks 52 placed apart by distances
determined with a pitch meter (that may be conventional), that is, without a fixed
phase reference. In Figure 2a, two successive periods were marked as voiceless by
placing their pitch period length indication outside the scale. The pitch marks (lower
scale) were obtained by interpolating the period length. Although the pitch period
lengths were determined without smoothing other than that inherent in determining
spectra of the speech signal extending over several pitch periods, a very regular
curve was obtained automatically.
[0020] The incremental placement of windows also solves another problem. For unvoiced stretches,
that contain fricatives like the sound "ssss", in which the vocal cords are not excited,
the windows are placed incrementally just like for voiced stretches. The pitch period
length is interpolated between the lengths measured for the voiced stretches adjacent
to the unvoiced stretch. This provides regularly spaced windows without audible artefacts.
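The interpolation across an unvoiced stretch, followed by incremental placement, might look as follows. This is a sketch under assumptions: one period-length entry per successive window, `None` marking entries where no pitch could be measured, and linear interpolation (the text does not fix the interpolation rule). All names are hypothetical.

```python
from typing import List, Optional


def fill_unvoiced(periods: List[Optional[float]]) -> List[float]:
    """Replace missing period lengths by linear interpolation between the
    nearest measured neighbours; at the edges, hold the nearest value."""
    known = [k for k, p in enumerate(periods) if p is not None]
    if not known:
        raise ValueError("at least one measured period length is required")
    filled: List[float] = []
    for k, p in enumerate(periods):
        if p is not None:
            filled.append(p)
            continue
        left = max((j for j in known if j < k), default=None)
        right = min((j for j in known if j > k), default=None)
        if left is None:
            filled.append(periods[right])
        elif right is None:
            filled.append(periods[left])
        else:
            w = (k - left) / (right - left)
            filled.append((1 - w) * periods[left] + w * periods[right])
    return filled


def place_marks(start: float, periods: List[float]) -> List[float]:
    """Place each mark one period length after the previous one."""
    marks = [start]
    for p in periods:
        marks.append(marks[-1] + p)
    return marks
```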
[0021] The placement of windows is easy if the input audio equivalent signal is monotonous.
In this case, the windows may be placed simply at fixed distances from each other.
This may be effected by preprocessing the signal, so as to change its pitch to a single
monotonous value. The final manipulation to obtain a desired pitch and/or duration
can then be performed with windows at uniform spacing.
An exemplary apparatus.
[0022] Figure 3 shows an exemplary embodiment of an apparatus for changing the pitch and/or
duration of an audible signal. The input audio equivalent signal arrives at an input
60, and the output signal leaves at an output 63. The input signal is multiplied by
the window function in multiplication means 61, and stored segment signal by segment
signal in segment slots in storage means 62. To synthesize the output signal on output
63, speech samples from various segment signals are summed in summing means 64. The
manipulation of speech signals, in terms of pitch change and/or duration manipulation,
is effected by addressing the storage means 62 and selecting window function values.
Accordingly, selection of storage addresses for storing the segments is controlled
by window position selection means 65, which also control window function value selection
means 69; selection of readout addresses is controlled by combination means 66.
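[0023] In order to explain the operation of the components of the apparatus shown in Figure
3, it is first noted that signal segments S_i are to be derived from the input
signal X (at 60), the segments being defined by
$$S_i(t) = \begin{cases} W(t/L_i)\,X(t+t_i), & -L_i < t < 0 \\ W(t/L_{i+1})\,X(t+t_i), & 0 \le t < L_{i+1} \end{cases}$$
(t_i being the position of window i in the input signal, and L_i the pitch period length
over which window i is shifted with respect to window i-1),
and these segments are to be superposed to produce the output signal Y (at 63):
$$Y(t) = \sum_i S_i(t - T_i)$$
(T_i being the position of segment i in the output signal, and the sum being limited to indices i for which
$-L_i < t - T_i < L_{i+1}$).
At any point in time t' a signal X(t') is supplied at the input 60, which contributes
to two segments i, i+1 at respective t values
$$t_a = t' - t_i$$
and
$$t_b = t' - t_{i+1}$$
(these being the only possibilities for which t lies within the window, that is,
$-L_i < t < L_{i+1}$ for segment i).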
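The derivation of segments and their superposition can be sketched in a few lines. The sketch assumes, in line with Claim 1, that window i is centred on the input position of segment i, with its first half stretched to the preceding period length and its second half to the following one; the raised-cosine shape and all function names are assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple

Segment = List[Tuple[int, float]]     # (offset t relative to window centre, value)


def normalised_window(u: float) -> float:
    """Raised cosine on -1 < u < 1; W(u) + W(u - 1) = 1 for 0 <= u <= 1."""
    return 0.5 * (1.0 + math.cos(math.pi * u)) if -1.0 < u < 1.0 else 0.0


def extract_segment(x: Sequence[float], centre: int,
                    left_len: int, right_len: int) -> Segment:
    """Weight the input around `centre`; the left half of the window is
    stretched to `left_len` samples, the right half to `right_len`."""
    segment: Segment = []
    for t in range(-left_len + 1, right_len):
        u = t / left_len if t < 0 else t / right_len
        n = centre + t
        if 0 <= n < len(x):
            segment.append((t, normalised_window(u) * x[n]))
    return segment


def overlap_add(segments: Sequence[Segment], out_centres: Sequence[int],
                length: int) -> List[float]:
    """Superpose the segments, each shifted to its output position."""
    y = [0.0] * length
    for segment, centre in zip(segments, out_centres):
        for t, value in segment:
            if 0 <= centre + t < length:
                y[centre + t] += value
    return y
```

If the output positions equal the input positions, the input is reproduced between the first and last window centres. Repeating a segment and spacing the output positions one period apart lengthens the output by one period without changing the pitch.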
[0024] Figure 4 shows the multiplication means 61 and the window function value selection
means 69. The respective t values t_a, t_b described above are multiplied by the inverse
of the period length L_i (determined from the period length in an invertor 74) in scaling
multipliers 70a, 70b to determine the corresponding arguments of the window function W.
These arguments are supplied to window function evaluators 71a, 71b (implemented, for
example, as a lookup table in the case of discrete arguments), which output the corresponding
values of the window function; these are multiplied with the input signal in two multipliers
72a, 72b. This produces the segment signal values S_i, S_{i+1} at two inputs 73a, 73b
to the storage means 62.
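A software analogue of this signal path is sketched below. It assumes a symmetric raised-cosine window stored as a half-window lookup table; the table size and the function names are arbitrary choices for the illustration, while the reference numerals in the comments merely point to the corresponding blocks of Figure 4.

```python
import math
from typing import Tuple

TABLE_SIZE = 256
# Half of a symmetric normalised window, sampled for arguments 0 .. 1.
WINDOW_TABLE = [0.5 * (1.0 + math.cos(math.pi * k / TABLE_SIZE))
                for k in range(TABLE_SIZE + 1)]


def window_value(t: int, period: int) -> float:
    """Scale t by the inverse period length and look the window value up."""
    inverse = 1.0 / period                      # invertor 74
    u = abs(t) * inverse                        # scaling multiplier 70a / 70b
    if u >= 1.0:
        return 0.0
    return WINDOW_TABLE[round(u * TABLE_SIZE)]  # evaluator 71a / 71b


def weigh_sample(x_sample: float, t_a: int, t_b: int,
                 period: int) -> Tuple[float, float]:
    """Return the contributions of one input sample to segments i and i+1."""
    return (x_sample * window_value(t_a, period),   # multiplier 72a
            x_sample * window_value(t_b, period))   # multiplier 72b
```

Because the two windows overlap over exactly one period, a single period length serves both lookups, and the two weights add up to one.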
[0025] These segment signal values are stored in the storage means 62 in segment slots, at
addresses in the slots corresponding to their respective time point values t_a, t_b
and to respective slot numbers. These addresses are controlled by window position
selection means 65. Window position selection means suitable for implementing the
invention are shown in Figure 5. The time point values t_a, t_b are addressed by
counters 81, 82; the segment slot numbers are addressed by indexing
means 84 (which output the segment indices i, i+1). The counters 81, 82 and the indexing
means 84 output addresses with a width appropriate to distinguish the various positions
within the slots and the various slots respectively, but are shown symbolically only
as single lines in Figure 5.
[0026] The two counters 81, 82 are clocked at a fixed clock rate and count from an initial
value loaded from a load input (L), upon a trigger signal at trigger input (T). The
indexing means 84 increment the index values upon reception of this trigger signal.
According to one embodiment, pitch measuring means 86 determine a pitch value from
input 60, and control the scale factor for the scaling multipliers 70a, 70b, and provide
the initial value of the first counter 81 (the initial count being minus the pitch
value), whereas the trigger signal is generated internally in the window position
selection means, once the counter reaches zero, as detected by a comparator 88. This
means that successive windows are placed by incrementing the location of a previous
window by the time needed by the first counter 81 to reach zero.
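The behaviour of the first counter 81 and the comparator 88 can be simulated as follows. This is a sketch only: one clock tick per input sample is assumed, the first window is arbitrarily placed at sample 0, and `pitch_at` is a hypothetical stand-in for the pitch measuring means 86, returning a period length of at least one sample.

```python
from typing import Callable, List


def place_windows(n_samples: int, pitch_at: Callable[[int], int]) -> List[int]:
    """Return the sample positions at which a new window is triggered."""
    positions = [0]                     # first window position (assumption)
    counter = -pitch_at(0)              # load minus the pitch value
    for n in range(1, n_samples):
        counter += 1                    # fixed-rate clock
        if counter == 0:                # comparator detects zero
            positions.append(n)         # trigger: next window starts here
            counter = -pitch_at(n)      # reload with the current pitch value
    return positions
```

Each position follows from the previous one alone, so no fixed phase reference in the signal is needed.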
[0027] In another embodiment, a monotonized signal is applied to the input 60 (this monotonized
signal being obtained by prior processing in which the pitch is adjusted to a time
independent value). In this monotonized case, a constant value, corresponding to the
monotonized pitch is fed as initial value to the first counter 81. In this case the
scaling multipliers 70a, 70b can be omitted since the windows have a fixed size.
[0028] The combination means 66 of Figure 3 are shown in Figure 6. The purpose of the output
side is to superpose segments from the storage means 62 according to
$$Y(t) = \sum_i S_i(t - T_i)$$
the sum being limited to index values i for which
$$-L_i < t - T_i < L_{i+1};$$
in principle, any number of index values may contribute to the sum at one time point
t. But when the pitch is not changed by more than a factor of 3/2, at most 3 index
values will contribute at a time. By way of example, therefore, Figures 3 and 6 show
an apparatus which provides for only three active indices at a time; extension to
more than three segments is straightforward.
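The bound of three contributing indices can be checked by brute force. The sketch assumes a constant input period, windows extending one period to either side of their centre, and equally spaced output positions; the function name and the integer sample grid are choices made for the illustration.

```python
def max_active_segments(half_length: int, out_period: int) -> int:
    """Largest number of segments overlapping any output sample, for windows
    spanning (-half_length, +half_length) placed every out_period samples."""
    reach = half_length // out_period + 2       # centres that could matter
    best = 0
    for t in range(out_period):                 # pattern repeats every out_period
        count = sum(1 for k in range(-reach, reach + 1)
                    if abs(t - k * out_period) < half_length)
        best = max(best, count)
    return best
```

With an input period of 12 samples, an output spacing of 12 gives two overlapping segments, a spacing of 8 (pitch raised by 3/2) gives three, and a spacing of 6 (pitch doubled) gives four, so the three-counter arrangement would no longer suffice in the last case.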
[0029] For addressing the segments, the combination means 66 are quite similar to the input
side: they comprise three counters 101, 102, 103 (clocked at a fixed rate), outputting
the time point values t - T_i for the three segment signals. The three counters receive the same trigger signal,
which triggers loading of minus the desired output pitch interval in the first of
the three counters 101. Upon the trigger signal the last position of the first counter
101 is loaded into the second counter 102, and in the third counter 103 the last position
of the second counter 102 is loaded. The trigger signal is generated by a comparator
104, which detects zero crossing of the first counter 101. The trigger signal also
updates indexing means 106.
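The chained loading of the three counters can be simulated as sketched here. One clock tick per output sample is assumed, and `None` marks a counter that has not yet been loaded at start-up; both are assumptions of the sketch, as are the names.

```python
from typing import List, Optional, Tuple

State = Tuple[int, Optional[int], Optional[int]]


def simulate_output_counters(n_clocks: int, out_period: int) -> List[State]:
    """Return the (counter 101, 102, 103) values after each clock tick."""
    c1: int = -out_period
    c2: Optional[int] = None
    c3: Optional[int] = None
    states: List[State] = []
    for _ in range(n_clocks):
        c1 += 1
        c2 = None if c2 is None else c2 + 1
        c3 = None if c3 is None else c3 + 1
        if c1 == 0:                  # comparator 104 detects the zero crossing
            c3 = c2                  # third counter takes over the second
            c2 = c1                  # second counter takes over the first
            c1 = -out_period         # first counter reloaded with minus the interval
        states.append((c1, c2, c3))
    return states
```

After the start-up phase the three counters hold the time relative to the upcoming, the current and the previous segment position respectively, each one output pitch interval apart.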
[0030] The indexing means address the segment slot numbers which must be read out and the
counters address the position within the slots. The counters and indexing means address
three segments, which are output from the storage means 62 to the summing means 64
in order to produce the output signal.
[0031] By applying desired pitch interval values at the pitch control input 68a, one can
thus control the pitch value. The duration of the speech signal is controlled by a
duration control input 68b to the indexing means. Without duration manipulation, the
indexing means simply produce three successive segment slot numbers. At the trigger
signal, the values of the first and second output are copied to the second and third
output respectively, and the first output is increased by one. When the duration is
increased, the first output is kept constant once every so many cycles, as determined
by the duration control input 68b. To decrease the duration, the first output is increased
by two every so many cycles. The change in duration is determined by the net number
of skipped or repeated indices. When the apparatus is used to change the pitch and
duration of a signal independently (for example changing the pitch and keeping the
duration constant), the duration input 68b should be controlled to give a net frequency
F at which indices should be skipped or repeated according to
$$F = \frac{D\,t/T - 1}{t} = \frac{D}{T} - \frac{1}{t}$$
(D being the factor by which the duration is changed, t being the pitch period length
of the input signal and T being the period length of the output signal; a negative
value of F corresponds to skipping of indices, a positive value corresponds to repetition).
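The indexing behaviour and the net frequency can be sketched together. The expression used for F below, F = D/T - 1/t, is an assumption of this sketch: it follows from counting the D/T output segments needed per unit of input time against the 1/t input segments available. The start values of the three outputs and all names are likewise illustrative.

```python
from fractions import Fraction
from typing import List, Tuple


def net_index_rate(D: Fraction, t: Fraction, T: Fraction) -> Fraction:
    """Net rate of repeated (positive) or skipped (negative) indices per unit
    of input time, assuming F = D/T - 1/t."""
    return (D * t / T - 1) / t


def index_triples(n_triggers: int, every: int = 0,
                  lengthen: bool = True) -> List[Tuple[int, int, int]]:
    """Outputs of the indexing means after each trigger. Once every `every`
    triggers the first output is held (lengthen) or advanced by two (shorten);
    every=0 disables duration manipulation."""
    first, second, third = 2, 1, 0
    out = [(first, second, third)]
    for cycle in range(1, n_triggers + 1):
        step = 1
        if every and cycle % every == 0:
            step = 0 if lengthen else 2
        # The old first and second outputs move to the second and third outputs.
        first, second, third = first + step, first, second
        out.append((first, second, third))
    return out
```

For example, halving the duration at unchanged pitch with a 10 ms period gives F = -50 per second, that is, every other index is skipped; raising the pitch by 5/4 at unchanged duration gives F = +25 per second, so indices must be repeated to keep the length.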
[0032] Figure 3 only provides one embodiment by way of example. The principal point is the
incremental placement of windows at the input side with a phase determined from the
phase of a previous window. There are many ways of generating the addresses for the
storage means 62, of which Figure 5 is but one. For example, the addresses may be
generated using a computer program, and the starting addresses need not have the values
given in the example.
[0033] Figure 3 can be implemented in various ways, for example using digital samples at
input 60, where the sampling rate may have any convenient value, for example 10000 samples
per second; alternatively, it may use continuous signal techniques, where the counters
81, 82, 101, 102, 103 provide continuous ramp signals, and the storage means provide
for continuously controlled access, like a magnetic disk. Furthermore, in Figure 3
segment slots may in practice be reused after some time, as they are not needed permanently.
Not all components of Figure 4 need to be implemented as discrete function blocks:
often they may be implemented in whole or in part by a computer.
1. A method for manipulating an audio equivalent signal, comprising:
- positioning of a chain of mutually overlapping time windows with respect to the
audio equivalent signal, wherein a positional displacement between adjacent windows
substantially corresponds to a principal period as based on periodicity measurements
on said audio equivalent signal,
- forming segment signals S_i, each deriving from the audio equivalent signal through weighting with a window function
of the associated window W_i; and
- synthesizing an audio output signal by chained superposition of the segment signals,
characterized:
- in that the step of positioning the chain of mutually overlapping time windows includes
shifting each window W_i with respect to a previous window W_{i-1} in the chain over an actual pitch period length L_i of said audio equivalent signal, where the window W_i has a window function formed by linearly stretching a first half of a normalised
window function by L_i and a second half of the normalised window function by L_{i+1}; and
- in manipulating a duration of said output signal through systematically repeating,
maintaining, and/or suppressing said segment signals, to a predetermined length of
pictorial material corresponding to said audio equivalent signal, where said length
differs from a duration of said audio equivalent signal.
2. A method as claimed in Claim 1, wherein said predetermined length applies to a plurality
of speech equivalent signals in parallel that correspond in content but have differences
in representation.
3. A method as claimed in Claim 2 wherein said differences originate from said plurality
of audio equivalent signals being in as many different languages.
4. A method as claimed in Claim 1, 2 or 3, wherein said predetermined length pertains
to an intermission between non-manipulated audio equivalent signals.
5. A method as claimed in any of Claims 1 to 4 for post-synchronizing human speech as
featured by a video representable item.
6. A method for producing a software title from predetermined pictorial material and
at least one corresponding audio equivalent signal; the method comprising:
manipulating the audio equivalent signal, by positioning of a chain of mutually overlapping
time windows with respect to the audio equivalent signal, as based on periodicity
measurements on said audio equivalent signal, and wherein a positional displacement
between adjacent windows substantially corresponds to a principal period of said periodicity;
deriving segment signals from the audio equivalent signal through weighting with the
associated window function; and synthesizing an audio output signal by chained superposition
of said segment signals, wherein a duration of said audio output signal is manipulated
to a predetermined length of the pictorial material through systematically repeating,
maintaining, and/or suppressing said segment signals, where said length differs from
a duration of said audio equivalent signal; and
storing the pictorial material and the resulting audio output signal in a unitary
storage medium for synchronised playback.
7. An apparatus for manipulating an audio equivalent signal; the apparatus comprising:
means for positioning a chain of mutually overlapping time windows with respect to
the audio equivalent signal, as based on periodicity measurements on said audio equivalent
signal, by shifting each window W_i with respect to a previous window W_{i-1} in the chain over an actual pitch period length L_i of said audio equivalent signal, where the window W_i has a window function formed by linearly stretching a first half of a normalised
window function by L_i and a second half of the normalised window function by L_{i+1}; and
means for deriving segment signals from the audio equivalent signal through weighting
with the associated window function; and
means for synthesizing an audio output signal by chained superposition of said segment
signals by manipulating a duration of said output signal to a predetermined length
of pictorial material corresponding to said audio equivalent signal through systematically
repeating, maintaining, and/or suppressing said segment signals, where said length
differs from a duration of said audio equivalent signal.