[0001] This application relates to an improved method and apparatus for comparing signals.
The method and apparatus relate in particular to the field of broadcasting and systems
for monitoring the content of the broadcast signal.
[0002] In the broadcasting industry, it is common for a central programme originator to
relay programmes over a private network for broadcasting at regional centres. The
regional centres can then add local programmes to the received programme and transmit
this for reception in their catchment area. This system means that in most cases a
strong signal can be provided to a viewer, as the signal is transmitted from a local
transmitter, rather than a more distant central transmitter. It also allows the signal to
be adapted to the region in which it is received, for example, the insertion of local
news programmes after national news programmes.
[0003] Similarly, the private network circuits can be used to relay signals in the opposite
direction allowing regional centres to contribute content to other catchment areas,
such as to the region of the central programme originator, or other regional areas.
[0004] The private circuits relaying the programmes for broadcast can fail however, either
partially or totally, preventing broadcast signals from being transmitted altogether,
or misrouting signals such that a different signal is received at a location than
the signal that was intended. We have appreciated that such failures need to be detected,
and need to be detected in real time.
[0005] Detection of such failures can be achieved by comparison of the signal being transmitted
by the central programme originator with that being transmitted at the regional centre.
However, the comparison is complicated by a number of problems inherent in the broadcasting
system, such as timing delay.
[0006] Typically, the broadcast signal being transmitted at the regional centre will lag
behind that transmitted by the programme originator by up to a few seconds. This is
partly due to the inherent delay resulting from the transmission of the broadcast
signal to the regional centre over the private circuits or broadcast chain, known
as the "signature chain delay". MPEG coding/decoding processing time for digital signals
can also have a delaying effect. For example, analogue signals are typically delayed
by around 100ms, and digital signals by 1 to 2 seconds.
[0007] In order to perform any comparison therefore the two signals must be synchronised.
As there is no inherent timing structure within audio data, such as radio broadcasts,
synchronising audio signals can be difficult.
[0008] Another problem arises from limited network capacity. If comparison of the two signals
is to take place, either the signals themselves, or information about the signals
must eventually be routed to the same location to be compared. This can be expensive
in terms of network capacity.
[0009] United States Patent number 4,230,990 describes a system and method for identifying
broadcast programs, wherein a pattern recognition process is combined with a signalling
event which acts as a trigger or cue signal. A segment of each programme at a predetermined
location with respect to one of these cue signals is sampled and processed to form
the programme's reference signature which is stored in computer memory. In the field,
the monitoring equipment detects cue signals broadcast by a monitored station and,
upon detection, samples the broadcast program signal at the same predetermined location
to create a broadcast signature of unknown programme identity. By comparing broadcast
signatures to reference signatures, a computer identifies the broadcast of programmes
whose reference signatures have been stored in memory.
SUMMARY OF THE INVENTION
[0010] The invention is defined in the appended claims, to which reference should now be
made. Advantageous features are set forth in the dependent claims.
[0011] The invention provides a method and apparatus for determining the relative time difference
or delay between first and second audio signals that represent substantially the same
audio content. The invention also provides a method and an apparatus for determining
whether two audio signals contain the same audio content.
[0012] A comparison of the two audio signals is carried out using a low bit-rate representation
of each signal that is generated using the dominant frequency within successive portions
or frames of the signal. This audio representation can also be used as a means of
comparing two video programmes of which the audio signal is a part.
[0013] As the analysis is based on the frequencies present within the audio signal, it can,
for example, be performed more quickly than an analysis based on the energy of the
audio signal. As a result, the determination can be made in real time, allowing it
to be advantageously used in the field of broadcasting to confirm that a signal being
transmitted by a regional broadcasting centre is in accordance with the master signal
being sent to it by a programme originator.
BRIEF DESCRIPTION OF THE DRAWINGS
[0014] The invention will now be described in detail, by way of example, and with reference
to the drawings in which:
Figure 1 is a flow chart illustrating the operation of the preferred system for performing
comparison of audio signals in accordance with the preferred embodiment of the invention;
Figure 2 illustrates the method of dividing an audio signal into frames for analysis in accordance
with the method illustrated in Figure 1;
Figure 3 illustrates the step of dividing the audio signal into overlapping frames;
Figure 4 illustrates an example representation of a first audio signal;
Figure 5 illustrates an example representation of a second audio signal, the representation
being shorter than the first;
Figure 6 illustrates a schematic representation of correlation results of the first and second
audio signal;
Figure 7 illustrates actual correlation results produced by experiment for two signals containing
the same audio content;
Figure 8 illustrates actual correlation results produced by experiment for two unrelated signals;
Figure 9 schematically illustrates the generation of a successive representation of the first
audio signal;
Figure 10 illustrates apparatus according to a first preferred embodiment of the invention;
Figure 11 illustrates apparatus according to a second preferred embodiment of the invention;
Figure 12 illustrates an embodiment employing a single computer terminal;
Figure 13 illustrates an embodiment employing a remote computer terminal.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT
[0015] The preferred system provides a method and apparatus for comparing two audio signals
and determining whether they are substantially the same, that is whether they contain
the same audio content. The system has particular application to the broadcast industry
and allows a programme originator to verify that regional centres are transmitting
the correct programmes. If they are not, then the private circuits used by the programme
originator to transmit broadcast information to the regional centres may be at fault.
Thus, any problems with the circuits, such as complete failure, or mis-routing may
be identified and addressed.
[0016] Although the method compares audio signals, it will be understood that the technique
provided by the preferred system is not limited to comparison of radio broadcast programmes,
but can also be used to compare the audio parts of video signals.
[0017] Furthermore, the preferred system actually compares a low bit-rate representation
of the original audio signal. Transmission of the representation to another location
on the network for comparison is less costly in terms of network capacity than if
the original audio signal was transmitted. The saving in network capacity is particularly
germane if the representation is of the audio corresponding to a video broadcast,
as it allows the comparison of two video broadcasts to be achieved while only ever
transmitting a considerably pared down version of the original signal.
[0018] The operation of the preferred system will now be described in detail with reference
to Figures 1 to 13 of the drawings.
[0019] Figure 1 is a flow chart illustrating the steps performed by the preferred system.
The system compares two audio signals: a first audio signal, such as the local signal
transmitted by a regional broadcasting centre, and a second audio signal, such as
a national signal being transmitted to the regional broadcasting centre from a national
broadcasting centre or programme originator.
[0020] In practice, these steps may be embodied by software running on a computer, or by
equivalent hardware such as dedicated circuits. Operation begins therefore in step
S10 which represents the initialisation of the software or circuits respectively.
[0021] In step S20, the raw audio data of the first audio signal is captured, that is the
signal that is to be compared to the master signal. Preferably, this is achieved by
sampling the original audio signal. As the capture will be occurring in real time,
the system feeds the captured audio data directly into a buffer for storage. The buffer
need only be large enough to store a few seconds of audio data, as the timing delay
between the signals being compared is typically less than a few seconds. If larger
synchronisation delays are expected then the buffer will need to be larger to accommodate
sufficient data. In practice, the preferred system has been found to tolerate a delay
of about 2 s between the two signals being compared. This is sufficient to handle most digital
signals. The maximum delay that the system can accommodate however is limited only
by the size of the buffers in which the audio signal and signatures are stored and
the processing power of the computer.
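By way of illustration only, the capture and buffering described above might be sketched as follows in Python; the sampling rate is the preferred value given later in this description, while the buffer length, the container used and the callback name are merely assumptions of the sketch and not part of the described system.

# Illustrative sketch only: a bounded buffer holding the most recent few seconds
# of captured audio.  The buffer length and callback name are assumptions.
from collections import deque

SAMPLE_RATE = 11025                      # Hz, preferred sampling rate (see below)
BUFFER_SECONDS = 4                       # a few seconds, enough for a delay of about 2 s

capture_buffer = deque(maxlen=SAMPLE_RATE * BUFFER_SECONDS)

def on_samples_received(samples):
    """Append newly captured samples; the oldest samples are discarded once the buffer is full."""
    capture_buffer.extend(samples)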
[0022] Once a sufficient amount of the audio signal has been captured, control flows to
step S30 where processing occurs to generate a signature or representation of the
captured audio data. The buffer need not be full before the processing starts, as
the processing and the capture of audio data can occur simultaneously. Also, although
a predetermined amount of data is preferably stored before processing begins, it is
possible for the processing to begin almost immediately after the first data points
are stored in the buffer. In the preferred system however, the processing does not
begin until the buffer is full.
[0023] The processing of the audio data to generate the signature or representation will
now be explained in more detail with reference to Figures 2 to 4.
[0024] Figure 2 shows a block representing the audio data of the first audio signal captured
in step S20. For purposes of illustration, the audio data is taken to comprise 17408
samples. The x axis of the data block will be understood to represent time, and the
y axis, although no variation in the data is actually shown, will be understood to
represent the amplitude of the audio signal at a point in time.
[0025] Firstly, the preferred system breaks the audio data down into smaller frames of data
for analysis. It can be seen that the block of data containing 17408 samples shown
in Figure 2 can be subdivided into 17 frames of 1024 samples each.
[0026] The preferred system processes the data block by extracting the dominant frequency
from each frame using a Fast Fourier Transform (FFT). This is a conventional technique
known in the art and so will not be discussed further here. Other known techniques
could also be used. Thus, each frame containing 1024 samples of audio data is converted
into a single data point representing the dominant frequency of that frame. The collection
of the data points from all of the frames is used as a signature or representation
of the original audio data.
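By way of example, the extraction of the dominant frequency from a 1024-sample frame might be sketched as follows; it is assumed here, for the purpose of the sketch only, that the dominant frequency is the frequency bin of largest magnitude in the frame's spectrum, and NumPy's real FFT is used as one conventional implementation.

# Illustrative sketch: dominant frequency of one frame via an FFT.  Taking the
# dominant frequency as the largest-magnitude bin is an assumption of this sketch.
import numpy as np

SAMPLE_RATE = 11025        # Hz, preferred sampling rate given later in the description
FRAME_SIZE = 1024          # samples per frame, as in Figure 2

def dominant_frequency(frame, sample_rate=SAMPLE_RATE):
    spectrum = np.abs(np.fft.rfft(frame))
    spectrum[0] = 0.0                          # ignore the DC component
    return int(np.argmax(spectrum)) * sample_rate / len(frame)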
[0027] To some extent, the shape of the signature or representation generated in this way
will depend on the number of frames used and the starting position of the frames within
the audio signal. This can cause problems later for recognition of an audio signature,
particularly in cases where the audio content of the signal varies quickly, such as
rock music, for example.
[0028] The effect of starting the frames at different positions in the audio signal is exacerbated
by the delay in time between the two signals being compared. Thus, in the preferred
system, the audio data is broken down into a larger number of overlapping frames,
as this provides greater tolerance against synchronisation problems and quickly varying
audio content.
[0029] Figure 3 shows an illustration of the audio data broken down into overlapping frames.
Four overlapping frames each containing 1024 samples are shown, displaced from each
other by 256 samples. As a result, the audio data of 17408 samples is in fact broken
down into 65 frames, and is therefore represented by 65 dominant frequency data points.
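The generation of a signature from overlapping frames might, purely by way of example, be sketched as follows; the dominant frequency of each frame is again taken to be the largest-magnitude FFT bin, which is an assumption of the sketch rather than a requirement of the described method.

# Illustrative sketch: one dominant-frequency point per overlapping frame.
import numpy as np

SAMPLE_RATE = 11025
FRAME_SIZE = 1024
OVERLAP = 4                          # overlap coefficient of Figure 3 (4 frames per frame width)
HOP = FRAME_SIZE // OVERLAP          # successive frames are displaced by 256 samples

def signature(block, sample_rate=SAMPLE_RATE):
    points = []
    for start in range(0, len(block) - FRAME_SIZE + 1, HOP):
        frame = block[start:start + FRAME_SIZE]
        spectrum = np.abs(np.fft.rfft(frame))
        spectrum[0] = 0.0
        points.append(int(np.argmax(spectrum)) * sample_rate / FRAME_SIZE)
    return np.array(points)

# A block of 17408 samples yields 65 signature points, as described above.
assert len(signature(np.zeros(17408))) == 65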
[0030] An example signature is illustrated in Figure 4 for the first audio signal, that
is the one which is being compared to the master signal. The signature contains M
signature points. In practice, it is preferred if M is of the order of 2000. It should
be remembered that in this diagram, the x axis represents time, measured in frame
number, and the y axis represents the dominant frequency.
[0031] The data rate of the audio signature shown in Figure 4 follows from the number
of signature points generated per audio data block, which can be calculated from the
following equation:

Number of signature points = (Block size / Frame size) × Overlap coefficient − (Overlap coefficient − 1)

[0032] The first term on the right hand side of this equation can be understood as the number
of data points or samples of the original audio signal, that is the block size, divided
by the frame size, which is given by the number of samples per frame, multiplied by
the overlap co-efficient, or the number of frames per frame width. In the above example
shown in Figure 3, the overlap co-efficient is 4, as for every frame width of size
1024 samples, there are four frames, beginning at sample 1, 257, 513 and 769 respectively.
The next frame (the fifth) will then start at sample 1025.
[0033] The last term in this equation represents the fact that the last frame in the audio
data block cannot be sub-divided into further frames, because those further frames
will lie at least partially beyond the end of the data block. The number of such frames
produced by sub-dividing the last frame in the data block is given by the overlap
co-efficient minus one. In the above example, this value is therefore 3.
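By way of a numerical check, substituting the example values of Figure 3 into the equation above gives (17408 / 1024) × 4 − (4 − 1) = 68 − 3 = 65 signature points, in agreement with the 65 frames noted above with reference to Figure 3.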
[0034] The number of samples in the audio data block, given by the sampling rate of the
audio data block and the data block size, the FFT frame size and the overlap ratio
are all variable, and can be chosen according to the application and the computer
processing power available. If the sampling rate or the overlap ratio are chosen to
be too high, the computer processor may not be able to perform the analysis quickly
enough, and the system will cease to work in real time. It has been found that for
a 1 GHz computer processor, in this case an Intel Pentium III, a sampling rate of
greater than 22 kHz and an overlap ratio of 8 or greater are too high for the computer
to function reliably with a frame size of 1024 samples. Increasing the size of the
FFT frame, to 8196 samples say, allows the computer to function again with this sampling
rate and overlap ratio. However, increasing the frame size is not desirable, as it
can lead to problems synchronising the two audio signals, particularly for certain
types of fast music. This is because the dominant frequency for a short block of signal
may not be the same as the dominant frequency for a longer block of signal.
[0035] The preferred system employs an overlap ratio of 8, a sampling rate of 11.025 kHz,
an FFT frame size of 1024 samples and an audio data block size of 17408 samples. In
practice however, the overlap coefficient may be varied from 4 to 32, with the sampling
frequency being varied accordingly between 8 kHz and 44 kHz. For processors faster than
1 GHz, a higher overlap coefficient or sampling frequency may be used.
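For reference, the preferred and permissible parameter values described above may be gathered as follows; this is an illustrative summary only, and the dictionary keys are assumptions of the sketch.

# Illustrative summary of the parameter values described above.
PREFERRED_PARAMETERS = {
    "overlap_coefficient": 8,          # may be varied from 4 to 32
    "sample_rate_hz": 11025,           # may be varied between about 8 kHz and 44 kHz
    "fft_frame_size": 1024,            # samples per frame
    "block_size": 17408,               # samples per audio data block
}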
[0036] The result of the processing in step S30 is to produce an audio signature of the type
illustrated in Figure 4.
[0037] Reference shall now be made again to Figure 1. Following the generation of the audio
signature in step S30, the signature is stored in a buffer in step S40, ready for comparison.
If the comparison method is being employed at a regional broadcasting centre to verify
reliable transmission of a signal received from the programme originator, the first
audio signal stored in the buffer in step S40 will be the local broadcast signal being
transmitted; the second audio signal that is to be compared with the first, will be
the signal transmitted to the regional centre from the national broadcasting centre.
[0038] In any case, in order to perform the comparison, an audio signature of the second
audio signal must also be generated. The signature or representation of the second
signal is generated in the same way as described above for the first audio signal, except
that the representation is deliberately made for a smaller section of the second signal
than of the first. The first signal representation is therefore longer, in that it contains
more samples than the second representation and therefore represents a longer period
in the time domain. The second signature is illustrated in Figure 5
and contains N samples. In practice, a typical figure for N might be 340.
[0039] In the case of a broadcast network, the audio signature representing the second audio
signal is generated at the national broadcasting centre. This signature is then transmitted
to the server computer of the regional television centre via an IT network. For
this reason a processing step for the second signal is not shown in Figure 1. Instead,
the second signature is received in step S50.
[0040] A separate IT network is preferred as techniques can be employed to detect faults
such as those caused by mutual mis-routing. Such faults can be differentiated from
faults at source. It is also possible that the audio signature could be transmitted
to the regional broadcasting centre with the original audio signal. However, if there
is a fault in the private circuits routing the original audio signal, the audio signature
will not be received at the regional broadcasting centre and no comparison can then
be made.
[0041] Once the audio signature of the source signal has been obtained in step S50, comparison
of the two audio signatures is performed, S60, and a judgment is made as to whether
the two signatures represent the same audio content. Of course, because of the difference
in size of the two signatures, the audio content will not be exactly the same; instead
the two signals are analysed to determine if the second signature or representation
is contained within the first.
[0042] This is achieved by using a standard correlation or cross-correlation technique,
applied in a particular way. Figure 6 schematically illustrates the process. The top
row of the figure shows the signature (n) of N signature points representing the second
audio signal. The middle row represents the longer sequence of M (>N) signature points
of the audio signature (m) for the local audio signal, and the bottom row shows the
cross-correlation results of the signatures against each other for different relative
time displacements.
[0043] The top two rows of the figure showing the signatures for the two signal sections
are plotted against time on the x axis, as each point of the plot represents the dominant
frequency in a frame of the audio signal. The bottom row of the figure, however, showing
the correlation results, is plotted against the relative displacement of the two signatures
(n) and (m) on the x axis (labelled D). The first point at position
D=0 is the result of the correlation when the beginning of the signature n is aligned
with the beginning of signature m. The next point is obtained when the signature n
is shifted to the right by one signature point in comparison to the beginning of the
signature m and the correlation is performed again. The last point plotted on the
axis D is given by the correlation result when the last signature point of signature
n is aligned with the last signature point of signature m.
[0044] The cross-correlation is calculated as though the sequences of N and M signature
points were both continuous wave forms expressed as a series of regular digitized
samples. The cross-correlation result is also shown as if it were a continuous wave
form, although it will be appreciated that it actually consists of M-N+1 discrete
values.
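Purely by way of illustration, the sliding correlation of Figure 6 might be sketched as follows; a normalised, zero-mean correlation is used here, which is one conventional choice and an assumption of the sketch rather than a requirement of the described method.

# Illustrative sketch of the sliding correlation of Figure 6: the short signature n
# (N points) is compared with the long signature m (M points) at every displacement D,
# giving M - N + 1 correlation values.  Normalisation is an assumption of this sketch.
import numpy as np

def sliding_correlation(m, n):
    m = np.asarray(m, dtype=float)
    n = np.asarray(n, dtype=float)
    M, N = len(m), len(n)
    n0 = n - n.mean()
    results = np.empty(M - N + 1)
    for d in range(M - N + 1):
        window = m[d:d + N]                    # part of m overlapped by n at displacement d
        w0 = window - window.mean()
        denom = np.linalg.norm(w0) * np.linalg.norm(n0)
        results[d] = float(w0 @ n0) / denom if denom else 0.0
    return results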
[0045] If a standard cross-correlation technique is employed, then the height of the cross-correlation
wave form at a particular point is a function of the integral of the product of the
second signature and the part of the first signature which it overlaps when the second
signature is aligned with that particular point. Assuming that the second signature
is contained within the first, the position of the peak of the cross-correlation wave
form should occur when the second shorter signature (n) is aligned with an identical
region in the longer first signature (m). This should occur at only one point in the
first signature.
[0046] To find a match between the first and second signature therefore, the preferred system
detects the maximum value of the correlation wave form. The number of peaks contained
within the correlation wave form gives an indication of how likely it is that the
maximum peak represents the point at which the first and second signatures match.
So, in order to determine a measure of the reliability of the match, the number of
points where the value exceeds two-thirds of the maximum value is calculated. Providing
the ratio between this number and the total number of points is lower than a certain
predetermined value, it can be assumed that there is one clear and strong peak in
the correlation results. The software will then deem that there is a match between
the two signatures. This is illustrated in Figure 6 which shows two peaks of a height
exceeding the threshold of two thirds of the maximum height. In this case a match
is indicated to occur at the peak on the right of the plot.
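The decision rule described above might, for illustration, be sketched as follows; the predetermined ratio (0.02 here) is an assumed value, to be chosen in practice as the description explains.

# Illustrative sketch of the match decision described above.
import numpy as np

def find_match(correlation, max_peak_ratio=0.02):   # 0.02 is an assumed value
    correlation = np.asarray(correlation, dtype=float)
    peak_position = int(np.argmax(correlation))
    peak_value = correlation[peak_position]
    # Count the points exceeding two-thirds of the maximum value.
    above = int(np.count_nonzero(correlation > (2.0 / 3.0) * peak_value))
    # Accept the match only if such points are rare relative to the total number.
    is_match = (above / len(correlation)) < max_peak_ratio
    return is_match, peak_position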
[0047] The key element of the technique employed in the preferred system is the difference
in length between the two audio signatures that are cross-correlated. Were they to
be the same length or close in length, the number of points (M-N+1) in the cross-correlation
would be one, or very small. This would make the correlation process unreliable
and intolerant of any relative delay between the two signals.
[0048] Although preferred values of M and N have been given, it will be appreciated that
they could take any values providing that the number of points in the correlation
(M-N+1) is sufficiently large to provide a reliable result. In practice, a minimum
value for (M-N+1) of about 200 points has been found acceptable, although values of
1000 to 2000 for (M-N+1) are preferred.
[0049] Figure 7 shows the results of a correlation performed between two audio signatures
in actual experiments. The results of four trials are plotted on the same axis; the
first three trials lead to peaks centred at positions D=70, 71 and 72. The fourth
trial results in a peak at position D=140. The difference in the position at which
the peaks occur is caused by differences in timing between the two signals being compared,
such as those caused by transmission for example.
[0050] Figure 8 on the other hand shows a correlation wave form produced by two audio signatures
which represent different audio signals. The graph is plotted on approximately the
same horizontal scale, and shows the region of the correlation plot from point 600
to 800. This range has been chosen for illustration purposes.
[0051] As can be seen from the drawing, the wave form contains a large number of unrelated
and sporadic peaks, none of which could reliably be taken to indicate the presence
of a match. This diagram illustrates the need to count the number of peaks for a given
sample size in order to determine the quality of the plot.
[0052] The results of the correlation are output in step S70 indicating whether a match
was found or not. If a match was found, the relative delay between the two signals
may also be output. This could be used in a synchronising method for example, where
one of the signals is a master or timing signal and the other is the signal to be
synchronised. Using the relative delay calculated for one signal to the other, synchronisation
could be achieved by calculating the phase shift or time shift required to align the
two signals in time and applying this to the signal that is to be synchronised.
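Purely as an illustration of how the relative delay might be expressed in units of time, successive signature points are one frame hop apart (the frame size divided by the overlap coefficient). On the simplifying assumption that the two signature streams are referenced to a common capture instant, the peak position could be converted as sketched below; neither this assumption nor the function name forms part of the described system.

# Illustrative sketch only: converting a correlation peak position D into a time
# offset, assuming signature points one frame hop apart and a common time reference.
SAMPLE_RATE = 11025
FRAME_SIZE = 1024
OVERLAP = 8                                 # preferred overlap coefficient
HOP = FRAME_SIZE // OVERLAP                 # 128 samples between signature points

def delay_seconds(peak_position):
    return peak_position * HOP / SAMPLE_RATE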
[0053] It will be appreciated that the audio signatures generated for the audio signals
represent only a small fraction of the audio stream in a broadcast signal. Thus in
order to monitor the two signals and provide a continuous comparison, it is necessary
to repeat the procedure illustrated in steps S10 to S70 continuously or at least periodically
in real time. Thus, once a match has been found, those signature points of the first
audio signature which occur before the position at which the maximum peak of the correlation
wave form was found are discarded. The remaining points of the first audio signature
are then moved forward in the storage buffer, making room for new audio signature
points to be added. The new audio signature points may be appended to the existing
points in the buffer when sufficient points have been calculated to fill the remaining
space. Alternatively, the points may be added to the buffer as they are calculated.
[0054] Thus, a new audio signature is formed representing a new section of audio signal
starting approximately at the point corresponding to that at which the previous signatures
were matched. This is shown in Figure 9, which illustrates the portion of the first
audio signature that is discarded and the newly appended portion which is added to
the buffer to form a subsequent signature or representation of the audio signal moved
on in time.
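The discarding and appending of signature points described above might, by way of example only, be sketched as follows; the buffer length of 2000 points is the typical value mentioned earlier, and the function name is an assumption of the sketch.

# Illustrative sketch: advance the first signature once a match has been found.
from collections import deque

M = 2000                                       # typical length of the first signature

def advance_signature(signature_points, match_position, new_points):
    """Discard points before the match position and append newly computed points."""
    kept = list(signature_points)[match_position:]   # retain points from the match onwards
    updated = deque(kept, maxlen=M)
    updated.extend(new_points)                       # new points fill the space freed up
    return updated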
[0055] The previous signature for the second signal, that is the signal from the programme
originator, is also discarded, and a new source signature representing the next portion
or region of the second audio signal is received for comparison. This new portion
may be that part of the signal following on directly in time from the previous signal
section. The comparison is then performed again based on the next sections of the
two signals, and a match can be expected to occur at a position in the first audio
signature given by one length of the second signature.
[0056] Each time no match is found, a flag is preferably set within the monitoring
computer or monitoring device. When the number of flags exceeds a predetermined threshold
value an alarm may be raised to indicate that there is no correlation between the
first and second audio signals. The threshold at which this occurs can be determined
in practice based on the difficulty in matching the signals and the amount of time
that is acceptable before a warning is given. The use of a threshold allows some tolerance
of non-matches which occasionally occur even for signals that are identical.
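By way of illustration only, the flag counting and alarm threshold described above might be sketched as follows; the threshold of five and the resetting of the count on a successful match are assumptions of the sketch, the description leaving both to be chosen in practice.

# Illustrative sketch of the mismatch flags and alarm threshold described above.
class MatchMonitor:
    def __init__(self, alarm_threshold=5):          # threshold value is an assumption
        self.alarm_threshold = alarm_threshold
        self.flags = 0

    def report(self, matched):
        # Count non-matches; resetting the count on a match is an assumption of the sketch.
        self.flags = 0 if matched else self.flags + 1
        return self.flags > self.alarm_threshold    # True: raise the alarm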
[0057] It will be appreciated that matching two signals is performed more easily for simple
signals. The system described above therefore works well when employed to match audio
signals representing speech, such as the audio stream from news programmes. However,
if the audio signals being matched represent a quickly varying rhythmic signal such
as rock music, determining that there is a match between the signals can take longer.
To ensure accurate correlation, it is preferable in the case of signals such as rock
music to increase the sampling rate or the overlap co-efficient.
[0058] Figure 10 shows a schematic illustration of the preferred system in an implementation
for monitoring broadcast signals at a programme originator and at a destination transmitter.
The input audio source of the programme originator is first received at an input terminal
102, connected to a server or computer processor 104 at the programme originator's
location 106, and also to an audio network 108. The signal is passed to the computer
processor 104 which generates the audio signature of the master or original signal
and transmits it on IT network 110 to the location 112 of the destination transmitter.
At the same time, the audio source signal is transmitted on the private circuits of
audio network 108 to the same destination 112.
[0059] At the destination transmitter location 112 are a correlator 114 and a processor
or client computer 116. The processor 116 generates an audio signature of the signal
received on the audio network and passes this to the correlator 114. The correlator
114 receives the audio signature of the master signal from the IT network 110, as well
as the signature generated by the processor 116, and performs the comparison method illustrated
in Figure 1. The results are output to output terminal 118, and may be routed back
to the programme originator or used at the destination transmitter.
[0060] Figure 11 shows an alternative embodiment in which the comparison is performed at
the location of the programme originator 106. The audio source signal is transmitted
from the input 102 to the computer processor 104 and to the audio network 108, and
the computer processor 104 generates an audio signature representing the original
audio source.
[0061] The destination processor 116 receives the audio source signal over the audio network
108 and generates an audio signature representing the received signal. This signature
is then transmitted back to the programme originator's location 106 for comparison
on correlator 120. The correlator 120 also receives the representation of the audio signal
generated by processor 104. The correlator 120 compares the audio representation received
from the transmitter location 112 with the audio signature it received from the processor
104, and the results are output to output 122 at the programme originator's location.
[0062] The signal being transmitted at a regional broadcasting centre will in most cases
differ from the original signal transmitted from the programme originator. This is
because any local content to be added to the received signal will be added at the
regional centre. As a result, it is preferable that during broadcasting of local content
the preferred system is deactivated. Otherwise, the system will report repeated occurrences
of signals that do not match.
[0063] Other embodiments of systems for monitoring audio content are also possible. Figure
12 for example shows an arrangement in which the comparison is done locally within
a single computer. Providing the computer has means to capture two audio signals,
such as two audio capture cards connected to audio inputs, the single computer can
prepare both of the audio signatures and perform the comparison.
[0064] Alternatively, as shown in Figure 13, the comparison may be performed by a third
party computer, which is adapted to receive inputs from first and second computers
which perform the audio capture.
[0065] The term 'capture' should also be understood to mean receive, as the signal could
be captured by any means known in the art and then transmitted to the computer for
comparison. In this case, the computer merely needs a receiver in order to receive
the already captured signal.
[0066] The preferred system therefore provides an effective way of comparing two signals
and determining whether they are the same. It will also be appreciated that the preferred
system provides an effective method and apparatus for determining the relative time
delay between two like signals. Once a match between the two signals has been determined,
the relative delay in timing between the two signals can be calculated. The comparison
process is repeated, ensuring that synchronization once obtained is tracked and maintained.
[0067] The preferred system provides an audio content monitoring system that is able to
work in real time on continuous signals, and that is able to match audio content in
the presence of delays even when the system has no previous knowledge of which signal
will arrive first. The system can react quickly and reliably to indicate any incorrect
audio content, and can regain the lock between the two signals once the incorrect
audio content has been corrected. The system is able to resist impairments to the
audio signal, such as noise and coding/decoding artefacts, while requiring no external
or internal synchronisation to operate.
1. A system for determining the relative time difference between first and second audio
signals that represent substantially the same audio content, the apparatus comprising:
capturing means for obtaining first and second audio signals;
processing means (104, 116) for generating a representation of sections of each of
the first and second signals, the representation being based on the frequencies present
within each section; the signal section and representation of the first audio signal
being longer than the signal section and representation of the second audio signal;
and
a correlator (114, 120) for correlating the representation of the first audio signal
section with the representation of the second signal section at different relative
timing differences, and for providing an output;
wherein the output indicates a match when the first and second audio signal sections
contain substantially the same audio content, and if there is a match, indicates the
timing difference between the points at which that audio content occurs in both the
first and second audio signal sections.
2. A system for detecting whether first and second audio signals represent substantially
the same audio content, the apparatus comprising:
capturing means for obtaining first and second audio signals;
processing means (104, 116) for generating a representation of sections of each of
the first and second signals, the representation being based on the frequencies present
within each section; the signal section and representation of the first audio signal
being longer than the signal section and representation of the second audio signal;
and
a correlator (114, 120) for correlating the representation of the first audio signal
section with the representation of the second signal section at different relative
timing differences, and for providing an output;
wherein the output indicates a match when the first and second audio signal sections
contain substantially the same audio content.
3. A system according to claims 1 or 2, wherein the processor is operable to divide the
first and second signal sections into a number of constituent signal frames, and to
generate the representations of the first and second signal sections using the dominant
frequency of each frame.
4. A system according to claim 3 wherein the processor is operable, in dividing the first
and second signal sections into constituent frames, to cause the frames within a signal
section to overlap with one or more adjacent frames in the signal section.
5. A system according to any preceding claim wherein if the correlator indicates that
there is a match, the processor is operable to generate a representation of subsequent
first and second audio signal sections from the first and second audio signals.
6. A system according to claim 5 wherein the processor is operable such that the subsequent
section from the first audio signal begins substantially at the point where the audio
content present in both the first and second signal sections was found to occur.
7. A system according to claims 5 or 6 wherein the processor is operable such that the
subsequent section from the second audio signal substantially begins at the end of
the section already obtained.
8. A system according to any preceding claim wherein the processor is operable to employ
a Fourier Transform in generating the representation of the first and second signal
sections.
9. A system for monitoring broadcast signals at a first and second location, wherein
broadcast signals are transmitted, over a network, from the second location to the
first location for transmission at the first location; the system comprising:
means for capturing a first audio signal from the broadcast signal transmitted at
the first location;
means for capturing a second audio signal from the broadcast signal transmitted at
the second location, the second audio signal being the broadcast signal transmitted
to the first location over the network (108);
processing means (104, 116) for generating a representation of a signal section of
each of the first and second audio signals based on the frequencies present within
each section; the signal section and representation of the first audio signal being
longer than the signal section and representation of the second audio signal;
a correlator (114, 120) for correlating the representation of the first audio signal
section with the representation of the second signal section at different relative
timing differences, and for providing an output; wherein the output indicates a match
when the first and second audio signal sections contain substantially the same audio
content.
10. A system according to claim 9 wherein the processing means includes a first processor
(116) at the first location for generating the representation of the first audio signal
section, and a second processor (104) at the second location for generating the representation
of the second audio signal section, and wherein the system comprises a second network
(110) connecting the first and second locations.
11. A system according to claim 10 wherein the correlator comprises a correlator (114)
located at the first location, and wherein the second processor (104) is operable
to transmit the representation of the second signal section to the correlator at the
first location via the second network (110).
12. A system according to claims 10 or 11 wherein the correlator comprises a correlator
(120) located at the second location, and wherein the first processor (116) is operable
to transmit the representation of the first signal section to the correlator (120)
at the second location via the second network (110).
13. A system according to claim 9 wherein the processing means comprise a single processor
for receiving both the first and second audio signals and for generating the representations
of the first and second audio signals.
14. A system according to claims 9 to 13 wherein if the correlator indicates that there
is a match, the processing means is operable to generate subsequent representations
of signal sections of the first and second audio signals.
15. A system according to claim 14 wherein the processing means is operable such that
the signal section used to generate the subsequent representation of the second signal
begins at the end of the previous signal section.
16. A system according to claims 14 or 15 wherein the processing means is operable such
that the subsequent signal section of the first audio signal substantially begins
at the point in the first audio signal where the match was found to occur.
17. A system according to claims 9 to 16 wherein the processing means is operable to employ
a Fourier Transform in generating the representation of the signal sections.
18. A system according to any of claims 9 to 17 wherein the processing means is operable
to divide the signal section into a number of constituent signal frames, and to generate
the representations of the signal section using the dominant frequency of each frame.
19. A system according to claim 18 wherein the processing means is operable, in dividing
the signal section into constituent frames, to cause the frames within the signal
section to overlap with one or more adjacent frames in the signal section.
20. A system according to any preceding claim wherein the correlator is operable
to indicate that the signal sections do not contain the same audio content only after
a predetermined number of signal sections have been correlated.
21. Apparatus for monitoring broadcast signals at a first location, wherein broadcast
signals for transmission at the first location are received, over a network, from
a second location, and wherein a signal representation of a section of the broadcast
signal based on the frequencies contained within the broadcast signal is received
from the second location;
the system comprising:
means for capturing a first audio signal from the broadcast signal transmitted at
the first location;
processing means (104, 116) comprising a processor for receiving the first audio signal
and for generating a first representation of a signal section of the first audio signal
based on the frequencies present within the section; the first representation being
longer than the signal representation received from the second location;
a correlator (114, 120) for correlating the first representation with that received
from the second location at different relative timing differences, and for providing
an output;
wherein the output indicates a match when the signal sections of the first audio
signal and of the broadcast signal received from the second location contain substantially
the same audio content.
22. Apparatus according to claim 21 wherein if the correlator indicates that there is
a match, the processing means is operable to generate a representation of a subsequent
signal section from the first audio signal, and receive a signal representation of
a subsequent signal section from the second location.
23. Apparatus according to claim 22 wherein the processing means is operable to receive
from the second location a subsequent representation of the signal section beginning
at the end of the previous signal section.
24. Apparatus according to claims 22 or 23 wherein the processing means is operable such
that the subsequent signal section of the first audio signal begins at the point in
the first audio signal where the match was found to occur.
25. Apparatus according to claims 21 to 24 wherein the processing means is operable to
employ a Fourier Transform in generating the representation of the signal sections.
26. Apparatus according to claims 21 to 25 wherein the processing means is operable to
divide the signal section into a number of constituent signal frames, and to generate
the representations of the signal section using the dominant frequency of each frame.
27. Apparatus according to claim 26 wherein the processing means is operable, in dividing
the signal section into constituent frames, to cause the frames within the signal
section to overlap with one or more adjacent frames in the signal section.
28. Apparatus according to claims 21 to 27 wherein the correlator is operable to
indicate that the signal sections do not contain the same audio content only after
a predetermined number of signal sections have been correlated.
29. A method for determining the relative time difference between first and second audio
signals that represent substantially the same audio content, the method comprising
the steps of:
capturing first and second audio signals;
generating a representation of sections of each of the first and second signals, the
representation being based on the frequencies present within each section; the signal
section and representation of the first audio signal being longer than the signal
section and representation of the second audio signal;
correlating the representation of the first audio signal section with the representation
of the second signal section at different relative timing differences; and
providing an output, wherein the output indicates a match when the first and second
audio signal sections contain substantially the same audio content, and if there is
a match, indicates the timing difference between the points at which that audio content
occurs in both the first and second audio signal sections.
30. A method for detecting whether first and second audio signals represent substantially
the same audio content, the method comprising the steps of:
capturing first and second audio signals;
generating a representation of sections of each of the first and second signals, the
representation being based on the frequencies present within each section; the signal
section and representation of the first audio signal being longer than the signal
section and representation of the second audio signal;
correlating the representation of the first audio signal section with the representation
of the second signal section at different relative timing differences;
and providing an output, wherein the output indicates a match when the first and second
audio signal sections contain substantially the same audio content.
31. A method according to claims 29 or 30, wherein the generating step comprises dividing
the first and second signal sections into a number of constituent signal frames, and
generating the representations of the first and second signal sections using the dominant
frequency of each frame.
32. A method according to claim 31 wherein the dividing step comprises dividing the first
and second signal sections into constituent frames such that the frames within a signal
section overlap with one or more adjacent frames in the signal section.
33. A method according to claims 29 to 32 comprising the step of, if the output indicates
that there is a match, generating representations of subsequent first and second audio
signal sections from the first and second audio signals.
34. A method according to claim 33 wherein the generating step includes generating a representation
of a subsequent section of the first audio signal beginning substantially at the point
where the audio content present in both the first and second signal sections was found
to occur.
35. A method according to claims 33 or 34 wherein the generating step includes generating
a representation of a subsequent section of the second audio signal substantially
beginning at the end of the section already obtained.
36. A method according to claims 29 to 35 wherein the generating step includes employing
a Fourier Transform.
37. A method for monitoring broadcast signals at a first and second location, wherein
broadcast signals are transmitted, over a network, from the second location to the
first location for transmission at the first location; the method comprising the steps
of:
capturing a first audio signal from the broadcast signal transmitted at the first
location;
capturing a second audio signal from the broadcast signal transmitted at the second
location, the second audio signal being the broadcast signal transmitted to the first
location over the network;
generating a representation of a signal section of each of the first and second audio
signals based on the frequencies present within each section; the signal section and
representation of the first audio signal being longer than the signal section and
representation of the second audio signal;
correlating the representation of the first audio signal section with the representation
of the second signal section at different relative timing differences; and
providing an output, wherein the output indicates a match when the first and second
audio signal sections contain substantially the same audio content.
38. A method according to claim 37 wherein the generating step comprises generating the
representation of the first audio signal section at the first location, and generating
the representation of the second audio signal section at the second location.
39. A method according to claim 38 comprising transmitting the representation of the second
signal section to the first location, and wherein the correlating step is performed
at the first location.
40. A method according to claims 38 or 39 comprising transmitting the representation of
the first signal section to the second location and wherein the correlating step is
performed at the second location.
41. A method according to claim 37 wherein the step of generating representations
of the first and second audio signal sections is performed at a single location.
42. A method according to claims 37 to 41 comprising the step of, if the output indicates
that there is a match, generating subsequent representations of signal sections of
the first and second audio signals.
43. A method according to claim 42 wherein the generating step includes generating a representation
of a subsequent section of the first audio signal beginning substantially at the point
where the audio content present in both the first and second signal sections was found
to occur.
44. A method according to claims 42 or 43 wherein the generating step includes generating
a representation of a subsequent section of the second audio signal substantially
beginning at the end of the section already obtained.
45. A method according to claims 42 to 44 wherein the generating step includes employing
a Fourier Transform.
46. A method according to claims 37 to 45, wherein the generating step comprises dividing
the first and second signal sections into a number of constituent signal frames, and
generating the representations of the first and second signal sections using the dominant
frequency of each frame.
47. A method according to claim 46 wherein the dividing step comprises dividing the first
and second signal sections into constituent frames such that the frames within a signal
section overlap with one or more adjacent frames in the signal section.
48. A method according to any of claims 37 to 47 comprising the step of indicating that
the signal sections do not contain the same audio content only after a predetermined
number of signal sections have been correlated.
49. A method for monitoring broadcast signals at a first location, wherein broadcast signals
for transmission at the first location are received, over a network, from a second
location;
the method comprising the steps of:
capturing a first audio signal from the broadcast signal transmitted at the first
location;
receiving from the second location a signal representation of a section of the broadcast
signal based on the frequencies contained within the broadcast signal;
generating a first representation of a signal section of the first audio signal based
on the frequencies present within the section; the first representation being longer
than the signal representation received from the second location;
correlating the first representation with that received from the second location at
different relative timing differences; and
providing an output, wherein the output indicates a match when the signal sections
of the first audio signal and of the broadcast signal received from the second location
contain substantially the same audio content.
50. A method according to claim 49 comprising the steps of, if the output indicates that
there is a match, generating a representation of a subsequent signal section from
the first audio signal, and receiving a signal representation of a subsequent signal
section from the second location.
51. A method according to claim 50 wherein in the receiving step, a subsequent representation
of the signal section beginning substantially at the end of the previous signal section
is received.
52. A method according to claims 50 or 51 wherein the generating step includes generating
the representation of the subsequent signal section of the first audio signal beginning
at the point in the first audio signal where the match was found to occur.
53. A method according to claims 49 to 52 wherein in the generating step a Fourier Transform
is employed in generating the representation of the signal sections.
54. A method according to claims 49 to 53, wherein the generating step comprises dividing
the first signal sections into a number of constituent signal frames, and generating
the representations of the first signal sections using the dominant frequency of each
frame.
55. A method according to claim 54 wherein the dividing step comprises dividing the first
signal section into constituent frames such that the frames within a signal section
overlap with one or more adjacent frames in the signal section.
56. A method according to any of claims 49 to 55 comprising the step of indicating that
the signal sections do not contain the same audio content only after a predetermined
number of signal sections have been correlated.
57. A computer software product for controlling a computer to determine the relative time
difference between first and second audio signals that represent substantially the
same audio content, the computer software product comprising a computer readable medium
having program code stored thereon which when executed on a computer causes the computer
to perform the steps of:
capturing first and second audio signals;
generating a representation of sections of each of the first and second signals, the
representation being based on the frequencies present within each section; the signal
section and representation of the first audio signal being longer than the signal
section and representation of the second audio signal;
correlating the representation of the first audio signal section with the representation
of the second signal section at different relative timing differences; and
providing an output, wherein the output indicates a match when the first and second
audio signal sections contain substantially the same audio content, and if there is
a match, indicates the timing difference between the points at which that audio content
occurs in both the first and second audio signal sections.
58. A computer software product for controlling a computer to detect whether first and
second audio signals represent substantially the same audio content, the computer
software product comprising a computer readable medium having program code stored
thereon which when executed on a computer causes the computer to perform the steps
of:
capturing first and second audio signals;
generating a representation of sections of each of the first and second signals, the
representation being based on the frequencies present within each section; the signal
section and representation of the first audio signal being longer than the signal
section and representation of the second audio signal;
correlating the representation of the first audio signal section with the representation
of the second signal section at different relative timing differences;
and providing an output, wherein the output indicates a match when the first and second
audio signal sections contain substantially the same audio content.