TECHNICAL FIELD OF THE INVENTION
[0001] This invention relates to sound synthesisers, and more specifically to sound synthesisers
used in devices where the computational resources are limited, such as in portable
devices.
BACKGROUND OF THE INVENTION
[0002] Modern sound synthesisers are required to have a large number of voices. The number
of voices a synthesiser has is defined as the number of sounds that can be generated
simultaneously.
[0003] There are several different protocols and standards that define how electronic sound
synthesisers reproduce a required set of sounds.
[0004] One popular way of generating sounds in electronic devices is by using the MIDI (Musical
Instrument Digital Interface) protocol. Unlike digital audio files (such as those
found on compact discs), a MIDI file does not contain details of specific sounds. Instead,
the MIDI file contains a list of events that a device must perform in order to recreate
the correct sound. Sampled sounds are stored in the synthesiser and are accessed according
to the instructions contained in the MIDI file. Therefore, MIDI files can be much
smaller than digital audio files and are suited to an environment where storage memory
is limited.
[0005] In the General MIDI System Level 1 (GM-1), a synthesiser is required to have at least
24 voices.
[0006] Synthesisers, such as MIDI synthesisers, that generate sounds from pre-recorded sounds
are known as wave-table based synthesisers. In such a synthesiser, one or several
pre-recorded sequences of a musical instrument will be stored in a wave-table. Each
sequence will contain a series of samples that are played in order to recreate the
sound.
[0007] Often, a musical instrument can generate a high number of notes, and since sampling
and recording every possible note would require a lot of memory, only a few notes
are stored.
[0008] Therefore, when a synthesiser is required to produce a note or sound that has a frequency
that is different to one stored in the memory, the synthesiser uses one of the stored
sequences and a technique known as 'sample rate conversion' to re-sample it and change
the frequency and obtain the requested tone.
[0009] Changing the frequency of the stored sequence is achieved by accessing the stored
samples at different rates. That is to say, for example, if the stored samples represent
a musical note at a frequency of 300Hz, accessing every sample in turn will reproduce
the musical note at 300Hz. If each stored sample is output twice before the next stored
sample is read out, the note reproduced by the synthesiser will have a frequency of
150Hz. Similarly, if a note of 600Hz is required then every other stored sample is
read out.
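By way of illustration only, the read-out scheme of the preceding paragraph may be sketched as follows. The function name `resample_nearest`, the wrap-around at the end of the sequence and the example data are assumptions made for this sketch and are not taken from any particular synthesiser.

```python
def resample_nearest(stored, phase_increment, n_out):
    """Read `stored` at a constant output rate, advancing the read position
    by `phase_increment` stored samples per output sample.

    No interpolation is performed: the stored sample at the integer part
    of the read position is output as it is.
    """
    out = []
    phase = 0.0
    for _ in range(n_out):
        # Assumption: the stored sequence is treated as a loop.
        out.append(stored[int(phase) % len(stored)])
        phase += phase_increment
    return out


stored = [0, 1, 2, 3, 4, 5, 6, 7]
same_pitch = resample_nearest(stored, 1.0, 4)    # every stored sample in turn
octave_down = resample_nearest(stored, 0.5, 4)   # each stored sample output twice
octave_up = resample_nearest(stored, 2.0, 4)     # every other stored sample
```

An increment of 0.5 corresponds to the 150Hz case above and an increment of 2 to the 600Hz case.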
[0010] It is important to note that the rate at which samples are output by the synthesiser
remains constant: one sample is output per sample period (the time between each stored sample).
[0011] In the example above, by accessing every sample twice, artefacts (distortions) will
be introduced into the output sound. To overcome these distortions, the synthesiser
computes additional samples based on the stored samples. Therefore, in the 150Hz example
above, instead of repeating each stored sample twice, the synthesiser will output
one stored sample and calculate the next sample on the basis of the surrounding stored
samples.
[0012] To do this, the synthesiser requires an interpolation technique.
[0013] The simplest interpolation technique uses a weighted average of the two surrounding
samples. However, this technique is often inaccurate and still results in audible
distortions.
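By way of illustration only, a weighted average of the two surrounding samples may be sketched as follows. The function name `linear_interpolate` and the handling of the last stored sample are assumptions made for this sketch.

```python
def linear_interpolate(stored, position):
    """Estimate the sample at a fractional `position` as a weighted
    average of the two stored samples on either side of it."""
    i = int(position)        # index of the stored sample before the position
    frac = position - i      # distance past that sample, 0 <= frac < 1
    # Assumption: the last stored sample is repeated at the end of the table.
    nxt = stored[min(i + 1, len(stored) - 1)]
    return (1.0 - frac) * stored[i] + frac * nxt


midpoint = linear_interpolate([0.0, 10.0, 20.0], 0.5)   # halfway between 0 and 10
```

The nearer stored sample receives the larger weight, and a position that coincides with a stored sample returns that sample unchanged.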
[0014] The optimum interpolation algorithm uses a sin(x)/x function and requires an infinite
number of calculations. Of course, this is impractical and therefore sub-optimum algorithms
have been developed.
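For reference, the ideal interpolation may be written in the standard form below, where x[n] are the stored samples and T is the sample period. The symbols are chosen for this illustration and do not appear elsewhere in this specification.

```latex
x(t) = \sum_{n=-\infty}^{\infty} x[n]\,
       \frac{\sin\!\big(\pi (t - nT)/T\big)}{\pi (t - nT)/T}
```

The sum runs over every stored sample, which is why an exact evaluation is impractical. A sub-optimum algorithm keeps only a finite number of terms around the required time t.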
[0015] One sub-optimum interpolation technique is described in Chapter 8 of "Applications
of DSP to Audio and Acoustics" by Dana C. Massie, where several stored samples are
used in the calculation (the number of samples used in the interpolation is known
as the interpolation degree). The larger the number of samples used in the interpolation,
the better the performance of the synthesiser.
[0016] In a synthesiser, each voice is implemented using one or several digital signal processors
(DSPs) and the computational power of the DSP system imposes a limit on the number
of voices that a synthesiser can produce, and also limits the interpolation degree
used for each voice.
[0017] When using a sub-optimum interpolation algorithm such as a truncated sin(x)/x algorithm,
the computational complexity grows linearly with the interpolation degree.
[0018] In many commercial synthesisers, an interpolation degree of 10 is often used as this
results in a good trade-off between computational complexity and sound quality.
SUMMARY OF THE INVENTION
[0019] It is desirable to be able to implement MIDI sound synthesisers in portable devices,
such as mobile phones, to allow the devices to produce polyphonic ring tones and higher
quality sounds.
[0020] However, the computational power available in a portable device (which is limited,
for example, by cost and available space in the device) is not sufficient to allow the
implementation of a sound synthesiser that conforms to the General MIDI System Level 1
(GM-1) (i.e. has at least 24 voices) and has an interpolation degree of around 10.
[0021] The present invention therefore seeks to provide a sound synthesiser that reduces
the computational requirements of a synthesiser with a high degree of polyphony, while
keeping audible artefacts to a minimum.
[0022] Therefore, according to a first aspect of the present invention there is provided
a synthesiser that comprises a memory, containing a plurality of stored samples; means
for calculating an output signal for each of a plurality of active voices, using a
plurality of samples selected from the stored samples for each of the active voices;
wherein the number of samples used for each active voice by the means for calculating
depends upon the number of active voices.
[0023] Preferably, each voice is only able to compute one output at a time.
[0024] Preferably, the number of samples used for each active voice by the means for calculating
decreases as the number of active voices increases.
[0025] Preferably, the number of samples used for each active voice by the means for calculating
decreases as the number of active voices increases so that a maximum computational
complexity is not exceeded.
[0026] Alternatively, the number of samples used for each active voice by the means for
calculating decreases non-linearly as the number of active voices increases.
[0027] Preferably, the plurality of samples stored in the memory comprise samples of musical
notes.
[0028] Preferably, the plurality of samples stored in the memory comprise samples of musical
notes produced by different musical instruments.
[0029] According to a second aspect of the present invention, there is provided a portable
device that comprises a music synthesiser as described above.
[0030] Preferably, the portable device is a mobile telephone.
[0031] Alternatively, the portable device is a pager.
[0032] It should be noted that the term "comprises/comprising" when used in this specification
is taken to specify the presence of stated features, integers, steps or components
but does not preclude the presence or addition of one or more other features, integers,
steps, components or groups thereof.
BRIEF DESCRIPTION OF THE DRAWINGS
[0033] For a better understanding of the present invention and to show how it may be carried
into effect, reference will now be made by way of example to the accompanying drawings,
in which:
Figure 1 shows a sound synthesiser in accordance with the invention.
Figure 2 shows a method performed by the controller of Figure 1 in accordance with
the invention.
Figure 3 shows a scheme for determining the interpolation degree based on the number
of active voices in accordance with the invention.
Figure 4 shows an alternative scheme for determining the interpolation degree based
on the number of active voices in accordance with the invention.
Figure 5 shows a voice of the synthesiser of Figure 1 in more detail.
Figure 6 shows a mobile phone with a music synthesiser in accordance with the invention.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
[0034] Figure 1 shows a music synthesiser in accordance with the invention. As is conventional,
the synthesiser comprises a controller 2, a plurality of voices 4, a wave-table memory
6, a filter table 8, a mixer 10 and a digital-to-analogue conversion module 12.
[0035] Although the synthesiser is hereinafter described as a wave-table based synthesiser
that uses the MIDI protocol, it will be appreciated that the invention is applicable
to any wave-table based synthesiser that is required to calculate a sample that lies
between two stored samples.
[0036] It should be noted that the term 'sample', as used herein, refers to a single audio
sample point.
[0037] The total number N of voices 4 in the synthesiser defines the maximum polyphony of
the system. As N increases, the polyphony increases allowing a greater number of sounds
to be produced simultaneously. For a MIDI synthesiser conforming to the General MIDI
System Level 1 (GM-1), the value of N will be at least 24. For clarity, only three
voices are shown in Figure 1.
[0038] The controller 2 receives data through an input 14. The data will comprise a stream
of MIDI information that relates to a piece of music or specific set of sounds. Each
MIDI file will contain a list of events that describe the specific steps that the
synthesiser must perform in order to generate the required sounds.
[0039] In the case of MIDI files stored within portable communication devices, a file may,
for example, relate to a short piece of music that can be used as a ring-tone.
[0040] The controller 2 processes the MIDI data stream and directs the appropriate parts
of the data to the relevant voices 4 so that the required sound can be synthesised.
For example, the required sound may consist of several different instruments playing
at once, and therefore each voice 4 will handle one monophonic instrument or one part
of a polyphonic instrument at a time. Often, the MIDI file will contain instructions
relating to the particular voices 4 that are to be used in synthesising the next output.
[0041] Depending upon the particular piece of music being reproduced, a different number
of voices may be in use at any one time.
[0042] Each voice 4 is connected to the controller 2, the mixer 10, the wave-table memory
6 and the filter table 8.
[0043] The wave-table memory 6 contains a number of sequences of digital samples. Each sequence
may, for example, represent a musical note for a particular musical instrument. Due
to restrictions on memory, only a few notes per instrument may be stored.
[0044] The filter table 8 contains a number of values of a filter. In a preferred embodiment,
the values represent a sinc function (where the sinc function is sin(x)/x).
[0045] Although not shown in Figure 1, both the wave-table memory 6 and the filter table
8 have a multiplexer that allows each table to be accessed more than once per sample
period (a sample period is defined as the inverse of the sampling rate, i.e. the rate
at which the original sound was sampled). Therefore, each of voices 1 to N can share
the same set of resources.
[0046] As is conventional, a voice 4, based upon the instructions received from the controller
2 and the interpolation degree of the system, produces the required output sample
16.
[0047] Often the sound to be produced by a particular voice 4 does not correspond in frequency
to one of the stored sequences of samples. Therefore, the voice 4 must 'shift' the
frequency of the stored sequence to produce a sound at the required frequency.
[0048] For example, if a stored sequence of samples represents a middle C note on a piano,
then this sequence can be shifted in frequency to obtain a C# note or D note.
[0049] The frequency of the required sound can be expressed as a multiple of the frequency
of the stored sequence. This multiple is written as a rational number M/L and is known
as the phase increment.
[0050] Therefore, if the required frequency is twice the frequency of the stored sequence,
then the phase increment will be equal to 2. If the required frequency is half the
frequency of the stored sequence then the phase increment will be equal to 1/2. In
the example where a C# note is required, the phase increment will be the twelfth root
of 2 (an irrational number) which can be approximated by a rational number.
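By way of illustration only, a rational approximation M/L of the phase increment may be obtained as sketched below. The function name `phase_increment`, the use of equal-tempered semitones and the limit of 1024 on the denominator L are assumptions made for this sketch.

```python
from fractions import Fraction


def phase_increment(semitones, max_denominator=1024):
    """Return a rational approximation M/L of the frequency ratio for a
    shift of `semitones` equal-tempered semitones (12 per octave)."""
    ratio = 2.0 ** (semitones / 12.0)
    return Fraction(ratio).limit_denominator(max_denominator)


octave_up = phase_increment(12)     # twice the stored frequency
octave_down = phase_increment(-12)  # half the stored frequency
c_sharp = phase_increment(1)        # middle C shifted up to C#
print(octave_up, octave_down, c_sharp, float(c_sharp))
```

A larger permitted denominator gives a closer approximation to the twelfth root of 2, at the cost of a wider phase register.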
[0051] Often, when the frequency of a stored sequence of samples is shifted, the required
samples are not stored in the memory. That is, the required sample falls between two
stored samples.
[0052] Therefore, the voice 4 retrieves a number of samples surrounding the required sample
from the wave-table memory 6 and an equal number of filter coefficients from the filter
table 8. Each sample retrieved from the wave-table memory 6 is then multiplied by
an appropriate filter coefficient from the filter table 8, and the products are combined
to produce the output 16 of the voice 4.
[0053] The coefficients of the filter table 8 are chosen so that, if the wave-table memory
6 does contain the required sample, then the other samples retrieved from the wave-table
memory 6 are multiplied by a zero filter coefficient and the stored sample is output.
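By way of illustration only, the property described above may be sketched using a truncated sinc function. In this sketch the coefficients are computed directly rather than read from a stored table, and the names `sinc`, `filter_coefficients` and `voice_output`, together with the placement of the extra sample for odd degrees, are assumptions made for illustration.

```python
import math


def sinc(u):
    """sin(pi*u)/(pi*u), with u measured in sample periods."""
    return 1.0 if u == 0.0 else math.sin(math.pi * u) / (math.pi * u)


def filter_coefficients(frac, degree):
    """Coefficients for `degree` stored samples around a required sample
    lying `frac` (0 <= frac < 1) sample periods after the stored sample
    at offset 0. Offsets run from -(before - 1) upwards."""
    before = (degree + 1) // 2   # assumption: the extra sample goes before
    offsets = range(1 - before, 1 - before + degree)
    return [sinc(frac - k) for k in offsets]


def voice_output(samples, coefficients):
    """Multiply each sample by its coefficient and add the products."""
    return sum(s * c for s, c in zip(samples, coefficients))


samples = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
coeffs = filter_coefficients(0.0, 8)   # required sample coincides with samples[3]
print(voice_output(samples, coeffs))
```

When `frac` is zero, the coefficient at offset 0 is one and the remaining coefficients vanish to within floating-point rounding, so the stored sample is output essentially unchanged.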
[0054] In a preferred embodiment where the filter table 8 contains values that are representative
of a sinc function, the period of the sinc function is twice the sample period.
[0055] Each output 16 of a voice 4 is sent to a mixer 10 where the outputs 16 of all active
voices 4 are combined into a combined output 18 and passed to the DAC module 12.
[0056] The DAC module 12 contains one or more digital-to-analogue converters that convert
the combined output 18 of the mixer 10 to an analogue signal 20.
[0057] Figure 2 shows a method performed by the controller 2 of Figure 1 in accordance with
the invention.
[0058] In step 101, the controller 2 analyses the MIDI data stream and determines the number
of voices 4 that will be active during the next sample period. That is, the controller
2 determines how many different voices 4 will be contributing outputs 16 to the mixer
10.
[0059] In step 103, the controller 2 determines the number of samples to be used by each
voice 4 in calculating the next output 16 (known as the interpolation degree I_D)
and instructs the voices 4 appropriately.
[0060] In step 105, each active voice 4 calculates an output 16 on the basis of instructions
received from the controller 2 using a number of stored samples in the calculation
equal to the interpolation degree I_D. Each active voice 4 will also use a number of
filter coefficients from the filter coefficient table 8 equal to the interpolation
degree I_D.
[0061] The process repeats for each output cycle, i.e. the process is repeated once every
sample period.
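By way of illustration only, one pass through steps 101, 103 and 105 may be sketched as follows. Each voice is modelled as a callable that takes the interpolation degree and returns one output sample. This model and the names `run_sample_period` and `choose_degree` are assumptions made for this sketch.

```python
def run_sample_period(active_voices, choose_degree):
    """Perform steps 101 to 105 for one sample period and return the
    mixed output of all active voices."""
    # Step 101: determine how many voices are active.
    n_active = len(active_voices)
    if n_active == 0:
        return 0.0   # assumption: silence when no voice is active
    # Step 103: determine the interpolation degree for this period.
    degree = choose_degree(n_active)
    # Step 105: each active voice calculates its output with that degree.
    outputs = [voice(degree) for voice in active_voices]
    # The mixer combines the outputs of the active voices.
    return sum(outputs)


# Hypothetical voices whose output merely reflects the degree they were given.
voices = [lambda degree: 1.0 * degree, lambda degree: 2.0 * degree]
mixed = run_sample_period(voices, lambda n_active: 11 - n_active)
```

The function would be called once every sample period, so a change in the number of active voices takes effect from the next output sample.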
[0062] In the embodiments of the invention described with reference to Figures 3 and 4,
the synthesiser has 24 voices 4 and has a maximum interpolation degree of 11.
[0063] Figure 3 is a table that shows a scheme for determining the interpolation degree
based on the number of active voices in accordance with the invention. Specifically,
for any given number of active voices, the table gives the interpolation degree to
be used.
[0064] For example, if the controller 2 determines that only one voice 4 will be active
during the next sample period, the controller 2 instructs the voice 4 to use an interpolation
degree of 11.
[0065] As the number of active voices 4 increases, the interpolation degree used in the
calculation of the outputs 16 decreases in a linear fashion.
[0066] If all 24 voices 4 of the synthesiser are active then the controller 2 determines
that an interpolation degree of 4 will be used.
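By way of illustration only, a linear scheme with the two end points given above (a degree of 11 for one active voice and a degree of 4 for 24 active voices) may be sketched as follows. The intermediate values produced by this sketch are an assumption obtained by rounding a straight line between those end points; they are not taken from Figure 3.

```python
MAX_DEGREE = 11   # degree when one voice is active (from the text)
MIN_DEGREE = 4    # degree when all voices are active (from the text)
NUM_VOICES = 24   # total number of voices (from the text)


def linear_degree(n_active):
    """Interpolation degree falling linearly from MAX_DEGREE to MIN_DEGREE
    as the number of active voices rises from 1 to NUM_VOICES."""
    if not 1 <= n_active <= NUM_VOICES:
        raise ValueError("n_active must be between 1 and NUM_VOICES")
    slope = (MAX_DEGREE - MIN_DEGREE) / (NUM_VOICES - 1)
    # Assumption: round the straight line to the nearest whole degree.
    return round(MAX_DEGREE - slope * (n_active - 1))


table = {n: linear_degree(n) for n in range(1, NUM_VOICES + 1)}
print(table)
```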
[0067] Alternatively, if a maximum computational complexity is defined for the synthesiser,
such as for a synthesiser used in a portable device, the interpolation degree may
be chosen such that the maximum computational complexity is not exceeded.
[0068] Figure 4 is another table that shows such a scheme. Again, the interpolation degree
decreases as the number of active voices 4 increases. However, the change is not linear.
Instead, the interpolation degree is calculated so that the maximum computational
complexity is not exceeded.
[0069] For example, if a synthesiser has 24 voices, a maximum interpolation degree of 11
and consumes 0.5 MIPS/degree/voice (Millions of Instructions Per Second/degree/voice)
then a conventional synthesiser may require up to 132 MIPS. This computational power
far exceeds that available in a typical current portable device such as a mobile terminal.
[0070] Using the scheme shown in Figure 4, the computational power will not exceed 50 MIPS.
This value is more appropriate for a portable device.
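By way of illustration only, the arithmetic above and one possible capped scheme may be sketched as follows. The rule used here (the largest degree that keeps the total load within the budget) is an assumption made for this sketch; the values it produces are not taken from Figure 4.

```python
MIPS_PER_DEGREE_PER_VOICE = 0.5   # from the text
MAX_DEGREE = 11                   # from the text
NUM_VOICES = 24                   # from the text
BUDGET_MIPS = 50.0                # from the text


def load_mips(n_active, degree):
    """Total computational load for `n_active` voices at a given degree."""
    return MIPS_PER_DEGREE_PER_VOICE * degree * n_active


def capped_degree(n_active):
    """Largest degree, up to MAX_DEGREE, that keeps the load within
    BUDGET_MIPS (assumed rule; at least one sample is always used)."""
    if n_active <= 0:
        return MAX_DEGREE
    affordable = int(BUDGET_MIPS / (MIPS_PER_DEGREE_PER_VOICE * n_active))
    return max(1, min(MAX_DEGREE, affordable))


worst_case = load_mips(NUM_VOICES, MAX_DEGREE)   # conventional synthesiser
for n in range(1, NUM_VOICES + 1):
    print(n, capped_degree(n), load_mips(n, capped_degree(n)))
```

Under this rule the degree stays at its maximum while few voices are active and then falls roughly in inverse proportion to the number of active voices, which is a non-linear decrease.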
[0071] The actual scheme used will be determined by the computational power available to
the synthesiser and the amount of computational power required to implement each degree
of interpolation.
[0072] Figure 5 shows a voice of Figure 1 in more detail. The voice 4 is shown with the
controller 2, wave-table memory 6 and filter table 8.
[0073] A processor 22 receives the instructions relevant to the voice 4 from the controller
2. The instructions will comprise the MIDI information relevant to the voice 4 and
an indication relating to the interpolation degree to be used in calculating the next
output 16.
[0074] The controller 2 may indicate to each voice 4 the actual interpolation degree that
is to be used in calculating the next output, or alternatively, the controller 2 may
indicate the number of active voices to each voice 4 and let the processor 22 determine
the appropriate interpolation degree.
[0075] The processor 22 is connected to a phase increment register 24, a counter 26 and
a filter coefficient selector 28.
[0076] The filter coefficient selector 28 is connected to the filter table 8 for retrieving
appropriate filter coefficients.
[0077] The filter coefficient selector 28 is also connected to the counter 26.
[0078] In accordance with the invention, the processor 22 informs the counter 26 and the
filter coefficient selector 28 of the interpolation degree that is to be used for
calculating the next output 16.
[0079] The processor 22 sets the value of the phase increment register 24 for producing
the required output 16. The value of the phase increment register 24 will be M/L,
where L and M are integers, and is determined by the processor 22 on the basis of the
instructions received from the controller 2.
[0080] The phase increment value is passed to an adder 30. The adder 30 is connected to
a phase register 32 that records the current phase. The output of the adder 30 comprises
an integer part and a fractional part.
[0081] Both the integer part and fractional part of the output of the phase register are
fed back to the adder 30.
[0082] The integer part of the output of phase register 32 is also passed to a second adder
34 where it is added to the output of the counter 26. The integer output of the adder
34 is connected to the wave-table memory 6 and determines a sample that is to be read
out.
[0083] The samples that are retrieved from the wave-table memory are passed to a multiply-accumulate
circuit 36.
[0084] In addition to being fed into the adder 30, the fractional part of the phase register
32 output is fed to the filter coefficient selector 28.
[0085] The output of the filter coefficient selector 28 is passed to the multiply-accumulate
circuit 36 where it is combined with the samples retrieved from the wave-table memory
6.
[0086] The operation of the voice 4 is now briefly described.
[0087] When the input of the phase register 32 is a non-integer value, i.e. the fractional
part is non-zero, the required sample lies between two tabulated samples. Therefore
the required sample must be calculated.
[0088] The adder 30 operates once per sample period to add the phase increment from the
phase increment register 24 to the current phase (provided by the phase register 32).
[0089] The integer part of the phase register 32 output indicates the wave-table memory
address that contains the stored sample immediately before the required sample. To
calculate the required sample, a number of samples equal to I_D are read out from the
wave-table memory 6.
[0090] The counter 26 increments by one each time to select I_D samples from around the
required sample. Therefore, when I_D is 8, four samples before the required sample are
read out along with four samples after the required sample. If I_D is 5, three samples
before the required sample are read out along with two samples
after the required sample. Alternatively, two samples before the required sample are
read out and three samples after the required sample. These samples are passed to
the multiply-accumulate circuit 36.
[0091] It should be noted that the counter operates from its initial value to its final
value once each sample period.
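By way of illustration only, the addressing described above may be sketched as follows. The class name `PhaseAccumulator`, the function name `window_indices` and the choice of placing the extra sample before the required sample when the degree is odd are assumptions made for this sketch.

```python
from fractions import Fraction


class PhaseAccumulator:
    """Models the adder and phase register: the phase increment M/L is
    added to the current phase once per sample period."""

    def __init__(self, increment):
        self.increment = Fraction(increment)
        self.phase = Fraction(0)

    def step(self):
        """Advance one sample period; return (integer part, fractional part)."""
        self.phase += self.increment
        integer = self.phase.numerator // self.phase.denominator
        return integer, self.phase - integer


def window_indices(integer_part, degree):
    """Wave-table addresses of the `degree` stored samples around the
    required sample, where `integer_part` addresses the stored sample
    immediately before it."""
    before = (degree + 1) // 2   # assumption for odd degrees
    first = integer_part - before + 1
    return list(range(first, first + degree))


acc = PhaseAccumulator(Fraction(3, 2))
integer, frac = acc.step()
print(integer, frac, window_indices(10, 8), window_indices(10, 5))
```

For a degree of 8 the window holds four addresses up to and including the integer part and four after it. For a degree of 5 it holds three and two respectively.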
[0092] The filter coefficient selector 28 obtains appropriate filter coefficients from the
filter table 8 depending upon the fractional part of the phase register output and
the interpolation degree.
The filter coefficient selector 28 is controlled by the counter 26 to obtain I_D
coefficients from the filter table 8.
[0093] Once the filter coefficients 44 have been obtained from the filter table 8, the input
received from the counter 26 is used to pass the filter coefficients to the multiply-accumulate
circuit 36. Here, the samples obtained from the wave-table memory 6 are multiplied
by the appropriate filter coefficients 44, and the products are added to obtain the
output 16 of the voice 4.
[0094] As the fractional part of the phase register 32 changes, the filter coefficients
obtained from the filter table 8 will change.
[0095] As the number of active voices 4 changes, the processor 22 will inform the counter
26 and the filter coefficient selector 28 of the required interpolation degree as appropriate.
[0096] Figure 6 shows a mobile phone with a music synthesiser in accordance with the invention.
Although the invention is described as being incorporated in a mobile phone, it will
be appreciated that the invention is applicable to any portable device, such as a personal
digital assistant (PDA), a pager or an electronic organiser, or to any other equipment in
which it is desirable to be able to reproduce high quality polyphonic sound.
[0097] As is conventional, the mobile phone 46 comprises an antenna 48, transceiver circuitry
50, a CPU 52, a memory 54 and a speaker 56.
[0098] The mobile phone 46 also comprises a MIDI synthesiser 58 in accordance with the invention.
The CPU 52 provides the MIDI synthesiser 58 with MIDI files. The MIDI files may be
stored in a memory 54, or may be downloaded from a network via the antenna 48 and
transceiver circuitry 50.
[0099] There is thus described a sound synthesiser that reduces the computational requirements
of a synthesiser with a high degree of polyphony, while ensuring that audible artefacts
are kept to a minimum.
1. A synthesiser comprising:
a memory, containing a plurality of stored samples;
means for calculating an output signal for each of a plurality of active voices, using
a plurality of samples selected from the stored samples for each of the active voices;
wherein the number of samples used for each active voice by the means for calculating
depends upon the number of active voices.
2. A synthesiser as claimed in claim 1, wherein the number of samples used for each active
voice by the means for calculating decreases as the number of active voices increases.
3. A synthesiser as claimed in claim 2, wherein the number of samples used for each active
voice by the means for calculating decreases as the number of active voices increases
so that a maximum computational complexity is not exceeded.
4. A synthesiser as claimed in claim 1, wherein the number of samples used for each active
voice by the means for calculating decreases non-linearly as the number of active
voices increases.
5. A synthesiser as claimed in one of claims 1 to 4 wherein the plurality of samples
stored in the memory comprise samples of musical notes.
6. A synthesiser as claimed in claim 5 wherein the plurality of samples stored in the
memory comprise samples of musical notes produced by different musical instruments.
7. A synthesiser as claimed in any preceding claim wherein the means for calculating
an output signal comprises a filter table.
8. A synthesiser as claimed in claim 7 wherein the filter table contains coefficients
of a sinc function.
9. A synthesiser as claimed in any preceding claim, wherein the synthesiser is a MIDI
music synthesiser.
10. A portable device, comprising a synthesiser as claimed in any preceding claim.
11. A portable device as claimed in claim 10 wherein the portable device is a mobile phone.
12. A portable device as claimed in claim 10 wherein the portable device is a pager.
13. A method of operating a synthesiser having a plurality of samples stored in a memory,
the method comprising the steps of:
determining the number of voices that will be active in producing a sound;
determining an interpolation degree on the basis of the number of voices that will
be active, wherein the interpolation degree is defined as the number of samples to
be selected from the plurality of samples stored in the memory; and
calculating an output for each active voice, using the number of said stored samples
determined by the interpolation degree.
14. A method as claimed in claim 13, wherein the interpolation degree decreases as the
number of active voices increases.
15. A method as claimed in claim 13, wherein the interpolation degree decreases as the
number of active voices increases so that a maximum computational complexity is not
exceeded.
16. A method as claimed in claim 13, wherein the interpolation degree decreases non-linearly
as the number of active voices increases.