BACKGROUND OF THE INVENTION
FIELD OF THE INVENTION
The present invention relates to a multi-processor system provided
with a facility for allowing synchronous communications between processors arranged
in a master and slave relationship.
DESCRIPTION OF THE PRIOR ART
[0001] For the purpose of accomplishing scientific computations or calculations at an increased
speed, there has been developed a high-speed processor for executing at a high speed
the arithmetic operations for those arrays which occur at a high frequency in the
scientific calculation. The system for processing the arithmetic operations for the
arrays at a high speed may be generally classified into two categories, i.e., a vector
processor designed for processing one-dimensional vectors through pipeline at a high
speed and a parallel processing system including a plurality of processors arranged
in parallel with one another for executing processings in parallel. Although the application
of the present invention is not restricted to the vector processor or the parallel
processor, it seems convenient to elucidate the problems of the hitherto known systems
in conjunction with the vector processor for facilitating the understanding of the
underlying concept of the present invention.
[0002] The vector processor includes a vector processing mechanism for processing through
pipeline at a high speed a series of array data (vector data) ordered in a sequence.
However, it is not possible to process all the vector data with a single program.
There exist those data which can not but be processed through sequential processing
(referred to as the scalar processing) as in the case of conventional general purpose
computer. Under the circumstances, the vector processor includes in addition to the
vector processing mechanism for pipeline-processing of the vector data at a high speed
a scalar processing mechanism for realizing the function analogous to that of the
hitherto known general purpose computer. Concerning the relationship to be established
between the vector processing mechanism and the scalar processing mechanism incorporated
in the vector processor, several approaches may be conceived. In many vector processors,
however, the vector processing mechanism is physically separated from the scalar processing
mechanism.
[0003] As an example of the processor incorporating the vector processing mechanism and
the scalar processing mechanism described above, there can be mentioned a processor
disclosed in GB-A- 2 113 878. The vector processor disclosed in this publication is
composed of a scalar processing unit corresponding to the aforementioned scalar processing
mechanism and a vector processing unit corresponding to the vector processing mechanism
mentioned above.
[0004] More specifically, in the case of the processor system disclosed in GB-A- 2 113 878,
the vector processor is activated only after a previous or preparatory setting procedure
such as loading of address data required for the vector processing in registers incorporated
in the vector processor has been executed by the scalar processor. Upon completion
of the vector processing, the vector processor informs the scalar processor of the
completion of vector processing by issuing an interrupt to the scalar processor or
by taking advantage of the test performed by the scalar processor. On the other hand,
the scalar processor executes predetermined scalar processing by utilizing the results
of the vector processing. In this manner, in the case of this known system, all the
data required for the vector processing are placed in the vector processor before
activation of the latter. It is however noted that an individual vector instruction commanding
the vector processing does not necessarily require all the data to be supplied from the scalar
processor. Thus, execution of a vector instruction which requires only a part of
the data supplied from the scalar processor involves a problem of wasteful loss of
time (dead time), because the execution of such vector instruction is allowed only
after all the data have been set.
[0005] As described above, the scalar processor can perform the scalar processing after
completion of the vector processing in the vector processor by utilizing the results
of the vector processing. In this connection, it is also noted that an individual scalar
instruction commanding the scalar processing does not necessarily require all the results of
the vector processing. In other words, execution of a scalar instruction which
requires only a part of the results of the vector processing has to wait for completed
execution of all the vector processings, which in turn means that wasteful loss of
time is involved, giving rise to an additional problem.
[0006] EP-A- 0 042 442 describes an information processing system comprising a main
storage and a data processor. The data processor contains an instruction controller
and a plurality of arithmetic units; the instruction controller receives
the instructions from the main storage and distributes them to the respective arithmetic
units for parallel execution.
[0007] A synchronization instruction (WAIT) is inserted, in advance, into the sequential
instruction stream supplied from the main storage, such that the instructions which
follow said synchronization instruction (WAIT) are
not executed until the execution of the preceding instructions has been completed.
SUMMARY OF THE INVENTION
[0008] It is therefore an object of the present invention to provide a multi-processor system
in which individual processors are imparted with capability of performing parallel
or multiple processings with improved efficiency by providing each processor with
means for accomplishing fine synchronous communication control among a number of the
processors.
[0009] In a multi-processor system including a master processor and at least one slave processor
which requires for executing the processing operation thereof data to be made available
by the master processor, when the slave processor starts the execution of an instruction
which requires only a part of data available from the master processor in response
to the setting of that part of data in the slave processor, erroneous operation will
take place if an instruction which is to be executed in succession to the above mentioned
instruction and which requires the setting of other data than the above mentioned
partial data is started before the setting of the other data. For excluding such erroneous
operation, it is necessary to establish, so to say, a synchronism or synchronization
between the master processor and the slave processor in such a manner that only after
completed execution of a certain processing, e.g., a particular instruction in the
slave processor, the processing by the slave processor is interrupted until a certain
processing by the master processor, e.g., loading of data required for execution of
a next instruction in the slave processor has been executed, and then the processing
in the slave processor is allowed to be restarted. However, the processing steps which
require such synchronization will differ from one program to another. Accordingly,
the synchronization has to be established on the instruction basis (i.e. instruction
by instruction).
[0010] In the case of the multi-processor system according to the present invention as claimed
in claim 1, the slave processor is imparted with a function capable of executing such
an instruction that upon execution of an instruction for stopping temporarily the
processing, a stop or pause indication is produced in the slave processor, whereby
activation of the slave processor for executing a next instruction is inhibited until
the stop or pause indication is reset by the master processor. On the other hand,
the master processor is imparted with a function or capability to execute an
instruction which checks whether the stop or pause indication is issued
in the slave processor and resets the stop indication when it is issued, while generation
of the stop indication is awaited if it is not issued. With this arrangement, synchronization
can be established between the master processor and the slave processor on the instruction
basis. More specifically, it is assumed, by way of example, that the master processor
activates the slave processor by setting only the data that are required for executing
a certain instruction by the slave processor, which in its turn stops the processing
after execution of the aforementioned instruction. The master processor can reset
the stop indication issued by the slave processor after having set the data required
for execution of a next instruction to be executed by the slave processor, which can
then be activated for executing the next instruction in response. In this way, the
time taken for processing can be reduced significantly. It is again assumed that the
master processor serves as a scalar processor with the vector processor serving as
the slave processor. In that case, even if many processing steps (e.g. 100 steps)
are involved in the processing for setting data in the vector processor by the scalar
processor, significant reduction in the time taken for the processing can be accomplished
by virtue of such arrangement that the vector processor is allowed to start the processing
without waiting for the setting of all data from the scalar processor.
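By way of illustration only, the reduction in processing time described above can be sketched with a simple timing model. The sketch assumes one time unit per data-setting step of the master processor and ignores the cost of the synchronization instructions themselves. The function names and all step counts are hypothetical and do not appear in this specification.

```python
def finish_time_preset(needs, costs):
    """All data are set before the slave is activated (hitherto known scheme)."""
    return max(needs) + sum(costs)


def finish_time_synchronized(needs, costs):
    """Each slave instruction starts as soon as its own data have been set."""
    t = 0
    for need, cost in zip(needs, costs):
        # Wait for the previous instruction and for the required setting step.
        t = max(t, need) + cost
    return t


# needs[i]: number of master setting steps that must precede slave instruction i
# costs[i]: execution time of slave instruction i (hypothetical values)
needs = [2, 4, 6, 8, 10]
costs = [5, 5, 5, 5, 5]

preset = finish_time_preset(needs, costs)
synchronized = finish_time_synchronized(needs, costs)
```

With these assumed figures the synchronized scheme finishes earlier because the slave processor overlaps its execution with the remaining setting steps of the master processor.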
[0011] In order to make it possible for the master processor to perform a processing by
utilizing the interim result of operation executed by the slave processor on the way
of the execution, while assuring that the slave processor can continue the operation
by using the data initially set by the master processor to thereby complete the arithmetic
operation with high speed or within a short time, it is necessary that the slave processor
informs the master processor of execution (partial completion) of the processing performed
to a particular step while continuing the processing under execution so that the master
processor can utilize as early as possible the result of the processing executed subsequently
by the slave processor in continuation. Also in this case, since the particular processing
step at which the master processor requires the interim result of the arithmetic operation
performed by the slave processor differs from one to another program, it is necessary
for the slave processor to inform the master processor of the completion of execution
of instruction on the instruction basis.
[0012] In the multi-processor system according to the present invention as claimed in claim
3, the slave processor is imparted with such a function to execute an instruction
for indicating completion of execution of a succeeding instruction by discriminatively
detecting the completed execution of the succeeding one, while the master processor
is imparted with a function or capability to execute an instruction for checking whether
the completed execution is indicated by the slave processor to thereby reset the indication
of completed execution if it is issued and otherwise wait for generation of the indication
of completed execution.
[0013] When the slave processor is constituted by the vector processor, the arithmetic unit,
vector register and others participating in the execution of instructions may differ
from one to another instruction. To deal with such situation, the slave processor
may be provided with means for storing the decoded result of a succeeding instruction
(i.e. information associated with the execution of that instruction) in response to
an instruction for identifying the completion of execution of that succeeding instruction,
wherein the decoded result is comparatively collated with information produced by
the slave processor as the result of completion of execution of that instruction,
to thereby identify the completion of execution of the instruction.
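The collation described above may be pictured, purely as a sketch, by the following model. It assumes that the decoded result can be represented as a pair naming an arithmetic unit and a vector register. The class name, method names and the pair representation are assumptions made for illustration and are not taken from the embodiment.

```python
class CompletionWatch:
    """Toy model of completion detection by collating stored decode information."""

    def __init__(self):
        self.stored = None      # decoded result of the instruction being watched
        self.indicated = False  # indication of completed execution

    def arm(self, decoded):
        """Store the decoded result of the instruction whose completion is awaited."""
        self.stored = decoded

    def on_completion(self, info):
        """Collate completion information with the stored decoded result."""
        if self.stored is not None and info == self.stored:
            self.indicated = True
            self.stored = None


watch = CompletionWatch()
watch.arm(("adder", "VR4"))                  # hypothetical (unit, register) pair
watch.on_completion(("multiplier", "VR2"))   # a different instruction finishes
before = watch.indicated
watch.on_completion(("adder", "VR4"))        # the watched instruction finishes
after = watch.indicated
```

Only the completion that matches the stored decoded result raises the indication, so instructions using other arithmetic units or registers do not trigger it.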
[0014] With the arrangements described above, the slave processor can perform arithmetic
operation under the command of the master processor and issue indication of completion
of execution of a particular instruction, as occasion requires, on the way of the
operation, while the master processor can detect discriminatively completion of execution
of a particular instruction performed by the slave processor to thereby carry out
arithmetic operation by utilizing the results of the arithmetic operation performed
by the slave processor up to that time point, whereby the overall processing time
can be shortened significantly. When the master processor is the scalar processor
with the slave processor being the vector processor, the scalar processor can perform
processing by utilizing the interim result available from the vector processor in
the course of an operation in which the execution of a vector instruction takes
a relatively long time. Thus, significant reduction in the processing time can be
attained.
[0015] Implementation of the aforementioned functions in the master processor and the slave
processor can be realized by addition and modification of logic circuits of the conventional
processor on a relatively small scale. The synchronization control according to the
invention scarcely exerts any serious influence on the instruction sequence adopted
heretofore. Accordingly, the burden to be borne by language compilers and others due to
application of the present invention can be very small.
[0016] In summary, in a system including a plurality of processors which are interconnected
in master and slave relation, the slave processor can stop the instruction activation
processing at any given time point while the master processor can clear or remove
the stop or pause. Further, indication of the occurrence of event in the slave processor
can be made on the instruction basis. On the other hand, the master processor can
stop temporarily the processing until the occurrence of event is indicated. Besides,
fine synchronization control can be effectuated between the master processor and the
slave processor on the instruction basis. These are main advantages attendant on the
present invention.
BRIEF DESCRIPTION OF THE DRAWINGS
[0017] These and other objects and advantages of the present invention will become more
apparent upon reading the following detailed description taken in conjunction with
the drawings, in which:
Fig. 1 is a view showing a general arrangement of a vector processor;
Fig. 2 is a view for illustrating synchronous communication means in a hitherto known
multi-processor system;
Fig. 3 is a view showing a FORTRAN program used in conjunction with description of
a multi-processor system according to an exemplary embodiment of the invention;
Figs. 4a and 4b are views for illustrating scalar object codes and vector object codes
corresponding to the FORTRAN program shown in Fig. 3 employed in the hitherto known
multi-processor system;
Figs. 5a and 5b are views for illustrating, respectively, scalar object codes and
vector object codes corresponding to the FORTRAN program shown in Fig. 3 to be employed
in the multi-processor system according to the invention;
Fig. 6 is a view showing a time chart for illustrating execution of the object codes
shown in Figs. 4a and 4b;
Fig. 7 is a view showing a time chart for illustrating execution of the object codes
shown in Figs. 5a and 5b; and
Fig. 8 is a view showing a circuit for carrying out the synchronization control according
to the invention.
DESCRIPTION OF THE PREFERRED EMBODIMENTS
[0018] Before entering into detailed description of an exemplary embodiment of the present
invention, an arrangement of a hitherto known vector processor will be considered.
[0019] Fig. 1 is a view showing an arrangement of a vector processor such as disclosed in
GB-A-2 113 878. In this figure, there are shown those portions of the vector processor
which are relevant to the invention. It should further be added that the general arrangement
of the vector processor to which the invention can be applied is substantially similar
to that shown in Fig. 1 and differs from the latter in the respect that a circuit
described later on by reference to Fig. 8 is additionally incorporated. Now, referring
to Fig. 1, a reference numeral 1 denotes a main storage, 2 denotes a main storage
controller, 3 denotes a scalar processing unit. A numeral 31 denotes a cache or a
high-speed buffer memory for storing a map of a segment of the main storage. A numeral
32 denotes a group of registers which may include, for example, sixteen general purpose
registers and sixteen floating point registers. A numeral 33 denotes a group of functional
units for performing operations in the scalar processing unit. A numeral 34 denotes
a scalar instruction controller for reading and decoding scalar instructions and controlling
the execution thereof, the scalar instructions corresponding to those employed in the hitherto
known general purpose computer. A numeral 41 denotes a group of registers incorporated
in the vector processing unit which may include, for example, a group of vector registers
and a group of scalar registers. The group of vector registers may include, for example,
thirty-two vector registers each of which may hold vector data consisting of 256 elements,
by way of example. The scalar register group may include, for example, thirty-two
scalar registers each of which is destined to hold scalar data as in the case of the
general purpose register and the floating point register incorporated in the scalar
processing unit. A reference numeral 42 denotes a group of vector arithmetic units
for processing by pipeline the data read out from the vector register or the scalar
register, the results of the processing being stored in the vector register or scalar
register. As the vector operation units, there can be mentioned adders and multipliers.
A numeral 43 denotes a group of vector address registers used for indicating location
of vector data in the main storage when the vector processing unit 4 reads or writes
the vector data from or to the main storage 1. The vector address register is composed
of a vector base register (VBR) used for holding the base address of the vector data
and a vector increment register (VIR) for holding inter-element space of the vector
data. A numeral 44 denotes a vector instruction execution controller for reading and
decoding vector instructions and controlling the execution thereof.
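The role of the vector base register (VBR) and the vector increment register (VIR) in locating vector data can be illustrated by the following minimal sketch. It assumes that the address of the i-th element is the base address plus i times the inter-element space. The function name and the numerical values are hypothetical.

```python
def element_addresses(vbr, vir, n):
    """Return the main-storage addresses of n vector elements.

    vbr: base address held in the vector base register (VBR)
    vir: inter-element space held in the vector increment register (VIR)
    """
    return [vbr + i * vir for i in range(n)]


# Four elements starting at address 0x1000 with a spacing of 8 (assumed values).
addrs = element_addresses(0x1000, 8, 4)
```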
[0020] Next, description will be made concerning operations of the scalar processing unit
and the vector processing unit upon execution of a program.
[0021] For performing the vector processing, preprocessing such as previous loading of such
data or values to the vector address registers which are used when the vector data
are read out from the main storage is required. In the hitherto known vector processor
shown in Fig. 1, the vector processing is executed in accordance with the procedure
described below.
PROCEDURE 1
[0022] In precedence to the start of the vector processing, predetermined values requisite
for executing the vector processing are loaded in the vector address registers and
the scalar registers in the scalar processing unit.
PROCEDURE 2
[0023] Information concerning the base addresses of the main storage where the vector instruction
string is stored, the number of elements of the vector data to be processed and the
like is sent to the vector processing unit from the scalar processing unit to thereby
activate the vector processing unit.
PROCEDURE 3
[0024] The activated vector processing unit reads out and executes the vector instructions
sequentially in accordance with information sent from the scalar processing unit to
perform the vector processing.
PROCEDURE 4
[0025] After the vector processing unit has been activated, the scalar processing unit can perform
independently other scalar processing such as, for example, preparation for the succeeding
vector processing in parallel with execution of the vector processing by the vector
processing unit.
PROCEDURE 5
[0026] Completion or end of execution of the vector processing in the vector processing
unit is dealt with by testing the status of the vector processing unit by the scalar
processing unit or by issuing an interrupt to the scalar processing unit from the
vector processing unit.
[0027] As will be appreciated from the above, the relation between the scalar processing
unit and the vector processing unit is such that the former is a master with the latter
being a slave, wherein the processing proceeds in such a manner that the
vector processing unit executes the vector processing under the command issued by
the scalar processing unit.
[0028] Fig. 2 shows instructions prepared for allowing synchronous communication between
the scalar processing unit and the vector processing unit in the hitherto known processor
shown in Fig. 1. All of these instructions are decoded and executed by the scalar
processing unit serving as the master processing unit.
[0029] Next, taking as the example a simple FORTRAN-program processing, description will
be made in what manner the synchronous communication is carried out between the scalar
processing unit and the vector processing unit for proceeding with execution of the
processings, while making clear the problems as involved.
[0030] Fig. 3 is a view showing an example of the FORTRAN program, in which a DO-loop including
statements indicated by the statement identifying numbers 2 to 6 is processed in the
vector processing unit, while the other statements are processed in the scalar processing
unit.
[0031] Figs. 4a and 4b are views showing object programs corresponding to the FORTRAN program
shown in Fig. 3. The object programs include scalar object codes to be executed by
the scalar processing unit and vector object codes to be executed by the vector processing
unit, the scalar object codes being shown in Fig. 4a while the vector object codes
are shown in Fig. 4b. In the scalar object codes shown in Fig. 4a, the eleven scalar instructions
labelled with the IDs (identifications) S1 to S11, respectively, are for the preparation
processing which is executed in precedence to the vector processing. Among them, ten
instructions S2 to S11 are used for loading the address information for arrays A,
B, C, P and Q in the program shown in Fig. 3 in the vector base register (VBR) and
the vector increment register (VIR) incorporated in the vector processing unit. The
instruction S1 is used for placing the initial value 0.0 of a variable S contained
in the program shown in Fig. 3 to the scalar register provided in the vector processing
unit. The scalar instruction designated by the ID S12 (hereinafter expressed in the form
such as ID-S12) is used for activating the vector processing unit by informing the latter
of the addresses of the main storage where the vector objects shown in Fig. 4b are
stored (detailed description in this respect is omitted). In response thereto, the
vector processing unit executes sequentially the instructions shown in Fig. 4b as
vector objects. The scalar instruction ID-S13 is used for testing whether the vector
processing unit is in the operating state or in the idle state, the result of this
test being reflected to the condition code. (This instruction is referred to as the
vector processor test instruction.) When the vector processing unit is in the operating
state, this means that execution of the activated vector processing is not completed
yet. In this case, a BC instruction (branch-on-condition instruction) designated by
S14 is activated to be looped to the instruction ID-S13, whereby the completion of
the vector processing is waited for. At the end of the processing performed by the
vector processing unit, the result of the summing operation (the variable S in the
program shown in Fig. 3) placed in the scalar register at the zeroth address in the
vector processing unit is transferred to the floating point register at the zeroth
address in the scalar processing unit to be utilized in the succeeding operation (refer
to the statement identified by the number 7 in the program shown in Fig. 3).
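The waiting loop formed by the instructions ID-S13 and ID-S14 can be sketched as follows. This is a toy model only: the condition-code values (1 for the operating state, 0 for the idle state), the class and function names, and the number of busy cycles are assumptions made for illustration.

```python
class VectorUnitModel:
    """Toy vector processing unit that stays busy for a fixed number of cycles."""

    def __init__(self, busy_cycles):
        self.remaining = busy_cycles

    def tick(self):
        """Advance the unit by one cycle."""
        if self.remaining > 0:
            self.remaining -= 1

    def test(self):
        """Assumed condition code: 1 = operating state, 0 = idle state."""
        return 1 if self.remaining > 0 else 0


def wait_until_idle(vp):
    """Repeat the test (ID-S13) while the branch (ID-S14) loops back."""
    polls = 0
    while vp.test() == 1:   # branch back to the test while the unit operates
        vp.tick()           # one cycle elapses in the vector unit
        polls += 1
    return polls


polls = wait_until_idle(VectorUnitModel(busy_cycles=5))
```

The scalar side learns nothing until the whole vector processing has ended, which is the limitation addressed by the invention.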
[0032] The processing performed with the aid of the synchronous communication effected
between the scalar processing unit and the vector processing unit as described above
suffers from the problems mentioned below.
(1) The address information for all the array data used in the vector processing has
to be loaded in the address registers incorporated in the vector processing unit in
precedence to the start of execution of the vector processing.
However, in order to execute the vector load instruction corresponding to the instruction
V1 in the vector object codes shown in Fig. 4b, it is sufficient that the processing
of two instructions S2 and S3 in the scalar objects has been completed. It is unnecessary
to load completely all the address information.
Since the scalar processing unit and the vector processing unit can be operated
in parallel, it will be obvious that the processing can be carried out with high efficiency,
if the scalar instruction for loading the address information can be processed in
a proper synchronism with the vector instruction which utilizes the address information.
(2) When the result of computation performed by the vector processing unit is to be
referred to by the scalar processing unit, it is necessary for the scalar processing
unit to check whether the result has been loaded. However, the scalar processing unit
is capable of carrying out only the check as to whether the vector processing unit
is operating or in the idle state. Accordingly, in the case of the hitherto known
system shown in Figs. 4a and 4b, even when the result of the summing operation for
the array A has been determined through execution of the vector instruction V4, the
scalar processing unit is not allowed to refer to the result so long as all the vector
instructions V5 to V8 have not been completely executed.
[0033] As will be appreciated from the above elucidation, the synchronous communication means
effective between the scalar processing unit and the vector processing unit shown
in Fig. 2 can neither establish the synchronization nor perform the communication
between both the units during the period from the start of the vector processing to
the complete end thereof.
[0034] With the present invention, it is contemplated to provide means for controlling finely
the synchronization and communication among a plurality of processors bearing the
master and slave relation to one another, to thereby realize the parallel processings
with enhanced efficiency.
[0035] The hitherto known synchronizing means is so arranged that the master processor activates
the processing in the slave processor and checks whether execution of the activated
processing has been completed or not in the slave processor. In contrast, according
to the present invention, there can be realized the function for temporarily stopping
the activation of a new instruction on the side of the slave processor until a certain
event occurs in the master processor, the function of issuing an indication of occurrence
of a certain event in the slave processor, and the function of testing the indication
by the master processor. Further by providing the indication of activation of instruction
being temporarily stopped in the slave processor as well as indication of the occurrence
of event in the slave processor in program status words (PSW) on the side of the slave
processor, synchronizing means can be implemented in a convenient manner. Besides,
at the time of the task switching, the synchronization control information is also
restored by recovering the PSW from the saved state.
[0036] Now, the invention will be described in detail in conjunction with exemplary embodiments
thereof. The processor system referred to in the following description of the embodiment
is the same as the one shown in Fig. 1. In the processor system shown in Fig. 1, the scalar
processing unit corresponds to the master processor, and the vector processing unit
corresponds to the slave processor.
[0037] According to the illustrated embodiment, two information bits are added to a vector
processing program status word (hereinafter referred to as VPPSW in abbreviation),
two instructions are added to those dealt with by the scalar processing unit, and
two instructions are added to those executed by the vector processing unit.
Accordingly, the VPPSW and the added instructions will be briefly
explained, followed by the description of an example of the synchronization
control with the aid of the status word and the instructions.
[0038] The program status word or PSW is used in most of the conventional processors for
holding concentratedly the important information concerning the operation state of
the processor, the address of the succeeding instruction and others. In this connection,
it is noted that in many instances, the PSW usually includes unused bits. Accordingly,
in many cases, these unused bits may be used for the two bits which are added according
to the teaching of the invention. In the case of the vector processing unit shown
in Fig. 1, the PSW is present. The PSW for the vector processing unit is referred
to as VPPSW. Since the details of the format for the VPPSW bear no direct relation
to the present invention, description thereof will be unnecessary. According to the
teaching of the invention, the VPPSW is added with two bits mentioned below.
(1) Pause Bit (referred to as P-bit in abbreviation)
When this bit is "1", initiation of the processing of a new instruction is temporarily
stopped. When this bit becomes "0", the temporary stop or pause state is cleared.
It should be mentioned that this bit has only the function to inhibit temporarily
the start of execution of a new instruction and exerts no influence on any instruction
of which execution has already been started or which is being executed.
(2) Signal Bit (referred to as S-bit in abbreviation)
This bit assumes "1" when the processing of an instruction has been completed in
the designated vector processing unit. Usage of this bit will be described later on.
[0039] Next, two instructions additionally employed in the vector processing unit will be
elucidated below.
(1) VP Pause Instruction (referred to as VPPAU instruction)
When this instruction is executed, the P-bit of the VPPSW (vector processing program
status word) assumes "1", whereupon the processing for issuing the instruction succeeding
to this VPPAU instruction is stopped temporarily.
(2) Vector Signal Instruction (referred to as VSIG instruction)
Upon completion of the processing of an instruction succeeding to this VSIG instruction,
the S-bit of the VPPSW assumes "1".
[0040] Next, two instructions additionally employed in the scalar processing unit will be
described.
(1) Resume Vector Processing Instruction (referred to as RSMVP)
This instruction is for testing the value of the P-bit of the VPPSW. When the value
of the P-bit is "1", the latter is reset to "0", whereupon activation of instructions
in the vector processing unit is released from the pause state (temporary stop state).
When the value of the P-bit is "0", the releasing of the vector processing unit from the
pause is deferred until the P-bit assumes "1". Processing of this instruction is
terminated by resetting the P-bit to "0" from "1".
(2) Test and Reset S-Bit Instruction (referred to as TRB)
With this instruction, the S-bit of the VPPSW is tested. When the value of the S-bit
is "1", the processing of this TRB instruction is terminated by resetting the S-bit
to "0". When the value of the S-bit is "0", execution of this TRB instruction is held
until the S-bit assumes the value "1". Then, by resetting the S-bit to "0", execution
of this instruction is terminated.
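The interplay of the P-bit and the S-bit with the VPPAU, VSIG, RSMVP and TRB instructions can be sketched in software as follows. This is a minimal model under stated assumptions and not the circuit of the embodiment: the two processing units are represented by threads, the bits by fields guarded by a condition variable, and the class name, method names and data values are all hypothetical.

```python
import threading


class VPPSWModel:
    """Toy model of the two synchronization bits added to the VPPSW."""

    def __init__(self):
        self.p_bit = 0
        self.s_bit = 0
        self._cond = threading.Condition()

    # Vector processing unit (slave) side
    def vppau(self):
        """Set the P-bit and hold the next instruction until it is reset."""
        with self._cond:
            self.p_bit = 1
            self._cond.notify_all()
            self._cond.wait_for(lambda: self.p_bit == 0)

    def vsig_complete(self):
        """Effect of VSIG: set the S-bit when the following instruction completes."""
        with self._cond:
            self.s_bit = 1
            self._cond.notify_all()

    # Scalar processing unit (master) side
    def rsmvp(self):
        """Wait until the P-bit is 1, then reset it to release the pause."""
        with self._cond:
            self._cond.wait_for(lambda: self.p_bit == 1)
            self.p_bit = 0
            self._cond.notify_all()

    def trb(self):
        """Wait until the S-bit is 1, then reset it."""
        with self._cond:
            self._cond.wait_for(lambda: self.s_bit == 1)
            self.s_bit = 0


psw = VPPSWModel()
data = {"a": 1}      # data set before the vector unit is activated
result = {}


def vector_unit():
    first = data["a"]                    # needs only the data already set
    psw.vppau()                          # pause until the master sets more data
    result["sum"] = first + data["b"]    # instruction preceded by VSIG
    psw.vsig_complete()


worker = threading.Thread(target=vector_unit)
worker.start()
data["b"] = 2        # master sets the data for the next vector instruction
psw.rsmvp()          # release the pause once the vector unit has paused
psw.trb()            # wait for the signalled completion
total = result["sum"]
worker.join()
```

In this sketch the master reads the interim result immediately after TRB returns, without waiting for the slave thread to finish, and both bits are back at "0" afterwards.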
[0041] Next, description will be made in what manner the FORTRAN program illustrated in
Fig. 3 is processed by making use of the synchronization means described above.
[0042] Figs. 5a and 5b illustrate object codes used in connection with the FORTRAN program
shown in Fig. 3 when the synchronization means described above is employed according
to the invention. The object codes include the scalar object codes and the vector
object code as in the case of the hitherto known object code system illustrated in
Figs. 4a and 4b. As will be seen from comparison of Figs. 4a and 4b with Figs. 5a
and 5b, the object codes employed in association with the synchronization control
means according to the invention bear close resemblance to those known heretofore.
In particular, the instructions bearing the same scalar instruction ID or the same
vector instruction ID at the left of the individual instructions are identical.
In the case of the scalar object codes illustrated in Fig. 5a, the RSMVP instruction
for the scalar instruction ID-S101, the RSMVP instruction for the scalar instruction
ID-S103 and the TRS instruction for the scalar instruction ID-S102 are added, while
the TVP instruction and the BC instruction for the scalar instructions ID-S13 and
ID-S14, respectively, are deleted. On the other hand, in the case of the vector object
codes shown in Fig. 5b, the VPPAU instruction for the vector instruction ID-V101,
the VSIG instruction for the vector instruction ID-V102 and the VPPAU instruction
for the vector instruction ID-V103 are added. In the following, functions realized
by these added instructions will be described in detail.
(1) EXVP instruction for the scalar instruction ID-S12
This instruction is not issued after completion of all the vector processing preparations
such as data loading in the address registers and others, but issued at the time point
when the setting of the address information for the array B has been completed and
when execution of the instruction VL for the vector instruction ID-V1 can be initiated.
(2) RSMVP instruction for the scalar instruction ID-S101
This instruction commands clearing of the temporary pause of instruction activation
in the vector processing unit at the time point when the setting of address information
for the array C has been completed in succession to the activation of vector processing
in response to the EXVP instruction for the scalar instruction ID-S12. In the case
of the vector object codes, the VPPAU instruction for the vector instruction ID-V101
is issued in precedence to the VL instruction for the vector instruction ID-V2 which
uses the address information of array C, whereby the initiation of execution of the
VL instruction for the vector instruction ID-V2 is temporarily stopped. The RSMVP
instruction for the scalar instruction ID-S101 corresponds to the VPPAU instruction for
the vector instruction ID-V101 and functions to clear the temporary stop of activation
of the VL instruction for the vector instruction ID-V2 upon completed setting of the
address information for the array C.
It should be noted that regardless of which of the RSMVP instruction for the scalar
instruction ID-S101 and the VPPAU instruction for the vector instruction ID-V101 is
executed in precedence, there arises no problem since the scalar processing unit and
the vector processing unit operate in parallel independent of each other. More specifically,
when the VPPAU instruction for the vector instruction ID-V101 is executed in precedence,
the vector processing unit is set to the stand-by state until the RSMVP instruction
for the scalar instruction ID-S101 is issued in the scalar processing unit. In reverse,
when the RSMVP instruction for the scalar instruction ID-S101 is executed earlier,
the scalar processing unit is set to the stand-by state until the VPPAU instruction
for the vector instruction ID-V101 is issued in the vector processing unit.
(3) RSMVP instruction for the scalar instruction ID-S103
This instruction commands the clearing of the temporary stop or pause of instruction
activation in the vector processing unit at the time point when the setting of address
information for the arrays A, Q and P has been completed after the RSMVP instruction
for the scalar instruction ID-S101 was issued. In the case of the vector object codes,
the VPPAU instruction for the vector instruction ID-V103 is issued in precedence to
the VST instruction for the vector instruction ID-V5 which uses the address information
for the array A, whereby initiation of execution of the VST instruction for the vector
instruction ID-V5 is temporarily stopped. The RSMVP instruction for the scalar instruction
ID-S103 corresponds to the VPPAU instruction for the vector instruction ID-V103 and serves
to clear the temporary stop of activation of the vector instruction ID-V5 and the
instructions succeeding it.
It should be mentioned here that several RSMVP instructions may be inserted between
the RSMVP instruction for the scalar instruction ID-S101 and the RSMVP instruction
for the scalar instruction ID-S103 (with corresponding VPPAU instructions being inserted
between the vector object codes) for realizing finer synchronization between the
scalar processing unit and the vector processing unit. However, in view of the fact
that processing of the vector instructions V2, V3 and V4 which require longer time
for execution when compared with the scalar instruction is started in response to
the clearing of the pause in activation of the vector instructions by the RSMVP instruction
for the scalar instruction ID-S101, it is considered that enough time is available
for processing the scalar instructions S6 to S11 in the meantime. Accordingly, arrangement
is adopted such as illustrated in Fig. 5a.
(4) TRS instruction for the scalar instruction ID-S102
With this TRS instruction, the scalar processing unit waits until completion of writing
of the result of the vector summing operation in the scalar register at the zeroth
address is indicated, so that the result of the vector summing operation placed
in the scalar register at the zeroth address can be referred to after execution of
the MVFS instruction for the scalar instruction ID-S15. This TRS instruction corresponds
to the VSIG instruction for the vector instruction ID-V102 among the vector object
codes. Completion of the writing operation is indicated by means of the
S-bit of the VPPSW, as described hereinbefore.
(5) VPPAU instruction for vector instruction ID-V101
This instruction serves to stop temporarily the processing for activating the vector
instructions until the address information for the array C has been set in the VBR
and VIR at the respective zeroth addresses, for allowing the execution of the VL instruction
for the vector instruction ID-V2. This temporary stop of activation is indicated by
setting the P-bit by the VPPAU instruction to "1". This instruction VPPAU corresponds
to the RSMVP instruction for the scalar instruction ID-S101 among the scalar object
codes.
(6) VSIG instruction for vector instruction ID-V102
This VSIG instruction serves to set the S-bit of the VPPSW upon completed
execution of the VSM instruction (vector summing operation) for the vector instruction
ID-V4 which succeeds this VSIG instruction. Owing to the provision of this instruction,
the result of execution of the VSM instruction can be referred to as early as possible
without being subjected to the influence of the other instructions.
(7) VPPAU instruction for vector instruction ID-V103
This VPPAU instruction serves to stop temporarily activation of the vector instruction
until address information for the arrays A, Q and P has been completely set, for allowing
the VST instruction or VL instruction succeeding to the vector instruction ID-V103
to be executed. This VPPAU instruction corresponds to the RSMVP instruction for the
scalar instruction ID-S103 among the scalar object codes.
[0043] Now, description will be made concerning the efficiency attained with the object
codes employed in association with the synchronization control means according to
the invention (illustrated in Figs. 5a and 5b) in comparison with the hitherto known
object codes illustrated in Figs. 4a and 4b with the aid of time charts.
[0044] Fig. 6 shows a time chart corresponding to the hitherto known object codes illustrated
in Figs. 4a and 4b, and Fig. 7 shows a time chart corresponding to the object codes
utilizing the synchronization control means according to the present invention (shown
in Figs. 5a and 5b). In the time charts shown in Figs. 6 and 7, the order or sequence
in which instructions are decoded or issued is taken along the ordinate while the
time base is taken along the abscissa in terms of the number of machine cycles. The
order in which instructions are decoded is shown in terms of the scalar instructions
ID and the vector instructions ID illustrated in Figs. 4a and 4b or Figs. 5a and 5b,
wherein an upper half is allocated to the scalar object codes with a lower half allocated
to the vector object codes. Preparation of the time charts is based on the assumption
mentioned below.
(1) The pipeline processing pitch in the vector processing unit is one cycle.
(2) Although the time taken for the first data to pass through the pipe (often referred
to as travel time) in the vector processing varies in dependence on the types of operations,
it is assumed that the travel time is ten cycles uniformly.
(3) The number of times the DO-loop in the FORTRAN program shown in Fig. 3 is executed
is one hundred. With a single vector operation, one hundred elements are processed.
Accordingly, from the assumptions (1) and (2), the time taken for processing one vector
instruction amounts to 110 cycles in total, which is the sum of the 10 cycles taken for
obtaining the first result and the subsequent 100 cycles taken for successively obtaining
the one hundred results.
(4) The time taken for executing the scalar instruction is assumed to be two cycles
uniformly.
(5) The time pitch in decoding the scalar instruction or the vector instruction as
well as the time pitch in issuing these instructions, respectively, is assumed to
be two cycles.
(6) Many vector processors adopt a speeding-up technique referred to as chaining.
For the details of this technique, reference may be made to GB-A-2 113 878. Among the
individual instructions of the vector object codes shown in Figs. 4b and 5b, chaining
can be realized between the vector instruction ID-V1 or ID-V2 and the vector instruction
ID-V3, between the vector instructions ID-V6 and ID-V7, and between the vector instructions
ID-V7 and ID-V8.
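The arithmetic of assumptions (1) to (3) can be restated as a short calculation. The
Python sketch below is illustrative only; the function name vector_instruction_cycles
and its parameter names are hypothetical, and the formula simply follows the counting
stated in assumption (3), namely the travel time plus one pipeline pitch per element.

```python
def vector_instruction_cycles(elements, travel_time=10, pitch=1):
    """Cycles for one vector instruction under assumptions (1) to (3).

    travel_time: cycles until the first result leaves the pipe (assumption (2)).
    pitch:       pipeline processing pitch in cycles (assumption (1)).
    """
    return travel_time + elements * pitch


if __name__ == "__main__":
    # One hundred elements per vector operation (assumption (3)).
    print(vector_instruction_cycles(100))
```

With one hundred elements the function returns 110, in agreement with the figure given
in assumption (3).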
[0045] Although the assumptions enumerated above do not accurately reflect the actual
parameters of the processor concerned, they reflect the characteristics of the vector
processor in general and are reasonably adequate for explaining the effects accomplished
with the present invention.
[0046] From the comparison of the time chart shown in Fig. 6 with the one shown in Fig.
7, the following differences can be seen.
(1) The total processing time is 272 cycles in the case of the conventional technique
illustrated in Fig. 6. In contrast, according to the invention, the processing time
is shortened to 248 cycles as can be seen in Fig. 7. This difference in the processing
time can be explained by the fact that in contrast to the prior art technique in which
the vector processing can be initiated only after all the preparatory processings
for the vector processing have been completed, the present invention allows the vector
processing to be initiated at an earlier time-point when only a part of the preparatory
processing for the vector processing has been completed.
(2) According to the prior art technique illustrated in Fig. 6, the scalar processing
unit must wait for completion of the vector processing for a period which amounts
to as long as 239 cycles. This period includes the time taken for processing the vector
instructions V5 to V8 for which the scalar processing unit need not await the completion
of processing in actuality. In contrast, the corresponding stand-by time of the scalar
processing unit is reduced down to 105 cycles in the case of the technique illustrated
in Fig. 7.
(3) The result of the summing operation performed by the vector processing unit can
be derived at the 272-nd cycle by executing the scalar instruction ID-S15 by the scalar
processing unit in the case of the technique illustrated in Fig. 6. In contrast, according
to the invention, the corresponding result can be extracted as early as at 140-th
cycle.
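The three differences listed above can be tabulated as follows. The Python sketch
below merely restates the cycle counts quoted from Figs. 6 and 7 and subtracts them;
the dictionary and key names are hypothetical labels chosen for this illustration.

```python
# Cycle counts quoted in the text for Fig. 6 (prior art) and Fig. 7 (invention).
prior_art = {"total": 272, "scalar_wait": 239, "sum_available": 272}
invention = {"total": 248, "scalar_wait": 105, "sum_available": 140}

# Cycles saved per item: prior-art figure minus figure according to the invention.
savings = {key: prior_art[key] - invention[key] for key in prior_art}

if __name__ == "__main__":
    for key, saved in savings.items():
        print(f"{key}: {prior_art[key]} -> {invention[key]} ({saved} cycles saved)")
```

On these figures the total processing time is reduced by 24 cycles, the stand-by time
of the scalar processing unit by 134 cycles, and the result of the summing operation
becomes available 132 cycles earlier.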
[0047] Next, control logic for implementing the synchronization control means according
to the present invention will be described. The control logic is not of a large scale
but can be realized by some logic circuits added to the scalar instruction controller
34 and the vector instruction execution controller 44 in the vector processor shown
in Fig. 1.
[0048] Fig. 8 is a view showing the control logic configuration for implementing the synchronization
control means according to the present invention. In Fig. 8, reference numerals 34
and 44 denote a scalar instruction controller and a vector instruction execution controller
which are equivalent to those shown in Fig. 1, respectively. At first, description
will be directed to the internal structure of the scalar instruction controller 34.
A numeral 301 denotes a scalar instruction register for fetching and holding the incoming
scalar instruction transmitted over a signal line 310. A numeral 302 denotes a scalar
instruction decoding circuit for decoding the scalar instruction held by the scalar
instruction register 301. A numeral 303 denotes a scalar instruction activating logic
for supplying an activating signal on the basis of decoded result of the scalar instruction
decoding circuit 302 to the functional unit, registers and others participating in
the processing of the instruction through a group of signal lines 318. A signal of
logic "1" appears on the signal line 311 when the RSMVP instruction is decoded,
while a signal of logic "1" is produced on the signal line 312 upon decoding of the
TRS instruction. The portion performing the instruction executing processing inclusive
of the processing mentioned above on the basis of the decoded information supplied
from the instruction decoder circuit 302 constitutes an executing portion.
[0049] Next, the internal structure of the vector instruction execution controller 44 will
be described. A reference numeral 401 denotes a vector instruction register for fetching
and holding the incoming vector instruction transmitted over a signal line 410. A
numeral 402 denotes a vector instruction decoding circuit for decoding the vector
instruction held in the vector instruction register 401. A numeral 403 denotes a vector
instruction activation deciding logic having functions mentioned below. The logic
403 centrally manages the states of the use of the functional units, vector registers
and others in the vector processing unit, checks the decoded information of instruction
inclusive of information identifying the vector register used for executing the instruction
fed from the vector instruction decoding circuit 402 as well as other information,
and makes decision as to whether the vector instruction in concern can be activated
or not. When the decision results in that the activation is permitted, an activation
signal is produced to be supplied through the signal lines 411 to the functional unit,
the vector registers and others which participate in the processing of the above mentioned
instruction. The signal lines 412 serve to transmit a message that the vector instruction
under execution has been ended. Information including the data identifying the vector
register used for executing the instruction is transmitted through these signal lines
412 to be inputted to the vector instruction activation deciding logic 403, whereby
the information concerning the functional unit, vector register and others used in
executing the completed vector instruction is altered. There is produced on the signal
line 413 a signal of logic "1" upon decoding of the VPPAU instruction, while the signal
of logic "1" is produced on the signal line 414 upon decoding of the VSIG instruction.
Numerals 450 and 451 denote registers, respectively. The register 450 is set when
the VSIG instruction is decoded to thereby produce logic "1" on the signal line 414.
The register 450 in the ON state commands that the information outputted from the
vector instruction decoding circuit 402 concerning the instruction decoded subsequently
is to be placed in the register 451. Thus, the information concerning the instruction
succeeding to the VSIG instruction is held by the register 451. A numeral 452 denotes
a comparison circuit for comparing the information concerning the instruction succeeding
to the VSIG instruction and held by the register 451 with the information supplied
through the signal line 412 concerning the instruction of which execution has been
completed. When the information supplied through the signal lines 412 concerns the
instruction held in the register 451, the comparison circuit 452 produces logic "1"
on the signal line 415. More specifically, upon completion of execution of the instruction
succeeding the VSIG instruction, the logic "1" signal is produced on the signal
line 415 to signal this fact. A numeral 453 denotes a vector processing program status
word (VPPSW in abbreviation) for holding the status of the program being processed in the
vector processing unit. According to the invention, the P-bit and S-bit mentioned
hereinbefore are added. The outputted P-bit is inputted to the vector instruction
activation deciding logic 403. When the P-bit is "1", this is informed to the vector
instruction activation deciding logic 403, whereby activation of the vector instruction
is inhibited. A numeral 454 denotes a P-bit registration control logic. The portion
for performing the instruction executing processing inclusive of the above mentioned
processing on the basis of the decoded information transferred from the instruction
decoder circuit 402 constitutes an executing section.
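The cooperation of the registers 450 and 451 with the comparison circuit 452 can be
sketched in software as follows. This Python sketch is an illustration under stated
assumptions and not a description of the actual circuit: the class and method names are
hypothetical, instructions are identified by simple strings, and the clearing of the
register 450 after capture and of the register 451 after a match is an assumption, since
the text does not state when these registers are reset.

```python
class VsigCompletionDetector:
    """Hypothetical model of registers 450, 451 and comparison circuit 452."""

    def __init__(self):
        self.reg_450 = False   # ON after a VSIG instruction has been decoded
        self.reg_451 = None    # information on the instruction succeeding VSIG

    def decode(self, instruction_id, is_vsig=False):
        """Model of the decoding circuit 402 presenting one decoded instruction."""
        if is_vsig:
            self.reg_450 = True                # signal line 414 at logic "1"
        elif self.reg_450:
            self.reg_451 = instruction_id      # capture the succeeding instruction
            self.reg_450 = False               # assumption: cleared once captured

    def complete(self, instruction_id):
        """Model of completion information on the signal lines 412.

        Returns the level of the signal line 415 (1 on a match, otherwise 0).
        """
        matched = self.reg_451 is not None and instruction_id == self.reg_451
        if matched:
            self.reg_451 = None                # assumption: cleared after the match
        return 1 if matched else 0


if __name__ == "__main__":
    detector = VsigCompletionDetector()
    detector.decode("V3")
    detector.decode("V102", is_vsig=True)
    detector.decode("V4")
    print(detector.complete("V3"), detector.complete("V4"))
```

In the usage shown, completion of the instruction V3 leaves the signal line 415 at "0",
whereas completion of the instruction V4, which succeeds the VSIG instruction, produces "1".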
[0050] Next, functions of the control logics described above will be elucidated in conjunction
with the processing of the scalar instructions RSMVP and TRS as well as the vector
instructions VPPAU and VSIG.
(1) Function of the P-bit registration control logic 454
When the VPPAU instruction is decoded by the vector processing unit, this is informed
to the P-bit registration control logic 454 by way of the signal line 413. In response,
the P-bit is set to "1".
When the RSMVP instruction is decoded by the scalar processing unit, this is informed
to the logic 454 through the signal line 311. If the P-bit has the value of "1" at
that time, the P-bit is reset to "0". In that case, the signal line 416 is held in
the OFF state. Consequently, the subsequent instruction is not suspended from activation
in the scalar instruction activating logic 303. The vector instruction activation deciding
logic 403 releases the vector instruction from the activation inhibiting state in response
to the resetting of the P-bit to "0". On the other hand, when the value of the P-bit
is "0", the signal line 416 is set to the ON state, and this is informed to the scalar
instruction activating logic 303 for suspending temporarily the activation of the scalar
instruction (i.e. the RSMVP instruction). Subsequently, when the VPPAU instruction
is decoded in the vector processing unit and the corresponding message is issued,
the signal line 416 is set to the OFF state to release the scalar instruction from
the suspended state. At this time, the value of the P-bit is left at "0".
When both the RSMVP instruction and the VPPAU instruction are simultaneously decoded,
the P-bit is reset to "0". Neither the scalar instruction activating logic 303 nor
the vector instruction activation deciding logic 403 suspends the instruction from
activation.
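The cases described for the P-bit registration control logic 454 can be summarized as a
state transition function. The following Python sketch is a hypothetical model only:
the function name p_bit_logic and its argument names are assumptions, the signal levels
are represented by the integers 0 and 1, and a pending RSMVP instruction is represented
by the signal line 416 remaining in the ON state.

```python
def p_bit_logic(p_bit, line_416, vppau_decoded, rsmvp_decoded):
    """One evaluation step of a hypothetical model of the logic 454.

    p_bit:         current value of the P-bit (0 or 1)
    line_416:      1 while the scalar unit is suspended on an RSMVP instruction
    vppau_decoded: True when signal line 413 carries logic "1"
    rsmvp_decoded: True when signal line 311 carries logic "1"
    Returns the new (p_bit, line_416) pair.
    """
    rsmvp_request = rsmvp_decoded or line_416 == 1
    if vppau_decoded and rsmvp_request:
        # Both sides meet: P-bit ends at "0" and neither unit is suspended.
        return 0, 0
    if vppau_decoded:
        # VPPAU alone: P-bit set, vector activation inhibited while P-bit is "1".
        return 1, 0
    if rsmvp_request:
        if p_bit == 1:
            # RSMVP finds the vector unit paused: reset the P-bit, release it.
            return 0, 0
        # RSMVP arrives first: suspend the scalar unit via line 416.
        return 0, 1
    return p_bit, line_416


if __name__ == "__main__":
    state = (0, 0)
    state = p_bit_logic(*state, vppau_decoded=False, rsmvp_decoded=True)
    print(state)   # scalar unit suspended
    state = p_bit_logic(*state, vppau_decoded=True, rsmvp_decoded=False)
    print(state)   # released, P-bit left at "0"
```

In this model the vector instruction activation deciding logic 403 would inhibit
activation whenever the returned P-bit is 1, and the scalar instruction activating
logic 303 would suspend activation whenever the returned line 416 value is 1.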
(2) Function of the S-bit registration control logic 455
Upon completion of the processing of the instruction succeeding to the VSIG instruction
having been processed in the vector processing unit, this is informed to the registration
control logic 455 by way of the signal line 415. In response, the registration control
logic 455 sets the S-bit to "1". When the TRS instruction is decoded in the scalar
processing unit, corresponding information is given to the logic 455 through the signal
line 312. When the value of S-bit is "1" at that time, the S-bit is reset to "0".
In that case, the signal line 417 is held in the OFF state, whereby the activation
of the succeeding instruction is protected from being suspended in the scalar instruction
activation logic 303. On the other hand, the vector instruction activation deciding
logic 403 performs operation for activating the succeeding instruction regardless
of the value assumed by the S-bit. When the value of the S-bit is "0", the signal line
417 is turned on, and this status is informed to the scalar instruction activating logic
303 to thereby temporarily suspend the activation of the scalar instruction (i.e.
the TRS instruction). Subsequently, the processing of the instruction succeeding
the VSIG instruction comes to an end, in response to which the signal line 417 is set
to the OFF state, whereby the scalar instruction is released from the activation suspended
state. At this time, the value of the S-bit is left at "0".
When the TRS instruction occurs simultaneously with the completion of execution
of the instruction succeeding the VSIG instruction, the S-bit is reset to "0".
In the scalar instruction activating logic 303, the activation of the succeeding instruction
is prevented from being suspended.
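The cases described for the S-bit registration control logic 455 can likewise be
expressed as a state transition function. The Python sketch below is a hypothetical
model: the function name s_bit_logic and its argument names are assumptions, signal
levels are represented by 0 and 1, and a pending TRS instruction is represented by the
signal line 417 remaining in the ON state.

```python
def s_bit_logic(s_bit, line_417, completion_415, trs_decoded):
    """One evaluation step of a hypothetical model of the logic 455.

    s_bit:          current value of the S-bit (0 or 1)
    line_417:       1 while the scalar unit is suspended on a TRS instruction
    completion_415: True when signal line 415 carries logic "1"
    trs_decoded:    True when signal line 312 carries logic "1"
    Returns the new (s_bit, line_417) pair.
    """
    trs_request = trs_decoded or line_417 == 1
    if completion_415 and trs_request:
        # Completion meets a TRS (new or pending): S-bit ends at "0", no suspension.
        return 0, 0
    if completion_415:
        # Completion alone: record it in the S-bit; the vector unit is not stalled.
        return 1, 0
    if trs_request:
        if s_bit == 1:
            # TRS finds the completion already recorded: reset the S-bit.
            return 0, 0
        # TRS arrives first: suspend the scalar unit via line 417.
        return 0, 1
    return s_bit, line_417


if __name__ == "__main__":
    state = (0, 0)
    state = s_bit_logic(*state, completion_415=True, trs_decoded=False)
    print(state)   # completion recorded
    state = s_bit_logic(*state, completion_415=False, trs_decoded=True)
    print(state)   # S-bit reset, scalar unit not suspended
```

Unlike the P-bit, the S-bit in this model never inhibits the vector processing unit;
only the scalar side can be suspended, in line with the description above.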
[0051] With the circuit arrangement described above with reference to Fig. 8, the concept
of synchronization control according to the invention can be realized.
[0052] In the case of the illustrated embodiments, discussion has been made in conjunction
with the vector processor. However, it should be understood that the application of
the present invention is not restricted to the vector processor, but extends
to synchronization control between processors of any type so far as they are in the
master-slave relation, such as, by way of example, between a scalar processing unit and
an array processing unit.
1. A multi-processor system comprising a main storage (1) for storing instructions and data,
a master processor (3) for supplying a slave processor (4) with data required for the
processing to be performed by the latter, the master processor (3) having a function of
testing the operating status of the slave processor, and at least one slave processor (4)
for initiating processing in response to a command given by said master processor (3),
the slave processor having a function of notifying the master processor of the completion
of the processing,
wherein the slave processor (4) comprises:
an instruction register (401) for storing an instruction read out from the main storage (1);
a decoder (402) for decoding the instruction stored in the instruction register (401);
execution means (41, 42) for executing the instruction in accordance with the
result of the decoding performed by the decoder (402); and
indicator means (453, 454) for indicating the operating status of the slave processor;
wherein the decoder (402) is provided with means (413) for supplying a decoded
output signal which, when a pause instruction (VPPAU) indicating a pause
is stored in the instruction register (401), sets in said indicator means a
pause indicator (P) indicating that the slave processor is in the pause state,
and
wherein the slave processor suspends the initiation processing for a following instruction
for as long as the pause indicator is output by the indicator means;
and
wherein the master processor comprises:
an instruction register (301) for storing an instruction read out from the main storage (1);
a decoder (302) for decoding the instruction stored in the instruction register (301);
and
execution means (32, 33) for executing the instruction in accordance with the
result of the decoding performed by the decoder (302);
wherein the decoder (302) is provided with means (311) for supplying a decoded
output signal when a pause clear instruction (RSMVP) is stored in the instruction
register (301), in order to supply the slave processor with a command for releasing the
slave processor from the pause state;
wherein the slave processor has means (454) responsive to the command for release from
the pause state supplied by the master processor, for, when a pause indicator
is set in the indicator means (311, 312), resetting that pause indicator (P)
and supplying the master processor with a completion signal indicating that the pause
indicator has been reset, and for, when the pause indicator is not set, carrying out the
processing of the following instruction while supplying the master processor with a
non-completion signal,
wherein the master processor responds to the non-completion signal by suspending the
initiation of the following instruction until the completion signal is received.
2. A multi-processor system according to claim 1, wherein the indicator means of the slave
processor uses a program status word (453) as the indicator.
3. A multi-processor system comprising a main storage (1) for storing instructions
and data, a master processor (3) for supplying a slave processor (4) with data
required for the processing to be performed by the latter, the master processor (3)
having a function of testing the operating status of the slave processor
(4) and performing processing using the result of the processing performed by the
slave processor, and at least one slave processor (4)
for initiating processing in response to a command given by the master processor,
the slave processor having a function of notifying the master processor
of the completion of the processing,
wherein the slave processor (4, 44) comprises:
an instruction register (401) for storing an instruction read out from the main storage (1);
a decoder (402) for decoding the instruction stored in the instruction register (401);
execution means (41, 42) for executing the instruction in accordance with the
result of the decoding performed by the decoder (402); and
indicator means (453, 455, 417) for indicating an operating status of the
slave processor;
wherein the decoder (402) is provided with means (414) for supplying a decoded
output signal when an indicator instruction (VSIG) is stored in the instruction
register (401),
wherein the execution means is provided with storage means (450) for temporarily
storing an indicator instruction in response to the decoded output signal,
and with execution completion detecting means (451, 452, 455)
which is enabled by the indicator instruction in said storage means
to detect the completion of execution of the instruction following the indicator instruction
and to set an execution completion indicator (S) in the indicator means
(453); and
wherein the master processor (3, 34) comprises:
an instruction register (301) for storing an instruction read from the main storage (1);
a decoder (302) for decoding the instruction stored in the instruction register (301);
and
execution means (32, 33) for executing the instruction in accordance with the
result of the decoding performed by the decoder;
wherein the decoder (302) is provided with means (312) for supplying a decoded
output signal when an indicator reset instruction (TRS) is stored in the instruction
register, in order to send an indicator clear command to the slave processor,
wherein the slave processor is provided with means (455) responsive to said indicator
clear command supplied by the master processor for, when the execution completion
indicator (S) is set in the indicator means (453), resetting it and thereby
supplying the master processor with a completion signal
(417) indicating that the execution completion indicator has been reset,
and for, when the execution completion indicator is not set,
supplying the master processor with a non-completion signal, the master processor
responding to the non-completion signal by suspending the initiation of the following
instruction until the completion signal is received.
4. A multi-processor system according to claim 3, wherein the slave processor is a vector
processor (4), and the execution completion detecting means comprises second storage means
(451) for storing the result of the decoding of the following instruction and
comparison means (452) for comparing the content stored in the second storage means
with the information concerning the completion of execution of the following instruction,
thereby to detect the completion of execution of the following instruction.
5. A multi-processor system according to claim 3, wherein the indicator means of the slave
processor uses a program status word (453) of the slave processor as the indicator.