BACKGROUND
I. Claim of Priority under 35 U.S.C. §119
II. Field of the Disclosure
[0002] The technology of the disclosure relates generally to out-of-order processor (OOP)-based
devices, and, in particular, to avoiding instruction flushes caused by resource overflows
in OOP-based devices.
III. Background
[0003] Out-of-order processors (OOPs) are computer processors that are capable of executing
computer program instructions in an order determined by the availability of each instruction's
input operands, regardless of the order of appearance of the instructions in the computer
program being executed. By dispatching and executing instructions out-of-order, an
OOP may be able to fully utilize processor clock cycles that would otherwise be wasted
while the OOP waits for data access operations to complete. One implementation of
OOP-based devices is based on what is referred to herein as a "block-atomic architecture,"
in which computer programs are subdivided into instruction blocks that each include
multiple instructions that are committed atomically as a group. Load instructions
and store instructions within each instruction block may be buffered until execution
of the instruction block is complete, at which time all of the load instructions and
store instructions are committed together.
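As a non-limiting illustration of the block-atomic commit behavior described above, the following software sketch models how memory operations may be buffered while an instruction block executes and then committed together as a group. The sketch is purely illustrative; the names (BlockAtomicBuffer, buffer, commit) are hypothetical and are not drawn from any actual hardware implementation.

    # Illustrative sketch only: models block-atomic commit in software.
    # All names are hypothetical and chosen for clarity.

    class BlockAtomicBuffer:
        def __init__(self):
            self.pending_ops = []  # buffered load/store effects for one block

        def buffer(self, op):
            # Called as each load/store instruction in the block executes.
            self.pending_ops.append(op)

        def commit(self, memory):
            # The block commits atomically: every buffered operation
            # becomes architecturally visible together.
            for op in self.pending_ops:
                op(memory)
            self.pending_ops.clear()

    # Usage: buffer two stores, then commit them as a group.
    buf = BlockAtomicBuffer()
    memory = {}
    buf.buffer(lambda m: m.update(x=1))
    buf.buffer(lambda m: m.update(y=2))
    buf.commit(memory)  # memory == {"x": 1, "y": 2}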
[0004] Some conventional OOPs also include one or more system resources (e.g., queues or
data structures, as non-limiting examples) that may be occupied or otherwise consumed
by instructions that are decoded and issued in-order but dispatched out-of-order.
Such system resources generally are decoupled from the dispatch stage of the OOP and/or
subsequent resource-freeing stages of the OOP, such that there is no mechanism for
communicating an occupancy status of the system resource to the relevant stages. As
a consequence, the OOP must be designed to handle or avoid "resource overflows" that
result from the system resource being fully occupied.
[0005] One extreme design approach to reducing the occurrence of resource overflows is to
provision the system resource to be so large as to virtually guarantee that it will
never be completely occupied. This approach, though, is generally prohibitively expensive
in terms of power consumption and physical space within the OOP. Alternatively, the
system resource may be provisioned to be adequately sized for most use cases, and
the OOP may be configured to stall the processing of instructions if the system resource
temporarily becomes fully occupied. However, due to the possibility of out-of-order
processing of instructions, the system resource may become fully occupied by younger
instructions that block the forward progress of older instructions, thereby causing
a deadlock. In such circumstances, the OOP must perform a "resource overflow flush"
of the execution pipeline, which negatively impacts overall system performance. Moreover,
while it is possible to provide a communications path between the system resource
and the dispatch stage of the OOP to provide feedback regarding the occupancy status
of the system resource, relevant information for preventing resource overflows may
not be available until after instructions have already been dispatched by the dispatch
stage. Thus, a mechanism for regulating instruction dispatch in OOP-based devices
to prevent resource overflows is desirable.
Attention is drawn to
US 2012/297170 A1 describing a method for decentralized resource allocation in an integrated circuit.
The method includes receiving a plurality of requests from a plurality of resource
consumers of a plurality of partitionable engines to access a plurality of resources,
wherein the resources are spread across the plurality of engines and are accessed
via a global interconnect structure. At each resource, the number of requests for access
to that resource is tallied and compared against a threshold limiter. At each resource,
a subsequent request that is received that exceeds the threshold limiter is canceled.
Subsequently, requests that are not canceled within a current clock cycle are implemented.
SUMMARY OF THE DISCLOSURE
[0008] The present invention is set forth in the independent claims. Preferred embodiments
of the invention are described in the dependent claims.
[0009] Aspects disclosed in the detailed description include providing predictive instruction
dispatch throttling to prevent resource overflows in out-of-order processor (OOP)-based
devices. In this regard, in one aspect, an OOP-based device includes an OOP that provides
a system resource that may be consumed or otherwise occupied by instructions. In some
aspects, the system resource may comprise an unordered load/store queue (ULSQ) of
a load/store unit (LSU) of the OOP. The OOP also provides an execution pipeline that
includes a decode stage for receiving and performing in-order decoding of instruction
blocks, as well as a dispatch stage for performing out-of-order dispatch of the instruction
blocks for execution. The OOP further maintains a running count that indicates an
estimated number of pending instructions that will be consuming the system resource,
as well as a resource usage threshold that indicates a maximum number of instructions
to be dispatched before a potential resource overflow may occur. In exemplary operation,
upon receiving an instruction block, the decode stage extracts a proxy value that
indicates an approximate predicted count of instructions within the instruction block
that will consume a system resource. For example, in aspects in which the system resource
is a ULSQ, the proxy value may comprise a maximum load/store identifier (LSID) of
the load instructions and/or store instructions within the instruction block, where
the value of the LSID may generally correspond to the number of load instructions
and/or store instructions within the instruction block. The decode stage then increments
the running count by the proxy value. Subsequently, the dispatch stage of the OOP
compares the running count to the resource usage threshold before dispatching any
younger instruction blocks. If the running count exceeds the resource usage threshold
(indicating that a resource overflow may be likely to occur), the dispatch stage blocks
dispatching of any younger instruction blocks until the running count no longer exceeds
the resource usage threshold. In some aspects, the execution pipeline of the OOP further
provides a commit stage configured to decrement the running count by the proxy value
of the instruction block upon committing the instruction block. Some aspects may further
provide that, if a resource overflow occurs during execution of a looped code segment,
the decode stage may dynamically reduce the resource usage threshold, and then restore
the previous value of the resource usage threshold once the looped code segment is
completed.
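As a non-limiting illustration, the scheme summarized above may be modeled in software as follows. The sketch is purely illustrative: the names (InstructionBlock, DispatchThrottle, max_lsid, and so on) are hypothetical, and an actual OOP would realize these checks in pipeline hardware rather than in software.

    # Illustrative software model of the predictive dispatch throttling
    # summarized above. Hypothetical names; not a hardware implementation.

    from dataclasses import dataclass

    @dataclass
    class InstructionBlock:
        max_lsid: int  # highest load/store identifier (LSID) in the block

    class DispatchThrottle:
        def __init__(self, resource_usage_threshold: int):
            self.running_count = 0                     # predicted resource occupancy
            self.threshold = resource_usage_threshold  # throttling point

        def on_decode(self, block: InstructionBlock) -> int:
            # The maximum LSID approximates how many load/store
            # instructions the block will place in the ULSQ.
            proxy_value = block.max_lsid
            self.running_count += proxy_value
            return proxy_value

        def may_dispatch_younger(self) -> bool:
            # Younger blocks are held back while the predicted occupancy
            # exceeds the resource usage threshold.
            return self.running_count <= self.threshold

        def on_commit(self, proxy_value: int) -> None:
            # Committing a block frees its predicted resource entries.
            self.running_count -= proxy_value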
[0010] In another aspect, an OOP-based device is provided. The OOP-based device comprises
an execution pipeline comprising a decode stage and a dispatch stage. The decode stage
of the execution pipeline is configured to receive an instruction block and extract
a proxy value indicating an approximate predicted count of one or more instructions
within the instruction block that will consume a system resource. The decode stage
then increments a running count by the proxy value. The dispatch stage of the execution
pipeline is configured to, prior to dispatching one or more instruction blocks younger
than the instruction block, determine whether the running count exceeds a resource
usage threshold. Responsive to determining that the running count exceeds the resource
usage threshold, the dispatch stage blocks dispatching of the one or more instruction
blocks younger than the instruction block until the running count no longer exceeds
the resource usage threshold.
[0011] In another aspect, an OOP-based device is provided. The OOP-based device comprises
a means for receiving an instruction block and a means for extracting a proxy value
indicating an approximate predicted count of one or more instructions within the instruction
block that will consume a system resource. The OOP-based device further comprises
a means for incrementing a running count by the proxy value. The OOP-based device
also comprises a means for, prior to dispatching one or more instruction blocks younger
than the instruction block, determining whether the running count exceeds a resource
usage threshold. The OOP-based device additionally comprises a means for blocking
dispatch of the one or more instruction blocks younger than the instruction block
until the running count no longer exceeds the resource usage threshold, responsive
to determining that the running count exceeds the resource usage threshold.
[0012] In another aspect, a method for providing predictive instruction dispatch throttling
in OOP-based devices is provided. The method comprises receiving, by a decode stage
of an execution pipeline of the OOP-based device, an instruction block. The method
further comprises extracting a proxy value indicating an approximate predicted count
of one or more instructions within the instruction block that will consume a system
resource. The method also comprises incrementing a running count by the proxy value.
The method additionally comprises, prior to dispatching one or more instruction blocks
younger than the instruction block, determining, by a dispatch stage of the execution
pipeline of the OOP-based device, whether the running count exceeds a resource usage
threshold. The method further comprises, responsive to determining that the running
count exceeds the resource usage threshold, blocking dispatch of the one or more instruction
blocks younger than the instruction block until the running count no longer exceeds
the resource usage threshold.
[0013] In another aspect, a non-transitory computer-readable medium is provided, having
stored thereon computer-executable instructions for causing an OOP of an OOP-based
device to receive an instruction block. The computer-executable instructions further
cause the OOP to extract a proxy value indicating an approximate predicted count of
one or more instructions within the instruction block that will consume a system resource.
The computer-executable instructions also cause the OOP to increment a running count
by the proxy value. The computer-executable instructions additionally cause the OOP
to, prior to dispatching one or more instruction blocks younger than the instruction
block, determine whether the running count exceeds a resource usage threshold. The
computer-executable instructions further cause the OOP to, responsive to determining
that the running count exceeds the resource usage threshold, block dispatching of
the one or more instruction blocks younger than the instruction block until the running
count no longer exceeds the resource usage threshold.
BRIEF DESCRIPTION OF THE FIGURES
[0014]
Figure 1 is a block diagram illustrating an exemplary out-of-order processor (OOP)-based
device configured to provide predictive instruction dispatch throttling to prevent
resource overflows;
Figures 2A-2E are block diagrams illustrating exemplary operations and communication
flows within the OOP-based device of Figure 1 for predictively throttling instruction
dispatch;
Figures 3A-3D are block diagrams illustrating exemplary operations and communication
flows within the OOP-based device of Figure 1 according to some aspects for dynamically
modifying a resource usage threshold in response to a resource overflow flush while
executing a looped code segment;
Figure 4 is a flowchart illustrating exemplary operations performed by the OOP-based
device of Figure 1 for providing predictive instruction dispatch throttling to prevent
resource overflow;
Figure 5 is a flowchart illustrating exemplary operations performed by the OOP-based
device of Figure 1 in some aspects for dynamically modifying a resource usage threshold
in response to a resource overflow flush while executing a looped code segment; and
Figure 6 is a block diagram of an exemplary processor-based system that can include
the OOP-based device of Figure 1.
DETAILED DESCRIPTION
[0015] With reference now to the drawing figures, several exemplary aspects of the present
disclosure are described. The word "exemplary" is used herein to mean "serving as
an example, instance, or illustration." Any aspect described herein as "exemplary"
is not necessarily to be construed as preferred or advantageous over other aspects.
[0016] Aspects disclosed in the detailed description include providing predictive instruction
dispatch throttling to prevent resource overflow in out-of-order processor (OOP)-based
devices. In this regard, Figure 1 illustrates an exemplary OOP-based device 100 that
includes an OOP 102 that is based on a block-atomic architecture, and that is configured
to execute a sequence of instruction blocks (not shown). In some aspects, the OOP
102 may be one of multiple block-atomic processor cores, each executing separate sequences
of instruction blocks and/or coordinating to execute a single sequence of instruction
blocks. The OOP 102 may encompass any one of known digital logic elements, semiconductor
circuits, processing cores, and/or memory structures, among other elements, or combinations
thereof. Aspects described herein are not restricted to any particular arrangement
of elements, and the disclosed techniques may be easily extended to various structures
and layouts on semiconductor dies or packages.
[0017] In exemplary operation, a Level 1 (L1) instruction cache 104 of the OOP 102 may receive
instruction blocks (not shown) that were fetched from a system memory (not shown)
for execution. A block predictor 106 determines a predicted execution path of the
instruction blocks. In some aspects, the block predictor 106 may predict an execution
path in a manner analogous to a branch predictor of a conventional OOP. A block sequencer
108 within an execution pipeline 110 orders the instruction blocks, and forwards the
instruction blocks to a decode stage 112 for in-order decoding. It is to be understood
that the execution pipeline 110 may include more decode stages 112 than illustrated
in Figure 1.
[0018] After decoding, the instruction blocks are held in an instruction buffer 114 pending
execution. The instruction buffer 114 in some aspects may comprise one or more reservation
stations in which instructions are held until all input operands are available and
the instructions are ready for dispatch and execution. A dispatch stage 116 then distributes
instructions of the active instruction blocks to one of one or more execution units
118 of the OOP 102. As non-limiting examples, the one or more execution units 118
may comprise an arithmetic logic unit (ALU) and/or a floating-point unit. The one
or more execution units 118 may provide results of instruction execution to a load/store
unit (LSU) 120 comprising an unordered load/store queue (ULSQ) 122 which, in some
aspects, may operate as a hazard detection structure. Instructions that have completed
execution are committed by a commit stage 124 of the execution pipeline 110, which
updates the architectural state of the OOP 102 based on the results of execution.
The commit stage 124 according to some aspects may comprise or otherwise be referred
to as a writeback stage, a retire stage, and/or a completion stage, as non-limiting
examples.
[0019] In the example of Figure 1, the ULSQ 122 of the LSU 120 is referred to as a system
resource 126. Load instructions and store instructions within the instruction blocks
executed by the execution pipeline 110 may be stored in entries (not shown) within
the ULSQ 122. Entries within the ULSQ 122 that are occupied by load instructions are
freed when the load instructions are committed, or when a resource overflow flush
of the ULSQ 122 occurs due to all entries of the ULSQ 122 being occupied. Likewise,
entries within the ULSQ 122 that are occupied by store instructions are freed at some
point after the store instructions are committed, or when a resource overflow flush
of the ULSQ 122 occurs.
[0020] However, at the time load instructions and/or store instructions are dispatched by
the dispatch stage 116 of the execution pipeline 110, the dispatch stage 116 has no
knowledge of the occupancy status of the ULSQ 122 because the dispatch stage 116 is
decoupled from the allocation and resource freeing processes that manage the contents
of the ULSQ 122. Consequently, at an instruction dispatch time, the dispatch stage
116 is unaware of how much space remains within the ULSQ 122, or how much space within
the ULSQ 122 will be available by the time newly dispatched load instructions and/or
store instructions reach the ULSQ 122. The dispatch stage 116 thus may issue a series
of instructions that result in an older instruction encountering a ULSQ 122 that is
completely occupied by younger instructions, resulting in a deadlock that must be
resolved by a resource overflow flush of the younger instructions within the ULSQ
122 and the execution pipeline 110. This issue may prove especially problematic when
executing looped code segments containing load instructions and/or store instructions.
[0021] In this regard, the OOP 102 provides a running count 128 and a resource usage threshold
130 that are employed by the decode stage 112 and the dispatch stage 116 of the execution
pipeline 110 to prevent resource overflows. As described in greater detail below with
respect to Figures 2A-2E, the decode stage 112 is configured to receive an instruction
block and extract a proxy value that indicates an approximate predicted count of instructions
within the instruction block that will consume the system resource 126 (the ULSQ 122,
in this example). In the example of Figure 1, the decode stage 112 may examine the
load/store identifiers (LSIDs) of the load instructions and/or store instructions within
the instruction block, and select the LSID with the maximum value as the proxy value.
The decode stage 112 then increments the running count 128 by the proxy value. Accordingly,
at any given time, the running count 128 represents an approximate maximum count of
instructions that are predicted to occupy the ULSQ 122.
[0022] Later in the execution pipeline 110, the dispatch stage 116 of the execution pipeline
110 is configured to compare the running count 128 to the resource usage threshold
130. If the running count 128 exceeds the resource usage threshold 130, the dispatch
stage 116 blocks dispatching of any younger instruction blocks until the running count
128 no longer exceeds the resource usage threshold 130. In some aspects, the commit
stage 124 of the execution pipeline 110 is configured to decrement the running count
128 by the proxy value of an instruction block upon committing the instruction block.
Some aspects may further provide that, if a resource overflow occurs during execution
of a looped code segment, the decode stage 112 of the execution pipeline 110 may dynamically
reduce the resource usage threshold 130, thus reducing the predicted occupancy level
of the ULSQ 122 at which instruction dispatch is throttled. The decode stage 112 in
such aspects may then restore the previous value of the resource usage threshold 130
once the looped code segment is completed.
[0023] To illustrate exemplary operations and communication flows within the OOP-based device
100 of Figure 1 for predictively throttling instruction dispatch, Figures 2A-2E are
provided. Figures 2A-2E show the decode stage 112, the instruction buffer 114, the
dispatch stage 116, the LSU 120, the ULSQ 122 (i.e., the system resource 126 in this
example), and the commit stage 124 of the execution pipeline 110 of Figure 1. Figures
2A-2E also show the running count 128 and the resource usage threshold 130 of Figure
1. In the example of Figures 2A-2E, the resource usage threshold 130 is initially
set to a value of 15, indicating that dispatching of younger instructions by the dispatch
stage 116 should be blocked when more than 15 instructions are predicted to occupy
the ULSQ 122. It is to be understood that the value of the resource usage threshold
130 may be programmatically set in some aspects, or hardwired to a predefined value
in others. As discussed in greater detail below with respect to Figures
3A-3D, the resource usage threshold 130 may also be dynamically modified during instruction
execution by the decode stage 112 according to some aspects.
[0024] Referring now to Figure 2A, a series of instruction blocks 200(0)-200(X) enters the
execution pipeline 110 for processing. In Figure 2A, the decode stage 112 receives
the instruction block 200(0) (e.g., from the L1 instruction cache 104 via the block
sequencer 108 of Figure 1). The decode stage 112 extracts a proxy value 202 such as
a maximum LSID of the load instructions and/or the store instructions within the instruction
block 200(0). The proxy value 202 in the example of Figure 2A is 10, indicating that
approximately 10 load instructions and/or store instructions may reside within the
instruction block 200(0). The proxy value 202 is added to the running count 128, resulting
in a total of 10. The decode stage 112 then performs an in-order issue 204 of the
instruction block 200(0) to the instruction buffer 114, where the instruction block
200(0) awaits dispatching by the dispatch stage 116.
[0025] In Figure 2B, a similar sequence of operations is performed with respect to the instruction
block 200(1). After the decode stage 112 receives the instruction block 200(1), the
decode stage 112 extracts a proxy value 206 of four (4) from the instruction block
200(1). The proxy value 206 is then added to the running count 128, bringing the running
count 128 to a total of 14. The instruction block 200(1) is then sent to the instruction
buffer 114 via an in-order issue 208 by the decode stage 112, where it awaits dispatching
by the dispatch stage 116.
[0026] Likewise, in Figure 2C, the decode stage 112 receives the instruction block 200(X)
and extracts a proxy value 210 of eight (8) from the instruction block 200(X). Note
that after the proxy value 210 is added to the running count
128, the running count 128 has a value of 22, which is greater than the resource usage
threshold 130 of 15. The decode stage 112 performs an in-order issue 212 of the instruction
block 200(X) to the instruction buffer 114, where the instruction block 200(X) awaits
dispatching by the dispatch stage 116.
[0027] Referring now to Figure 2D, the dispatch stage 116 has determined that the instruction
block 200(X) (i.e., the youngest of the instruction blocks 200(0)-200(X)) is ready
to be dispatched. Before dispatching the instruction block 200(X), though, the dispatch
stage 116 first accesses the values of the running count 128 and the resource usage
threshold 130, as indicated by arrows 214 and 216, respectively. The dispatch stage
116 compares the running count 128 and the resource usage threshold 130, and determines
that the value of the running count 128 (i.e., 22) exceeds the value of the resource
usage threshold 130 (i.e., 15). Accordingly, the dispatch stage 116 blocks the dispatching
of the instruction block 200(X), as indicated by element 218.
[0028] Finally, in Figure 2E, the oldest instruction block 200(0) (not shown) has been dispatched
by the dispatch stage 116 and committed by the commit stage 124. At this point, the
commit stage 124 according to some aspects is configured to decrement the running
count 128 by the proxy value 202 (i.e., 10) of the instruction block 200(0), bringing
the running count 128 down to a value of 12. According to some aspects, the commit
stage 124 may independently determine the proxy value 202 of the instruction block
200(0), or may access the proxy value 202 stored by the decode stage 112 using intermediate
storage such as a register or other memory (not shown). Once the dispatch stage 116
determines that the value of the running count 128 no longer exceeds the value of
the resource usage threshold 130, the dispatch stage 116 is then free to resume out-of-order
dispatching of the instruction block 200(X), as indicated by arrow 220.
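Purely for illustration, the sequence of Figures 2A-2E may be replayed numerically using the hypothetical DispatchThrottle sketch introduced in the summary above:

    # Replaying Figures 2A-2E with the illustrative model (threshold = 15).
    throttle = DispatchThrottle(resource_usage_threshold=15)

    throttle.on_decode(InstructionBlock(max_lsid=10))  # block 200(0): running count = 10
    throttle.on_decode(InstructionBlock(max_lsid=4))   # block 200(1): running count = 14
    throttle.on_decode(InstructionBlock(max_lsid=8))   # block 200(X): running count = 22

    assert not throttle.may_dispatch_younger()  # 22 > 15: dispatch is blocked

    throttle.on_commit(10)                      # block 200(0) commits: count = 12
    assert throttle.may_dispatch_younger()      # 12 <= 15: dispatch may resume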
[0029] As noted above, resource overflow flushes may be especially problematic if they occur
during execution of looped code segments (i.e., groups of instructions or instruction
blocks that are executed repeatedly, usually for a specified number of times or until
a specified condition is met). In this regard, Figures 3A-3D illustrate exemplary
operations and communication flows within some aspects of the execution pipeline
110 of the OOP-based device 100 of Figure 1 for dynamically modifying the resource
usage threshold 130 in response to a resource overflow flush while executing a looped
code segment. As with Figures 2A-2E, Figures 3A-3D show the instruction buffer 114,
the dispatch stage 116, the LSU 120, the ULSQ 122 (i.e., the system resource 126 in
this example), and the commit stage 124 of the execution pipeline 110, as well as
the running count 128 and the resource usage threshold 130. Figures 3A-3D further
show a decode stage 300 corresponding in functionality to the decode stage 112 of
Figure 1, and further configured to perform the functionality described herein with
respect to Figures 3A-3D. Additionally, in Figures 3A-3D, instruction blocks 302(0)-302(X)
together comprise a looped code segment 304, the existence of which may be detected
by the decode stage 300.
[0030] In Figure 3A, a sequence of operations similar to those shown in Figure 2A takes
place. The instruction block 302(0) of the looped code segment 304 is received by
the decode stage 300, which extracts a proxy value 306 having a value of 10. The proxy
value 306 is added to the running count 128, resulting in a total of 10. The decode
stage 300 then performs an in-order issue 308 of the instruction block 302(0) to the
instruction buffer 114, where the instruction block 302(0) awaits dispatching by the
dispatch stage 116.
[0031] Referring now to Figure 3B, before the instruction block 302(1) is received by the
decode stage 300, a resource overflow occurs in the ULSQ 122. This triggers a resource
overflow flush indication 310, which is detected by the decode stage 300. As seen
in Figure 3C, the decode stage 300 responds by reducing the value of the resource
usage threshold 130 from its previous value of 15 to a reduced value of 10. Processing
of the instruction blocks 302(0)-302(X) then proceeds as illustrated in Figures 2B-2E,
but with the newly reduced resource usage threshold 130. It is to be understood that
the amount by which the resource usage threshold 130 is reduced may vary from that
illustrated in Figure 3C, and further that the resource usage threshold 130 may be
reduced more than once during execution of the looped code segment 304. Finally, with
reference to Figure 3D, the decode stage 300 restores the resource usage threshold
130 to its previous value of 15 upon exiting the looped code segment 304.
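As a further non-limiting illustration, the loop handling of Figures 3A-3D may be modeled by extending the hypothetical DispatchThrottle sketch from above; the method names below are likewise assumptions made for clarity:

    # Illustrative extension: reduce the threshold after a resource
    # overflow flush within a loop, and restore it upon loop exit.

    class LoopAwareThrottle(DispatchThrottle):
        def __init__(self, resource_usage_threshold: int):
            super().__init__(resource_usage_threshold)
            self._saved_threshold = resource_usage_threshold

        def on_overflow_flush_in_loop(self, reduced_threshold: int) -> None:
            # Throttle dispatch earlier for the rest of the looped segment.
            self._saved_threshold = self.threshold
            self.threshold = reduced_threshold

        def on_loop_exit(self) -> None:
            # Restore the previously configured threshold.
            self.threshold = self._saved_threshold

    # Usage: a flush during the loop lowers the threshold from 15 to 10.
    t = LoopAwareThrottle(15)
    t.on_overflow_flush_in_loop(10)  # threshold now 10 (as in Figure 3C)
    t.on_loop_exit()                 # threshold restored to 15 (Figure 3D)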
[0032] To illustrate exemplary operations performed by the OOP-based device 100 of Figure
1 for providing predictive instruction dispatch throttling to prevent resource overflow,
Figure 4 is provided. Elements of Figures 1 and 2A-2E are referenced in describing
Figure 4 for the sake of clarity. Operations in Figure 4 begin with the decode stage
112 of the execution pipeline 110 of the OOP-based device 100 receiving the instruction
block 200(0) (block 400). In this regard, the decode stage 112 may be referred to
herein as "a means for receiving an instruction block." The decode stage 112 next
extracts the proxy value 202 indicating an approximate predicted count of one or more
instructions within the instruction block 200(0) that will consume a system resource
126 (such as the ULSQ 122 of Figure 1, as a non-limiting example) (block 402). Accordingly,
the decode stage 112 may be referred to herein as "a means for extracting a proxy
value indicating an approximate predicted count of one or more instructions within
the instruction block that will consume a system resource." The decode stage 112 then
increments the running count 128 by the proxy value 202 (block 404). The decode stage
112 thus may be referred to herein as "a means for incrementing a running count by
the proxy value."
[0033] Prior to dispatching the one or more instruction blocks 200(1)-200(X) younger than
the instruction block 200(0), the dispatch stage 116 of the execution pipeline 110
of the OOP-based device 100 determines whether the running count 128 exceeds the resource
usage threshold 130 (block 406). In this regard, the dispatch stage 116 may be referred
to herein as "a means for, prior to dispatching one or more instruction blocks younger
than the instruction block, determining whether the running count exceeds a resource
usage threshold." If the running count 128 does not exceed the resource usage threshold
130, processing resumes at block 408. However, if the dispatch stage 116 determines
at decision block 406 that the running count 128 exceeds the resource usage threshold
130, the dispatch stage 116 blocks dispatch of the one or more instruction blocks
200(1)-200(X) younger than the instruction block 200(0) until the running count 128
no longer exceeds the resource usage threshold 130 (block 410). Accordingly, the dispatch
stage 116 may be referred to herein as "a means for blocking dispatch of the one or
more instruction blocks younger than the instruction block until the running count
no longer exceeds the resource usage threshold, responsive to determining that the
running count exceeds the resource usage threshold."
[0034] In some aspects, the commit stage 124 of the execution pipeline 110 of the OOP-based
device 100 subsequently decrements the running count 128 by the proxy value 202 upon
committing the instruction block 200(0) (block 412). The dispatch stage 116 may then
dispatch a next ready instruction block 200(1)-200(X) (assuming that the running count
128 no longer exceeds the resource usage threshold 130 after being decremented by
the commit stage 124) (block 408).
[0035] Figure 5 illustrates exemplary operations performed by the OOP-based device 100 of
Figure 1 in some aspects for dynamically modifying the resource usage threshold 130
in response to a resource overflow flush while executing the looped code segment 304.
For the sake of clarity, elements of Figure 1 and Figures 3A-3D are referenced in
describing Figure 5. In Figure 5, operations begin with the decode stage 300 receiving
a resource overflow flush indication 310 during execution of the looped code segment
304 (block 500). In response, the decode stage 300 dynamically reduces the resource
usage threshold 130 (block 502). The decode stage 300 subsequently restores the resource
usage threshold 130 to a previous value upon exiting the looped code segment 304 (block
504).
[0036] Providing predictive instruction dispatch throttling to prevent resource overflow
in OOP-based devices according to aspects disclosed herein may be provided in or integrated
into any processor-based device. Examples, without limitation, include a set top box,
an entertainment unit, a navigation device, a communications device, a fixed location
data unit, a mobile location data unit, a global positioning system (GPS) device,
a mobile phone, a cellular phone, a smart phone, a session initiation protocol (SIP)
phone, a tablet, a phablet, a server, a computer, a portable computer, a mobile computing
device, a wearable computing device (e.g., a smart watch, a health or fitness tracker,
eyewear, etc.), a desktop computer, a personal digital assistant (PDA), a monitor,
a computer monitor, a television, a tuner, a radio, a satellite radio, a music player,
a digital music player, a portable music player, a digital video player, a video player,
a digital video disc (DVD) player, a portable digital video player, an automobile,
a vehicle component, avionics systems, a drone, and a multicopter.
[0037] In this regard, Figure 6 illustrates an example of a processor-based system 600 that
may correspond to the OOP-based device 100 of Figure 1. The processor-based system
600 includes one or more central processing units (CPUs) 602, each including one or
more processors 604 (which in some aspects may correspond to the OOP 102 of Figure
1). The CPU(s) 602 may have cache memory 606 coupled to the processor(s) 604 for rapid
access to temporarily stored data. The CPU(s) 602 is coupled to a system bus 608 and
can intercouple master and slave devices included in the processor-based system 600.
As is well known, the CPU(s) 602 communicates with these other devices by exchanging
address, control, and data information over the system bus 608. For example, the CPU(s)
602 can communicate bus transaction requests to a memory controller 610 as an example
of a slave device.
[0038] Other master and slave devices can be connected to the system bus 608. As illustrated
in Figure 6, these devices can include a memory system 612, one or more input devices
614, one or more output devices 616, one or more network interface devices 618, and
one or more display controllers 620, as examples. The input device(s) 614 can include
any type of input device, including but not limited to input keys, switches, voice
processors, etc. The output device(s) 616 can include any type of output device, including,
but not limited to, audio, video, other visual indicators, etc. The network interface
device(s) 618 can be any devices configured to allow exchange of data to and from
a network 622. The network 622 can be any type of network, including, but not limited
to, a wired or wireless network, a private or public network, a local area network
(LAN), a wireless local area network (WLAN), a wide area network (WAN), a BLUETOOTH™
network, and the Internet. The network interface device(s) 618 can be configured
to support any type of communications protocol desired. The memory system 612 can
include one or more memory units 624(0)-624(N).
[0039] The CPU(s) 602 may also be configured to access the display controller(s) 620 over
the system bus 608 to control information sent to one or more displays 626. The display
controller(s) 620 sends information to the display(s) 626 to be displayed via one
or more video processors 628, which process the information to be displayed into a
format suitable for the display(s) 626. The display(s) 626 can include any type of
display, including, but not limited to, a cathode ray tube (CRT), a liquid crystal
display (LCD), a plasma display, etc.
[0040] Those of skill in the art will further appreciate that the various illustrative logical
blocks, modules, circuits, and algorithms described in connection with the aspects
disclosed herein may be implemented as electronic hardware, instructions stored in
memory or in another computer readable medium and executed by a processor or other
processing device, or combinations of both. The master devices and slave devices
described herein may be employed in any circuit, hardware component, integrated circuit
(IC), or IC chip, as examples. Memory disclosed herein may be any type and size of
memory and may be configured to store any type of information desired. To clearly
illustrate this interchangeability, various illustrative components, blocks, modules,
circuits, and steps have been described above generally in terms of their functionality.
How such functionality is implemented depends upon the particular application, design
choices, and/or design constraints imposed on the overall system. Skilled artisans
may implement the described functionality in varying ways for each particular application,
but such implementation decisions should not be interpreted as causing a departure
from the scope of the present disclosure.
[0041] The various illustrative logical blocks, modules, and circuits described in connection
with the aspects disclosed herein may be implemented or performed with a processor,
a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC),
a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete
gate or transistor logic, discrete hardware components, or any combination thereof
designed to perform the functions described herein. A processor may be a microprocessor,
but in the alternative, the processor may be any conventional processor, controller,
microcontroller, or state machine. A processor may also be implemented as a combination
of computing devices (e.g., a combination of a DSP and a microprocessor, a plurality
of microprocessors, one or more microprocessors in conjunction with a DSP core, or
any other such configuration).
[0042] The aspects disclosed herein may be embodied in hardware and in instructions that
are stored in hardware, and may reside, for example, in Random Access Memory (RAM),
flash memory, Read Only Memory (ROM), Electrically Programmable ROM (EPROM), Electrically
Erasable Programmable ROM (EEPROM), registers, a hard disk, a removable disk, a CD-ROM,
or any other form of computer readable medium known in the art. An exemplary storage
medium is coupled to the processor such that the processor can read information from,
and write information to, the storage medium. In the alternative, the storage medium
may be integral to the processor. The processor and the storage medium may reside
in an ASIC. The ASIC may reside in a remote station. In the alternative, the processor
and the storage medium may reside as discrete components in a remote station, base
station, or server.
[0043] It is also noted that the operational steps described in any of the exemplary aspects
herein are described to provide examples and discussion. The operations described
may be performed in numerous different sequences other than the illustrated sequences.
Furthermore, operations described in a single operational step may actually be performed
in a number of different steps. Additionally, one or more operational steps discussed
in the exemplary aspects may be combined. It is to be understood that the operational
steps illustrated in the flowchart diagrams may be subject to numerous different modifications
as will be readily apparent to one of skill in the art. Those of skill in the art
will also understand that information and signals may be represented using any of
a variety of different technologies and techniques. For example, data, instructions,
commands, information, signals, bits, symbols, and chips that may be referenced throughout
the above description may be represented by voltages, currents, electromagnetic waves,
magnetic fields or particles, optical fields or particles, or any combination thereof.
[0044] The previous description of the disclosure is provided to enable any person skilled
in the art to make or use the disclosure. Various modifications to the disclosure
will be readily apparent to those skilled in the art, and the generic principles defined
herein may be applied to other variations without departing from the scope of the
disclosure. Thus, the disclosure is not intended to be limited to the examples and
designs described herein, but is to be accorded the widest scope consistent with the
principles and novel features disclosed herein.
1. An out-of-order processor (102), OOP-based device (100), comprising an execution pipeline
(110) comprising a decode stage (112, 300) and a dispatch stage (116), wherein:
the decode stage of the execution pipeline is configured to:
receive an instruction block (200, 302);
extract a proxy value (202, 306) indicating an approximate predicted count of one
or more instructions within the instruction block that will consume a system resource;
and
increment a running count (128) by the proxy value; and
the dispatch stage of the execution pipeline is configured to:
prior to dispatching one or more instruction blocks younger than the instruction block,
determine whether the running count exceeds a resource usage threshold (130); and
responsive to determining that the running count exceeds the resource usage threshold,
block dispatching of the one or more instruction blocks younger than the instruction
block until the running count no longer exceeds the resource usage threshold; and
wherein the decode stage of the execution pipeline is further configured to:
receive a resource overflow flush indication during execution of a looped code segment;
and
responsive to receiving the resource overflow flush indication, dynamically reduce
the resource usage threshold.
2. The OOP-based device of claim 1, wherein the execution pipeline further comprises
a commit stage configured to decrement the running count by the proxy value upon committing
the instruction block.
3. The OOP-based device of claim 1, wherein the proxy value comprises a maximum value
of one or more load/store identifiers, LSIDs, of a corresponding one or more load
instructions or store instructions within the instruction block.
4. The OOP-based device of claim 1, wherein the system resource comprises an unordered
load/store queue, ULSQ, of a load/store unit, LSU, of the OOP-based device.
5. The OOP-based device of claim 1, wherein the decode stage of the execution pipeline
is further configured to restore the resource usage threshold to a previous value
upon exiting the looped code segment.
6. The OOP-based device of claim 1 integrated into an integrated circuit, IC.
7. The OOP-based device of claim 1 integrated into a device selected from the group consisting
of: a set top box; an entertainment unit; a navigation device; a communications device;
a fixed location data unit; a mobile location data unit; a global positioning system,
GPS, device; a mobile phone; a cellular phone; a smart phone; a session initiation
protocol, SIP, phone; a tablet; a phablet; a server; a computer; a portable computer;
a mobile computing device; a wearable computing device; a desktop computer; a personal
digital assistant, PDA; a monitor; a computer monitor; a television; a tuner; a radio;
a satellite radio; a music player; a digital music player; a portable music player;
a digital video player; a video player; a digital video disc, DVD, player; a portable
digital video player; an automobile; a vehicle component; avionics systems; a drone;
and a multicopter.
8. A method for providing predictive instruction dispatch throttling in out-of-order
processor (102), OOP-based devices, comprising:
receiving (400), by a decode stage (112, 300) of an execution pipeline (110) of an
OOP-based device (100), an instruction block (200, 302);
extracting (402) a proxy value (202, 306) indicating an approximate predicted count
of one or more instructions within the instruction block that will consume a system
resource;
incrementing (404) a running count (128) by the proxy value;
prior to dispatching (406) one or more instruction blocks younger than the instruction
block, determining, by a dispatch stage of the execution pipeline of the OOP-based
device, whether the running count exceeds a resource usage threshold (130); and
responsive to determining that the running count exceeds the resource usage threshold,
blocking dispatching (410) of the one or more instruction blocks younger than the
instruction block until the running count no longer exceeds the resource usage threshold;
and
further comprising:
receiving (500), by the decode stage, a resource overflow flush indication during
execution of a looped code segment; and
responsive to receiving the resource overflow flush indication, dynamically reducing
(502) the resource usage threshold (130).
9. The method of claim 8, further comprising decrementing (412), by a commit stage of
the execution pipeline of the OOP-based device, the running count by the proxy value
upon committing the instruction block.
10. The method of claim 8, wherein the proxy value comprises a maximum value of one or
more load/store identifiers, LSIDs, of a corresponding one or more load instructions
or store instructions within the instruction block.
11. The method of claim 8, wherein the system resource comprises an unordered load/store
queue, ULSQ, of a load/store unit, LSU, of the OOP-based device.
12. The method of claim 8, further comprising restoring the resource usage threshold to
a previous value upon exiting the looped code segment.
13. A non-transitory computer-readable medium, having stored thereon computer-executable
instructions which when executed cause an out-of-order processor, OOP, of an OOP-based
device to carry out the method of any of claims 8-12.