Technical Field of the Invention
[0001] This invention relates to the management of an array of direct access storage devices
(DASD's), and more particularly to balancing the additional loading (read and write accessing)
accruing to the remaining array elements when at least one of the DASD's is unavailable.
Background of the Invention
[0002] It is well known that a DASD is a cyclic track storage device attached to a computing
system by a controller or device adapter. One controller may attach a "string" of
DASD's. Any DASD in the string is selected or accessed on a mutually exclusive basis.
Brady et al, EU: 91304503.5, priority date 05/24/1990, "METHOD AND MEANS FOR ACCESSING
DASD ARRAYS WITH TUNED DATA TRANSFER RATE AND CONCURRENCY", (Applicants Ref: SA9-89-028)
discloses the mapping of a sequential file of N*K data and parity blocks, K blocks
per track per DASD, onto a two dimensional array (one spatial and one temporal dimension)
by synchronously accessing N DASD's through counterpart controllers for the duration
of one track revolution.
Ouchi and Clark Patents and Parity Blocks
[0003] Ouchi, US Pat 4,092,732, "System For Recovering Data Stored In A Failed Memory Unit",
issued May 30, 1978, discloses the spreading of data blocks from the same logical
file across a string of N-1 failure independent DASDs and recording a parity block
on the Nth DASD. According to Ouchi, the parity block is an XORing of the contents
of the N-1 other blocks. Contents from any single inaccessible DASD can be recovered
by XORing the parity block with the blocks stored on the N-2 remaining accessible
DASDs. A similar result can be achieved if it is the parity block that is unavailable,
since it can be regenerated from the N-1 data blocks.
[0004] Clark et al, US Pat. 4,761,785, "Parity Spreading to Enhance Storage Access", issued
August 2, 1988, modifies Ouchi by distributing the parity blocks over the DASDs such that
the unavailability of one DASD (i.e. the parity DASD in Ouchi) does not render all the
parity blocks unavailable, which is of particular importance for operation in degraded mode.
Intra-block Parity And The Parity Code Block Distinguished
[0005] Typically, a parity suffix or equivalent is appended to each data block and may be
used in the detection or correction of intra-block error. Efficient intra-block codes
per se (Hamming, Cyclic Redundancy Check, Reed-Solomon) are elsewhere treated in the
literature.
[0006] In contrast, parity blocks, as described in Ouchi, are involved when one or more
of the data blocks of an N-1 sequence are unavailable. In that event, the parity block,
which a priori spans an N-1 block sequence, is XOR'd with the remaining blocks to
rebuild the unavailable data block.
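The parity-block recovery described above may be sketched as follows. This is illustrative only; the block contents, block size, and the helper xor_blocks are assumptions and do not appear in the referenced patents:

```python
# Sketch (illustrative, not from the patent text): a parity block is the
# XOR of the N-1 data blocks; a missing block is rebuilt by XORing the
# parity block with the remaining blocks.

def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

# Four data blocks (N-1 = 4) on failure-independent DASDs ...
data = [bytes([d] * 8) for d in (0x11, 0x22, 0x44, 0x88)]
# ... and the parity block recorded on the Nth DASD.
parity = xor_blocks(data)

# The DASD holding data[2] becomes unavailable; rebuild its contents
# from the parity block and the N-2 remaining data blocks.
survivors = data[:2] + data[3:]
rebuilt = xor_blocks(survivors + [parity])
assert rebuilt == data[2]
```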
Patterson's DASD Array Levels
[0007] Patterson et al, "A Case for Redundant Arrays of Inexpensive Disks (RAID)", ACM SIGMOD
Conference, Chicago, Illinois, June 1-3, 1988, discusses various ways of organizing
redundant data and DASD's to enhance data availability. In this regard, Patterson
describes logical record to physical track mapping and accessing onto a DASD array
in column major order thereby accommodating both small and large access requests.
Furthermore, he describes the calculation of a new parity block as the XORing of the old
data, the new data, and the old parity.
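The parity update Patterson describes for a write can be sketched as follows; the block values and the helper update_parity are illustrative assumptions:

```python
# Sketch of the small-write parity update: new parity = old data XOR
# new data XOR old parity. Values are illustrative; in the array this
# costs two reads and two writes.

def update_parity(old_data, new_data, old_parity):
    return bytes(od ^ nd ^ op
                 for od, nd, op in zip(old_data, new_data, old_parity))

blocks = [bytes([b] * 4) for b in (0x0F, 0xF0, 0x55)]
old_parity = bytes(a ^ b ^ c for a, b, c in zip(*blocks))

# Rewrite the middle data block and update the parity incrementally.
new_block1 = bytes([0x3C] * 4)
new_parity = update_parity(blocks[1], new_block1, old_parity)

# The updated parity must equal a parity recomputed from scratch.
recomputed = bytes(a ^ b ^ c
                   for a, b, c in zip(blocks[0], new_block1, blocks[2]))
assert new_parity == recomputed
```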
[0008] Patterson's third level or array type causes reads and writes to be made synchronously
to N DASD's. In this arrangement, N-1 DASD's contain data and one DASD contains a
parity ranging over the other data DASDs. That is, one check DASD is provided for
the group. The contents of the failed DASD can be reconstructed in the manner of Ouchi.
[0009] Patterson's fourth level improves performance with respect to small read and write
accesses. This is achieved by storing blocks along the column extent so that in a
first time slot (DASD sector 1) blocks 1 to N can respectively be stored on DASD 1
to N while in the second time slot (DASD sector 2) blocks N+1 to 2N are stored etc.
In the Kth time slot (DASD sector K) blocks (K-1)*N+1 to K*N are stored on corresponding
devices.
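The column layout above admits a simple closed-form placement rule; the value of N and the placement function below are illustrative assumptions:

```python
# Sketch of the level-4 column layout: in time slot k (DASD sector k),
# blocks (k-1)*N+1 .. k*N land on DASDs 1 .. N. N is illustrative.

N = 4  # DASDs in the group (assumed)

def placement(block):
    """Map a 1-based block number to (DASD, sector) under column layout."""
    dasd = (block - 1) % N + 1
    sector = (block - 1) // N + 1
    return dasd, sector

assert placement(1) == (1, 1)
assert placement(N) == (N, 1)
assert placement(N + 1) == (1, 2)   # second time slot starts at block N+1
assert placement(2 * N) == (N, 2)
```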
[0010] In addition to the column track layout, the fourth level permits DASD's to be accessed
individually. This means that small transfers can occupy only a few DASDs, while large
transfers can still be accessed synchronously across N devices.
Parity Group Defined
[0011] According to Patterson, it is known to distribute data and parity blocks such that
in a C*S array of C controllers and S DASD's per controller string, a physical area
(or region) on each of K < C different DASDs in K different strings constitutes a
"parity group", where one of the areas (or regions) is the parity of the data contained
in the other K-1 areas (or regions).
[0012] One consequence of spreading the blocks of a parity group among K failure independent
DASD's and strings is that the mean time between data failure (MTBF) is much higher
than that of each individual physical DASD. This is because the array can continue
to operate, in degraded mode, even after the failure of one of the individual drives
(or a controller) in the array. Additionally, data on the failed DASD can be re-created
from the data and parity on the K-1 remaining DASDs that have not failed.
Array Loading And Operation In Degraded Mode
[0013] Array work (AW) is measured as the number of read and write requests executed by
the DASD's. The distribution of such requests is a function of the distribution of
data. In turn, the distribution of data including redundancies is a function of load
balancing, recovery, and degraded mode operation. Parenthetically, read and write
requests may each comprise multiple constituent read (R) and write (W) operations.
[0014] An array operating in "degraded mode" means that the array continues in operation
notwithstanding the fact that an element in the path to the data such as a controller
or DASD is unavailable usually due to fault, failure, or interruption. The prior art
teaches several distributions of data and parity blocks among the array DASD's and
their disks. Such distributions have resulted in significant load imbalance to the
S*C-1 or fewer other DASDs once a given DASD or controller becomes unavailable. One
result of the imbalance is that some DASDs are accessed significantly more than
others, with a concomitant change in the loading and performance statistics of both
the array elements and the path to data.
Disclosure of the Invention
[0015] It is an object of this invention to devise a method and means for managing DASD
array access such that, in the event of the unavailability of data on one or more of
the DASDs, the data can be reconstructed and the array can continue operation while
the additional access load per DASD is minimized.
[0016] Accordingly the invention provides a method of balancing the frequency of read and
write accesses to the direct access storage devices (DASD) of an array comprising
C strings of S DASDs, the method comprising the steps of:
[0017] writing parity groups of data across the DASDs of the array according to a balanced
block design, each of said parity groups comprising K-1 data areas plus a parity area,
such that when the data on a DASD in one of said strings becomes unavailable, the
loading of subsequent read and write operations to said array is uniform across the
DASDs in the unaffected strings.
[0018] In this invention, parity groups of the same or disparate size are uniformly distributed
among the array elements. Also, it has been found that by restricting the groups to a small
size, minimal additional loading on each array DASD can be guaranteed when operating in
degraded or failure mode.
[0019] As used in this specification, the term "block" has two meanings. The first refers
to an information unit of standard size, say 4096 bytes, used for data or parity encoding
and recording purposes. The second refers to the combinatorial distribution of parity
groups, including replications, to effectuate uniformity of access load using deterministic
processes which are otherwise used in the design of experiments.
[0020] In this invention, a statistically balanced incomplete block design (BIBD) is used
in a preferred embodiment to guarantee a near uniform distribution and recording of
parity groups over the C*S array of C controllers and S DASD's per controller. To
satisfy the design, each DASD is partitioned into N equal sized regions or recording
areas, and, the N regions are assigned to the parity groups such that:
(1) any two regions in the same parity group are located on different DASD's;
(2) each of the N regions on each DASD is assigned to one of N different parity groups;
(3) for every pair of DASD's, there are exactly M parity groups containing regions
from both DASD's.
[0021] The balanced block designs satisfying this new use include, in addition to the BIBD,
Latin Squares and Hadamard matrices. Also, if the size of each parity group is kept lower
than the number of controllers, i.e. K < C, then the additional DASD space so used further
reduces the load per DASD.
[0022] If each DASD receives AW access requests, then the requests or work load per region
is AW/N. When a DASD fails or otherwise becomes unavailable, the data within a region
can be reconstructed by accessing the K-1 other regions within its associated parity
group. Since any one of the C*S-1 other DASD's has exactly M parity groups in common
with the unavailable DASD, it will receive an additional load of M*AW/N.
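The load arithmetic above can be illustrated numerically; the values of AW, N, and M below are assumed for illustration only:

```python
# Sketch of the degraded-load arithmetic: each DASD carries AW requests
# split evenly over its N regions, and any surviving DASD shares exactly
# M parity groups with the failed one. All parameter values are assumed.

AW = 600   # access requests per DASD (assumed)
N = 6      # regions per DASD (r in the BIBD)
M = 2      # parity groups shared by any pair of DASDs (lambda)

load_per_region = AW / N
extra_load = M * load_per_region   # additional load on each surviving DASD

assert load_per_region == 100.0
assert extra_load == 200.0
# The relative increase in load per surviving DASD is M/N.
assert extra_load / AW == M / N
```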
[0023] A preferred embodiment of the invention will now be described, by way of example
only, with reference to the accompanying drawings.
Brief Description of the Drawings
[0024]
Figure 1 shows a distribution of a parity group of data blocks among disparate DASD's
in an array according to the prior art;
Figure 2 also shows a distribution of parity groups of data blocks across a DASD array
row vector such that all parity blocks are mapped onto a column vector of DASD's attached
to the same controller;
Figures 3 and 4 depict distributions of parity groups of data blocks across a DASD
array facilitating data reconstruction in the event of single DASD failure;
Figures 5 and 6 illustrate the load imbalances resulting from the unavailability of
a single DASD on the remaining DASD's within a string or array;
Figure 5A shows the degraded load vs the number of controllers as a parametric function
of parity group size;
Figures 7-9 set forth the principle of the invention that a balanced block design
distribution of parity groups among DASD's in an array will minimize the additional
load per DASD occasioned by the unavailability of the data on one of the DASDs.
Detailed Description of the Invention
[0025] The invention arises out of a new use for a statistically balanced block design or
other uniform distribution of parity groups across a DASD array. The new use comprises
the steps of (a) forming and writing parity groups according to a balanced block design
or equivalent; and (b) accessing remaining DASD's in the array in a minimal referencing
pattern as a function of the parity group distribution given the unavailability of
a path to data (failed control unit or DASD).
[0026] Block designs are derived from the design of experiments in which test objects are
arranged in enumeratively exhaustive subsets (blocks) and exposed to an experimental
regime. The pattern and replication of the objects permits distinguishing various
responses to the experimental regime as being a consequence of chance or a result
of one or more co-factors. Time available, test economics, and sensitivity limit the
size and the degree of enumerative exhaustion possible. For these reasons, there are
many block design modalities yielding uniform distribution of objects. Among those
of interest are the Balanced Incomplete Block Design (BIBD), Latin Squares, and Hadamard
Matrices. Reference should also be made to Raghavarao, "Constructions And Combinatorial
Problems In The Design Of Experiments", copyright Dover Publications 1971, 1988 and
Berman and Fryer, "Introduction to Combinatorics", copyright Academic Press Inc. 1972.
[0027] The following discussion is directed to illustrating the method steps using BIBD.
Balanced Incomplete Block Design And DASD Arrays
[0028] Each object v(i) in a set of v objects termed "varieties" is replicated r times to
form a collection of v*r objects. The v*r objects are distributed into b subsets called
blocks. Each block contains k of v varieties with no block containing a variety more
than once.
Consequently,
b * k = v * r
[0029] Furthermore, each pair of varieties v(i),v(j) occurs in exactly lambda blocks.
Thus, there are v*(v-1)/2 pairs of varieties and each block contains k*(k-1)/2 of
the v*(v-1)/2 pairs. Since every variety occurs in exactly r blocks, every variety
occurs in r*(k-1) pairs. From this is derived the relation
lambda * (v - 1) = r * (k - 1)
[0030] Restated, a system of blocks satisfying conditions (1)-(3) below is termed a (b,v,r,k,lambda)
BIBD. It is an arrangement of v varieties formed into b blocks such that:
(1) each block contains exactly k varieties;
(2) every pair of varieties occurs together in exactly lambda blocks; and
(3) the varieties are replicated exactly r times.
A DASD array can be constructed from a BIBD by mapping:
- variety --> DASD
- block --> parity group
Then,
array size C*S = v,
number of parity groups = b,
size of each parity group (now a constant) K = k,
number of regions per DASD N = r, and
M = lambda.
[0031] The resulting array has the following characteristics:
(1) each parity group is associated with K distinct DASD's
(2) each DASD occurs in N parity groups
(3) any pair of DASD's occur together in M parity groups.
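Properties (1)-(3) above can be checked mechanically. The checker below, and the well-known (b=7, v=7, r=3, k=3, lambda=1) design used as its input, are illustrative assumptions and are not part of the invention:

```python
# Sketch of a checker for the array characteristics: every parity group
# (block) touches k distinct DASDs (varieties), every DASD sits in r
# groups, and every pair of DASDs shares exactly lambda groups.

from itertools import combinations
from collections import Counter

def is_bibd(blocks, v, k, r, lam):
    # (1) each block contains exactly k distinct varieties
    if any(len(set(blk)) != k for blk in blocks):
        return False
    # (3) each variety is replicated exactly r times
    occur = Counter(x for blk in blocks for x in blk)
    if any(occur[x] != r for x in range(1, v + 1)):
        return False
    # (2) every pair of varieties occurs together in exactly lambda blocks
    pairs = Counter(p for blk in blocks for p in combinations(sorted(blk), 2))
    return all(pairs[p] == lam for p in combinations(range(1, v + 1), 2))

# A classical (b=7, v=7, r=3, k=3, lambda=1) BIBD, used only as an example.
fano = [(1, 2, 3), (1, 4, 5), (1, 6, 7), (2, 4, 6),
        (2, 5, 7), (3, 4, 7), (3, 5, 6)]
assert is_bibd(fano, v=7, k=3, r=3, lam=1)
assert not is_bibd(fano[:-1], v=7, k=3, r=3, lam=1)  # dropping a block breaks balance
```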
Distribution of Regions (Blocks) of a Parity Group Across DASD's In The Prior Art
[0032] Referring now to figure 1, there is shown a distribution of a parity group of data
blocks among disparate DASD's in an array in the manner proposed by Patterson as a
solution for providing high availability. These arrays are arranged in a rectangular
structure with S DASDs/controller (strings 9,11,13; 15,17,19; 21,23,25; and 27,29,31)
and C controllers 1, 3, 5, and 7. This array has a total of C*S DASDs. A physical
area (or region) on each of K different DASDs in K different strings constitutes a
"Parity Group" where one of the areas (or region) is the parity of the data contained
in the other K-1 areas (or regions).
[0033] The processing of a read request has no adverse effect on performance of the array.
However, the processing of a write request involves multiple read and write accesses
and compute operations. Relatedly, the old data and parity blocks must be first read.
Next, the new parity is formed from the exclusive ORing of the old data block, the
new data block, and the old parity block. It is then necessary to write the new data
and parity blocks to their data DASD and parity DASD array locations. Thus, a read
request is satisfied by one access while a write request requires four accesses.
[0034] Given an array subject to R read requests + W write requests, then the R read requests
and W write requests are transformed into an array load of R+4W DASD accesses. Significantly,
if the load is uniformly spread over the C*S DASDs, then each DASD will have a load
of (R+4W)/(C*S) accesses.
[0035] Referring now to figure 2, there is shown a distribution of parity groups of data
blocks across a DASD array row vector such that all parity blocks are mapped onto
a column vector of DASD's attached to the same controller. Each array assumes a parity
group size of K < C areas. The K areas labeled "a", "b", etc., contain data. Other
areas, labeled "pa", "pb", "pc", ..., "pz", are used to hold the parity information from
the corresponding K data areas. It is well known that if all the parity information
is contained on DASD's in the same string, then the accesses will be non-uniformly
distributed, as illustrated in this figure.
[0036] Each DASD in the parity string will have an access load of (2W)/(S) accesses while
each DASD in the other strings will have an access load of (R+2W)/(S*(C-1)) accesses.
It is commonly accepted that such a non-uniformly distributed load has undesirable
performance penalties.
[0037] Referring now to Figures 3 and 4, there are depicted distributions of parity groups of
data blocks across a DASD array facilitating data reconstruction in the event of single
DASD failure.
[0038] In figure 3, array controllers 101, 103, 105, and 107 respectively couple DASD strings
109,111; 115,117; 121,123; and 127,129. Illustratively, in order to balance the access
load on the system in the prior art, it was common to distribute the parity areas
in a C*S array of dimension 4X2, where DASD's 109, 115, 121 each have physical spaces
called "a" that contain data. DASD 127 has a physical space called "pa" that is used
to hold the parity information over the 3 other physical spaces called "a". In a similar
manner, the DASD's each have physical spaces called "b" that contain data, with the
corresponding parity space "pb" held on another DASD. With this configuration, for
K=3, 1/(K+1) = 25% of each DASD is used for parity information.
[0039] Assuming that the array work AW is uniformly distributed over the physical data regions
in an array such as shown in figure 3, each data region will have (R+W)/(K*S*C) requests.
Due to the parity, this work will be transformed into (R+2W)/(K*S*C) real accesses
to each data region and 2W/(S*C) real accesses to each parity region. Thus, each DASD
will have (R+4W)/(S*C) real accesses independent of its position in the array.
[0040] Referring now to figure 4, if a DASD in an array fails, the array can still function
in degraded mode by reconstructing data from existing data plus the parity. For example,
if DASD 109 fails, then read requests for data in region "a" on DASD 109 can be satisfied
by reading the corresponding fields in regions "a" on DASD's 115 and 121, and region
"pa" on DASD 127 and then re-constructing the data that was in region "a" on DASD
109. Thus, each read request to a data region on a failed DASD gets transformed into
K=C-1=3 read requests. A write request to region "a" on DASD 109 can be satisfied
by reading the corresponding fields in regions "a" on DASD's 115 and 121, constructing
the new parity, and writing it in region "pa" on DASD 127. Thus, each write request
to a data region on a failed DASD becomes transformed into K-1=C-2=2 read requests
and one write request.
[0041] Assuming that the array work AW is uniformly distributed over the physical data regions
in an array such as shown in figure 4, each data region will have (R+W)/(S*C*K) requests.
If DASD 109 fails, the operation of DASD's 111, 117, 123, and 129 will be unaltered.
Also, each of those DASD's will still have (R+4W)/(S*C) real accesses.
[0042] Each read request to a data region on DASD's 115, 121, 127 results in exactly one
read request. Each write request to data regions "a", "b", or "c" on DASD's 115,
121, or 127 results in two read requests and two write requests as before. However,
a write request to data region "d" on DASD's 115, 121, or 127 results in only one
write request. Thus, the (R+W)/(S*C) requests to each DASD 115, 121, or 127 will be
transformed into (KR+(4K-3)W)/(S*C*K) accesses where K=C-1. Additionally, each read
request to DASD 109 generates one read request to each of DASD's 115, 121, and 127
and each write request to DASD 109 generates either one read or one write request
to each of DASD's 115, 121, and 127. Thus, the total load on each of the DASD's 115,
121, and 127 is (2KR+(5K-3)W)/(S*C*K) where K=C-1.
[0043] From the above for figure 4, the total load on an array with one DASD failure is
R+4W + ((K-1)R + (K-7)W)/(S*C).
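The total derived above can be checked numerically against the per-DASD loads given earlier; the request counts R and W below are assumed for illustration:

```python
# Numerical check (illustrative R and W) of the degraded-mode total
# load: summing the per-DASD loads stated in the text should reproduce
# R + 4W + ((K-1)R + (K-7)W)/(S*C), with K = C - 1.

from fractions import Fraction

R, W = Fraction(800), Fraction(200)   # assumed request mix
S, C = 2, 4
K = C - 1

# DASDs in the unaffected strings each keep (R+4W)/(S*C) accesses.
unaffected = (S * C - K - 1) * (R + 4 * W) / (S * C)
# The K surviving DASDs of the failed group each carry
# (2KR + (5K-3)W)/(S*C*K) accesses.
survivors = K * (2 * K * R + (5 * K - 3) * W) / (S * C * K)

total = unaffected + survivors
closed_form = R + 4 * W + ((K - 1) * R + (K - 7) * W) / (S * C)
assert total == closed_form
```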
The Effect Of Constraining Parity Group Size According to The Method Of The Invention
[0044] Referring now to figures 5 and 6, there are illustrated the load imbalances resulting
from the unavailability of a single DASD on the remaining DASD's within a string or
array. In this regard, figure 5 depicts an array with S=2 and C=4 and K=2 to show
the data regions and parity regions. With a failed DASD 109, a request for data in
region "a" on DASD 109 results in accesses to region "a" on DASD 115 and region "pa"
on DASD 121. However, there is no access to DASD 127; thus the load on the array is
reduced. For the example shown in figure 5, the extra load on the array is (R-5W)/8
with K=2. This should be compared to (2R-4W)/8 in the array set out in figure 3 with
K=3.
[0045] The penalty paid for going from K=3 to K=2 is that more DASD space is taken up with
parity data: 1/3 of each DASD holds parity data when K=2, compared to 1/4 of each DASD
when K=3. In general, 1/(K+1) of each DASD contains parity data.
[0046] Referring now to figure 5A, there is set out the effect of S=string length, C=number
of controllers, (R/W)=read-write ratio, and K on the degraded array load for string
lengths S of 2 and 7, read-write ratios of 1 and 3, number of controllers ranging
from 3 to 128, and values of K ranging from 2 to 63. The vertical axis on each graph
is normalized to show the ratio of degraded load accesses to normal load accesses.
It should be noted that with K=C, which has been the array standard, the ratio of
degraded mode accesses to normal mode accesses is an increasing function of the number
of controllers, whereas with a fixed value of K the ratio increases more slowly, or
even decreases, as controllers are added.
[0047] Figure 6 shows the skewed distribution of reads and writes which occurs as a result
of a single DASD failure.
Uniform Distributions According to the Invention and Their Effect Upon Loading
[0048] Referring now to Figures 7-9, there is set forth the principle of the invention that
a balanced block design distribution of parity groups among DASD's in an array will
minimize the additional load per DASD occasioned by the unavailability of the data
on one of the DASDs.
[0049] This invention requires that the parity groups be allocated on the DASD's such that:
1) a given parity group appears only once in any string, so that a controller failure
renders at most one member of each parity group unavailable, and data accesses to that
string can be reconstructed from the other K-1 members of the parity groups on K-1
other strings; and 2) parity groups be placed on the DASDs such that accesses to a
failed DASD on one string result in uniformly distributed accesses to the (S*(C-1))
other DASDs on (C-1) other strings. In this way inter-string skew can be eliminated.
An example of such a parity group placement is shown in figure 7.
[0050] Referring again to figure 7, there are depicted data regions shown for K=2, S=2,
and C=4. These have the property that if all DASD's are working, then the load is
uniformly spread over the S*C DASDs in the array, and if any one DASD fails, then
the degraded load is spread as follows: the S-1=1 DASD in the string with the failed
DASD has no change in its access load; the (S*(C-1))=6 remaining DASDs in the array
have the remaining degraded load uniformly spread over them. Thus, when a DASD
fails, no string in the array has any skew. This is summarized in figure 8 for the
specific array with K=2, S=2, and C=4 and the data regions as shown. The state of
affairs is also summarized in figure 9 for any array with parity groups allocated
to disks such that the degraded load is uniformly spread over the (S*(C-1)) disks
which take the degraded load.
Illustrative Example Of A Uniform Distribution According To The Method Of The Invention
[0051] As an example of doing such a mapping, consider the following BIBD with b=15, v=10,
r=6, k=4, lambda=2.

Table 1
[0052] This mapping can be used in an array of 10 DASD's with 15 parity groups, each DASD
having 6 regions and each parity group consisting of 4 regions:

Table 2
Generation Of The Uniform Distribution According To The Method Of The Invention
[0053] There are different methods for generating a BIBD. One of these methods is the lattice
design, which yields a design with
v = n*n, b = n*(n+1), r = n+1, k = n, lambda = 1
[0054] A simple lattice design is based on a square array containing the set of varieties.
As an example, for the set of 3*3=9 varieties the square array would be
1 2 3
4 5 6
7 8 9
[0055] A lattice design uses blocks of n varieties in groups of n blocks. The first group
consists of one block for each row of the array. The second group consists of one
block for each column of the array. The other (n-1) groups are defined by superimposing
on the n x n square array, arrays whose elements are the letters a, b, c, ..., n arranged
so that each letter occurs once in each row and once in each column. Thus the arrays
a b c        a b c
c a b        b c a
b c a        c a b
[0056] can be used to generate 2 groups of 3 blocks, each block being defined by the varieties
associated with a letter. For instance, the letter 'a' of the left hand array would
yield the block (1 5 9). The blocks of the resulting block design are as follows:
(1 2 3) (4 5 6) (7 8 9)
(1 4 7) (2 5 8) (3 6 9)
(1 5 9) (2 6 7) (3 4 8)
(1 6 8) (2 4 9) (3 5 7)
Table 3
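The lattice construction described above can be sketched for n = 3; the variable names are illustrative:

```python
# Sketch of the lattice design: one group of blocks per row, one per
# column, and n-1 groups from superimposed Latin squares, giving
# b = n*(n+1) blocks of size k = n with lambda = 1.

from itertools import combinations
from collections import Counter

n = 3
varieties = [[i * n + j + 1 for j in range(n)] for i in range(n)]

blocks = []
blocks += [tuple(row) for row in varieties]                              # one block per row
blocks += [tuple(varieties[i][j] for i in range(n)) for j in range(n)]   # one block per column
for m in range(1, n):                                                    # n-1 Latin-square groups
    for letter in range(n):
        blocks.append(tuple(varieties[i][j]
                            for i in range(n) for j in range(n)
                            if (m * i + j) % n == letter))

assert len(blocks) == n * (n + 1)               # b = 12 blocks
assert all(len(blk) == n for blk in blocks)     # k = 3 varieties per block

# Every pair of the 9 varieties occurs together exactly once (lambda = 1).
pairs = Counter(p for blk in blocks for p in combinations(sorted(blk), 2))
assert len(pairs) == 36 and all(c == 1 for c in pairs.values())
```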
[0057] An example of parity groupings in which the parity group size is not constant, yet
which satisfies the condition for uniform workload distribution in the event of a DASD
failure, is as follows -

S = 7, N = 7, parity groups of size 3 and 4, M = 3
Table 4
Extensions
[0058] It should be appreciated that a balanced or uniform distribution of parity groups
can be written to the array DASD's even where at least two of the data areas occupied
by the parity groups are not equal or are otherwise incommensurable. Also, if some
of the DASD's have unassigned recording space, then such space can conveniently be
used to store a reconstructed parity group, so that the data can be read directly
rather than XORing the parity group components each time a reference is made.
1. A method of balancing the frequency of read and write accesses to the direct access
storage devices (DASD) of an array comprising C strings of S DASDs, the method comprising
the steps of:
writing parity groups of data across the DASDs of the array according to a balanced
block design, each of said parity groups comprising K-1 data areas plus a parity area,
such that when the data on a DASD in one of said strings becomes unavailable, the
loading of subsequent read and write operations to said array is uniform across the
DASDs in the unaffected strings.
2. A method as claimed in claim 1 wherein the step of writing parity groups comprises:
writing a plurality N of said parity groups such that each DASD is partitioned
into N equal sized regions or recording areas and assigning regions to the parity
groups such that (1) any two regions in the same parity group are located on different
DASD's, (2) each DASD has one region assigned to each one of N different parity groups,
and (3) for every pair of DASD's, there are exactly M parity groups containing regions
from both DASD's; the method further comprising the step of:
responsive to the unavailability of a path to addressable information on a failed
DASD, accessing the remaining (C*S)-1 other DASDs in a minimal referencing pattern
as a function of the uniformity of the distribution of parity groups.
3. A method as claimed in claim 2, wherein the assignment of regions to the parity groups
results in a near uniform distribution of parity groups among the array DASD's, said
distribution being selected from the set consisting of balanced incomplete block design,
Latin Squares, and Hadamard matrices.
4. A method for balancing the frequency of read R and write W accesses to a C*S array
formed from C string controllers and S DASD/string, a parity measure (XOR) being defined
over information located on counterpart physical recording areas of K < C*S DASD's,
a "parity group" of K recording areas comprising K-1 data areas + the parity measure,
information in any given parity group located on an unavailable DASD being reconstructed
by logically combining (XORing) information retrieved from the K-1 other DASD's, comprising
the steps of:
(a) forming and writing parity groups of K areas/group approximately uniformly over
counterpart DASD's in the array as generated by a deterministic process; and
(b) responsive to the unavailability of a path to addressable information on a failed
DASD, accessing the remaining (C*S)-1 other DASDs in a minimal referencing pattern
as a function of the uniformity of the distribution of parity groups.
5. A method as claimed in claim 4, wherein the deterministic process is selected from
a combinatorially generative balanced block procedure.
6. A method as claimed in claim 4 or claim 5, wherein the size K of each parity group
is less than the number of controllers C i.e. K < C.
7. A method as claimed in any of claims 4 to 6, wherein the deterministic process is
selected from the set consisting of balanced incomplete block design, Latin Squares,
and Hadamard matrices.
8. A method as claimed in any of claims 4 to 7, wherein step (a) further comprises the
step of allocating the parity groups on the DASD's such that: (1) a given parity group
only appears once in any string, and (2) parity groups be placed on the DASDs such
that accesses to a failed DASD on one string result in uniformly distributed accesses
to the (S*(C-1)) other DASDs on (C-1) other strings.
9. A method as claimed in claim 8, wherein responsive to a controller failure, reconstructing
the data on the unavailable DASD from the other K-1 members of the parity groups on
K-1 other strings.
10. A method as claimed in any preceding claim, wherein the numbers of data areas in at
least two of the parity groups are not equal.
11. A method as claimed in any preceding claim, wherein at least one of the DASD in the
array includes recording space not dedicated to a parity group, said recording space
being of size sufficient to accommodate all or part of a parity group or groups recorded
in a sparing mode.