STRUCTURED FILE ENCRYPTION PROCESS

(11)	EP 3 567 796 A1

(12)	EUROPEAN PATENT APPLICATION

(43)	Date of publication:
	13.11.2019 Bulletin 2019/46

(21)	Application number: 19168701.1

(22)	Date of filing: 11.04.2019

(51)	International Patent Classification (IPC):
	H04L 9/06 (2006.01)
	G06F 21/62 (2013.01)

(84)	Designated Contracting States:
	AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR
	Designated Extension States:
	BA ME
	Designated Validation States:
	KH MA MD TN

(30)	Priority: 10.05.2018 GB 201807612

(71)	Applicant: Rolls-Royce plc
	London SW1E 6AT (GB)

(72)	Inventor:
	Lapworth, Bryan Derby, Derbyshire DE24 8BJ (GB)

(74)	Representative: Rolls-Royce plc
	Intellectual Property Dept SinA-48 PO Box 31 Derby DE24 8BJ (GB)

(54)	STRUCTURED FILE ENCRYPTION PROCESS

(57) There is disclosed a method and system for cryptographic transformation of a structured data set (28). The structured data set (28) is partitioned into a first subset (28A) and a plurality of further subsets (28B, 28C, 28D) for encryption in parallel. The subsets (28A-28D) are divided into a plurality of blocks of predetermined size. A first block (34) for each subset is identified as well as a location of each further block in said subset (28A-28D) relative to said first block (34) of its subset. Cryptographic transformation of the data subsets (28A-28D) is performed using a key according to a block chain process and an offset value for the first block of each subset (34) from the first block (34A) of the first subset (28A) is logged. The process allows a block chain to be broken into part way in the chain. The process may allow different partitioning to be used for decryption than was used for encryption, thereby allowing parallel processing on varying numbers of computational cores (18, 20, 22, 24).

Description

BACKGROUND OF THE INVENTION

[0001] The present disclosure concerns file encryption and decryption processes, particularly for files having a predetermined data structure/topology.

[0002] Engineering geometry and simulation tools, such as computational fluid dynamics (CFD) and finite element analysis (FEA) software, require definition of a domain to be modelled, e.g. as a computer aided design (CAD) model representing the geometry of an area/volume to be studied. CFD and FEA techniques require discretisation of the domain, i.e. the geometric definition of the relevant area/volume, into a network of adjoining cells/elements. A complex simulation (numerical analysis) for the domain as a whole is split into individual calculations for each cell/element, with the output of each being fed to the next until a solution of the domain has been attained.

[0003] The process of domain discretisation is referred to as 'meshing' the domain/geometry. Various structured, unstructured or hybrid meshing approaches are available, using different shapes of cells, each of which results in a significant number of points over the domain defining locations within the mesh. Each point may be represented as a location in a 2D or 3D coordinate system.

[0004] When a numerical analysis is run on the meshed geometry/domain, this results in additional data being generated at each point corresponding to the variables being calculated, such as forces/pressures, temperatures, velocities, etc.

[0005] There is a clear need to be able to share such engineering geometry files and simulation results securely.

[0006] AES (Advanced Encryption Standard) was established by the US National Institute of Standards and Technology in 2001 as the recognised standard required by the United States federal government. It has also become an international encryption/decryption standard.

[0007] AES works on 128bit blocks of plain text and performs several 'rounds' to create the encrypted cipher text. The user must provide a secret key for the encryption. AES is a symmetric cipher and the same secret key is used for decryption.

[0008] A cryptographic weakness of AES, when applied directly to each block, is that the same unmodified secret key is used with each 128bit block, so that identical blocks of plain text produce identical blocks of cipher text and patterns in the original plain text are visible in the cipher text. Information about the encrypted file contents may be inferred from the pattern if the recipient has knowledge about the nature of the data therein.

[0009] One additional complexity with engineering simulation tools, such as CFD and FEA, is that they are computationally expensive. There is often a need to share the simulation job over a number of processors in order to achieve a solution in a practical manner. Simulation codes such as CFD adopt a domain decomposition approach to running a job in parallel. If the code is to be executed on N cores, then the CFD mesh is divided into N roughly equal domains and each domain is allocated to one of the N cores. Here, N can currently range from the order of tens to millions of cores/domains, the number increasing with the development of larger computers. Each core reads and writes its data to and from a master file which contains the entire mesh.

[0010] At the time of encryption, the number of cores on which the job will be run may not be known. The number of cores used to encrypt the data may differ from the number of cores used to decrypt the data. Using conventional encryption methods, it is a problem to divide up the encrypted file in an ad-hoc manner and decrypt the individual parts thereof.

[0011] It is an aim of the disclosure to provide a cryptographic transformation process which mitigates or resolves one or more of the above-identified problems.

BRIEF SUMMARY OF THE INVENTION

[0012] According to a first aspect of the present disclosure there is provided a method of encryption of a structured data set comprising: partitioning the structured data set into a first subset and a plurality of further subsets; dividing the subsets into a plurality of blocks of predetermined size; identifying a first block for each subset and a location of each further block in said subset relative to said first block of its subset; encrypting the data subsets using a block chain encryption process, and logging an offset value for the first block of each subset from the first block of the first subset.

[0013] According to a second aspect of the present disclosure there is provided a method of decryption of a structured encrypted data set comprising: partitioning the structured encrypted data set into a first subset and a plurality of further subsets; dividing the subsets into a plurality of blocks of predetermined size; identifying a first block for each subset and a location of each further block in said subset relative to said first block of its subset; identifying an offset value for the first block of each subset from the first block of the first subset; and, decrypting each block of each data subset using a block chain decryption process, wherein the first block of each subset is decrypted according to said offset value.

[0014] The block chain encryption and/or decryption process for each subset may be performed in parallel, e.g. using a number of cores. The process may permit decryption in parallel, e.g. using a number of subsets which is the same as or different from the number of subsets used for encryption.

[0015] The method may comprise identifying a number of available cores for the encryption/decryption process and partitioning the structured data set into the number of subsets based on said number of cores.

[0016] The method may comprise assigning one or more subsets to each of a plurality of cores for cryptographic transformation. The method may comprise dividing the data set based on the available number of cores and/or substantially equally over the available number of cores.

[0017] The offset value may comprise a location/position of each subset in the data set structure relative to the first subset. The offset value may comprise or indicate a location/position of the first block of each subset relative to the first subset and/or the first block of the first subset.

[0018] The plurality of blocks may collectively define the entire data of the subset. The blocks may be adjoining and/or sequential.

[0019] Each block in a subset may be sequentially identifiable from a preceding block in the subset and/or the first block in that subset. Each block in a subset may comprise a sequential identifier.

[0020] The first block in a subset may or may not comprise the first/leading block of the sequence of blocks in the subset. The first block may be mid-way in the sequence of blocks in the subset.

[0021] The first block in each further subset may comprise a sequential identifier which follows in sequence from the identifier of a last block of a preceding data subset.

[0022] The sequential identifier may comprise a count or counter value. The offset value may comprise the counter value and/or the counter value may be additional to the subset offset value.

[0023] A partition counter value may be logged for each partition/subset, e.g. in addition to a block counter value for each block in that subset.

[0024] The partitions and/or subsets may be sequential and/or may define an adjoining series of subsets in the structured data set.

[0025] Each data subset may comprise a plurality of rows or lines of the data set.

[0026] The data subsets may be substantially equal in size.

[0027] An initialization vector may be used, applied and/or logged for the first block of each subset. The initialization vector may be used for cryptographic transformation. The initialization vector may be a nonce.

[0028] A cipher block chain encryption/decryption mode may be used.

[0029] A counter block chain encryption/decryption mode may be used.

[0030] The decryption process may comprise identifying/finding and decrypting one or more first block contained in each subset, e.g. according to the offset values. The process may comprise logging the decrypted/plain text for each identified first block. The process may comprise finding a sequentially preceding and/or following block relative to the first block.

[0031] The method may comprise identifying a last block of each subset.

[0032] The method may comprise dividing each subset into a whole number of blocks of predetermined size.

[0033] The method may comprise padding one or more block of each subset in order to meet the predetermined block size.

[0034] The data set may comprise geometric data, e.g. geometric model data. The data set may comprise a physics-based model data set. The data set may comprise a CAD model, CFD model, FEA model, or similar model type. The data set may comprise geometric mesh data.

[0035] The data set may comprise point data, e.g. coordinate data for a suitable 2D or 3D coordinate system, such as a Cartesian, cylindrical, spherical coordinate system, etc. The point data may comprise 1,000s, 10,000s, 100,000s, 1,000,000s, 10,000,000s, 100,000,000s or 1,000,000,000s of points or more.

[0036] The data set may comprise point data and data for one or more variable at each point.

[0037] The data set may comprise mesh data and/or model solution data.

[0038] The method may comprise decrypting the data set, running numerical analysis or processing of the decrypted data, e.g. over a plurality of cores, and re-encrypting the processed data, e.g. using the same plurality of cores as for the numerical analysis/processing.

[0039] According to a third aspect of the present disclosure, there is provided a data carrier or data storage medium comprising machine readable instructions for one or more processor to perform the method of the first or second aspects.

[0040] The one or more processor may control a cryptographic transformation process. The one or more processor may comprise one or more first processor for partitioning the structured data set and assigning the subsets to a plurality of further processors for performing the cryptographic transformation process on the subset assigned to it by the first processor.

[0041] According to a fourth aspect of the invention, there is provided a system for performing a cryptographic transformation process in parallel across a plurality of computer processors wherein the plurality of processors operate in accordance with the method of the first or second aspect, or under the control of the machine readable instructions of the third aspect.

[0042] The skilled person will appreciate that except where mutually exclusive, a feature described in relation to any one of the above aspects may be applied mutatis mutandis to any other aspect. Furthermore, except where mutually exclusive any feature described herein may be applied to any aspect and/or combined with any other feature described herein.

[0043] Embodiments will now be described by way of example only, with reference to the Figures, in which:

Figure 1 is a three-dimensional view of a geometric domain of a computational model;

Figure 2 is a schematic view of the partitioning of the geometric domain for computational processing in parallel;

Figure 3 is a schematic of the partitioning of a data set/file into subsets for parallel processing;

Figure 4 shows the process of encrypting, storing and retrieving a partitioned data set;

Figure 5 shows a process of block definition and encryption;

Figure 6 shows an example method of cryptographic transformation using a first mode;

Figure 7 shows an example method of cryptographic transformation using a second mode;

Figure 8 shows further detail of the block chain encryption and decryption process associated with Figure 7;

Figure 9 shows an example method of cryptographic transformation using a third mode;

Figure 10 shows further detail of the block chain encryption and decryption process associated with Figure 9;

Figure 11 shows an example data set or file encrypted in parallel using a first number of partitions but decryptable using a different number of partitions;

Figure 12 shows the decryption of the encrypted output stemming from Figure 11 but using a second number of partitions;

Figure 13 shows an encrypted output file or data set generated after the decryption of Figure 12;

Figure 14 shows an example of the data and metadata for separate encryption and/or storage;

Figure 15 shows a computational system for managing a cryptographic transformation process according to an example of the disclosure.

[0044] The following description refers to cryptographic transformation of data, which may otherwise be referred to as cryptographic translation or encryption/decryption.

[0045] Turning firstly to Figure 1, there is shown a geometric domain 10 of a computational model, in this case a CAD model 11 that is prepared for CFD analysis. The domain 10 defines the geometric boundaries, i.e. the extent of the domain as represented by the end, side, upper and lower walls of the domain. The domain in this example is therefore a three-dimensional domain defined according to a Cartesian coordinate system. In other examples, a two-dimensional domain could be used as well as other types of conventional coordinate systems.

[0046] Within the domain, there is defined the geometry of one or more body 12, in this example taking the form of one or more aerofoil 12 within the domain. The surfaces of the body 12 are geometrically defined such that both the solid body interior and the surrounding space in the domain are captured. The surrounding space is modelled in this example as a fluid medium for CFD analysis.

[0047] The fluid portion of the domain 10 is discretised into multiple adjoining cells by application of a mesh 14 there-over. The mesh is thus an intersecting framework of links/edges defining a grid-like structure. Depending on the size of the domain and the level of accuracy required, the cell count could be of an order anywhere between thousands and billions (10⁹) or trillions of cells. The cells may be considered akin to voxels for a three-dimensional domain.

[0048] In the example shown it can be seen that the cell density need not be uniform over the whole domain 10 and greater numbers of cells may be used where greater accuracy/fidelity is needed, e.g. where a more complex flow regime is expected.

[0049] The mesh is defined by point data within the domain. The points provide locations at which numerical analysis will be performed. The numerical analysis or computational modelling in this example comprises solving a plurality of physics-based equations representing the flow behaviour through the domain (e.g. conservation of mass, energy, momentum equations). For a set of input boundary conditions, the numerical analysis is performed for each cell in order to arrive at a solution for the whole domain. Typically an iterative approach is used to converge on the solution.

[0050] The relevant data may thus comprise the geometric domain data itself, or else the solution data, comprising the domain data and the values for the relevant variables at each point of the domain.

[0051] In Figure 1, a plurality of partitions 16 have been defined in order to allow the domain 10, including the mesh 14, to be broken up into a number of subdomains or subsections 10A, 10B, 10C and 10D. The subdomains 10A-D may be of substantially equal size or cell count but this is not essential.

[0052] Figure 2 shows how the different subdomains of the geometric domain 10 can be assigned to different processors/cores 18, 20, 22 and 24 of a computational system/network 26 for computational processing. Sharing the computational job in this manner can greatly increase the speed with which the numerical analysis can be performed.

[0053] Figure 3 shows a data set/file 28 for the domain 10 shown in Figure 1 and how the partitions 16 are used to divide up the data set 28 into subsets 28A, 28B, 28C, 28D corresponding to the domain subdomains 10A-D.

[0054] The data subsets 28A-D are assigned to the cores 18-24.

[0055] In the examples shown, only three partitions 16, i.e. four subdomains and data subsets, are shown for simplicity. It will be appreciated by the skilled person that one or two partitions could be used if desired but that typically many more partitions would be used. Similarly, the computational processing system 26 is represented as a four-core CPU but could include any number of cores/processors operable in parallel, e.g. as a single computer or multiple computers connected over a network.

[0056] Turning to Figure 4, there is shown an example of use of the methods and systems disclosed herein. The data subsets 28A-D are encrypted according to a suitable encryption standard using the available cores 18-24 and written to a data store 30, but it may later be desirable to read and decrypt/process the data from the data store using a different number of cores 32. It may be desirable to encrypt, e.g. using AES, the mesh and also the solution which is decomposed using the same domains as the mesh. It is desirable to decrypt the mesh as it is read and encrypt the solution as it is written.

[0057] It would be desirable if successive encryption/decryption cycles for the same geometric domain and/or solution data could make use of different numbers and locations of partitions 16.

[0058] The examples of the disclosure below are based upon the application of four common principles as follows:

(i) The data set must be divisible into blocks of a common size, e.g. 128bit
(ii) Partitioning must fall on a block boundary
(iii) Enhanced/secure block cipher modes are applied to each domain subsection independently
(iv) Decryption must be possible using a different number of domains than were used to encrypt the data.

[0059] Whilst serial streams of data to be encrypted do not need to be exact multiples of the encryption block size (128 bits), it is requirement (iv) that causes this disclosure to work in multiples of the block size.

[0060] In the examples described hereinbelow, it can be assumed that the secret key for cryptographic transformation has been securely obtained by each core.

[0061] A structured data set 28 is used herein having rows and/or columns of data, e.g. having a number of lines of data and a predetermined number of bits per line. The present disclosure uses a grid based approach to partitioning the data set such that the position of a partition is defined by a number of lines/rows of data at which the partition occurs. For example, a partition may be defined as being located at 'n' lines such that the partition is made between n and n+1 lines. The nth line would thus be contained in one data subset and the next line would be contained in the next data subset.

[0062] Whilst blocks of data are defined as multiples of 128 bits for the AES algorithm, it will be appreciated that different predetermined block sizes for different encryption/decryption standards could be devised according to the methods disclosed herein.
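Purely by way of illustration, the line-based partitioning described above might be sketched as follows. The function names `partition_lines` and `first_block_offsets` are hypothetical, and the sketch assumes that every line of data occupies the same whole number of blocks; neither is prescribed by the disclosure.

```python
def partition_lines(n_lines, n_parts):
    """Split n_lines into n_parts adjoining subsets of roughly equal size.

    Returns a list of (start_line, end_line) pairs, end exclusive, so a
    partition always falls between two whole lines of data.
    """
    base, extra = divmod(n_lines, n_parts)
    bounds, start = [], 0
    for part in range(n_parts):
        # The first `extra` subsets take one additional line each.
        end = start + base + (1 if part < extra else 0)
        bounds.append((start, end))
        start = end
    return bounds


def first_block_offsets(bounds, blocks_per_line):
    """Offset, in blocks, of the first block of each subset from the
    first block of the first subset (assumes a fixed line width)."""
    return [start * blocks_per_line for start, _ in bounds]


# Ten lines of padded coordinate triplets (two 128 bit blocks per line)
# shared over four cores.
bounds = partition_lines(10, 4)
offsets = first_block_offsets(bounds, blocks_per_line=2)
```

Because each boundary lies between whole lines, and each line is a whole number of blocks, every partition in this sketch falls on a block boundary.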

[0063] It is also noted that during decryption, data may be held solely in the RAM/volatile memory of the computer (or distributed over the RAM of many computers). It may be required that decrypted data is never stored on the hard disk of an external or shared computer. That is to say the decryption and processing of the computational model may not be permanently stored to a non-volatile memory by the available cores, but may instead be communicated back to a secure facility for storage.

Alignment of block boundaries

[0064] In order to ensure that data cannot be partitioned so as to sever the data set in a way that makes the data, or parts of the data, unusable, the data is aligned with a data block structure to be used for the cryptographic transformation process. The data blocks may be aligned with the lines of data, i.e. to provide a whole number of data blocks per line.

[0065] The data held on CFD mesh and solution files are typically 32 or 64bit integers or reals. Whilst these are integer divisors of 128, an arbitrary partitioning of the data may create partitions that are not multiples of 128 bits. However, the CFD data that needs to be protected is vector data. The mesh is stored as (x,y,z) triplets. If 64bit precision is used, each triplet is 192 bits long. The flow file is typically a sextuplet or septuplet. In double precision, this equates to 384 or 448bits per entry.

[0066] Each domain section reads a full triplet, sextuplet etc. for each mesh point that belongs to the domain section. So, to ensure block alignment each triplet is padded to 256 bits and septuplet to 512bits. Sextuplets are an integer multiple of 128 and do not require padding.

[0067] An example is shown in Figure 5, in which the 192 bits for a Cartesian coordinate triplet are padded with 64 bits to form 256 bits of data (represented as 24 x 8 bit characters and 8 x 8 bits of padding), which is divisible into two 128 bit blocks for AES encryption in order to form an encrypted file that is written to a data store and/or communicated safely as required.

[0068] The padding in examples herein may follow the PKCS7 standard but other padding methods could be used if desired.

[0069] The above method therefore makes use of the prior knowledge that 3 x 64 bits are used for grid/mesh points (i.e. a collection of common data) and 6 x 64 bits for a flow solution (e.g. 5 of which are for flow parameters + 1 is for a separate turbulence value). The above method requires only that each vector, array or data set is stored with a fixed number of bits per line of data. The method will pad to the next multiple of 128 bits, if needed.
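A minimal sketch of the padding arithmetic described above is given below, by way of example only. The helper names `pad_record`, `pack_point` and `unpack_point` are hypothetical. The sketch fills the padding with PKCS7-style byte values but, following the text above, adds padding only when a record is not already a multiple of 128 bits, whereas strict PKCS7 always appends at least one byte. It therefore assumes that the original record width is known to the reader of the file.

```python
import struct

BLOCK_BYTES = 16  # 128 bits


def pad_record(record):
    """Pad one fixed-width record to the next multiple of 128 bits, if needed."""
    remainder = len(record) % BLOCK_BYTES
    if remainder == 0:
        return record  # e.g. a 384 bit sextuplet needs no padding
    pad_len = BLOCK_BYTES - remainder
    # PKCS7-style fill: each padding byte holds the padding length.
    return record + bytes([pad_len]) * pad_len


def pack_point(x, y, z):
    """Pack a 64 bit (x, y, z) triplet (192 bits) and pad it to 256 bits."""
    return pad_record(struct.pack("<3d", x, y, z))


def unpack_point(padded):
    """Recover the triplet; the 24 byte record width is assumed to be known."""
    return struct.unpack("<3d", padded[:24])


point = pack_point(1.0, 2.0, 3.0)   # 32 bytes, i.e. two 128 bit blocks
septuplet = pad_record(bytes(56))   # 448 bits padded to 512 bits
```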

[0070] In various examples of the disclosure, the same data file may contain multiple vectors/arrays, e.g. of different types, each with a different number of bits per line. For example, whilst the mesh/grid point data and the solution/flow data represent the majority of the data, other associated data may, and typically will, be encrypted/decrypted at the same time. In the example of physics-based model data, this could comprise boundary data (e.g. boundary conditions), connectivity data, cords and the like. Alternatively, the metadata structure, e.g. described below in relation to Figure 14, also allows for a mixture of encrypted and unencrypted data within the same file. This may be used, for example where there is an overhead in encrypting non-sensitive data.

Independent application of block cipher modes

[0071] The original AES mode, which reuses an unmodified secret key, is called Electronic Codebook (ECB) mode. The electronic codebook mode can be trivially applied independently to each 128bit block as shown in Fig. 6 but this is cryptographically weak.

[0072] To combat this, block cipher modes of operation were introduced in accordance with the present disclosure. These fall into two categories:
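The weakness of the electronic codebook mode may be illustrated by the following sketch. It is an assumption of the sketch that a small hash-based Feistel permutation, here called `toy_encrypt_block`, stands in for the AES block transformation so that the example needs only the Python standard library; it is not AES and is not secure. All names are hypothetical.

```python
import hashlib

BLOCK = 16  # bytes, i.e. 128 bits


def _xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def _f(key, rnd, half):
    # Round function of the toy cipher (a stand-in for AES, not secure).
    return hashlib.sha256(key + bytes([rnd]) + half).digest()[:8]


def toy_encrypt_block(key, block):
    """Four-round Feistel permutation on one 128 bit block."""
    left, right = block[:8], block[8:]
    for rnd in range(4):
        left, right = right, _xor(left, _f(key, rnd, right))
    return left + right


def ecb_encrypt(key, plaintext):
    """Apply the block transformation to each block independently."""
    return b"".join(
        toy_encrypt_block(key, plaintext[i:i + BLOCK])
        for i in range(0, len(plaintext), BLOCK)
    )


key = b"illustrative key"
plaintext = b"A" * BLOCK + b"A" * BLOCK + b"B" * BLOCK
cipher = ecb_encrypt(key, plaintext)
blocks = [cipher[i:i + BLOCK] for i in range(0, len(cipher), BLOCK)]

# Identical plain text blocks give identical cipher text blocks, so the
# pattern of the plain text remains visible in the cipher text.
repeated = blocks[0] == blocks[1]
distinct = blocks[0] != blocks[2]
```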

(i) Counter (CTR) mode - as shown in Figures 7 and 8
(ii) Block chaining and the related feedback modes - as shown in Figures 9 and 10

[0073] Turning to Figure 7, there is shown an implementation using CTR. The cipher key for the first block is supplemented by a random initial vector (IV) or nonce (called the counter). The counter is then advanced by 1 for each subsequent block such that the counter maintains a sequential count associated with each block.

[0074] The next keystream block is generated by encrypting the successive values of the counter. The conventional CTR mode encryption and decryption procedure is shown in Figure 8.

[0075] Below are two ways in which the CTR mode could be employed as part of the present disclosure:

(i) Start the counter at the first point in the mesh and share that with all the partitions so that the counting is continuous across all partitions. This means the partitions cannot perform the encryption independently - they need to know the counter for the first point before they can work out the counter value for the data they own. Once the value for the first point is known for each partition, encryption of the data subsets/subdomains can be performed in parallel. A continuous counter makes decryption straightforward. Once encryption has been performed sequentially for the whole data set, the block count (N1, N2, N3) for each data subset is known and thus the first block, or any other block, in each data subset can be identified along with its associated key based on the position in the sequential count order.
(ii) Let each partition count independently of the others using its own initial counter/vector. The initial counters are not secret and can be stored on the file. This is simple when the same partitioning is used to read and write the data. If the data has been written using one set of partitions and read with a different set then every grid point must know the initial counter from which it is offset. This may be implemented by way of a search amongst the list of initial vectors. Such a search can be performed quickly/efficiently since the list is many times smaller than the array being decrypted.
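Option (i) above may be sketched as follows, by way of example only. The sketch assumes a continuous counter equal to a shared initial counter plus the global block index, and uses a SHA-256 based function, `keystream_block`, as a stand-in for encrypting the counter value with AES. All function names are hypothetical and the stand-in is not secure.

```python
import hashlib
import os

BLOCK = 16  # bytes, i.e. 128 bits


def keystream_block(key, counter):
    # Stand-in for AES encryption of the counter value (not AES, not secure).
    return hashlib.sha256(key + counter.to_bytes(16, "big")).digest()[:BLOCK]


def ctr_transform(key, initial_counter, data, first_block_index):
    """Encrypt or decrypt one subset. In CTR mode both directions are the
    same XOR with the keystream, and the counter of each block depends
    only on its global position in the data set."""
    out = bytearray()
    for i in range(0, len(data), BLOCK):
        counter = (initial_counter + first_block_index + i // BLOCK) % (1 << 128)
        stream = keystream_block(key, counter)
        out += bytes(a ^ b for a, b in zip(data[i:i + BLOCK], stream))
    return bytes(out)


def transform_partitioned(key, initial_counter, data, starts):
    """Transform each subset independently; `starts` lists the global block
    offset of the first block of each subset. Each call to ctr_transform
    could run on a separate core."""
    bounds = list(starts) + [len(data) // BLOCK]
    return b"".join(
        ctr_transform(key, initial_counter, data[a * BLOCK:b * BLOCK], a)
        for a, b in zip(bounds, bounds[1:])
    )


key = b"illustrative key"
initial_counter = int.from_bytes(os.urandom(BLOCK), "big")
plaintext = bytes(range(256))  # sixteen 128 bit blocks

# Encrypt using four subsets, then decrypt using only two.
encrypted = transform_partitioned(key, initial_counter, plaintext, [0, 4, 8, 12])
decrypted = transform_partitioned(key, initial_counter, encrypted, [0, 10])
```

In this sketch the partitioning used for decryption is free to differ from that used for encryption because no block depends on its neighbours, only on the shared initial counter and its own offset.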

[0076] Turning to Figures 9 and 10, there is shown a cipher block chain (CBC) transformation process. The block chain mode is fundamentally sequential so cannot work with a single initial vector for the entire data set. That is to say, with a single IV, each partition would not be able to start until the final block of the previous partition had been encrypted/decrypted. So, in the example of Figure 9 each partition uses its own IV which is stored on the file, remembering that this is not a secret.

Independence of encryption and decryption processes

[0077] A key observation is that whilst block chaining is sequential in Figures 9-10, there will be multiple initial vectors distributed throughout the assembled data, each IV correlating to the first block in sequence of a different data subset defined during the original partitioning.

[0078] The presence of multiple IVs distributed throughout the assembled data only contaminates the adjacent block. So, if a different partitioning is applied, the use of the block chain mode would produce a set of blocks that have been incorrectly decrypted. Effectively there would be as many incorrect blocks as there were partitions used to encrypt the data.

[0079] However, with a simple data structure we can store both the location of the/those blocks and the initial vector used to encrypt them. It is then a matter of locating those blocks and decrypting them with the correct initial vector. The necessary data is stored on the file and can be read by all the partitions so they can perform the decryption independently. The incorrect blocks can thus be compared to the available IVs using a search procedure in order to determine the correct IV for each relevant block. As discussed above, this search is efficient because the list of IVs is significantly smaller than the size of the data set (i.e. the mesh data).

[0080] In different examples, the order of the IV table/list (e.g. an ascending order of file position) can be used to implement advanced search algorithms (e.g. tree search) to maintain efficiency for large numbers of cores.

[0081] Turning to Figures 11 to 13, the relevant location in the data structure of a partition 16 or the first block 34 for each data subset (e.g. the first block 34A of the data set as a whole or else the first block 34 immediately following a partition 16) is logged along with the associated initial vector (IV0, IV1, IV2, IV3, etc) in a table 36 or another relevant format accompanying the file. The table 36 can be searched or otherwise interrogated in order to determine the relevant IV for a known partition/block offset value.
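One possible way of interrogating such a table is sketched below. It assumes that the table is held as a list of block offsets in ascending order and uses a binary search from the Python standard library; the function names `owning_entry` and `logged_first_blocks` are hypothetical.

```python
from bisect import bisect_left, bisect_right


def owning_entry(offsets, block_index):
    """Index of the table entry for the subset that contained block_index
    when the data was encrypted (offsets must be in ascending order)."""
    position = bisect_right(offsets, block_index) - 1
    if position < 0:
        raise ValueError("block precedes the first logged offset")
    return position


def logged_first_blocks(offsets, start, end):
    """Logged first-block offsets lying in the block range [start, end)."""
    return offsets[bisect_left(offsets, start):bisect_left(offsets, end)]


# Table for a file encrypted as four subsets of four blocks each; the IV
# values here are placeholders for illustration.
table = [(0, b"IV0"), (4, b"IV1"), (8, b"IV2"), (12, b"IV3")]
offsets = [offset for offset, _ in table]

entry = owning_entry(offsets, 5)               # block 5 was chained from offset 4
inside = logged_first_blocks(offsets, 5, 11)   # first blocks inside a new subset
```

Each lookup takes a number of steps proportional to the logarithm of the number of table entries, which is consistent with the observation above that the table is far smaller than the data being decrypted.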

[0082] In Figure 11, there is shown a schematic of a file encrypted using four data subsets, i.e. three partitions 16. All blocks except the first ones 34, 34A can be decrypted without an IV. Those block locations can be stored on file with the IV.

[0083] When decrypting in serial, the following process can be used:

Find and decrypt the first blocks using IVs
Store plain text for the first blocks 34
Decrypt entire file
Replace the first blocks with stored plain text
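The four steps above may be sketched as follows, by way of example only. The sketch assumes a hash-based Feistel permutation (`toy_encrypt_block`, `toy_decrypt_block`) as a stand-in for AES, so that only the Python standard library is needed, and a table held as a list of (block offset, IV) pairs. All names are hypothetical and the stand-in cipher is not secure.

```python
import hashlib
import os

BLOCK = 16  # bytes, i.e. 128 bits


def _xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def _f(key, rnd, half):
    # Round function of the toy cipher (a stand-in for AES, not secure).
    return hashlib.sha256(key + bytes([rnd]) + half).digest()[:8]


def toy_encrypt_block(key, block):
    left, right = block[:8], block[8:]
    for rnd in range(4):
        left, right = right, _xor(left, _f(key, rnd, right))
    return left + right


def toy_decrypt_block(key, block):
    left, right = block[:8], block[8:]
    for rnd in reversed(range(4)):
        left, right = _xor(right, _f(key, rnd, left)), left
    return left + right


def cbc_encrypt(key, iv, plaintext):
    out, prev = [], iv
    for i in range(0, len(plaintext), BLOCK):
        prev = toy_encrypt_block(key, _xor(plaintext[i:i + BLOCK], prev))
        out.append(prev)
    return b"".join(out)


def cbc_decrypt(key, iv, ciphertext):
    out, prev = [], iv
    for i in range(0, len(ciphertext), BLOCK):
        block = ciphertext[i:i + BLOCK]
        out.append(_xor(toy_decrypt_block(key, block), prev))
        prev = block
    return b"".join(out)


def encrypt_partitioned(key, plaintext, table):
    """Encrypt each subset as its own chain; table is an ascending list of
    (first block offset, IV) pairs."""
    ends = [offset for offset, _ in table[1:]] + [len(plaintext) // BLOCK]
    return b"".join(
        cbc_encrypt(key, iv, plaintext[offset * BLOCK:end * BLOCK])
        for (offset, iv), end in zip(table, ends)
    )


def decrypt_serial(key, ciphertext, table):
    # Find and decrypt the first blocks using their IVs; store the plain text.
    saved = {
        offset: _xor(
            toy_decrypt_block(key, ciphertext[offset * BLOCK:(offset + 1) * BLOCK]), iv
        )
        for offset, iv in table
    }
    # Decrypt the entire file as a single chain. The IV passed here is a
    # dummy, since block 0 is one of the blocks replaced below.
    plain = bytearray(cbc_decrypt(key, bytes(BLOCK), ciphertext))
    # Replace the first blocks with the stored plain text.
    for offset, block in saved.items():
        plain[offset * BLOCK:(offset + 1) * BLOCK] = block
    return bytes(plain)


key = b"illustrative key"
plaintext = bytes(range(256))  # sixteen 128 bit blocks
table = [(offset, os.urandom(BLOCK)) for offset in (0, 4, 8, 12)]
ciphertext = encrypt_partitioned(key, plaintext, table)
recovered = decrypt_serial(key, ciphertext, table)
```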

[0084] Turning to Figure 12, there is shown a process for decrypting in parallel, e.g. using a different number of partitions 16A and data subsets than those used for the encryption process in Figure 11. The location of the partitions 16A will therefore differ from the partitions 16.

[0085] In this example, only the first block 34A of the data set may be common with the first block 34A when encoded. The first block 34B of each subsequent partition/data subset was encrypted using the last block 34C from the previous line. The first blocks 34B are different in number and location from the blocks 34.

[0086] When decrypting in parallel using new partitioning 16A, the following may be performed for each new data partition or its associated data subset:

Find and decrypt the blocks 34 using IVs (i.e. the blocks that were the previous first blocks of the subsets used during encryption). This may be achieved using the known global offsets in table 36.
Store plain text for blocks 34
The partition boundary 34B lies in the interior of a subset used during encryption. The first block was encrypted using the ciphertext of the previous block, which corresponds to the final 128 bits of the preceding row of data. This is effectively the IV for the subset starting at partition boundary 34B. However, this data now belongs to another subset, so it must be read from the file in order to make it available to the current subset.
Read cipher text of blocks 34C from file and set as IV
Decrypt entire partition/subset
Replace blocks 34 with stored plain text
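A minimal sketch of the per-subset procedure above follows, by way of example only. It assumes a hash-based Feistel permutation as a stand-in for AES (not secure), a table held as a dictionary mapping block offsets to IVs, and new subsets expressed as block ranges. The names `decrypt_subset`, `block_at` and the other helpers are hypothetical.

```python
import hashlib
import os

BLOCK = 16  # bytes, i.e. 128 bits


def _xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def _f(key, rnd, half):
    # Round function of the toy cipher (a stand-in for AES, not secure).
    return hashlib.sha256(key + bytes([rnd]) + half).digest()[:8]


def toy_encrypt_block(key, block):
    left, right = block[:8], block[8:]
    for rnd in range(4):
        left, right = right, _xor(left, _f(key, rnd, right))
    return left + right


def toy_decrypt_block(key, block):
    left, right = block[:8], block[8:]
    for rnd in reversed(range(4)):
        left, right = _xor(right, _f(key, rnd, left)), left
    return left + right


def cbc_encrypt(key, iv, plaintext):
    out, prev = [], iv
    for i in range(0, len(plaintext), BLOCK):
        prev = toy_encrypt_block(key, _xor(plaintext[i:i + BLOCK], prev))
        out.append(prev)
    return b"".join(out)


def cbc_decrypt(key, iv, ciphertext):
    out, prev = [], iv
    for i in range(0, len(ciphertext), BLOCK):
        block = ciphertext[i:i + BLOCK]
        out.append(_xor(toy_decrypt_block(key, block), prev))
        prev = block
    return b"".join(out)


def block_at(data, index):
    return data[index * BLOCK:(index + 1) * BLOCK]


def encrypt_partitioned(key, plaintext, table):
    """Encrypt each original subset as its own chain; table maps the
    first block offset of each subset to its IV."""
    offsets = sorted(table)
    ends = offsets[1:] + [len(plaintext) // BLOCK]
    return b"".join(
        cbc_encrypt(key, table[off], plaintext[off * BLOCK:end * BLOCK])
        for off, end in zip(offsets, ends)
    )


def decrypt_subset(key, ciphertext, start, end, table):
    """Decrypt blocks [start, end) without reference to any other subset."""
    # Find and decrypt the former first blocks in this range using their IVs.
    saved = {
        off: _xor(toy_decrypt_block(key, block_at(ciphertext, off)), iv)
        for off, iv in table.items() if start <= off < end
    }
    # The new boundary normally lies inside an old subset, so the cipher
    # text of the preceding block is read from the file and used as the IV.
    chain_iv = table[start] if start in table else block_at(ciphertext, start - 1)
    plain = bytearray(cbc_decrypt(key, chain_iv, ciphertext[start * BLOCK:end * BLOCK]))
    # Replace the former first blocks with the stored plain text.
    for off, block in saved.items():
        rel = off - start
        plain[rel * BLOCK:(rel + 1) * BLOCK] = block
    return bytes(plain)


key = b"illustrative key"
plaintext = bytes(range(256))  # sixteen 128 bit blocks
table = {off: os.urandom(BLOCK) for off in (0, 4, 8, 12)}  # four subsets
ciphertext = encrypt_partitioned(key, plaintext, table)

# Decrypt with three subsets; each call is independent and could run on
# its own core.
new_bounds = [(0, 5), (5, 11), (11, 16)]
recovered = b"".join(decrypt_subset(key, ciphertext, a, b, table) for a, b in new_bounds)
```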

[0087] When the application using the partitioning shown in Figure 12 comes to write an encrypted dataset as shown in Figure 13, the version of table 36 used to decrypt the file can be discarded and the encrypted data is provided with a new table 36A that is appropriate to its partitioning. It is cryptographically weak to re-use IVs, so even if the partition boundaries have not moved, new IVs can be generated and stored in table 36A.

[0088] In the above manner, the table 36 is updated with the new entries for the IV accompanying the offset values for the blocks 34B.
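As a rough illustration of regenerating the table for a new partitioning, the following sketch builds a fresh offset-to-IV mapping each time the data set is written. The even split of blocks across cores, the helper names `partition_starts` and `new_iv_table`, and the use of a dictionary for tables 36 and 36A are assumptions made for the example only.

```python
import os

BLOCK = 16  # bytes per IV/block (128 bits)

def partition_starts(n_blocks: int, n_cores: int) -> list:
    """Global block offset of the first block of each partition (even split)."""
    return [i * n_blocks // n_cores for i in range(n_cores)]

def new_iv_table(starts: list) -> dict:
    """Map each first-block offset to a freshly generated random IV."""
    return {offset: os.urandom(BLOCK) for offset in starts}

# Table used when the file was written on two cores ...
table_36 = new_iv_table(partition_starts(12, 2))
# ... is discarded and replaced when the file is rewritten on three cores.
table_36a = new_iv_table(partition_starts(12, 3))
```

Note that a new IV is drawn for offset 0 even though that boundary has not moved, in line with the remark that IVs should not be re-used.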

[0089] Turning to Figure 14, there is shown an example of a file structure for performing the read/write and encryption/decryption processes described herein. The encryption data is stored in a separate metadata array or set 37 which gives the type and width of the unencrypted data. This cannot be discerned from the encrypted data as this is just a sequence of 128 bit encrypted blocks. The IVs and their offsets are also stored in the metadata 37.

[0090] The file structure is therefore flexible in that it can comprise a core set of encrypted, structured data as well as other types of encrypted or unencrypted data. Different sets of encrypted data could have different keys as necessary.

[0091] The existence of the metadata array may be used as a flag to indicate that the corresponding data array is encrypted. This allows both encrypted and unencrypted data to be stored on the same file. Some data such as mesh connectivity is not sensitive and processing time can be saved by only encrypting sensitive data such as the grid coordinates.
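One possible in-memory representation of such a file structure is sketched below. The class names, field names and the example array names ("coordinates", "connectivity") are hypothetical; the sketch only shows how the presence of a metadata entry could act as the encryption flag for its data array.

```python
from dataclasses import dataclass, field

@dataclass
class CryptoMeta:
    dtype: str   # type of the unencrypted data, e.g. "float64"
    width: int   # width of the unencrypted data, e.g. values per line
    ivs: dict    # global block offset -> initialisation vector

@dataclass
class StructuredFile:
    arrays: dict = field(default_factory=dict)    # array name -> raw bytes
    metadata: dict = field(default_factory=dict)  # array name -> CryptoMeta

    def is_encrypted(self, name: str) -> bool:
        # The existence of a metadata entry flags the array as encrypted.
        return name in self.metadata

f = StructuredFile()
f.arrays["connectivity"] = b"\x00\x01\x02\x03"   # stored as plain data
f.arrays["coordinates"] = bytes(32)              # placeholder cipher text
f.metadata["coordinates"] = CryptoMeta("float64", 3, {0: bytes(16)})
```

A reader written against this layout can continue to open older files, since an array without a metadata entry is simply treated as unencrypted.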

[0092] The use of metadata further allows the file input/output layer or library in an application code to be easily extended to add encryption capability whilst maintaining backwards compatibility with prior non-encrypted files. To a user the code execution would remain as before, including the specification of the number of cores to be used for the calculation.

[0093] Turning now to Figure 15, there is shown a flow chart for managing the cryptographic transformation process, and/or processing of the decrypted data. A single instance is shown but the relevant steps of Figure 15 can be performed for each core. It will be appreciated by the skilled person that during the encryption process the relevant cryptographic keys can be stored in a secure data store 42 for later retrieval.

[0094] Application code 38 is run and can communicate with a memory/key manager, depicted in Figure 15 by a memory read/write function 40 in communication with a data store 42.

[0095] At step 44 the application code receives a file and one or more suitable identifier, such as a file name/handle and a name/identifier for the data set/array contained in the file.

[0096] At step 46 a check is performed to determine whether any cryptography meta data is present on the received file. The metadata could comprise any or any combination of the initialisation vector data, offset data or any other partition data described herein, such as the encryption metadata disclosed in Figure 14. The check could be performed for the metadata itself, its identifier/handle and/or its data structure.

[0097] If the requisite cryptographic meta data is identified, a key request is made at 47 and the key manager obtains from the data store 42 the relevant key(s) according to the file/data identifier(s) provided. The request for the key can be made in combination with the name, or other properties, of the array, allowing for a file to be encrypted with a multiplicity of keys. This means an attacker would have to break multiple keys to gain full access to the contents of the file.

[0098] The cipher data is read and decrypted using the key(s) at 48, e.g. in conjunction with a further/third-party cryptographic library 50, and the resulting plain text/data 52 is then processed according to the application code 38.

[0099] As also shown in Figure 15, where no meta data is present for a data block, portion or subset, the application code 38 may read the plain/unencrypted data directly at 54. The whole plain data file 52 may be constructed from unencrypted data and encrypted data that has been decrypted using the retrieved cipher/key data.
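The read path of Figure 15 might be sketched as follows. The key store contents, the callback signature and the function names are assumptions for illustration. The `toy_crypt` function is a simple XOR keystream derived from SHA-256 that stands in for the third-party cryptographic library 50; it is not the block chaining process described above and is not secure.

```python
import hashlib

def toy_crypt(key: bytes, data: bytes) -> bytes:
    """Placeholder cipher (XOR keystream); the same call encrypts and decrypts."""
    stream, counter = b"", 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

# Hypothetical secure data store 42: one key per (file, array) pair.
KEY_STORE = {("mesh.dat", "coordinates"): b"key-for-coordinates"}

def key_callback(file_name: str, array_name: str) -> bytes:
    """Key request (47): look up the key from the file/array identifiers."""
    return KEY_STORE[(file_name, array_name)]

def read_array(file_name, arrays, metadata, array_name, get_key=key_callback):
    raw = arrays[array_name]
    if array_name not in metadata:        # check at 46: no metadata present
        return raw                        # read plain data directly (54)
    key = get_key(file_name, array_name)  # key request (47)
    return toy_crypt(key, raw)            # read and decrypt (48)

coords = b"x=1.0 y=2.0 z=3.0"
arrays = {
    "coordinates": toy_crypt(b"key-for-coordinates", coords),
    "connectivity": b"0 1 2 3",
}
metadata = {"coordinates": {"dtype": "text", "width": 3}}

plain_coords = read_array("mesh.dat", arrays, metadata, "coordinates")
plain_conn = read_array("mesh.dat", arrays, metadata, "connectivity")
```

Because the key is requested per array name, different arrays in the same file could be given different entries in the key store.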

[0100] Using the above described system, key callback routines and data processing can be run for multiple cores in parallel. A cryptography manager may perform core-related operations and error checking, for example maintaining activity and error log entries.

[0101] A skilled person will appreciate that the process described in Figure 15 can be used to write encrypted files with the same level of granularity between encrypted and unencrypted data and choice of one or many keys. The key callback structure shown in Figure 15 may allow for fine grain control of encryption keys, e.g. to the point that each vector/array can have a different key, giving protection that an attacker would have to break multiple keys to gain full access to the contents of the file.

[0102] The advantages offered by examples of the present disclosure comprise:

Data sets may be partitioned at any location, e.g. according to the available number of cores;
Successive encryption/decryption cycles may use different partitions;
Strong cryptographic modes can be used;
The auxiliary data structure accompanying the file may be relatively simple (e.g. comprising a list/table of blocks and initialisation vectors);
Integrity of the encrypted file is maintained so it can always be decrypted; and/or
Different data sets may be encrypted with a different key, making it harder for an attacker to gain full access to the file.

[0103] The use of an auxiliary data structure for the purpose of facilitating block chaining in parallel, e.g. with the ability to encrypt and decrypt files using different partitions, is believed to be novel.

[0104] It will be understood that the invention is not limited to the embodiments above-described and various modifications and improvements can be made without departing from the concepts described herein. Except where mutually exclusive, any of the features may be employed separately or in combination with any other features and the disclosure extends to and includes all combinations and subcombinations of one or more features described herein.

Claims

1. A method of cryptographic transformation of a structured data set (28) comprising:

partitioning the structured data set (28) into a first subset (28A) and a plurality of further subsets (28B-28D);

dividing the subsets into a plurality of blocks of predetermined size;

identifying a first block for each subset and a location of each further block in said subset relative to said first block of its subset;

cryptographically transforming the data subsets using a key according to a cipher block chain process; and

logging an offset value for the first block of each subset from the first block of the first subset (28A).

2. A method according to claim 1, wherein the cipher block chain process is performed in parallel for each subset using a plurality of processors.

3. A method according to claim 2, comprising identifying a number of available processors for the cryptographic transformation process and partitioning the structured data set into the number of subsets based on said number of available processors, wherein one or more subset is assigned to each processor.

4. A method according to any preceding claim, wherein the offset value comprises a location/position in the data set structure relative to the first block of the first subset.

5. A method according to any preceding claim, comprising using a different initialisation vector for the first block in each subset and logging a record of said initialisation vectors.

6. A method according to claim 5, comprising maintaining a list or table correlating each initialisation vector to an offset value, and updating an existing instance of the list or table with new offset values and/or initialisation vectors when a current partitioning of the structured data set differs from a previous partitioning of the structured data set.

7. A method according to any preceding claim wherein the method comprises an encryption method and the first block in each subset is the leading block in an ordered sequence of the blocks in the subset based on the proximity to the first block of the first subset.

8. A method according to any one of claims 1 to 6, wherein the method comprises a decryption method and the first block in each subset is mid-way in the subset in an ordered sequence of the blocks in said subset based on the proximity to the first block of the first subset, wherein the first block of each subset is decrypted according to said offset value.

9. A method according to any preceding claim wherein the first block in each further subset comprises a sequential identifier which follows in sequence from the identifier of a last block of a preceding data subset; wherein the sequential identifier comprises a block count value and the cipher block chain process comprises a counter mode.

10. A method according to any preceding claim wherein cryptographically transforming the data subsets comprises decrypting the data by:

identifying and decrypting one or more first block contained in each subset using an initialisation vector corresponding to the offset value for the first block;

storing the plain text for each identified first block;

decrypting the blocks for each subset; and

replacing the decrypted first block with the stored plain text.

11. A method according to any preceding claim comprising: partitioning the data set to define a first number of data subsets; encrypting the data set in parallel using a number of cores corresponding to the first number of data subsets; subsequently partitioning the encrypted data set using a second number of data subsets, wherein the second number is different from the first number of data subsets; and decrypting the data set in parallel using a number of cores corresponding to the second number of data subsets.

12. A method according to any preceding claim, wherein the cryptographically transforming the data subsets comprises decrypting the blocks of one or more data subset, identifying incorrectly decrypted blocks, matching each incorrectly decrypted block with a stored initialisation vector and decrypting said blocks with said initialisation vectors.

13. A method according to any preceding claim, comprising dividing each line of data in the data set into a whole number of blocks of predetermined size and padding the data of one or more block in order to meet the predetermined block size.

14. A method according to any preceding claim, wherein the data set comprises geometric point data of a geometric model and/or one or more variable at each point of a physics-based computational model.

15. A system for performing a cryptographic transformation process in parallel across a plurality of computer processors wherein the plurality of processors operate in accordance with the method of any one of claims 1 to 14, or under the control of machine readable instructions on a data carrier or data storage medium, the machine readable instructions for one or more processor configured to: partition a structured data set into a first subset and a plurality of further subsets; divide the subsets into a plurality of blocks of predetermined size; identify a first block for each subset and a location of each further block in said subset relative to said first block of its subset; cryptographically transform the data subsets using a key according to a cipher block chain process; and store in a non-volatile memory an offset value for the first block of each subset from the first block of the first subset.
