A SECURE PROCESS FOR VALIDATING MACHINE LEARNING MODELS USING HOMOMORPHIC ENCRYPTION TECHNIQUES

(11)	EP 4 095 769 B1

(12)	EUROPEAN PATENT SPECIFICATION

(45)	Mention of the grant of the patent:
	18.10.2023 Bulletin 2023/42

(21)	Application number: 21175794.3

(22)	Date of filing: 25.05.2021

(51)	International Patent Classification (IPC):
	G06N 20/00 (2019.01)
	H04L 9/00 (2022.01)

(52)	Cooperative Patent Classification (CPC):
	G06N 20/00; H04L 9/008

(54)	A SECURE PROCESS FOR VALIDATING MACHINE LEARNING MODELS USING HOMOMORPHIC ENCRYPTION TECHNIQUES
	SICHERES VERFAHREN ZUR VALIDIERUNG VON MASCHINENLERNMODELLEN MITTELS HOMOMORPHER VERSCHLÜSSELUNGSTECHNIKEN
	PROCÉDÉ SÉCURISÉ DE VALIDATION DE MODÈLES D'APPRENTISSAGE MACHINE À L'AIDE DE TECHNIQUES DE CHIFFREMENT HOMOMORPHE

(84)	Designated Contracting States:
	AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

(43)	Date of publication of application:
	30.11.2022 Bulletin 2022/48

(73)	Proprietor: Unify Patente GmbH & Co. KG
	81739 München (DE)

(72)	Inventor:
	Brochonski, Michael 82620-265 Curitiba (BR)

(74)	Representative: Schaafhausen Patentanwälte PartGmbB
	Prinzregentenplatz 15
	81675 München (DE)

(56)	References cited:

LIU XINBO ET AL: "A Privacy-Preserving Principal Component Analysis Outsourcing Framework", 2018 17TH IEEE INTERNATIONAL CONFERENCE ON TRUST, SECURITY AND PRIVACY IN COMPUTING AND COMMUNICATIONS/ 12TH IEEE INTERNATIONAL CONFERENCE ON BIG DATA SCIENCE AND ENGINEERING (TRUSTCOM/BIGDATASE), IEEE, 1 August 2018 (2018-08-01), pages 1354-1359, XP033398816, DOI: 10.1109/TRUSTCOM/BIGDATASE.2018.00187 [retrieved on 2018-09-05]
KYOOHYUNG HAN ET AL: "Efficient Logistic Regression on Large Encrypted Data", IACR, INTERNATIONAL ASSOCIATION FOR CRYPTOLOGIC RESEARCH, vol. 20180710:073755, 10 July 2018 (2018-07-10), pages 1-31, XP061025994, Retrieved from the Internet: URL:http://eprint.iacr.org/2018/662.pdf [retrieved on 2018-07-10]
JIANFEI CUI ET AL: "Federated machine learning with Anonymous Random Hybridization (FeARH) on medical records", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 25 December 2019 (2019-12-25), XP081818157

Note: Within nine months from the publication of the mention of the grant of the European patent, any person may give notice to the European Patent Office of opposition to the European patent granted. Notice of opposition shall be filed in a written reasoned statement. It shall not be deemed to have been filed until the opposition fee has been paid. (Art. 99(1) European Patent Convention).

Description

[0001] The present invention relates to a method and a system for secure validation of machine learning models and parallel validation data using homomorphic encryption.

[0002] Various cryptographic methods are known from the state of the art which can be used for different purposes. One of these purposes is to provide a secure process for executing transactions involving two assets, e. g. machine learning models, which comprise code or algorithms, and datasets containing useful information for training, classification and data analysis. Typically, in such a transaction process a user is interested in acquiring a machine learning model from a vendor or provider with which the user can evaluate or validate specific data.

[0003] For example, X.Liu et al: "A Privacy-Preserving Principal Component Analysis Outsourcing Framework"; K.Han et al: "Efficient Logistic Regression on Large Encrypted Data"; and J. Cui et al: "Federated machine learning with Anonymous Random Hybridization (FeARH) on medical records", disclose efficiency optimizations for machine learning.

[0004] However, the process of a secure transaction is important to all parties in several ways. For the user interested in acquiring the machine learning model it is of importance to ensure that pre-trained machine learning models can be validated with the dataset of the interested user, ensuring that the target model has the desired accuracy and efficiency. The data set used for the validation of the machine learning model should remain encrypted to ensure that it is not modified. In addition, the results should be encrypted so that they cannot be manipulated.

[0005] The user is particularly interested in the fact that sensitive information can remain securely in the data set without having to be removed or masked. This is necessary in environments where particularly sensitive information needs to be protected, for example in healthcare or financial data.

[0006] For the machine learning model provider, it is important that while the interested user can validate or test the model, no implementation details are disclosed during this phase of the transaction.

[0007] In the state of the art, there are various approaches to making such a transaction process more secure. For instance, cryptography may be applied to a pre-trained machine learning model, and a dataset may be used for validation of said model. Zero-knowledge arguments can be used for this purpose. With this technique, an unencrypted validation dataset from an interested user who wants to acquire a machine learning model is given to the provider of the model. The model provider can use said zero-knowledge arguments to prove that the pre-trained machine learning model is contained within. This is possible because one specific result is output without revealing the machine learning model parameters.

[0008] In a different scenario, Secure Multiparty Computation can be used as a security measure to protect the machine learning model parameters. However, protecting the whole model is non-trivial. To achieve this, both interested parties share the machine learning system structure, while the model's weights and the validation dataset remain hidden from each other.

[0009] Zero-Knowledge and Multiparty Computation are applied to machine learning models that were trained unencrypted with an unencrypted dataset; the encryption is applied after training the models. This also applies to the dataset that is going to be used for validation.

[0010] In other techniques, cryptography is applied to a model that is going to be trained and to the dataset used for training said model. These models go through their whole training phase with an encrypted model and an encrypted dataset. The training dataset is generally much larger than a validation dataset. In practice, this means that these techniques are computationally much more expensive than Zero-Knowledge or Multiparty Computation. Costs grow the more complex the machine learning model becomes and the larger the dataset is. For many models or datasets, these techniques may therefore become infeasible, with cost, time or both as the limiting factor.

[0011] US 20190332814 A1 describes training of machine learning systems with just the model fully encrypted, while trying to use hardware-specific nodes to reduce the high computational cost this entails.

[0012] However, the known state-of-the-art techniques for secure processes for validating machine learning models each have some drawbacks as discussed below in more detail.

[0013] In the case of Zero-Knowledge arguments, the interested user who owns the dataset must disclose the information without cryptography for the processing to be possible. The issue here is that only one of the parties is protected, namely the provider who owns the machine learning model. End-to-end protection is not possible for both parties.

[0014] The Secure Multiparty Computation approach has two main drawbacks, one related to communication and the second related to machine learning model limitations. When using this approach, both interested parties must be online, that is, have a permanent connection while the processing/validation is being performed. This rules out doing the validation in an offline/intranet environment, which may be necessary if one of the parties wishes to assess performance or can only test in such an environment, e. g. an embedded system. The second issue concerns the limited ability of this approach to hide some elements of the machine learning model. Hiding the weights and the validation dataset information is not a problem. But hiding the whole machine learning model (topology, hyperparameters) is not trivial and may not even be possible, depending on the approach and modeling the machine learning model in question uses for solving a certain problem. Another issue when not applying cryptography to the whole model is the higher sensitivity to extraction attacks.

[0015] Prior art as shown, for example, in US 20190332814 A1 does not protect input data entering the machine learning model. Not all data involved in the transaction process is protected by encryption. As stated, the solutions described in US 20190332814 A1, in which the machine learning is embedded, need specific hardware to function, as they are concerned with data received from an Internet of Things device. The edge node mentioned in US 20190332814 A1 is needed to enforce encryption, signature verification and decryption; that is, specific proprietary hardware is needed to accommodate the solution. Involved parties would need to have this hardware or use a third party which possesses the hardware to execute the process.

[0016] Therefore, the present invention is based on the object to overcome the limitations of the state of the art and to provide a method and a system for a cost-effective and secure validation of machine learning models and parallel validation data.

[0017] This object is solved by a method having the features according to claim 1 and a corresponding system having the features of claim 10. Preferred embodiments of the invention are defined in the respective dependent claims.

[0018] According to the invention, a method for secure validation of machine learning models and parallel validation data using homomorphic encryption is provided, the method comprising the steps of:

providing a machine learning model by a provider and providing validation data by a user;

encrypting, by the provider, the machine learning model;

sending, by the provider, a public encryption parameter to the user;

selecting, by the user and the provider, a unifying encoding method;

encrypting, by the user, the validation data;

sending, by the user, the encrypted validation data;

processing the encrypted validation data with the encrypted machine learning model;

providing encrypted results of said processing to the provider and the user;

decrypting the results and evaluating whether the performance of the machine learning model is satisfactory with the given validation data of the user.

[0019] According to a preferred embodiment, the step of encrypting the machine learning model by the provider further comprises generating public, secure and/or functionality homomorphic encryption parameters; and the method further comprises sending, by the provider, the homomorphic encryption parameters to the user.

[0020] According to another preferred embodiment, the public homomorphic encryption parameters comprise a scheme defining the precision and efficiency of the subsequent processing of the encrypted validation data with the encrypted machine learning model, wherein the scheme is a Brakerski-Fan-Vercauteren (BFV) or a Cheon-Kim-Kim-Song (CKKS) scheme.

[0021] BFV relies on modular, exact arithmetic on vectors of numbers. Ciphertext (plaintext) data is represented as vectors of integers modulo t. The computation corresponds to integer arithmetic circuits modulo t. The computational cost is lower than that of Fast Fully Homomorphic Encryption over the Torus (TFHE), albeit still higher than that of CKKS. This method is ideal for applications that need a precise response with no errors, e. g. financial data, with the trade-off of some additional computational cost. CKKS relies on approximate arithmetic on vectors of numbers. Ciphertext (plaintext) data is represented as real numbers (and complex numbers). The computation corresponds to floating point arithmetic. The computational cost is lower than that of TFHE or BFV. This method is ideal for applications where very high precision is not paramount, e. g. statistical models, medical data, machine learning models and most applications where high floating point precision is not obligatory. One other scheme that could currently be used is TFHE. This method is not currently emphasized because of its high computational cost. In TFHE, an arbitrary Boolean circuit composed of binary gates is evaluated over encrypted data, without revealing any information about this data. In short, ciphertext (plaintext) is represented as bits and computation as logic circuits. Like BFV and CKKS, this process makes it possible to process the data without decryption, albeit at a higher computational cost.
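The difference between exact and approximate plaintext arithmetic described above can be illustrated without any encryption. The following is a minimal sketch and not an implementation of BFV or CKKS: the plaintext modulus t = 65537, the scaling factor 2^20 and all function names are assumptions chosen here purely for illustration.

```python
T = 65537          # assumed plaintext modulus t (illustrative value)
SCALE = 2 ** 20    # assumed scaling factor for the approximate case (illustrative)


def exact_mul(a: int, b: int, t: int = T) -> int:
    """Exact case: integers modulo t, no rounding takes place."""
    return (a * b) % t


def approx_encode(x: float, scale: int = SCALE) -> int:
    """Approximate case: scale a real number and round it to an integer."""
    return round(x * scale)


def approx_mul(a: float, b: float, scale: int = SCALE) -> float:
    """Multiply two encoded reals; the product carries the scale squared."""
    product = approx_encode(a, scale) * approx_encode(b, scale)
    return product / (scale * scale)


exact = exact_mul(1234, 56)              # exact result, read modulo t
approx = approx_mul(3.14159, 2.71828)    # close to the real product
error = abs(approx - 3.14159 * 2.71828)  # small rounding error from encoding
```

In this sketch the integer product is reproduced exactly modulo t, whereas the real-valued product is only recovered up to a small rounding error that shrinks as the scaling factor grows.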

[0022] According to still another preferred embodiment, the functionality homomorphic encryption parameters comprise one of cyclotomic ring, modulus (modulo) and/or level depth. The cyclotomic ring or polynomial ring consists of polynomials whose coefficients are chosen from the integers, where these polynomials are computed modulo (Xⁿ + 1). In a cyclotomic ring R all the polynomials have a degree of at most (n-1), defining formally:

$$R = \mathbb{Z}[X]/(X^n + 1)$$

[0023] Any term of the form Xⁿ⁺ᵐ, where m is an integer, is reduced modulo (Xⁿ + 1). In the formal example

$$X^n \equiv -1 \pmod{X^n + 1}$$

as m equals zero, and the value is just flipped around the ring.

[0024] There is also the modulo ring, and this is where the modulo takes its part, which is computed as:

$$R_Q = \mathbb{Z}_Q[X]/(X^n + 1)$$

[0025] Its coefficients are computed modulo Q. Coefficients at this point are no longer arbitrary integers but are represented by a set of integers that is zero balanced. Zero balancing is important for computational efficiency when working with polynomials. The coefficient modulo (Q) can be chosen as a parameter. Put into context, the cyclotomic ring R contains the polynomial obtained from the input data, in this scenario either the machine learning model or the input dataset for validation. Without reduction, this polynomial is computationally too expensive to be practical, so the modulo ring R_Q is a reduction of the original ring R. This reduction is based on the security desired for the input into the homomorphic encryption and is defined by the size of the data input encoding (e. g. 128 bits) with its degree (n) and the modulo (Q) chosen. So, the degree of security is based upon the encoding chosen, which together with its modulo (Q) defines the security hardness and how computationally expensive it is to realize this operation. The level depth can be increased and decreased by adjusting the modulo parameter (Q). This defines the amount of noise that is going to be present in the ciphertext (encrypted message). Defining the ideal modulo (Q) is hard, as it is very dependent on the encryption technique being used, the size of the input (as well as the consequent polynomial generated from it in the cyclotomic ring) and the encoding chosen (e. g. 128 bits). The usual approach is to encode with a smaller Q, test its output against the unencrypted content and compare the error. If it is zero, the ideal modulo has been found for this encryption method with the desired encoding complexity.
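The ring arithmetic of paragraphs [0022] to [0025] can be sketched in a few lines. The example below is an illustration under stated assumptions, not the parameter selection of any particular library: it multiplies two polynomials modulo (Xⁿ + 1), reduces the coefficients to zero-balanced representatives modulo Q, and then tries candidate moduli in increasing order until the reduced product matches the unreduced one. It works on plaintext polynomials only, so it does not model encryption noise. The function names, the sample polynomials and the candidate list are hypothetical.

```python
def center(c: int, q: int) -> int:
    """Map c to its zero-balanced representative in (-q/2, q/2]."""
    c %= q
    return c - q if c > q // 2 else c


def negacyclic_mul(a, b, q=None):
    """Multiply a and b modulo (X^n + 1); reduce coefficients mod q if given."""
    n = len(a)
    out = [0] * n
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            k = i + j
            if k < n:
                out[k] += ai * bj
            else:
                out[k - n] -= ai * bj   # X^n = -1: the term wraps with a sign flip
    if q is not None:
        out = [center(c, q) for c in out]
    return out


def smallest_exact_modulus(a, b, candidates):
    """Return the first q whose reduced product equals the unreduced product."""
    exact = negacyclic_mul(a, b)
    for q in candidates:
        if negacyclic_mul(a, b, q) == exact:
            return q
    return None


a = [3, -1, 4, 1]     # 3 - X + 4X^2 + X^3 (made-up input)
b = [2, 7, -1, 8]     # 2 + 7X - X^2 + 8X^3 (made-up input)
exact = negacyclic_mul(a, b)
q = smallest_exact_modulus(a, b, [17, 257, 65537])
```

With these inputs the smallest candidate is rejected because one coefficient of the product does not fit into its zero-balanced range, which mirrors the "compare the error" step of the text in a simplified form.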

[0026] Further, according to a preferred embodiment, the unifying encoding method uses a block size of n bits, for example a block size of n = 256 bits or a block size of n = 128 bits. This encoding size, also named encoding complexity n or, more simply, block size n, must be larger the larger the modulo (Q) is. That is, to achieve a desired security level, which can be calculated with n log (Q), where n is the size of the encoding and Q the modulo, n must be of a target size. This standard for security can be used, as homomorphic encryption has an agreed and defined metric for it, which can be found at https://homomorphicencryption.org/standard/. As a rule, 256 bits is applied by default unless the computational cost is prohibitive. This ensures by a large margin that even with a large Q, the encrypted contents remain safe.

[0027] According to still another preferred embodiment, the secure homomorphic encryption parameters are tightly linked to the other homomorphic encryption parameters; of these, two are key: the modulo (Q) and the encoding complexity n.

[0028] According to yet another preferred embodiment, the step of encrypting, by the user, the validation data further comprises generating, by the user, public keys that are going to be used during the transaction process, and the method further comprises the steps of:
sending, by the user, the encrypted validation data and the generated public keys to the provider; and processing the encrypted validation data with the encrypted machine learning model, wherein the public keys of the user are used.

[0029] The step of processing the encrypted validation data with the encrypted machine learning model is repeated with another unifying coding method in case the result of the machine learning model does not meet the requirements for accuracy and efficiency.

[0030] According to still another preferred embodiment, the number of repeated processing runs of the machine learning model is limited to a predetermined threshold n in order to avoid the risk of extraction attacks. This threshold n is highly dependent on the complexity of the encrypted model. For simpler models a very low threshold n is advised (e. g. lower than 10 tries). For high-complexity models it can scale up to hundreds of tries. Realistically, a validation scenario would encompass one dataset or a small set of datasets, so the advised limit or threshold n would be close to the number of datasets being tested times two.
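The repetition with another encoding method and the limit on the number of runs can be sketched as a simple loop. This is a minimal sketch under assumptions, not a prescribed implementation: the names `Result`, `query_limit` and `validate_with_retries`, the encoding labels and the numeric results are all hypothetical, and the validation run itself is replaced by a lookup table.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Tuple


@dataclass
class Result:
    accuracy: float
    seconds: float


def query_limit(num_datasets: int) -> int:
    """Advised threshold from the text: number of datasets times two."""
    return 2 * num_datasets


def validate_with_retries(
    encodings: Iterable[str],
    run_validation: Callable[[str], Result],
    min_accuracy: float,
    max_seconds: float,
    limit: int,
) -> Tuple[Optional[str], int]:
    """Try encodings in order until one meets the requirements or the limit is hit.

    Returns the accepted encoding (or None) and the number of runs used.
    """
    runs = 0
    for encoding in encodings:
        if runs >= limit:
            break                       # refuse further queries to the model
        runs += 1
        result = run_validation(encoding)
        if result.accuracy >= min_accuracy and result.seconds <= max_seconds:
            return encoding, runs
    return None, runs


# Made-up outcomes standing in for real encrypted validation runs.
fake_runs = {"enc-256": Result(0.97, 40.0), "enc-128": Result(0.96, 8.0)}
chosen, used = validate_with_retries(
    ["enc-256", "enc-128"], fake_runs.__getitem__, 0.95, 10.0, query_limit(1))
```

Here the first encoding is accurate but too slow, so the run is repeated once with the second encoding, which stays within the limit of two runs for a single dataset.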

[0031] Further, according to a preferred embodiment, a neural network watermarking is used to trace the machine learning model if a redistribution of the provided machine learning model is not to occur.

[0032] According to yet another preferred embodiment, the method is executed on an online external system, a public cloud solution and/or a private offline system.

[0033] According to the invention, a system for secure validation of machine learning models and parallel validation data using homomorphic encryption is provided, wherein the system is configured to perform the method according to the claims 1 to 10.

[0034] According to an embodiment of the invention, the system comprises at least one of an online external system, a public cloud solution system and/or a private offline system. According to another preferred embodiment, the system further comprises a local system, a network system and/or a cloud system configured to perform the encryption and/or decryption of the validation data.

[0035] According to another preferred embodiment, the system further comprises a local system, a network system and/or a cloud system configured to perform the encryption and/or decryption of the machine learning model.

[0036] According to another preferred embodiment, the system further comprises a local system, a network system and/or a cloud system configured to perform the processing of the encrypted validation data with the encrypted machine learning model.

[0037] According to the present invention, no specific hardware is required. Furthermore, the present invention can receive data from Internet of Things devices, but is not limited to this. Data according to the proposed invention is also fully encrypted, or partially encrypted according to need.

[0038] The present invention aims at resilience at the software level, giving both parties the flexibility to execute the process on a platform of their choice without depending on specific "tamper resistant hardware", and offers end-to-end encryption for both the machine learning model and the input data.

[0039] A user interested in acquiring a machine learning model gains several advantages from the present invention. For instance, a pre-trained machine learning model can be validated with the dataset of the user, ensuring that the target model has the desired accuracy and efficiency. The dataset that is being used for machine learning model validation can remain encrypted, ensuring that it is not modified. Further, the results are encrypted and cannot be tampered with. Any possibly sensitive information in the dataset remains secure with no need to remove or obfuscate it. This is especially useful in environments that handle especially sensitive information, e. g. healthcare or financial data.

[0040] Furthermore, a pre-trained machine learning model can be tested on the system on which it is going to be deployed, confirming whether the hardware being used for processing is scaled accordingly, either as an internal system or a system in the cloud.

[0041] Moreover, costs of the data encryption/decryption can be accounted for, as the overhead of the encryption can be estimated using an unencrypted machine learning model and an unencrypted dataset with a similar batch size, wherein comparing it with its encrypted counterparts provides the desired estimation.
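The overhead estimation described above amounts to running the same batch through an unencrypted and an encrypted pipeline and comparing time and output. The following is a minimal sketch under assumptions: the two model functions are stand-ins invented for illustration (the "encrypted" one merely adds a simulated approximation error), and the names `timed` and `estimate_overhead` are hypothetical.

```python
import time
from typing import Callable, List, Sequence, Tuple

Model = Callable[[Sequence[float]], List[float]]


def timed(model: Model, batch: Sequence[float]) -> Tuple[List[float], float]:
    """Run the model once and return its outputs and the elapsed seconds."""
    start = time.perf_counter()
    outputs = model(batch)
    return outputs, time.perf_counter() - start


def estimate_overhead(plain: Model, encrypted: Model,
                      batch: Sequence[float]) -> Tuple[float, float]:
    """Return (added seconds, largest absolute output difference)."""
    plain_out, plain_s = timed(plain, batch)
    enc_out, enc_s = timed(encrypted, batch)
    max_error = max(abs(p - e) for p, e in zip(plain_out, enc_out))
    return enc_s - plain_s, max_error


# Stand-ins for the two pipelines; a real run would call the actual models.
def plain_model(batch):
    return [2.0 * x for x in batch]


def encrypted_model(batch):
    return [2.0 * x + 0.001 for x in batch]   # simulated approximation error


delta_seconds, max_error = estimate_overhead(
    plain_model, encrypted_model, [0.5, 1.5, 2.5])
```

In practice the batch passed to both pipelines would have a size similar to the data later supplied in encrypted form, so that the time delta and the error are representative.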

[0042] As both dataset and pre-trained machine learning model remain encrypted during the evaluation process, validation can be done on an external system for both parties such as a public cloud solution or the evaluation can be done on a private offline system accessible by just one of the parties.

[0043] A provider interested in supplying the machine learning model gains several advantages from the present invention, for instance that the machine learning model can be validated by a user interested in acquiring the model without disclosing implementation details.

[0044] Furthermore, validation can occur on remote cloud systems or offline systems, provided precautions are taken to avoid reverse engineering or extraction attacks. After gaining access to said machine learning model for evaluation, an interested user cannot feasibly copy or easily acquire implementation details of the model, due to several mechanisms: the machine learning model is encrypted, and extraction attacks are prevented by limiting the number of validations/queries to the model. Details of the model are only known after the acquisition transaction has been confirmed and the interested user has access to the unencrypted machine learning model.

[0045] Moreover, after the interested user has access to the machine learning model, safety mechanisms may be in place within the model to avoid unauthorized redistribution of said model with techniques such as neural network watermarking.

[0046] The previous remarks show the advantages for both interested parties of a process to acquire machine learning models in a secure manner. They assure that the model will fit the interests of the user in terms of efficiency and accuracy for a given task and with the data available for processing on the target model. Also assured is the safety of the model, as no details are given about the algorithms used, parameter tuning, or the model's topology. Tampering with results on either side is also restricted, as both ends (dataset and pre-trained machine learning model) are encrypted. This prevents situations such as selling a machine learning model that is not ideal for the user interested in buying it, or disclosing details about the machine learning model without guarantees of a sale, exposing it to being leaked, copied or redistributed.

[0047] The invention and embodiments thereof will be described below in further detail in connection with the drawing.

Fig. 1A to 1B: show flowcharts of the method for secure validation of machine learning models and parallel validation data using homomorphic encryption according to an embodiment of the invention,
Fig. 2: shows a graphical scheme of system components configured to perform steps of the method for secure validation of machine learning models and parallel validation data using homomorphic encryption according to another embodiment of the invention.

[0048] Fig. 1A to 1B show flowcharts of the method 100 for secure validation of machine learning models and parallel validation data using homomorphic encryption according to an embodiment of the invention. In accordance with a typical transaction process, an interested user (potential buyer) has an interest in acquiring a machine learning model from a provider. As such, the user needs assurance that the model provided by the provider covers the user's needs, that is, that the machine learning model is efficient and accurate enough with the processing data available to the user. The user wants to validate the machine learning model of interest with the provider. In this procedure, the user needs a secure process in which there is assurance that the results obtained with the provided evaluation data are not tampered with, while the evaluation data is also kept secure if it contains privacy-sensitive information (e. g. healthcare information of patients, financial data etc.). The provider needs a process in which it is possible to disclose the efficiency and accuracy of the trained machine learning model with the evaluation data provided by the user without disclosing details about said model, that is, without revealing details about its implementation, techniques and algorithms used, which may lead to a loss of intellectual property for the provider. The flow of the process for executing a successful transaction, with its technical steps, is shown in FIG. 1A. Both parties, provider and user, agree to do an evaluation test of the machine learning model provided by the provider 101 with the validation data from the user 102. To initiate the process, both the machine learning model and the validation data must be encrypted. During validation, both the machine learning model and the evaluation data remain encrypted, that is, during the processing of information and afterwards nothing but the result will be visible to both provider and user.
The provider executes a step known in homomorphic encryption as setup 103, in which the public homomorphic encryption parameters, which include a scheme, are created. The scheme defines the precision and efficiency of the computation. It may be BFV (Brakerski-Fan-Vercauteren), where exact arithmetic on vectors of numbers is used at the expense of higher computational cost. This may be mandatory if the machine learning model deals with high precision data and cannot afford a loss in accuracy. Or the scheme may be CKKS (Cheon-Kim-Kim-Song), which deals with approximate arithmetic on vectors of numbers and is computationally more efficient and ideal for applications where a small accuracy loss is not prohibitive. This parameter choice must be agreed beforehand by both parties, as it depends on the data being input by the user and the precision needed for the task. Moreover, the provider can additionally create Security Parameters or Functionality Parameters like cyclotomic ring, modulus, and/or level depth. The Public Parameters created by the provider are sent to the user 104. In this step or a subsequent step, an encoding method 105 must also be chosen by both parties, as it heavily impacts performance. If a stricter encoding is used (block size of 256 bits), preferred methods such as Galois keys (block size of 128 bits) cannot be used. The choice depends on how strict both interested parties want the key security to be. Next, the user runs a key generation method for creating the public keys (public evaluation keys) 106 that are going to be used during the transaction process. The user now sends the evaluation data in its encrypted form to the provider along with the public evaluation keys 107. After that, the provider can perform a homomorphic inference 108 on the evaluation data sent by the user using the user's public evaluation keys.
That is the process of validating and processing the encrypted evaluation data sent by the user, which is fed to the encrypted machine learning model provided by the provider. The encrypted results of this processing are sent to both interested parties 109, provider and user. Both parties can now decrypt the results and evaluate whether the performance of the model is satisfactory with the given input data delivered by the user 110. Note that both accuracy and time efficiency in execution are affected by the encryption method and the parameters chosen 105. However, an estimate is possible if the provider can execute the machine learning model with input data that has a similar batch size (in samples, parameters, fields etc.) to that of the encrypted data provided by the user. If this information (batch size) can be supplied, the provider can give the user in advance the accuracy error that using encryption incurs and also a delta for how much time is added to processing. This information can be especially useful if accuracy or efficiency is key for the user's target use of the machine learning model.
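The idea of a homomorphic inference 108, in which encrypted model parameters are applied to encrypted data and only the decrypted result is readable, can be illustrated with a toy example. The sketch below is an assumption-laden illustration and deliberately not BFV or CKKS: it uses a simple symmetric integer-based construction in the style of "encryption over the integers", a single secret key (so it does not model which party holds which key), and parameter sizes that are not secure. All names and the sample weights and features are made up.

```python
import secrets

T = 65537  # assumed plaintext modulus (illustrative)


def keygen(bits: int = 256) -> int:
    """Secret key: a random odd integer p with the top bit set."""
    return secrets.randbits(bits) | (1 << (bits - 1)) | 1


def encrypt(p: int, m: int) -> int:
    """c = m + T*r + p*q with small noise r and a large random multiple q."""
    r = secrets.randbits(16)
    q = secrets.randbits(128)
    return (m % T) + T * r + p * q


def decrypt(p: int, c: int) -> int:
    """Remove the multiple of p, then the noise, which is a multiple of T."""
    e = c % p
    if e > p // 2:
        e -= p
    return e % T


def encrypted_dot(enc_weights, enc_features):
    """Linear-model inference on ciphertexts: sum of ciphertext products."""
    return sum(w * x for w, x in zip(enc_weights, enc_features))


key = keygen()
weights = [3, 5, 2, 7]          # stand-in for the provider's model parameters
features = [10, 20, 30, 40]     # stand-in for one sample of the user's data
enc_w = [encrypt(key, w) for w in weights]
enc_x = [encrypt(key, x) for x in features]
enc_score = encrypted_dot(enc_w, enc_x)
score = decrypt(key, enc_score)  # equals the plaintext dot product modulo T
```

The computation in `encrypted_dot` never sees the weights or the features in the clear; correctness holds here only because the accumulated noise stays far below the key size, which is the role played by the modulus and level depth in the schemes named in the text.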

[0049] FIG. 1B shows that the number of times the evaluation 105, 106, 107, 108, 109, 110 may be run must be limited for several reasons. If the provider wants to avoid the risk of extraction attacks, executions/queries to the model must be limited, or a very tight encoding must be chosen 105 to make such extraction attacks computationally infeasible due to cost. The former is preferred, as it is a simpler solution which meets the needs of both provider and user if the aim is to evaluate the model with given data while encrypted. If the user expresses interest in buying the model provided by the provider after obtaining said results, a deal can be made 111, and the unencrypted machine learning model can be provided to the user. The user can validate with the same unencrypted evaluation data whether the accuracy and efficiency of the model are on par with those of said evaluated encrypted machine learning model. If the accuracy and efficiency of the model are not on par with those of said evaluated encrypted machine learning model, the step of processing by the machine learning model may be repeated with another unifying coding method 112. Thereby, the number of repeated processing runs of the machine learning model may be limited to a predetermined threshold n in order to avoid the risk of extraction attacks. Further steps can be taken by the provider, using neural network watermarking, if redistribution of the provided machine learning model is not to occur. Neural network watermarking can trace whether a given model is the same as a target machine learning model that could not be legally distributed. All said steps describe a process that secures information for both parties, the holder of the data and the holder of the machine learning model, where both have their data encrypted. This ensures that the disclosed machine learning model does not have its implementation details and inner optimizations revealed.
It also ensures that the input data for the machine learning model is protected if it contains any privacy-sensitive information.

[0050] Fig. 2 illustrates putative system components configured to perform certain steps of the method for secure validation of machine learning models and parallel validation data using homomorphic encryption according to another embodiment of the invention. In this example, a system 200 comprises a local system 210 which belongs to an owner of a dataset to be validated. Further, this system comprises a local system 220 which belongs to an owner of a trained machine learning model. However, these systems may also be network systems or cloud systems and may comprise several components like servers, gateways, storage, databases etc., but may also be a single computer or workstation. Within the two systems 210 and 220, the data to be validated and the machine learning model are encrypted. As described in the method according to the invention, information is exchanged in certain steps about the parameters with which the encryption has to be carried out, so that the machine learning model can process the encrypted validation data at all (for the sake of clarity, these steps are not shown in Fig. 2). After the respective encryption by the two systems 210 and 220, the encrypted components are transferred to another system 230. This validation system 230 includes, for example, a validation system server and performs validations of the encrypted data with the encrypted machine learning model. Occasionally, this system 230 can also be used to exchange information about the appropriate encryption parameters. The system 230 then also encrypts the results and sends them to the systems 210 and 220. The systems 210 and 220 can then decrypt and evaluate the results. However, it is also possible that the system 230 is within the sphere of the owner of the machine learning model or is even a component of the system 220.

[0051] The scope of protection is defined by the claims.

Claims

1. Method (100) for secure validation of machine learning models and parallel validation data using homomorphic encryption, comprising the steps:

- providing, by a provider, a machine learning model (101) and providing, by a user, validation data (102);

- encrypting, by the provider, the machine learning model (103);

- sending, by the provider, a public encryption parameter to the user (104);

- selecting, by the user and provider, a unifying encoding method (105);

- encrypting, by the user, the validation data (106);

- sending, by the user, the encrypted validation data (107);

- processing the encrypted validation data with the encrypted machine learning model (108);

- providing encrypted results of said processing to the provider and the user (109);

- decrypting the results and evaluating whether the performance of the machine learning model is satisfactory with the given validation data of the user (110),

characterized in that,
the step of processing the encrypted validation data with the encrypted machine learning model (108) is repeated with another unifying encoding method (112) in case the result of the machine learning model does not meet the requirements for accuracy and efficiency.

2. The method (100) according to claim 1, wherein the step of encrypting, by the provider, the machine learning model (103) further comprises generating, by the provider, public, secure and/or functionality homomorphic encryption parameters (103a); and
wherein the method (100) further comprises sending, by the provider, the homomorphic encryption parameters to the user (104a).

3. The method (100) according to claim 2, wherein the public homomorphic encryption parameters comprise a scheme defining the precision and efficiency of the subsequent processing of the encrypted validation data with the encrypted machine learning model, wherein the scheme is a Brakerski-Fan-Vercauteren (BFV) or a Cheon-Kim-Kim-Song (CKKS) scheme.

4. The method (100) according to claim 2 or 3, wherein the functionality homomorphic encryption parameters comprise one of a cyclotomic ring, a modulus (modulo) and/or a level depth.

5. The method (100) according to any one of the previous claims, wherein the unifying encoding method uses a block size of n bits, of n=256 bits, or of n=128 bits.

6. The method (100) according to any one of the previous claims, wherein the step of encrypting, by the user, the validation data (106) further comprises generating, by the user, public keys that are going to be used during the transaction process (106a); and wherein the method further comprises the steps:

- sending, by the user, the encrypted validation data and the generated public keys to the provider (107a); and

- processing the encrypted validation data with the encrypted machine learning model, wherein the public keys of the user are used (108a).

7. The method (100) according to claim 1, wherein the number of repetitions of the processing by the machine learning model (112) is limited to a predetermined threshold n in order to avoid the risk of extraction attacks.

8. The method (100) according to any one of the previous claims, wherein neural network watermarking is used to trace the machine learning model if the provided machine learning model is not to be redistributed.

9. The method (100) according to any one of the previous claims, wherein the method is executed on an online external system, a public cloud solution, and/or a private offline system.

10. System for secure validation of machine learning models and parallel validation data using homomorphic encryption, wherein the system is configured to perform the method according to any one of claims 1 to 9.

11. The system according to claim 10, wherein the system comprises at least one of an online external system, a public cloud solution system and/or a private offline system.

12. The system according to claim 10 or 11, wherein the system further comprises a local system, a network system and/or a cloud system configured to perform the encryption and/or decryption of the validation data.

13. The system according to any one of claims 10 to 12, wherein the system further comprises a local system, a network system and/or a cloud system configured to perform the encryption and/or decryption of the machine learning model.

14. The system according to any one of claims 10 to 13, wherein the system further comprises a local system, a network system and/or a cloud system configured to perform the processing of the encrypted validation data with the encrypted machine learning model.

Cited references

REFERENCES CITED IN THE DESCRIPTION

This list of references cited by the applicant is for the reader's convenience only. It does not form part of the European patent document. Even though great care has been taken in compiling the references, errors or omissions cannot be excluded and the EPO disclaims all liability in this regard.

Patent documents cited in the description

US20190332814A1 [0011] [0015] [0015] [0015]

Non-patent literature cited in the description

X. LIU et al. A Privacy-Preserving Principal Component Analysis Outsourcing Framework [0003]
K. HAN et al. Efficient Logistic Regression on Large Encrypted Data [0003]
J. CUI et al. Federated machine learning with Anonymous Random Hybridization (FeARH) on medical records [0003]