[0001] The invention relates to a recommendation method and system for recommending a setting or value of an adjustable parameter of an object for an application of the object. For example, the object can be a device of an industrial facility comprising a plurality of devices and the application might correspond to an operating state of the facility. Therein, different devices of the facility might be of a different device type, e.g. pumps, valves, motors etc. Each device might have at least one, but typically a plurality of adjustable parameters, each of which can be adjusted to have a desired setting. Such desired setting of the respective adjustable parameter is provided by the recommendation method and system as recommended setting which, in the end, depends on the particular object and device, respectively, and on the considered application and operating state, respectively.
[0002] Specific settings of adjustable parameters of certain objects and for particular applications or situations often require domain knowledge as well as the consideration of large amounts of data. Moreover, it is often required, or at least beneficial, if such settings and changes of settings can be performed automatically by a control system. Especially, as a first example, the settings of configuration parameters of devices of an industrial facility depend on the type of the particular device itself and on an actual operating state of the facility. As soon as a new operating state is applied or, even more complicated, when an operating state shows up which is not known from before, e.g. a new error state, the configurations of the devices would have to be adapted. Due to the complexity of such facilities, the decision about which settings have to be applied for which adjustable parameter for which device and in which operating state is a complicated task which has to be addressed automatically to fulfill the requirement of fast and automatic implementation of the settings. As a second example, taken from a business environment, customer and product specific adaption of business transactions requires domain knowledge that often involves years of training and experience and knowledge about features of the products dealt with. It becomes even more challenging as the data collected from customers and transactions increase in both quantity and complexity. Currently, decisions about business transactions require extraction and evaluation of known, historic data from source systems. Once these historic data are prepared in a suitable way, new data have to be considered additionally, based on experience and anticipation of future transactions. All this information often lacks full transparency and depends on the individual knowledge of the persons involved in such transactions.
Moreover, pricing decisions, for example, usually do not follow any predetermined rules or analytical processes, but are rather based on individual behavior and are therefore not transparently reproducible.
[0003] Often, most of such decisions on identifying optimal settings of devices, on product pricing, and in other related scenarios are made by professionals according to their education, training, and working experience and by following rule-based guidelines. These sources are prone to become quickly outdated in a dynamic environment. Therefore, it is appealing to provide a decision support system and a recommendation approach, respectively, that is data driven and capable of self-optimizing with increasing data size. Thus, in each one of those examples, a recommender system and method is required which is configured to recommend a suitable setting of the adjustable parameter. For example, a similar system is proposed in EP20167174.
[0004] Therefore, a solution is required which serves the need to recommend a setting of an adjustable parameter automatically, accurately, and fast. This is solved by the recommendation method suggested in claim 1, by the recommender system as per claim 12, and by the control unit of claim 13.
[0005] A computer implemented recommendation method RM for providing a recommended setting S for a given adjustable parameter PA to be adjusted comprises a first step SR1 of providing a modeling function DCOM, a second step SR2 of providing a plurality n with n ≥ 2 of given input variables VAR_{v} with v = 1,...,n to a prepared or trained, respectively, recommender system RS as an input, and a third step SR3 of processing the input with the provided modeling function DCOM to determine the recommended setting S by the recommender system RS. Therein, different input variables VAR_{v} correspond to different variable types VART_{v}; for a particular variable type VART_{v} a plurality T_{v} of respective variables VART_{v,t} is available with t = 1,...,T_{v}, and for each variable type VART_{v} only one variable VART_{v,t} is provided as input variable VAR_{v} = VART_{v,t}. The modeling function DCOM is a function trained based on a Gaussian process with DCOM ∼ N(0, K^{coreg} + σ^{2}I), defined by a characterizing covariance matrix K^{coreg} and a corresponding characterizing kernel.
[0006] The characterizing kernel is a separable kernel, being defined by a product of sub-kernels k_{e}.
[0007] Each sub-kernel k_{e} is designed such that it measures the similarity between two variables VART_{e,t1}, VART_{e,t2} with t1, t2 ∈ [1,...,T_{e}] of the same variable type VART_{e}.
[0008] For example, each sub-kernel k_{e} can be based on a Radial Basis Function (RBF).
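The separable product kernel of [0006]-[0008] can be sketched as follows. This is an illustrative stand-in, not the claimed implementation: the latent feature vectors representing the variables of each variable type, and all lengthscales, are hypothetical.

```python
import numpy as np

def rbf_kernel(x1, x2, lengthscale=1.0):
    """RBF sub-kernel k_e: similarity of two variables of one variable
    type VART_e, each represented here by a latent feature vector."""
    d2 = np.sum((np.asarray(x1) - np.asarray(x2)) ** 2)
    return float(np.exp(-0.5 * d2 / lengthscale ** 2))

def separable_kernel(a, b, lengthscales):
    """Separable kernel: product of one sub-kernel k_e per variable type,
    k(a, b) = k_1(a_1, b_1) * ... * k_n(a_n, b_n)."""
    k = 1.0
    for a_e, b_e, ls in zip(a, b, lengthscales):
        k *= rbf_kernel(a_e, b_e, ls)
    return k

# hypothetical latent vectors for one (operating state, device, parameter)
# triple compared against another triple
k = separable_kernel([[0.0], [1.0], [0.5]], [[0.2], [1.0], [0.4]],
                     [1.0, 1.0, 1.0])
```

Since each factor lies in (0, 1], the product kernel only rates two triples as similar when they are similar in every variable type.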
[0009] The recommendation method RM comprises an optional preceding preparation step SP and a subsequent recommendation step SR, wherein the recommendation step SR comprises the first step SR1, the second step SR2, and the third step SR3. The preparation step SP, if executed, performs a training to optimize the modeling function DCOM.
[0010] The function DCOM to be provided in the first step SR1 of the recommendation method RM is a Gaussian process based decomposition function which is trained in advance, i.e. before the first step SR1, in the preparation step SP on an n-dimensional settings database SDB. Therein, in the preparation step SP the settings database SDB is provided in a step SP1 containing known and/or assumed settings of the adjustable parameter. Therein, "assumed" settings can be settings which have not been observed or applied, but which can be assumed to be applicable or reasonable for the respective adjustable parameter in the respective scenario. The dimensions DIM_{d} with d = 1,...,n of the settings database SDB correspond to the variable types VART_{d}, i.e. one dimension DIM_{d} corresponds to one variable type VART_{d} while different dimensions DIM_{d} correspond to different variable types VART_{d}. An iterative optimization of the decomposition function DCOM is performed in a step SP2 of the preparation step SP in a plurality of optimization steps OPTs, preferably by maximizing a log-likelihood with respect to trainable parameters of the decomposition function DCOM.
[0011] In each optimization step an initial function DCOM_{ini} is provided, e.g. a default function in the very first optimization step or the resulting function from the preceding optimization step, and parameters defining the provided decomposition function DCOM_{ini} are varied to define an actual decomposition function DCOM_{act}. The settings database SDB is decomposed by applying the actual decomposition function DCOM_{act} on the settings database SDB, resulting in a latent representation LAT_{d} for each variable type VART_{d}, and the latent representations LAT_{d} are joined to generate a reconstructed settings database SDB_rec, e.g. again utilizing the function DCOM_{act}. The reconstructed settings database SDB_rec is compared with the provided settings database SDB, wherein the variation of the parameters of the decomposition function from DCOM_{ini} to DCOM_{act} aims at minimizing the difference between the settings database SDB and the reconstructed settings database SDB_rec.
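The iterative optimization of [0011] can be sketched under simplifying assumptions: here the decomposition is a CP-style product of one latent matrix LAT_{d} per variable type, and a squared reconstruction error stands in for the negative log-likelihood of the claimed Gaussian process model. All shapes, values, and the learning rate are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
shape, rank = (4, 3, 2), 2   # e.g. 4 operating states, 3 devices, 2 parameters
# one latent representation LAT_d per variable type (the trainable parameters)
latents = [rng.normal(scale=0.5, size=(s, rank)) for s in shape]

# sparse settings database SDB: known entries only, keyed by coordinates
observed = {(0, 0, 0): 1.0, (1, 0, 0): 2.0, (0, 1, 1): 0.5}

def reconstruct(idx):
    """Join the latent representations to one entry of SDB_rec."""
    i, j, k = idx
    return float(np.sum(latents[0][i] * latents[1][j] * latents[2][k]))

lr = 0.1
for _ in range(1000):                     # optimization steps OPTs
    for idx, y in observed.items():
        i, j, k = idx
        err = reconstruct(idx) - y        # difference SDB_rec vs. SDB
        grad_i = err * latents[1][j] * latents[2][k]
        grad_j = err * latents[0][i] * latents[2][k]
        grad_k = err * latents[0][i] * latents[1][j]
        latents[0][i] -= lr * grad_i      # vary DCOM_ini towards DCOM_act
        latents[1][j] -= lr * grad_j
        latents[2][k] -= lr * grad_k
```

After training, the joined latent representations reproduce the observed entries of SDB closely, which is the stated goal of step SP2.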
[0012] In a concrete embodiment, a first variable type VART_{1} corresponds to different operating states OS of an industrial facility, a second variable type VART_{2} corresponds to different devices DEV of the facility, and in case n ≥ 3 a third variable type VART_{3} corresponds to different adjustable parameters PA of the devices DEV of the facility.
[0013] In that scenario, the settings database SDB contains known and/or assumed settings S for observed and/or assumed combinations of adjustable parameters PA, devices DEV, and operating states OS. The recommendation method RM provides, upon receipt of input variables VAR_{1} = OS1, VAR_{2} = DEV1, VAR_{3} = PA1, a recommended setting S for a given adjustable parameter PA1 for a given device DEV1 for a given operating state OS1 of the facility.
[0014] In a different embodiment, a first variable type VART_{1} corresponds to different customers CST of an industrial product provider, a second variable type VART_{2} corresponds to different products PRD of the industrial product provider, and in case n ≥ 3 a third variable type VART_{3} corresponds to different purchase features PCF for purchasing the products PRD.
[0015] In that scenario, the settings database SDB contains known and/or assumed settings S for observed and/or assumed combinations of customers CST, products PRD, and purchase features PCF. The recommendation method RM provides, upon receipt of input variables VAR_{1} = CST1, VAR_{2} = PRD1, VAR_{3} = PCF1, a recommended setting S for a given purchase feature PCF1 for a given product PRD1 for a given customer CST1.
[0016] The recommender system for providing a recommended setting S for an adjustable parameter PA to be adjusted comprises a computer configured to execute the method sketched above.
[0017] A control unit of an industrial facility, which facility can be operated in a plurality of operating states OS, is configured to control and adjust, respectively, settings S(OS,DEV,PA) of adjustable parameters PA of devices DEV of the facility. The setting S(OS,DEV,PA) of a particular device DEV depends on an actual operating state OS of the facility, and the control unit is configured to execute the recommendation method RM as sketched above to determine a recommended setting S(OS1,DEV1,PA1) upon receipt or provision of given input variables VAR_{1} = OS1, VAR_{2} = DEV1, VAR_{3} = PA1.
[0018] Thus, a computer implemented recommendation method RM is suggested for providing a recommended setting S for an adjustable parameter PA, for example to be applied for a device DEV of a facility in a given operating state OS, i.e. the adjustable parameter has a certain value which can be adjusted such that said value corresponds to the recommended setting S. Another application could be the setting or value of a certain purchase feature for a product to be purchased by a customer.
[0019] The proposed approach is based on utilization of collaborative filtering with matrix decomposition, sparse variational Gaussian process, and coregionalization. In comparison to conventional matrix and tensor decomposition methods for collaborative filtering, the method proposed herein produces not only a point estimate but also a predictive variance, which indicates the uncertainty of the prediction. This uncertainty information adds more trustworthiness to the data driven solution and enables meaningful interactions between an AI system and the human responsible for the decision-making process. The invention can generate a predictive variance since it models the values in the matrix as a Gaussian distribution which is parameterized by the point estimate as expectation and the predictive variance. Due to the Gaussian process module in the proposed model, the approach is by nature more data efficient than, e.g., multi-way neural networks or other known approaches. In other words, the GP needs fewer training samples since the GP focuses on deriving predictions from similarity between data samples instead of learning a mapping from arbitrary input to output. Furthermore, the proposed model comprises far fewer trainable parameters, resulting in the model being less prone to over-fitting. The data efficiency also originates from the choice of GP regression model.
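The point about the predictive variance can be illustrated with plain Gaussian process regression. This toy example uses a single RBF kernel rather than the coregionalized kernel K^{coreg} of the invention, so it only sketches why a GP returns an uncertainty alongside the point estimate; all inputs and the noise level are illustrative.

```python
import numpy as np

def rbf(A, B, ls=1.0):
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return np.exp(-0.5 * d2 / ls**2)

X = np.array([[0.0], [1.0], [2.0]])   # training inputs (known scenarios)
y = np.array([0.0, 1.0, 0.5])         # known settings S
sigma2 = 1e-4                         # observation noise sigma^2

K = rbf(X, X) + sigma2 * np.eye(len(X))   # cf. K^coreg + sigma^2 * I

def predict(x_new):
    """Posterior mean (point estimate) and variance (uncertainty)."""
    Xs = np.atleast_2d(x_new)
    Ks = rbf(X, Xs)
    mean = Ks.T @ np.linalg.solve(K, y)
    var = rbf(Xs, Xs) - Ks.T @ np.linalg.solve(K, Ks)
    return float(mean[0]), float(var[0, 0])

m_near, v_near = predict([1.0])   # at a training point: low variance
m_far, v_far = predict([10.0])    # far from all data: variance near prior
```

Far from the known data the predictive variance approaches the prior variance, signaling to the human decision-maker that the point estimate there should not be trusted blindly.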
[0020] It is to be understood that the elements and features recited in the appended claims may be combined in different ways to produce new claims that likewise fall within the scope of the present invention. Thus, whereas the dependent claims appended below depend from only a single independent or dependent claim, it is to be understood that these dependent claims can, alternatively, be made to depend in the alternative from any preceding or following claim, whether independent or dependent, and that such new combinations are to be understood as forming a part of the present specification.
DESCRIPTION OF THE FIGURES
[0021] In the following, possible embodiments of the different aspects of the present invention are described in more detail with reference to the enclosed figures. The objects as well as further advantages of the present embodiments will become more apparent and readily appreciated from the following description of the preferred embodiments, taken in conjunction with the accompanying figures in which:
FIG 1 shows an industrial facility with a plurality of devices;
FIG 2 shows a settings database SDB;
FIG 3 shows a flow chart of the recommendation method RM in a simple implementation;
FIG 4 shows a different visualization of the flow chart of the recommendation method RM.
DETAILED DESCRIPTION
[0022] FIG 1 shows a first exemplary embodiment for which the invention might be applied. An industrial facility 100, for example a production facility, comprises a plurality of devices 101-110 which are utilized to produce a product. For example, the devices 101-103 are device controllers, the devices 104-105 are electric motors, the devices 106-107 are pumps, and the devices 108-109 are valves. One 110 of the devices 101-110 can be a central control unit 110 of the facility 100 for controlling the other devices 101-109. Furthermore, the facility 100 comprises a sensor system 120 consisting of a plurality of sensors which are distributed across the facility 100 and the devices 101-109 to observe the facility 100 and its devices 101-109, at least with regard to their actual state of operation and potential error states. The sensors of sensor system 120 are not shown individually, but the sensor system 120 is represented as a whole by a box with dashed lines, surrounding the facility's 100 components. However, it should be clear that the arrangement of components 101-110, 120 in FIG 1 is only a symbolic representation and does not reflect the realistic arrangement of devices in a facility.
[0023] In the following, devices 101-109 are referred to as DEV and DEV1, DEV2,..., as the case may be, to make the explanations more general and independent from the particular realization as motor, pump, valve, controller etc.
[0024] The facility 100 can operate in various known operating states OS, for example a first normal operating state OS1, e.g. to produce a first product, a second normal operating state OS2, e.g. to produce a second product, a maintenance state OS3, a shutdown state OS4, a first error condition state OS5, and a second error condition state OS6. The known first and second error conditions OS5, OS6 might differ in the error scenarios, e.g. with regard to particular devices DEV which might be damaged under the error conditions. For example, in the first error condition state OS5 a certain pump 106 of the production facility 100 might be damaged, while in the second error condition state the operation of a particular motor 104 could be erroneous. Additional or alternative operating states might be, for example, a "start-up", "high-load", or "low-load" state etc. Actually, the concrete selection of operating states depends on the type of the industrial facility and, again for example, in case the facility is a production facility the selection of suitable operating states OS would be different than in case the industrial facility is a turbine or a magnetic resonance tomograph.
[0025] The operating states OS1-OS2 are shown in FIG 1 to be integrated in the central control unit 110 to illustrate that the central control unit 110 can be, but does not have to be, configured to observe and/or set up the operating states OS of the facility 100.
[0026] Each device DEV can have one or more adjustable parameters PA. For each adjustable parameter PA a corresponding value or setting S is required to operate the respective device DEV wherein such setting S depends on the actual operating state OS of the facility (for the sake of brevity, in the following only the term "setting" will be applied instead of "setting or value".). For example, the adjustable parameter of a device DEV="pump" might be PA="actual rotational speed" and the corresponding setting might be S="1000rpm" in the first normal operating state OS1 while it could be S="0rpm" in the shutdown state OS4 (therein, rpm stands for "revolutions per minute"). Besides the "actual rotational speed", a realistic facility and its devices would have several other adjustable parameters PA to be set for operation of the facility. For example, the adjustable parameters PA might comprise the "actual rotational speed" as a first adjustable parameter PA1, an "on/off" state as a second adjustable parameter PA2, an "actual torque" as a third adjustable parameter PA3, a "maximum rotational speed" as a fourth adjustable parameter PA4, a "maximum operation temperature" as a fifth adjustable parameter PA5, a "minimum power output" as a sixth adjustable parameter PA6, a "maximum power output" as a seventh adjustable parameter PA7, and an "orifice extension" as an eighth adjustable parameter PA8 etc.
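The dependence of a setting S on the triple (OS, DEV, PA) described above can be pictured as a simple lookup table. The names and values follow the pump example from the text but are purely illustrative.

```python
# setting S as a function of operating state OS, device DEV, and
# adjustable parameter PA (pump example; values illustrative)
settings = {
    ("OS1", "pump_106", "PA1_actual_rotational_speed"): "1000rpm",
    ("OS4", "pump_106", "PA1_actual_rotational_speed"): "0rpm",
    ("OS1", "pump_106", "PA2_on_off"): "on",
}

def setting_for(os_state, device, parameter):
    """Return the setting S(OS, DEV, PA), or None if not defined."""
    return settings.get((os_state, device, parameter))
```

A change of the operating state, e.g. from OS1 to the shutdown state OS4, then corresponds to looking up a different setting for the same device and parameter.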
[0027] It should be clear that the particular selection of devices DEV in the embodiments of the facility 100 is only exemplary and introduced in this limited scope for the sake of brevity of the description. A realistic facility would comprise significantly more devices and a broader bandwidth of device types than introduced above to achieve its technical purpose, e.g. to produce a product or to reasonably operate a turbine or a tomograph. However, the limited scope of introduced devices and device types "motor", "pump", "valve", and "controller" is sufficient to explain the invention and can be scaled up easily to a realistic technical facility by a person skilled in the art without deviating from the invention. In the same way, in a realistic scenario not only one or two error condition states and not only one or two normal operating states etc. might be imaginable, but a much more diversified plurality of operating states OS. The same is applicable for the scope of adjustable parameters PA. All this typically depends on the complexity of the facility 100 and its devices DEV etc. However, the limited scope of introduced operating states OS and adjustable parameters PA is again sufficient to explain the invention and can be scaled up easily to a realistic scenario by a person skilled in the art without deviating from the invention.
[0028] As a summary, the industrial facility 100 comprises a plurality of devices DEV and it can be operated under various operating states OS. Each one of the concerned devices DEV has at least one adjustable parameter PA which requires a setting S to operate the respective device DEV. Such setting S for a particular adjustable parameter PA1 of the particular device DEV1 depends on the actual operating state OS of the facility 100.
[0029] In case of changing conditions, e.g. a change of the operating state OS, settings S for devices DEV might have to be adapted to the changed scenario. For that purpose, it is beneficial when settings S are recommended by a computer implemented recommender system RS which can be, for example, implemented in and executed by the central control unit 110.
[0030] The recommender system RS provides a recommendation for a setting S depending on input variables VAR_{v} with v = 1,...,n. In the embodiment introduced above the input variables for the recommender system RS of the facility 100 are the actual operating state OS = VAR_{1} as a first dimension, the device DEV = VAR_{2} for which a setting S of its adjustable parameter PA = VAR_{3} is required as a second dimension, and the particular adjustable parameter PA = VAR_{3} for which a setting S is sought as a third dimension. Correspondingly, the input variable space is three-dimensional, including OS = VAR_{1} as a first dimension ranging from OS1 to OSo with o > 1, DEV = VAR_{2} as a second dimension ranging from DEV1 to DEVd with d > 1, and PA = VAR_{3} as a third dimension ranging from PA1 to PAp with p > 1. Thus, the first dimension DIM_{1} corresponds to a first variable type VART_{1}, in this case different operating states OS, the second dimension DIM_{2} corresponds to a second variable type VART_{2}, in this case different devices DEV, and the third dimension DIM_{3} corresponds to a third variable type VART_{3}, in this case different adjustable parameters PA of the devices DEV.
[0031] Different input variables VAR_{v} for the recommender system RS correspond to different variable types VART_{v}, wherein one of the variable types, e.g. VART_{1}, might correspond to the operating states OS, another one of the variable types, e.g. VART_{2}, might correspond to devices DEV, and a further one of the variable types, e.g. VART_{3}, might correspond to adjustable parameters PA. For a particular variable type VART_{v} a plurality T_{v} of respective variables VART_{v,t} is available with t = 1,...,T_{v}. However, for each variable type VART_{v} only one variable VART_{v,t} is provided as input variable VAR_{v} = VART_{v,t}.
[0032] Just for example, when the recommender system RS receives as input variables VAR_{1}, VAR_{2}, VAR_{3} a particular operating state VAR_{1} = OS1, a particular device VAR_{2} = DEV1, and a particular adjustable parameter VAR_{3} = PA1, e.g. OS1 = "first normal operating state", DEV1 = pump 107, and PA1 = "actual rotational speed", it provides as an output y(VAR_{1}, VAR_{2}, VAR_{3}) the recommended setting S = y(VAR_{1}, VAR_{2}, VAR_{3}) = "1000rpm" for the "actual rotational speed" of DEV1. Still as an example, in case the recommender system RS receives as input variables a particular operating state VAR_{1} = OS2, a particular device VAR_{2} = DEV1, and a particular adjustable parameter VAR_{3} = PA1, e.g. OS2 = "second normal operating state" and DEV1 = pump 107 as well as PA1 = "actual rotational speed" as above, it provides as an output the recommended setting S = "2000rpm" for the "actual rotational speed" of DEV1 in the second normal operating state OS2. In the same way, other combinations of OS, DEV, PA would be characterized by certain settings S.
[0033] It should be clear that this is again only exemplary and that additional and/or other input variables than OS, DEV, and/or PA might be selected to be considered by the recommender system RS to provide a recommendation for the setting S of the adjustable parameter PA, depending on the particular application. Therefore, in a more abstract and general view the recommender system RS considers input variables VAR_{1}, VAR_{2}, ..., VAR_{n} to come to a recommendation for the setting S. For example and referring to the embodiment introduced above, input variables VAR_{1} might represent the operating states OS, input variables VAR_{2} might represent the devices DEV, and input variables VAR_{3} might represent different adjustable parameters PA in case the devices DEV should have more than one adjustable parameter. In this scenario, the input variable space spanned by the variables VAR_{1}, VAR_{2}, ... is three-dimensional.
[0034] The invention proposes a recommender system RS which is a system trained on known and/or assumed combinations of input variables VAR_{1}, VAR_{2}, ..., VAR_{n} as introduced above. For certain combinations of variables VAR_{v} with v = 1,...,n, the corresponding settings S are considered to be known, e.g. from history, and/or to be assumable based on experience. Such known data VAR_{1}, VAR_{2}, VAR_{3}, and S from different known scenarios can be utilized to build up a settings database SDB which can have, for example, the form of an n-dimensional tensor with n corresponding to the dimension of the input variable space, i.e. n = 3 in this example. This is schematically presented in FIG 2. Thus, the tensor SDB includes a first dimension VAR_{1}, a second dimension VAR_{2}, and a third dimension VAR_{3}, with the first dimension VAR_{1} according to all the possible operating states OS, the second dimension VAR_{2} according to all the devices DEV, and the third dimension VAR_{3} according to all the adjustable parameters PA of the devices. Thus, operating states OS1, OS2, ..., OSo are listed along the first dimension VAR_{1} of the tensor SDB, devices DEV1, DEV2, ..., DEVd are listed along the second dimension VAR_{2} of the tensor SDB, and adjustable parameters PA1, PA2, ..., PAp are listed along the third dimension VAR_{3} of the tensor SDB.
[0035] Thus, in general terms an entry S at particular coordinates VAR_{1_0}, VAR_{2_0}, VAR_{3_0} in the tensor SDB represents a previously known or assumed setting S(VAR_{1_0}, VAR_{2_0}, VAR_{3_0}) of the adjustable parameter VAR_{3_0} for a particular second variable VAR_{2_0} for a particular first variable VAR_{1_0}. In more concrete words and with regard to the embodiment and to the exemplary situation of pump 107 mentioned above, in case VAR_{1_0} = OS1 stands for "first normal operating state", VAR_{2_0} = DEV1 represents pump 107, and VAR_{3_0} stands for the adjustable parameter "actual rotational speed", then the entry of the tensor and database SDB, respectively, at position {VAR_{1_0}, VAR_{2_0}, VAR_{3_0}} would be S(VAR_{1_0}, VAR_{2_0}, VAR_{3_0}) = "1000rpm". Correspondingly, the settings database SDB comprises entries S for all known or assumed combinations of operating states, devices, and adjustable parameters, with each entry S representing a setting of the respective adjustable parameter. However, in a realistic scenario with a typically huge number of possible operating states, devices, and adjustable parameters and with possibly even more additional dimensions, i.e. n > 3, and/or other variables, it can be expected that most of the entries S are empty. I.e. the tensor and database SDB, respectively, is large and sparse, because many entries might either not be known or might not make sense because, for example, it would not make sense to assign a setting to an adjustable parameter "actual rotational speed" of a valve or to assign a setting to an adjustable parameter "orifice extension" for a motor.
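The sparsity noted in [0035] motivates a dict-of-keys layout that stores only the known or assumed entries of the tensor SDB. The dimensions mirror the example facility, but all counts and entries are illustrative.

```python
# sparse settings database SDB as dict-of-keys: only known/assumed entries
operating_states = [f"OS{i}" for i in range(1, 7)]   # OS1..OS6
devices = [f"DEV{i}" for i in range(1, 10)]          # nine devices
parameters = [f"PA{i}" for i in range(1, 9)]         # PA1..PA8

sdb = {
    ("OS1", "DEV6", "PA1"): "1000rpm",   # pump, actual rotational speed
    ("OS4", "DEV6", "PA1"): "0rpm",      # shutdown state
}

total_cells = len(operating_states) * len(devices) * len(parameters)
density = len(sdb) / total_cells   # most cells are empty -> large and sparse
```

Even in this tiny example the density is below one percent; in a realistic facility with more states, devices, and parameters, the tensor becomes far sparser still.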
[0036] Nevertheless, the data available in the settings database SDB can be utilized to realize a recommendation method RM for providing a recommended setting S. Before detailing this, a second embodiment and application opportunity, respectively, of the invention shall be described. After that, the recommendation method RM will be illustrated in a general way which is applicable for both the first and the second embodiment as well as for other applications.
[0037] As described in more detail above, in the first exemplary embodiment of the invention the entries S in the settings database SDB are values to be applied for adjustable parameters of devices of the facility in certain operating states of the facility. However, it is clear that the invention can be applied in numerous applications besides the one explained in the context of the first exemplary embodiment concerning the industrial facility. In the second embodiment of the invention, which is again only another example for an application of the invention, the first variable VAR_{1} can represent a plurality of customers CST of certain products PRD, i.e. VAR_{1} = CST, wherein a plurality of such products PRD is represented by the second variable VAR_{2}, i.e. VAR_{2} = PRD. The third variable VAR_{3} can represent purchase features PCF, i.e. VAR_{3} = PCF, which describe conditions of a purchase of a product PRD by a customer CST.
[0038] I.e. the corresponding tensor and settings database SDB, respectively, would again be three-dimensional (n = 3) with the customers CST1, CST2, ..., CSTcmax listed along the first dimension VAR_{1}, the products PRD1, PRD2, ..., PRDrmax listed along the second dimension VAR_{2}, and the purchase features PCF1, PCF2, ..., PCFfmax listed along the third dimension VAR_{3}. A particular entry S(CSTi, PRDj, PCFl) of the tensor SDB with 1 ≤ i ≤ cmax, 1 ≤ j ≤ rmax, and 1 ≤ l ≤ fmax would describe the setting of the corresponding particular purchase feature PCFl for the corresponding particular product PRDj for the corresponding particular customer CSTi. For example, one of the purchase features PCF might be the final price of a product PRD for a certain customer CST, wherein the final price might differ from the original recommended retail price for that product PRD. I.e. the final price PCFl might be the result of negotiations between the particular customer CSTi and the provider of the product PRDj and might include a corresponding discount. Alternatively, one of the purchase features PCF might represent the achieved discount instead of the final price. Thus, in case the purchase feature PCFl represents such a discount, the particular entry S(CSTi, PRDj, PCFl) represents the discount, e.g. in EUR or USD or in %, granted to customer CSTi for the purchase of the product PRDj. Other purchase features might be, for example, a purchase frequency, an amount of particular purchased products, a time interval of purchases etc.
[0039] As a summary of the above, both in the first and in the second embodiment a three-dimensional settings database SDB is available which includes data and settings, respectively, known from experience, from commissioning phases, from earlier deals, from maintenance activities etc. or assumed based on experience. Of course, the database SDB might have more or less dimensions than n=3, depending on the underlying application.
[0040] The availability of the respective database SDB in the first and in the second embodiment, respectively, each time allows in a simple approach to recommend a suitable setting S of a certain adjustable parameter VAR_{3_1} = PA either for a certain device VAR_{2_1} = DEV in a certain operating state VAR_{1_1} = OS in the first embodiment or for a certain product VAR_{2_1} = PRD for a certain customer VAR_{1_1} = CST in the second embodiment, in case such a combination VAR_{1_1}, VAR_{2_1}, VAR_{3_1} has an entry S(VAR_{1_1}, VAR_{2_1}, VAR_{3_1}) in the settings database SDB. In that situation, the recommendation can simply be the respective entry in the database SDB.
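The simple approach of [0040] amounts to a lookup, with a fallback to a trained model for combinations without an entry. The fallback callable here is a placeholder, not the claimed recommender system; all names are illustrative.

```python
sdb = {("OS1", "DEV1", "PA1"): "1000rpm"}   # known entries (illustrative)

def recommend(var1, var2, var3, fallback=None):
    """Return the stored setting if the combination is known; otherwise
    delegate to a trained model (here: an arbitrary callable), if any."""
    key = (var1, var2, var3)
    if key in sdb:
        return sdb[key]
    return fallback(key) if fallback is not None else None

known = recommend("OS1", "DEV1", "PA1")     # direct lookup succeeds
unknown = recommend("OS2", "DEV1", "PA1")   # no entry and no model yet
```

It is exactly the second case, a combination without an entry, that the trained recommender system RS described in the following paragraphs addresses.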
[0041] However, in case a setting S of a configuration parameter PA is required for a combination of variables VAR_{1}', VAR_{2}', VAR_{3}' which does not have an entry in the database SDB, the recommendation method RM as depicted in FIG 3 can be applied to recommend such a setting S(VAR_{1}', VAR_{2}', VAR_{3}'). Such a situation, in which a combination of VAR_{1}', VAR_{2}', and VAR_{3}' is applicable for which the settings database SDB does not have an explicit entry S, might occur simply because that specific combination of VAR_{1}', VAR_{2}', and VAR_{3}' has not yet been implemented. However, that situation might also occur in various scenarios, for example in the first embodiment because a previously unknown operating state OS emerges and/or because a new device DEV has been integrated in the facility 100, and for example in the second embodiment because a new customer CST wants to purchase a product PRD or because a new product PRD is offered which has not been available before.
[0042] The recommendation method RM applies a recommender system RS which is a system which has been trained based on the knowledge from the settings database SDB as described below. Therewith, the recommender system RS is configured and trained to provide as an output a suitable setting S (
VAR_{1}',
VAR_{2}',
VAR_{3}') as a reply to an input {
VAR_{1}',
VAR_{2}',
VAR_{3}'}
, even in case such combination
VAR_{1}',
VAR_{2}', VAR_{3}' is not known and does not have an entry in the settings database SDB.
[0043] The invention is based on the concept of modelling the settings database SDB utilizing the approach of Gaussian processes. Therein, it is proposed to combine Gaussian processes with representation learning for collaborative filtering via matrix decomposition. In the following, basic concepts of collaborative filtering and Gaussian processes will be addressed before the recommendation method RM itself is introduced.
[0044] Collaborative filtering provides a powerful solution to recommender systems. For example, recommending a setting for an adjustable parameter for a device and for a certain operating state is based on the assumption that similar devices, or devices of similar device type, respectively, require similar settings of their adjustable parameters in similar operating states. A database describing such interactions between devices, operating states, adjustable parameters etc. often takes the form of a matrix or a tensor as described above (SDB), where each entry S describes the interaction between one operating state, one device, and one adjustable parameter, wherein, in this example, the interaction is expressed as the setting S. The overall pattern of an operating state can therefore be described by the corresponding row in such a matrix or tensor. However, since there are typically large numbers of operating states, devices, and adjustable parameters in the database, i.e. a large number of variables VAR_{1}, VAR_{2}, VAR_{3},..., and since for each variable VAR_{v} with v=1,...,n usually only a small subset of the other variables is applicable, the tensor is often large and sparse. Therefore, it is advantageous to derive abstract feature vectors to represent the variables. This understanding inspired a wide variety of low-rank matrix decomposition models such as non-negative matrix decomposition, biased matrix decomposition, and non-parametric decomposition. These methods aim at learning low-dimensional representations for all variables, allowing for the prediction of the unobserved interaction between a new combination of variables. The high dimensionality and sparsity of the observed data become even more challenging with an increasing number of variables VAR_{v}, i.e. if the observed data are represented by a tensor with more than two dimensions. Such tensors are often used to describe the adjacency of entities in a knowledge graph, or different aspects of an interaction between two or more variables. In order to predict the existence of an unobserved entry, scalable tensor decomposition methods have been proposed, including "RESCAL", "TransE", and multi-way neural networks. These tensor decomposition methods, originally designed for sparse 3-way tensors, can also be adapted to 2-way matrix cases, which enables building large recommender systems as well as modelling non-linear interactions.
[0045] Gaussian processes belong to a class of models that can generate uncertainty predictions along with point estimates via their kernel function, which measures the similarity between data samples. Gaussian processes can be applied to perform probabilistic matrix decomposition, also resulting in dimensionality reduction, e.g. via the "Gaussian Process Latent Variable Model" (GPLVM). Here, a Gaussian process is used as a prior distribution for a function that maps a low-dimensional latent space to the high-dimensional data matrix. The individual components of this mapping function are modelled as independent draws from a Gaussian process, assuming all dimensions of the data matrix are independent. This independence assumption might be problematic in the context of collaborative filtering, where modelling the interdependency of both users and items is considered beneficial. While GPLVM has been used as a matrix factorization method for collaborative filtering in the past, as for other matrix decomposition approaches, its practical applicability can be limited, also for computational reasons. Being designed primarily as a dimensionality reduction method for data with few missing values, it is not amenable to the triple-based learning paradigm that facilitates efficient modeling of large and highly sparse datasets, and it typically requires storing the dense data matrix in memory.
[0046] However, coming back to the concrete invention, the recommendation method RM combines Gaussian processes with representation learning for collaborative filtering via matrix decomposition as mentioned above. The method RM intrinsically includes learning of a latent representation vector for each entity type, i.e. in the given embodiment operating states, devices, and adjustable parameters, jointly with a multi-output Gaussian process that predicts not only a point estimate, i.e. the recommended setting S, but also its predictive variance, i.e. the certainty or reliability of the recommendation. This design is furthermore motivated by connections to coregionalization with separable kernels in Gaussian processes as well as to the GPLVM.
[0047] For the sake of brevity and in order to reduce complexity, it is assumed in the following that the input variable space is only two-dimensional, i.e. n=2, including only VAR_{1} and VAR_{2} as input variables. This is other than in the embodiments introduced above, which included a three-dimensional input data space VAR_{1}, VAR_{2}, VAR_{3}, namely operating states OS=VAR_{1}, devices DEV=VAR_{2}, and adjustable parameters PA=VAR_{3} in the first embodiment and customers CST=VAR_{1}, products PRD=VAR_{2}, and purchase features PCF=VAR_{3} in the second embodiment. In the concrete embodiments introduced above, the two-dimensionality could be interpreted such that, for example, each device has only one adjustable parameter, so that it would not be required to provide the adjustable parameter as an input variable to the recommender system RS and recommendation method RM, respectively.
[0048] In any case, the proposed approach for recommending a setting S for an adjustable parameter PA as introduced below is scalable and can be varied from the exemplary two-dimensional input variable space to any n-dimensional input variable space with n>1.
[0049] Again referring to collaborative filtering and now assuming the two-dimensional input variable space with variables VAR_{1}, VAR_{2}, a dataset often takes the form of a matrix Y ∈ ℝ^{IV1×JV2}, assuming IV1 different variables VAR_{1}, e.g. different operating states OSi with i=1,...,IV1, and JV2 different variables VAR_{2}, e.g. different devices DEVj with j=1,...,JV2. Each entry S(VAR_{1_i}, VAR_{2_j}), e.g. representing a setting S(OSi,DEVj) of an adjustable parameter for a particular device DEVj in a particular operating state OSi, describes the interaction between variable VAR_{1_i} and variable VAR_{2_j}. Such a matrix Y is typically large and sparse as mentioned above.
[0050] In order to derive a new recommendation for a combination of variables VAR_{1}', VAR_{2}' which has not been observed before and/or for which no interaction has taken place yet, i.e. for which no entry S exists in the matrix Y, such an entry has to be predicted, and collaborative filtering is a suitable approach to such prediction. A solution to this collaborative filtering task is based on a matrix decomposition approach.
[0051] During training, one fits a decomposition model function DCOM that can recover all known entries S in the matrix Y to a certain extent, i.e. the training process includes an iterative optimization of model parameters defining the decomposition model DCOM until such recovery is achieved. Such a training process itself is generally known and shall not be addressed herein. Instead, the invention concerns the function to be trained, i.e. the decomposition model DCOM.
[0052] As a simple, general illustration, the matrix Y can be modeled as Y ≈ AB^{T}, with A ∈ ℝ^{IV1×r} and B ∈ ℝ^{JV2×r}, with r being the rank of the matrix Y. Equivalently, this can be expressed as y(i,j) ≈ a_{i}b_{j}^{T} for all (i,j) ∈ [1,IV1] × [1,JV2]. Therein, a_{i} and b_{j} denote the i-th and j-th row in A and B, respectively.
[0053] At inference time, any entry value that is not available in the training data, i.e. in SDB, can be predicted by reading the corresponding location from a reconstructed matrix Ŷ = AB^{T}. The generalization power of the decomposition model DCOM lies in the low-rank matrices A and B applied for the reconstruction of Ŷ, wherein the factor matrices A and B can be interpreted as latent representations of the variables VAR_{1}, VAR_{2}.
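As an illustration of such a low-rank reconstruction (and not of the claimed GP-based method itself), the following sketch fits the factor matrices A and B by gradient descent on the observed entries only and then reads a prediction for an unobserved entry from Ŷ = ABᵀ; the matrix values, rank, learning rate, and iteration count are all invented:

```python
# Illustrative sketch: fitting Y ≈ A Bᵀ on the observed entries only.
observed = {(0, 0): 1.0, (0, 1): 0.5, (1, 0): 0.9, (1, 1): 0.4, (2, 1): 0.45}
IV1, JV2, r = 3, 2, 2  # matrix dimensions and chosen rank

# Deterministic small initialization of the latent factor matrices A and B.
A = [[0.10 + 0.05 * i + 0.02 * c for c in range(r)] for i in range(IV1)]
B = [[0.10 + 0.04 * j + 0.03 * c for c in range(r)] for j in range(JV2)]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

lr = 0.1
for _ in range(3000):                  # iterative optimization of A and B
    for (i, j), y in observed.items():
        err = dot(A[i], B[j]) - y      # reconstruction error on a known entry
        for c in range(r):
            ai, bj = A[i][c], B[j][c]
            A[i][c] -= lr * err * bj   # gradient step on both latent factors
            B[j][c] -= lr * err * ai

# The unobserved entry (2, 0) is read from the reconstructed matrix Ŷ = A Bᵀ.
print(round(dot(A[2], B[0]), 2))
```

Note that this plain factorization yields only a point estimate; the GP-based approach described below additionally provides a predictive variance.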
[0054] In order to optimize the modeling performance for decomposing Y, it is possible to apply more complex functions g within the decomposition model DCOM than, for example, the dot product to join the latent representations. Expressing the model definition as y(i,j) = g(a_{i}, b_{j}) for all (i,j) ∈ [1,IV1] × [1,JV2], the function g can be realized in various ways, depending on the decomposition approach, e.g. via singular value decomposition, biased matrix decomposition, multi-way neural networks, or GPLVM.
[0055] GPLVM, for example described in "Gaussian process latent variable models for visualisation of high dimensional data" by N.D. Lawrence in "Advances in neural information processing systems", pp. 329-336, 2004, and in "Learning for larger datasets with the Gaussian process latent variable model" by N.D. Lawrence in "Artificial Intelligence and Statistics", pp. 243-250, 2007, solves the matrix decomposition task by fitting JV2 independent GP regression models on the unobserved latent variables, with y_{*,j} ∼ N(0, K^{A} + σ²I). Therein, y_{*,j} stands for the j-th column of Y and the covariance matrix K^{A} ∈ ℝ^{IV1×IV1} captures the covariances between each pair of latent representations (a_{i}, a_{i'}) as defined by the covariance function or kernel k(a_{i}, a_{i'}). Such a kernel function k(a_{i}, a_{i'}) is the kernel function applied on two rows in A and it quantifies the similarity between the two rows. In the same way, a kernel function k(b_{j}, b_{j'}) would be the kernel function applied on two rows in B.
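A minimal sketch of such a kernel evaluation on the rows of a latent matrix, assuming an RBF covariance function and invented latent vectors:

```python
import math

def rbf(u, v, lengthscale=1.0):
    """RBF kernel between two latent representation vectors, e.g. two rows
    a_i, a_i' of the latent matrix A (illustrative choice of kernel)."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(u, v))
    return math.exp(-sq_dist / (2.0 * lengthscale ** 2))

def cov_matrix(A, lengthscale=1.0):
    """Covariance matrix K^A with entries k(a_i, a_i') over all rows of A."""
    return [[rbf(ai, aj, lengthscale) for aj in A] for ai in A]

A = [[0.0, 0.0], [1.0, 0.0], [0.0, 3.0]]   # invented latent vectors
K = cov_matrix(A)
print(round(K[0][1], 4))  # exp(-0.5) ≈ 0.6065
```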
[0056] In other words, the GPLVM can be interpreted as a multiple-output GP regression model where only the output data are given and the unobserved inputs are being optimized.
[0057] In general, GPLVM treats all output dimensions in the matrix Y as independent. This is an assumption that may not always hold in a realistic scenario, e.g. the adjustable parameters of the devices could be correlated. To capture such dependencies between output dimensions, the invention proposes a new coregionalization kernel k to perform multi-output Gaussian process regression for unsupervised matrix decomposition in the spirit of the GPLVM.
[0058] The proposed kernel k is separable and can be written as the product of two individual kernels k^{A} and k^{B}. Correspondingly, the proposed kernel and the corresponding covariance matrix can be written as k^{coreg}((i,j), (i',j')) = k^{A}(a_{i}, a_{i'}) · k^{B}(b_{j}, b_{j'}) and K^{coreg} = K^{B} ⊗ K^{A}, with K^{A}_{i,i'} = k^{A}(a_{i}, a_{i'}) and K^{B}_{j,j'} = k^{B}(b_{j}, b_{j'}).
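Such a separable kernel can be sketched as follows; the RBF base kernel and the latent matrices are invented for illustration:

```python
import math

def rbf(u, v):
    """RBF base kernel on two latent vectors (illustrative choice)."""
    return math.exp(-0.5 * sum((x - y) ** 2 for x, y in zip(u, v)))

# Invented latent matrices: rows a_i of A and rows b_j of B.
A = [[0.0], [1.0]]           # IV1 = 2 latent representations
B = [[0.0], [2.0], [1.0]]    # JV2 = 3 latent representations

def k_coreg(i, j, i2, j2):
    """Separable kernel k^coreg((i,j),(i',j')) = k^A(a_i,a_i') * k^B(b_j,b_j').
    The full covariance over all (i, j) pairs then has Kronecker structure."""
    return rbf(A[i], A[i2]) * rbf(B[j], B[j2])

val = k_coreg(0, 1, 1, 2)
print(round(val, 6))  # exp(-0.5) * exp(-0.5) = exp(-1) ≈ 0.367879
```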
[0059] For the sake of symmetry, the same covariance function is chosen for k^{A} and k^{B}, e.g. based on the known "Radial Basis Function" (RBF). As for matrix K^{A}, each element in K^{B} measures the similarity between a pair of output dimensions (or devices) (b_{j}, b_{j'}). As the terms k_{A}(i,i') and k_{B}(j,j') of the equations above imply, the latent variables in matrices A and B can be treated as hyperparameters in a specific kernel that takes the indices as input. From a functional perspective, the proposed kernel k^{coreg}((i,j), (i',j')) = k_{A}(i,i') · k_{B}(j,j') measures the similarity between pairs of a and b, i.e. (a_{i}, b_{j}) (or (i,j)) and (a_{i'}, b_{j'}) (or (i',j')), i.e. pairs of first and second variables VAR_{1}, VAR_{2}. The measurement is performed by evaluating the two kernel functions k_{A}, k_{B} at (i,i') and (j,j'), respectively, and calculating the product of both kernel values. The training samples can thus be formulated using only the indices and the corresponding entries in the matrix, i.e. (i,j,y_{i,j}) for all (i,j) ∈ [1,IV1] × [1,JV2]. With such a "triple" formulation, the approach achieves much higher computational efficiency compared to other uncertainty-aware matrix decomposition methods such as GPLVM. The input to the model takes the form of the indices (i,j) and the training target is a scalar y_{i,j} in the case of matrix decomposition. The GPLVM, on the other hand, always attempts to predict the entire row in the matrix Y from the entire latent vector a_{i}. This can be a severe issue if the matrix to be decomposed is large, as it might not always fit into the computer memory. Thus, this "triple" data format is especially advantageous in case of a large number of items and high sparsity in matrix Y since it does not require storing the data in dense matrix format.
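The triple formulation can be sketched as follows; the sparse entries are invented for illustration:

```python
# Sketch of the "triple" training format: only observed entries of the large,
# sparse matrix Y are kept as (i, j, y_ij) samples; no dense IV1 x JV2 matrix
# is ever materialized. The entries below are invented for illustration.
observed = {(0, 2): 0.7, (1, 0): 0.3, (4, 3): 0.9}

triples = sorted((i, j, y) for (i, j), y in observed.items())
print(triples)  # → [(0, 2, 0.7), (1, 0, 0.3), (4, 3, 0.9)]
```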
[0060] Eventually, the decomposition model DCOM can be written as DCOM ∼ N(0, K^{coreg} + σ²I), with σ² denoting the variance of the observation noise. The vector DCOM has a length of IV1·JV2, indicating that it consists of the outcomes of all possible combinations of IV1 first variables VAR_{1}, e.g. different operating states, and JV2 second variables VAR_{2}, e.g. devices.
[0061] In a different perspective, the proposed kernel k is still separable and can be written as the product of two individual kernels. Here, the first kernel measures the similarity of samples in the input space and the second kernel captures the similarity between each pair of output dimensions. More formally, this kernel and the covariance matrix, respectively, take the form k((a_{i}, j), (a_{i'}, j')) = k^{A}(a_{i}, a_{i'}) · k^{W}(j,j'), or equivalently, K = W ⊗ K^{A}. Therein, k^{A} and k^{W} are scalar kernels on A × A and [1,...,JV2] × [1,...,JV2], respectively, and W with W_{i,j} = k^{W}(i,j) is a symmetric and positive semi-definite matrix which models the dependency between each pair of outputs. While coregionalization is usually performed in the context of supervised regression, the inputs a_{i} are unobserved in the approach proposed herein and, as for the GPLVM model, need to be optimized. W being the identity matrix implies independence between the outputs, and the model falls back to the standard GPLVM. There is a variety of approaches for choosing W, ranging from the design of a symmetric and positive semi-definite matrix based on suitable regularizers to the choice of covariance functions for the different output components. However, herein it is chosen to replace the coregionalization matrix W by a kernel on latent representations of variables VAR_{2} in vector space B, such that the covariance matrix can be written as a Kronecker product K = K^{B} ⊗ K^{A} of the covariance matrix K^{B} of the latent variables representing the second variable and the covariance matrix K^{A} of the latent variables representing the first variable. Thereby, representation learning is combined with coregionalization in multi-output GPs for collaborative filtering via unsupervised matrix factorization.
[0062] In summary, the model proposed herein performs a sparse matrix decomposition, where the representation vectors of the first variable VAR_{1} and the second variable VAR_{2} are consumed by a GP regression with a specifically designed kernel. The kernel measures the similarity between two pairs of the first and the second variable VAR_{1}, VAR_{2} in the form of (i,j) and (i',j'). At inference, the model generates a Gaussian distribution, the mean of which serves as point estimate, therewith representing the aspired recommended setting, and the variance of which serves as the corresponding prediction uncertainty.
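How a GP produces both a point estimate (mean) and a prediction uncertainty (variance) can be illustrated on a deliberately tiny one-dimensional example with two training samples, so that the covariance matrix can be inverted by hand; all numbers are invented and this is not the claimed recommender itself:

```python
import math

def rbf(x, y):
    return math.exp(-0.5 * (x - y) ** 2)

# Two invented training samples (inputs xs, targets ys) and a noise variance.
xs, ys, noise = [0.0, 1.0], [0.0, 1.0], 1e-4

# Covariance matrix K + sigma^2 I and its inverse (2x2, inverted by hand).
k00 = rbf(xs[0], xs[0]) + noise
k11 = rbf(xs[1], xs[1]) + noise
k01 = rbf(xs[0], xs[1])
det = k00 * k11 - k01 * k01
inv = [[k11 / det, -k01 / det], [-k01 / det, k00 / det]]

def predict(x_star):
    """GP posterior: point estimate (mean) and uncertainty (variance)."""
    ks = [rbf(x_star, xs[0]), rbf(x_star, xs[1])]
    alpha = [sum(inv[r][c] * ys[c] for c in range(2)) for r in range(2)]
    mean = sum(ks[r] * alpha[r] for r in range(2))
    var = rbf(x_star, x_star) - sum(
        ks[r] * inv[r][c] * ks[c] for r in range(2) for c in range(2))
    return mean, var

m, v = predict(0.5)
print(round(m, 3), v > 0)  # → 0.549 True
```

Note how the variance is small near the training inputs and grows in between, which is exactly the "reliability of the recommendation" described above.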
[0063] Coming back to the first embodiment, the recommendation method RM and system RS can be implemented in the central control unit 110 of the facility 100. Thus, the central control unit 110 is configured to process an actual operating state OSact of the facility 100 to select an optimal setting S of each adjustable parameter PA of each device DEV of the facility. The actual operating state OSact can either be selected manually by an operator of the facility 100 or it can be determined automatically by the central control unit 110. The automatic determination of the actual operating state OSact can either be based on a regular schedule of the facility 100, e.g. in case a first product has to be produced in a first time span and a second product, requiring a different operating state, shall be produced in a subsequent time span, or it can be determined based on a regular, ongoing observation of the facility 100. Such observation can utilize data of sensors which might be distributed across the facility 100, wherein such sensor data might give insights into the actual states of the devices DEV, into the overall status of the facility 100 etc. Especially, the sensors would be configured and arranged to detect an error state of the facility 100 so that the central control unit 110 can trigger and implement a suitable operating state OSact upon detection of such an error state. In any case, the central control unit 110 processes the actual operating state OSact of the facility 100 to select an optimal setting S of each adjustable parameter PA1,...,PAp of each device DEV1,...,DEVd of the facility. This happens by providing the triplet consisting of OSact, PAi, DEVj with i=1,...,p and j=1,...,d for each i and each j to the recommender system RS. Upon receipt, the recommender system RS provides recommended settings S(OSact,PAi,DEVj) utilizing the recommendation method RM which has been trained as described above.
In a possible embodiment, the recommended settings S are then implemented on the concerned devices DEVj by the central control unit 110 which has, at least for this purpose, full control over those devices DEVj of the facility 100.
[0064] FIG 4 shows another visualization of the recommendation method RM. The method RM is composed of a recommendation step SR and an optional preparation step SP preceding the recommendation step SR. Therein, the preparation step SP is "optional" insofar as the training procedure of the function DCOM, which is the subject of the preparation step SP, does not necessarily have to be executed each and every time the recommendation method RM is applied, but only, for example, when the settings database SDB is extended by additional data. The recommendation step SR comprises a step SR1 of providing a modeling function DCOM, e.g. received from the preparation step SP, a step SR2 of providing a plurality n with n ≥ 2 of given input variables VAR_{v} with v=1,...,n to the recommender system RS as an input, and a step SR3 of processing the input with the provided modeling function DCOM to determine the recommended setting S.
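The sequence of steps SR1 to SR3 can be sketched as follows; the function names are hypothetical and the "trained" model is a trivial stand-in, not the GP-based DCOM:

```python
# Minimal sketch of the recommendation step SR (names are hypothetical).
def recommendation_step(dcom, variables):
    """SR1: `dcom` is the provided modeling function DCOM;
    SR2: `variables` is the tuple of n >= 2 given input variables VAR_v;
    SR3: the input is processed by DCOM to yield the recommended setting S."""
    if len(variables) < 2:
        raise ValueError("at least n = 2 input variables are required")
    return dcom(variables)

toy_dcom = lambda v: sum(v) / len(v)   # stand-in for a trained DCOM
print(recommendation_step(toy_dcom, (2, 4)))  # → 3.0
```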
[0065] The preparation step SP, if executed, performs the training to optimize the modeling function DCOM to be provided in the first step SR1 of the recommendation method RM. As mentioned, DCOM is a Gaussian process based decomposition function which is trained in advance, i.e. before the first step SR1, in the preparation step SP on the n-dimensional settings database SDB. In the preparation step SP, the settings database SDB is provided in a step SP1. An iterative optimization of the decomposition function DCOM is performed in a step SP2 in a plurality of optimization steps OPTs, preferably by maximizing a log-likelihood with respect to trainable parameters of the decomposition function DCOM. In each optimization step OPTs, an initial function DCOM_{ini} is provided, e.g. a default function in the very first optimization step or the resulting function from the preceding optimization step, and parameters defining the provided decomposition function DCOM_{ini} are varied to define an actual decomposition function DCOM_{act}. The settings database SDB is decomposed by applying the actual decomposition function DCOM_{act} on the settings database SDB, resulting in a latent representation LAT_{d} for each variable type VART_{d}. The latent representations LAT_{d} are joined to generate a reconstructed settings database SDB_rec, and the reconstructed settings database SDB_rec is compared with the provided settings database SDB, wherein the variation of the parameters of the decomposition function from DCOM_{ini} to DCOM_{act} aims at minimizing the difference between the settings database SDB and the reconstructed settings database SDB_rec. As soon as the difference falls below a targeted threshold or reaches saturation, the optimization procedure ends and the then actual decomposition function DCOM_{act} is provided to the recommendation step SR.
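The stopping logic of step SP2 (iterate until the reconstruction difference falls below a threshold or saturates) can be sketched with a deliberately trivial one-parameter stand-in for DCOM; all numbers are invented and the real model is the GP-based decomposition described above:

```python
# Toy sketch of the stopping logic in step SP2: vary a parameter until the
# difference between reconstruction and database falls below a threshold or
# saturates. The one-parameter "model" is a stand-in, not the GP-based DCOM.
target = 0.8              # stands in for the provided settings database SDB
param = 0.0               # trainable parameter of the initial function DCOM_ini

def reconstruct(p):       # stands in for joining the latent representations
    return p

threshold, last_diff = 1e-6, float("inf")
for step in range(10_000):
    diff = abs(reconstruct(param) - target)       # compare SDB_rec with SDB
    if diff < threshold or abs(last_diff - diff) < 1e-12:  # threshold/saturation
        break
    param += 0.1 * (target - reconstruct(param))  # vary parameters -> DCOM_act
    last_diff = diff

print(round(param, 3))  # → 0.8
```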
[0066] However, coming back to the vector DCOM and its typically remarkable length of IV1·JV2, fitting such a model can be challenging in practice due to the size of the covariance matrix K^{coreg} ∈ ℝ^{(IV1·JV2)×(IV1·JV2)}, since there are typically large numbers of first and second variables VAR_{1}, VAR_{2}. Computing the inverse of the covariance matrix, which is required for computing the log marginal likelihood, has a complexity of O(num³), where num is the number of training samples, i.e. num = IV1·JV2 in the example introduced herein. This challenge can be addressed via sparse GP approaches. A sparse GP model introduces m << num so-called inducing points, denoted as Z_{m}, that represent the entire dataset.
[0067] Such inducing points are optimized as additional parameters in a GP model. For example, the "Sparse Pseudo-input Gaussian Process" (SPGP) as introduced in "Sparse Gaussian processes using pseudo-inputs" by E. Snelson et al. in "Advances in neural information processing systems", pp. 1257-1264, 2006, proposes to treat them as pseudo-inputs, i.e. parameters in the kernel definition, in a similar fashion as the index kernels k_{A} and k_{B} are defined herein. On the other hand, the "Sparse Variational Gaussian Process" model (SVGP) as introduced by M. Titsias in "Variational learning of inducing variables in sparse Gaussian processes" in "Artificial Intelligence and Statistics", pp. 567-574, 2009, augments the model with inducing variables u with a Gaussian prior p(u) that contains the values of a function f, e.g. DCOM, at the inducing points Z_{m}. With this augmentation, a variational lower bound of the marginal log likelihood can be derived: log p(y) ≥ E_{q(f)}[log p(y|f)] − KL(q(u) ‖ p(u)).
[0068] Therein, q(u) is the variational posterior on the inducing variables u, K_{m,m} is the covariance function evaluated between all pairs of inducing points, and K_{num,m} is the covariance function evaluated between all training points and the inducing points.
[0069] The essential motivation of applying inducing points is to avoid calculating and inverting the full covariance matrix K ∈ ℝ^{num×num}. Instead, one only calculates the kernel between a training sample and all inducing points, as in K_{num,m}, and the kernel between two inducing points, as in K_{m,m}, which is defined as K_{m,m} = K^{B}_{m,m} ⊗ K^{A}_{m,m} in the specific case of coregionalization.
[0070] Therefore, SVGP reduces the computational complexity from O((IV1·JV2)³) to O((IV1·JV2) · (m_{A} · m_{B})²). The triple-based formulation (i,j,y_{i,j}) introduced above means that the set of training samples consists only of observed triples. By making use of the SVGP approximation, it is only required to compute the kernel between such a training sample and all inducing points or between pairs of inducing points. This in turn means that for matrices with a sparsity SPR, which is the fraction of observed interactions between first and second variables VAR_{1}, VAR_{2}, the computational complexity reduces to O(SPR · (IV1·JV2) · (m_{A} · m_{B})²).
[0071] In practice, the computational complexity can be further reduced if the same number of inducing points m is chosen in A and B. This allows learning coupled pairs of inducing points in both spaces. By tying the parameters of both sets of latent representations, the effective size of K_{m,m} can be reduced to m × m, which further reduces the computational complexity to O(SPR · (IV1·JV2) · m²).
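A back-of-the-envelope comparison of the complexity terms above, with invented example sizes:

```python
# Invented example sizes, purely to compare the complexity terms above.
IV1, JV2 = 1_000, 500   # numbers of first and second variables
m = 50                  # shared number of inducing points (tied in A and B)
SPR = 0.01              # fraction of observed interactions (sparsity)

full_gp = (IV1 * JV2) ** 3                # O((IV1*JV2)^3) for the exact GP
sparse_svgp = SPR * (IV1 * JV2) * m ** 2  # O(SPR*(IV1*JV2)*m^2), tied points

print(f"{full_gp:.1e} vs {sparse_svgp:.1e}")  # → 1.2e+17 vs 1.2e+07
```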
[0072] The proposed method is not limited to collaborative filtering in two-dimensional matrices, e.g. comprising the first and the second variable VAR_{1}, VAR_{2}. It can easily be extended to perform tensor completion tasks for an n-dimensional tensor with n>2. For example, in case of n=3, representing a three-dimensional input variable space with first, second, and third variables VAR_{1}, VAR_{2}, VAR_{3}, the target is a three-dimensional tensor Y whose entries describe the interactions among the three types of entities VAR_{1}, VAR_{2}, VAR_{3}, e.g. VAR_{1} representing different operating states OS, VAR_{2} representing different devices DEV, and VAR_{3} representing different adjustable parameters PA. In this application, the "interactions" would again be the settings S for the respective adjustable parameters. Alternatively, the tensor could be an adjacency tensor that represents a knowledge graph, where the first and second dimensions represent all entities and the third dimension represents multiple types of relations between entities.
[0073] In such data situations, the training samples become (i,j,l,y_{i,j,l}) and, correspondingly, the coregionalization kernel becomes k^{coreg}((i,j,l), (i',j',l')) = k_{A}(i,i') · k_{B}(j,j') · k_{C}(l,l') by introducing one more kernel k_{C} defined in the third dimension with hyperparameter C, which contains the latent representations of the third dimension, e.g. the adjustable parameters PA. Even more generally, considering an arbitrary number of dimensions n, the separable kernel is defined by a product k = ∏_{e=1,...,n} k_{e} of sub-kernels k_{e}, wherein each sub-kernel k_{e} is designed such that it measures the similarity between two variables VART_{e,t1}, VART_{e,t2} with t1,t2 ∈ [1,...,T_{e}] of the same variable type VART_{e}.
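The three-way product kernel can be sketched analogously to the two-way case; the RBF base kernel and the latent matrices A, B, C are invented for illustration:

```python
import math

def rbf(u, v):
    """RBF base kernel on two latent vectors (illustrative choice)."""
    return math.exp(-0.5 * sum((x - y) ** 2 for x, y in zip(u, v)))

# Invented latent matrices for three variable types (e.g. OS, DEV, PA).
A, B, C = [[0.0], [1.0]], [[0.0], [1.0]], [[0.0], [2.0]]

def k_coreg3(idx, idx2):
    """k^coreg((i,j,l),(i',j',l')) = k_A(i,i') * k_B(j,j') * k_C(l,l')."""
    (i, j, l), (i2, j2, l2) = idx, idx2
    return rbf(A[i], A[i2]) * rbf(B[j], B[j2]) * rbf(C[l], C[l2])

val = k_coreg3((0, 0, 0), (1, 1, 1))
print(round(val, 6))  # exp(-0.5) * exp(-0.5) * exp(-2) = exp(-3) ≈ 0.049787
```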
[0074] While the present invention has been described above by reference to various embodiments, it should be understood that many changes and modifications can be made to the described embodiments. It is therefore intended that the foregoing description be regarded as illustrative rather than limiting, and that it be understood that all equivalents and/or combinations of embodiments are intended to be included in this description. Thus, the invention is not restricted to the above illustrated embodiments but variations can be derived by a person skilled in the art without deviation from the scope of the invention.