Field of the Invention
[0001] The invention relates to a method for providing an agent for creating a graph neural
network architecture, a method for creating, by an agent, a graph neural network architecture,
an agent and a unit for providing an agent.
Further, the invention relates to a computer program product and computer readable
storage media.
Background
[0002] During the design or configuration of a complex system, engineers have to find a
solution with which the complex system provides sufficiently good performance. Using
their domain knowledge, they develop a system that satisfies a number of functional and
non-functional requirements. For cost and efficiency reasons, this development is done
in a first step on a computer, before prototypes are produced. The solution found,
e.g. a corresponding component of a car, is provided by the engineering service
to a customer, e.g. a car manufacturer, where it is, depending on the manufacturer's
decision, realized, e.g. as a prototype, which might then, possibly after amendments,
be put into practice.
[0003] An example of a complex system could be a hybrid vehicle where a functional requirement
is speech recognition capability and a non-functional requirement is reaching 100
kilometers per hour from a standstill in 4 seconds without consuming more than 20
ml of fuel. In this context, functional requirements specify what a system should or
should not do, e.g., to have speech recognition or acceleration capability, whereas
non-functional requirements specify how it should be done, e.g. such that the consumption
restriction is obeyed. As there are many options to adapt the complex system, this
task relies on the experience of the engineer, who must consider a multitude of possible
discrete configurable options, e.g., electric motor and/or internal combustion engine,
and continuous options, e.g., engine size or battery capacity. The number of possible
systems that can be generated by varying the options grows exponentially with the number
of options, and very often only a small portion of those possibilities satisfies the requirements.
[0004] Engineers leverage their experience to decide which designs are most promising, but
still require feedback from simulation environments to determine if the requirements
are satisfied.
Simulation here means that the behaviour of the complex system is approximated by a
deterministic model in which the output data of interest are determined from input
data. However, simulations are often time consuming as e.g., many interdependencies
must be considered and reflected in the respective algorithms used for the simulation.
Sometimes the interdependencies are not even known and therefore cannot enter the
simulation.
[0005] Therefore, to provide quick feedback to the engineer, the use of neural networks has
been proposed in recent years as an alternative. The behaviour of the complex system
can be modeled by a neural network. For this, the neural network is trained
by using training data in order to reflect the complex system's behaviour. During
the training, feedback has to be given on the results obtained by the neural network
in order to gradually develop a sufficiently good representation of the complex system
by the neural network.
[0006] This "training" or feedback process requires a lot of time and work. Moreover, there
are of course many different complex systems or problems to be explored in the complex
systems and the above described process must be performed for every single complex
system or problem. Therefore, a lot of time and money is required to perform this
engineering task to find an optimum design e.g. for a hybrid car or a production line.
[0007] It is therefore one object of the invention to offer a possibility for making the
design process more efficient and thus providing the possibility to improve the design
process.
Brief Summary of the Invention
[0008] This is solved by what is disclosed in the independent claims. Advantageous embodiments
are subject of the dependent claims.
[0009] The invention relates to a method for providing an agent, which can create a graph
neural network architecture. The invention further relates to a method for creating
a graph neural network architecture by an agent, an agent, a unit for providing such
agent, a computer program product and storage media with such computer program product.
[0010] It is one consideration in the context of the invention to create a graph neural
network not by hand, i.e. by an engineer using his/her technical domain knowledge
and knowledge about how to implement this using neural networks, but rather to
have this graph neural network created by software adapted to this task by machine
learning.
In particular, this has the advantage of reducing costs. Further, the task can be
performed faster. Even further, more possibilities for the graph neural network can
be explored, which leads to better predictions by the created graph neural network.
The predictions contain technical indicators, e.g. fuel consumption, of a complex
system, e.g. a hybrid car. Therefore, the design process for the complex system can
be significantly improved. Even further, previously unknown dependencies between a large
variety of systems can be considered.
[0011] An agent is a unit which can perform actions to create the graph neural network architecture
autonomously. It may be a piece of software which has undergone the process of machine
learning.
[0012] A graph neural network architecture is a combination of neural network components
of which at least one is a graph neural network layer. Graph neural networks (GNNs)
can capture dependencies in graphs. A complex system, i.e. a system having multiple
components and interdependencies, can be described by a graph which contains nodes
and edges interconnecting the nodes.
[0013] For the iterative process in which the agent learns how to modify a GNN in an advantageous
way, a variety of system designs of complex systems is provided to the agent. The
variety may be chosen such as to provide different system designs in the same technical
field. In the example of the hybrid car, they may comprise different implementations
of hybrid cars. Alternatively or additionally, the variety may comprise system designs
of cars with only electrical drive technology. Additionally, system designs of purely
combustion engine-based drive technologies may be added. Further, other systems using
electric motors, such as quadcopters, may be part of the variety.
[0014] According to an advantageous embodiment the iterative process is a machine learning
process, by which a feedback loop in the iterative process can be sped up. In particular,
the machine learning may be a reinforcement machine learning.
[0015] An iterative process, in particular a machine learning process, is started by the
agent. In this iterative process an initial starting graph neural network architecture
is modified in at least one aspect, e.g. adding a graph convolutional layer, and thus
a different, intermediate graph neural network architecture is obtained.
[0016] This intermediate graph neural network architecture is then trained with training
data for the chosen system design describing a complex system, e.g. a specific hybrid
car.
[0017] For example, the training may be finished if a certain amount of training data provided
for the hybrid car has been used, if a predefined set of indicators determined at
the end of a loop is within a certain range around the value expected from the
actually measured data, or after a certain amount of time has elapsed.
According to an advantageous embodiment, the training may use the same training parameters,
e.g. batch size, for each loop. Thus, the results of the determination of
the indicator are directly comparable.
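The three example stopping conditions can be sketched as a single check. This is only an illustrative sketch: the function name `training_finished`, its parameters and all thresholds are assumptions introduced here and are not prescribed by the description.

```python
def training_finished(samples_used, max_samples,
                      predicted, measured, tolerance,
                      elapsed_s, max_seconds):
    """Return True if any of the three example stopping conditions holds.

    All names and thresholds are hypothetical illustrations.
    """
    # (a) a certain amount of training data has been used
    data_used_up = samples_used >= max_samples
    # (b) every predicted indicator lies within a range around the measured value
    within_range = all(abs(p - m) <= tolerance
                       for p, m in zip(predicted, measured))
    # (c) a certain amount of time has elapsed
    timed_out = elapsed_s >= max_seconds
    return data_used_up or within_range or timed_out
```

Keeping the thresholds and training parameters identical in every loop is what makes the resulting indicator values comparable between loops.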
[0018] After the training is finished, a prediction of at least one indicator for the complex
system and its quality is determined. For example, quality may be determined by accuracy
and/or required training time.
[0019] From the quality of the prediction, a reward value is calculated. For example, the
reward value depends on how much improvement in the quality of the prediction has
been achieved with the current intermediate graph neural network in comparison to
the previous intermediate graph neural network.
[0020] According to an advantageous embodiment, for the first loop the reward function has
a positive value, e.g. by ensuring that an initial quality value is zero or a negative
number. In this way the process can proceed to the next loop even though there was no
previous graph neural network.
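One simple way to realize such a reward is the difference in quality between consecutive loops. The sketch below assumes that quality is a positive score where larger is better; the function name `reward` and the use of a plain difference are assumptions, since the description only requires that the reward depend on the improvement.

```python
def reward(current_quality, previous_quality=None, initial_quality=0.0):
    """Reward as the improvement in prediction quality over the previous loop.

    Assumption: quality is a positive number where larger is better
    (e.g. an accuracy score). In the first loop there is no previous
    architecture, so the initial quality (zero or negative) is the baseline.
    """
    baseline = initial_quality if previous_quality is None else previous_quality
    return current_quality - baseline
```

With an initial quality of zero, the first loop always yields a positive reward for any positive quality, while later loops are rewarded only if the modification actually improved the prediction.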
[0021] This process of modifying, training, determining, evaluating until deriving a reward
function is repeated until an exit criterion is met, i.e. the predictions meet a predefined
property, e.g. are within a predefined accuracy range.
[0022] The agent then chooses a new system design and starts to perform a new iterative
process until again a suitable graph neural network architecture is achieved for the
newly picked system design.
[0023] This is repeated for a plurality of system designs until an agent exit criterion
for the agent is met.
According to an advantageous embodiment the agent exit criterion is that all available
system designs out of the variety have been used in the iterative process. Thus, it
can be ensured that all available information has been used and that the agent could
gain the widest possible "experience". According to another advantageous embodiment
the agent exit criterion is met if the iterative process has gone through a predefined
subgroup, e.g. all system designs for hybrid vehicles. This reduces the time needed
to obtain an efficient agent. Moreover, most of the information contained in the system
design, which is relevant for hybrid cars should be collected in this way if there
are no technical overlaps with system designs for e.g. production systems.
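The nesting of the two loops described in paragraphs [0015] to [0023] can be sketched as follows. The helper functions `modify` and `train_and_evaluate` are hypothetical toy stand-ins (no real training takes place), the architecture is represented as a plain list of layer names, and the update of the agent from the reward values is omitted.

```python
def modify(architecture):
    """Hypothetical modification step: add one graph convolutional layer."""
    return architecture + ["graph_conv"]

def train_and_evaluate(architecture, system_design):
    """Toy stand-in for training and for determining the prediction quality.

    A real implementation would train the intermediate architecture on
    training data for `system_design`; here quality simply grows with depth.
    """
    n = len(architecture)
    return n / (n + 1.0)

def create_architecture(system_design, target_quality=0.7, max_loops=20):
    """Inner iterative process for one system design."""
    architecture = ["node_encoder"]     # initial starting architecture
    previous_quality = 0.0              # initial quality, so the first reward is positive
    rewards = []
    for _ in range(max_loops):
        architecture = modify(architecture)              # intermediate architecture
        quality = train_and_evaluate(architecture, system_design)
        rewards.append(quality - previous_quality)       # reward value of this loop
        previous_quality = quality
        if quality >= target_quality:                    # exit criterion
            break
    return architecture, rewards

def run_over_designs(system_designs):
    """Outer loop; agent exit criterion here: all designs have been used."""
    return {name: create_architecture(name) for name in system_designs}
```

Calling `run_over_designs(["hybrid_vehicle", "quadcopter"])` runs the inner process once per system design; restricting the list to a predefined subgroup corresponds to the second agent exit criterion.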
[0024] The thus obtained agent is able to create a suitable graph neural network architecture
also for unknown system designs, e.g. a new hybrid car.
The suitable graph neural network architecture is obtained when the agent has finished
the iterative process of modifying the graph neural network architecture, i.e. the
exit criterion is met. Then the last intermediate graph neural network is taken as
the suitable graph neural network architecture.
Advantageously, predictions of indicators, e.g. key performance indicators, for
this new hybrid car can then be made in a shorter time than if the graph neural network
architecture were designed by a machine learning expert, and moreover more or predefined
information can be used for this design.
This can make the predictions more reliable and also more understandable. Furthermore,
the predictions become more accurate, e.g. because more relevant information and its
previously unknown interdependencies are considered.
[0025] A unit that provides such an agent may be a computing system with a collection of
software, e.g. Siemens TIA or Siemens Simcenter™. The computing system comprises at
least one processing unit, one storage unit and one communication
interface. For example, the agent may be executed on a computer system on the provider's
side, and a customer enters its system design via the communication interface and receives
the desired predictions.
[0026] According to an advantageous embodiment, the agent may be downloaded from there or
provided as a computer program otherwise, e.g. on computer readable storage media.
Brief Description of the Drawings
[0027] Further embodiments, features, and advantages of the present invention will become
apparent from the subsequent description and dependent claims, taken in conjunction
with the accompanying drawings, which show:
Fig. 1 an example of how actual complex systems, i.e., real-world systems, are translated
into knowledge graphs by using a standardized modelling language;
Fig. 2 and 3 an example of the standardized network model of Fig.1, centre, in more
detail;
Fig. 4 the knowledge graph of the hybrid vehicle at the top right of
Fig. 1 in more detail;
Fig. 5 a high-level overview of possible components used in a neural network architecture
which is obtained by machine learning and which describes the complex system;
Fig. 6 a schematic overview of an iteration step for a machine learning process for
obtaining a neural network architecture which describes a complex system;
Fig. 7 a schematic overview of creating a graph neural network architecture by an
agent for a specific system design;
Fig. 8 a schematic overview of a reinforcement machine learning process in which an
agent is created, which is able to find a suitable graph neural network architecture
for an unknown system design;
Fig. 9 a result neural network architecture achieved by an agent trained by a machine
learning method for a system design that has not been part of the training data.
Technical Field
[0028] Software suites, i.e. collections of software available to support the design
and/or configuration of complex systems, are offered for various applications such as
construction tasks, industry automation designs or chemistry. Examples at Siemens
are, e.g., Simcenter™ or the TIA (Totally Integrated Automation) Portal.
[0029] These tools can be used to create a wide variety of systems ranging from hybrid vehicles
and quadcopters to factory automation systems.
[0030] Given the diversity of the application domains, it is a challenge to design a single
machine learning model that is capable of learning under all application scenarios.
The typical approach is to design a separate model for each application area. However,
since the provider of the software suite is often unaware of all the system types a
user might design, this standard approach is inadequate. Furthermore, designing a separate
machine learning model for each system type is very time consuming and requires the
input from a machine learning expert.
[0031] The context of the invention comprises using
- a highly flexible network structure to represent a system design, which describes
a complex technical system such as a hybrid car, a quadcopter, a production line
etc.
- graph neural networks (GNN) which are capable of operating on these network structures,
i.e. can extract relevant system design information.
- a neural network architecture search which can automatically construct optimal graph
neural networks for a wide variety of domains. By a domain a certain group of system
designs is meant, e.g. a group of hybrid cars or a group of production systems. In
the context of the invention a machine learning approach is proposed to speed up the
process of constructing the "optimal" graph neural network.
- a standardized language (i.e., ontology) for describing the elements of a complex
system as a knowledge graph that makes the neural architecture search more efficient.
[0032] Up to now, simulations have often been used to provide, for a given system design,
a simulation environment which gives feedback to the design process regarding
performance properties of the system.
[0033] In addition, modeling of the complex system by a neural network has been used.
However, for each complex system, e.g., hybrid vehicle, a separate machine learning
model needs to be developed and trained by an ML (machine learning) expert.
Knowledge Graph Representation of Complex Systems (Fig.1)
[0034] A representation of complex systems by knowledge graphs and attributes is also described
in the application EP 20191767.1 by the same inventors, in particular in the description
of Figures 1a (representation of a complex system by a graph) and 1b (numerical attributes).
[0035] Investigations showed that, in view of the diversity of complex system types, a
standard tabular representation, as used in machine learning according to the
prior art, does not provide the necessary flexibility to capture all the nuances of
the problem domain. For this reason, according to the invention, it is proposed to
model all systems as a knowledge graph composed of entities and relations.
[0036] To translate the complex system designs into a knowledge graph, an ontology, i.e. a
standardized language that describes elements of complex systems and relationships
between the elements, is advantageously used. For example, by describing all motors,
electrical components and hardware interfaces using a common language, the machine learning
solution can learn to leverage the commonalities between differing system types and more
efficiently find performant machine learning models, as will be set out below with regard
to the figures.
[0037] With regard to Fig. 1 it is explained how a knowledge graph KG is produced from a
complex system CS. Descriptions of complex systems CS are translated into knowledge graphs
KG using a standardized modelling language. On the left are representations of a particular
design for three different complex systems CS.
A system design SD is then composed of a knowledge graph KG and optionally attributes
ATT, which serve as input data ID for the graph neural network architecture GNN,
see Fig. 5. A system design SD can result from actual systems in the real world or
from simulated data. For the variety of systems used for training the agent, system
designs from the real world, simulated system designs and/or combinations thereof
can be used. According to an advantageous embodiment, also for an individual system
design SD part of the underlying information is taken from actual measurements and
other parts from simulations.
[0038] According to an advantageous embodiment, there is at least one group of attributes
ATT which refers to a subset of the nodes, e.g. a motor property refers only to nodes
related to the motor. Alternatively or additionally, there is a first group of attributes
which refers to a first subset of nodes and a second group of attributes which refer
to a second subset of nodes etc.
[0039] According to an advantageous embodiment the attributes ATT are combined with graph
encoded data only after the graph encoding. Thus, attributes ATT not relating
to all nodes can be adequately considered. Using them already for the node
encoding would make it necessary to set the value of the attributes for unrelated
nodes e.g. to 0, but as 0 could have the meaning "not applicable" or "value is zero"
this would lead to ambiguities or sparsity issues when doing the node encoding.
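Combining the attributes only after the graph encoding can be sketched as a concatenation of the pooled graph vector with the attribute vector. The function name `combine_after_encoding`, the use of sum pooling and all numerical values below are assumptions chosen for illustration.

```python
import numpy as np

def combine_after_encoding(node_embeddings, attributes):
    """Pool node embeddings into one graph vector, then append attributes.

    node_embeddings: array of shape (num_nodes, dim), output of the graph encoding.
    attributes: 1-D array of numerical attributes ATT of the system design.
    """
    graph_vector = node_embeddings.sum(axis=0)   # sum pooling (an assumed choice)
    return np.concatenate([graph_vector, attributes])

# toy values: 3 nodes with 2-dimensional embeddings, one motor-related attribute
node_embeddings = np.array([[0.2, 0.1], [0.4, 0.3], [0.0, 0.5]])
motor_attributes = np.array([75.0])
combined = combine_after_encoding(node_embeddings, motor_attributes)
```

Because the attribute is appended once per design instead of once per node, no placeholder value such as 0 has to be invented for nodes to which the attribute does not apply.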
[0040] The system type shown at the top is a hybrid vehicle HV with its electric and fuel
driven power train, in the middle there is a quadcopter QC with its 4 rotors, and
at the bottom there is a transmission unit TU.
[0041] These are only exemplary systems; the system could equally be a manufacturing unit,
a robot, a chemical substance or molecule, a computer system, a smart energy system
in a house etc.
[0042] The data relating to these complex systems CS are used as input data for a standardized
network model SNM at the centre of Fig.1 which uses a standardized modelling language
SML to describe the respective complex system CS with its components and their relationships.
[0043] As output data the knowledge graphs KG on the right are produced: at the top the
knowledge graph KG_HV for the hybrid vehicle HV, at the centre the knowledge graph KG_QC
for the quadcopter QC and at the bottom the knowledge graph KG_TU for the transmission unit
TU. These represent the system designs on the left using a standardized modeling language
SML for the elements of the design and their relations. These elements can be e.g.
motor types, electrical components, hardware interfaces etc.
[0044] A knowledge graph KG comprises nodes and edges between the nodes; the edges can be
unidirectional or bidirectional. A node may represent an element and an edge a relation
between elements. This will be further explained in relation with Fig. 4.
[0045] By a knowledge graph KG the data structure of the complex system CS is described.
The data structure is a formal representation of the engineering specification, which
may be provided by a customer, e.g. a car manufacturer who needs in return a description
of a specific design for producing prototypes, real car components, cars etc. This
multi-relational engineering specification comprises heterogeneous information about
components, component properties and how and which of the components are related.
From this specification, nodes and further information describing the nodes, e.g.
a type of the node or an attribute of the node, and the edges, i.e. connections between
source and target nodes, can be extracted, e.g. by using graph regressors, and form
the knowledge graph, which serves as input data for a graph neural network GNN,
e.g. a graph convolutional neural network.
[0046] Fig. 2 depicts in more detail the standardized network model SNM (centre of Fig. 1),
in which the system design of the complex system CS is described using a standardized
modeling language SML.
[0047] In Fig. 2, as an example, direct relations between the axles and other components
of the vehicle, such as gearbox and torque coupler, are shown as well as indirect relations,
i.e. relations via another component to the motor or generators. The square units at the
edges of the components denote a possible way of relation, e.g. electrical or mechanical.
[0048] In Fig. 3 attributes of some of the components of Fig. 2 are shown as rectangular
boxes. These attributes denote numerical values of a component,
such as the number of front or rear tires in the component "vehicle", the mass or
the number of axles. In the shown example all of the components have attributes.
[0049] These relationships and attributes are used when translating or encoding the description
of the complex system CS into a knowledge graph KG.
[0050] Knowledge graphs KG have, e.g., the following advantage: The usual representation
of an engineering design is a table. The disadvantage of tables is that complex relations
such as multi-relations cannot be captured. Another disadvantage is that a table,
if it is to be used as input data, always needs to have the same structure, e.g.
the same number of properties. For example, the number of axles always needs
to be contained in the same column, and the number of columns then needs to
be the same.
Hence, the graphical representation of the technical system, e.g. the hybrid car,
is much more flexible, e.g. if the number and types of components vary; it does not require
an ordering, which would lead to permutation variant representations; and moreover it can
contain the plurality of relations that exist. Further details are described in
relation to Fig. 3 of application EP 20191767.1 by the same inventors. Fig. 3a and 3b of
EP 20191767.1 show an example of isomorphic data structures. For the machine learning it
is important that identical or isomorphic structures do not lead to different input information.
The advantage of applying the proposed graph convolutional formulation is that it
is permutation invariant, as long as the node features are not encoded with the node
ID. This condition is met by using the node type, e.g. motor, battery, etc. Then e.g. in
a matrix H^(0), i.e. a matrix before application of a graph convolutional neural network
(GCNN), all features of e.g. node 1 would be grouped together in a column, but it would
not be required e.g. that it is column 1. As said before, from the data topology or
structure a sort of adjacency matrix Ã describing the link structure of the data
architecture is derived and enters the GCNN as an input value. When deriving a matrix
from the data structure, a numbering of nodes
has to be introduced in order to put e.g. connections starting from node 1 to other
nodes in row 1 and connections leading to node 1 in column 1 and so forth for columns
2, 3, 4 and 5. If the numbering were changed, the result would be a different
representation, i.e. a different matrix. In other words, by permutations different
matrices are obtained which describe the same data structure. This fact hampers the
machine learning process, because all possible permutations, of which there can be a
large number, would have to be used as training data. Therefore, a permutation invariant
representation of the data structure is used.
[0051] To obtain such a permutation invariant representation of the data structure, the
attributes of each node are used. By adding the attributes to individual nodes, they
are made different from each other so that they are not exchangeable anymore. Then,
for two isomorphic designs, even if the node orderings are different, the machine
learning model recognizes the designs as identical since the graph convolutions are
designed to be permutation invariant.
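The effect can be illustrated numerically. The sketch below assumes a widely used graph convolution with symmetric normalization and self-loops, followed by sum pooling; this specific formulation, the toy graph and the random features and weights are assumptions and are not taken from the description.

```python
import numpy as np

def gcn_layer(adjacency, features, weights):
    """One graph convolution: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    a_tilde = adjacency + np.eye(adjacency.shape[0])     # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_tilde.sum(axis=1))
    a_norm = a_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ features @ weights, 0.0)

rng = np.random.default_rng(0)
adjacency = np.array([[0, 1, 1, 0],
                      [1, 0, 0, 1],
                      [1, 0, 0, 0],
                      [0, 1, 0, 0]], dtype=float)
features = rng.normal(size=(4, 3))   # toy stand-in for type-based node features
weights = rng.normal(size=(3, 2))

perm = np.array([2, 0, 3, 1])        # a different numbering of the same nodes
P = np.eye(4)[perm]                  # permutation matrix
adjacency_p = P @ adjacency @ P.T    # renumbered adjacency matrix
features_p = P @ features            # node features follow their nodes

pooled = gcn_layer(adjacency, features, weights).sum(axis=0)
pooled_p = gcn_layer(adjacency_p, features_p, weights).sum(axis=0)
matrices_differ = not np.array_equal(adjacency, adjacency_p)
outputs_match = np.allclose(pooled, pooled_p)
```

The two adjacency matrices differ, yet the pooled outputs coincide, because the convolution only reorders its rows under a renumbering and the sum over nodes does not depend on row order.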
[0052] In Fig. 3a and 3b of EP 20191767.1 two node structures are depicted. As said, if
such a node structure is represented
by a matrix, an ordering has to be given to the nodes, e.g. each node is given a number.
The node structures shown in Fig. 3a and 3b of EP 20191767.1 are isomorphic, i.e. they
can be mapped onto each other one-to-one; only the nodes are numbered differently.
By adding the attributes, a structure as shown in Fig. 3a of EP 20191767.1 is no longer
represented by the same matrix as the structure in Fig. 3b of
EP 20191767.1, because the nodes are no longer interchangeable, e.g. node 5 is not the same
as node 3, because it has different features, e.g. node type, attributes, ports
etc.
[0053] Fig. 4 shows the knowledge graph of the exemplary hybrid vehicle HV from the top
right of Fig. 1.
[0054] In the graphical representation there are nodes, which describe components, assets
and ports and which are identified by a node type. The root or central node or architecture
node HV describes a specific architecture.
[0055] For a planned hybrid vehicle HV, various data architectures can be created so that
different embodiments of the vehicle are described which differ e.g. in one or more
components.
[0056] The specific hybrid vehicle HV has several components, e.g. motor 0 M0 and motor
1 M1, battery 0 B0, vehicle chassis V0 and internal combustion machine ICE 0.
[0057] In Fig. 4 these components are depicted with a circle and are directly connected by an edge
to the root of the graph representing the hybrid vehicle HV. Optionally, there are
some components that are not varied during the current task, e.g. number of tyres,
axles, gearbox etc. In Fig. 4 such invariable components or "assets" are depicted
with a square.
[0058] In the following, both variable components and invariable components or assets are
referred to as "components".
[0059] A component has one or more ports across which a relation to other components is
established, e.g. electrically, mechanically, e.g. rotationally or via the chassis,
via a specific throttle of the internal combustion machine etc. These possible relations
via the ports are depicted by a triangle and form again nodes in the data topology.
[0060] A port represents a facility for an interaction, e.g. a mechanical or electrical one.
Each edge represents a correlation between the two connected components, e.g. a mechanical
coupling constant between two chassis parts, a torque coupler between an internal
combustion engine and a front or rear axle, or an electromagnetic coupling between components
of an electric motor.
[0061] These relations may lead via one edge from one component to another. This is depicted
in Fig. 4 between motor 1 and motor 0 with one edge via their port for rotational
interactions.
[0062] The components, invariable components or assets, and ports constitute nodes of a
data topology centered around a root node denoting a specific architecture of the
technical system, e.g. the hybrid vehicle. The nodes of the data topology are connected
by edges denoting a correlation between two nodes. This correlation can be unidirectional
or bidirectional.
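Such a data topology can be held in a very small typed graph structure. The sketch below is only an illustration: the class `KnowledgeGraph`, the node names and in particular the port names are hypothetical and merely follow the reference signs loosely.

```python
from dataclasses import dataclass, field

@dataclass
class KnowledgeGraph:
    """Minimal typed graph: nodes carry a node type, edges are (source, target) pairs."""
    node_types: dict = field(default_factory=dict)   # node name -> node type
    edges: list = field(default_factory=list)

    def add_node(self, name, node_type):
        self.node_types[name] = node_type

    def add_edge(self, source, target, bidirectional=True):
        self.edges.append((source, target))
        if bidirectional:
            self.edges.append((target, source))

kg = KnowledgeGraph()
kg.add_node("HV", "architecture")                    # root node
for name in ("M0", "M1", "B0", "V0", "ICE0"):
    kg.add_node(name, "component")
    kg.add_edge("HV", name)                          # component attached to the root
# ports are nodes as well; the port names below are hypothetical
kg.add_node("M0.rot", "port")
kg.add_node("M1.rot", "port")
kg.add_edge("M0", "M0.rot")
kg.add_edge("M1", "M1.rot")
kg.add_edge("M1.rot", "M0.rot")                      # rotational coupling of the two motors
```

Storing a bidirectional relation as two directed pairs keeps unidirectional and bidirectional correlations in the same edge list.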
Application of Graph Neural Network for Performance predictions in complex systems
described by knowledge graphs (Fig.5)
[0063] Based on the knowledge graphs KG, performance predictions should be made using a
suitable graph neural network architecture.
It is an object of the invention to obtain such a suitable neural network architecture
by using a machine learning method.
[0064] In Fig. 5 a high-level description of the elements used for designing a suitable network architecture,
which is capable of predicting a performance characteristic of a complex system, is
shown. This design is done by machine learning and results in a suitable graph neural
network GNN architecture.
[0065] The shown suitable graph neural network GNN architecture for describing a complex
system CS is advantageously comprised of a node encoding module NEM, into which the
input data ID are fed.
[0066] The input data ID is the knowledge graph KG. Optionally the knowledge graph KG contains
attributes ATT for individual nodes. A specific system design of a complex system can be
described by the knowledge graph KG and optionally the attributes ATT.
NEM (Node Encoding Module)
[0067] Data obtained from the node encoding module NEM are fed into a graph encoding module
GEM. The thus processed data then enter an output module OM.
[0068] The data obtained from the node encoding module NEM (NEM data) represent "low level"
features, i.e., features or properties solely referring to a specific node. In other
words, everything relevant for the identity of a specific node in the given complex
system is contained. For example, the NEM data may represent a motor with its weight and its
electrical or mechanical connection possibilities. In the example of the quadcopter
QC, e.g., it may further represent rotational speed and direction or bus ports. In the
example of a transmission unit TU, further connections to gearset, brake or clutch
may be considered. In other words, the intermediate graph neural network GNN learns
vector representations for all nodes which capture the structural identity of each
node (e.g., motors, batteries etc.) by encoding adjacency information.
[0069] In another example of industrial automation, it may represent a specific robot in
a production line and its properties. In another example of material science, it may
describe properties of an individual molecule.
GEM (Graph Encoding Module)
[0070] The data obtained from the graph encoding module GEM are referred to as GEM data
and represent information on the overall system, e.g. the effect of connections between
various nodes.
In the example of the hybrid vehicle HV it may represent oscillations that travel
over the whole vehicle due to the various masses of the components, e.g. motor and battery,
and the coupling strengths, e.g. stiffnesses, of the relevant connections. In the
example of the quadcopter QC it may represent the interaction between the four motors
and the shape of the wings, so that e.g. impacts on the direction in which it moves can be
deduced.
[0071] In another example of industrial automation, it may represent the impact a robot
at the entry of the production line may have on a further processing device somewhere
else in the production line. In another example of material science, it may represent
a property of a substance composed of various molecules as a whole (and not of single
molecules), e.g. its viscosity.
[0072] In the shown example of the graph encoding module GEM, a pooling P and a combination
of the pooled data with the attributes ATT take place.
OM (Output Module)
[0073] As said, the data obtained from the graph encoding module may be optionally pooled.
The data cannot be directly interpreted, i.e. do not show directly e.g. a physical
or technical or chemical meaning. What is done in the output module OM is to extract
a useful indicator from the graph encoded data, which are optionally pooled. Therefore,
the multi-dimensional encoded information is transformed to e.g. a continuous numeric
output or vector, w. For this advantageously a so called "dense layer" is used which
can be regarded as a "feed forward neural network" or multilayer perceptron.
[0074] In the output module OM the graph encoded data, which are optionally pooled, as shown
in Pooling P, are transformed such that the dimension is reduced. E.g. they are used
as input for dense layers DL. The dense layers DL can be realized by a multilayer
perceptron and reduce the number of dimensions of the (optionally pooled) graph encoded
data such that a vector denoting the sought indicator KPI is obtained.
[0075] According to an advantageous embodiment, pooling has been done and for the dense layers
DL the following activation function is used:
$$y = W_{out} \cdot \mathrm{ReLU}(W_h \cdot X_{pool})$$
wherein y is the output of the dense layers, hence the sought-for indicator, e.g.
an acceleration versus fuel consumption.
Xpool is the pooled graph encoded data.
[0076] Wh is the weight of layer h or in other words the set of parameters or weights of the
connections or edges used in a hidden layer of a neural network for the edges from
one layer to the node of another layer. An entry of this vector can be the weight
of an edge leading from a node in layer h to another node in a different layer.
Wout is the weight of the output layer, i.e. describes the weights of the edges leading
from the last hidden layer to the output layer.
Wout and Wh are both learned in the machine learning process.
[0077] ReLU (rectified linear unit) is an activation function for the dense layers which
is used in an embodiment and performs well for the described example, where indicators
KPI are derived for a hybrid car architecture or another technical system such as a
transmission unit or a flying object, e.g. a quadrocopter.
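By way of illustration only, the dense-layer computation of the output module OM (a hidden layer with ReLU activation followed by an output layer) can be sketched in Python as follows. The sketch assumes a single hidden layer without bias terms; the function names and all numeric weights are hypothetical and chosen only so that the result can be checked by hand.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def relu(v: Vector) -> Vector:
    """Rectified linear unit, applied element-wise."""
    return [max(0.0, x) for x in v]


def matvec(w: Matrix, v: Vector) -> Vector:
    """Multiply matrix w (rows x columns) with vector v."""
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) for row in w]


def dense_output(x_pool: Vector, w_h: Matrix, w_out: Matrix) -> Vector:
    """Hidden layer with ReLU, then a linear output layer (no bias: assumption)."""
    hidden = relu(matvec(w_h, x_pool))
    return matvec(w_out, hidden)


# Hypothetical pooled graph encoding with three dimensions.
x_pool = [1.0, -2.0, 0.5]
# Hypothetical weights: the hidden layer keeps the first two dimensions.
w_h = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
w_out = [[2.0, 3.0]]

y = dense_output(x_pool, w_h, w_out)  # one-dimensional indicator
```

In this toy case the negative second entry is set to zero by the ReLU, so only the first entry contributes to the indicator. In practice the weights would be learned rather than set by hand.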
[0078] For each of node encoding module NEM, graph encoding module GEM, and output module
OM, many possible options, i.e. different realisations, exist.
Variations in the Node Encoding Module of a Graph Neural Network
[0079] As an example, the node encoding module NEM may also comprise several graph
convolutional neural networks GCNN, each applied to the output of the previous GCNN,
as depicted in Fig. 6 with a first graph convolutional neural network GCNN1 and a second
graph convolutional neural network GCNN2. The described model learns vector representations
for all nodes which capture the structural identity of each node (e.g., motors, batteries etc.)
by encoding adjacency information.
[0080] Additionally or optionally, the node encodings NE emerging from the respective GCNN,
i.e. first graph convolutional neural network GCNN1 and second graph convolutional
neural network GCNN2, can be concatenated CC in a predefined manner.
[0081] According to an advantageous embodiment, the concatenation sets the focus differently
for different nodes. This is done in that for nodes at the edges of a layer 2nd-order
or higher-order relations are considered, whereas for nodes at the centre only first-order
relations are considered, or vice versa. By the concatenation, neighbouring
nodes can then be considered differently for individual nodes.
[0082] Further exemplary node encoding module options are
- Standard graph convolution where a convolutional operator is used, e.g.
$$H^{(l+1)} = \sigma\left(\tilde{D}^{-1} \tilde{A} H^{(l)} W^{(l)}\right)$$
wherein H is the representation of the nodes and l is a running variable denoting the
number of latent dimensions in the graph convolutional neural network or the convolutional
layer of the graph convolutional network. For l=0, H(0) represents node features, e.g. the type, which might be e.g. "component", or the number
and type of ports. H is iteratively updated and then, for values l>0, also represents
relations between the nodes.
σ is a sigmoid function which is used as an activation function of the GCNN.
The matrix D̃-1 is used for normalization and can be derived from the input and a diagonal matrix.
Ã is a matrix reflecting the topology of the data structure. E.g., Ã is an adjacency
matrix which describes the connections between one node and another
node for all nodes in the graphical representation, hence it represents essentially
the link structure.
W(l) is a parameter or weight denoting the strength of a connection between units in the
neural network. The advantage of this convolutional operator is its basic form. The
aggregation, i.e. gathering of information relevant for one specific node, is based
on mean values.
[0083] Alternatively, other convolutional operators can be used that are tailored for a
specific problem, e.g. for a description of vibrations in the chassis or also for
a process where gases or liquids or piece goods are produced by various transformations
along a production line.
- Graph convolution with attention mechanism: Generally speaking, for the network architecture graph convolution layers are stacked,
i.e. there are several hidden layers, i.e. layers between input and output layer.
Specific nodes in these stacked layers, e.g. nodes of one layer, can attend to features
of neighbouring nodes, e.g. in the same or in different layers. This has the effect
that different weights can be assigned to different nodes in a neighbourhood, without
requiring any complex calculations, e.g. matrix operations such as inversion, or further
knowledge on the graph structure.
- Self loops or concatenation: As shown in Fig.6 several graph convolutional neural networks may be used and the output thereof may
be concatenated in a prescribed way. Alternatively, instead of or additionally to
using more than one graph convolutional neural network, also the output of a graph
convolutional neural network may again be fed into the same graph convolutional neural
network, i.e. self-looped, and then concatenated.
- Aggregation function (sum, mean, max): By aggregation the gathering of information relevant for one specific
node is meant, see above. This may be based on mean values, sums of values or by taking
the maximum values.
- Number of layers: The number of hidden layers may be varied depending on the specific problem. According
to an advantageous example, one starts with one layer; many problems can
be modelled using two hidden layers.
- Dropout layers: Dropout works by randomly setting some of the neurons of a hidden layer to 0 during training.
This prevents neurons from co-adapting too much and thus overfitting the model, i.e.
describing it with more independent variables than actually needed. This is described
in more detail by Srivastava et al. in the article "Dropout: A Simple Way to Prevent Neural Networks
from Overfitting", which is published in Journal of Machine Learning Research 15 (2014),
1929-1958.
- Jumping knowledge layers: This addresses the problem that the range of "neighboring" nodes that a node's representation
draws from strongly depends on the graph structure. To adapt to local neighborhood
properties and tasks, by jumping knowledge layers, for each node, different neighborhood
ranges can be flexibly set to enable better representation of the complex system.
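By way of illustration only, a single graph convolution step with a selectable aggregation function (sum, mean, max) can be sketched in Python as follows. The sketch assumes that each node also aggregates its own features (a self-connection), that the graph is given as a neighbour list, and that a sigmoid is applied after the weight multiplication; the function names and the three-node example graph are hypothetical.

```python
import math
from typing import Dict, List

Vector = List[float]


def aggregate(vectors: List[Vector], how: str) -> Vector:
    """Combine feature vectors dimension by dimension."""
    columns = list(zip(*vectors))
    if how == "sum":
        return [sum(c) for c in columns]
    if how == "mean":
        return [sum(c) / len(c) for c in columns]
    if how == "max":
        return [max(c) for c in columns]
    raise ValueError(f"unknown aggregation: {how}")


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def graph_conv(h: List[Vector], neighbours: Dict[int, List[int]],
               w: List[Vector], how: str = "mean") -> List[Vector]:
    """One convolution step: aggregate, multiply by w (in_dim x out_dim), activate."""
    updated = []
    for node in range(len(h)):
        # Assumption: the node's own features take part in the aggregation.
        gathered = [h[node]] + [h[n] for n in neighbours[node]]
        agg = aggregate(gathered, how)
        z = [sum(a * w[i][j] for i, a in enumerate(agg)) for j in range(len(w[0]))]
        updated.append([sigmoid(v) for v in z])
    return updated


# Hypothetical chain of three components: 0 - 1 - 2, one feature per node.
h0 = [[1.0], [2.0], [3.0]]
neighbours = {0: [1], 1: [0, 2], 2: [1]}
w0 = [[1.0]]

h1 = graph_conv(h0, neighbours, w0, how="mean")
```

Applying `graph_conv` again to `h1` lets information from nodes two connections away enter each node's representation, which corresponds to stacking layers.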
Variations of the Graph Encoding Module in Graph Neural Networks
[0084] With respect to the graph encoding module, for the example shown in
Fig. 5 the pooling P can advantageously be done as maxpooling, i.e., considering only the strongest
values. Alternatively, pooling P can be done such that only values close to a mean value
are considered etc., see below. Also, with regard to the combination of data, various
combination schemes between attributes ATT and results of the pooling can be applied.
According to the shown example, the data that are output from the pooling P are simply
concatenated with the attributes.
[0085] Exemplary graph encoding module GEM options are
- Simple pooling (sum, mean, max):
By applying pooling P on the data, the dimension of the data is reduced. Thus, by
the pooling P the node representations for a variable number of nodes are compressed,
in particular to a predefined size that can be processed further. Thus, an independence
of the number of input nodes, which may vary, is achieved. A further advantage of
the pooling is that a focus can be placed on encoded information that is of particular
interest and which, because information of distant nodes is also encoded, might otherwise
be disregarded.
[0086] As said, an advantageous pooling algorithm is to consider only maximum entries in
a certain area, e.g. a row or line or sub-matrix, i.e. a matrix having lower dimension
than the complete matrix, e.g. a 2x2 matrix out of a 16x16 matrix. This allows focusing
on the most prominent values, which often have the largest influence on relations
between components.
[0087] According to an embodiment a pooling algorithm is used that

[0088] where x is a vector of the concatenated data matrix and r is the representation resulting
from the pooling P. i is the number of dimensions pooled data can assume. N_i is the
number of rows of the concatenated data obtained in the concatenation. Thus,
by the pooled data a representation is obtained that is independent of the original
number of nodes, which depends on the technical system to be investigated and on what
or which indicator in the technical system is of interest.
[0089] According to an advantageous embodiment the
[0090] In this embodiment a maximum pooling considering only the maximum value is done.
As was said, alternatively, other pooling methods such as taking a mean value or the
sum of the regarded entries can be taken.
- Hierarchical pooling, wherein the pooling is performed according to a defined relation, e.g. on the amount
of information a node gives to a neighbour.
- Attention based pooling: Therefore, attention weights are introduced for individual nodes in order to keep
the most relevant information when pooling is performed.
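By way of illustration only, simple pooling (sum, mean, max) over node encodings can be sketched in Python as follows. The sketch shows that graphs with different numbers of nodes yield pooled vectors of the same size, and that the pooled data can then be concatenated with attributes ATT. The function name and all numeric values are hypothetical.

```python
from typing import List

Vector = List[float]


def pool(node_encodings: List[Vector], how: str = "max") -> Vector:
    """Compress one row per node into a single vector, dimension by dimension."""
    columns = list(zip(*node_encodings))
    if how == "max":
        return [max(c) for c in columns]
    if how == "mean":
        return [sum(c) / len(c) for c in columns]
    if how == "sum":
        return [sum(c) for c in columns]
    raise ValueError(f"unknown pooling: {how}")


# Hypothetical node encodings with two latent dimensions each.
small_graph = [[1.0, 5.0], [3.0, 2.0], [0.0, 4.0]]      # three nodes
large_graph = small_graph + [[7.0, 1.0], [2.0, 2.0]]    # five nodes

pooled_small = pool(small_graph)   # size 2, independent of node count
pooled_large = pool(large_graph)   # size 2 as well

# Concatenation of pooled data with a hypothetical graph-level attribute.
attributes = [0.5]
graph_encoding = pooled_small + attributes
```

Because both pooled vectors have the same length, the subsequent output module can use a fixed input size regardless of how many components the system design contains.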
Variations of the Training in Graph Neural Networks
[0091] The result of the training should be a neural network architecture that describes
the relevant portions of the complex system CS sufficiently well so that the desired
predictions can be made.
[0092] As an example, for the training of a graph neural network GNN that models a new car,
e.g. hybrid car, available data sets from previous hybrid cars are taken. The data
sets may be obtained from measurements or simulations. For a training a predefined
first sub-set of the data set may be taken. To decide on the successful completion
of the training a second sub-set of the data set, which has not been used for the
training is taken. It is then investigated whether the graph neural network can predict
also the indicators of the second sub-set correctly.
[0093] For the training there are training specific parameters, which need to be specified,
which are e.g.:
- Batch size: The batch size is a hyperparameter that controls the number of training samples processed before
the model's internal parameters are updated, i.e. before the intermediate graph neural
network is modified.
- Learning rate: The learning rate controls how quickly the graph neural network is adapted to the
specific system design. If the learning rate is small, then there are only small changes
and more training epochs are required. Larger learning rates effect rapid changes
and require therefore fewer training epochs.
- Stopping criteria: It has to be decided when the training of a neural network is stopped. What is advantageously
used as stopping criterion is the following: The available training data are separated
into different sets, at least one set for the training and a disjoint validation set
for testing how good the predictions made by the created intermediate graph neural
network are. With ongoing training one will experience an increase in the quality
of predictions made for the validation set until a certain point, when a decrease in the
quality will start, meaning that the training data set is overfitted. This point can
be used as stopping criterion.
- Number of epochs: This is a hyperparameter that controls the number of complete passes through the training
data set. For example, it is assumed that there are 1000 samples (i.e. 1000 lines
of training data samples), the batch size is 5 and there are 500 epochs. Then the
data set is divided into 1000/5 = 200 batches, and the model will be updated 200 times per epoch.
As there are 500 epochs, the intermediate graph neural network will go through the
whole data set of 1000 samples 500 times. This results in a total of 200*500 batches
= 100,000 batches during the entire training process.
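By way of illustration only, the relation between batch size, number of epochs and number of updates, and a validation-based stopping criterion, can be sketched in Python as follows. With 1000 samples, a batch size of 5 and 500 epochs there are 200 batches per epoch and 200 * 500 = 100,000 batches in total. The function names, the `patience` parameter and the validation error values are hypothetical.

```python
import math
from typing import List, Tuple


def training_schedule(n_samples: int, batch_size: int, n_epochs: int) -> Tuple[int, int]:
    """Return (batches per epoch, total batches over all epochs)."""
    per_epoch = math.ceil(n_samples / batch_size)
    return per_epoch, per_epoch * n_epochs


def early_stopping_epoch(val_errors: List[float], patience: int = 2) -> int:
    """Return the epoch with the lowest validation error seen before the error
    has failed to improve for `patience` consecutive epochs."""
    best_error = float("inf")
    best_epoch = 0
    for epoch, err in enumerate(val_errors):
        if err < best_error:
            best_error, best_epoch = err, epoch
        elif epoch - best_epoch >= patience:
            break  # validation quality keeps decreasing: overfitting assumed
    return best_epoch


per_epoch, total = training_schedule(n_samples=1000, batch_size=5, n_epochs=500)

# Hypothetical validation errors: improving first, then getting worse.
val_errors = [0.9, 0.6, 0.4, 0.35, 0.37, 0.41, 0.5]
stop_at = early_stopping_epoch(val_errors)
```

The `patience` parameter is one possible way to avoid stopping on a single noisy epoch; the description above only requires that the turning point of the validation quality is detected.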
Variations of the Output Module in Graph Neural Networks
[0094] With regard to the output module there are, e.g., the options:
- Number of layers of the output module
- Dropout layers (see above)
- Number of hidden units, i.e. neurons per layer
[0095] To determine the ideal graph neural network GNN architecture, according to one aspect
of the invention it is proposed to train a reinforcement learning agent to choose among
the possible options.
[0096] By "ideal" a graph neural network architecture is meant which achieves prediction
results for the specified problem lying within the specified boundaries, e.g., accuracy,
training time etc.
[0097] Since the number of possible options for node encoding, graph encoding and output
module is extremely large, searching over all possibilities how the GNN architecture
is modified is practically infeasible. Random search will also lead to inadequate
results as it is not informed by any of the commonalities between system types and
would spend lots of time exploring architectures that perform poorly. By applying
the concept of the invention, these problems can be overcome.
Reinforcement Learning of the Agent (Fig.7 and 8)
[0098] For this reason, according to one aspect of the invention, reinforcement learning
is used to train the agent. In this process a reward signal provides feedback on whether
an actual modification was useful for modeling a specific complex system CS. In other
words, the modifications are conditioned on the properties of the complex system so
that the agent can more efficiently and thoroughly explore highly performant architectures.
[0099] The procedure of how the agent A learns a policy is detailed in Fig. 7, where it is
shown for one specific system design which is entered as agent input data AID. Once the
agent is trained, the process shown in Fig. 7 is applied to unknown system designs and the
agent applies the learnt policy, e.g. changes the network architecture in step ACT in an
advantageous way.
In Fig. 7, the agent A receives as agent input data AID a description GNN AD of a
start GNN architecture and the system description of the complex system CS as knowledge
graph KG. These agent input data are taken out of a variety of start GNN architectures
and a variety of system descriptions, see Fig. 8. Alternatively, the agent A starts
with a fixed start GNN architecture, e.g. a very simple one.
[0100] "Agent" A denotes a computer program which is able to act autonomously within a well-defined
range. The agent A in the context of the invention may decide on the various options
with regard to node encoding module NEM, graph encoding module GEM and output module
OM and training parameters, as set out above. Advantageously it may further decide
on a starting architecture by consulting previously solved problems instead of using
a predefined starting graph neural network. It may also decide on when an architecture
is good enough, e.g. that the predictions are within the desired range. This range
may be preset or defined while working on the problem depending on e.g. the work progress.
[0101] The agent A has a "policy" which will be explained further below. The policy determines
how it should modify the GNN architecture to try and improve prediction performance.
According to this policy, the agent A performs actions ACT on the GNN architecture.
The exemplary actions ACT shown in Fig.7 are to remove a graph convolutional layer
GCN and instead add a graph attention layer GAT.
[0102] In a training and evaluation step T&E, the new GNN architecture is trained with training
data for the specific complex system CS provided to the agent A and prediction performance
for this complex system CS is calculated. In the shown example, the change the agent
made has improved performance and therefore receives a positive reward R.
[0103] The reinforcement learning agent A generates a string that represents a GNN (graph
neural network) architecture. For example, the string "16 GAT mean dropout 32 GAT
mean" for the node encoder represents an architecture with two graph convolutional
layers with attention using mean aggregation and a dropout layer in between. The output
dimension of the first layer is 16 and that of the second is 32. The numbers 1 in the
second-last box from the bottom and 16 in the top box denote the dimensionality of the particular
neural network NN layer. Linear means that in the respective layer a "linear transformation"
takes place, i.e. a matrix multiplication without non-linear activation function.
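By way of illustration only, the interpretation of such an architecture string can be sketched in Python as follows. The sketch assumes a token order of output dimension, layer type and aggregation for each convolutional layer, with the keyword "dropout" standing alone; this grammar, the class names and the function name are hypothetical and inferred only from the example string.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class ConvLayer:
    out_dim: int       # output dimension of the layer, e.g. 16
    kind: str          # layer type, e.g. "GAT" or "GCN"
    aggregation: str   # e.g. "mean", "sum" or "max"


@dataclass
class Dropout:
    pass


Layer = Union[ConvLayer, Dropout]


def parse_architecture(spec: str) -> List[Layer]:
    """Turn a whitespace-separated architecture string into layer objects."""
    tokens = spec.split()
    layers: List[Layer] = []
    i = 0
    while i < len(tokens):
        if tokens[i] == "dropout":
            layers.append(Dropout())
            i += 1
        else:
            if i + 2 >= len(tokens):
                raise ValueError(f"incomplete layer description at token {i}")
            layers.append(ConvLayer(int(tokens[i]), tokens[i + 1], tokens[i + 2]))
            i += 3
    return layers


layers = parse_architecture("16 GAT mean dropout 32 GAT mean")
```

A structured representation of this kind makes it straightforward to express an action of the agent, e.g. replacing the `kind` of one layer, as a small change to the list.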
[0104] For each GNN architecture the agent A produces, the prediction performance of the
GNN architecture is computed.
[0105] By prediction performance is meant how well the indicators, e.g., key performance
indicators, of a complex system CS of interest can be predicted with relation to actual
data, i.e. data gained from a real world complex system such as a hybrid car. Alternatively
to actual data, also data obtained from a simulation can be taken as reference value.
[0106] As an example, the prediction performance may be a difference between actual or simulation
data and data produced by the current GNN architecture or any other sort of error
function that indicates the strength of the deviation.
[0107] The reward for the agent A is defined as the performance improvement gained by modifying
the GNN architecture. The reward is positive if the performance is improving and negative
if the performance is decreasing. In the shown example in Fig. 7, the reward is positive
because the performance has improved, i.e., the new intermediate graph neural network
architecture can better model the actual system design.
[0108] Performance as a reward is not limited to the actual learning task performance, but
could e.g., also take into account the memory-footprint or the training time of the
GNN architecture.
[0109] For example, the reward may be the difference between a previous error function from
a previous GNN architecture and the current architecture.
[0110] According to another example, alternatively or additionally to that difference, the
training time of the respective GNN architecture may enter the reward function, e.g.
(current error function * (training time of current system / average training time)) -
(previous error function * (training time of previous system / average training time)).
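By way of illustration only, an error function and two reward variants can be sketched in Python as follows. The sketch assumes a mean absolute error as error function and adopts the sign convention "previous minus current", so that a reduced error yields a positive reward as described for Fig. 7; this sign convention, the function names and all numeric values are assumptions.

```python
from typing import List


def mean_absolute_error(predicted: List[float], reference: List[float]) -> float:
    """Average deviation between predicted indicators and reference data
    (measured or simulated)."""
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(reference)


def reward(prev_error: float, cur_error: float) -> float:
    """Positive if the modified architecture predicts better than the previous one."""
    return prev_error - cur_error


def time_weighted_reward(prev_error: float, cur_error: float,
                         prev_time: float, cur_time: float,
                         avg_time: float) -> float:
    """Each error is scaled by the training time relative to the average,
    so that a slower architecture must reduce the error more to earn a reward."""
    return prev_error * (prev_time / avg_time) - cur_error * (cur_time / avg_time)


# Hypothetical values: the error halves, but training takes twice as long.
plain = reward(prev_error=0.4, cur_error=0.2)
weighted = time_weighted_reward(prev_error=0.4, cur_error=0.2,
                                prev_time=10.0, cur_time=20.0, avg_time=10.0)
```

In this toy case the plain reward is positive, whereas the time-weighted reward is zero, because the gain in accuracy is offset by the doubled training time.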
[0111] As said above, the agent A is according to an embodiment provided with an initial
baseline or starting GNN architecture as part of the agent's A input data AID and
then produces actions ACT based on a learned policy to iteratively modify the elements
of the GNN architecture.
[0112] By learned policy is meant that in the course of setting up intermediate GNN architectures
the agent has discovered which combinations of the options in the different modules
meet best a specific problem or system design described by the knowledge graph KG
of the complex system and hence result in an improvement of prediction performance
and consequently positive reward. The agent may then fall back on the experiences
made with previous GNN architectures.
[0113] This training process, which results in the agent's policy and which has also
been explained with regard to Fig. 7, is also depicted in
Fig. 8.
[0114] In a first step 1, a set of different system designs is provided as part of the agent's
A input data AID.
[0115] In a second step 2, the agent samples, i.e. takes, one system design at a time, which
is fully or partly described by a knowledge graph KG and optionally a start GNN architecture.
Advantageously the system designs refer to a similar technical area, e.g. vehicles
with a similar drive/propulsion unit if e.g. the desired indicators are in the context
of the drive/propulsion function. One advantage is that the system designs SD comprising
knowledge graphs KG and optionally the attributes ATT, which may apply only for individual
nodes, e.g. motor properties, can be derived not only from real world data but also
from simulation data. By the reliant transfer learning of the agent thus costly real
world experiments can be reduced.
[0116] In a third step 3 the agent starts the design process for the GNN from the start
GNN architecture and the sampled system design.
[0117] Then the following loop is performed: In a fourth step 4 the agent performs actions
which alter the basic or start GNN architecture.
[0118] The thus obtained intermediate GNN architecture is in step 5 trained and the predicted
indicators for the sampled system design achieved with this intermediate GNN architecture
are evaluated with respect to the actual, e.g., measured, values of this or these
indicator(s).
[0119] Depending on the amount of improvement in this cycle or loop, i.e. how much better
the predictions were with this intermediate GNN architecture in comparison to the
previous GNN architecture obtained in the previous cycle, a reward function is calculated
in step 6.
[0120] Then, at least one iteration takes place until an intermediate GNN architecture is
obtained, that meets predefined requirements such as prediction accuracy of the indicators
or required length of the training.
[0121] In order that the agent A acquires a policy, this process is repeated for all or a subset
of the system designs.
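By way of illustration only, the loop of steps 1 to 6 can be sketched in Python as follows. The description does not fix a particular reinforcement learning algorithm; the sketch therefore uses a simple value table with epsilon-greedy action selection as a stand-in for the policy, and a toy error function as a stand-in for training and evaluating a GNN. All names, the action set, the system design labels and the numeric constants are hypothetical.

```python
import random
from typing import Dict, List, Tuple

ACTIONS = ["add_GCN", "add_GAT", "remove_last"]  # hypothetical action set
# Toy assumption: each system design has an ideal number of attention layers.
TARGET_GAT_LAYERS = {"hybrid_vehicle": 2, "quadrocopter": 1}


def toy_error(design: str, arch: List[str]) -> float:
    """Stand-in for step 5: train the GNN and compare predictions with reference data."""
    return abs(arch.count("GAT") - TARGET_GAT_LAYERS[design]) + 0.1 * len(arch)


def apply_action(arch: List[str], action: str) -> List[str]:
    """Step 4: alter the architecture by adding or removing a layer."""
    if action == "remove_last":
        return arch[:-1]
    return arch + [action.split("_")[1]]


def train_agent(designs: List[str], episodes: int = 200, max_steps: int = 6,
                target_error: float = 0.35, epsilon: float = 0.2,
                lr: float = 0.5, seed: int = 0) -> Dict[Tuple[str, str], float]:
    rng = random.Random(seed)
    q: Dict[Tuple[str, str], float] = {}   # (design, action) -> estimated reward
    for _ in range(episodes):
        design = rng.choice(designs)        # step 2: sample one system design
        arch = ["GCN"]                      # step 3: fixed, very simple start architecture
        error = toy_error(design, arch)
        for _ in range(max_steps):
            if rng.random() < epsilon:      # explore
                action = rng.choice(ACTIONS)
            else:                           # exploit the current estimates
                action = max(ACTIONS, key=lambda a: q.get((design, a), 0.0))
            new_arch = apply_action(arch, action)
            new_error = toy_error(design, new_arch)
            reward = error - new_error      # step 6: improvement as reward
            key = (design, action)
            q[key] = q.get(key, 0.0) + lr * (reward - q.get(key, 0.0))
            arch, error = new_arch, new_error
            if error <= target_error:       # predefined requirement met
                break
    return q


q_table = train_agent(list(TARGET_GAT_LAYERS))
```

The value table here is conditioned only on the system design label. An agent as described above would instead observe the knowledge graph KG and the current GNN architecture, which is what allows a learnt policy to transfer to unseen system designs.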
[0122] After having acquired a policy the agent A can provide for an unseen system design
USD, i.e. a system that has not been part of its training, an output GNN architecture
that can sufficiently well model the unseen system design USD, i.e. make appropriate
predictions for indicators.
[0123] Then the intermediate GNN architecture becomes the output architecture. This result
is depicted in Fig. 9, where the agent provides for a system design the suitable output
GNN architecture OA.
TRAINING
[0124] The agent is thus trained to generalize to different types of systems described in
the standardized data model. This is done by randomly sampling a system for each episode
in which the RL agent is trained.
[0125] After each iteration, the GNN architecture is trained, performance is evaluated,
and a reward is provided, see Fig. 8. The agent is able to observe the current state of the
GNN architecture and also the graph representation of the complex system, see Fig. 8.
Since the complex systems are all described using the same standardized ontology, the
agent's policy can more easily generalize between the different system types.
This leads to faster convergence to highly performant GNN architectures.
[0126] This process is usually carried out for many thousands of iterations to train the agent
to make good decisions about how to modify the GNN architecture such that prediction
performance for a wide variety of system types improves.
[0127] As a result, the agent A provides as output a GNN architecture that should be "near" optimal
with regard to a system design on which it has not been trained. In other
words, the policy learnt allows the agent to provide GNN architectures for unseen
systems and thus transfers what was learnt during the training of the agent when receiving
unseen system input USD, see Fig. 9.
Advantages of the invention
[0128] One primary advantage is that a machine learning expert is not required to design
new machine learning models for each possible system type. This reduces the time to
develop a machine learning solution. Moreover, better machine learning solutions can
be achieved, as much more relations in the provided set of system designs can be explored.
In other words, without this technology, a solution may only be able to roll out this
kind of fast feedback loop for commonly seen system types, i.e. by the invention solutions
can be found that have not been found before, as the respective parameter regimes
have not been investigated, e.g. because relations were unknown.
[0129] In consequence, engineers can receive fast and mostly more comprehensive feedback
on their designs regardless of the type of system they are designing. Depending on
the application domain this can mean different things. In the space of vehicle design,
that could lead overall to, e.g., more fuel-efficient designs, longer ranges for electric
vehicles, or fewer product failures in the field.
Other Embodiments
[0130] Although the present invention has been described in accordance with preferred embodiments
or aspects thereof, it is obvious for the person skilled in the art that modifications
or combination between the embodiments, fully or in one or more aspects, are possible
in all embodiments.
[0131] The proposed methods can be performed by a computer, computer system, embedded computer
or any electronic circuit.
[0132] Parts of the description have been presented in terms of operations performed by
a computer system, using terms such as data and the like, consistent with the manner
commonly employed by those skilled in the art to convey the substance of their work
to others skilled in the art. As is well understood by those skilled in the art, these
quantities take the form of electrical, magnetic, or optical signals capable of being
stored, transferred, combined, and otherwise manipulated through mechanical and electrical
components of the computer system; and the term computer system includes general purpose
as well as special purpose data processing machines, routers, bridges, switches, and
the like, that are standalone, adjunct or embedded.
Additionally, various operations will be described as multiple discrete steps in turn
in a manner that is helpful to understand the present invention. However, the order
of description should not be construed as to imply that these operations are necessarily
order dependent, in particular, the order of their presentation.
[0133] Reference in the specification to "one embodiment" or "an embodiment" means that
a particular feature, structure, or characteristic described in connection with the
embodiment is included in at least one embodiment of the invention. The appearances
of the phrase "in one embodiment" in various places in the specification are not necessarily
all referring to the same embodiment.