Field of the invention
[0001] The invention relates to a method to filter a graph stored in a storage device.
Background of the invention
[0002] Networks, in particular directed networks, encode the flow of a system parameter
along multiple entities. One example for a network is a supply chain network, wherein
the system parameter may be a material or a planned distribution or a combination
thereof. Another example for the network is a process network, wherein the system
parameter may be an output of a process step, such as a material or a business document
or any combination thereof.
[0003] In a supply chain network, it is crucial to understand how materials and/or goods
are flowing and/or how material is planned to be distributed. Material flows and/or
planned distributions in supply chain networks fuel various use cases, for instance,
estimating which finished goods might be affected due to a raw material shortage or
calculating the carbon footprint of a product based on its components, their respective
components and so on. These networks can comprise an arbitrary number of plant and
material combinations, which are denoted stock keeping units (SKUs) in the following.
[0004] In a typical scenario of a supply chain network, several SKUs are involved in multiple
layers representing different stages in a production phase as well as in the distribution
of goods. In practice, both the production and the distribution are often scattered
over multiple manufacturers. The layers of SKUs can range from raw materials over
partially produced goods to the finished good itself. Due to the various layers between
raw materials and the finished good there are usually many SKUs involved which span
the supply chain network.
[0005] Likewise, single processes, such as business processes (e.g., an order process) or
technical manufacturing processes (e.g., processes executed in the SKUs of a supply
chain network) are in practice often part of a (large) process network. Hereby, processes
are executed in a computer system or with the aid of a computer system and may comprise
several process steps. The execution of a process is called a process instance. Each
process step may create data during the execution, which is stored in the computer
system, in which the process is carried out or with the aid of which the process is
carried out.
[0006] It is known to store process instances along with their process steps in a process
protocol, from which single process instances may be analyzed efficiently using classical
process mining techniques. However, in a realistic scenario, instances of different
processes can hardly be considered in isolation from each other; rather, they are
connected in a process network. In a process network, however, classical process mining
fails to analyze the interactions between process instances of two or more different
processes. In practice, process networks are widespread, as most organizations do not
run a set of independent sequential processes but rather many processes which interact
with each other and with processes of other organizations.
[0007] Technically, a network, such as the supply chain network or the process network,
may be represented using a graph. A graph usually consists of nodes and edges, wherein
the nodes may represent entities of the respective network and the edges interactions
between entities in the respective network.
[0008] In practice, the nodes (entities) are usually stored in a disconnected manner. For
instance, in a supply chain network, the SKUs are usually organized in different (relational)
data tables. In process networks, single processes are stored independently of each
other in a process protocol.
[0009] In practice, the landscapes of interconnected SKUs and/or interconnected processes
in modern networks change constantly. Therefore, traditional approaches that would
either manually connect one SKU to another (e.g., via joining tables in a relational
database) or manually connect one process instance to another are highly inefficient
and prone to errors. Thus, usual approaches for storing a graph representing a modern
network, in particular a supply chain network and/or a process network, may directly
lead to incorrect reconstructions and subsequent analysis of such a network.
[0010] Due to the large amount of data involved in such a network, the traditional approaches
for storing a graph representing the network furthermore often lead to timeouts, since
manually connecting the nodes of the graph, which represent entities in the network, is
computationally very demanding.
Object of the invention
[0011] Therefore, it is an object of the present invention to provide a method for filtering
on a graph representing a network, in order to efficiently extract nodes that are
interconnected to a selected node.
Solution according to the invention
[0012] Accordingly, a computer-implemented method for extracting a subgraph of a graph,
the subgraph starting from at least one selected node into a selected direction, is
provided. The graph represents a network and comprises multiple nodes and multiple
directed edges. Each directed edge connects a start node and an end node wherein each
directed edge is composed of an incoming edge which is connected to the end node and
an outgoing edge which is connected to the start node. Each node represents an entity
of the network. Each directed edge represents a relationship between two entities.
[0013] The method comprises:
Recording each start node in a first record of a first data structure stored with
a storage device, and each end node in a second record of the first data structure.
[0014] Each record comprises at least:
- a combination of a number of first attributes, in which an identifier of a node is
stored,
- a combination of a number of second attributes, in which an identifier of an incoming
edge is stored,
- a combination of a number of third attributes, in which an identifier of an outgoing
edge is stored.
[0015] The method further comprises storing, in the first record, the identifier of the
start node in the combination of the number of first attributes and a unique relationship
identifier in the combination of the number of third attributes. The unique relationship
identifier represents a step along a path in the network. In other words, the unique
relationship identifier represents an interaction between two entities of the network.
[0016] The method further comprises storing, in the second record, the identifier of the
end node in the combination of the number of first attributes and the unique relationship
identifier in the combination of the number of second attributes.
[0017] A value of the combination of the number of second attributes of the second record
matching the value of the combination of the number of third attributes of the first
record defines the directed edge between the start node and the end node. The subgraph
is extracted according to a traversal of the graph starting from the at least one
selected node into the selected direction according to a predefined graph traversal
protocol.
[0018] The method according to the invention has the advantage that a graph representing
a network can be filtered without a priori knowledge of which nodes are adjacent
to which node, i.e., of the neighborhood of nodes. The first data structure can be
recorded with nodes from different individual raw data sources according to a predefined
rule for creating the unique relationship identifier assigned to the incoming edge
or outgoing edge the nodes are connected to. At the time of recording the first data
structure the directed edges of the graph are not yet defined. The directed edges
and with those the graph as such only emerge from the recorded first data structure,
namely from finding identical unique relationship identifiers assigned to the second
attribute of the second record and the third attribute of the first record. Based
thereon, the emerged graph can be filtered using any graph traversal protocol. Furthermore,
subgraphs can be determined in real time, independent of the number of nodes.
[0019] Traditionally, a graph is first determined, i.e., its structure is to be known and
fixed accordingly, before the graph can be stored in a storage device and filtered
subsequently. According to the invention, in contrast, only the data which describes
the entities in the network is recorded in the first data structure, without knowledge
of the relationships within the network. A further technical advantage is that predecessor-successor
relationships of nodes (i.e. between the nodes) do not need to be explicitly stored,
which significantly reduces the amount of data. The graph representing the network
eventually results from the first data structure.
[0020] Filtering the graph by extracting subgraphs of any size from the graph which emerges
from the first data structure enables the efficient analysis of network effects even
in very large networks with thousands (and more) of interconnected entities, since
the number of interactions (directed edges between two entities) can be strongly reduced.
[0021] Further, the filter may be used for analyzing the graph representing the network
by means of a windowing technique, such as a sliding window, a tumbling window, etc.
In doing so, average network performance measures can be determined and compared across
different regions of the network.
[0022] The graph to be filtered can be adapted in a very flexible way by creating, updating
or deleting records of the first data structure. Similarly, insights on the graph
can be updated flexibly by using the method according to the invention to filter constantly
adapted graphs due to (rapidly) changing raw data.
[0023] The predefined graph traversal protocol is one of a group of algorithms, the group
consisting of:
- a breadth-first search for records comprising nodes which form directed edges with
the at least one selected node along the selected direction,
- a depth-first search for records comprising nodes which form directed edges with the
at least one selected node along the selected direction, or
- a combination thereof.
[0024] The breadth-first search comprises the following steps:
- a) identifying the at least one record of the at least one selected node based on
the selected direction and marking one record of the at least one selected record
as a current member of the subgraph,
- b) matching the value of the combination of the number of second attributes of the
second record with the value of the combination of the number of third attributes
of the first record to find at least one record comprising the node which forms the
directed edge along the selected direction with the current member of the subgraph
and marking the at least one found record as member of the subgraph,
- c) repeating step b) for every record found in step b), wherein with repeating step
b) each found record is marked as current record, until no further record is found
and marking all found records as member of the subgraph,
- d) repeating steps b) and c) for every record of the at least one selected records,
and
- e) extracting the records marked as member of the subgraph from the first data structure
to store the subgraph with the storage device.
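The breadth-first steps a) to e) can be sketched as follows. This is a minimal illustration, not the claimed implementation: it assumes that each record is a tuple (node, Signal_In, Signal_Out) with None for an empty attribute, that all records sharing a node identifier are followed jointly, and that a set of visited nodes guards against cycles. The function name `bfs_subgraph`, the single selected node, and the relationship identifiers "S1", "S2" and "S9" are hypothetical.

```python
from collections import defaultdict, deque

def bfs_subgraph(records, selected_node, direction="downstream"):
    """Return the records reachable from selected_node, in breadth-first order.

    records: iterable of (node, signal_in, signal_out) tuples, None = no edge.
    """
    # Downstream: follow Signal_Out (index 2) to a matching Signal_In (index 1).
    # Upstream: the roles of the two attributes are swapped.
    leave, arrive = (2, 1) if direction == "downstream" else (1, 2)

    by_node = defaultdict(list)     # node identifier -> its records
    by_arrival = defaultdict(list)  # relationship identifier -> records it arrives at
    for rec in records:
        by_node[rec[0]].append(rec)
        if rec[arrive] is not None:
            by_arrival[rec[arrive]].append(rec)

    members, marked = [], set()     # records marked as member of the subgraph

    def mark(rec):
        if rec not in marked:
            marked.add(rec)
            members.append(rec)

    seen = {selected_node}
    queue = deque([selected_node])
    while queue:
        node = queue.popleft()
        for current in by_node[node]:
            if current[leave] is None:
                continue            # nothing to follow in the selected direction
            mark(current)           # step a) for the selected node, step c) afterwards
            for found in by_arrival[current[leave]]:  # step b): match the identifiers
                mark(found)
                if found[0] not in seen:
                    seen.add(found[0])
                    queue.append(found[0])
    return members                  # step e): the extracted records

records = [
    ("C_123", None, "S1"),
    ("C_456", None, "S1"),
    ("P_123", "S1", "S2"),
    ("P_789", "S2", None),
    ("X", None, "S9"),              # unrelated edge, must not be extracted
    ("Y", "S9", None),
]
downstream = bfs_subgraph(records, "C_123", "downstream")
upstream = bfs_subgraph(records, "P_789", "upstream")
```

Indexing the records by relationship identifier once means that each matching step is a dictionary lookup rather than a scan of the whole data structure; that choice belongs to this sketch and is not prescribed by the method.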
[0025] The depth-first search comprises the following steps:
- a) identifying the at least one record of the at least one selected node based on
the selected direction and marking one record of the at least one selected record
as a current member of the subgraph,
- b) matching the value of the combination of the number of second attributes of the
second record with the value of the combination of the number of third attributes
of the first record to find at least one record comprising the node which forms the
directed edge along the selected direction with the current member of the subgraph
and marking the at least one found record as member of the subgraph,
- c) repeating step b) for one record found in step b), wherein with repeating step
b) the one found record is marked as current record, until no further record is found
and marking all found records as member of the subgraph,
- d) repeating steps b) and c) for every record found in step b),
- e) repeating steps b), c) and d) for every record of the at least one selected records,
and
- f) extracting the records marked as member of the subgraph from the first data structure
to store the subgraph with the storage device.
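A corresponding sketch of the depth-first steps a) to f) is given below, under the same assumptions: records are (node, Signal_In, Signal_Out) tuples, None marks an empty attribute, and a set of visited nodes guards against cycles. The recursive helper `visit` and the relationship identifiers "ab", "ac" and "bd" are hypothetical; for very deep graphs an explicit stack would be needed instead of recursion.

```python
from collections import defaultdict

def dfs_subgraph(records, selected_node, direction="downstream"):
    """Return the records reachable from selected_node, in depth-first order."""
    leave, arrive = (2, 1) if direction == "downstream" else (1, 2)

    by_node = defaultdict(list)     # node identifier -> its records
    by_arrival = defaultdict(list)  # relationship identifier -> records it arrives at
    for rec in records:
        by_node[rec[0]].append(rec)
        if rec[arrive] is not None:
            by_arrival[rec[arrive]].append(rec)

    members, marked, seen = [], set(), set()

    def mark(rec):
        if rec not in marked:
            marked.add(rec)
            members.append(rec)

    def visit(node):
        seen.add(node)
        for current in by_node[node]:
            if current[leave] is None:
                continue
            mark(current)
            for found in by_arrival[current[leave]]:  # step b): match the identifiers
                mark(found)
                if found[0] not in seen:
                    visit(found[0])  # step c): descend before handling siblings

    visit(selected_node)
    return members                   # step f): the extracted records

# Hypothetical graph: A -> B -> D and A -> C
records = [
    ("A", None, "ab"),
    ("A", None, "ac"),
    ("B", "ab", "bd"),
    ("C", "ac", None),
    ("D", "bd", None),
]
order = [rec[0] for rec in dfs_subgraph(records, "A")]
print(order)  # ['A', 'B', 'D', 'A', 'C']
```

Node "A" appears twice in the result because it is recorded in two records, one per outgoing edge.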
[0026] The selected direction is one of a network upstream direction and a network downstream
direction, wherein the at least one record of the at least one selected node, in which
the unique relationship identifier is assigned to the combination of the number of
second attributes, is identified for the network upstream direction, and the at least
one record of the at least one selected node, in which the unique relationship identifier
is assigned to the combination of the number of third attributes, is identified for
the network downstream direction.
[0027] In one embodiment, the network is a process network, wherein the process network
comprises two or more process instances of different processes and interactions between
process steps of process instances of at least two different processes. Each node
represents a process step of a process instance. The unique relationship identifier
represents a signal between the start node which forms part of a process instance
of a first process and the end node which forms part of a process instance of a second
process, in particular an output of the start node provided to the end node. The data
structure further comprises a fourth attribute, in which a sequence of the process
steps within a process instance is stored, such that the data structure forms an extended
process protocol.
[0028] Preferably, the fourth attribute stores an ordinal value related to the respective
process step. A record of the at least one found record is only marked as a member
of the subgraph if the ordinal value assigned to the fourth attribute of the record
is larger than the ordinal value of the current member of the subgraph.
[0029] The ordinal value can be a timestamp, for instance.
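The ordinal check can be illustrated with a short sketch for the downstream direction. It assumes records stored as dictionaries with the hypothetical keys `case`, `activity`, `ts`, `signal_in` and `signal_out`; the year in the dates and the third record are illustrative only.

```python
from datetime import date

# The year is arbitrary; only day and month matter for this illustration.
protocol = [
    {"case": "1", "activity": "A", "ts": date(2024, 12, 1),
     "signal_in": None, "signal_out": "XYZ"},
    {"case": "2", "activity": "B", "ts": date(2024, 12, 2),
     "signal_in": "XYZ", "signal_out": None},
    # Hypothetical record: same signal, but dated before activity A.
    {"case": "3", "activity": "C", "ts": date(2024, 11, 30),
     "signal_in": "XYZ", "signal_out": None},
]

def downstream_members(current, records):
    """Records whose Signal_In matches current's Signal_Out and which occur later."""
    return [
        rec for rec in records
        if rec["signal_in"] is not None
        and rec["signal_in"] == current["signal_out"]
        and rec["ts"] > current["ts"]   # ordinal value must be larger
    ]

found = downstream_members(protocol[0], protocol)
print([(rec["case"], rec["activity"]) for rec in found])  # [('2', 'B')]
```

The record of case "3" matches on the relationship identifier but is rejected because its ordinal value is not larger than that of the current member.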
[0030] In one embodiment, at least one node is both the end node connected to at least one
incoming edge and the start node connected to at least one outgoing edge, and wherein
the method further comprises recording each node of the at least one node in at least
one record, wherein, in each record of the at least one record, the identifier of
each node is assigned to the combination of the number of first attributes, a first
unique relationship identifier of one incoming edge is assigned to the combination
of the number of second attributes, and a second unique relationship identifier of
one outgoing edge is assigned to the combination of the number of third attributes.
[0031] In one embodiment, the number of first attributes in the combination of the number
of first attributes is one and/or the number of second attributes in the combination
of the number of second attributes is one and/or the number of third attributes in
the combination of the number of third attributes is one.
[0032] In one embodiment, the formed directed edge is recorded in a record of a second data
structure stored in the storage device. The second data structure comprises at least:
- a first attribute, in which an identifier of the start node connected to the formed
directed edge is stored,
- a second attribute, in which an identifier of the end node connected to the formed
directed edge is stored, and
- a third attribute, in which the unique relationship identifier of the formed directed
edge is stored.
[0033] Preferably, each record of the first data structure comprises a number of further
attributes, in which data characterizing the respective incoming edge and/or outgoing
edge is stored, and wherein at least one value of the number of further attributes
is retrieved and assigned to the formed directed edge, wherein the at least one retrieved
value is stored in a number of further attributes of the second data structure.
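A minimal sketch of how the second data structure could be populated from the first one follows. The dictionary keys, the function name `build_edge_table` and the further attribute `quantity` with its values are hypothetical, and taking the further attribute from the record of the start node is an assumption of this sketch.

```python
def build_edge_table(records, further=("quantity",)):
    """Materialize formed directed edges as rows of a second data structure."""
    ends_by_signal = {}
    for rec in records:
        if rec.get("signal_in") is not None:
            ends_by_signal.setdefault(rec["signal_in"], []).append(rec)

    edge_table = []
    for start in records:
        signal = start.get("signal_out")
        if signal is None:
            continue
        for end in ends_by_signal.get(signal, []):
            row = {
                "start_node": start["node"],   # first attribute
                "end_node": end["node"],       # second attribute
                "relationship_id": signal,     # third attribute
            }
            for name in further:               # further attributes of the edge
                row[name] = start.get(name)
            edge_table.append(row)
    return edge_table

records = [
    {"node": "C_123 - 100", "signal_in": None, "signal_out": "10012301", "quantity": 2},
    {"node": "C_456 - 100", "signal_in": None, "signal_out": "10012301", "quantity": 1},
    {"node": "P_123 - 100", "signal_in": "10012301", "signal_out": None},
]
edge_table = build_edge_table(records)
```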
[0034] Based on the further attributes, additional indicators on the directed edges, in
particular (process) performance indicators, can be accessed.
[0035] In one embodiment, the method further comprises a filtering of the records of the
second data structure by the at least one directed edge formed between adjacent members
of the subgraph.
[0036] In one embodiment, the nodes of the extracted subgraph are aggregated based on a
subordinate hierarchy level, wherein the formed directed edges are aggregated accordingly
yielding an aggregated subgraph, wherein the records of the second data structure
are filtered based on the aggregated subgraph.
[0037] Preferably, the filtered records of the second data structure are provided to a process
mining system for calculating at least one process performance measure.
[0038] Preferably, the storage device is a volatile memory, in particular the main memory,
of a computer system.
Brief description of the figures
[0039] Details and features of the invention as well as concrete embodiments of the invention
can be derived from the following description in connection with the drawing, in which:
- Fig. 1 shows a visualization of a directed edge of a graph representing a network, wherein
the graph is stored according to an embodiment of the invention;
- Fig. 2 shows a visualization of a graph representing a basic production network, wherein
a product is produced from two components;
- Fig. 3 shows a visualization of a graph representing a production network, wherein a product
is used to produce another product;
- Fig. 4 shows a visualization of a graph representing a basic distribution network;
- Fig. 5 shows a visualization of a graph representing an example of a combined production
and distribution network;
- Fig. 6 shows a flow chart of the method to store a graph representing a network according
to an embodiment of the invention;
- Fig. 7 shows a visualization of a graph which represents a process network;
- Fig. 8 shows a sequence diagram of a graph representing a process network, wherein the graph
is stored according to an embodiment of the invention;
- Fig. 9 shows a flow chart for an embodiment of the method for filtering on the graph stored
in the first data structure; and
- Fig. 10 shows a flow chart for step B of the embodiment of the method of Fig. 9.
Detailed description of the invention
[0040] Networks are used across industries to organize and communicate information efficiently.
In order to analyze the information comprised in a network, a technical representation
of the network is required, such that the network can be stored in a storage device
and evaluated, e.g. by performing calculations on the network, often repeatedly.
[0041] Usually, a graph serves as a technical representation of a network. Each edge of
the graph connects, or links, a start node and an end node. The node is a technical
representation of an entity of the network and the edge is a technical representation
of a relationship between two entities in the network.
[0042] On a technical level, a directed network can be represented by a directed graph.
The directed graph comprises multiple nodes and multiple directed edges, wherein a
directed edge points to one of the two nodes it is connecting.
[0043] A directed edge between the start node and the end node can represent a relationship
between the start node and the end node, wherein the relationship is characterized
by a direction, such as a signal which is issued from the start node and received
by the end node. Hence, multiple linked directed edges of a (directed) graph may represent
a path of a signal across the network, wherein the signal links one entity of the
network to another.
[0044] Fig. 1 shows a visualization of a directed edge of a graph representing a network, wherein
the graph is stored according to an embodiment of the invention.
[0045] The directed edge 15; 25 shown in Fig. 1 connects a start node 10 to an end node
20, wherein the edge 15; 25 is directed to point from the start node 10 to the end
node 20. In a graph 1 representing a supply chain network, for example, the start
node 10 can represent a combination of a raw material/component and a plant, and the
end node 20 can represent a combination of an intermediate component/product and a plant.
In a graph 1 representing a process network, for example, the start node 10 can represent
a process step of a first process instance and the end node 20 can represent a process
step of a second process instance.
[0046] The directed edge 15; 25 is composed of an outgoing edge 15 which is connected to
the start node 10 and an incoming edge 25 which is connected to the end node 20. A unique
relationship identifier 30 is assigned to each of the outgoing edge 15 and the incoming
edge 25. If the unique relationship identifier of the outgoing
edge 15 matches exactly the unique relationship identifier of the incoming edge 25,
as is the case in Fig. 1, then the directed edge 15; 25 is formed. Thus, the unique
relationship identifier serves as a connection means by which a pair of start node
10 and end node 20 is formed.
[0047] The unique relationship identifier 30 comprises data characterizing the relationship
between two entities, in particular a signal linking the two entities of the network.
In a supply chain network, for example, the unique relationship identifier 30 can
comprise a bill-of-material (BOM) identifier, an alternative bill-of-material (Alt
BOM) identifier and a plant identifier. To map planned distributions of materials
onto a supply chain network, the unique relationship identifier 30 can comprise a
receiving plant identifier and a material identifier. In one embodiment, wherein the
graph 1 represents a process network, the unique relationship identifier 30 can comprise
a serial number and/or identifiers associated with business documents, such as reference
numbers for orders, invoices, returns, etc.
[0048] The technical function of the unique relationship identifier 30 is to establish both
intra- and interorganizational links/connections between entities in the network.
In one embodiment, wherein the graph 1 represents a production network, the plant
identifier therefore is a crucial parameter of the unique relationship identifier
30 as it ensures that the SKUs are connected correctly within one plant. In one embodiment,
wherein the graph 1 represents a planned distribution network, the receiving plant
identifier is a crucial parameter of the unique relationship identifier 30 to ensure
that the SKUs of different plants/factories are connected correctly, i.e., according
to the underlying bill of distribution (BOD). In one embodiment, wherein the graph
1 represents a manufacturing process network, the serial number is a crucial parameter
of the unique relationship identifier 30 as it ensures that the flow of material and/or
components in a manufacturing process is correctly established between process steps
of different process instances, e.g. of an overall manufacturing process. Similarly,
in one embodiment, wherein the graph 1 represents a business process network, the
identifier usually associated with business documents is a crucial parameter for the
unique relationship identifier 30. In one embodiment, the graph 1 can represent any
combination of a supply chain network, a distribution network, and a process network.
[0049] According to the invention, the directed edge 15; 25 shown in Fig. 1 emerges from
recording the start node 10 and the end node 20 in the data structure along with the
corresponding unique relationship identifier 30. The data structure is hereinafter
also referred to as the first data structure.
[0050] In one embodiment, the data structure is a (relational) data table, wherein the start
node 10 and the end node 20 are recorded in two rows of the data table. The data table
is hereinafter referred to as the Signal Link Table. The minimal data set required to
store the start node 10 and the end node 20 of Fig. 1 in the Signal Link Table is
presented in Table 1.
Table 1: Minimal data set for storing an edge in a data structure.
| Node | Signal_In | Signal_Out |
| A (start node) | - | XYZ |
| B (end node) | XYZ | - |
[0051] Each record of the data structure of Table 1 comprises at least a combination of
a number of first attributes, in which an identifier of a node is stored, a combination
of a number of second attributes, in which an identifier of an incoming edge is stored,
and a combination of a number of third attributes, in which an identifier of an outgoing
edge is stored. In Table 1, the number of first attributes is one, i.e., there is
one first attribute which is labeled "Node". Similarly, the number of second attributes
and the number of third attributes are one. In this minimal example, the nodes are
identified by "A" and "B". In other examples, several first attributes which qualify
to identify a node in a graph may need to be combined in order to uniquely reference
nodes in the graph.
[0052] The second attribute in Table 1 is labeled "Signal_In". In the second attribute,
the unique relationship identifier of the incoming edge 25 which is connected to the
end node 20 of the same record is recorded. The incoming edge connected to a node
is a directed edge of which the node is the destination. In the example of Fig. 1,
end node 20 is the destination of the incoming edge 25 with the unique relationship
identifier "XYZ".
[0053] The third attribute of Table 1 is labeled "Signal_Out". In the third attribute the
unique relationship identifier of the outgoing edge 15 which is connected to the start
node 10 of the same record is stored. The outgoing edge connected to a node is a directed
edge of which the connected node is the origin. In the example of Fig. 1, the origin
of the outgoing edge 15 with the unique relationship identifier "XYZ" is the start
node 10.
[0054] Hence, the directed edge 15; 25 is formed by the outgoing edge 15 connected to the
start node 10 and the incoming edge 25 connected to the end node 20, as the value of
the "Signal_In" attribute of the second row in the Signal Link Table matches the value
of the "Signal_Out" attribute of the first row in the Signal Link Table. In this way,
the graph stored in the data structure can be established and prepared/provided for
analysis.
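The emergence of the directed edge from the two rows of Table 1 can be sketched as follows. The `Record` type and the function `form_edges` are illustrative names, and None stands for the empty "-" entries of the table.

```python
from collections import defaultdict
from typing import NamedTuple, Optional

class Record(NamedTuple):
    node: str                  # first attribute: node identifier
    signal_in: Optional[str]   # second attribute: identifier of the incoming edge
    signal_out: Optional[str]  # third attribute: identifier of the outgoing edge

def form_edges(records):
    """Return (start, end, relationship_id) for each Signal_Out/Signal_In match."""
    ends = defaultdict(list)
    for rec in records:
        if rec.signal_in is not None:
            ends[rec.signal_in].append(rec.node)
    return [
        (rec.node, end, rec.signal_out)
        for rec in records
        if rec.signal_out is not None
        for end in ends.get(rec.signal_out, [])
    ]

signal_link_table = [
    Record("A", None, "XYZ"),  # start node
    Record("B", "XYZ", None),  # end node
]
edges = form_edges(signal_link_table)
print(edges)  # [('A', 'B', 'XYZ')]
```

Neither row refers to the other node; the edge exists only because both rows carry the same relationship identifier.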
[0055] A composite unique relationship identifier 30 based on multiple second attributes
and/or third attributes enables the generation of different graphs from the same data
structure, wherein the graphs differ in particular in the directed edges between two
nodes. I.e., insights on various relationships between two entities in the network
can be gained efficiently.
[0056] In one embodiment, the directed edge 15; 25 shown in Fig. 1 can represent a relationship
in a process network. Table 2 gives an example for the data structure required to
store a start node 10 and an end node 20 such that a directed edge 15; 25 emerges,
the directed edge 15; 25 representing a relationship in the process network.
Table 2: Data set for storing an edge of a process graph in a data structure.
| Case ID | Activity | Timestamp | Signal_In | Signal_Out |
| 1 | A | 01. Dec | - | XYZ |
| 2 | B | 02. Dec | XYZ | - |
[0057] In this example, the number of first attributes is two. The combination of first
attributes is created from the "Case ID" attribute, in which process instances are
stored, and from the "Activity" attribute, in which process steps related to a process
instance are stored. The start node 10 is hence identified by "1A" and the end node
20 is identified by "2B".
[0058] In the example of Table 2, each node 10; 20 represents a process step of one of two
different interconnected process instances. The second attribute and the third attribute of
the data structure shown in Table 2 correspond to the second attribute and the third
attribute of the data structure presented in Table 1. Table 2 further comprises a
fourth attribute, in which a sequence of the process steps within a process instance
is stored, represented by the attribute "Timestamp" in Table 2.
[0059] Process protocols comprising the attributes "Case ID", "Activity" and "Timestamp"
are known. According to an embodiment of the invention, the process protocol is extended
by the second attribute, "Signal_In", and the third attribute, "Signal_Out", which
provide information about which process steps are connected to another process step
such that a process graph emerges from the process protocol.
[0060] Fig. 2 shows a visualization of a graph representing a basic production network, wherein
a product is produced from two components.
[0061] In the following, the emergence of the graph 1 shown in Fig. 2 from the data structure
according to an embodiment of the invention is derived from raw data tables comprising
information about SKUs, i.e., plant and material combinations of a supply chain network.
[0062] In this example, the data structure in which the graph 1 of Fig. 2 is to be stored
is populated with records created from two raw data tables, Table 3 and Table 4, both
presented below.
Table 3: Raw table comprising information about products and/or semi-finished goods.
| Material ID | Plant ID | BOM ID | Alt BOM ID |
| P_123 | 100 | 123 | 01 |
Table 4: Raw table comprising information about the components of a BOM from which
the products and/or semi-finished goods are to be produced.
| BOM ID | Alt BOM ID | Counter | Material ID | Plant ID |
| 123 | 01 | 1 | C_123 | 100 |
| 123 | 01 | 2 | C_456 | 100 |
[0063] Table 3 links bills of materials (BOMs) to SKUs. To achieve this, Table 3 lists products
or semi-finished goods along with their BOM identifier and their alternative BOM identifier.
Hence, the SKUs of Table 3 are technically represented by end nodes 20.
[0064] Table 4 contains information about individual BOMs and their components. That is, it
links BOMs (and related Alt BOMs) to SKUs from which products are to be manufactured.
Hence, the SKUs of Table 4 are technically represented by start nodes 10. Note that
the counter attribute in Table 4 is introduced for creating a primary key.
[0065] In an embodiment of the invention, the unique relationship identifier 30 is created
according to a predefined rule, for instance, a combination, in particular a concatenation,
of the plant identifier, the BOM identifier and the alternative BOM identifier for
each node 10; 20 of Table 3 and Table 4. The nodes of Table 3 and Table 4 are recorded
in the data structure, wherein the respective node identifier is recorded in the first
attribute of the data structure. For each node, the created unique relationship identifier
30 associated with the respective node is recorded in the third attribute or the second
attribute, depending on whether the respective node is a start node 10 or an end node
20. The resulting Signal Link Table is presented below as Table 5.
Table 5: Signal Link Table for storing the graph of Fig. 2 in a storage device.
| Case Key (Material-Plant) | Signal_In | Signal_Out |
| P_123 - 100 | 10012301 | - |
| C_123 - 100 | - | 10012301 |
| C_456 - 100 | - | 10012301 |
[0066] The node identifier is a combination of a material identifier and a plant identifier.
The resulting attribute is labeled "Case Key". In the "Case Key" attribute, the material
identifier, e.g., "P_123", is combined with the plant identifier, e.g., "100".
[0067] The unique relationship identifier 30 for the incoming edge 25 and the outgoing edge
15 connected to the nodes 10; 20 of the graph of Fig. 2 comprises the plant identifier
(the leading "100" in Table 5), followed by the BOM identifier and the alternative BOM
identifier ("123" and "01" in Table 5), of Table 3 and of Table 4, respectively.
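The population of the Signal Link Table from the two raw tables can be sketched as follows. The dictionary keys and function names are hypothetical; the concatenation order (plant identifier, BOM identifier, alternative BOM identifier) follows the rule described above.

```python
# Raw data corresponding to Table 3 (products) and Table 4 (components).
products = [
    {"material": "P_123", "plant": "100", "bom": "123", "alt_bom": "01"},
]
components = [
    {"bom": "123", "alt_bom": "01", "counter": 1, "material": "C_123", "plant": "100"},
    {"bom": "123", "alt_bom": "01", "counter": 2, "material": "C_456", "plant": "100"},
]

def relationship_id(row):
    """Concatenate plant identifier, BOM identifier and alternative BOM identifier."""
    return row["plant"] + row["bom"] + row["alt_bom"]

def case_key(row):
    """Node identifier: material identifier combined with plant identifier."""
    return f'{row["material"]} - {row["plant"]}'

def build_signal_link_table(products, components):
    table = []
    for row in products:    # SKUs of Table 3 are end nodes: fill Signal_In
        table.append({"case_key": case_key(row),
                      "signal_in": relationship_id(row), "signal_out": None})
    for row in components:  # SKUs of Table 4 are start nodes: fill Signal_Out
        table.append({"case_key": case_key(row),
                      "signal_in": None, "signal_out": relationship_id(row)})
    return table

signal_link_table = build_signal_link_table(products, components)
for rec in signal_link_table:
    print(rec["case_key"], rec["signal_in"] or "-", rec["signal_out"] or "-")
```

Each raw table is processed on its own; the two tables are never joined, and the rows only become connected through the identical identifier "10012301".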
[0068] The graph 1 visualized in Fig. 2 can be the output of a graph visualization device
to which the data structure according to Table 5 is provided for displaying the graph
representing the production network.
[0069] The basic example of Fig. 2 illustrates the technical advantages of the method to
store a graph 1 according to the invention over traditional approaches. While the
raw data tables, i.e., Table 3 and Table 4 essentially comprise data on the entities
of a network, traditional approaches require further information (e.g. structure
and/or schema information) on how the entities are connected in the network in order
to first create and then store a graph representing the network. In particular, connections
are established traditionally by joining the relevant tables in a data base. In the
example of Fig. 2 there are only two raw data tables, however, in practice there may
be hundreds, if not thousands, of data tables representing the entities of a network.
Using the method according to the invention to store the graph therefore has the
technical advantage that complex join operations become unnecessary. The nodes 10;
20 of the graph 1 can be stored independently, i.e., without knowledge of their neighborhoods,
in a data structure which maps exactly to the underlying graph structure. In particular,
every end node 20 is recorded with the incoming edge 25 it is connected to and every
start node 10 is recorded with the outgoing edge 15 it is connected to. As a result,
the graph 1, when needed to be visualized and/or analyzed, may be simply read out
from the data structure without the necessity of any complex operations such as joining
tables in a data base.
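How the graph may be read out from the data structure can be sketched as follows. This is a minimal illustration, not the claimed implementation: the records of Table 5 are held as plain tuples, and the helper name `read_edges` is hypothetical. A directed edge is formed wherever a "Signal_Out" value of one record equals a "Signal_In" value of another record.

```python
from collections import defaultdict

# Records of Table 5: (case key, Signal_In, Signal_Out); None marks a "-" cell.
TABLE_5 = [
    ("P_123 - 100", "10012301", None),
    ("C_123 - 100", None, "10012301"),
    ("C_456 - 100", None, "10012301"),
]


def read_edges(records):
    """Return (start node, end node, relationship identifier) triples."""
    end_nodes = defaultdict(list)
    for node, signal_in, _ in records:
        if signal_in is not None:
            end_nodes[signal_in].append(node)

    edges = []
    for node, _, signal_out in records:
        if signal_out is not None:
            # Every end node sharing the identifier closes one directed edge.
            for end in end_nodes.get(signal_out, []):
                edges.append((node, end, signal_out))
    return edges


edges = read_edges(TABLE_5)
```

Only the Signal Link Table itself is consulted; the raw data tables are not needed at read-out time.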
[0070] The basic example of Fig. 2 also shows the efficiency of the storage of the graph
1 according to the invention. The graph of Fig. 2 comprises only two directed edges
15; 25. However, both directed edges comprise an incoming edge 25 to the node "P_123
- 100", which is therefore duplicated. The method according to the invention, however,
automatically removes redundant data such as duplicated records of combinations of
nodes 10; 20 and incoming edges 25 or outgoing edges 15. Thus, the three records of
Table 5 correspond exactly to the three combinations of nodes 10; 20 and outgoing/
incoming edges 15; 25 shown in Fig. 2.
[0071] Fig. 3 shows a visualization of a graph representing a production network, wherein a product
is used to produce another product.
[0072] The graph 1 shown in Fig. 3 can be considered as an extension of the example of Fig.
2. The product "P_123", which is produced from the two components "C_123" and "C_456",
is further used to produce the product "P_456". The product "P_123" therefore is connected
both to one incoming edge 25 and to one outgoing edge 15. An example for a data structure
from which the graph 1 shown in Fig. 3 emerges is given in Table 6.
Table 6: Signal Link Table in which the graph visualized in Fig. 3 is stored.
| Case Key | Signal_In | Signal_Out |
| P_456 - 100 | Plant ID || BOM ID || Alt BOM ID | - |
| P_123 - 100 | - | Plant ID || BOM ID || Alt BOM ID |
| P_123 - 100 | Plant ID || BOM ID || Alt BOM ID | - |
| C_123 - 100 | - | Plant ID || BOM ID || Alt BOM ID |
| C_456 - 100 | - | Plant ID || BOM ID || Alt BOM ID |
[0073] Table 6 comprises five records, i.e., exactly one record for every combination of
a node 10; 20 with an incoming edge 25 or an outgoing edge 15. Hence, there is one
record of the node "P_123 - 100" as end node with a first unique relationship identifier
stored in the second attribute and one record of the node "P_123 - 100" as start node
with a second unique relationship identifier stored in the third attribute. In this
example, the first unique relationship identifier and the second unique relationship
identifier are just sketched by the identifiers (IDs) of which they are composed.
[0074] Using the method according to the invention, BOMs defining a production network which
can be represented by a graph 1 can be stored in a dedicated data structure.
[0075] The data structure is able to fully map the graph 1 onto a storage device in a computationally
efficient and robust way (e.g. by avoiding complex join operations), wherein nodes
can be connected to other nodes such that SKUs of the production network can be connected
to other SKUs and both serve as inputs or outputs of production.
[0076] Fig. 4 shows a visualization of a graph representing a basic distribution network.
[0077] The graph 1 shown in Fig. 4 is very similar to the graph 1 shown in Fig. 2; however,
the graph 1 of Fig. 4 represents a distribution network, whereas the graph 1 of Fig.
2 represents a production network. In order to establish connections in a distribution
network, the unique relationship identifier 30 is created from a combination, in particular
a concatenation, of a receiving plant identifier and a material identifier.
[0078] The graph 1 shown in Fig. 4 is an example, wherein product "P_456" at plant "200"
is to be procured externally, which means supplied from plant "100" and plant "400".
The node "P_456 - 200" therefore is connected to the nodes "P_456 - 100" and "P_456
- 400" via two directed edges. The data structure from which the graph 1 can be reconstructed
is sketched in Table 7.
Table 7: Signal Link Table in which the graph visualized in Fig. 4 is stored.
| Case Key | Signal_In | Signal_Out |
| P_456 - 200 | Receiving Plant ID || Material ID | - |
| P_456 - 100 | - | Receiving Plant ID || Material ID |
| P_456 - 400 | - | Receiving Plant ID || Material ID |
[0079] Fig. 5 shows a visualization of a graph representing an example of a combined production
and distribution network.
[0080] The graph 1 of Fig. 5 results from extending Table 6 by Table 7, or vice versa. This
example demonstrates particularly that an extension/adaptation of a graph stored with
a storage device according to an embodiment of the invention is computationally very
efficient. In a traditional approach to store the graph, the schema for creating the
graph from raw data needs to be adapted in order to compute neighborhoods for each
node and the computation of neighborhoods involves joining multiple raw data tables.
[0081] According to an embodiment of the invention, the graph of Fig. 5 can be stored in
Table 8, as presented below.
Table 8: Signal Link Table in which the graph visualized in Fig. 5 is stored.
| Case Key | Signal_In | Signal_Out |
| P_456 - 200 | Receiving Plant ID || Material ID | - |
| P_456 - 100 | - | Receiving Plant ID || Material ID |
| P_456 - 400 | - | Receiving Plant ID || Material ID |
| P_456 - 100 | Plant ID || BOM ID || Alt BOM ID | - |
| P_123 - 100 | - | Plant ID || BOM ID || Alt BOM ID |
| P_123 - 100 | Plant ID || BOM ID || Alt BOM ID | - |
| C_123 - 100 | - | Plant ID || BOM ID || Alt BOM ID |
| C_456 - 100 | - | Plant ID || BOM ID || Alt BOM ID |
[0082] The graph 1 emerging from Table 8 represents a combined production and interorganizational
distribution network. The context of the (sub-)network represented by the graph 1 stored
according to an embodiment of the invention may be simply switched by switching the
predefined rule to construct/create the unique relationship identifier 30 which is
assigned to the incoming edge 25 and/or the outgoing edge 15. Thus, a BOM connection
logic can be simply merged with a BOD connection logic.
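Switching the predefined rule and merging the two connection logics can be sketched as below. The function names and all concrete identifier values (for example BOM "456" with alternative BOM "01") are hypothetical placeholders for the symbolic entries of Tables 6 to 8; only the concatenation rules themselves are taken from the description.

```python
def bom_rule(plant_id, bom_id, alt_bom_id):
    """Production (BOM) logic: plant, BOM and alternative BOM identifiers."""
    return f"{plant_id}{bom_id}{alt_bom_id}"


def bod_rule(receiving_plant_id, material_id):
    """Distribution (BOD) logic: receiving plant and material identifiers."""
    return f"{receiving_plant_id}{material_id}"


bom_p456 = bom_rule("100", "456", "01")  # hypothetical BOM of P_456
bom_p123 = bom_rule("100", "123", "01")  # hypothetical BOM of P_123
bod_p456 = bod_rule("200", "P_456")      # P_456 received at plant 200

# Records are (case key, Signal_In, Signal_Out); None marks a "-" cell.
table_6 = [
    ("P_456 - 100", bom_p456, None),
    ("P_123 - 100", None, bom_p456),
    ("P_123 - 100", bom_p123, None),
    ("C_123 - 100", None, bom_p123),
    ("C_456 - 100", None, bom_p123),
]
table_7 = [
    ("P_456 - 200", bod_p456, None),
    ("P_456 - 100", None, bod_p456),
    ("P_456 - 400", None, bod_p456),
]

# Extending the graph amounts to appending records; nothing is recomputed.
table_8 = table_7 + table_6
```

In such a sketch the two rules must not produce colliding identifiers; how uniqueness across rules is guaranteed is left open here.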
[0083] The graph 1, such as shown in Fig. 5, provides an iterative connection of SKUs across
the supply chain network, which enables organizations to identify effects in the network
caused by issues that arise at any SKU of the supply chain network. For example, if
a purchase order is running late in plant "100" for the component "C_123", this would
have an effect on the ability of plant "100" to produce the product "P_123". Subsequently,
this purchase order running late therefore could disrupt the supply of product "P_123"
from plant "100" to plant "200".
[0084] Depending on data sources available in the organization, this exact same approach
of storing a graph 1 according to the invention can be used to include even more nodes
into the graph, such as nodes representing supply data from supplier ERP systems,
or customer demand from customer ERP systems.
[0085] Fig. 6 shows a flow chart for the method to store a graph representing a network according
to an embodiment of the invention.
[0086] According to an embodiment of the invention the method to store a graph in a storage
device follows the steps A to D with an optional step E, as outlined below.
[0087] Initializing, in step A, a data structure comprising a first attribute, a second
attribute and a third attribute.
[0088] Creating, in step B, for each node of the graph a node identifier.
[0089] Creating, in step C, for each node of the graph a unique relationship identifier
30 for an incoming edge 25 and/or an outgoing edge 15 the node is connected to.
[0090] Recording, in step D, the node identifier in the first attribute and the respective
unique relationship identifier 30 in the second attribute or the third attribute of
the data structure.
[0091] Optionally providing, in step E, the data structure to a graph visualization device
for displaying the network which is represented by the graph.
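Steps A to D can be sketched in Python as follows. The record type, the `role` field and the two rule callbacks are assumptions introduced for this illustration; step E (providing the data structure to a visualization device) is omitted.

```python
from typing import NamedTuple, Optional


class Record(NamedTuple):
    node_id: str               # first attribute
    signal_in: Optional[str]   # second attribute (incoming edge)
    signal_out: Optional[str]  # third attribute (outgoing edge)


def store_graph(raw_nodes, node_rule, relationship_rule):
    table, seen = [], set()                  # step A: initialize the data structure
    for raw in raw_nodes:
        node_id = node_rule(raw)             # step B: create the node identifier
        rel_id = relationship_rule(raw)      # step C: create the relationship identifier
        if raw["role"] == "end":             # step D: record in second or third attribute
            record = Record(node_id, rel_id, None)
        else:
            record = Record(node_id, None, rel_id)
        if record not in seen:               # identical combinations are recorded once
            seen.add(record)
            table.append(record)
    return table


# Fig. 2 example: the end node P_123 is reported once per directed edge.
raw_nodes = [
    {"material": "C_123", "plant": "100", "bom": "123", "alt_bom": "01", "role": "start"},
    {"material": "P_123", "plant": "100", "bom": "123", "alt_bom": "01", "role": "end"},
    {"material": "C_456", "plant": "100", "bom": "123", "alt_bom": "01", "role": "start"},
    {"material": "P_123", "plant": "100", "bom": "123", "alt_bom": "01", "role": "end"},
]
table = store_graph(
    raw_nodes,
    node_rule=lambda r: f"{r['material']} - {r['plant']}",
    relationship_rule=lambda r: f"{r['plant']}{r['bom']}{r['alt_bom']}",
)
```

Each node is processed on its own, without reference to its neighbors, and the duplicated end node collapses into a single record.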
[0092] Fig. 7 shows a visualization of a graph which represents a process network.
[0093] As described with respect to Fig. 1, in one embodiment of the invention the graph
1 emerging from the data structure can represent a process network. Whereas single
process instances consist of a linear sequence of activities or process steps, process
steps of different process instances can be connected by a signal to form a process
network. A signal marks a direct transmission from one process instance or case to
another. A signal can, for instance, represent a good which is created by one process
and is consumed by another process or a phone call which spawns a new process.
In general, the signal is an output from an instance of a first process which becomes
an input to an instance of a second process. In the process network, a signal is spawned
by one or more activities and can be consumed by one or more activities. Technically,
the signal is represented by the unique relationship identifier 30.
[0094] To account for the linear sequence of activities in a process instance, the data
structure according to an embodiment of the invention is extended by a fourth attribute.
In the fourth attribute, a sequence of the process steps within a process instance
is stored.
[0095] An example for the data structure from which the graph 1 visualized in Fig. 7 emerges,
is given below in Table 9.
Table 9: Signal Link Table from which the process graph visualized in Fig. 7 can be
generated.
| Case ID | Activity | Timestamp | Signal_In | Signal_Out |
| 1 | (A) Produce Screws | 01. Dec | | |
| 1 | (B) Screws into warehouse | 02. Dec | | 10 |
| 2 | (A) Get screws from warehouse | 03. Dec | 10 | |
| 2 | (B) Produce Chair | 04. Dec | | |
| 3 | (A) Receive Sales Order | 02. Dec | | |
| 3 | (B) Deliver screws to customer | 04. Dec | 10 | |
[0096] The data structure of Table 9 resembles an extended process protocol. The classical
process protocol comprising the "Case ID", "Activity" and "Timestamp" attributes is
extended by the "Signal_In" and "Signal_Out" attributes for recording the incoming
edges 25 and outgoing edges 15 in order to represent the interactions between process
instances of different processes in the process network.
[0097] In this example three process instances or cases are shown, each resulting from the
execution of a different process. In case with ID "1", "case 1", screws are produced
which are in "case 2" used to further produce a chair and in "case 3" directly sold
to a customer. Based on the data of Table 9 the process graph 1 visualized in Fig.
7 can be generated and calculations on the directed edges 15; 25 of the process graph
1 can be performed.
[0098] According to an embodiment of the invention, the signals can also connect cases across
multiple extended process protocols. The signals are identified by the unique relationship
identifier 30, independently of the extended process protocol.
[0099] The graph 1 shown in Fig. 7 comprises two types of links between the activities.
A first type of activities is only related to the case or process instance of which
it forms part. A second type of activities, connected by the directed edges 15; 25 represented
with double-lined arrows, are connected by a signal with the unique relationship identifier
"10" between two different cases. Those records belonging to the first type of activities
are stored in the Signal Link Table shown in Table 9 without any unique relationship
identifiers assigned to the second attribute and the third attribute. Those records
belonging to the second type of activities store either a start node 10 or an end
node 20. Accordingly, the unique relationship identifier 30 is recorded in the third
attribute or the second attribute of the corresponding records in the Signal Link
Table. In this example, the value "10" for the unique relationship identifier 30
is given by the serial number of the screws.
[0100] The records of the Signal Link Table can comprise a number of further attributes,
in which data characterizing the respective incoming edge (25) and/or outgoing edge
(15) is stored. The further attributes can be accessed during subsequent extraction
and/or analysis steps such that arbitrary (directed) edge KPIs can be defined and
evaluated.
[0101] In Fig. 7, the ordinal value stored in the fourth attribute of the data structure
shown in Table 9, the timestamp, is visualized on a horizontal axis. Therefore, activities
of the same process instance or case are aligned with the horizontal axis as they
are executed in a linear sequence. The interactions between process instances of different
processes, which are formed by a match of the unique relationship identifier 30 assigned
to the outgoing edge 15 of a start node 10 with the unique relationship identifier
30 assigned to the incoming edge 25 of the end node 20, are visualized by the double-lined
arrows.
[0102] Similar to the case of Fig. 5, the graph 1 shown in Fig. 7 enables organizations
to identify effects in the network caused by issues that arise at any activity in
the process network. For example, if the screws are registered late in the warehouse
in the activity "B" of "case 1", this would have an effect on the ability to produce
the chair in "case 2" and to deliver screws to the customer in due time in "case 3".
[0103] To fully leverage the information contained in the Signal Link Table and in the graph
1 presented in Fig. 7 emerging from Table 9, the process network, in particular the
interactions between activities in the process network, are to be analyzed. Hence,
the signals are to be transformed into a format which can be processed by process
mining operators. The format can be, for instance, a table in a relational data base.
[0104] The transformation of the Signal Link Table, as for example Table 9, into a format
that enables the analysis of signals, that is, directed edges between two nodes in
the process graph can be achieved with the operators "LINK_SOURCE" and "LINK_TARGET".
The first operator, "LINK_SOURCE", pulls values of the combination of the number of
first attributes, the combination of the number of third attributes and the fourth
attribute of source activities, i.e., records comprising start nodes 10, into a second
data structure. Similarly, the second operator, "LINK_TARGET", pulls the combination
of the number of first attributes, the combination of the number of second attributes
and the fourth attribute of target activities, i.e., records comprising end nodes
20, into the second data structure. In the second data structure, the resulting values
from the first operator and the second operator are merged on the unique relationship
identifier 30, which means they are pulled into the same record if they belong to
the same signal. In case a signal is established between multiple end nodes and start
nodes, the cross product of records is created in the second data structure.
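The merge described above can be emulated in plain Python as sketched below. This is not the implementation of the "LINK_SOURCE" and "LINK_TARGET" operators; it only mimics the described behavior on the data of Table 9, treating records with a "Signal_Out" value as sources and records with a "Signal_In" value as targets. The function name `build_edge_table` is hypothetical.

```python
from collections import defaultdict
from itertools import product

# Table 9 as (case id, activity, timestamp, Signal_In, Signal_Out); None = empty cell.
SIGNAL_LINK_TABLE = [
    (1, "Produce Screws", "01. Dec", None, None),
    (1, "Screws into warehouse", "02. Dec", None, "10"),
    (2, "Get screws from warehouse", "03. Dec", "10", None),
    (2, "Produce Chair", "04. Dec", None, None),
    (3, "Receive Sales Order", "02. Dec", None, None),
    (3, "Deliver screws to customer", "04. Dec", "10", None),
]


def build_edge_table(records):
    sources, targets = defaultdict(list), defaultdict(list)
    for case_id, activity, ts, signal_in, signal_out in records:
        if signal_out is not None:   # source side: record comprising a start node
            sources[signal_out].append((case_id, activity, ts))
        if signal_in is not None:    # target side: record comprising an end node
            targets[signal_in].append((case_id, activity, ts))

    edge_table = []
    for signal, source_rows in sources.items():
        # Several sources and targets sharing one signal yield their cross product.
        for src, tgt in product(source_rows, targets.get(signal, [])):
            edge_table.append({"signal": signal, "source": src, "target": tgt})
    return edge_table


edge_table = build_edge_table(SIGNAL_LINK_TABLE)
```

For Table 9, the single source "Screws into warehouse" is paired with both targets, which gives the two records of the Edge Table.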
[0105] Moreover, the "LINK_SOURCE" operator and the "LINK_TARGET" operator can access any
attribute of the records comprising start nodes 10 and/or end nodes 20. In particular,
they can access further attributes of the records in which data characterizing the
respective incoming edge 25 and/or outgoing edge 15 is stored. The operators can retrieve
any value stored in a further attribute and assign it to the formed directed edge
(15; 25). The retrieved values are stored in a number of further attributes of the
second data structure, respectively.
[0106] For the example of Table 9, the resulting second data structure, hereafter referred to
as Edge Table, is given in Table 10.
Table 10: Edge Table generated from the Signal Link Table shown as Table 9. LS abbreviates
"LINK_SOURCE", LT "LINK_TARGET" and TS "Timestamp".
| LS(Activity) | LT(Activity) | LS(TS) | LT(TS) | LT(TS) - LS(TS) |
| Screws into warehouse | Get screws from warehouse | 02. Dec | 03. Dec | 1 Day |
| Screws into warehouse | Deliver screws to customer | 02. Dec | 04. Dec | 2 Days |
[0107] The Edge Table can be used for all kinds of process mining operators to gain insights
on the interactions between activities of different processes in the process network,
such as calculating their throughput times.
[0108] In the Edge Table, in the first attribute, the node identifier of start nodes 10
is stored. In the second attribute, the node identifier of corresponding end nodes
20 is stored. In the third attribute (not shown in Table 10) the unique relationship
identifier 30 of the formed directed edge 15; 25 is stored. In further attributes
of Table 10, the timestamp of the start node 10 and the timestamp of the end node
20 are stored, respectively. In further attributes, process performance metrics such
as the time difference between the timestamp attributed to the end node 20 and the
timestamp attributed to the start node 10 can be stored.
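The time difference stored as a process performance metric can be computed as in the following sketch. Table 9 gives only day and month, so the year used here is a hypothetical placeholder, as are the dictionary keys.

```python
from datetime import date

YEAR = 2000  # hypothetical: the source timestamps carry no year

# The two records of Table 10 with LS(TS) and LT(TS) as dates.
edge_table = [
    {"ls_activity": "Screws into warehouse",
     "lt_activity": "Get screws from warehouse",
     "ls_ts": date(YEAR, 12, 2), "lt_ts": date(YEAR, 12, 3)},
    {"ls_activity": "Screws into warehouse",
     "lt_activity": "Deliver screws to customer",
     "ls_ts": date(YEAR, 12, 2), "lt_ts": date(YEAR, 12, 4)},
]

# LT(TS) - LS(TS): timestamp of the end node minus timestamp of the start node.
for row in edge_table:
    row["throughput_days"] = (row["lt_ts"] - row["ls_ts"]).days

throughput = [row["throughput_days"] for row in edge_table]
```

The resulting values of 1 and 2 days correspond to the last column of Table 10.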
[0109] The Edge Table can be computed accordingly for the Signal Link Table (Table 8) which
stores the graph 1 shown in Fig. 5, wherein the graph 1 represents a supply chain
network. In this example, the "LINK_SOURCE" operator pulls the records comprising
the start nodes "P_456 - 100", "P_456 - 400", "P_123 - 100", "C_123 - 100" and "C_456
- 100" into the first attribute of the second data structure. The "LINK_TARGET" operator
pulls the records comprising "P_456 - 200", "P_456 - 100" and "P_123 - 100" into the
second attribute of the second data structure. The resulting values are merged on
the unique relationship identifier 30 which can be stored in the third attribute of
the second data structure.
[0110] Fig. 8 shows a sequence diagram of a graph representing a process network, wherein the graph
is stored according to an embodiment of the invention.
[0111] To efficiently identify problems in processes, drill-down functionality is required,
i.e., the functionality to inspect specific, smaller regions of the process network.
Drill-down functionality is enabled by filters which restrict a result set such as
a subgraph from a process graph or a graph representing supply chain networks, according
to predefined conditions. These filters eventually also restrict the number of records
of the Edge Table, which allows specific features (process performance measures) to be
calculated on the resulting signal records more efficiently.
[0112] To filter the Edge Table for subsequent process mining, traditional filter techniques
are not enough since they fail to capture the underlying graph structure of the records
in the Edge Table and the records of the Signal Link Table from which the Edge Table
is generated.
[0113] Therefore, in one embodiment, specific filter operators, in particular the "LINK_FILTER"
operator and the "LINK_FILTER_ORDERED" operator, are provided. The filter
operators restrict the records of the first data structure (Signal Link Table) and/or
the second data structure (Edge Table) to those comprising signals which are ancestors or descendants
of a user defined set of nodes. The set of nodes can be a set of SKUs (supply chain
networks), a set of activities (process networks) or any set of entities (general
network). The direction into which the filter operator is applied can be determined
via a user defined input parameter.
[0114] The filter operators can generate a subgraph from the records stored in the first
data structure by traversing the graph emerging from the first data structure. Below
an example is provided to demonstrate the functionality of the filter operators.
Table 11: Signal Link Table from which the sequence diagram of Fig. 8 can be generated.
| Case ID | Activity | Timestamp | Signal_In | Signal_Out |
| 1 | A | 1 | - | S1 |
| 2 | B | 2 | S1 | - |
| 2 | C | 3 | - | - |
| 2 | D | 4 | - | S2 |
| 3 | E | 1 | - | S3 |
| 3 | F | 5 | S2 | - |
| 4 | G | 2 | S3 | - |
[0115] Table 11 provides a Signal Link Table comprising data of four different process instances
("Case ID"), each with a set of process steps ("Activity") executed in a given sequence
("Timestamp"). Some of the process steps/activities are interconnected by the signals
S1, S2, or S3, as attributed to the second attribute ("Signal_In") and the third attribute
("Signal_Out"). Those activities with a value assigned to the second attribute, activities
"B", "F" and "G", are represented by end nodes 20. Those activities with a value assigned
to the third attribute, activities "A", "D" and "E", are represented by start nodes
10.
[0116] In the sequence diagram of Fig. 8 each process instance is represented by a vertical
line along which the associated activities are ordered according to their timestamp.
The activities represented as start nodes 10 and the corresponding activities represented
as end nodes 20 are each connected via the interaction defined by a matching pair
of outgoing edge 15 and incoming edge 25.
[0117] From the Signal Link Table of Table 11 the Edge Table shown in Table 12 can be calculated
as outlined above. In this example, the identifier of the start/end node is given
by a combination of the "Case ID" and the "Activity", e.g. "1A"/"2B" or "2D"/"3F",
which are recorded in the first attribute or second attribute of the corresponding
records of the second data structure, accordingly. The unique relationship identifiers
30 of the formed directed edges 15; 25 are recorded in the third attribute of the
second data structure, which is in the example of Table 12 the "Signal" attribute.
Table 12: Edge Table generated from the Signal Link Table shown as Table 11. LS abbreviates
"LINK_SOURCE" and LT "LINK_TARGET".
| LS(Case ID) | LS(Activity) | LT(Case ID) | LT(Activity) | Signal |
| 1 | A | 2 | B | S1 |
| 2 | D | 3 | F | S2 |
| 3 | E | 4 | G | S3 |
[0118] In this example the user defined initial node 40 is node "A" which is framed by a
double line in Fig. 8 and marked as current member of the subgraph. For the sake of
the example, the user defined direction for the filter operators is the descendants'
direction, i.e., the network downstream direction. The node "A" is a start node 10
connected to an outgoing edge 15 with the unique relationship identifier "S1". The
matching incoming edge 25 with the unique relationship identifier "S1" is connected
to the end node "B" of case "2". Hence, the node "B" is marked as current member of
the subgraph. Connected to the node "B" as being part of the same process instance,
case "2", are the nodes "C" and "D", of which node "D" is connected to an outgoing
edge 15 with the unique relationship identifier "S2". Hence, the nodes "C" and "D"
are marked as members of the subgraph, wherein node "D" is marked as current member
of the subgraph. The matching incoming edge 25 with the unique relationship identifier
"S2" is connected to the node "F" of case "3". Hence, node "F" is also marked as current
member of the subgraph. Connected to node "F" as being part of the same process instance
is also the node "E". Without taking the "Timestamp" attribute into account, as it
is done by the "LINK_FILTER" operator, node "E" is inserted into the subgraph and
marked as current member since it is connected to an outgoing edge 15 with the unique
relationship identifier "S3". The matching incoming edge 25 with the unique relationship
identifier "S3" is connected to the node "G" of case "4". Using the "LINK_FILTER"
operator, the resulting subgraph of Fig. 8 starting from node "A" in the network downstream
direction comprises all nodes shown in the sequence diagram of Fig. 8. Applying the
"LINK_FILTER" operator on the first data structure shown in Table 11 therefore results
in selecting all the three signals "S1", "S2" and "S3" from the Edge Table shown in
Table 12.
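The walk-through above can be reproduced with the following Python sketch. It is a simplified emulation of the unordered filter in the downstream direction only, not the operator itself; the function name is hypothetical, a breadth-first traversal is chosen for illustration, and the sketch relies on the activity labels of Table 11 being unique.

```python
from collections import deque

# Table 11 as (case id, activity, timestamp, Signal_In, Signal_Out); None = "-".
TABLE_11 = [
    (1, "A", 1, None, "S1"),
    (2, "B", 2, "S1", None),
    (2, "C", 3, None, None),
    (2, "D", 4, None, "S2"),
    (3, "E", 1, None, "S3"),
    (3, "F", 5, "S2", None),
    (4, "G", 2, "S3", None),
]


def link_filter_downstream(records, start_activity):
    """Collect all activities and signals reachable downstream, ignoring timestamps."""
    by_activity = {r[1]: r for r in records}
    members, signals = {start_activity}, set()
    queue = deque([start_activity])
    while queue:
        case_id, _, _, _, signal_out = by_activity[queue.popleft()]
        # Activities of the same process instance join the subgraph.
        found = [r[1] for r in records if r[0] == case_id]
        if signal_out is not None:
            # Records whose Signal_In matches this Signal_Out form a directed edge.
            matched = [r[1] for r in records if r[3] == signal_out]
            if matched:
                signals.add(signal_out)
            found += matched
        for activity in found:
            if activity not in members:
                members.add(activity)
                queue.append(activity)
    return members, signals


members, signals = link_filter_downstream(TABLE_11, "A")
```

Starting from node "A", the sketch reaches all seven activities and selects the three signals "S1", "S2" and "S3".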
[0119] The second operator, "LINK_FILTER_ORDERED", applied to the same Signal Link Table
with identical initial conditions yields a different result, as it also takes the
fourth attribute of the Signal Link Table, i.e., the "Timestamp" attribute, into account.
Hence, the node "E" is rejected by the "LINK_FILTER_ORDERED" operator and is not inserted
into the subgraph, as its associated timestamp is smaller than the timestamp of the
corresponding current member of the subgraph, which is node "F". The subgraph extracted
by the "LINK_FILTER_ORDERED" operator from the first data structure shown in Table
11 therefore comprises all the nodes shown in the sequence diagram of Fig. 8 except
for node "E" and node "G". As a result, the signal "S3" represented by the directed
edge between node "E" and node "G" does not form part of the subgraph either. The Edge Table
of Table 12, filtered by the subgraph resulting from the "LINK_FILTER_ORDERED" operator, therefore
only comprises the records of the signal "S1" and the signal "S2".
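The ordered variant can be sketched in the same way. The sketch below is an emulation under stated assumptions, not the operator itself: it covers the downstream direction only, uses a hypothetical function name, and applies the timestamp test to records of the same process instance, as in the example of node "E" and node "F".

```python
from collections import deque

# Table 11 as (case id, activity, timestamp, Signal_In, Signal_Out); None = "-".
TABLE_11 = [
    (1, "A", 1, None, "S1"),
    (2, "B", 2, "S1", None),
    (2, "C", 3, None, None),
    (2, "D", 4, None, "S2"),
    (3, "E", 1, None, "S3"),
    (3, "F", 5, "S2", None),
    (4, "G", 2, "S3", None),
]


def link_filter_ordered_downstream(records, start_activity):
    """Like the unordered sketch, but same-case records must occur later."""
    by_activity = {r[1]: r for r in records}
    members, signals = {start_activity}, set()
    queue = deque([start_activity])
    while queue:
        case_id, _, ts, _, signal_out = by_activity[queue.popleft()]
        # Ordering test: admit a same-case record only if its timestamp is
        # larger than the timestamp of the current member.
        found = [r[1] for r in records if r[0] == case_id and r[2] > ts]
        if signal_out is not None:
            matched = [r[1] for r in records if r[3] == signal_out]
            if matched:
                signals.add(signal_out)
            found += matched
        for activity in found:
            if activity not in members:
                members.add(activity)
                queue.append(activity)
    return members, signals


members, signals = link_filter_ordered_downstream(TABLE_11, "A")
```

Node "E" (timestamp 1) fails the test against node "F" (timestamp 5), so neither "E" nor "G" nor the signal "S3" enters the result.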
[0120] The search for the subgraph is performed according to a traversal of the graph 1
which is generated from the first data structure, by starting from the at least one
selected node into the selected direction according to a predefined graph traversal
protocol. The predefined graph traversal protocol can be a breadth-first search for
records, a depth-first search for records or a combination thereof, such as an iterative
deepening depth-first search for records.
[0121] The filter operators can be applied in a similar way to graphs representing a supply
chain network. For instance, the graph shown in Fig. 5 as generated from the Signal
Link Table of Table 8 can be filtered using the "LINK_FILTER" operator to find all
suppliers for the product "P_456" at plant "100", represented by the node "P_456 -
100". In this example, the selected direction is the network upstream direction, and
the selected initial node is the node "P_456 - 100". The resulting subgraph found
by the "LINK_FILTER" operator comprises the nodes "P_456 - 100", "P_123 - 100", "C_123 - 100"
and "C_456 - 100".
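The upstream filter on the supply chain graph can be sketched as follows. Table 8 lists the identifiers only symbolically, so the concrete values below are hypothetical stand-ins, and the function name `upstream_nodes` is an assumption of this illustration.

```python
from collections import deque

# Hypothetical values for the symbolic identifiers of Table 8.
BOD_P456 = "200P_456"   # Receiving Plant ID || Material ID
BOM_P456 = "10045601"   # Plant ID || BOM ID || Alt BOM ID for P_456
BOM_P123 = "10012301"   # Plant ID || BOM ID || Alt BOM ID for P_123

# Records are (case key, Signal_In, Signal_Out); None marks a "-" cell.
TABLE_8 = [
    ("P_456 - 200", BOD_P456, None),
    ("P_456 - 100", None, BOD_P456),
    ("P_456 - 400", None, BOD_P456),
    ("P_456 - 100", BOM_P456, None),
    ("P_123 - 100", None, BOM_P456),
    ("P_123 - 100", BOM_P123, None),
    ("C_123 - 100", None, BOM_P123),
    ("C_456 - 100", None, BOM_P123),
]


def upstream_nodes(records, start_node):
    """Follow Signal_In values back to the records that emit them."""
    members = {start_node}
    queue = deque([start_node])
    while queue:
        node = queue.popleft()
        incoming = {s_in for key, s_in, _ in records
                    if key == node and s_in is not None}
        for key, _, s_out in records:
            if s_out in incoming and key not in members:
                members.add(key)
                queue.append(key)
    return members


suppliers = upstream_nodes(TABLE_8, "P_456 - 100")
```

The nodes "P_456 - 200" and "P_456 - 400" are not reached, since they are connected to the initial node only via its outgoing edge.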
[0122] Further, the nodes 40; 10; 20 of the extracted subgraph can be aggregated based on
a subordinate hierarchy level, wherein the formed directed edges 15; 25 are aggregated
accordingly yielding an aggregated subgraph.
[0123] For instance, in the example considered above, the subordinate hierarchy level can
be the combination of the machine / the plant with the material for which the process
steps have been carried out. In order to analyze the transfer times between process
steps of different process instances, the extracted subgraph can be mapped to a graph
of which the nodes represent the subordinate hierarchy level, e.g., the combination
of plant and material. As a result, further process performance indicators, such as
the average time between process instances, become accessible.
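The aggregation onto a hierarchy level can be sketched as below. All data in this example is invented for illustration: the node labels, their mapping to plant and material combinations, and the transfer times are hypothetical, as is the function name `aggregate`.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical extracted edges: (start node, end node, transfer time in days).
edges = [
    ("1A", "2B", 1.0),
    ("5A", "6B", 3.0),
    ("2D", "3F", 1.0),
]

# Hypothetical mapping of each node to its hierarchy level (plant, material).
level = {
    "1A": ("100", "screws"), "5A": ("100", "screws"),
    "2B": ("200", "chair"), "6B": ("200", "chair"),
    "2D": ("200", "chair"), "3F": ("300", "chair"),
}


def aggregate(edges, level):
    """Group edges by the hierarchy levels of their end points and average the times."""
    grouped = defaultdict(list)
    for start, end, days in edges:
        grouped[(level[start], level[end])].append(days)
    return {pair: mean(times) for pair, times in grouped.items()}


aggregated = aggregate(edges, level)
```

Edges whose start and end nodes map to the same pair of hierarchy levels collapse into one aggregated edge carrying the average transfer time.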
[0124] Fig. 9 shows a flow chart for the method for filtering the first data structure from which
a graph representing a network emerges.
[0125] The embodiment sketched by the flow chart of Fig. 9 is applicable to data recorded
in the first data structure, wherein the data may represent any type of network, in
particular entities of a supply chain network or process steps in a process network.
[0126] Selecting, in step A, at least one initial node 40 of the first data structure and
a direction along which the filter is to be applied.
[0127] Finding, in step B, in the first data structure, the subgraph comprising all nodes
10; 20 connected to the at least one initial node 40 into the selected direction using
a predefined graph traversal protocol.
[0128] Extracting, in step C, the subgraph resulting from step B and filtering the second
data structure based on the directed edges of the extracted subgraph.
[0129] Providing, in an optional step D, the filtered second data structure comprising the
interactions between different entities in the network to a process mining system
to analyze network effects, such as process performance indicators in the case of
a process network.
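Step C can be illustrated with a short Python sketch that filters the Edge Table of Table 12 by an extracted subgraph. The criterion used here, keeping an edge only if both its start node and its end node belong to the subgraph, is an assumption of this sketch, and the function name is hypothetical.

```python
# Table 12 as (LS case id, LS activity, LT case id, LT activity, signal).
EDGE_TABLE = [
    (1, "A", 2, "B", "S1"),
    (2, "D", 3, "F", "S2"),
    (3, "E", 4, "G", "S3"),
]


def filter_edge_table(edge_table, subgraph_nodes):
    """Keep the edges whose start node and end node are both in the subgraph."""
    return [
        row for row in edge_table
        if (row[0], row[1]) in subgraph_nodes and (row[2], row[3]) in subgraph_nodes
    ]


# Subgraph found in step B for initial node "A" with the ordered filter,
# given as (case id, activity) pairs.
subgraph = {(1, "A"), (2, "B"), (2, "C"), (2, "D"), (3, "F")}

filtered = filter_edge_table(EDGE_TABLE, subgraph)
```

The filtered Edge Table retains the records of the signals "S1" and "S2" and can then be handed to subsequent analysis in step D.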
[0130] Fig. 10 shows an embodiment of the drill-down functionality for the step B of the
embodiment illustrated in Fig. 9.
[0131] According to the predefined graph traversal protocol, the links between two nodes
are established by matching the value of the second attribute of the second record
of the first data structure with the value of the third attribute of the first record
of the first data structure. If a match is found, for each of the at least one record
comprising a node forming a directed edge along the selected direction with the current
member of the subgraph, the record can be marked directly as member of the subgraph
in case of the "LINK_FILTER" operator. This option is illustrated in Fig. 10 with
the dashed line.
[0132] In case of the "LINK_FILTER_ORDERED" operator, however, a test has to be passed before
the record is marked as member of the subgraph. It is tested whether the ordinal
value of the fourth attribute of the found record is larger than the ordinal value
of the fourth attribute of the current member of the subgraph. For a positive test
result, the record is marked as member of the subgraph. In the case of a negative
test result, the record is skipped. In case no further record can be found, the subgraph
is complete and can be extracted in subsequent step C.
[0133] In summary, the main advantage of the invention is given by a predefined procedure
to populate a data structure, the first data structure, with data which describes an interacting
network, such as a supply chain network or a process network, wherein the interactions
are recorded into the data structure without any prior knowledge on the structure
of the network required. From the data structure a graph representing the network
can be generated and subsequently analyzed, in particular using drill-down and/or
aggregation functionality to access insights on network effects in any detail.
1. Computer-implemented method for extracting a subgraph of a graph (1), the subgraph
starting from at least one selected node (40) into a selected direction,
wherein the graph (1) represents a network and comprises multiple nodes (10; 20) and
multiple directed edges (15; 25),
wherein each directed edge (15; 25) connects a start node (10) and an end node (20),
wherein each directed edge (15; 25) is composed of an incoming edge (25) which is
connected to the end node (20) and an outgoing edge (15) which is connected to the
start node (10),
wherein each node (10; 20) represents an entity of the network,
wherein each directed edge (15; 25) represents a relationship between two entities,
the method comprising:
recording each start node (10) in a first record of a first data structure stored
with a storage device, and each end node (20) in a second record of the first data
structure,
wherein each record comprises at least:
a combination of a number of first attributes, in which an identifier of a node is
stored,
a combination of a number of second attributes, in which an identifier of an incoming
edge is stored,
a combination of a number of third attributes, in which an identifier of an outgoing
edge is stored,
storing, in the first record, the identifier of the start node (10) in the combination
of the number of first attributes and a unique relationship identifier (30) in the
combination of the number of third attributes, wherein the unique relationship identifier
(30) represents a step along a path in the network,
storing, in the second record, the identifier of the end node (20) in the combination
of the number of first attributes and the unique relationship identifier (30) in the
combination of the number of second attributes,
wherein a value of the combination of the number of second attributes of the second
record matching the value of the combination of the number of third attributes of
the first record defines the directed edge (15; 25) between the start node (10) and
the end node (20),
wherein the subgraph is extracted according to a traversal of the graph (1) starting
from the at least one selected node (40) into the selected direction according to
a predefined graph traversal protocol.
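For illustration only, the recording scheme of claim 1 may be sketched as follows. The sketch assumes that each combination of attributes consists of a single attribute; the names `Record`, `record_edge` and `form_edges` as well as the sample identifiers are hypothetical and do not form part of the claimed subject-matter.

```python
from collections import namedtuple

# Hypothetical record layout of the first data structure:
#   node    -> first attributes  (identifier of the node)
#   rel_in  -> second attributes (identifier of the incoming edge)
#   rel_out -> third attributes  (identifier of the outgoing edge)
Record = namedtuple("Record", ["node", "rel_in", "rel_out"])


def record_edge(table, start_node, end_node, rel_id):
    """Record one directed edge as two records sharing a relationship identifier."""
    # First record: start node, relationship identifier in the third attributes.
    table.append(Record(node=start_node, rel_in=None, rel_out=rel_id))
    # Second record: end node, same identifier in the second attributes.
    table.append(Record(node=end_node, rel_in=rel_id, rel_out=None))


def form_edges(table):
    """Form directed edges by matching rel_in values against rel_out values."""
    starts_by_rel = {}
    for rec in table:
        if rec.rel_out is not None:
            starts_by_rel.setdefault(rec.rel_out, []).append(rec.node)
    edges = []
    for rec in table:
        if rec.rel_in is not None:
            for start in starts_by_rel.get(rec.rel_in, []):
                edges.append((start, rec.node, rec.rel_in))
    return edges


table = []
record_edge(table, "SKU_A", "SKU_B", "R1")
record_edge(table, "SKU_B", "SKU_C", "R2")
edges = form_edges(table)
```

In this sketch no knowledge of the overall network is needed while recording; the edges emerge only when the identifiers are matched.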
2. The method according to claim 1, wherein the predefined graph traversal protocol is
one of a group of algorithms, the group consisting of:
- a breadth-first search for records comprising nodes (10; 20) which form directed
edges (15; 25) with the at least one selected node (40) along the selected direction,
- a depth-first search for records comprising nodes (10; 20) which form directed edges
(15; 25) with the at least one selected node (40) along the selected direction, or
- a combination thereof.
3. The method according to claim 2, wherein the breadth-first search comprises the following
steps:
a) identifying the at least one record of the at least one selected node (40) based
on the selected direction and marking one record of the at least one selected record
as a current member of the subgraph,
b) matching the value of the combination of the number of second attributes of the
second record with the value of the combination of the number of the third attributes
of the first record to find at least one record comprising the node (10; 20) which
forms the directed edge (15; 25) along the selected direction with the current member
of the subgraph and marking the at least one found record as a member of the subgraph,
c) repeating step b) for every record found in step b), wherein with repeating step
b) each found record is marked as current record, until no further record is found
and marking all found records as members of the subgraph,
d) repeating steps b) and c) for every record of the at least one selected record,
and
e) extracting the records marked as members of the subgraph from the first data structure
to store the subgraph with the storage device.
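Steps a) to e) of the breadth-first search may, by way of a non-limiting sketch, be expressed as follows for the network downstream direction. The tuple layout `(node, rel_in, rel_out)`, the function name `bfs_downstream` and the set of visited records (added here so that cyclic networks terminate) are assumptions made for illustration.

```python
from collections import deque

# Hypothetical records of the first data structure: (node, rel_in, rel_out).
records = [
    ("A", None, "R1"), ("B", "R1", None),
    ("B", None, "R2"), ("C", "R2", None),
    ("A", None, "R3"), ("D", "R3", None),
    ("X", None, "R4"), ("Y", "R4", None),
]


def bfs_downstream(records, selected_node):
    """Return the sorted indices of records marked as members of the subgraph."""
    by_rel_in = {}    # relationship identifier -> indices of end-node records
    out_of_node = {}  # node identifier -> indices of its start-node records
    for i, (node, rel_in, rel_out) in enumerate(records):
        if rel_in is not None:
            by_rel_in.setdefault(rel_in, []).append(i)
        if rel_out is not None:
            out_of_node.setdefault(node, []).append(i)

    # Step a): records of the selected node that carry an outgoing identifier.
    marked = set(out_of_node.get(selected_node, []))
    queue = deque(sorted(marked))
    while queue:
        current = queue.popleft()
        rel = records[current][2]
        # Step b): match the second attributes against the third attributes.
        for j in by_rel_in.get(rel, []):
            if j in marked:
                continue
            marked.add(j)
            # Step c): continue from the outgoing records of the found node.
            for k in out_of_node.get(records[j][0], []):
                if k not in marked:
                    marked.add(k)
                    queue.append(k)
    # Step e): the marked records constitute the extracted subgraph.
    return sorted(marked)


members = bfs_downstream(records, "A")
subgraph_nodes = sorted({records[i][0] for i in members})
```

The queue ensures that all records directly reachable from the current members are processed before records further along the path.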
4. The method according to claim 2, wherein the depth-first search comprises the following
steps:
a) identifying the at least one record of the at least one selected node (40) based
on the selected direction and marking one record of the at least one selected record
as a current member of the subgraph,
b) matching the value of the combination of the number of the second attributes of
the second record with the value of the combination of the number of the third attributes
of the first record to find at least one record comprising the node (10; 20) which
forms the directed edge (15; 25) along the selected direction with the current member
of the subgraph and marking the at least one found record as a member of the subgraph,
c) repeating step b) for one record found in step b), wherein with repeating step
b) the one found record is marked as current record, until no further record is found
and marking all found records as members of the subgraph,
d) repeating steps b) and c) for every record found in step b),
e) repeating steps b), c) and d) for every record of the at least one selected record,
and
f) extracting the records marked as members of the subgraph from the first data structure
to store the subgraph with the storage device.
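The depth-first variant may be sketched analogously. The following minimal example, which again assumes hypothetical `(node, rel_in, rel_out)` tuples and a hypothetical function name `dfs_downstream`, returns the nodes in the order in which they are visited, so that the difference from a breadth-first traversal becomes visible: one path is followed to its end before the next found record is considered.

```python
# Hypothetical records of the first data structure: (node, rel_in, rel_out).
records = [
    ("A", None, "R1"), ("B", "R1", None),
    ("A", None, "R3"), ("D", "R3", None),
    ("B", None, "R2"), ("C", "R2", None),
]


def dfs_downstream(records, selected_node):
    """Return node identifiers in depth-first visiting order (downstream)."""
    ends_by_rel = {}  # relationship identifier -> end nodes
    outgoing = {}     # node identifier -> outgoing relationship identifiers
    for node, rel_in, rel_out in records:
        if rel_in is not None:
            ends_by_rel.setdefault(rel_in, []).append(node)
        if rel_out is not None:
            outgoing.setdefault(node, []).append(rel_out)

    order, seen = [], set()
    stack = [selected_node]
    while stack:
        node = stack.pop()
        if node in seen:
            continue  # guards against cycles (an assumption of this sketch)
        seen.add(node)
        order.append(node)
        successors = [
            end
            for rel in outgoing.get(node, [])
            for end in ends_by_rel.get(rel, [])
        ]
        # Push in reverse so that the first recorded edge is followed first.
        stack.extend(reversed(successors))
    return order


visit_order = dfs_downstream(records, "A")
```

For the sample records, the path from A via B to C is exhausted before D is visited, whereas a breadth-first traversal would visit B and D before C.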
5. The method according to any of the preceding claims, wherein the selected direction
is one of a network upstream direction and a network downstream direction, wherein
- the at least one record of the at least one selected node (40), in which the unique
relationship identifier (30) is assigned to the combination of the number of second
attributes, is identified for the network upstream direction, and
- the at least one record of the at least one selected node (40), in which the unique
relationship identifier (30) is assigned to the combination of the number of third
attributes, is identified for the network downstream direction.
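The identification of the starting records according to the selected direction may be illustrated by the following sketch. The function name `seed_records`, the direction labels and the tuple layout `(node, rel_in, rel_out)` are assumptions chosen for the example.

```python
# Hypothetical records of the first data structure: (node, rel_in, rel_out).
records = [
    ("A", None, "R1"),
    ("B", "R1", None),
    ("B", None, "R2"),
    ("C", "R2", None),
]


def seed_records(records, selected_node, direction):
    """Return the records of the selected node from which the traversal starts."""
    if direction == "upstream":
        field = 1  # identifier assigned to the second attributes (incoming edge)
    elif direction == "downstream":
        field = 2  # identifier assigned to the third attributes (outgoing edge)
    else:
        raise ValueError("direction must be 'upstream' or 'downstream'")
    return [r for r in records if r[0] == selected_node and r[field] is not None]


upstream_seeds = seed_records(records, "B", "upstream")
downstream_seeds = seed_records(records, "B", "downstream")
```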
6. The method according to any of the preceding claims, wherein the network is a process
network, wherein the process network comprises two or more process instances of different
processes and interactions between process steps of process instances of at least
two different processes, wherein each node (10; 20) represents a process step of a
process instance and the unique relationship identifier (30) represents a signal between
the start node (10) which forms part of a process instance of a first process and
the end node (20) which forms part of a process instance of a second process, in particular
an output of the start node (10) provided to the end node (20), and wherein the first
data structure further comprises a fourth attribute, in which a sequence of the process
steps within a process instance is stored, such that the data structure forms an extended
process protocol.
7. The method according to the preceding claim, wherein the fourth attribute stores an ordinal
value, and wherein a record of the at least one found record is only marked as a member
of the subgraph if the ordinal value assigned to the fourth attribute of the record
is larger than the ordinal value of the current member of the subgraph.
8. The method according to the preceding claim, wherein the ordinal value is a timestamp
related to the respective process step.
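The ordinal test may be sketched as follows, assuming that the fourth attribute holds a timestamp. The dictionary keys, the function name `accept_later` and the sample process steps are hypothetical.

```python
from datetime import datetime


def accept_later(current, found):
    """Keep only found records whose ordinal value exceeds that of the current member."""
    return [rec for rec in found if rec["ordinal"] > current["ordinal"]]


# Hypothetical current member and candidate records found by the matching step.
current = {"node": "Create Order", "ordinal": datetime(2023, 1, 1, 9, 0)}
found = [
    {"node": "Ship Goods", "ordinal": datetime(2023, 1, 2, 8, 0)},
    {"node": "Earlier Step", "ordinal": datetime(2022, 12, 31, 8, 0)},
]
accepted = accept_later(current, found)
```

In this sketch, only the record with the later timestamp would be marked as a member of the subgraph; the earlier one is skipped.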
9. The method according to any of the preceding claims, wherein at least one node (10;
20) is both the end node (20) connected to at least one incoming edge (25) and the
start node (10) connected to at least one outgoing edge (15), and wherein the method
further comprises recording each node (10; 20) of the at least one node in at least
one record, wherein, in each record of the at least one record, the identifier of
each node (10; 20) is assigned to the combination of the number of first attributes,
a first unique relationship identifier of one incoming edge (25) is assigned to the
combination of the number of second attributes, and a second unique relationship identifier
of one outgoing edge (15) is assigned to the combination of the number of third attributes.
10. The method according to any of the preceding claims, wherein the formed directed edge
(15; 25) is recorded in a record of a second data structure stored in the storage
device, the second data structure comprising at least
- a first attribute, in which an identifier of the start node (10) connected to the
formed directed edge (15; 25) is stored,
- a second attribute, in which an identifier of the end node (20) connected to the
formed directed edge (15; 25) is stored, and
- a third attribute, in which the unique relationship identifier (30) of the formed
directed edge (15; 25) is stored.
11. The method according to any of the preceding claims, wherein each record of the first
data structure comprises a number of further attributes, in which data characterizing
the respective incoming edge (25) and/or outgoing edge (15) is stored, and wherein
at least one value of the number of further attributes is retrieved and assigned to
the formed directed edge (15; 25), wherein the at least one retrieved value is stored
in a number of further attributes of the second data structure.
12. The method according to claim 10 or 11, further comprising a filtering of the records
of the second data structure by the at least one directed edge (15; 25) formed between
adjacent members of the subgraph.
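A filtering of the second data structure may, for example, be sketched as follows. The sketch assumes that an edge is formed between adjacent members of the subgraph when both its start node and its end node are members; the field names `start`, `end` and `rel` and the function name `filter_edges` are hypothetical.

```python
# Hypothetical records of the second data structure (one record per formed edge).
edge_table = [
    {"start": "A", "end": "B", "rel": "R1"},
    {"start": "B", "end": "C", "rel": "R2"},
    {"start": "X", "end": "Y", "rel": "R4"},
    {"start": "X", "end": "B", "rel": "R5"},
]


def filter_edges(edge_table, members):
    """Keep edge records whose start node and end node both belong to the subgraph."""
    members = set(members)
    return [
        edge
        for edge in edge_table
        if edge["start"] in members and edge["end"] in members
    ]


filtered = filter_edges(edge_table, {"A", "B", "C"})
```

The edge with identifier R5 is dropped in this example because its start node X is not a member of the subgraph, although its end node B is.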
13. The method according to the preceding claim, wherein the nodes (40; 10; 20) of the
extracted subgraph are aggregated based on a subordinate hierarchy level, wherein
the formed directed edges (15; 25) are aggregated accordingly yielding an aggregated
subgraph, wherein the records of the second data structure are filtered based on the
aggregated subgraph.
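One conceivable way of aggregating nodes and edges is sketched below. The sketch assumes, purely for illustration, that SKU-level nodes are grouped by the plant to which they belong; the mapping `plant_of` and the function name `aggregate` are hypothetical.

```python
from collections import defaultdict


def aggregate(edges, group_of):
    """Map node-level edges to group-level edges, collecting relationship identifiers."""
    aggregated = defaultdict(list)
    for start, end, rel in edges:
        aggregated[(group_of[start], group_of[end])].append(rel)
    return dict(aggregated)


# Hypothetical edges of an extracted subgraph: (start node, end node, identifier).
edges = [
    ("SKU_A", "SKU_B", "R1"),
    ("SKU_A", "SKU_C", "R2"),
    ("SKU_B", "SKU_C", "R3"),
]
plant_of = {"SKU_A": "Plant_1", "SKU_B": "Plant_2", "SKU_C": "Plant_2"}
aggregated = aggregate(edges, plant_of)
```

The relationship identifiers collected per aggregated edge could then serve to filter the records of the second data structure.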
14. The method according to claim 12 or 13, wherein the filtered records of the second
data structure are provided to a process mining system for calculating at least one
process performance measure.
15. The method according to claim 1, wherein the storage device is a volatile memory,
in particular the main memory, of a computer system.