[0001] The invention relates to a method for logging and synchronizing diagnostic related
events, in particular, to a method for logging and synchronizing diagnostic related
events in a shared resource system for railway application.
[0002] In railway applications, shared resource systems are known. As shown in Fig. 1, a
shared resource system is composed, e.g., of segments 1. Each segment 1 contains one
or more units 2. In one segment 1 there is at least one central intelligence device
(CID), which is called the primary CID (P CID). In the case of a redundant configuration,
there is an additional CID in the same segment 1. The additional CID is called the
secondary CID (S CID). CIDs are e.g. responsible for brake force distribution, i.e.
for the so-called blending. In one unit, there can be one or more local application
devices (LAD) as stand-alone devices which are responsible for lower level tasks like
measuring environmental information such as axle load or valve pressure and carrying
out a wheel slide protection function. The minimal unit configuration contains one
unit master device (UMD), one gateway (GW), one CID and one or more LADs. The communication
inside the unit is denominated L0 (Level 0 communication), the communication between
the units is denominated L1 (Level 1 communication), and the communication between
the segments 1 is denominated L2 (Level 2 communication). The UMD is responsible for
managing the device addressing inside the unit 2. The GW routes the messages between
an L0 bus and an L1 bus.
[0003] In the shared resource system, both the CID and the LAD can generate diagnostic
related events. These are stored in a non-volatile memory of these devices, but the
memory size of these devices can differ. Usually, the LAD's non-volatile
memory is much smaller than the CID's memory. Therefore, the diagnostic related
events are to be transferred or synchronized from the smaller memory
to the larger memory of the shared resource system. However, a loss
of the diagnostic related events, even if they are not safety critical, should be avoided.
Therefore, the object underlying the invention is to provide a method for reliably
transferring stored diagnostic related events from one memory to another memory.
[0004] The object is achieved by a method according to claim 1. Further developments of
the invention are included in the dependent claims.
[0005] According to an aspect of the invention, a method for logging and synchronizing diagnostic
related events as events in a system for railway application is provided. The method
includes the steps:
step 1: requesting of sending of a number of not yet acknowledged stored events from
a second system to a first system by the first system;
step 2: sending of the number of not yet acknowledged stored events from the second
system to the first system by the second system;
step 3: checking the number of the not yet acknowledged stored events by the first
system, and proceeding to step 4 if the number of not yet acknowledged stored events
is larger than zero, and proceeding to step 1 performed on a next second system if
the number of not yet acknowledged stored events is equal to zero;
step 4: requesting of sending a number of stored events from the second system to the
first system by the first system;
step 5: sending of the requested number of stored events as sent events to the first
system by the second system;
step 6: checking a number of correctly received events by the first system, and proceeding
to step 7 if the number of correctly received events is equal to a number of requested
events, proceeding to step 4 if the number of correctly received events is not equal
to the number of requested events and a count of retries is smaller than a pre-defined
parameter, and increasing the count of retries by one, and proceeding to step 1 performed
on the next second system if the number of correctly received events is not equal to
the number of requested events and the count of retries is greater than or equal to
the pre-defined parameter;
step 7: storing received events in the memory of the first system by the first system
and acknowledging receipt of the received events to the second system by the first
system;
step 8: checking a number of stored events, sent in step 2, by the first system, and
proceeding to step 1 performed on the next second system if the number of stored events
sent in step 2 is equal to a number of successfully stored events in the first system,
and proceeding to step 4 performed on the same second system if the number of stored
events is larger than the number of successfully stored events in the first system.
[0006] By the provision of the method including these steps, the first system knows the
number of events to be transferred and, therefore, it can recognize whether received
data concerning the number of events and received data concerning the events are
consistent and whether the events have been transferred correctly. In particular,
in case the sent events have not been correctly transmitted, the first system
can ensure that it correctly receives the data by not acknowledging receipt of the
data and, therefore, causing the second system to resend the data until the sent events
are correctly received. This prevents a loss of the information since the transmission
from the LAD side will always continue from where it has been interrupted until a
correct acknowledgement is received from the CID. By this way of data transfer, the loss
of diagnostic information can be avoided in all scenarios.
[0007] In a first implementation of the method according to the aspect, the first system
is a superior system and the second system is a subsystem.
[0008] By the provision of the first system being responsible for the data transmission
or synchronization as a superior system, the second system as the subsystem need not
have high performance and, therefore, it can be realized in a less expensive manner.
[0009] In a second implementation of the method according to the aspect or according to
the first implementation, the stored events are sent as subsets of the events.
[0010] By sending the events as subsets of the events, data packets can be sufficiently small
to enable a steady and quick data transmission over a bus for data transmission when
multiple items of information are to be sent simultaneously over the bus.
[0011] In a third implementation of the method according to the aspect or according to any one
of the first and the second implementation, the second system sends an error code
if a memory of the second system is corrupted.
[0012] Due to this feature, the shared resource system can recognize a fault in the second
system and, therefore, it can execute countermeasures or signal the fault so that
the fault can be remedied as soon as possible.
[0013] In a fourth implementation of the method according to the aspect or according to
any one of the first to third implementation, the method further comprises steps of
setting a state of a diagnostic state, generating one of the events in case of a state
transition of the diagnostic state, and storing the event.
[0014] By these steps, a diagnostic related event is provided when, upon setting a state
of a diagnostic state, a state transition occurs and, therefore, a diagnostic related
condition of a component has changed.
[0015] In a fifth implementation of the method according to the fourth implementation, the
diagnostic state is one of a component diagnostic state, a functional diagnostic state,
or a system diagnostic state.
[0016] Diagnostic states can be a component diagnostic state, a functional diagnostic state
or a system diagnostic state. The component diagnostic state represents hardware related
information; the functional diagnostic state and the system diagnostic state represent
a software component or a service.
[0017] In a sixth implementation of the method according to the fourth or fifth implementation,
the setting of the state of the diagnostic state depends on a state of at least one
fault linked to the diagnostic state.
[0018] One or more of the faults can be linked to a diagnostic state. Therefore, a state
transition of at least one fault condition causes a change of the state of the diagnostic
state.
[0019] In a seventh implementation of the method according to the sixth implementation,
the method comprises the step of linking at least one of the faults to one or more
related diagnostic states.
[0020] If appropriate, at least one of the faults can be linked not only to one of
the diagnostic states but also to multiple diagnostic states. Also, a set of faults
can be linked to one or more diagnostic states.
[0021] In an eighth implementation of the method according to the sixth or seventh implementation,
the method further comprises the step of prioritizing the faults depending on a degradation
effect on a linked event.
[0022] By analysing the degradation effect on a linked event, the faults linked to the event
can be prioritized and, therefore, appropriate countermeasures against the faults
can be adopted based on the severity of the fault.
[0023] In a ninth implementation of the method according to any one of the sixth to eighth
implementation, the method further comprises the step of defining a particular fault
as a root cause if a specific diagnostic state is degraded by several faults and the
particular fault linked to the specific diagnostic state has a highest degradation
effect on the specific diagnostic state.
[0024] By the detection of the particular fault having the highest degradation effect on
the linked specific diagnostic state, the root cause can be determined and the related
fault can be remedied.
[0025] In a tenth implementation of the method according to the aspect or according to any one
of the preceding implementations, the method comprises the step of storing the event
including a source of the event and/or a time stamp and/or environment information
in an event log history.
[0026] Due to the storing of the event including additional information, the detection
of the cause of the event is facilitated. The environment information is defined by a
designer of an application of the system, who is aware which environmental parameters
are important to understand a possible root cause of the event.
[0027] In an eleventh implementation of the method according to the aspect or according
to any one of the preceding implementations, the method further comprises the step of reading
out diagnostic related events, and marking readout events as being read out.
[0028] By reading out the diagnostic related events, the events can be evaluated on external
systems such as a PC. In order to facilitate the detection of new events which have
not yet been evaluated, the readout events are marked accordingly so that only newly
raised events are considered upon the next reading out.
[0029] In a twelfth implementation of the method according to the eleventh implementation,
a first user and a second user and a first level of information and a second level
of information are defined. The first user is provided with the first level of information
and second level of information and the second user is provided with the second level
of information.
[0030] By defining at least two users having different rights to read out information, protected
information of the system can be provided only to e.g. an engineer of an owner
of the system who is allowed to handle e.g. intellectual property included in the system,
whereas diagnostic states can be read by an operator for remedying faults.
[0031] The invention is now elucidated referring to the attached drawings by means of an
embodiment.
[0032] In particular:
Fig. 1 shows a shared resource embedded system used by the method according to the invention;
and
Fig. 2 shows possible linking solutions of a diagnostic concept according to the invention.
[0033] In the shared resource embedded system shown in Fig. 1 and described above, the diagnostic
related information is to be transferred from the device having the smaller memory
to the device having the larger memory, i.e. from the LAD
to the CID. The CID is responsible for collecting events from the LADs, i.e.
for a so-called mirroring. The mirroring also takes place in the case of an optional
redundant CID configuration. In this case, the primary CID is responsible for the
mirroring. The mirroring can alternatively also be performed from a distant LAD which
is not included in a P CID unit. In this case, the communication is performed via
GW devices in the units 2.
[0034] In use, data are transferred from a smaller memory to a larger memory by a mirroring
process. The CID is denominated as a first system and the LAD is denominated as a
second system. Here, the first system is a superior system and the second system is
a subsystem; however, the systems can alternatively also be equivalent systems or
the first system can be the subsystem and the second system can be the superior system.
[0035] In a first step of the mirroring process, the first system requests sending of a
number of not yet acknowledged stored events from the second system to the first system.
If all the required local resources in the LAD are working correctly, i.e. the storage
is not corrupted, in a second step, the second system sends the number of not yet
acknowledged stored events to the first system. If the memory
of the second system is corrupted, the second system sends an error code in this step
of the method or, alternatively, at another appropriate moment. In a third step, the
first system checks the number of the not yet acknowledged stored events. The method
proceeds to a fourth step if this number is larger than zero, and to the first step
performed on a next second system if this number is equal to zero. In the fourth
step, the first system requests sending a number of stored events from the second
system to the first system. The number of the requested events may be less than the
total number of stored events. Subsequently, in a fifth step, the second system sends
the requested number of stored events as sent events to the first system. In a sixth
step, the first system checks a number of correctly received events. The method proceeds
to a seventh step if the number of correctly received events is equal to a number
of requested events. If the number of correctly received events is not equal to the
number of requested events and a count of retries is smaller than a pre-defined
parameter (e.g. 3), the count of retries is increased by one and the method proceeds
to the fourth step. If the number of correctly received events is not equal to the
number of requested events and the count of retries is greater than or equal to the
pre-defined parameter, the method proceeds to the first step performed on the next
second system. The stored events are optionally sent as subsets of events; however,
the data of the events can also be sent as an entire package. In the seventh step, the
first system stores the received events in the memory of the first system and acknowledges
receipt of the received events to the second system. In an eighth step, the first system
checks the number of stored events sent in the second step. The method proceeds to the
first step performed on the next second system if the number of stored events sent in
the second step is equal to a number of successfully stored events in the first system,
and to the fourth step performed on the same second system if the number
of stored events is larger than the number of successfully stored events in the first
system. This can prevent the loss of the information since the transmission from the
LAD side will always continue from where it has been interrupted until a correct
acknowledgement is received from the first system. By this way of data transfer, the loss
of diagnostic information can be avoided in all scenarios.
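By way of a non-limiting illustration, the eight steps described above may be sketched as follows in Python. The names SecondSystem, mirror, receive, BATCH_SIZE and MAX_RETRIES are hypothetical and not part of the described system. The sketch further assumes that the error code is modelled as None, that the count of retries is not reset after a successful subset, and that a callable stands in for the bus.

```python
from dataclasses import dataclass, field

MAX_RETRIES = 3  # the pre-defined parameter; 3 is the example value given in the text
BATCH_SIZE = 2   # hypothetical size of one requested subset of events


@dataclass
class SecondSystem:
    """Hypothetical stand-in for a LAD holding not yet acknowledged events."""
    pending: list = field(default_factory=list)
    corrupted: bool = False

    def count_unacknowledged(self):
        # Step 2: report the number of not yet acknowledged stored events.
        # None is an assumed stand-in for the error code on corrupted memory.
        return None if self.corrupted else len(self.pending)

    def send(self, count):
        # Step 5: send the requested number of events without deleting them.
        return self.pending[:count]

    def acknowledge(self, count):
        # Events are dropped only once the first system has acknowledged them.
        del self.pending[:count]


def mirror(first_log, second, receive=lambda events: events):
    """Run steps 1 to 8 for one second system; True if everything was mirrored.

    `receive` models the bus and returns the events that arrived correctly.
    """
    total = second.count_unacknowledged()            # steps 1 and 2
    if not total:                                    # step 3: zero or error code
        return total == 0
    stored, retries = 0, 0
    while stored < total:                            # step 8
        requested = min(BATCH_SIZE, total - stored)  # step 4
        received = receive(second.send(requested))   # step 5
        if len(received) != requested:               # step 6
            if retries >= MAX_RETRIES:
                return False                         # continue with next second system
            retries += 1
            continue                                 # repeat step 4
        first_log.extend(received)                   # step 7: store ...
        second.acknowledge(requested)                # ... and acknowledge
        stored += requested
    return True
```

In this sketch, events remain in the second system until they are acknowledged, so an aborted transfer can be resumed in a later mirroring cycle without loss.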
[0036] Fig. 2 shows possible linking solutions of a diagnostic concept according to the
invention. In the shared resource system, the diagnostic related information is presented
at different levels with different meanings.
[0037] The lowest level of the diagnostic related information is a fault. In Fig. 2, faults
are denoted with "F_1", "F_2", ..., "F_k". A fault can represent a hardware related
error or a software related error. A possible state of a fault is either "healthy"
(no error) or "sick" (error).
[0038] The highest level of the diagnostic related information is a diagnostic state. The
diagnostic state represents a higher abstraction of hardware and software. For different
purposes, there are different types of abstraction, e.g., a component diagnostic state,
a functional diagnostic state, and a system diagnostic state. The component diagnostic
state represents hardware related information such as a status of the brakes. The
functional diagnostic state and system diagnostic state represent the status of a
software component or of a service, such as a CAN communication. The diagnostic state
is one of the component diagnostic state, the functional diagnostic state, or the
system diagnostic state. In Fig. 2, the diagnostic states are denoted with "FDS_1",
"FDS_2", ..., "FDS_m" (Functional Diagnostic State) and "CDS_1", "CDS_2", ..., "CDS_n"
(Component Diagnostic State).
[0039] As also shown in Fig. 2, the diagnostic state (FDS_2) is linked to only one fault
(F_3) or the diagnostic state (FDS_1) is linked to several faults (F_1, F_2). However,
it is also possible that the diagnostic state is not linked to any fault (FDS_3).
Alternatively, one fault is linked to several diagnostic states (not shown).
[0040] The diagnostic states can be set in two ways. If no fault
is linked to the diagnostic state, it can be set directly. Otherwise, the diagnostic
state can be set by the linked fault or by the linked faults. The setting of the state
of the diagnostic state depends on the state of at least one fault linked to the diagnostic
state. If at least one of the faults is set to sick, the linked diagnostic state is
also set to sick. Therefore, the states of the diagnostic states are related
to the linked fault or faults.
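The two ways of setting a diagnostic state may be illustrated by the following minimal Python sketch. The names State and diagnostic_state are hypothetical. It is an assumption of the sketch that a diagnostic state whose linked faults are all healthy is itself healthy; the description only specifies the sick case.

```python
from enum import Enum


class State(Enum):
    HEALTHY = "healthy"  # no error
    SICK = "sick"        # error


def diagnostic_state(linked_faults, direct_state=State.HEALTHY):
    """Derive a diagnostic state from the states of its linked faults.

    linked_faults maps a fault name (e.g. "F_1") to its State; it is empty
    when no fault is linked, in which case the state is set directly.
    """
    if not linked_faults:
        return direct_state
    # At least one sick fault sets the linked diagnostic state to sick.
    if any(state is State.SICK for state in linked_faults.values()):
        return State.SICK
    return State.HEALTHY  # assumption: all linked faults healthy means healthy
```

For example, a diagnostic state linked to a healthy "F_1" and a sick "F_2" evaluates to sick, whereas a diagnostic state without linked faults simply returns the directly set value.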
[0041] Upon every state transition of one of the faults or of one of the diagnostic states,
one of the events is generated. This event is stored as diagnostic related information
in an event log history in the LAD. Alternatively, the event log history can be stored
in a memory of another component of the system. The event contains information about
a source of the event with a time stamp and detailed environment information. Alternatively,
the event can contain only a part of this information, or additional or other information.
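As a non-limiting illustration, the generation and storing of an event upon a state transition may be sketched as follows in Python. The names Event, EventLogHistory and set_state are hypothetical, the field layout is an assumption, and a source that has never been set is assumed to start as "healthy".

```python
import time
from dataclasses import dataclass


@dataclass
class Event:
    source: str        # fault or diagnostic state whose state changed
    old_state: str
    new_state: str
    timestamp: float
    environment: dict  # application-defined values, e.g. axle load


class EventLogHistory:
    """Hypothetical event log history, e.g. kept in the LAD."""

    def __init__(self):
        self.events = []
        self._states = {}  # last known state per source

    def set_state(self, source, new_state, environment=None, now=None):
        # Assumption: a source never set before counts as "healthy".
        old_state = self._states.get(source, "healthy")
        if new_state == old_state:
            return None    # no state transition, hence no event
        self._states[source] = new_state
        event = Event(source, old_state, new_state,
                      time.time() if now is None else now,
                      dict(environment or {}))
        self.events.append(event)
        return event
```

Setting the same state twice in succession produces only one event in this sketch, since only the transition is logged.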
[0042] Since it is possible to link a set of faults to one or more diagnostic states, the
fault causing the state transition of the diagnostic state can be ambiguous. In order
to clarify a root cause for a sick diagnostic state, the faults are prioritized depending
on a degradation effect on the linked event. A particular fault is defined as a root
cause if a specific diagnostic state is degraded by several faults and the particular
fault linked to the specific diagnostic state has the highest degradation effect on
the specific diagnostic state. By checking the root cause of the diagnostic state,
the fault having the highest degradation effect is reported to a user.
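The determination of the root cause may be illustrated by the following minimal Python sketch. The name root_cause is hypothetical, and the representation of the degradation effect as a number, where a larger value means a stronger degradation, is an assumption; the description does not specify how the degradation effect is quantified.

```python
def root_cause(linked_faults):
    """Return the name of the sick fault with the highest degradation effect.

    linked_faults is a list of (name, is_sick, degradation_effect) tuples for
    one diagnostic state. Returns None if no linked fault is sick.
    """
    sick = [fault for fault in linked_faults if fault[1]]
    if not sick:
        return None
    # With equal degradation effects, the first listed fault is chosen.
    return max(sick, key=lambda fault: fault[2])[0]
```

For a diagnostic state linked to a sick "F_1" with effect 2 and a sick "F_2" with effect 5, the sketch reports "F_2" as the root cause.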
[0043] The CID is connected to an Ethernet based maintenance port which is used for reading
out diagnostic events. Maintenance software, e.g. web browser based, is used for
a connection to one or more of the CIDs of the segment 1. The maintenance software
is able to read out the CID's events and the events mirrored from the LADs. Readout
events including the stored information like timestamp, event type and root cause
are shown e.g. on a user's PC. Optionally, it can be confirmed that the events stored
in the CID have already been read out. In other words, the diagnostic related information
is read out and readout events are marked as being read out. Therefore, the user is
enabled to read out only newly raised events.
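The marking of readout events may be sketched as follows in Python. The function name read_out_new_events and the "read_out" flag are hypothetical; the description does not specify how the marking is stored.

```python
def read_out_new_events(stored_events):
    """Return the events not yet read out and mark them as read out.

    stored_events is a list of dicts; the "read_out" key is an assumed flag.
    """
    new_events = [e for e in stored_events if not e.get("read_out", False)]
    for event in new_events:
        event["read_out"] = True
    return new_events
```

A second call on the same list returns only events that were raised after the first call.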
[0044] In a phase of evaluation of diagnostic information, a first user and a second user,
e.g. an operator of the system and an engineer of an owner of the system, can be distinguished
as different users. The two different users are provided with two different levels
of information, which are defined as a first level of information and a second level
of information, since a certain set of the diagnostic information is intellectual
property (IP) of the owner of the system and, therefore, the IP relevant information
is hidden from the operator. An access level is determined by an allocation of the
diagnostic states, i.e., the faults are IP protected information, whereas the diagnostic
states are not IP protected information. Alternatively, more than two different
users can be distinguished and more than two levels of information can be defined
in accordance with respective requirements.
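The provision of different levels of information to different users may be illustrated by the following minimal Python sketch. All names are hypothetical. The sketch assumes that the first level of information holds the IP protected faults, that the second level holds the diagnostic states, and that the first user is the one entitled to both levels; this mapping is not stated explicitly in the description.

```python
FIRST_LEVEL = "first"    # assumed: IP protected information (faults)
SECOND_LEVEL = "second"  # assumed: non-protected information (diagnostic states)

# The first user is provided with both levels, the second user with the second level.
ACCESS = {
    "first_user": {FIRST_LEVEL, SECOND_LEVEL},
    "second_user": {SECOND_LEVEL},
}


def visible_entries(user, entries):
    """Return the names of the (name, level) entries the given user may read out."""
    allowed = ACCESS[user]
    return [name for name, level in entries if level in allowed]
```

Further users or levels could be added by extending the ACCESS table.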
REFERENCE SIGNS LIST
[0045]
1       segment
2       unit
CDS_    Component Diagnostic State
CID     Central Intelligence Device
F_      Fault
FDS_    Functional Diagnostic State
GW      Gateway
LAD     Local Application Device
L0      Level 0 communication
L1      Level 1 communication
P CID   Primary Central Intelligence Device
S CID   Secondary Central Intelligence Device
UMD     Unit Master Device
1. Method for logging and synchronizing diagnostic related events as events in a system
for railway application, including the steps:
step 1: requesting of sending of a number of not yet acknowledged stored events from
a second system (LAD) to a first system (CID) by the first system (CID);
step 2: sending of the number of not yet acknowledged stored events from the second
system (LAD) to the first system (CID) by the second system (LAD);
step 3: checking the number of the not yet acknowledged stored events by the first
system (CID), and
proceeding to step 4 if the number of not yet acknowledged stored events is larger
than zero, and
proceeding to step 1 performed on a next second system (LAD) if the number of
not yet acknowledged stored events is equal to zero;
step 4: requesting of sending a number of stored events from the second system (LAD)
to the first system (CID) by the first system (CID);
step 5: sending of the requested number of stored events as sent events to the first
system (CID) by the second system (LAD);
step 6: checking a number of correctly received events by the first system (CID),
and
proceeding to step 7 if the number of correctly received events is equal to a number
of requested events,
proceeding to step 4 if the number of correctly received events is not equal to the
number of requested events and a count of retries is smaller than a pre-defined parameter,
and increasing the count of retries by one,
proceeding to step 1 performed on the next second system (LAD) if the number of correctly
received events is not equal to the number of requested events and the
count of retries is greater than or equal to the pre-defined parameter;
step 7: storing received events in the memory of the first system (CID) by the first
system (CID), and
acknowledging receipt of the received events to the second system (LAD) by the first
system (CID);
step 8: checking a number of stored events, sent in step 2, by the first system, and
proceeding to step 1 performed on the next second system (LAD) if the number of stored
events, sent in step 2, is equal to a number of successfully stored events in the
first system (CID), and
proceeding to step 4 performed on the same second system (LAD) if the number of stored
events is larger than the number of successfully stored events in the first system
(CID).
2. Method according to claim 1, wherein the first system (CID) is a superior system and
the second system (LAD) is a subsystem.
3. Method according to claim 1 or 2, wherein the stored events are sent as subsets of
the events.
4. Method according to any one of the preceding claims, wherein
the second system (LAD) sends an error code if a memory of the second system (LAD)
is corrupted.
5. Method according to any one of claims 1 to 4, further comprising the steps:
setting a state of a diagnostic state;
generating one of the events in case of a state transition of the diagnostic state;
and
storing the event.
6. Method according to claim 5, wherein
the diagnostic state is one of a component diagnostic state, a functional diagnostic
state, or a system diagnostic state.
7. Method according to claim 5 or 6, wherein the setting of the state of the diagnostic
state depends on a state of at least one fault linked to the diagnostic state.
8. Method according to claim 7, further comprising the step of linking at least one of
the faults to one or more related diagnostic states.
9. Method according to claim 7 or 8, further comprising the step of prioritizing the
faults depending on a degradation effect on a linked event.
10. Method according to any one of claims 7 to 9, further comprising the step of defining
a particular fault as a root cause if a specific diagnostic state is degraded by several
faults and the particular fault linked to the specific diagnostic state has a highest
degradation effect on the specific diagnostic state.
11. Method according to any one of the preceding claims, further comprising the step of
storing the event including a source of the event and/or a time stamp and/or environment
information in an event log history.
12. Method according to any one of the preceding claims, further comprising the steps of
reading out diagnostic related information, and marking readout events as being read out.
13. Method according to claim 12, wherein a first user and a second user and a first level
of information and a second level of information are defined, and wherein
the first user is provided with the first level of information and second level of
information, and
the second user is provided with the second level of information.