BACKGROUND
[0001] It can be desirable to make assessments and/or predictions regarding the operation
of a real-world physical system, such as an electro-mechanical system - e.g., an aircraft
turbine engine. Similarly, in the medical field it can be desirable to make informed
predictions regarding the status of a medical patient based on the patient's medical
history and current health condition(s).
[0002] Early detection of imminent disorders from routine measurements and checkups is an
important challenge in the medical and prognostics research communities. Multivariate
observations from routine doctor visits are typically available for human patients.
Similarly, sensor data is typically available throughout the life of industrial equipment,
from its operational to its non-operational state. A predictive model can be used to predict
a condition of the system or patient. Sensor and monitoring technologies provide accurate
data; however, accurately making such assessments and/or predictions based on this
accumulated data can be a difficult task.
[0003] While change detection and health status detection algorithms have been studied in
the past, conventional approaches do not leverage measurements from assets or patients
with unknown health statuses to build better models. Missing from the art is an approach
that can benefit from measurements with unknown health status. Conventional approaches
do not apply transduction (i.e., the act of learning from unlabeled test data - the data
to be predicted upon) or chronological constraints to ensure that predictions from
a model respect the reasonable expectation that the health status of a monitored entity
changes from healthy to unhealthy, and not the reverse.
BRIEF DESCRIPTION OF THE DRAWINGS
[0004]
FIG. 1 depicts a representation of a temporal transduction scenario in accordance
with embodiments;
FIG. 2 depicts a system for predicting performance of a monitored entity using a max-margin
temporal transduction classifier in accordance with embodiments;
FIGS. 3A-3B depict a flowchart for a process of temporal transductive stochastic gradient
descent for a linear model in accordance with embodiments;
FIGS. 4A-4B depict a flowchart for a process of temporal transductive stochastic gradient
descent using a complex kernel to enable non-linear models in accordance with embodiments;
FIG. 5 depicts a comparison of the accuracy of various conventional approaches with
an embodying MMTT approach in accordance with embodiments;
FIG. 6 depicts another comparison of the accuracy of various conventional approaches
with an embodying MMTT approach in accordance with embodiments; and
FIG. 7 depicts a comparison of strategies to estimate performance on an unlabeled
set for tuning the model parameter in accordance with embodiments.
DETAILED DESCRIPTION
[0005] Embodying systems and methods apply temporal transductive learning to time-series
data to detect changes, and a corresponding point in time, in monitored physical assets
(e.g., a turbine engine, machinery, appliances, industrial equipment, computer systems,
etc.) or patients. The techniques are scalable, even in the presence of temporal constraints
and non-linear kernels.
[0006] The term "monitored entity" refers to either a monitored physical asset, or a monitored
patient. It should be readily understood that the invention is not so limited, and
that entities of other natures and/or types that can be monitored are within the scope
of this disclosure. For example, a monitored entity can be an industrial plant, machinery,
computing system, or software system.
[0007] The terms "operational," "healthy," and "functional" can be used interchangeably regardless
of the nature and/or type of monitored entity. Similarly, the terms "nonoperational,"
"unhealthy," and "nonfunctional" can be used interchangeably regardless of the nature
and/or type of monitored entity. It should be readily understood that these terms
do not denote binary states of the monitored entity (e.g., an engine can have reduced efficiency
(nonoperational) but still provide full power; a patient can have diabetes (unhealthy),
but still participate fully in daily activities).
[0008] In accordance with embodiments, if a monitored entity is diagnosed with a failure
(non-operational, disorder, etc.), historical data can be examined to detect an early
indicator of the failure's cause. Detection of change-points in the historical data
can lead to the identification of external stimuli and/or operating conditions that
were the root cause for the eventual failure. For example, a doctor may question
an individual's exposure to certain geographies, and an engineer might investigate
the operating conditions around the time of change.
[0009] In some implementations, once the causal external stimuli and/or operating conditions
are identified, the knowledge can be applied in forecasting whether other monitored
entities might experience similar faults.
[0010] Historical data on functioning entities is seldom available. For example, a doctor
might get visits from patients that are already symptomatic or ill. Other patients
might ignore early-onset symptoms until their next routine visit. Similarly, physical
assets might experience continued usage until the problem escalates. Once failure occurs,
a user might then complain for the first time about a non-functioning unit. Routine measurements
are inexpensive, but expensive investigative analyses are usually not performed on
asymptomatic entities. Thus, for most purposes, any asymptomatic entities are assumed
to have an unknown health status.
[0011] Figure 1 depicts a representation of temporal transduction scenario 100 in accordance
with embodiments. Each horizontal row corresponds to the temporal state of a monitored
entity ME1, ME2,..., MEn. Each circle represents a time of a multivariate measurement,
or observation, for that particular monitored entity. Time progresses along the rows
from left to right, so that the measurements/observations are chronologically ordered.
Measurements 110 indicate that the monitored entity is in a healthy state. Measurements
120 indicate that the monitored entity is in an unhealthy state. Measurements 130
indicate that the measurement of the monitored entity was made as part of a routine checkup,
and that an occurrence of disorder was unknown at that time. The interval between
measurements need not be uniform, nor does the quantity of measurements need to be equal
for each monitored entity.
[0012] In accordance with embodiments, monitored entities are initially considered healthy.
For a mechanical asset this might be a safe assumption, since new equipment is typically
thoroughly inspected before being deployed. While doctors might perform a comprehensive
examination before admitting a new patient, this assumption might not hold for disorders
that exist prior to the initial measurement. Only a few monitored entities may have
a known final unhealthy diagnosis, and the exact time at which the status changed
could be unknown.
[0013] In accordance with embodiments, transductive learning is applied to model the problem
setting with temporal information. Transductive learning is similar to semi-supervised
learning, in its use of unlabeled data for inferring a better model with minimal supervision.
However, transductive learning differs from semi-supervised learning by using test
data as the unlabeled data, instead of any randomly chosen unlabeled data for model
inference.
[0014] In accordance with embodiments, the model is optimized for better performance on
the unlabeled data that is to be predicted upon, not just historical failures used
for training. The labeled set can include two parts: the initial measurements from
all entities (such as measurement 110) and the final measurements from the entities
with known unhealthy states (such as measurement 120). The unlabeled set can include
intermediate measurements and final observations from entities with unknown health
status (such as measurement 130). In accordance with embodiments, the unlabeled set
is classified to (1) classify the final states of monitored entities with unknown
final diagnoses to identify those with disorders; and to (2) classify the intermediate
observations to identify transition points (i.e., the time when the entity changed
state from healthy to unhealthy). Thus, the unlabeled set is also the test set, hence
the transductive learning setting.
[0015] In accordance with embodiments, a maximum margin classifier incorporates a support
vector machine (SVM) classifier with its transductive counterpart. This embodying
classifier is referred to as a Max-Margin Temporal Transduction (MMTT) classifier. In addition
to traditional constraints for maximum margin classifiers, an embodying MMTT further
incorporates a constraint that penalizes violations of the chronology of events - the
entity cannot go from unhealthy to healthy without intervention. For scalability,
a stochastic gradient descent approach is implemented.
[0016] An MMTT classifier can utilize both linear and non-linear kernels. Independent of
the kernel, the number of iterations needed to infer an ε-accurate model is of the
order of O(1/(λε)). Experiments on multiple publicly available benchmark datasets demonstrate
that an embodying MMTT classifier provides superior predictive performance with minimal
supervision compared to conventional and transductive SVMs.
[0017] For purposes of explanation, consider the set of entities X = {x_1,...,x_N}. Each
entity denotes the element being monitored - a monitored entity. x_i ∈ R^(T_i×D), where D
is the dimensionality of the multivariate time-series and T_i is the length of the time-series
of that particular entity i. Thus, in Figure 1, the i-th row depicts x_i and the t-th circle
in the i-th row depicts the observation x_it. Let Y = {y_1,...,y_N} be the health-statuses,
or labels, for each of the entities, where y_i ∈ {-1, 0, +1}^(T_i). Without loss of
generality, -1 denotes the healthy (initial) state, +1 denotes the unhealthy (changed)
state, and 0 indicates an unknown diagnosis.
[0018] The set of entities with known final diagnoses can be denoted by K ⊂ X, and usually
|K| << |X|. In accordance with embodiments, only entities with a changed final state
comprise K, thereby y_iT_i = +1, ∀x_i ∈ K. All entities start from the healthy state,
thus y_i1 = -1, ∀x_i ∈ X. The training set consists of X, y_i1 = -1, ∀x_i ∈ X, and
y_iT_i = +1, ∀x_i ∈ K.
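By way of illustration only, the following is a minimal sketch of this problem setup, assuming the data is held as NumPy arrays; the names (entities, labels, K) and the sizes are illustrative assumptions, not part of the disclosure:

```python
import numpy as np

# A minimal sketch of the setup of paragraphs [0017]-[0018]; sizes and
# names here are illustrative assumptions.
D = 8                       # dimensionality of each multivariate observation
rng = np.random.default_rng(0)

# X: one (T_i x D) array per monitored entity; series lengths may differ.
entities = [rng.normal(size=(T_i, D)) for T_i in (5, 7, 6)]

# y_i in {-1, 0, +1}^(T_i): -1 healthy, +1 unhealthy, 0 unknown diagnosis.
labels = [np.zeros(len(x), dtype=int) for x in entities]
for y in labels:
    y[0] = -1               # every entity starts healthy: y_i1 = -1
labels[0][-1] = +1          # entity 0 is in K: known unhealthy final state

# K: indices of entities with a known (changed) final diagnosis.
K = [i for i, y in enumerate(labels) if y[-1] == +1]
```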
[0019] Embodying systems and methods can provide a final label prediction to identify the
health statuses of entities not in K. Specifically, the identification solves for
y_iT_i, ∀x_i ∈ K^c, where K^c = X \ K. The prediction can be denoted by ŷ_iT_i,
∀x_i ∈ K^c, with the goal of minimizing the prediction error Σ_(x_i ∈ K^c) 1[ŷ_iT_i ≠ y_iT_i].
In addition to providing a final label prediction, embodying systems and methods
can also provide change-point detection to identify the earliest point of change
from the healthy to the unhealthy state in an unhealthy entity.
[0020] In accordance with embodiments, both of these predictions can be addressed by classifying
all observations along each entity, x_it, ∀x_it ∈ x_i, x_i ∈ X - i.e., the prediction
can be represented as ŷ_it, ∀i, t, which results in an entire-series prediction. Analyzing
the predictions of the entire series can enable the discovery of change-points in the
health state of a monitored entity.
[0021] Figure 2 depicts system 200 for predicting performance of a monitored entity using
an MMTT classifier in accordance with embodiments. The MMTT classifier can detect operational
status changes and a corresponding point in time in the monitored entity. System 200
can include monitored physical asset 202. Sensors 204 (quantity = N1) can monitor
the operational status of monitored physical asset 202. The frequency of data sampling
by the sensors can be varied based on factors that include the nature and type of
physical asset that is undergoing monitoring. For example, for a turbine engine, sensors
can monitor turbine vane wear, fuel mixture, power output, temperature(s), pressure(s),
etc. Sensor(s) 204 provide data to asset computing device 206. Asset computing device
206 can include control processor 207 that executes executable instructions 208 to
control other modules of the asset computing device. Memory 209 can include executable
instructions 208 and sensor data records 210.
[0022] System 200 can include monitored patient 212. Sensors 214 (quantity = N2) can monitor
the physiological state of monitored patient 212. The frequency of data sampling by
the sensors can be varied based on the particular parameter (e.g., respiratory rate,
pulse rate, blood pressure, temperature, etc.). Sensor(s) 214 provide data to telemetry
computing device 215. Telemetry computing device 215 can include control processor 216
that executes executable instructions 217 to control other modules of the telemetry
computing device. Memory 218 can include executable instructions 217 and sensor data
records 219. For purposes of this discussion, one monitored physical asset and one
monitored patient are illustrated in FIG. 2; however, it is readily understood that system
200 can include multiple monitored entities of any type and nature. Further, embodying
systems and methods can be implemented regardless of the number of sensors, quantity of
data, and format of information received from monitored entities.
[0023] In accordance with embodiments, server 220 can obtain multivariate, temporal sensor
data from an asset computing device and/or a telemetry computing device across electronic
communication network 240. The electronic communication network can be, can comprise,
or can be part of, a private internet protocol (IP) network, the Internet, an integrated
services digital network (ISDN), frame relay connections, a modem connected to a phone
line, a public switched telephone network (PSTN), a public or private data network,
a local area network (LAN), a metropolitan area network (MAN), a wide area network
(WAN), a wireline or wireless network, a local, regional, or global communication
network, an enterprise intranet, any combination of the preceding, and/or any other
suitable communication means. It should be recognized that techniques and systems
disclosed herein are not limited by the nature of network 240.
[0024] Server 220 can include at least one server control processor 222 configured to support
embodying MMTT techniques by executing executable instructions 231 accessible by the
server control processor from server data store 230. The server can include memory
224 for, among other reasons, local cache purposes.
[0025] Server 220 can include data preprocessing unit 226 which can apply a stochastic gradient
descent based solution to enable scalability. Transductive learning unit 228 can include
max-margin temporal transduction SVM classifier unit 229 that can detect operational
status changes and a corresponding point in time in the monitored entity sensor data
to predict performance.
[0026] Data store 230 can include past observation data 232 that contains records of prior
accumulated historic time series data 233 and diagnostics data 234. This cumulative
data can be organized by unique identifiers associated with individual monitored assets.
Recent observations 235 can include current time series data 236 and predictions/alerts
237. Max-margin model 238 is a predictive model of when a monitored entity could change
from healthy to unhealthy state.
[0027] In accordance with embodiments, a support vector machine (SVM) is extended with a
max-margin model represented by Equation 1:

ŷ_it = sign(⟨w, φ(x_it)⟩)     (EQ. 1)

[0028] Where:
w ∈ R^D are parameters of the model; and
φ(·) is a feature transformation.
[0029] For inferring w, the minimization problem of Equation 2 is solved:

min_w (λ/2)‖w‖² + Σ_L l_L(w; x_it, y_it) + Σ_U l_U(w; x_it) + Σ_C l_C(w; x_it, x_it+1)     (EQ. 2)
[0031] The terms l_L, l_U, and l_C refer, respectively, to the constraints arising from
the labeled, unlabeled, and chronological considerations. The hinge-loss (l_L) is utilized
in the context of supervised SVM classifiers. The unlabeled loss (l_U) can maximize the
margin with the help of unlabeled data, a formulation very commonly used in transductive
SVMs. The chronological loss (l_C) can penalize if the chronological constraints are
violated. The transition from one state to the other (e.g., operational-to-nonoperational;
healthy-to-unhealthy) is strictly a one-way transition. Any violation of the chronological
constraint can result in a penalty that needs to be minimized.
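By way of illustration only, the following sketch shows one plausible reading of the three loss components of EQ. 2, assuming standard hinge-style formulations; the exact functional forms used by an embodiment may differ:

```python
import numpy as np

def labeled_loss(w, phi_x, y):
    # Hinge loss (l_L) for a labeled observation: penalize margin violations.
    return max(0.0, 1.0 - y * np.dot(w, phi_x))

def unlabeled_loss(w, phi_x):
    # Transductive loss (l_U): push unlabeled observations away from the
    # margin, whichever side of the boundary they fall on.
    return max(0.0, 1.0 - abs(np.dot(w, phi_x)))

def chronological_loss(w, phi_x_t, phi_x_next):
    # Chronological loss (l_C): penalize predicted "recoveries" - the score
    # at time t+1 should not be healthier (lower) than the score at time t.
    return max(0.0, np.dot(w, phi_x_t) - np.dot(w, phi_x_next))
```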
[0032] In accordance with implementations, chronological loss can be achieved by defining
the set C = {(x_it, x_ik) : x_it ∈ x_i, x_ik ∈ x_i, ∀t < k}. Instead of requiring just
the local chronological ordering of labels, in some implementations there could be a
requirement that the prediction at a particular observation is consistent with all the
observations that follow it. While attractive for the linear kernel, this comprehensive
loss is likely to be computationally expensive. By design, a localized constraint leads
to an efficient streaming algorithm that requires just an observation and its immediate
neighbor (i.e., one element of the time-series data and the next subsequent element),
not the entire series, making it attractive for scalability.
[0033] To enable scalability, a stochastic gradient descent based solution is proposed for
the optimization problem. The sub-gradient at step s of Equation 2, with respect to
w_s, in the context of a single training example (x_it, y_it), is expressed as Equation 3:

∇_s = λw_s − 1[y_it·⟨w_s, φ(x_it)⟩ < 1]·y_it·φ(x_it) − 1[|⟨w_s, φ(x_it)⟩| < 1]·sign(⟨w_s, φ(x_it)⟩)·φ(x_it) − 1[⟨w_s, φ(x_it+1)⟩ < ⟨w_s, φ(x_it)⟩]·(φ(x_it+1) − φ(x_it))     (EQ. 3)

where 1[•] is the indicator function, which takes the value 1 only if • is true, else
it is 0.
[0034] Equipped with this sub-gradient, w can be iteratively learned using the update
w_s+1 ← w_s − η_s·∇_s. The term η_s = 1/(λs) can be used, similar to the primal estimated
sub-gradient solver (PEGASOS) for SVMs. This leads to a stochastic gradient descent
(SGD) algorithm, presented as Algorithm I:

(Algorithm I)
[0035] Figures 3A-3B depict a flowchart for temporal transductive stochastic gradient descent
(TTSGD) process 300 for a linear model in accordance with embodiments. TTSGD process
300 implements Algorithm I for the linear kernel φ(x_it) = x_it. Initially the weight
vector (w) is set to zero, step 305. For each member of the sensor dataset, step 310,
a loop is entered to determine gradient update components. If there are no further
dataset members, process 300 ends.
[0036] A sensor observation is chosen, step 315, using a random distribution to select the
particular observation. The learning rate (η) is updated, step 320. The learning rate
is the speed at which the model is updated. If the learning rate is too high, the
optimum can be overshot, resulting in oscillations around the optimum; if too low,
reaching the optimum can take a longer duration. The learning rate can be updated by
η ← 1/(λs), as shown in Algorithm I.
[0037] The gradient update term is initialized (δ ← λw_s), step 325, to account for the
regularization loss (R). A determination is made, step 330, as to whether a prediction
using the chosen observation (step 315) is chronologically incorrect. This step compares
a chosen observation having an unhealthy status with the next sensor data for that
monitored entity. Should a prediction based on the next sensor data indicate a healthy
status, an error is flagged - recall that a basis condition is set that assumes monitored
entities go from healthy to unhealthy states (not unhealthy to healthy states).
[0038] If at step 330 a determination is made that the prediction is chronologically
correct, the gradient component (A) is set to zero, step 335. If chronologically
incorrect, process 300 generates an update to gradient component (A), step 340.
[0039] A determination is made, step 345, as to whether the predicted health of the monitored
entity matches a known health status of the monitored entity. If the prediction does
match the known health status, the gradient component (B) is set to zero, step 350.
If the predicted and known health statuses do not match, process 300 generates an
update to gradient component (B), step 355.
[0040] A determination is made, step 360, as to whether the true health status of the monitored
entity is known (either healthy or unhealthy). If the true health status is known,
the gradient component (C) is set to zero, step 365. If the true health status is
not known, process 300 generates an update to gradient component (C), step 370. At
step 375, the weight vector (w_s) for the particular member of the dataset is updated
using the components (R+A+B+C), further weighted by the learning rate (step 320).
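By way of illustration only, the following is a minimal sketch of process 300 for the linear kernel, assuming the hinge-style losses sketched above; the observation-selection and stopping details are simplified relative to Algorithm I, and the names (entities, labels, lam) are illustrative:

```python
import numpy as np

def ttsgd_linear(entities, labels, lam=0.1, n_iters=1000, seed=0):
    """Sketch of temporal transductive SGD for a linear kernel, phi(x) = x."""
    rng = np.random.default_rng(seed)
    w = np.zeros(entities[0].shape[1])                # step 305
    for s in range(1, n_iters + 1):                   # step 310
        i = rng.integers(len(entities))               # step 315: random pick
        t = rng.integers(len(entities[i]))
        x_t, y_t = entities[i][t], labels[i][t]
        eta = 1.0 / (lam * s)                         # step 320
        grad = lam * w                                # step 325: R component
        # Steps 330-340: chronological component (A) - penalize a predicted
        # recovery between consecutive observations of the same entity.
        if t + 1 < len(entities[i]):
            x_next = entities[i][t + 1]
            if w @ x_next < w @ x_t:
                grad -= x_next - x_t
        # Steps 345-355: labeled component (B) - hinge loss on known labels.
        if y_t != 0 and y_t * (w @ x_t) < 1.0:
            grad -= y_t * x_t
        # Steps 360-370: unlabeled component (C) - margin push on unknowns.
        if y_t == 0 and abs(w @ x_t) < 1.0:
            grad -= np.sign(w @ x_t) * x_t
        w -= eta * grad                               # step 375
    return w
```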
[0041] Irrespective of the round in which ∇_k(w_k; (x_k, y_k)) is first used for the SGD
update, its contribution is weighed by the factor 1/(λs) in the s-th iteration. Also,
each time (x_it, y_it) leads to an update (if a corresponding indicator is true), it
contributes y_it·φ(x_it), sign(⟨w_k, φ(x_it)⟩)·φ(x_it), or (φ(x_it+1) − φ(x_it)),
respectively, for the three components of the loss. Thus, after s rounds, if (x_it, y_it)
has resulted in updates l_it, u_it, and c_it times (for the three components of the
loss - labeled (l_it), unlabeled (u_it), and chronological (c_it)), then cumulatively,
w_s+1 can be summarized in terms of the number of times each observation (x_it, y_it)
contributes to the update. This can be expressed as shown in Equation 4:

w_s+1 = (1/(λs))·Σ_i,t [l_it·y_it·φ(x_it) + u_it·sign(⟨w, φ(x_it)⟩)·φ(x_it) + c_it·(φ(x_it+1) − φ(x_it))]     (EQ. 4)
[0042] In accordance with embodiments, by substitution using EQ. 4, the predictive model
of EQ. 1 can be written as Equation 5:

⟨w_s+1, φ(x)⟩ = (1/(λs))·Σ_i,t [l_it·y_it·K(x_it, x) + u_it·sign(⟨w, φ(x_it)⟩)·K(x_it, x) + c_it·(K(x_it+1, x) − K(x_it, x))]     (EQ. 5)

[0043] In EQ. 5, the kernel representation K(a, b) = ⟨φ(a), φ(b)⟩ is used. With the resulting
predictive model being inner products on the feature transforms φ, a complex Mercer
kernel for non-linear transforms of the input feature space can be used. Such an embodiment
is presented in Algorithm II. Note that Algorithm II still optimizes for the primal,
but due to the nature of the sub-gradient, kernel products can be used to optimize
for the primal. In this implementation, w_s is never explicitly calculated; rather,
⟨w_s, φ(x_it)⟩ is estimated as ŷ_it using EQ. 5.

(Algorithm II)
[0044] In accordance with embodiments, the convergence rate of the optimization algorithm
is independent of the dataset size, labeled as well as unlabeled, but rather depends
on the regularization parameter λ and the desired accuracy ε.
[0045] Figures 4A-4B depict a flowchart for temporal transductive stochastic gradient descent
(TTSGD) process 400 in accordance with embodiments. TTSGD process 400 implements Algorithm
II using a complex Mercer kernel to enable non-linear models. Initially the weight
vector (w) is set to zero, step 405. Counters representing labeled (l), unlabeled
(u), and chronological (c) loss are set to zero, step 410. For each member of the
sensor dataset, step 415, a loop is entered to determine gradient update components.
If there are no further dataset members, process 400 ends.
[0046] A sensor observation is chosen, step 420, using a random distribution to select the
particular observation. A determination is made, step 425, as to whether a prediction
using the current weight vector is chronologically incorrect. This step compares a
chosen observation having an unhealthy status with the next sensor data for that monitored
entity. Should a prediction based on the next sensor data indicate a healthy status,
an error is flagged - recall that a basis condition is set that assumes monitored
entities go from healthy to unhealthy states (not unhealthy to healthy states).
[0047] If at step 425 a determination is made that the current weight vector is chronologically
incorrect, the chronological counter c is incremented, step 430. If chronologically
correct, process 400 continues to step 435.
[0048] A determination is made, step 435, as to whether the true health status of the monitored
entity is known (either healthy or unhealthy). If the true health status is known,
a determination is made as to whether the predicted health of the monitored entity
matches the known health status of the monitored entity. If the prediction does match
the known health status, process 400 continues to step 460. If the predicted and known
health statuses do not match, the label counter l is incremented, step 445.
[0049] If at step 435 the true health status of the monitored entity is not known, a determination
is made regarding the confidence of the predicted health status, step 450. If the
confidence in the accuracy of the predicted health status is low, then the unlabeled counter
u is incremented, step 455. If the confidence in the predicted health status is not
low, then process 400 continues to step 460. At step 460, the weight vector (w_s) for
the particular member of the dataset is updated using counters l, c, and u as
described by EQ. 5.
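By way of illustration only, a minimal sketch of process 400 follows, assuming a Gaussian (RBF) kernel as the complex Mercer kernel. For simplicity, the per-observation counters are folded into signed coefficients (equivalent for prediction via EQ. 5), the 1/(λs) scaling is omitted (it does not affect the sign of the prediction), and the confidence test is an assumed threshold:

```python
import numpy as np
from collections import defaultdict

def rbf(a, b, gamma=0.5):
    # Gaussian (RBF) kernel - one choice of complex Mercer kernel.
    return float(np.exp(-gamma * np.sum((a - b) ** 2)))

def ttsgd_kernel(entities, labels, n_iters=500, conf=1.0, seed=0):
    """Sketch of kernelized TTSGD: w is never formed explicitly (cf. EQ. 5)."""
    rng = np.random.default_rng(seed)
    alpha = defaultdict(float)          # signed contribution per observation

    def score(x):
        # Estimate <w, phi(x)> through kernel products with past updaters.
        return sum(a * rbf(entities[i][t], x) for (i, t), a in alpha.items())

    for _ in range(n_iters):
        i = rng.integers(len(entities))             # step 420: random pick
        t = rng.integers(len(entities[i]))
        x_t, y_t = entities[i][t], labels[i][t]
        # Steps 425-430: chronological update - a predicted "recovery".
        if t + 1 < len(entities[i]) and score(entities[i][t + 1]) < score(x_t):
            alpha[(i, t + 1)] += 1.0
            alpha[(i, t)] -= 1.0
        # Steps 435-445: labeled update - misclassified known status.
        if y_t != 0 and y_t * score(x_t) < 1.0:
            alpha[(i, t)] += y_t
        # Steps 450-455: unlabeled update - low-confidence prediction.
        elif y_t == 0 and abs(score(x_t)) < conf:
            alpha[(i, t)] += np.sign(score(x_t))
    return lambda x: np.sign(score(x))              # final classifier
```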
[0050] For the TTSGD embodiment of process 300, there is a high probability of finding an
ε-accurate solution in O(1/(λε)) iterations. Just like the conventional PEGASOS algorithm,
the number of iterations is independent of the number of examples (labeled or unlabeled),
but rather depends on the regularization and the desired accuracy. For the TTSGD-Kernel
embodiment of process 400, the runtime can depend on the number of observations,
due to the min(s, |L| + |U| + |C|) kernel evaluations at iteration s, bringing the
overall runtime to O((|L| + |U| + |C|)/(λε)). The bounds derived above are for the average
hypothesis, but in practice the performance of the final hypothesis can be better.
[0051] Algorithm III is pseudocode for a data generation process. The terms N, T, p, and a
are predefined constants, where: N is the total number of instances to generate, T is a
typical length of a series, p is the fraction of instances that will change state, and a
is the fraction of the time-series that denotes change. C_-/+ represents an input binary
classification dataset.

(Algorithm III)
[0052] Intuitively, the time series of each entity can start from observations from one
class (healthy), and then for a select group of entities change over to the other
class (unhealthy). Generating datasets in this manner enables the accurate identification
of changepoints, since ground truth about change is available for evaluation. Algorithm
III describes the process of generating the state-change dataset given any binary
classification problem.
[0053] Algorithm III is governed by four parameters: (1) the number of entities to generate
(N); (2) the length of the time series for entity i (modeled as a random variable drawn
from a Poisson distribution with expected value T); (3) the fraction of instances that
undergo change (modeling the likelihood that a certain entity can undergo change as a
Bernoulli distribution with success probability p); and (4) the time of change (modeled
as a Poisson distribution with expected value aT_i for entity i), where the factor a
can be thought of as roughly the fraction of the time series of entity i that has changed state.
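By way of illustration only, the following is a minimal sketch of such a generator, assuming the input binary classification dataset is supplied as two 2-D arrays (C_neg for the healthy class, C_pos for the unhealthy class); clamping is added so that every changing series contains at least one observation of each phase:

```python
import numpy as np

def generate_state_change_data(C_neg, C_pos, N, T, p, a, seed=0):
    """Sketch of Algorithm III: build N state-change series from a binary
    classification dataset, with ground-truth change-points for evaluation."""
    rng = np.random.default_rng(seed)
    entities, labels, change_points = [], [], []
    for _ in range(N):
        T_i = max(2, rng.poisson(T))        # series length ~ Poisson(T)
        if rng.random() < p:                # Bernoulli(p): does it change?
            # Time of change ~ Poisson(a * T_i), clamped inside the series.
            t_c = int(min(max(1, rng.poisson(a * T_i)), T_i - 1))
        else:
            t_c = T_i                       # stays healthy throughout
        healthy = C_neg[rng.integers(len(C_neg), size=t_c)]
        unhealthy = C_pos[rng.integers(len(C_pos), size=T_i - t_c)]
        entities.append(np.vstack([healthy, unhealthy]))
        # Ground-truth labels; intermediate labels are masked to 0 (unknown)
        # before training, per the transductive problem setting.
        labels.append(np.r_[-np.ones(t_c), np.ones(T_i - t_c)].astype(int))
        change_points.append(t_c if t_c < T_i else None)
    return entities, labels, change_points
```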
[0055] In addition to these datasets from the life-sciences community, datasets that have
been used for comparing SVM classifiers were also utilized - namely, the Adult,
MNIST, and USPS datasets used in the comparison of scalable SVM algorithms. These are
not life-sciences datasets, but are generally applicable as data that changes from
one class to another over time, as enabled by Algorithm III.
[0056] Table I describes the characteristics of the datasets. In the experiments, T = 10,
a = 0.5, p = 0.5, and N was chosen to generate a wide variety of data. Note that the
amount of data being classified by the classifier is approximately of the order of
NT.
Table I
Dataset        | # observations | # features | N
Diabetes       | 768            | 8          | 75
Parkinson      | 195            | 22         | 20
Ovarian Cancer | 1200           | 5          | 10
Adult          | 4884           | 123        | 100
MNIST          | 70000          | 780        | 100
USPS           | 9298           | 256        | 100
[0057] Figure 5 depicts comparison 500 illustrating the accuracy of various conventional
approaches against an embodying MMTT approach in accordance with embodiments, where the
number of known unhealthy entities is increased. The approaches for this comparison
are stratified 540, SVMlight 530, SVM 520, and MMTT 510. The particular dataset of
FIG. 5 is the Diabetes dataset; however, the results shown are representative of the
performance for the other datasets (Parkinson, Ovarian Cancer, Adult, MNIST, and USPS).
[0058] The accuracy is expressed as a percentage (normalized to 1.0) and is plotted against
an increasing number of known unhealthy entities. Increasing the number of known unhealthy
entities is akin to the task of attempting to classify all monitored entities given
a disease outbreak, wherein information about a few unhealthy entities becomes available
at a time. The accuracy is reported on being able to classify all entities at the
final diagnoses, as well as all the intermediate predictions leading to the final
state. It can be observed that MMTT 510 outperforms all the other approaches, with
minimal supervision.
[0059] Barring the USPS dataset (not shown), the initial accuracy of MMTT is significantly
superior. The particularly weak performance of SVMlight (the conventional transductive
baseline) might be attributable to its need to maintain a balance of class proportions
in the labeled and unlabeled sets. This assumption may not hold in other real-world
problem settings. The negative impact of this assumption is more pronounced in the
straw-man baseline, the stratified classifier, which, by definition, randomly assigns
labels according to the class proportions in the labeled set. Its performance worsens as
more unhealthy instances are added to the labeled set, thereby skewing the class proportions.
[0060] Root-cause analysis is another use case where final diagnoses of many/all entities
are available and the goal is to classify observations along the time-series for time-of-change
identification, enabling investigation of external stimuli concurrent with time of
change. Figure 6 depicts comparison 600 illustrating the accuracy of various conventional
approaches with an embodying MMTT approach in accordance with embodiments. The approaches
for this comparison are stratified 640, SVMlight 630, SVM 620, and MMTT 610. FIG.
6 presents the trends of accuracy on the various datasets when there is knowledge
of final diagnoses of entities - e.g., either a final diagnosis of healthy or unhealthy.
[0061] The trends look similar to those in FIG. 5, albeit the overall performance for all
approaches has improved. Even with knowledge of all final diagnoses, the performance
of conventional transduction is sub-par compared to even the induction-based simple
SVM. It is possible that the implicit strategy of attempting to maintain class proportions
across labeled and unlabeled sets leads to poorer performance.
[0062] Limited supervision makes parameter tuning particularly challenging. The conventional
approach of using cross-validation is unlikely to be applicable. One of the important
use-cases for this kind of approach is the detection of problems that have recently
cropped up, such as a disease, equipment malfunction, or otherwise. If there are only
a few (e.g., perhaps one, two, or three) cases of known problems, mechanisms for arriving
at the right parameters through methods that rely on estimating the generalization
error through cross-validation or a similar setup are infeasible.
[0063] Based on the chronological problem setting, embodying systems and methods offer a
remedy for parameter tuning. This being a transductive problem setting, interest resides
in the performance on the unlabeled set, not the generalization error. It is well known
that estimating this error using the labeled subset is untenable and might lead
to overfitting the labeled set. In this particular problem setting, embodying systems
and methods utilize the fact that health status only transitions from healthy to unhealthy.
In accordance with embodiments, a chronological error l_chrono is defined. This chronological
error is estimated only on the unlabeled subset, as the fraction of violated chronological
constraints. This estimate can be used in conjunction with the accuracy on the labeled
set, l_labeled, to arrive at an estimate of eventual performance on the unlabeled set:

(EQ. 6: a linear combination of the labeled-set accuracy l_labeled and the fraction of satisfied chronological constraints on the unlabeled set, 1 − l_chrono)
[0064] Figure 7 depicts comparison 700 of strategies to estimate performance on an unlabeled
set for tuning the model parameter in accordance with embodiments. The x-axis represents
the parameter being tuned for better performance. The y-axis represents the fraction
of correctly classified observations as an accuracy normalized to unity. As shown
in comparison 700, for any value of a parameter the estimated score obtained by embodying
systems and methods better aligns with the true score, as compared to the score that
is derived from the labeled dataset only.
[0065] The comparison of FIG. 7 relies on estimates of unlabeled loss from EQ. 6, with true
loss 720, estimated loss 710, and a score based on the labeled set (labeled set score 730).
The estimated unlabeled score tracks the true score better than labeled set score 730,
which is likely to overfit the labeled set. It is crucial that the comparison uses the
estimate of EQ. 6, rather than just the chronological loss, because a perfect chronological
score can be achieved simply by labeling all instances as belonging to any one class. The
linear combination with the labeled score penalizes such scenarios and achieves a more
balanced score. The score on the labeled set alone is not good for choosing the right γ,
though, and might lead to sub-optimal performance. Thus, EQ. 6 appears to be a valid
surrogate performance metric for tuning parameters that are likely to achieve better
performance on the unlabeled set.
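By way of illustration only, the following sketch computes such a surrogate from predicted and true label sequences, assuming an equally weighted linear combination; the actual weighting of EQ. 6 may differ:

```python
import numpy as np

def surrogate_score(pred, labels, weight=0.5):
    """Sketch of the EQ. 6 surrogate: labeled-set accuracy combined with the
    fraction of satisfied chronological constraints on the unlabeled set.
    The mixing coefficient `weight` is an assumption, not taken from EQ. 6."""
    acc_num = acc_den = 0          # labeled accuracy terms (l_labeled)
    ok_num = ok_den = 0            # chronological consistency terms
    for y_hat, y in zip(pred, labels):
        for t in range(len(y)):
            if y[t] != 0:                          # labeled observation
                acc_den += 1
                acc_num += int(y_hat[t] == y[t])
            elif t + 1 < len(y):                   # unlabeled, has successor
                ok_den += 1
                # Violated if the prediction "recovers": +1 followed by -1.
                ok_num += int(not (y_hat[t] == 1 and y_hat[t + 1] == -1))
    l_labeled = acc_num / max(acc_den, 1)
    chrono_ok = ok_num / max(ok_den, 1)            # equals 1 - l_chrono
    return weight * l_labeled + (1 - weight) * chrono_ok
```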
[0066] In accordance with some embodiments, a computer program application stored in non-volatile
memory or computer-readable medium (e.g., register memory, processor cache, RAM, ROM,
hard drive, flash memory, CD ROM, magnetic media, etc.) may include code or executable
instructions that when executed may instruct and/or cause a controller or processor
to perform methods discussed herein such as a method for max-margin temporal transduction
SVM, as disclosed above.
[0067] The computer-readable medium may be a non-transitory computer-readable media including
all forms and types of memory and all computer-readable media except for a transitory,
propagating signal. In one implementation, the non-volatile memory or computer-readable
medium may be external memory.
[0068] Although specific hardware and methods have been described herein, note that any
number of other configurations may be provided in accordance with embodiments of the
invention. Thus, while there have been shown, described, and pointed out fundamental
novel features of the invention, it will be understood that various omissions, substitutions,
and changes in the form and details of the illustrated embodiments, and in their operation,
may be made by those skilled in the art without departing from the spirit and scope
of the invention. Substitutions of elements from one embodiment to another are also
fully intended and contemplated. The invention is defined solely with regard to the
claims appended hereto, and equivalents of the recitations therein.
[0069] Various aspects and embodiments of the present invention are defined by the following
numbered clauses:
- 1. A method of detecting status changes, and a corresponding point-in-time, in monitored
entities, the method comprising:
receiving one or more elements of time-series data from one or more sensors, the elements
of time-series data representing an operational state of the monitored entity;
creating a predictive model from the time-series data in a datastore memory;
applying a transduction classifier to the predictive model;
the transduction classifier detecting a change from healthy to unhealthy in the time-series
data, and the corresponding point-in-time when the change occurred; and
providing an identification of the change in the time-series data and the corresponding
point-in-time.
- 2. The method of clause 1, the transduction classifier being a maximum margin classifier
having a support vector machine component and a temporal transductive component.
- 3. The method of any preceding clause, the maximum margin classifier incorporating
a constraint on event chronology, the constraint penalizing predictions that change
from unhealthy to healthy status.
- 4. The method of any preceding clause, the maximum margin classifier implementing
a stochastic gradient descent.
- 5. The method of any preceding clause, the maximum margin classifier detecting the
change in the time-series data by analyzing one element of the time-series data and
its next subsequent element.
- 6. The method of any preceding clause, applying a maximum margin classifier to the
predictive model including:
zeroing a weight vector;
for each element of the time-series data:
selecting an element of the time-series data;
updating a learning rate;
updating a gradient from a regularization loss component;
if a member of the predictive model corresponding to the selected element is chronologically
incorrect then generating a value for a first gradient update component, else setting
the first gradient update component to zero;
if a health status indicated by the corresponding predictive model member matches
a known health status of the selected element, then setting a second gradient update
component to zero, else generating a value for the second gradient update component;
if a true condition of the health status of the selected element is known, then setting
a third gradient update component to zero, else generating a value for the third gradient
update component; and
updating a weight vector for the selected element by the values of the first, the
second, and the third gradient update components and the regularization loss component.
- 7. A non-transitory computer readable medium containing computer-readable instructions
stored therein for causing a control processor to perform a method of detecting status
changes, and a corresponding point-in-time, in monitored entities, the method comprising:
receiving one or more elements of time-series data from one or more sensors, the elements
of time-series data representing an operational state of the monitored entity;
creating a predictive model from the time-series data in a datastore memory;
applying a transduction classifier to the predictive model;
the transduction classifier detecting a change from healthy to unhealthy in the time-series
data, and the corresponding point-in-time when the change occurred; and
providing an identification of the change in the time-series data and the corresponding
point-in-time.
- 8. The non-transitory computer readable medium of any preceding clause, containing
computer-readable instructions stored therein to cause the control processor to apply
the transduction classifier as a maximum margin classifier having a support vector
machine component and a temporal transductive component.
- 9. The non-transitory computer readable medium of any preceding clause, containing
computer-readable instructions stored therein to cause the control processor to perform
the method, including the maximum margin classifier incorporating a constraint on
event chronology, the constraint penalizing time-series data that changes from unhealthy
to healthy status.
- 10. The non-transitory computer readable medium of any preceding clause, containing
computer-readable instructions stored therein to cause the control processor to perform
the method, including the maximum margin classifier implementing a stochastic gradient
descent.
- 11. The non-transitory computer readable medium of any preceding clause, containing
computer-readable instructions stored therein to cause the control processor to perform
the method, including the maximum margin classifier detecting the change in the time-series
data by analyzing one element of the time-series data and its next subsequent element.
- 12. The non-transitory computer readable medium of any preceding clause, containing
computer-readable instructions stored therein to cause the control processor to perform
the method, including:
zeroing a weight vector;
for each element of the time-series data:
selecting an element of the time-series data;
updating a learning rate;
updating a gradient from a regularization loss component;
if a member of the predictive model corresponding to the selected element is chronologically
incorrect then generating a value for a first gradient update component, else setting
the first gradient update component to zero;
if a health status indicated by the corresponding predictive model member matches
a known health status of the selected element, then setting a second gradient update
component to zero, else generating a value for the second gradient update component;
if a true condition of the health status of the selected element is known, then setting
a third gradient update component to zero, else generating a value for the third gradient
update component; and
updating a weight vector for the selected element by the values of the first, the
second, and the third gradient update components and the regularization loss component.
- 13. A method of detecting status changes, and a corresponding point-in-time, in monitored
entities, the method comprising:
receiving one or more elements of time-series data from one or more sensors, the elements
of time-series data representing an operational state of the monitored entity;
creating a predictive model from the time-series data in a datastore memory;
applying a transduction classifier to the predictive model;
the transduction classifier detecting a change from healthy to unhealthy in the time-series
data, and the corresponding point-in-time when the change occurred;
if a prediction based on a current weight vector for a selected element is chronologically
incorrect, then incrementing a chronological counter;
if a true health status of the selected element is known:
then if a predicted health status does not match the true health status, then incrementing
a label counter;
else, if the true health status of the selected element is unknown:
then if a confidence in the predicted health status is low, then incrementing an unlabel
counter; and
updating the weight vector using the chronological, label, and unlabel counters.
- 14. A system for detecting status changes, and a corresponding point-in-time, in monitored
entities, the system comprising:
a server having a control processor and a transductive learning unit, the transductive
learning unit including a max-margin temporal classifier unit;
the server in communication with at least one computing device across an electronic
network, the at least one computing device having a memory with time-series data;
the server configured to receive the time-series data;
the control processor configured to execute executable instructions that cause the
control processor to perform a method of:
receiving one or more elements of time-series data from one or more sensors, the elements
of time-series data representing an operational state of the monitored entity;
creating a predictive model from the time-series data in a datastore memory;
applying a transductive classifier to the predictive model;
the transductive classifier detecting a change from healthy to unhealthy in the time-series
data, and the corresponding point-in-time when the change occurred; and
providing an identification of the change in the time-series data and the corresponding
point-in-time.
- 15. The server of any preceding clause, the executable instructions including instructions
to cause the control processor to apply the transduction classifier as a maximum margin
classifier having a support vector machine component and a temporal transductive component.
- 16. The server of any preceding clause, the executable instructions including instructions
to cause the control processor to perform the method including the maximum margin
classifier incorporating a constraint on event chronology, the constraint penalizing
time-series data that changes from unhealthy to healthy status.
- 17. The server of any preceding clause, the executable instructions including instructions
to cause the control processor to perform the method including the maximum margin
classifier implementing a stochastic gradient descent.
- 18. The server of any preceding clause, the executable instructions including instructions
to cause the control processor to perform the method including the maximum margin
classifier detecting the change in the time-series data by analyzing one element of
the time-series data and its next subsequent element.
- 19. The server of any preceding clause, the executable instructions including instructions
to cause the control processor to perform the method including:
zeroing a weight vector;
for each element of the time-series data:
selecting an element of the time-series data;
updating a learning rate;
updating a gradient from a regularization loss component;
if a member of the predictive model corresponding to the selected element is chronologically
incorrect then generating a value for a first gradient update component, else setting
the first gradient update component to zero;
if a health status indicated by the corresponding predictive model member matches
a known health status of the selected element, then setting a second gradient update
component to zero, else generating a value for the second gradient update component;
if a true condition of the health status of the selected element is known, then setting
a third gradient update component to zero, else generating a value for the third gradient
update component; and
updating a weight vector for the selected element by the values of the first, the
second, and the third gradient update components and the regularization loss component.
- 20. The server of any preceding clause, the executable instructions including instructions
to cause the control processor to perform the method including flagging an error if
a prediction based on a subsequent element of the time-series data indicates a healthy
state after a preceding element indicated an unhealthy state.
1. A method of detecting status changes, and a corresponding point-in-time, in monitored
entities (ME1, ME2, MEn), the method comprising:
receiving one or more elements of time-series data (233, 236) from one or more sensors
(204, 214), the elements of time-series data representing an operational state of
the monitored entity;
creating a predictive model (238) from the time-series data in a datastore memory
(230);
applying a transduction classifier (229) to the predictive model;
the transduction classifier detecting a change from healthy to unhealthy in the time-series
data, and the corresponding point-in-time when the change occurred; and
providing an identification of the change in the time-series data and the corresponding
point-in-time.
2. The method of claim 1, the transduction classifier being a maximum margin classifier
having a support vector machine component and a temporal transductive component.
3. The method of claim 2, the maximum margin classifier incorporating a constraint on
event chronology, the constraint penalizing predictions that change from unhealthy
to healthy status.
4. The method of either of claim 2 or 3, the maximum margin classifier implementing a
stochastic gradient descent.
5. The method of any of claims 2 to 4, the maximum margin classifier detecting the change
in the time-series data by analyzing one element of the time-series data and its next
subsequent element.
6. The method of any of claims 2 to 5, applying a maximum margin classifier to the predictive
model including:
zeroing a weight vector;
for each element of the time-series data:
selecting an element of the time-series data;
updating a learning rate;
updating a gradient from a regularization loss component;
if a member of the predictive model corresponding to the selected element is chronologically
incorrect then generating a value for a first gradient update component, else setting
the first gradient update component to zero;
if a health status indicated by the corresponding predictive model member matches
a known health status of the selected element, then setting a second gradient update
component to zero, else generating a value for the second gradient update component;
if a true condition of the health status of the selected element is known, then setting
a third gradient update component to zero, else generating a value for the third gradient
update component; and
updating a weight vector for the selected element by the values of the first, the
second, and the third gradient update components and the regularization loss component.
7. A method of detecting status changes, and a corresponding point-in-time, in monitored
entities, the method comprising:
receiving one or more elements of time-series data from one or more sensors, the elements
of time-series data representing an operational state of the monitored entity;
creating a predictive model from the time-series data in a datastore memory;
applying a transduction classifier to the predictive model;
the transduction classifier detecting a change from healthy to unhealthy in the time-series
data, and the corresponding point-in-time when the change occurred;
if a prediction based on a current weight vector for a selected element is chronologically
incorrect, then incrementing a chronological counter;
if a true health status of the selected element is known:
then if a predicted health status does not match the true health status, then incrementing
a label counter;
else, if the true health status of the selected element is unknown:
then if a confidence in the predicted health status is low, then incrementing an unlabel
counter; and
updating the weight vector using the chronological, label, and unlabel counters.
8. A system (200) for detecting status changes, and a corresponding point-in-time, in
monitored entities (ME1, ME2, MEn), the system comprising:
a server (220) having a control processor (222) and a transductive learning unit (228),
the transductive learning unit including a max-margin temporal classifier unit (229);
the server in communication with at least one computing device (206, 215) across an
electronic network (240), the at least one computing device having a memory (209,
218) with time-series data (210, 219);
the server configured to receive the time-series data;
the control processor configured to execute executable instructions (231) that cause
the control processor to perform a method of:
receiving one or more elements of time-series data from one or more sensors, the elements
of time-series data representing an operational state of the monitored entity;
creating a predictive model (238) from the time-series data in a datastore memory;
applying a transductive classifier (228) to the predictive model;
the transductive classifier detecting a change from healthy to unhealthy in the time-series
data, and the corresponding point-in-time when the change occurred; and
providing an identification of the change in the time-series data and the corresponding
point-in-time.
9. The server of claim 8, the executable instructions including instructions to cause
the control processor to apply the transduction classifier as a maximum margin classifier
(229) having a support vector machine component and a temporal transductive component.
10. The server of claim 9, the executable instructions including instructions to cause
the control processor to perform the method including the maximum margin classifier
incorporating a constraint on event chronology, the constraint penalizing time-series
data that changes from unhealthy to healthy status.
11. The server of either of claim 9 or 10, the executable instructions including instructions
to cause the control processor to perform the method including the maximum margin
classifier implementing a stochastic gradient descent.
12. The server of claim 10 or any claim dependent thereon, the executable instructions
including instructions to cause the control processor to perform the method including
the maximum margin classifier detecting the change in the time-series data by analyzing
one element of the time-series data and its next subsequent element.
13. The server of any of claims 8 to 12, the executable instructions including instructions
to cause the control processor to perform the method including:
zeroing a weight vector;
for each element of the time-series data:
selecting an element of the time-series data;
updating a learning rate;
updating a gradient from a regularization loss component;
if a member of the predictive model corresponding to the selected element is chronologically
incorrect then generating a value for a first gradient update component, else setting
the first gradient update component to zero;
if a health status indicated by the corresponding predictive model member matches
a known health status of the selected element, then setting a second gradient update
component to zero, else generating a value for the second gradient update component;
if a true condition of the health status of the selected element is known, then setting
a third gradient update component to zero, else generating a value for the third gradient
update component; and
updating a weight vector for the selected element by the values of the first, the
second, and the third gradient update components and the regularization loss component.
14. The server of claim 9 or any claim dependent thereon, the executable instructions
including instructions to cause the control processor to perform the method including
flagging an error if a prediction based on a subsequent element of the time-series
data indicates a healthy state after a preceding element indicated an unhealthy state.