12 State-Space Models
State-space models are suited to analyze time-series data. In chapter 8, we saw that it is possible to model time series using regression methods (e.g., see figure 8.30). Regression is suited for modeling time series displaying a stationary or trend-stationary baseline on which recurrent patterns are added, as displayed in figures 12.1a–b. In the case of a time series consisting of nonstationary processes such as the one in figure 12.1c, regression methods typically have a limited predictive capacity when extrapolating beyond the cases covered in the training set.

Note: A time series or process is said to be stationary when its statistical properties, such as its mean and variance, are constant through time, as illustrated in figure 12.1d. A time series is said to be trend-stationary if the trend (linear or not) can be removed from it.
Figure 12.1: The concept of time-series stationarity. (a) Stationary baseline; (b) trend-stationary baseline and periodic pattern; (c) nonstationary process; (d) stationary process.
Figure 12.2: A cannon-man trajectory can be described using the kinematics equations
$$\begin{aligned}
x_{t+1} &= x_t + \dot{x}_t\,\Delta t + \tfrac{1}{2}\ddot{x}_t\,\Delta t^2 + w\\
\dot{x}_{t+1} &= \dot{x}_t + \ddot{x}_t\,\Delta t + \dot{w}\\
\ddot{x}_{t+1} &= \ddot{x}_t + \ddot{w},
\end{aligned}$$
where $\mathbf{w} = [w\ \dot{w}\ \ddot{w}]^{\top}$ is a perturbation in the trajectory.
State-space models (SSM) analyze time series in a different way than regression models do. Regression is used in a supervised setup in order to describe the dependence between system responses and covariates, whereas in the case of time series, the covariates are related to time. State-space models, on the other hand, bear that name because they use a dynamic model to describe the dependence between the hidden states of a system at subsequent time steps. Take, for example, the task illustrated in figure 12.2 of predicting the landing location of a projectile; using kinematic equations, we can derive a linear model of the projectile position at time $t_0 + \Delta t$, from the initial position $x_0$, the initial velocity $\dot{x}_0$, and the exit angle $\theta_0$. This task is not well suited for regression methods, yet it can easily be formulated as an SSM because there is a kinematic model available for describing the dependence between the hidden states at a time $t$ and $t + \Delta t$. An SSM is a probabilistic method where, given a model's structure, the task is to learn how to predict the hidden-state variables $\mathbf{x}_{t+1}$ given the current knowledge for $\mathbf{x}_t$. State-space models can be classified as unsupervised learning because they build a joint probability density for a set of hidden-state variables while most of them are typically never observed.
In the context of civil engineering we can, for example, use SSMs for modeling the degradation of infrastructure over time. As illustrated in figure 12.3, we employ the state variable $x$ to describe the condition of a structure. Starting from a known initial condition $x_0$, we may develop a dynamic model $f(x_1|x_0)$, similar to the one in the projectile example, in order to describe the kinematic process behind degradation. Because our model will inevitably be an approximation of the real degradation kinematics, we recursively update our predictions using observations $y_t$ as they become available. The theory presented in this chapter allows combining probabilistically the information contained in the prior knowledge $f(x_{t-1})$, the model $f(x_t|x_{t-1})$, and the observation $y_t$, in order to obtain a posterior estimate $f(x_t|y_1, \cdots, y_t)$. The procedure is employed recursively to go from a time step $t-1$ to $t$.
Figure 12.3: Example of state-space model where the state $x_t$ at a time $t$ is estimated from the combination of an observation $y_t$, a dynamic model describing the conditional probability $f(x_t|x_{t-1})$, and the prior knowledge for $x_{t-1}$. The diagram depicts, at each time step, the initial state, the predicted state, the updated state, and the observation.
SSMs allow performing several tasks such as removing (i.e., filtering) the observation errors, forecasting future states, and estimating hidden states that are not directly observable. For example, in the infrastructure degradation example, even if only the condition $x_t$ is observed, an SSM allows estimating two additional hidden states: the degradation speed $\dot{x}_t$ and its acceleration $\ddot{x}_t$. State-space models take their origin from the field of control theory, where the breakthrough application of the method was the guiding system for the Apollo missions in the 1960s (see figure 12.4).
Figure 12.4: Apollo mission service module.
(Photo: NASA)
12.1 Linear Gaussian State-Space Models
We can describe the mathematical formulation for SSMs using the same theory we covered in §3.4.1, for the linear functions of random variables, and in §4.1.3, where we explored the analytic formulation of the multivariate Normal conditional probability density function (PDF). Let us consider a linear model $\mathbf{Y} = \mathbf{A}\mathbf{X} + \mathbf{b}$, where $\mathbf{X} \sim \mathcal{N}(\mathbf{x}; \boldsymbol{\mu}_X, \boldsymbol{\Sigma}_X)$, and a vector of observations $\mathbf{y}$. We saw in §3.4.1 that the output for linear functions of Normal random variables is also Normal,

$$\left.\begin{aligned}
\boldsymbol{\mu}_Y &= \mathbf{A}\boldsymbol{\mu}_X + \mathbf{b}\\
\boldsymbol{\Sigma}_Y &= \mathbf{A}\boldsymbol{\Sigma}_X\mathbf{A}^{\top}\\
\boldsymbol{\Sigma}_{XY} &= \boldsymbol{\Sigma}_X\mathbf{A}^{\top}
\end{aligned}\right\}\quad
\begin{bmatrix}\mathbf{X}\\ \mathbf{Y}\end{bmatrix} \sim \mathcal{N}\!\left(\begin{bmatrix}\boldsymbol{\mu}_X\\ \boldsymbol{\mu}_Y\end{bmatrix},\ \begin{bmatrix}\boldsymbol{\Sigma}_X & \boldsymbol{\Sigma}_{XY}\\ \boldsymbol{\Sigma}_{XY}^{\top} & \boldsymbol{\Sigma}_Y\end{bmatrix}\right).$$
In §4.1.3, we saw that the conditional PDF of $\mathbf{X}$ given the observations $\mathbf{y}$ is obtained by dividing the joint PDF of $\mathbf{X}$ and $\mathbf{Y}$ by the PDF of $\mathbf{Y}$ evaluated for observations $\mathbf{y}$,

$$\underbrace{f_{X|y}(\mathbf{x}|\mathbf{Y}=\underbrace{\mathbf{y}}_{\text{observations}})}_{\text{posterior knowledge}} = \frac{\overbrace{f_{XY}(\mathbf{x},\mathbf{y})}^{\text{prior knowledge}}}{f_Y(\mathbf{y})} = \mathcal{N}(\mathbf{x}; \boldsymbol{\mu}_{X|y}, \boldsymbol{\Sigma}_{X|y}),$$
where the conditional mean vector and covariance matrix are

$$\begin{aligned}
\boldsymbol{\mu}_{X|y} &= \boldsymbol{\mu}_X + \boldsymbol{\Sigma}_{XY}\boldsymbol{\Sigma}_Y^{-1}(\mathbf{y} - \boldsymbol{\mu}_Y)\\
\boldsymbol{\Sigma}_{X|y} &= \boldsymbol{\Sigma}_X - \boldsymbol{\Sigma}_{XY}\boldsymbol{\Sigma}_Y^{-1}\boldsymbol{\Sigma}_{XY}^{\top}.
\end{aligned}$$
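These two operations, propagation through a linear function followed by Gaussian conditioning, are the building blocks used throughout this chapter. The following is a minimal numerical sketch (Python/NumPy) with arbitrary illustrative values for $\boldsymbol{\mu}_X$, $\boldsymbol{\Sigma}_X$, $\mathbf{A}$, $\mathbf{b}$, and $\mathbf{y}$; it is not taken from the book's examples.

```python
import numpy as np

# Illustrative prior on X = [level, rate]^T (arbitrary values)
mu_X = np.array([10.0, -1.0])
Sigma_X = np.array([[4.0, 1.5],
                    [1.5, 2.25]])

# Linear function Y = A X + b: here Y is the first component of X
A = np.array([[1.0, 0.0]])
b = np.array([0.0])

# Propagation through the linear function (Section 3.4.1)
mu_Y = A @ mu_X + b          # E[Y]
Sigma_Y = A @ Sigma_X @ A.T  # cov(Y)
Sigma_XY = Sigma_X @ A.T     # cov(X, Y)

# Conditioning on an observed value y (Section 4.1.3)
y = np.array([4.8])
gain = Sigma_XY @ np.linalg.inv(Sigma_Y)
mu_X_given_y = mu_X + gain @ (y - mu_Y)
Sigma_X_given_y = Sigma_X - gain @ Sigma_XY.T

print(mu_X_given_y)     # both components are updated through their correlation
print(Sigma_X_given_y)  # the directly observed component's variance collapses to ~0
```

In the SSM context presented next, the observed quantity also includes an independent noise term, so its variance gains an extra contribution and the update no longer collapses to zero uncertainty.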
12.1.1 Basic Problem Setup
We employ a basic setup involving a single hidden-state variable in order to illustrate how the notions of linear functions of random variables and the multivariate Normal conditional PDF are behind the state-space model theory. We use $t \in \{0, 1, \cdots, T\}$ to describe discrete time stamps, $x_t$ to describe a hidden state of a system at a time $t$, and $y_t$ to describe an imperfect observation of $x_t$.

Note: A hidden state $x_t$ is a deterministic, yet unknown, quantity.
Observation and transition models  The relation between an observation and a hidden state is described by the observation model,

$$\underbrace{y_t = x_t + v_t}_{\text{observation model}}, \qquad \overbrace{v_t : V \sim \mathcal{N}(v; 0, \sigma_V^2)}^{\text{observation error}}.$$
We call the observation $y_t$ imperfect because what we observe is the true state contaminated by zero-mean Normal-distributed observation errors. $x_t$ is referred to as a hidden state because it is not directly observed; only $y_t$ is. The dynamic model describing the possible transitions between successive time steps from a state $x_{t-1}$ to $x_t$ is described by the transition model,

$$\underbrace{x_t = x_{t-1} + w_t}_{\text{transition model}}, \qquad \overbrace{w_t : W \sim \mathcal{N}(w; 0, \sigma_W^2)}^{\text{model error}}, \qquad (12.1)$$

where $w_t$ is a zero-mean Normal-distributed model error describing the discrepancy between the reality and our idealized transition model. The transition model in equation 12.1 is Markovian (see chapter 7) because a state $x_t$ only depends on the state $x_{t-1}$ and is thus independent of all previous states. This aspect is key in the computational efficiency of the method.
Notation
$$\begin{aligned}
\mathbf{y}_{1:t} &\equiv \{y_1, y_2, \cdots, y_t\}\\
\mu_{t|t} &\equiv \mathbb{E}[X_t|\mathbf{y}_{1:t}]\\
\mu_{t|t-1} &\equiv \mathbb{E}[X_t|\mathbf{y}_{1:t-1}]\\
\sigma^2_{t|t} &\equiv \mathrm{var}[X_t|\mathbf{y}_{1:t}]
\end{aligned}$$
Notation  In order to keep the notation simple, we opt for compact abbreviations; a set of observations $\{y_1, y_2, \cdots, y_t\}$ is described by $\mathbf{y}_{1:t}$, and the posterior conditional expected value for $X_t$ given $\mathbf{y}_{1:t}$, $\mathbb{E}[X_t|\mathbf{y}_{1:t}]$, is summarized by $\mu_{t|t}$. For the subscript $t|t$, the first index refers to $X_t$ and the second to $\mathbf{y}_{1:t}$. Following the same notation, $\mathbb{E}[X_t|\mathbf{y}_{1:t-1}]$ is equivalent to $\mu_{t|t-1}$. The posterior conditional variance for $X_t$ given $\mathbf{y}_{1:t}$, $\mathrm{var}[X_t|\mathbf{y}_{1:t}]$, is summarized by $\sigma^2_{t|t}$.
Prediction step  At time $t=0$, only our prior knowledge $\mathbb{E}[X_0] = \mu_0$ and $\mathrm{var}[X_0] = \sigma_0^2$ are available. (Note: An absence of prior knowledge for $X_0$ can be described by $\mathrm{var}[X_0] \to \infty$.) From our prior knowledge at $t=0$, the transition model $x_t = x_{t-1} + w_t$, and the observation model $y_t = x_t + v_t$, we can compute our joint prior knowledge at $t=1$ for $X_1$ and $Y_1$, $f(x_1, y_1) = \mathcal{N}([x_1\ y_1]^{\top}; \boldsymbol{\mu}, \boldsymbol{\Sigma})$, where the mean vector and covariance matrix are described by

$$\boldsymbol{\mu} = \begin{bmatrix}\mu_X\\ \mu_Y\end{bmatrix} = \begin{bmatrix}\mathbb{E}[X_1]\\ \mathbb{E}[Y_1]\end{bmatrix} \qquad (12.2)$$

$$\boldsymbol{\Sigma} = \begin{bmatrix}\sigma_X^2 & \sigma_{XY}\\ \sigma_{YX} & \sigma_Y^2\end{bmatrix} = \begin{bmatrix}\mathrm{var}[X_1] & \mathrm{cov}(X_1, Y_1)\\ \mathrm{cov}(Y_1, X_1) & \mathrm{var}[Y_1]\end{bmatrix}. \qquad (12.3)$$
The expected value $\mathbb{E}[X_1]$ and variance $\mathrm{var}[X_1]$ are obtained by propagating the uncertainty associated with the prior knowledge for $X_0$ through the transition model ($x_t = x_{t-1} + w_t$),

$$\begin{aligned}
\mathbb{E}[X_1] &\equiv \mu_1 = \mu_0\\
\mathrm{var}[X_1] &\equiv \sigma_1^2 = \sigma_0^2 + \sigma_W^2.
\end{aligned}$$

Note: We use the notation $\mu_1$ and $\sigma_1$ rather than $\mu_{1|0}$ and $\sigma_{1|0}$ because there is no observation at $t=0$.
The expected value $\mathbb{E}[Y_1]$ and variance $\mathrm{var}[Y_1]$ are obtained by propagating the uncertainty associated with the prior knowledge for $X_1$ through the observation model ($y_t = x_t + v_t$),

$$\begin{aligned}
\mathbb{E}[Y_1] &= \mu_0\\
\mathrm{var}[Y_1] &= \underbrace{\sigma_0^2 + \sigma_W^2}_{\sigma_1^2} + \sigma_V^2.
\end{aligned}$$

The covariance $\mathrm{cov}(X_1, Y_1) = \sigma_1^2 = \sigma_0^2 + \sigma_W^2$ is equal to $\mathrm{var}[X_1]$.
Update step  We can obtain the posterior knowledge at time $t=1$ using

$$f(x_1|y_1) = \frac{f(x_1, y_1)}{f(y_1)} = \mathcal{N}(x_1; \mu_{1|1}, \sigma^2_{1|1}),$$

where the posterior mean and variance are

$$\begin{aligned}
\mu_{1|1} &= \mu_X + \sigma_{XY}(\sigma_Y^2)^{-1}(y_1 - \mu_Y)\\
\sigma^2_{1|1} &= \sigma_X^2 - \sigma_{XY}(\sigma_Y^2)^{-1}\sigma_{XY}.
\end{aligned} \qquad (12.4)$$
Note that the terms in equation 12.4 correspond to those defined in equations 12.2 and 12.3. The posterior $f(x_1|y_1)$ contains the information coming from the prior knowledge at $t=0$, the system dynamics described by the transition model, and the observation at time $t=1$.
The prediction and update steps are now generalized to transition from $t-1 \to t$. From our posterior knowledge at $t-1$, the transition model $x_t = x_{t-1} + w_t$, and the observation model $y_t = x_t + v_t$,
we compute our joint prior knowledge at $t$, $f(x_t, y_t|\mathbf{y}_{1:t-1}) = \mathcal{N}([x_t\ y_t]^{\top}; \boldsymbol{\mu}, \boldsymbol{\Sigma})$, where the mean and covariance are

$$\boldsymbol{\mu} = \begin{bmatrix}\mu_X\\ \mu_Y\end{bmatrix} = \begin{bmatrix}\mathbb{E}[X_t|\mathbf{y}_{1:t-1}]\\ \mathbb{E}[Y_t|\mathbf{y}_{1:t-1}]\end{bmatrix},$$

$$\boldsymbol{\Sigma} = \begin{bmatrix}\sigma_X^2 & \sigma_{XY}\\ \sigma_{YX} & \sigma_Y^2\end{bmatrix} = \begin{bmatrix}\mathrm{var}[X_t|\mathbf{y}_{1:t-1}] & \mathrm{cov}(X_t, Y_t|\mathbf{y}_{1:t-1})\\ \mathrm{cov}(Y_t, X_t|\mathbf{y}_{1:t-1}) & \mathrm{var}[Y_t|\mathbf{y}_{1:t-1}]\end{bmatrix}.$$
Each term of the mean vector and covariance matrix is obtained following

$$\begin{aligned}
\mathbb{E}[X_t|\mathbf{y}_{1:t-1}] &\equiv \mu_{t|t-1} = \mu_{t-1|t-1}\\
\mathrm{var}[X_t|\mathbf{y}_{1:t-1}] &\equiv \sigma^2_{t|t-1} = \sigma^2_{t-1|t-1} + \sigma_W^2\\
\mathbb{E}[Y_t|\mathbf{y}_{1:t-1}] &= \mu_{t|t-1}\\
\mathrm{var}[Y_t|\mathbf{y}_{1:t-1}] &= \underbrace{\sigma^2_{t-1|t-1} + \sigma_W^2}_{\sigma^2_{t|t-1}} + \sigma_V^2\\
\mathrm{cov}(X_t, Y_t|\mathbf{y}_{1:t-1}) &= \sigma^2_{t-1|t-1} + \sigma_W^2.
\end{aligned}$$
The posterior knowledge at $t$ is given by

$$f(x_t|\mathbf{y}_{1:t}) = \frac{f(x_t, y_t|\mathbf{y}_{1:t-1})}{f(y_t|\mathbf{y}_{1:t-1})} = \mathcal{N}(x_t; \mu_{t|t}, \sigma^2_{t|t}),$$

where the posterior mean and variance are

$$\begin{aligned}
\mu_{t|t} &= \mu_X + \sigma_{XY}(\sigma_Y^2)^{-1}(y_t - \mu_Y)\\
\sigma^2_{t|t} &= \sigma_X^2 - \sigma_{XY}(\sigma_Y^2)^{-1}\sigma_{XY}.
\end{aligned}$$

Note on $\mathrm{cov}(X_t, Y_t|\mathbf{y}_{1:t-1})$: Following the covariance properties,
$$\mathrm{cov}(X_1+X_2, X_3+X_4) = \mathrm{cov}(X_1,X_3) + \mathrm{cov}(X_1,X_4) + \mathrm{cov}(X_2,X_3) + \mathrm{cov}(X_2,X_4);$$
therefore, because $X_{1|1} \perp\!\!\!\perp W$, $X_{1|1} \perp\!\!\!\perp V$, and $W \perp\!\!\!\perp V$, the cross terms vanish and
$$\begin{aligned}
\mathrm{cov}(X_2, Y_2|y_1) &= \mathrm{cov}(X_{1|1}+W,\ X_{1|1}+W+V)\\
&= \mathrm{cov}(X_{1|1}, X_{1|1}) + \mathrm{cov}(W, W)\\
&= \sigma^2_{1|1} + \sigma_W^2.
\end{aligned}$$
The marginal likelihood $f(y_t|\mathbf{y}_{1:t-1})$, describing the prior probability density of observing $y_t$ given all observations up to time $t-1$, is given by

$$f(y_t|\mathbf{y}_{1:t-1}) = \mathcal{N}\big(y_t;\ \underbrace{\mu_{t|t-1}}_{\mathbb{E}[Y_t|\mathbf{y}_{1:t-1}]},\ \underbrace{\sigma^2_{t-1|t-1} + \sigma_W^2 + \sigma_V^2}_{\mathrm{var}[Y_t|\mathbf{y}_{1:t-1}]}\big).$$

Note: $f(y_t|\mathbf{y}_{1:t-1}) \equiv f(y_t|\mathbf{y}_{1:t-1}; \boldsymbol{\theta})$; that is, we can represent either implicitly or explicitly the dependency on the model parameters $\boldsymbol{\theta}$ that enter in the definition of the transition and observation equations. See §12.1.4 for further details.

Note: We refer to $f(y_t|\mathbf{y}_{1:t-1})$ as the marginal likelihood because the hidden state $x_t$ was marginalized. In the case of the multivariate Normal, the solution to $\int f(y_t, x_t|\mathbf{y}_{1:t-1})\,dx_t$ is analytically tractable; see §4.1.3.
Example: Temperature time series  We now look at an illustrative example where we apply the prediction and update steps recursively to estimate the temperature $x_t$ over a period of three hours using imprecise observations made every hour for $t \in \{1, 2, 3\}$. Our prior knowledge is described by $\mathbb{E}[X_0] = 10\,°\mathrm{C}$ and $\sigma[X_0] = 7\,°\mathrm{C}$. The observation and transition models are

$$\underbrace{y_t = x_t + v_t}_{\text{observation model}}, \qquad \overbrace{v_t : V \sim \mathcal{N}(0, \underbrace{3^2}_{\sigma_V^2})}^{\text{observation error}},$$

$$\underbrace{x_t = x_{t-1} + w_t}_{\text{transition model}}, \qquad \overbrace{w_t : W \sim \mathcal{N}(0, \underbrace{0.5^2}_{\sigma_W^2})}^{\text{model error}}.$$
Performing the prediction step from $t=0 \to t=1$ leads to

$$\begin{aligned}
\mathbb{E}[X_1] &= \mathbb{E}[X_0] = 10 & \mathbb{E}[Y_1] &= \mathbb{E}[X_1] = 10\\
\mathrm{var}[X_1] &= \mathrm{var}[X_0] + \sigma_W^2 = 49.25 & \mathrm{var}[Y_1] &= \mathrm{var}[X_1] + \sigma_V^2 = 58.25\\
\mathrm{cov}(X_1, Y_1) &= \mathrm{var}[X_1] = 49.25 & \rho_{XY} &= \frac{\mathrm{cov}(X_1, Y_1)}{\sigma[X_1]\,\sigma[Y_1]} = 0.92.
\end{aligned}$$
At the time $t=1$, the observed temperature is $y_1 = 4.8$, leading to a likelihood $f(y_t|\mathbf{y}_{1:t-1}) = 0.04$. Performing the update step leads to

$$\begin{aligned}
\mathbb{E}[X_{1|1}] &= \mathbb{E}[X_1] + \frac{\mathrm{cov}(X_1, Y_1)\,(y_1 - \mathbb{E}[Y_1])}{\mathrm{var}[Y_1]} = 5.6\\
\mathrm{var}[X_{1|1}] &= \mathrm{var}[X_1] - \frac{\mathrm{cov}(X_1, Y_1)^2}{\mathrm{var}[Y_1]} = 7.6.
\end{aligned}$$
Following the Gaussian conditional example presented in §4.1.4, figure 12.5a presents graphically each of the quantities associated with the prior knowledge at $t=1$ obtained by propagating the knowledge of $X_0$ through the transition and observation models, the likelihood of $y_t$, and the posterior knowledge. By repeating the same prediction and update steps for $t=2$, where $y_2 = 12.1$, we obtain
Prediction step
$$\begin{aligned}
\mathbb{E}[X_{2|1}] &= \mathbb{E}[X_{1|1}] = 5.6 & \mathbb{E}[Y_{2|1}] &= \mathbb{E}[X_{2|1}] = 5.6\\
\mathrm{var}[X_{2|1}] &= \mathrm{var}[X_{1|1}] + \sigma_W^2 = 7.9 & \mathrm{var}[Y_{2|1}] &= \mathrm{var}[X_{2|1}] + \sigma_V^2 = 16.9\\
\mathrm{cov}(X_{2|1}, Y_{2|1}) &= \mathrm{var}[X_{2|1}] = 7.9 & \rho_{XY} &= \frac{\mathrm{cov}(X_{2|1}, Y_{2|1})}{\sigma[X_{2|1}]\,\sigma[Y_{2|1}]} = 0.68\\
f(y_2|y_1) &= 0.03
\end{aligned}$$

Update step
$$\begin{aligned}
\mathbb{E}[X_{2|2}] &= \mathbb{E}[X_{2|1}] + \frac{\mathrm{cov}(X_{2|1}, Y_{2|1})\,(y_2 - \mathbb{E}[Y_{2|1}])}{\mathrm{var}[Y_{2|1}]} = 8.6\\
\mathrm{var}[X_{2|2}] &= \mathrm{var}[X_{2|1}] - \frac{\mathrm{cov}(X_{2|1}, Y_{2|1})^2}{\mathrm{var}[Y_{2|1}]} = 4.2.
\end{aligned}$$
These results are illustrated in figure 12.5b. The details of subsequent steps are not presented because they consist in repeating recursively the same prediction and update steps. Figure 12.5c shows how, in only three time steps, we went from a diffuse knowledge describing $X_0$ to a more precise estimate for $X_2$.
Figure 12.5: Step-by-step probabilistic hidden-state estimation for (a) $t=1$, (b) $t=2$, and (c) $t=3$. Each panel shows the filtered estimate $\mu_{t|t} \pm \sigma_{t|t}$, the one-step-ahead prediction $\mu_{t|t-1} \pm \sigma_{t|t-1}$, the observations $y_t$, the true values $\check{x}_t$, as well as the joint prior $f(x_t, y_t|\mathbf{y}_{1:t-1})$, the marginal likelihood $f(y_t|\mathbf{y}_{1:t-1})$, and the posterior $f(x_t|\mathbf{y}_{1:t})$. Note that the pink curve overlaid on the surfaces describes either the joint or the conditional PDF evaluated for the observed value $y_t$.
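The recursion used in this example is compact enough to sketch directly. Below is a minimal Python/NumPy implementation of the scalar prediction/update cycle; $y_1 = 4.8$ and $y_2 = 12.1$ are taken from the text, while the third observation is an assumed placeholder read off figure 12.5c.

```python
import numpy as np

# One-state Kalman recursion for the temperature example:
# prior E[X_0] = 10, sigma[X_0] = 7, sigma_V = 3, sigma_W = 0.5.
mu, var = 10.0, 7.0**2
sigma_V2, sigma_W2 = 3.0**2, 0.5**2

# y_1 and y_2 are given in the text; y_3 = 7.4 is an assumption from figure 12.5c.
observations = [4.8, 12.1, 7.4]

for t, y in enumerate(observations, start=1):
    # Prediction step: propagate through the transition model x_t = x_{t-1} + w_t
    mu_pred = mu
    var_pred = var + sigma_W2
    # Prior knowledge of the observation through y_t = x_t + v_t
    mu_y = mu_pred
    var_y = var_pred + sigma_V2
    cov_xy = var_pred
    # Update step: Gaussian conditioning on the observed y_t
    mu = mu_pred + cov_xy / var_y * (y - mu_y)
    var = var_pred - cov_xy**2 / var_y
    print(f"t={t}: mu={mu:.1f}, var={var:.1f}")
# t=1 and t=2 reproduce the values 5.6/7.6 and 8.6/4.2 reported above.
```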
12.1.2 General Formulation
We now extend the prediction and update steps with a formulation compatible with multiple hidden states. For example, consider the case where we want to use an SSM to estimate the hidden states $\mathbf{x}_t = [x_t\ \dot{x}_t]^{\top}$, that is, the temperature $x_t$ and its rate of change $\dot{x}_t$.
The observation model then becomes

$$y_t = x_t + v_t \equiv \overbrace{\mathbf{C}\mathbf{x}_t}^{\mathbf{C} = [1\ \ 0]} + v_t,$$

where the observation matrix $\mathbf{C} = [1\ \ 0]$ indicates that an observation $y_t$ is a function of $x_t$ but not of the rate of change $\dot{x}_t$. $v_t$ is a realization of an observation error $V \sim \mathcal{N}(v; 0, \sigma_V^2)$. In the general case, observation errors are described by a multivariate Normal random variable $\mathbf{V} \sim \mathcal{N}(\mathbf{v}; \mathbf{0}, \mathbf{R})$, where $\mathbf{R}$ is the covariance matrix.
The transition model for the two hidden states is

$$\begin{bmatrix}x_t\\ \dot{x}_t\end{bmatrix} = \begin{bmatrix}x_{t-1} + \Delta t\cdot\dot{x}_{t-1}\\ \dot{x}_{t-1}\end{bmatrix} + \begin{bmatrix}w_t\\ \dot{w}_t\end{bmatrix}
\quad\equiv\quad
\mathbf{x}_t = \overbrace{\mathbf{A}\mathbf{x}_{t-1}}^{\mathbf{A} = \left[\begin{smallmatrix}1 & \Delta t\\ 0 & 1\end{smallmatrix}\right]} + \overbrace{\mathbf{w}_t}^{\left[\begin{smallmatrix}w_t\\ \dot{w}_t\end{smallmatrix}\right]},$$

where the transition matrix $\mathbf{A}$ describes the model kinematics and $\mathbf{w}_t$ are realizations of the model errors $\mathbf{W} \sim \mathcal{N}(\mathbf{w}; \mathbf{0}, \mathbf{Q})$ that affect each of the hidden states.

Summary — Observation model: $\mathbf{y}_t = \mathbf{C}\mathbf{x}_t + \mathbf{v}_t$, $\mathbf{v} : \mathbf{V} \sim \mathcal{N}(\mathbf{v}; \mathbf{0}, \mathbf{R})$ (observation errors), where $\mathbf{C}$ is the observation matrix and $\mathbf{R}$ the observation-errors covariance. Transition model: $\mathbf{x}_t = \mathbf{A}\mathbf{x}_{t-1} + \mathbf{w}_t$, $\mathbf{w} : \mathbf{W} \sim \mathcal{N}(\mathbf{w}; \mathbf{0}, \mathbf{Q})$ (transition errors), where $\mathbf{A}$ is the transition matrix and $\mathbf{Q}$ the transition-errors covariance.
In the prediction step, we compute our prior knowledge at $t$ from our posterior knowledge at $t-1$, using the transition model and the observation model,

$$\begin{aligned}
\boldsymbol{\mu}_{t|t-1} &= \mathbf{A}\boldsymbol{\mu}_{t-1|t-1}\\
\boldsymbol{\Sigma}_{t|t-1} &= \mathbf{A}\boldsymbol{\Sigma}_{t-1|t-1}\mathbf{A}^{\top} + \mathbf{Q}\\
\mathbb{E}[\mathbf{Y}_t|\mathbf{y}_{1:t-1}] &= \mathbf{C}\boldsymbol{\mu}_{t|t-1}\\
\mathrm{cov}(\mathbf{Y}_t|\mathbf{y}_{1:t-1}) &= \mathbf{C}\boldsymbol{\Sigma}_{t|t-1}\mathbf{C}^{\top} + \mathbf{R}\\
\mathrm{cov}(\mathbf{X}_t, \mathbf{Y}_t|\mathbf{y}_{1:t-1}) &= \boldsymbol{\Sigma}_{t|t-1}\mathbf{C}^{\top}.
\end{aligned} \qquad (12.5)$$
These terms collectively describe our joint prior knowledge $f(\mathbf{x}_t, \mathbf{y}_t|\mathbf{y}_{1:t-1}) = \mathcal{N}([\mathbf{x}_t\ \mathbf{y}_t]^{\top}; \boldsymbol{\mu}, \boldsymbol{\Sigma})$ with the mean vector and covariance matrix defined as

$$\boldsymbol{\mu} = \begin{bmatrix}\boldsymbol{\mu}_X\\ \boldsymbol{\mu}_Y\end{bmatrix} = \begin{bmatrix}\boldsymbol{\mu}_{t|t-1}\\ \mathbb{E}[\mathbf{Y}_t|\mathbf{y}_{1:t-1}]\end{bmatrix}$$

$$\boldsymbol{\Sigma} = \begin{bmatrix}\boldsymbol{\Sigma}_X & \boldsymbol{\Sigma}_{XY}\\ \boldsymbol{\Sigma}_{XY}^{\top} & \boldsymbol{\Sigma}_Y\end{bmatrix} = \begin{bmatrix}\boldsymbol{\Sigma}_{t|t-1} & \mathrm{cov}(\mathbf{X}_t, \mathbf{Y}_t|\mathbf{y}_{1:t-1})\\ \mathrm{cov}(\mathbf{X}_t, \mathbf{Y}_t|\mathbf{y}_{1:t-1})^{\top} & \mathrm{cov}(\mathbf{Y}_t|\mathbf{y}_{1:t-1})\end{bmatrix}.$$
In the update step, the posterior knowledge is obtained using the conditional multivariate Normal PDF,

$$f(\mathbf{x}_t|\mathbf{y}_{1:t}) = \frac{f(\mathbf{x}_t, \mathbf{y}_t|\mathbf{y}_{1:t-1})}{f(\mathbf{y}_t|\mathbf{y}_{1:t-1})} = \mathcal{N}(\mathbf{x}_t; \boldsymbol{\mu}_{t|t}, \boldsymbol{\Sigma}_{t|t}),$$

which is defined by its posterior mean vector and covariance matrix,

$$\begin{aligned}
\boldsymbol{\mu}_{t|t} &= \boldsymbol{\mu}_X + \boldsymbol{\Sigma}_{XY}\boldsymbol{\Sigma}_Y^{-1}(\mathbf{y}_t - \boldsymbol{\mu}_Y)\\
\boldsymbol{\Sigma}_{t|t} &= \boldsymbol{\Sigma}_X - \boldsymbol{\Sigma}_{XY}\boldsymbol{\Sigma}_Y^{-1}\boldsymbol{\Sigma}_{XY}^{\top}.
\end{aligned} \qquad (12.6)$$
The posterior knowledge $f(\mathbf{x}_t|\mathbf{y}_{1:t})$ combines the information coming from the prior knowledge $\{\boldsymbol{\mu}_0, \boldsymbol{\Sigma}_0\}$, the sequence of observations $\mathbf{y}_{1:t}$, and the system dynamics described by the transition model. Equations 12.5 and 12.6, obtained using the theory presented in §3.4.1 and §4.1.3, are referred to as the Kalman filter.¹

¹ Kalman, R. E. (1960). A new approach to linear filtering and prediction problems. Transactions of the ASME–Journal of Basic Engineering 82 (Series D), 35–45.

In the Kalman filter, they are reorganized as
Prediction step
$$\begin{aligned}
\boldsymbol{\mu}_{t|t-1} &= \mathbf{A}\boldsymbol{\mu}_{t-1|t-1} && \text{Prior mean}\\
\boldsymbol{\Sigma}_{t|t-1} &= \mathbf{A}\boldsymbol{\Sigma}_{t-1|t-1}\mathbf{A}^{\top} + \mathbf{Q} && \text{Prior covariance}
\end{aligned}$$

Update step
$$f(\mathbf{x}_t|\mathbf{y}_{1:t}) = \mathcal{N}(\mathbf{x}_t; \boldsymbol{\mu}_{t|t}, \boldsymbol{\Sigma}_{t|t})$$
$$\begin{aligned}
\boldsymbol{\mu}_{t|t} &= \boldsymbol{\mu}_{t|t-1} + \mathbf{K}_t\mathbf{r}_t && \text{Posterior expected value}\\
\boldsymbol{\Sigma}_{t|t} &= (\mathbf{I} - \mathbf{K}_t\mathbf{C})\boldsymbol{\Sigma}_{t|t-1} && \text{Posterior covariance}\\
\mathbf{r}_t &= \mathbf{y}_t - \hat{\mathbf{y}}_t && \text{Innovation vector}\\
\hat{\mathbf{y}}_t &= \mathbf{C}\boldsymbol{\mu}_{t|t-1} && \text{Prediction observations vector}\\
\mathbf{K}_t &= \boldsymbol{\Sigma}_{t|t-1}\mathbf{C}^{\top}\mathbf{G}_t^{-1} && \text{Kalman gain matrix}\\
\mathbf{G}_t &= \mathbf{C}\boldsymbol{\Sigma}_{t|t-1}\mathbf{C}^{\top} + \mathbf{R} && \text{Innovation covariance matrix}
\end{aligned}$$
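The reorganized equations map directly onto a pair of functions. The following is a minimal Python/NumPy sketch of one prediction/update cycle; the two-state model and the observation values at the end are arbitrary illustrative assumptions, not values from the book.

```python
import numpy as np

def kalman_predict(mu, Sigma, A, Q):
    """Prediction step: propagate the posterior at t-1 through the transition model."""
    mu_pred = A @ mu
    Sigma_pred = A @ Sigma @ A.T + Q
    return mu_pred, Sigma_pred

def kalman_update(mu_pred, Sigma_pred, y, C, R):
    """Update step: condition the predicted state on the observation y_t."""
    y_hat = C @ mu_pred                      # predicted observations
    r = y - y_hat                            # innovation vector
    G = C @ Sigma_pred @ C.T + R             # innovation covariance
    K = Sigma_pred @ C.T @ np.linalg.inv(G)  # Kalman gain
    mu = mu_pred + K @ r
    Sigma = (np.eye(len(mu_pred)) - K @ C) @ Sigma_pred
    return mu, Sigma

# Illustrative two-state (level + rate) setup with arbitrary values
dt = 1.0
A = np.array([[1.0, dt], [0.0, 1.0]])
C = np.array([[1.0, 0.0]])
Q = 0.1**2 * np.array([[dt**4/4, dt**3/2], [dt**3/2, dt**2]])
R = np.array([[4.0**2]])

mu, Sigma = np.array([10.0, -1.0]), np.diag([2.0**2, 1.5**2])
for y in [8.3, 7.1, 5.2]:                    # made-up observations
    mu, Sigma = kalman_predict(mu, Sigma, A, Q)
    mu, Sigma = kalman_update(mu, Sigma, np.array([y]), C, R)
```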
In the name Kalman filter, the term filter refers to the action of removing the observation errors from an observed time series. Filtering is an intrinsically recursive operation that is suited for applications where a flow of data can be continuously processed as it gets collected. Note that in the classic Kalman formulation,² there are additional control terms for including the effect of external actions on the system. These terms are omitted here because, contrarily to the field of aerospace, where flaps, rudders, or thrust vectors are employed to continuously modify the state (i.e., position, speed, and acceleration) of a vehicle traveling through space, civil systems seldom have the possibility of such direct control actions being applied to them.

² Simon, D. (2006). Optimal state estimation: Kalman, H infinity, and nonlinear approaches. Wiley.
Example: Temperature time series  We want to estimate $\mathbf{x}_t = [x_t\ \dot{x}_t]^{\top}$, the temperature and its rate of change, over a period of 5 hours using imprecise observations made every hour for $t \in \{1, 2, \cdots, 5\}$. The observation and transition models are respectively

$$y_t = \overbrace{\mathbf{C}\mathbf{x}_t}^{[1\ \ 0]} + \overbrace{v_t}^{V \sim \mathcal{N}(0, \sigma_V^2)},
\qquad
\overbrace{\mathbf{x}_t}^{\left[\begin{smallmatrix}x_t\\ \dot{x}_t\end{smallmatrix}\right]} = \overbrace{\mathbf{A}\mathbf{x}_{t-1}}^{\left[\begin{smallmatrix}1 & \Delta t\\ 0 & 1\end{smallmatrix}\right]} + \overbrace{\mathbf{w}_t}^{\left[\begin{smallmatrix}w_t\\ \dot{w}_t\end{smallmatrix}\right]},\quad \mathbf{W} \sim \mathcal{N}(\mathbf{0}, \mathbf{Q}).$$
Our prior knowledge for the temperature is $\mathbb{E}[X_0] = 10\,°\mathrm{C}$, $\sigma[X_0] = 2\,°\mathrm{C}$, and for the rate of change, $\mathbb{E}[\dot{X}_0] = -1\,°\mathrm{C}$, $\sigma[\dot{X}_0] = 1.5\,°\mathrm{C}$. We assume that our knowledge for $X_0$ and $\dot{X}_0$ is independent ($X_0 \perp\!\!\!\perp \dot{X}_0 \to \rho_{X_0\dot{X}_0} = 0$), so the prior mean vector and covariance matrix are

$$\boldsymbol{\mu}_0 = \begin{bmatrix}10\\ -1\end{bmatrix}, \qquad \boldsymbol{\Sigma}_0 = \begin{bmatrix}2^2 & 0\\ 0 & 1.5^2\end{bmatrix}.$$
The observation-errors standard deviation is $\sigma_V = 4\,°\mathrm{C}$. Model errors are defined by the covariance matrix $\mathbf{Q}$, which describes the errors on the temperature estimate and its rate of change. We assume that these errors are caused by a constant acceleration that takes place during the interval $\Delta t$, which is not accounted for in the model, which builds on the hypothesis of a constant speed and no acceleration. In order to define the covariance matrix $\mathbf{Q}$, we need to define a new vector $\mathbf{z}_t = [z_t\ \dot{z}_t\ \ddot{z}_t]^{\top}$ that describes a realization of $\mathbf{Z}_t \sim \mathcal{N}(\mathbf{z}_t; \mathbf{0}, \boldsymbol{\Sigma}_Z)$, that is, the effect of a constant acceleration having a random magnitude such that $\mathbf{Z}_t \perp\!\!\!\perp \mathbf{Z}_{t+i},\ \forall i \neq 0$. As illustrated in figure 12.6, having a constant acceleration $\ddot{z}_t$ over a time step of length $\Delta t$ causes a change of $\ddot{z}_t\Delta t$ in the trend (i.e., speed) and a change of $\ddot{z}_t\frac{\Delta t^2}{2}$ in the level. We can express the errors caused by the constant acceleration in $\mathbf{z}_t$ in terms of our transition errors $\mathbf{w}_t$ as
Figure 12.6: The error caused on the level and the trend by a constant acceleration $\ddot{z}_t$ over a time step of length $\Delta t$.
$$\mathbf{W}_t = \mathbf{A}_z\mathbf{Z}_t \sim \mathcal{N}(\mathbf{w}_t; \mathbf{0}, \underbrace{\mathbf{A}_z\boldsymbol{\Sigma}_z\mathbf{A}_z^{\top}}_{\mathbf{Q}}),$$

where

$$\mathbf{A}_z = \begin{bmatrix}0 & 0 & \frac{\Delta t^2}{2}\\ 0 & 0 & \Delta t\end{bmatrix}, \qquad \boldsymbol{\Sigma}_z = \begin{bmatrix}0 & 0 & 0\\ 0 & 0 & 0\\ 0 & 0 & \sigma_W^2\end{bmatrix}.$$
Therefore, the definition of the matrix $\mathbf{Q}$ for the transition errors $\mathbf{W}_t$ is

$$\mathbf{Q} = \mathbf{A}_z\boldsymbol{\Sigma}_z\mathbf{A}_z^{\top} = \sigma_W^2\begin{bmatrix}\frac{\Delta t^4}{4} & \frac{\Delta t^3}{2}\\ \frac{\Delta t^3}{2} & \Delta t^2\end{bmatrix}.$$
For further details regarding the derivation of the process noise covariance matrices, the reader should consult the textbook by Bar-Shalom, Li, and Kirubarajan.³ In this example, the noise parameter is equal to $\sigma_W^2 = 0.1$ and $\Delta t = 1$. Figure 12.7 presents the state estimates for the temperature and its rate of change along with the true values that were employed to generate simulated observations.

³ Bar-Shalom, Y., X. Li, and T. Kirubarajan (2001). Estimation with applications to tracking and navigation: Theory algorithms and software. John Wiley & Sons.
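The construction of $\mathbf{Q}$ from the constant-acceleration assumption can be checked numerically. The short Python/NumPy sketch below builds $\mathbf{Q}$ both ways, using the $\sigma_W^2 = 0.1$ and $\Delta t = 1$ values quoted above.

```python
import numpy as np

# Process-noise covariance for the constant-speed model, built from the
# constant-acceleration assumption described above (sigma_W^2 = 0.1, dt = 1).
dt = 1.0
sigma_W2 = 0.1

A_z = np.array([[0.0, 0.0, dt**2 / 2],
                [0.0, 0.0, dt]])
Sigma_z = np.diag([0.0, 0.0, sigma_W2])

Q = A_z @ Sigma_z @ A_z.T
# Equivalent closed form: sigma_W^2 * [[dt^4/4, dt^3/2], [dt^3/2, dt^2]]
Q_closed = sigma_W2 * np.array([[dt**4 / 4, dt**3 / 2],
                                [dt**3 / 2, dt**2]])
assert np.allclose(Q, Q_closed)
```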
Figure 12.7: Example of hidden-state estimation for a constant-speed model. The top panel shows the temperature state $x_t$ and the bottom panel its rate of change $\dot{x}_t$, each with the filtered estimate $\mu_{t|t} \pm \sigma_{t|t}$, the one-step-ahead prediction $\mu_{t+1|t} \pm \sigma_{t+1|t}$, the observations $y_t$, and the true values $\check{x}_t$.
12.1.3 Forecasting and Smoothing
The filtering method presented in the previous section estimates $f(\mathbf{x}_t|\mathbf{y}_{1:t})$, that is, the posterior probability density of hidden states $\mathbf{x}_t$, given all the observations up to time $t$. With this procedure, acquiring new data at a time $t+1$ does not modify our knowledge about $\mathbf{x}_t$. The second task of estimating the posterior probability of hidden states $\mathbf{x}_t$ given all the observations up to time $T > t$, $f(\mathbf{x}_t|\mathbf{y}_{1:T})$, is referred to as smoothing. A third task consists in forecasting the hidden states $\mathbf{x}_{t+n}$, $n$ time steps in the future, given all the observations up to time $t$, that is, $f(\mathbf{x}_{t+n}|\mathbf{y}_{1:t})$. The three tasks of filtering, smoothing, and forecasting are illustrated schematically in figure 12.8.
Figure 12.8: Comparison of filtering, smoothing, and forecasting, where the blue region corresponds to the data employed to estimate $\mathbf{x}_t$ or forecast $\mathbf{x}_{t+n}$.
Forecasting: $f(\mathbf{x}_{t+n}|\mathbf{y}_{1:t})$  In the context of SSMs, forecasting is trivial because it only involves recursively propagating the current knowledge $\{\boldsymbol{\mu}_{t|t}, \boldsymbol{\Sigma}_{t|t}\}$ at time $t$ through the prediction step. In short, forecasting from a time $t$ to $t+1$ follows

$$\begin{aligned}
f(\mathbf{x}_{t+1}|\mathbf{y}_{1:t}) &= \mathcal{N}(\mathbf{x}_{t+1}; \boldsymbol{\mu}_{t+1|t}, \boldsymbol{\Sigma}_{t+1|t})\\
\boldsymbol{\mu}_{t+1|t} &= \mathbf{A}\boldsymbol{\mu}_{t|t}\\
\boldsymbol{\Sigma}_{t+1|t} &= \mathbf{A}\boldsymbol{\Sigma}_{t|t}\mathbf{A}^{\top} + \mathbf{Q}.
\end{aligned}$$
Note that whenever data is missing at a given time step, only the prediction step is recursively applied until new data is obtained. Therefore, missing data is treated in the same way as forecasting using the transition model. This is an interesting feature of SSMs, which do not require any special treatment for cases where data is missing.
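The prediction-only recursion is short enough to sketch; the model and numerical values below are arbitrary illustrative assumptions.

```python
import numpy as np

def forecast(mu, Sigma, A, Q, n_steps):
    """Recursive prediction-only steps: used both for forecasting and for
    time steps where the observation is missing."""
    trajectory = []
    for _ in range(n_steps):
        mu = A @ mu
        Sigma = A @ Sigma @ A.T + Q
        trajectory.append((mu.copy(), Sigma.copy()))
    return trajectory

# Illustrative constant-speed model (arbitrary values)
dt = 1.0
A = np.array([[1.0, dt], [0.0, 1.0]])
Q = 0.1 * np.array([[dt**4/4, dt**3/2], [dt**3/2, dt**2]])
mu_tt = np.array([5.0, -0.8])
Sigma_tt = np.diag([1.0, 0.2])

for i, (m, S) in enumerate(forecast(mu_tt, Sigma_tt, A, Q, 5), start=1):
    # The state variance keeps growing because Q is added at every step.
    print(f"t+{i}: mean={m.round(2)}, std={np.sqrt(np.diag(S)).round(2)}")
```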
Figure 12.9: Example of hidden-state forecast for a constant-speed model. The top panel shows the temperature state $x_t$ and the bottom panel its rate of change $\dot{x}_t$, with filtered estimates $\mu_{t|t} \pm \sigma_{t|t}$ where observations $y_t$ are available and forecast estimates $\mu_{t+1|t} \pm \sigma_{t+1|t}$ beyond, compared with the true values $\check{x}_t$.
Example: Temperature time series (continued)  We build on the temperature estimation example presented in figure 12.7 to illustrate the concept of forecasting. The goal here is to forecast the temperature for five hours after the last time step for which data is available. The result is presented in figure 12.9. Note how, because no observation is available to update the predictions, the standard deviation of the estimated hidden states keeps increasing: at each time step, the model error covariance matrix $\mathbf{Q}$ is added recursively to the hidden-state covariance matrix.
Smoothing: $f(\mathbf{x}_t|\mathbf{y}_{1:T})$  With smoothing, we perform the reverse process of updating the knowledge at a time $t$ using the knowledge at $t+1$. Smoothing starts from the last time step $T$ estimated using the filtering procedure described in §12.1.2 and moves recursively
backward until the initial time step is reached. The information flow during filtering and smoothing is illustrated in figure 12.10.

Figure 12.10: Information flow for the filtering and smoothing processes. Filtering passes $\{\boldsymbol{\mu}_{t|t}, \boldsymbol{\Sigma}_{t|t}\}$ forward from $t=0$ to $T$; smoothing passes $\{\boldsymbol{\mu}_{t|T}, \boldsymbol{\Sigma}_{t|T}\}$ backward from $T$ to $0$.

The
posterior knowledge $f(\mathbf{x}_t|\mathbf{y}_{1:T})$ combines the information coming from the prior knowledge $\{\boldsymbol{\mu}_0, \boldsymbol{\Sigma}_0\}$, the entire sequence of observations $\mathbf{y}_{1:T}$, and the transition model. The smoothing procedure can be employed to learn initial hidden states, $\{\boldsymbol{\mu}_0, \boldsymbol{\Sigma}_0\} \to \{\boldsymbol{\mu}_{0|T}, \boldsymbol{\Sigma}_{0|T}\}$. Updating initial values using the smoother typically improves the model quality. The most common algorithm for performing smoothing is the RTS Kalman smoother.⁴

⁴ Rauch, H. E., C. T. Striebel, and F. Tung (1965). Maximum likelihood estimates of linear dynamic systems. AIAA Journal 3 (8), 1445–1450.

It is defined following
$$\begin{aligned}
f(\mathbf{x}_t|\mathbf{y}_{1:T}) &= \mathcal{N}(\mathbf{x}_t; \boldsymbol{\mu}_{t|T}, \boldsymbol{\Sigma}_{t|T})\\
\boldsymbol{\mu}_{t|T} &= \boldsymbol{\mu}_{t|t} + \mathbf{J}_t\left(\boldsymbol{\mu}_{t+1|T} - \boldsymbol{\mu}_{t+1|t}\right)\\
\boldsymbol{\Sigma}_{t|T} &= \boldsymbol{\Sigma}_{t|t} + \mathbf{J}_t\left(\boldsymbol{\Sigma}_{t+1|T} - \boldsymbol{\Sigma}_{t+1|t}\right)\mathbf{J}_t^{\top}\\
\mathbf{J}_t &= \boldsymbol{\Sigma}_{t|t}\mathbf{A}^{\top}\boldsymbol{\Sigma}_{t+1|t}^{-1}.
\end{aligned}$$
As illustrated in figure 12.10, the RTS Kalman smoother is performed iteratively starting from the Kalman filter step $t = T-1$, where the mean vector $\boldsymbol{\mu}_{t+1|T} = \boldsymbol{\mu}_{T|T}$ and covariance matrix $\boldsymbol{\Sigma}_{t+1|T} = \boldsymbol{\Sigma}_{T|T}$ are already available from the last Kalman filter step.
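The backward pass can be sketched as a single function (Python/NumPy). It assumes that the forward Kalman pass stored, for every time step, the filtered moments and the one-step-ahead predictions; the list layout below is an implementation choice, not notation from the book.

```python
import numpy as np

def rts_smoother(mu_filt, Sigma_filt, mu_pred, Sigma_pred, A):
    """RTS backward pass. Inputs are lists indexed by t = 0..T of the filtered
    moments {mu_{t|t}, Sigma_{t|t}} and of the one-step predictions, where
    mu_pred[t+1], Sigma_pred[t+1] hold {mu_{t+1|t}, Sigma_{t+1|t}}."""
    T = len(mu_filt) - 1
    mu_smooth = [None] * (T + 1)
    Sigma_smooth = [None] * (T + 1)
    mu_smooth[T], Sigma_smooth[T] = mu_filt[T], Sigma_filt[T]
    for t in range(T - 1, -1, -1):
        J = Sigma_filt[t] @ A.T @ np.linalg.inv(Sigma_pred[t + 1])
        mu_smooth[t] = mu_filt[t] + J @ (mu_smooth[t + 1] - mu_pred[t + 1])
        Sigma_smooth[t] = (Sigma_filt[t]
                           + J @ (Sigma_smooth[t + 1] - Sigma_pred[t + 1]) @ J.T)
    return mu_smooth, Sigma_smooth
```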
Figure 12.11: Comparison of the filtered estimates $\mu_{t|t} \pm \sigma_{t|t}$, the smoothed estimates $\mu_{t|T} \pm \sigma_{t|T}$, and the true states $\check{x}_t$.
Example: Smoothing temperature estimates  Figure 12.11 presents the RTS smoothing procedure applied to the temperature estimation example initially presented in figure 12.5. These results show how, between the filtering and smoothing procedures, the mean at $t=2$ went from 8.6 to 8.4, and the variance from 4.2 to 1.8. The uncertainty is reduced with smoothing because the estimate $\mathrm{var}[X_{2|3}]$ is based on one more data point than $\mathrm{var}[X_{2|2}]$. Figure 12.11 presents the entire smoothed estimate, which almost coincides with the true values and has a narrower confidence region than the filtering estimates.
12.1.4 Parameter Estimation
The definition of matrices $\mathbf{A}$, $\mathbf{C}$, $\mathbf{Q}$, and $\mathbf{R}$ involved in the observation and transition models requires estimating several parameters $\boldsymbol{\theta}$ using observations $\mathcal{D} = \{\mathbf{y}_{1:T}\}$. As for other methods presented in previous chapters, the posterior for parameters is

$$f(\boldsymbol{\theta}|\mathcal{D}) = \frac{f(\mathcal{D}|\boldsymbol{\theta})\,f(\boldsymbol{\theta})}{f(\mathcal{D})}.$$
In this Bayesian formulation for estimating the posterior PDF for parameters, the key component is the marginal likelihood, $f(\mathcal{D}|\boldsymbol{\theta}) \equiv f(\mathbf{y}_{1:T}|\boldsymbol{\theta})$. (We refer to $f(\mathbf{y}_{1:T}|\boldsymbol{\theta})$ as the joint marginal likelihood because the hidden variables $\mathbf{x}_{1:T}$ are marginalized; in the remainder of this chapter, for the sake of brevity, we use the term likelihood instead of marginal likelihood.) With the hypothesis that observations are conditionally independent given the states $\mathbf{x}_t$, the joint marginal likelihood (i.e., the joint prior probability density of observations given a set of parameters) is

$$f(\mathbf{y}_{1:T}|\boldsymbol{\theta}) = \prod_{t=1}^{T} f(\mathbf{y}_t|\mathbf{y}_{1:t-1}, \boldsymbol{\theta}),$$
where the marginal prior probability of each observation is obtained using the filtering procedure,

$$f(\mathbf{y}_t|\mathbf{y}_{1:t-1}, \boldsymbol{\theta}) = \mathcal{N}(\mathbf{y}_t;\ \mathbf{C}\boldsymbol{\mu}_{t|t-1},\ \mathbf{C}\boldsymbol{\Sigma}_{t|t-1}\mathbf{C}^{\top} + \mathbf{R}).$$
The posterior $f(\boldsymbol{\theta}|\mathcal{D})$ can be estimated using the sampling techniques⁵ presented in chapter 7.

⁵ Nguyen, L. H., I. Gaudot, and J.-A. Goulet (2019). Uncertainty quantification for model parameters and hidden state variables in Bayesian dynamic linear models. Structural Control and Health Monitoring 26 (3), e2309.

However, because evaluating $f(\mathbf{y}_{1:T}|\boldsymbol{\theta})$ can be computationally demanding, most practical applications found in the literature only employ the maximum likelihood estimate (MLE) rather than estimating the full posterior with a Bayesian approach. For an MLE, the optimal set of parameters is then defined as

$$\boldsymbol{\theta}^* = \arg\max_{\boldsymbol{\theta}}\ \overbrace{\ln f(\mathbf{y}_{1:T}|\boldsymbol{\theta})}^{\text{log-likelihood}},$$
where, because often $f(\mathbf{y}_t|\mathbf{y}_{1:t-1}, \boldsymbol{\theta})$ is much smaller than 1, the product of the marginals $\prod_{t=1}^{T} f(\mathbf{y}_t|\mathbf{y}_{1:t-1}, \boldsymbol{\theta}) \approx 0$ is sensitive to zero-underflow issues. When the contrary happens, so that $f(\mathbf{y}_t|\mathbf{y}_{1:t-1}, \boldsymbol{\theta})$ is much greater than 1, the problem of numerical overflow arises. In order to mitigate these issues, the log-likelihood is employed instead of the likelihood,

$$\ln f(\mathbf{y}_{1:T}|\boldsymbol{\theta}) = \sum_{t=1}^{T} \ln f(\mathbf{y}_t|\mathbf{y}_{1:t-1}, \boldsymbol{\theta}).$$
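In practice, the log-likelihood is accumulated during the Kalman forward pass, one predictive density per time step. The following is a minimal sketch (Python with NumPy/SciPy); the function signature and variable layout are assumptions for illustration, not the book's code.

```python
import numpy as np
from scipy.stats import multivariate_normal

def log_likelihood(y_seq, mu0, Sigma0, A, C, Q, R):
    """Accumulate ln f(y_1:T | theta) with the Kalman filter recursion.
    Each term is the one-step-ahead predictive density of y_t."""
    mu, Sigma = mu0, Sigma0
    loglik = 0.0
    for y in y_seq:
        # Prediction step
        mu = A @ mu
        Sigma = A @ Sigma @ A.T + Q
        # One-step-ahead predictive density of the observation
        y_hat = C @ mu
        G = C @ Sigma @ C.T + R
        loglik += multivariate_normal.logpdf(y, mean=y_hat, cov=G)
        # Update step
        K = Sigma @ C.T @ np.linalg.inv(G)
        mu = mu + K @ (y - y_hat)
        Sigma = (np.eye(len(mu)) - K @ C) @ Sigma
    return loglik
```

A gradient-based optimizer can then maximize this quantity with respect to the parameters entering $\mathbf{A}$, $\mathbf{C}$, $\mathbf{Q}$, and $\mathbf{R}$.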
Figure 12.12: The MLE optimal parame-
ters correspond to the location where the
derivative of the log-likelihood equals zero.
As illustrated in figure 12.12, the optimal set of parameters $\boldsymbol{\theta}^*$ corresponds to the location where the derivative of the log-likelihood equals zero. The values $\boldsymbol{\theta}^*$ can be found using a gradient ascent method such as the Newton-Raphson algorithm presented in §5.2. Note that because Newton-Raphson is a convex optimization method, and because SSMs typically involve a non-convex (see chapter 5) and non-identifiable (see §6.3.4) likelihood function, the optimization is particularly sensitive to initial values $\boldsymbol{\theta}_0$. It is thus essential to initialize parameters using engineering knowledge and to try several initial starting locations.

Note that a closed-form expectation maximization method exists for estimating the parameters of SSMs.⁶

⁶ Ghahramani, Z. and G. E. Hinton (1996). Parameter estimation for linear dynamical systems. Technical Report CRG-TR-96-2, University of Toronto, Dept. of Computer Science.

However, even if it is mathematically elegant, in the context of civil engineering applications such as those presented in the following sections, it is still sensitive to the selection of the initial values and is thus easily trapped in local maxima.
12.1.5 Limitations and Practical Considerations
There are some limitations to the Kalman filter formulation. First, it is sensitive to numerical errors when the covariance is rapidly reduced in the measurement step, for example, when accurate measurements are used or after a period with missing data, or when there are orders of magnitude separating the eigenvalues of the covariance matrix. Two alternative formulations that are numerically more robust are the UD filter and the square-root filter. The reader interested in these methods should consult specialized textbooks such as the one by Simon⁷ or Gibbs.⁸

⁷ Simon, D. (2006). Optimal state estimation: Kalman, H infinity, and nonlinear approaches. Wiley.
⁸ Gibbs, B. P. (2011). Advanced Kalman filtering, least-squares and modeling: A practical handbook. Wiley.
A second key limiting aspect is that the formulations of the Kalman filter and smoother are only applicable for linear observation and transition models. When problems at hand require nonlinear models, there are two main types of alternatives. First, there are the unscented Kalman filter⁹ and extended Kalman filter,¹⁰ which allow for an approximation of the Kalman algorithm while still describing the hidden states using multivariate Normal random variables. The second option is to employ sampling methods such as the particle filter,¹¹ where the hidden-state variables are described by particles (i.e., samples) that are propagated through the transition model. The weight of each particle is updated using the likelihood computed from the observation model. Particle filtering is suited to estimate PDFs that are not well represented by the multivariate Normal. The reader interested in these nonlinear-compatible formulations can consult Murphy's¹² textbook for further details.

⁹ Wan, E. A. and R. van der Merwe (2000). The unscented Kalman filter for nonlinear estimation. In Adaptive Systems for Signal Processing, Communications, and Control Symposium, 153–158. IEEE.
¹⁰ Ljung, L. (1979). Asymptotic behavior of the extended Kalman filter as a parameter estimator for linear systems. IEEE Transactions on Automatic Control 24 (1), 36–50.
¹¹ Del Moral, P. (1997). Non-linear filtering: interacting particle resolution. Comptes Rendus de l'Académie des Sciences—Series I—Mathematics 325 (6), 653–658.
¹² Murphy, K. P. (2012). Machine learning: A probabilistic perspective. MIT Press.
12.2 State-Space Models with Regime Switching
The state-space model theory we have covered so far is applicable for a set of system responses following a same regime. Figures 12.13a–b are both examples of behaviors where SSMs are applicable. Figure 12.13c presents an example where a standard SSM is not applicable because there is a switch from a zero-speed regime to a constant-speed regime.

Figure 12.13: Comparison between single- and multiple-regime models: (a) local level, (b) local trend, (c) regime switching.
Estimating the probability of hidden states along with the probability of each regime involves computational challenges. For example, if we consider a case with $S = 2$ possible regimes, we then have 2 possible transitions along with 2 models to describe each regime. We start from a time step $t$, where the system can be in regime 1 with probability $\pi_1^1$ and in regime 2 with probability $\pi_1^2 = 1 - \pi_1^1$, as illustrated in figure 12.14a. There are 2 regimes × 2 possible transition paths, leading to 4 possible combinations to describe the joint probability at times $t$ and $t+1$. Going from $t+1$ to $t+2$, each of the 4 paths at $t+1$ branches into 2 new ones at $t+2$, leading to a total of now 8 paths. We see through this example that the number of paths to consider increases exponentially ($S^T$) with the number of time steps $T$. Even for short time series, this approach becomes computationally prohibitive. In this section, we present the generalized pseudo Bayesian algorithm of order two,¹³ which is behind the switching Kalman filter.¹⁴ The key aspect of this method is to collapse the possible transition paths at each time step, as illustrated in figure 12.14b.

¹³ Blom, H. A. P. and Y. Bar-Shalom (1988). The interacting multiple model algorithm for systems with Markovian switching coefficients. IEEE Transactions on Automatic Control 33 (8), 780–783.
¹⁴ Murphy, K. P. (1998). Switching Kalman filters. Citeseer; and Kim, C.-J. and C. R. Nelson (1999). State-space models with regime switching: Classical and Gibbs-sampling approaches with applications. MIT Press.
12.2.1 Switching Kalman Filter
We first explore the switching Kalman filter (SKF) while considering only $S = 2$ possible regimes. The goal is to estimate at each time step $t$ the probability of each regime, along with the estimates for the hidden states corresponding to each of them. The notation that will be employed in the SKF is $\{\boldsymbol{\mu}_{t|t}^{i}, \boldsymbol{\Sigma}_{t|t}^{i}, \pi_{t|t}^{i}\}$ for model $i$, where the superscript on the mean vector and covariance matrix refers to the regime number, which is described by the variable $s \in \{1, 2, \cdots, S\}$. $\pi_{t|t}^{i}$ is the shorthand notation for the probability of the regime $i$ at time $t$, given all the available observations up to time $t$, $\Pr(S_t = i|\mathbf{y}_{1:t})$.
Figure 12.14: Comparison of regime switching (a) without collapse and (b) with merge of the states at each time step (SKF). (Adapted from Nguyen and Goulet (2018).)
Figure 12.15: Switching Kalman filter: how the initial states of model $i$ at $t-1$ transition through two distinct filtering models.
Filtering step  As presented in figure 12.15, starting from regime 1 at $t-1$, there are two possible paths, each of them corresponding to one of the regime switches. The two possible paths $j \in \{1, 2\}$ are explored by estimating the posterior state estimates $\{\boldsymbol{\mu}_{t|t}^{1,j}, \boldsymbol{\Sigma}_{t|t}^{1,j}\}$, along with their likelihood $L_{t|t}^{1,j}$, using the filtering procedure presented in §12.1.2,

$$(\boldsymbol{\mu}_{t|t}^{i,j}, \boldsymbol{\Sigma}_{t|t}^{i,j}, L_{t|t}^{i,j}) = \mathrm{Filter}(\boldsymbol{\mu}_{t-1|t-1}^{i}, \boldsymbol{\Sigma}_{t-1|t-1}^{i}, \mathbf{y}_t;\ \mathbf{A}^j, \mathbf{C}^j, \mathbf{Q}^j, \mathbf{R}^j). \qquad (12.7)$$

In equation 12.7, the initial states are obtained from the regime $i$, and the matrices $\{\mathbf{A}^j, \mathbf{C}^j, \mathbf{Q}^j, \mathbf{R}^j\}$ are those from the model $j$.
Figure 12.16: The switching Kalman filter merge step: at time $t$, the estimates obtained through model 1 from both origin regimes are merged.
Once the procedure presented in figure 12.15 is repeated for each model, the next step consists in merging the state estimates that landed in a same regime. Figure 12.16 presents an example where the state estimates from the model starting from regime 1 at $t-1$ and transiting through regime 1 are merged with the state estimates from the model starting from regime 2 at $t-1$ and transiting through regime 1. This merge step is performed by considering that the final state estimates $\{\boldsymbol{\mu}_{t|t}^{i}, \boldsymbol{\Sigma}_{t|t}^{i}\}$, along with their probability $\pi_{t|t}^{i}$, describe a multivariate Normal, itself obtained by mixing the state estimates originating from all paths that transited from $t-1 \to t$ through model $j$.
Figure 12.17: Example of a Gaussian mixture PDF, $f(x_{12})$, combining $\mathcal{N}(x_1; 1, 1^2)$ with weight $\pi_1 = 0.4$ and $\mathcal{N}(x_2; 4, 1^2)$ with weight $\pi_2 = 0.6$.
Gaussian mixture  We now take a quick detour from the switching Kalman filter in order to review the mathematical details related to Gaussian mixtures (see §10.1.1) that are behind the merge step. For example, we have two states $X_1$ and $X_2$ described by Gaussian random variables so that

$$X_1 \sim \mathcal{N}(x_1; \mu_1, \sigma_1^2), \qquad X_2 \sim \mathcal{N}(x_2; \mu_2, \sigma_2^2).$$
The outcome of the random process, where $\pi_1$ and $\pi_2 = 1 - \pi_1$ correspond to the probability of each state, is described by a Gaussian mixture,

$$X_{12} \sim f(x_{12}) = \pi_1\,\mathcal{N}(x_{12}; \mu_1, \sigma_1^2) + \pi_2\,\mathcal{N}(x_{12}; \mu_2, \sigma_2^2).$$
Figure 12.17 presents an example of two Gaussian PDFs combining in a Gaussian mixture. We can approximate a Gaussian mixture by a single Gaussian¹⁵ $f(x_{12}) \approx \mathcal{N}(x_{12}; \mu_{12}, \sigma_{12}^2)$ having a mean and variance given by

$$\begin{aligned}
\mu_{12} &= \pi_1\mu_1 + \pi_2\mu_2\\
\sigma_{12}^2 &= \pi_1\sigma_1^2 + \pi_2\sigma_2^2 + \pi_1\pi_2(\mu_1 - \mu_2)^2.
\end{aligned}$$

¹⁵ Runnalls, A. R. (2007). Kullback-Leibler approach to Gaussian mixture reduction. IEEE Transactions on Aerospace and Electronic Systems 43 (3), 989–999.
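This moment-matching reduction is a one-line computation. Below is a minimal Python/NumPy sketch; the helper name and its generalization to more than two components are assumptions for illustration.

```python
import numpy as np

def merge_gaussians(pis, mus, variances):
    """Moment-matching of a univariate Gaussian mixture by a single Gaussian.
    For two components this reproduces the mu_12 / sigma_12^2 formulas above."""
    pis = np.asarray(pis, dtype=float)
    mus = np.asarray(mus, dtype=float)
    variances = np.asarray(variances, dtype=float)
    mu = np.sum(pis * mus)
    var = np.sum(pis * (variances + (mus - mu) ** 2))
    return mu, var

# Components of figure 12.17: pi = [0.4, 0.6], N(1, 1^2) and N(4, 1^2)
print(merge_gaussians([0.4, 0.6], [1.0, 4.0], [1.0, 1.0]))
```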
Figure 12.18 presents four examples of Gaussian mixtures approximated by a single Gaussian PDF. Notice how in (b), when the PDFs of $X_1$ and $X_2$ have the same parameters, the result coincides with the two distributions. Also, by comparing figures (c) and (d), we see that the greater the distance $|\mu_1 - \mu_2|$ is, the greater $\sigma_{12}$ is.
Figure 12.18: Examples of Gaussian mixtures approximated by a single Gaussian: (a) $\pi_1=0.9$, $\pi_2=0.1$, $\mathcal{N}_1(1, 1^2)$ and $\mathcal{N}_2(3, 1^2)$ give $\mathcal{N}_{12}(1.2, 1.17^2)$; (b) $\pi_1=0.3$, $\pi_2=0.7$, $\mathcal{N}_1(2, 1^2)$ and $\mathcal{N}_2(2, 1^2)$ give $\mathcal{N}_{12}(2, 1^2)$; (c) $\pi_1=0.3$, $\pi_2=0.7$, $\mathcal{N}_1(2, 1^2)$ and $\mathcal{N}_2(5, 1^2)$ give $\mathcal{N}_{12}(4.1, 1.7^2)$; (d) $\pi_1=0.9$, $\pi_2=0.1$, $\mathcal{N}_1(2, 0.5^2)$ and $\mathcal{N}_2(8, 0.5^2)$ give $\mathcal{N}_{12}(2.6, 1.87^2)$.
Merge step  Going back to the switching Kalman filter, the merge step is based on the approximation of a Gaussian mixture by a single Gaussian PDF. The joint posterior probability of the regimes $S_{t-1} = i$ and $S_t = j$ is proportional to the product of the likelihood $L_{t|t}^{i,j}$, the prior probability $\pi_{t-1|t-1}^{i}$, and the probability $z^{i,j}$ of transiting from $s_{t-1} = i$ to $s_t = j$,

$$M_{t-1,t|t}^{i,j} = \Pr(S_{t-1}=i, S_t=j|\mathbf{y}_{1:t}) = \frac{L_{t|t}^{i,j}\cdot z^{i,j}\cdot \pi_{t-1|t-1}^{i}}{\sum_i\sum_j L_{t|t}^{i,j}\cdot z^{i,j}\cdot \pi_{t-1|t-1}^{i}}. \qquad (12.8)$$
The posterior probability of the regime $j$ at time $t$ is obtained by marginalizing the joint posterior in equation 12.8. For a regime $j$, this operation corresponds to summing the probability of all paths transiting through model $j$, irrespective of their origin,

$$\pi_{t|t}^{j} = \sum_{i=1}^{S} M_{t-1,t|t}^{i,j}.$$
The posterior states for each regime $j$, $\{\boldsymbol{\mu}_{t|t}^{j}, \boldsymbol{\Sigma}_{t|t}^{j}\}$, are estimated using

$$\begin{aligned}
W_{t-1|t}^{i,j} &= \Pr(S_{t-1}=i|s_t=j, \mathbf{y}_{1:t}) = \frac{M_{t-1,t|t}^{i,j}}{\pi_{t|t}^{j}}\\
\boldsymbol{\mu}_{t|t}^{j} &= \sum_{i=1}^{S}\boldsymbol{\mu}_{t|t}^{i,j}\,W_{t-1|t}^{i,j}\\
\Delta\boldsymbol{\mu} &= \boldsymbol{\mu}_{t|t}^{i,j} - \boldsymbol{\mu}_{t|t}^{j}\\
\boldsymbol{\Sigma}_{t|t}^{j} &= \sum_{i=1}^{S}\left[W_{t-1|t}^{i,j}\left(\boldsymbol{\Sigma}_{t|t}^{i,j} + \Delta\boldsymbol{\mu}\,\Delta\boldsymbol{\mu}^{\top}\right)\right].
\end{aligned}$$
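The collapse for one target regime can be sketched compactly. The following Python/NumPy function is a minimal illustration of the merge equations above; the argument layout and the numerical values at the end are assumptions, not taken from the book's example.

```python
import numpy as np

def skf_merge(M, mu, Sigma):
    """Collapse step of the switching Kalman filter for one target regime j.
    M[i] = Pr(S_{t-1}=i, S_t=j | y_1:t); mu[i], Sigma[i] are the state
    estimates obtained by starting from regime i and filtering with model j."""
    pi_j = np.sum(M)                      # Pr(S_t = j | y_1:t)
    W = M / pi_j                          # Pr(S_{t-1} = i | S_t = j, y_1:t)
    mu_j = sum(W[i] * mu[i] for i in range(len(M)))
    Sigma_j = sum(
        W[i] * (Sigma[i] + np.outer(mu[i] - mu_j, mu[i] - mu_j))
        for i in range(len(M))
    )
    return pi_j, mu_j, Sigma_j

# Illustrative merge of two 2-state estimates (arbitrary values)
M = np.array([0.12, 0.48])
mu = [np.array([10.2, 0.0]), np.array([10.6, 0.02])]
Sigma = [np.eye(2) * 0.1, np.eye(2) * 0.2]
print(skf_merge(M, mu, Sigma))
```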
The general overview of the switching Kalman filter is presented in figure 12.19. Note that in addition to the parameters defining each filtering model, the SKF requires defining the prior probability of each regime $\pi_0^j$ along with the $S \times S$ transition matrix $\mathbf{Z}$ defining the probability of transiting from a state at time $t-1$ to another at time $t$.
Figure 12.19: General overview of the switching Kalman filter procedure: the states of models 1 and 2 at $t-1$ are each filtered with model 1 and model 2, and the resulting estimates are merged per regime at time $t$.
12.2.2 Example: Temperature Data with Regime Switch
Figure 12.20: Example of data set ($y_t \pm \sigma_V$) where there is a switch between a zero-speed regime and a constant-speed regime.
This example explores an application of the switching Kalman filter to temperature data subject to a regime switch. Figure 12.20 presents the data set, where there is a switch between a zero-speed regime and a constant-speed regime. Note that with the SKF, the filtering model associated with each regime must have the same hidden states. In this example, the two states $\mathbf{x}_t = [x_t\ \dot{x}_t]^{\top}$ are the temperature and its rate of change, with initial states for both models initialized as

$$\boldsymbol{\mu}_0 = \begin{bmatrix}10\\ 0\end{bmatrix}, \qquad \boldsymbol{\Sigma}_0 = \begin{bmatrix}0.01^2 & 0\\ 0 & 0.001^2\end{bmatrix}, \qquad \boldsymbol{\pi}_0 = \begin{bmatrix}0.95\\ 0.05\end{bmatrix}.$$
Note that the initial states $\{\boldsymbol{\mu}_0, \boldsymbol{\Sigma}_0, \boldsymbol{\pi}_0\}$ can be estimated using the smoothing algorithm. The first regime is described by a local level where the rate of change is forced to be equal to 0 in the transition matrix, $[\mathbf{A}]_{:,2} = \mathbf{0}$,

$$\overbrace{\mathbf{x}_t}^{\left[\begin{smallmatrix}x_t\\ \dot{x}_t\end{smallmatrix}\right]} = \overbrace{\mathbf{A}\mathbf{x}_{t-1}}^{\left[\begin{smallmatrix}1 & 0\\ 0 & 0\end{smallmatrix}\right]} + \overbrace{\mathbf{w}_t}^{\left[\begin{smallmatrix}w_t\\ \dot{w}_t\end{smallmatrix}\right]},\quad \mathbf{W} \sim \mathcal{N}(\mathbf{0}, \mathbf{Q}^{11/12}) \qquad \text{(Regime 1)}.$$
The second regime is described by a local trend model where

$$\overbrace{\mathbf{x}_t}^{\left[\begin{smallmatrix}x_t\\ \dot{x}_t\end{smallmatrix}\right]} = \overbrace{\mathbf{A}\mathbf{x}_{t-1}}^{\left[\begin{smallmatrix}1 & \Delta t\\ 0 & 1\end{smallmatrix}\right]} + \overbrace{\mathbf{w}_t}^{\left[\begin{smallmatrix}w_t\\ \dot{w}_t\end{smallmatrix}\right]},\quad \mathbf{W} \sim \mathcal{N}(\mathbf{0}, \mathbf{Q}^{22/21}) \qquad \text{(Regime 2)}.$$
Describing the four possible paths, that is, $\{1\to 1,\ 1\to 2,\ 2\to 1,\ 2\to 2\}$, requires the definition of four covariance matrices for the transition-model errors,

$$\mathbf{Q}^{11} = \mathbf{Q}^{21} = \mathbf{Q}^{22} = \mathbf{0}, \qquad \mathbf{Q}^{12} = \begin{bmatrix}0 & 0\\ 0 & \sigma_{12}^2\end{bmatrix}.$$
In $[\mathbf{Q}^{12}]_{2,2}$, the MLE optimal parameter estimated using empirical data is $\sigma_{12} = 2.3\times 10^{-4}$. There is a single observation model for both regimes, defined following

$$y_t = \overbrace{\mathbf{C}\mathbf{x}_t}^{[1,\ 0]} + \overbrace{v_t}^{V \sim \mathcal{N}(0,\ 0.5^2)}.$$
The MLE for the terms on the main diagonal is $z^{ii} = 0.997$, so that the complete transition matrix is defined as

$$\mathbf{Z} = \begin{bmatrix}0.997 & 0.003\\ 0.003 & 0.997\end{bmatrix}.$$
Figure 12.21 presents the outcome from the switching Kalman filter: (a) presents the probability of the local trend regime, which marks the transition between the zero-speed and constant-speed regimes; (b) presents the level describing the filtered system responses where the observation errors were removed; and (c) presents the rate of change of the system responses, that is, the trend.
Figure 12.21: Example of hidden-state estimation using the switching Kalman filter: (a) probability of the local trend regime, $\Pr(S = M_2)$; (b) level $x^{\mathrm{L}}$ (temperature), $\mu_{t|t} \pm \sigma_{t|t}$; (c) trend $x^{\mathrm{LT}}$ (temperature).
12.3 Linear Model Structures
This section presents how to build linear models from generic components. This approach is referred to as Bayesian dynamic linear models by Goulet and Nguyen,¹⁶ and more commonly as Bayesian forecasting or dynamic linear models in the literature.¹⁷ Note that the term linear does not mean that these models can only be employed to describe linear relationships with respect to time; it refers to the linearity of the observation and transition equations.

¹⁶ Nguyen, L. H. and J.-A. Goulet (2018). Anomaly detection with the switching Kalman filter for structural health monitoring. Structural Control and Health Monitoring 25 (4), e2136; and Goulet, J.-A. (2017). Bayesian dynamic linear models for structural health monitoring. Structural Control and Health Monitoring 24 (12), 1545–2263.
¹⁷ Harrison, P. J. and C. F. Stevens (1976). Bayesian forecasting. Journal of the Royal Statistical Society: Series B (Methodological) 38 (3), 205–228; West, M. and J. Harrison (1999). Bayesian forecasting and dynamic models. Springer; and West, M. (2013). Bayesian dynamic modelling. In Bayesian theory and applications, 145–166. Oxford University Press.

Five main components are reviewed here: local level, local trend, local acceleration, periodic, and autoregressive. As illustrated in figure 12.22, these components can be seen as the building blocks that, once assembled, allow modeling complex time series. Each component has its own specific mathematical formulation and is intended to model a specific behavior. We will first see the mathematical formulation associated with each component; then we present how to assemble components to form the $\{\mathbf{A}, \mathbf{C}, \mathbf{Q}, \mathbf{R}\}$ matrices.

Figure 12.22: Generic components that can be seen as the building blocks, which, once assembled, allow modeling complex empirical responses. From left to right: local level, local trend, local acceleration, periodic, and autoregressive.
12.3.1 Generic Components
Local level ($x^{\mathrm{LL}}$)  A local level describes a quantity that is locally constant over a time step. It is described by a single hidden variable $x^{\mathrm{LL}}$. Because of its locally constant nature, its transition model predicts that the hidden state at $t$ is equal to the one at $t-1$, so the transition coefficient is 1. The standard deviation describing the prediction error for the locally constant model is $\sigma_W^{\mathrm{LL}}$. Local levels are part of the hidden variables that contribute to observations, so the coefficient in the observation matrix equals 1. The hidden-state variable as well as the formulation for the transition and observation model matrices are

$$\mathbf{x}^{\mathrm{LL}} = x^{\mathrm{LL}}, \quad \mathbf{A}^{\mathrm{LL}} = 1, \quad \mathbf{C}^{\mathrm{LL}} = 1, \quad \mathbf{Q}^{\mathrm{LL}} = (\sigma^{\mathrm{LL}})^2.$$

Figure 12.23 presents realizations of a stochastic process defined by the transition model of a local level. We can see that the process variability increases as the standard deviation $\sigma^{\mathrm{LL}}$ increases.
Figure 12.23: Examples of local level components for (a) $\sigma^{\mathrm{LL}} = 0.01$, (b) $\sigma^{\mathrm{LL}} = 0.1$, and (c) $\sigma^{\mathrm{LL}} = 1$. The initial state $x_0^{\mathrm{LL}} = 0$ is the same for all realizations.
Local trend ($x^{\mathrm{L}}, x^{\mathrm{LT}}$)  A local trend describes a quantity that has a locally constant rate of change over a time step. It acts jointly with a hidden variable describing the level. It is thus defined by two hidden variables $[x^{\mathrm{L}}\ x^{\mathrm{LT}}]^{\top}$. Because of its locally constant nature, its transition model predicts that the hidden state at $t$ is equal to the current one, $x_t^{\mathrm{LT}} = x_{t-1}^{\mathrm{LT}}$, so the transition coefficient is 1. The hidden state $x_t^{\mathrm{L}} = x_{t-1}^{\mathrm{L}} + x_{t-1}^{\mathrm{LT}}\Delta t$ is described by the current level value plus the change brought by the local trend, multiplied by the time step length $\Delta t$. As we have seen in §12.1.2, the definition of the process-noise covariance matrix $\mathbf{Q}^{\mathrm{LT}}$ requires considering the uncertainty that is caused by a constant acceleration with a random magnitude that can occur over the span of a time step. Analytically, it corresponds to

$$\mathbf{Q}^{\mathrm{LT}} = (\sigma^{\mathrm{LT}})^2\begin{bmatrix}\frac{\Delta t^4}{4} & \frac{\Delta t^3}{2}\\ \frac{\Delta t^3}{2} & \Delta t^2\end{bmatrix},$$

where $\sigma^{\mathrm{LT}}$ is the process noise parameter. In most cases involving a local trend, only the hidden variable representing the level contributes to observations, so the coefficient in the observation matrix for $x^{\mathrm{LT}}$ equals 0. The hidden-state variables as well as the formulation for the transition and observation model matrices are

$$\mathbf{x}^{\mathrm{LT}} = \begin{bmatrix}x^{\mathrm{L}}\\ x^{\mathrm{LT}}\end{bmatrix}, \quad \mathbf{A}^{\mathrm{LT}} = \begin{bmatrix}1 & \Delta t\\ 0 & 1\end{bmatrix}, \quad \mathbf{C}^{\mathrm{LT}} = \begin{bmatrix}1\\ 0\end{bmatrix}^{\top}.$$
Figure 12.24 presents realizations of a stochastic process defined by the transition model of a local trend. We can see that the hidden-state variable $x^{\mathrm{LT}}$ is locally constant and that $x^{\mathrm{L}}$ has a locally constant speed. A local trend is employed to model a system response that changes over time with a locally constant speed.
Figure 12.24: Examples of local trend components for (a) $\sigma^{\mathrm{LT}} = 0.001$, (b) $\sigma^{\mathrm{LT}} = 0.01$, and (c) $\sigma^{\mathrm{LT}} = 0.1$, showing both $x^{\mathrm{L}}_t$ and $x^{\mathrm{LT}}_t$. The initial state $[x_0^{\mathrm{L}}\ x_0^{\mathrm{LT}}] = [0\ {-1}]^{\top}$ is the same for all realizations.
Local acceleration ($x^{\mathrm{L}}, x^{\mathrm{T}}, x^{\mathrm{LA}}$)  A local acceleration describes a quantity that has a locally constant acceleration over a time step. It acts jointly with a level hidden variable and a trend hidden variable. It is thus defined by three hidden-state variables $[x^{\mathrm{L}}\ x^{\mathrm{T}}\ x^{\mathrm{LA}}]^{\top}$. Because of its locally constant nature, its transition model predicts that the hidden state at $t$ is equal to the current one, $x_t^{\mathrm{LA}} = x_{t-1}^{\mathrm{LA}}$, so the transition coefficient is 1. The hidden state $x_t^{\mathrm{T}} = x_{t-1}^{\mathrm{T}} + x_{t-1}^{\mathrm{LA}}\Delta t$ is described by the current value plus the change brought by the local acceleration multiplied by the time step length $\Delta t$. The hidden state $x_t^{\mathrm{L}} = x_{t-1}^{\mathrm{L}} + x_{t-1}^{\mathrm{T}}\Delta t + x_{t-1}^{\mathrm{LA}}\frac{\Delta t^2}{2}$. The hidden-state variables as well as the formulation for the transition and observation model matrices are

$$\mathbf{x}^{\mathrm{LA}} = \begin{bmatrix}x^{\mathrm{L}}\\ x^{\mathrm{T}}\\ x^{\mathrm{LA}}\end{bmatrix}, \quad \mathbf{A}^{\mathrm{LA}} = \begin{bmatrix}1 & \Delta t & \frac{\Delta t^2}{2}\\ 0 & 1 & \Delta t\\ 0 & 0 & 1\end{bmatrix}, \quad \mathbf{C}^{\mathrm{LA}} = \begin{bmatrix}1\\ 0\\ 0\end{bmatrix}^{\top},$$

$$\mathbf{Q}^{\mathrm{LA}} = (\sigma^{\mathrm{LA}})^2\begin{bmatrix}\frac{\Delta t^4}{4} & \frac{\Delta t^3}{2} & \frac{\Delta t^2}{2}\\ \frac{\Delta t^3}{2} & \Delta t^2 & \Delta t\\ \frac{\Delta t^2}{2} & \Delta t & 1\end{bmatrix}.$$
Figure 12.25 presents realizations of a stochastic process defined by the transition model of a local acceleration. We can see that the $x^{\mathrm{LA}}$ hidden variable is locally constant, $x^{\mathrm{T}}$ has a locally constant speed, and $x^{\mathrm{L}}$ has a locally constant acceleration. A local acceleration is employed to model a system response that changes over time with a locally constant acceleration.
Figure 12.25: Examples of local acceleration components for (a) $\sigma^{\mathrm{LA}} = 0.0001$, (b) $\sigma^{\mathrm{LA}} = 0.001$, and (c) $\sigma^{\mathrm{LA}} = 0.01$, showing $x^{\mathrm{L}}_t$, $x^{\mathrm{T}}_t$, and $x^{\mathrm{LA}}_t$. The initial state $[x_0^{\mathrm{L}}\ x_0^{\mathrm{T}}\ x_0^{\mathrm{LA}}] = [0\ {-0.5}\ {-0.05}]^{\top}$ is the same for all realizations.
Periodic ($x^{\mathrm{S1}}, x^{\mathrm{S2}}$)  Periodic components are expressed in their Fourier form, where they can be described by two hidden variables $\mathbf{x}^{\mathrm{S}} = [x^{\mathrm{S1}}\ x^{\mathrm{S2}}]^{\top}$. Here, only the first hidden variable contributes to observations, so the coefficients in the observation matrix are 1 for $x^{\mathrm{S1}}$ and 0 for $x^{\mathrm{S2}}$. The hidden-state variables as well as the formulation for the transition and observation model matrices are

$$\mathbf{x}^{\mathrm{S}} = \begin{bmatrix}x^{\mathrm{S1}}\\ x^{\mathrm{S2}}\end{bmatrix}, \quad \mathbf{A}^{\mathrm{S}} = \begin{bmatrix}\cos\omega & \sin\omega\\ -\sin\omega & \cos\omega\end{bmatrix}, \quad \mathbf{C}^{\mathrm{S}} = \begin{bmatrix}1\\ 0\end{bmatrix}^{\top}, \quad \mathbf{Q}^{\mathrm{S}} = (\sigma^{\mathrm{S}})^2\begin{bmatrix}1 & 0\\ 0 & 1\end{bmatrix},$$

where the frequency $\omega = \frac{2\pi\cdot\Delta t}{p}$ is defined as a function of the period $p$ and the time step length. Figure 12.26 presents realizations of the hidden variables describing the periodic process. A periodic component is employed to model sine-like periodic phenomena with known period $p$. For example, in the context of civil engineering, it can be employed to model the effect of daily and seasonal temperature changes.
Figure 12.26: Examples of periodic components for (a) $\sigma^{\mathrm{S}} = 0.01$, (b) $\sigma^{\mathrm{S}} = 0.001$, and (c) $\sigma^{\mathrm{S}} = 0.0001$, showing $x^{\mathrm{S1}}_t$ and $x^{\mathrm{S2}}_t$. The initial state $[x_0^{\mathrm{S1}}\ x_0^{\mathrm{S2}}] = [0\ 1]^{\top}$ and the period $p = 10$ are the same for all realizations.
Autoregressive ($x^{\mathrm{AR}}$)  The autoregressive (AR) component is described by a single hidden variable $x^{\mathrm{AR}}$ that presents a dependence on itself over consecutive time steps. Here, we only present the autoregressive process of order 1, where the current state only depends on the previous one,

$$x_t^{\mathrm{AR}} = \phi^{\mathrm{AR}}x_{t-1}^{\mathrm{AR}} + w_t^{\mathrm{AR}},\qquad w_t^{\mathrm{AR}} : W^{\mathrm{AR}} \sim \mathcal{N}(w^{\mathrm{AR}}; 0, (\sigma^{\mathrm{AR}})^2).$$

$\phi^{\mathrm{AR}}$ is the autoregression coefficient defining the transition matrix. The autoregressive hidden-state variable contributes to the observation, so the coefficient in the observation matrix equals 1. The state variable as well as the formulation for the transition and observation model matrices are

$$\mathbf{x}^{\mathrm{AR}} = x^{\mathrm{AR}}, \quad \mathbf{A}^{\mathrm{AR}} = \phi^{\mathrm{AR}}, \quad \mathbf{C}^{\mathrm{AR}} = 1, \quad \mathbf{Q}^{\mathrm{AR}} = (\sigma^{\mathrm{AR}})^2.$$
Figure 12.27 presents examples of realizations of an autoregressive process for different parameters. Like for other components, the variability increases with the process-error standard deviation $\sigma^{\mathrm{AR}}$. A key property of the autoregressive process is that for $0 < \phi^{\mathrm{AR}} < 1$, it leads to a stationary process for which the stationary variance is

$$(\sigma^{\mathrm{AR},0})^2 = \frac{(\sigma^{\mathrm{AR}})^2}{1 - (\phi^{\mathrm{AR}})^2}.$$
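This stationary-variance formula is easy to verify empirically by simulating the AR(1) recursion. The Python/NumPy sketch below uses arbitrary values of $\phi^{\mathrm{AR}}$ and $\sigma^{\mathrm{AR}}$ for illustration.

```python
import numpy as np

# Empirical check of the AR(1) stationary variance (phi and sigma are arbitrary)
rng = np.random.default_rng(0)
phi, sigma_AR = 0.9, 0.25

x = 0.0
samples = []
for t in range(100_000):
    x = phi * x + rng.normal(0.0, sigma_AR)
    samples.append(x)

empirical_var = np.var(samples[1000:])          # discard the burn-in
stationary_var = sigma_AR**2 / (1.0 - phi**2)   # (sigma^{AR,0})^2
print(empirical_var, stationary_var)            # both approximately 0.33
```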
Figure 12.27: Examples of autoregressive components for (a) $\phi^{\mathrm{AR}} = 0.9$, $\sigma^{\mathrm{AR}} = 0.25$; (b) $\phi^{\mathrm{AR}} = 0.99$, $\sigma^{\mathrm{AR}} = 0.25$; (c) $\phi^{\mathrm{AR}} = 1.1$, $\sigma^{\mathrm{AR}} = 0.25$; (d) $\phi^{\mathrm{AR}} = 0.9$, $\sigma^{\mathrm{AR}} = 0.5$; (e) $\phi^{\mathrm{AR}} = 0.99$, $\sigma^{\mathrm{AR}} = 0.5$; and (f) $\phi^{\mathrm{AR}} = 1.1$, $\sigma^{\mathrm{AR}} = 0.5$. The shaded region corresponds to the $\pm\sigma^{\mathrm{AR},0}$ confidence region for the process's stationary parameters. Note that cases (c) and (f) are nonstationary because $\phi^{\mathrm{AR}} > 1$. The initial state $x_0^{\mathrm{AR}} = 0$ is the same for all realizations.
The confidence region corresponding to $\pm\sigma^{\mathrm{AR},0}$ is represented by the shaded area. Notice that for $\phi^{\mathrm{AR}} \geq 1$ the process is nonstationary, which explains why examples (c) and (f) do not display the $\pm\sigma^{\mathrm{AR},0}$ region. The special case where $\phi^{\mathrm{AR}} = 1$ corresponds to a random walk, which is equivalent to the local level component because $x_t = x_{t-1} + w_t$.
The autoregressive (AR) component plays a key role in the model. It is there to describe the model errors that have inherent dependencies over time steps; the definition of a transition model involves some approximation errors, and because the transition model is employed to make predictions over successive time steps, the prediction errors are correlated across time. Moreover, note that this correlation tends to one as the time-step length tends to zero. If we do not employ an AR process to describe these time-dependent model errors, our ability to correctly estimate other hidden-state variables can be diminished.
12.3.2 Component Assembly
A time series of observations $\mathbf{y}_{1:T}$ is modeled by selecting the relevant components and then assembling them to form the global matrices $\{\mathbf{A}, \mathbf{C}, \mathbf{Q}, \mathbf{R}\}$. The global hidden-state variable vector $\mathbf{x}$ is obtained by concatenating in a column vector the state variables defining each selected component. The observation matrix $\mathbf{C}$ is obtained by concatenating in a row vector the specific $\mathbf{C}$ vectors defining each selected component. The transition matrix $\mathbf{A}$ and the process noise covariance $\mathbf{Q}$ are obtained by forming block diagonal matrices ($\mathrm{blkdiag}(\cdot, \cdot)$). $\mathbf{A}$ and $\mathbf{Q}$ each employ the specific terms defining the selected components. For a single time series, the observation error covariance matrix $\mathbf{R}$ contains only one term, $\sigma_V^2$.
Example: Climate change  This example studies daily average temperature data recorded at Montreal's YUL airport for the period 1953–2013, as presented in figure 12.28. The goal is to quantify the current yearly rate of change in the average temperature.

Figure 12.28: Montreal YUL airport daily average temperature observations $y_t$ [°C] for the period 1953–2013.

Because data is recorded daily, the time-step length is $\Delta t = 1$ day. Figure 12.28 shows a yearly seasonal pattern with frequency $\omega = \frac{2\pi}{365.2422}$. (Note: The length of a solar year is 365.2422 days.) In addition to the periodic component, this model requires a local trend to quantify the rate of change and an autoregressive one to capture the time-correlated model prediction errors. For these components, the hidden-state variables are

$$\mathbf{x}_t = [x^{\mathrm{L}}\ x^{\mathrm{LT}}\ x^{\mathrm{S1}}\ x^{\mathrm{S2}}\ x^{\mathrm{AR}}]^{\top}.$$
The specific formulation for each component is described below:

Local trend component - LT
$$\mathbf{A}^{\mathtt{LT}}=\begin{bmatrix}1&1\\0&1\end{bmatrix},\quad
\mathbf{Q}^{\mathtt{LT}}=(\sigma_{\mathtt{LT}})^2\cdot\begin{bmatrix}1/4&1/2\\1/2&1\end{bmatrix},\quad
\mathbf{C}^{\mathtt{LT}}=\begin{bmatrix}1&0\end{bmatrix}$$

Periodic component - S
$$\mathbf{A}^{\mathtt{S}}=\begin{bmatrix}\cos\omega&\sin\omega\\-\sin\omega&\cos\omega\end{bmatrix},\;
\omega=\frac{2\pi}{365.2422},\quad
\mathbf{Q}^{\mathtt{S}}=\mathbf{0}\;(2\times2\text{ matrix}),\quad
\mathbf{C}^{\mathtt{S}}=\begin{bmatrix}1&0\end{bmatrix}$$

Autoregressive component - AR
$$\mathbf{A}^{\mathtt{AR}}=[\phi_{\mathtt{AR}}],\quad
\mathbf{Q}^{\mathtt{AR}}=[(\sigma_{\mathtt{AR}})^2],\quad
\mathbf{C}^{\mathtt{AR}}=[1]$$
The global matrices describing the model are obtained by assembling
the components so that

A = blkdiag(A^LT, A^S, A^AR)
Q = blkdiag(Q^LT, Q^S, Q^AR)
C = [C^LT  C^S  C^AR]
R = σ²_V.
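Under the same conventions, a sketch that assembles the global matrices of this example could read as follows (the numerical parameter values are the MLE estimates reported below; the code itself is only an illustration):

```python
# Minimal sketch: global matrices for the climate-change example
# (local trend LT + periodic S + autoregressive AR), with dt = 1 day.
import numpy as np
from scipy.linalg import block_diag

omega = 2 * np.pi / 365.2422                    # yearly frequency
phi_ar, sigma_lt, sigma_ar, sigma_v = 0.69, 2.2e-6, 3.4, 2.1e-4

A_lt = np.array([[1.0, 1.0], [0.0, 1.0]])
Q_lt = sigma_lt**2 * np.array([[1/4, 1/2], [1/2, 1.0]])
C_lt = np.array([[1.0, 0.0]])

A_s = np.array([[np.cos(omega), np.sin(omega)],
                [-np.sin(omega), np.cos(omega)]])
Q_s = np.zeros((2, 2))
C_s = np.array([[1.0, 0.0]])

A_ar = np.array([[phi_ar]])
Q_ar = np.array([[sigma_ar**2]])
C_ar = np.array([[1.0]])

A = block_diag(A_lt, A_s, A_ar)                 # 5 x 5 transition matrix
Q = block_diag(Q_lt, Q_s, Q_ar)                 # 5 x 5 process-noise covariance
C = np.hstack([C_lt, C_s, C_ar])                # 1 x 5 observation matrix
R = np.array([[sigma_v**2]])                    # observation-error variance
```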
The prior knowledge defining the initial states is defined as

E[X_0] = [5  0  0  0  0]^⊤
cov[X_0] = diag([100  10⁻⁶  100  100  100]).
The unknown parameters and their MLE optimal values learned
from data using the Newton-Raphson algorithm are

θ = [φ_AR  σ_LT  σ_AR  σ_V]^⊤ = [0.69  2.2×10⁻⁶  3.4  2.1×10⁻⁴]^⊤.
[Figure 12.29: Hidden-state estimates (smoothed mean µ_{t|T} and ±σ_{t|T} interval) for the temperature time series: (a) level component x^L [°C]; (b) local trend component x^LT [°C/day]; (c) periodic component (amplitude) x^S1 [°C]; (d) autoregressive component x^AR [°C].]
Figure 12.29 presents the smoothed estimates for the hidden-state
variables x_t. Notice in figure 12.29a how the long-term average
temperature drops from the 1950s to the 1970s and then increases
steadily. Figure 12.29b presents the rate of change in °C/day. When
reported on an annual basis (i.e., multiplying the daily rate by
365.2422 days), the annual rate of change in 2013 is estimated to be
µ_{T|T} = 0.06 °C/year, with a standard deviation σ_{T|T} = 0.05 °C/year.
12.3.3 Modeling Dependencies Between Observations
State-space models can be employed to model the dependencies
between several observations. Take, for example, the seasonal
variations in the air temperature affecting the displacement of a
structure. If we observe both quantities, that is, the temperature
and the displacement, we can perform their joint estimation.
In the general case where there are D observed time series, we
have to select relevant components for each time series and then
assemble them together. The global state vector x_t for all time
series and all components is obtained by concatenating in a column
vector the state vectors x^j_t, ∀j ∈ {1, 2, ..., D}, from each time series.
The global observation (C), transition (A), process-noise covariance
(Q), and observation covariance (R) matrices are all built in the
same way, by assembling in block-diagonal form the matrices from
each time series,

C = blkdiag(C¹, C², ..., C^D)
A = blkdiag(A¹, A², ..., A^D)
Q = blkdiag(Q¹, Q², ..., Q^D)
R = blkdiag(R¹, R², ..., R^D).
The global matrix C requires special attention because it contains
the information regarding the time-series dependencies. In the C
matrix, each row corresponds to a time series. The component-wise
representation of the observation matrix is
$$\mathbf{C}=\begin{bmatrix}
\mathbf{C}^{1}&\mathbf{C}^{c}_{1,2}&\cdots&\mathbf{C}^{c}_{1,j}&\cdots&\mathbf{C}^{c}_{1,D}\\
\mathbf{C}^{c}_{2,1}&\mathbf{C}^{2}&\cdots&\mathbf{C}^{c}_{2,j}&\cdots&\mathbf{C}^{c}_{2,D}\\
\vdots&\vdots&\ddots&\vdots&&\vdots\\
\mathbf{C}^{c}_{i,1}&\mathbf{C}^{c}_{i,2}&\cdots&\mathbf{C}^{c}_{i,j}&\cdots&\mathbf{C}^{c}_{i,D}\\
\vdots&\vdots&&\vdots&\ddots&\vdots\\
\mathbf{C}^{c}_{D,1}&\mathbf{C}^{c}_{D,2}&\cdots&\mathbf{C}^{c}_{D,j}&\cdots&\mathbf{C}^{D}
\end{bmatrix},$$
where the off-diagonal terms C^c_{i,j} are row vectors with the same
number of elements as the matrix C^j. When there is no dependence
between observations, the off-diagonal matrices C^c_{i,j} contain only
zeros. When dependence exists between observations, it can be
encoded in a dependence matrix such as
$$\mathbf{D}=\begin{bmatrix}
1&d_{1,2}&\cdots&d_{1,j}&\cdots&d_{1,D}\\
d_{2,1}&1&\cdots&d_{2,j}&\cdots&d_{2,D}\\
\vdots&\vdots&\ddots&\vdots&&\vdots\\
d_{i,1}&d_{i,2}&\cdots&1&\cdots&d_{i,D}\\
\vdots&\vdots&&\vdots&\ddots&\vdots\\
d_{D,1}&d_{D,2}&\cdots&d_{D,j}&\cdots&1
\end{bmatrix},$$
where d_{i,j} ∈ {0, 1} indicates whether dependence exists (1) or
does not (0). When d_{i,j} = 0, C^c_{i,j} = [0]; when dependence exists
and d_{i,j} = 1, the off-diagonal matrix
$$\mathbf{C}^{c}_{i,j}=\begin{bmatrix}\phi^{i|j}_{1}&\phi^{i|j}_{2}&\cdots&\phi^{i|j}_{k_j}\end{bmatrix}\in\mathbb{R}^{k_j}$$
contains k_j regression coefficients φ^{i|j}_k describing the linear
dependence between the k-th component from the j-th time series
and the i-th time series. All regression coefficients φ^{i|j} are treated
as unknown parameters to be learned from data.
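A possible implementation sketch of this encoding (the helper and its argument names are hypothetical) builds the global observation matrix row by row from the dependence matrix and the regression coefficients:

```python
# Minimal sketch: assemble the global C for D time series, where the
# dependence matrix d (entries in {0, 1}) says whether series i depends on
# the components of series j through regression coefficients phi[(i, j)].
import numpy as np

def assemble_global_C(C_list, d, phi):
    """C_list[j]: (1, k_j) observation row of series j.
    d[i][j]: 1 if series i depends on series j, 0 otherwise (diagonal ignored).
    phi[(i, j)]: (1, k_j) row of regression coefficients when d[i][j] == 1."""
    D = len(C_list)
    rows = []
    for i in range(D):
        blocks = []
        for j in range(D):
            if i == j:
                blocks.append(C_list[i])                 # the series' own C
            elif d[i][j] == 1:
                blocks.append(phi[(i, j)])               # dependence block C^c_{i,j}
            else:
                blocks.append(np.zeros_like(C_list[j]))  # no dependence: zeros
        rows.append(np.hstack(blocks))
    return np.vstack(rows)
```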
[Figure 12.30: Observed time series for (a) the displacement y^D_t [mm] and (b) the temperature y^T_t [°C] of a bridge.]
Example: Bridge displacement versus temperature  We consider the
joint estimation of the hidden-state variables for two time series:
the displacement and temperature measured on a bridge from 2013
until 2015. Both time series are presented in figure 12.30, where we
can see a negative correlation between the observed displacement
and temperature seasonal cycles.
The temperature is modeled using a local level, two periodic
components with respective periods of 365.2422 and 1 day, and an
autoregressive component,

x^T_t = [x^{T,LL}_t  x^{T1,S1}_t  x^{T1,S2}_t  x^{T2,S1}_t  x^{T2,S2}_t  x^{T,AR}_t]^⊤,

where the superscripts T1 and T2 identify the periodic components with periods p = 365.24 days and p = 1 day, respectively.
All matrices defining the transition and the observation models
are obtained by assembling the specific mathematical formulation
associated with each component,
$$\mathbf{A}^{\mathtt{T}}=\mathrm{blkdiag}\!\left(1,\;
\begin{bmatrix}\cos\omega_{\mathtt{T1}}&\sin\omega_{\mathtt{T1}}\\-\sin\omega_{\mathtt{T1}}&\cos\omega_{\mathtt{T1}}\end{bmatrix},\;
\begin{bmatrix}\cos\omega_{\mathtt{T2}}&\sin\omega_{\mathtt{T2}}\\-\sin\omega_{\mathtt{T2}}&\cos\omega_{\mathtt{T2}}\end{bmatrix},\;
\phi_{\mathtt{T,AR}}\right)$$
$$\mathbf{Q}^{\mathtt{T}}=\mathrm{blkdiag}\!\left(0,\,0,\,0,\,0,\,0,\,(\sigma_{\mathtt{T,AR}})^2\right)$$
$$\mathbf{C}^{\mathtt{T}}=[\,1\;\;1\;\;0\;\;1\;\;0\;\;1\,]$$
$$\mathbf{R}^{\mathtt{T}}=(\sigma^{\mathtt{T}}_{V})^2,$$
where ω_T1 = 2π/365.2422 and ω_T2 = 2π/1.
The displacement is modeled using a local level and an autore-
gressive component,

x^D_t = [x^{D,LL}_t  x^{D,AR}_t]^⊤.
Matrices defining the transition and the observation models are
again obtained by assembling the specific mathematical formulation
associated with each component,

A^D = blkdiag(1, φ_{D,AR})
Q^D = blkdiag(0, (σ_{D,AR})²)
C^D = [1  1]
R^D = (σ^D_V)².
The global model matrices for the joint data set y_{1:T} = [y^D_{1:T}; y^T_{1:T}]
are assembled following

x = [x^D; x^T]
A = blkdiag(A^D, A^T)
C = blkdiag(C^D, C^T)
R = blkdiag(R^D, R^T)
Q = blkdiag(Q^D, Q^T).
The dependence matrix is defined as
$$\mathbf{D}=\begin{bmatrix}1&1\\0&1\end{bmatrix},$$
which indicates that the displacement observations depend on the
hidden states defined for the temperature. The assembled global
observation matrix is thus
$$\mathbf{C}=\begin{bmatrix}\mathbf{C}^{\mathtt{D}}&\mathbf{C}^{c}_{\mathtt{D|T}}\\\mathbf{0}&\mathbf{C}^{\mathtt{T}}\end{bmatrix}
=\begin{bmatrix}1&1&0&\phi^{\mathtt{D|T}}_{1}&0&\phi^{\mathtt{D|T}}_{2}&0&\phi^{\mathtt{D|T}}_{2}\\
0&0&1&1&0&1&0&1\end{bmatrix},$$
where the first row corresponds to the displacement and the second
to the temperature. Note that there are two regression coefficients
describing the dependence between the displacement observations
and the temperature hidden-state variables: φ^{D|T}_1, for the long-term
seasonal effects described by the first periodic component, and φ^{D|T}_2,
for the short-term ones described by the second periodic component
and the autoregressive term. MLE optimal model parameter values
estimated from data are

θ = [φ^{D|T}_1  φ^{D|T}_2  φ_{D,AR}  φ_{T,AR}  σ_{D,AR}  σ_{T,AR}  σ^D_V  σ^T_V]^⊤
  = [0.017  0.01  0.982  0.997  0.024  0.28  10⁻⁵  10⁻⁴]^⊤.
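As a concrete illustration, the following sketch (with the regression coefficients set to the MLE values reported above; the variable names are hypothetical) assembles the 2 × 8 global observation matrix of this example, with the displacement row depending on the temperature hidden states:

```python
# Minimal sketch: the global observation matrix C for the bridge example.
import numpy as np

phi_1, phi_2 = 0.017, 0.01                             # phi^{D|T}_1, phi^{D|T}_2
C_D = np.array([[1.0, 1.0]])                           # displacement: LL + AR
C_T = np.array([[1.0, 1.0, 0.0, 1.0, 0.0, 1.0]])       # temperature: LL + 2 periodic + AR
C_c_DT = np.array([[0.0, phi_1, 0.0, phi_2, 0.0, phi_2]])  # dependence block C^c_{D|T}

C = np.vstack([
    np.hstack([C_D, C_c_DT]),                # first row: displacement
    np.hstack([np.zeros_like(C_D), C_T]),    # second row: temperature
])
print(C.shape)  # (2, 8)
```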
The resulting smoothed hidden-state estimates are presented in
figure 12.31. The periodic behavior displayed in figure 12.30a is
not present in any of the displacement components presented in
figures 12.31a–b. This is because the periodic behavior displayed
in the displacement is modeled through the dependence on the
temperature's hidden states.
[Figure 12.31: Hidden-state estimates (µ_{t|T} ± σ_{t|T}) for the displacements and temperatures: (a) level, displacement [mm]; (b) autoregressive, displacement [mm]; (c) level, temperature [°C]; (d) periodic (p = 365.24 days), temperature [°C]; (e) periodic (p = 1 day), temperature [°C]; (f) autoregressive, temperature [°C].]
12.4 Anomaly Detection
It is possible to detect anomalies in time series using the regime-
switching method presented in §12.2. The idea is to create two
models representing, respectively, a normal and an abnormal regime.
The key strength of this approach is to consider not only the like-
lihood of each model at each time step but also the prior probability
of each regime as well as the probability of switching from one
regime to another. In the context of anomaly detection, it allows
reducing the number of false alarms while increasing the detection
capacity. In a general context, the number of regimes is not limited,
so several models can be tested against each other.
Example: Anomaly detection  This example illustrates the concept
of anomaly detection for observations made on a dam displacement.
Figure 12.32 presents observations showing a downward trend as
well as a yearly periodic pattern. Because no temperature observa-
tions are available for a joint estimation, this case is treated as a
single time series. The components employed are a local accelera-
tion, two periodic components with respective periods p = 365.2422
days and p = 168.62 days, and one autoregressive component. In
this case, we want to employ two different models, where one is
constrained to have a constant speed and the other a constant ac-
celeration. The vector of hidden-state variables for both regimes is

x_t = [x^L_t  x^T_t  x^LA_t  x^{T1,S1}_t  x^{T1,S2}_t  x^{T2,S1}_t  x^{T2,S2}_t  x^{AR}_t]^⊤,

where the periodic components T1 and T2 have periods p = 365.24 and p = 168.62 days, respectively.
[Figure 12.32: Example of a dam-displacement time series y_t [mm] that contains an anomaly that requires using the switching Kalman filter.]
Global model matrices A^(j), C^(j), Q^(j), R^(j) for regimes j ∈ {1, 2}
are assembled following the procedure described in §12.3.2. The
only things differentiating the two models are the transition ma-
trices and the process-noise covariance for the local acceleration
component,
$$\mathbf{A}^{(1)}=\begin{bmatrix}1&\Delta t&0\\0&1&0\\0&0&0\end{bmatrix},\quad
\mathbf{A}^{(2)}=\begin{bmatrix}1&\Delta t&\frac{\Delta t^2}{2}\\0&1&\Delta t\\0&0&1\end{bmatrix}.$$
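The following sketch (assuming a generic time-step length Δt; the value below is only a placeholder) makes explicit that the constant-speed regime is obtained by zeroing the acceleration terms of the constant-acceleration block:

```python
# Minimal sketch: local-acceleration transition blocks for the two regimes.
import numpy as np

def acceleration_blocks(dt):
    A2 = np.array([[1.0, dt, dt**2 / 2],   # regime 2: constant acceleration
                   [0.0, 1.0, dt],
                   [0.0, 0.0, 1.0]])
    A1 = A2.copy()                         # regime 1: constant speed
    A1[:, 2] = 0.0                         # acceleration terms set to zero
    return A1, A2

A1, A2 = acceleration_blocks(dt=1.0)       # dt = 1.0 is a placeholder value
```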
The model associated with the first regime has the particularity
that the terms in the transition matrix that correspond to the
acceleration are set equal to zero. The covariance matrices for the
transition errors from regimes 1 → 2 and 2 → 1 are respectively
$$\mathbf{Q}_{12}=\begin{bmatrix}(\sigma_{\mathtt{LA}})^2\cdot\frac{\Delta t^2}{2}&0&0\\
0&(\sigma_{\mathtt{LTT}})^2\cdot\frac{\Delta t^3}{3}&0\\
0&0&(\sigma_{\mathtt{LA}})^2\cdot\Delta t\end{bmatrix}$$
$$\mathbf{Q}_{21}=\begin{bmatrix}(\sigma_{\mathtt{LT}})^2\cdot\frac{\Delta t^3}{3}&0&0\\
0&(\sigma_{\mathtt{LTT}})^2\cdot\Delta t&0\\
0&0&0\end{bmatrix}.$$

[Figure 12.33: Probability of regimes 1 (normal) and 2 (abnormal) obtained using the SKF algorithm.]
The regime transition probability matrix is parameterized by z_11
and z_22 as defined in
$$\mathbf{Z}=\begin{bmatrix}z_{11}&1-z_{11}\\1-z_{22}&z_{22}\end{bmatrix}.$$
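To make the role of Z concrete, here is a small sketch (with assumed values for z_11, z_22, and the regime probabilities; entry (i, j) is read as the probability of switching from regime i to regime j) of how the regime probabilities are propagated one step ahead before being combined with each model's likelihood:

```python
# Minimal sketch: one-step propagation of the regime probabilities with Z.
import numpy as np

z11, z22 = 0.999, 0.95            # assumed probabilities of staying in each regime
Z = np.array([[z11, 1 - z11],
              [1 - z22, z22]])

pi_t = np.array([0.99, 0.01])     # assumed Pr(regime) at time t
pi_pred = Z.T @ pi_t              # predicted Pr(regime) at time t+1
print(pi_pred)
```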
Figure 12.33 presents the probability of each regime as a function
of time, and figure 12.34 presents the hidden-state estimates for
the level, trend, and acceleration. This example displays a switch
between the normal and the abnormal regimes during a short time
span. These results illustrate how anomalies in time series can be
detected using regime switching. They also illustrate how the base-
line response of a system can be isolated from the raw observations
contaminated by observation errors and external effects caused by
environmental factors. The model allows estimating that before the
regime switch, the rate of change was -2 mm/year, and that it
increased in magnitude to -2.9 mm/year after the switch.
[Figure 12.34: Expected values µ_{t|t} and uncertainty bound µ_{t|t} ± σ_{t|t} for hidden states resulting from a combination of regimes 1 and 2 using the SKF algorithm: (a) level x^L_t [mm]; (b) trend x^T_t [mm/12-hr], with rates of -2.0 mm/year before and -2.9 mm/year after the switch; (c) local acceleration x^LA_t [mm/144-hr²], with values of -0.0054 mm/year² before and -0.012 mm/year² after the switch.]
All the details related to this example can be found in the study
performed by Nguyen and Goulet.^18 The package OpenBDLM^19 allows
using the state-space model theory presented in this chapter and
includes the formulation for the various components defined for
Bayesian dynamic linear models.

^18 Nguyen, L. H. and J.-A. Goulet (2018). Anomaly detection with the switching Kalman filter for structural health monitoring. Structural Control and Health Monitoring 25(4), e2136.

^19 Gaudot, I., L. H. Nguyen, S. Khazaeli, and J.-A. Goulet (2019, May). OpenBDLM, an open-source software for structural health monitoring using Bayesian dynamic linear models. In Proceedings of the 13th International Conference on Applications of Statistics and Probability in Civil Engineering (ICASP).