13 Model Calibration
Figure 13.1: Example of a hard-coded finite-element model for which we want to infer the parameter values $\boldsymbol{\theta} = [\theta_1\ \theta_2]^\intercal$ from sets of observations $(\mathbf{x}_i, y_i)$.
In civil engineering, model calibration is employed for the task of estimating the parameters of hard-coded physics-based models using empirical observations $\mathcal{D} = \{(\mathbf{x}_i, y_i), \forall i \in \{1:\mathtt{D}\}\} = \{\mathcal{D}_x, \mathcal{D}_y\}$.
The term hard-coded physics-based refers to mathematical models made of constitutive laws and rules that are themselves based on physics or domain-specific knowledge. In contrast, models presented in chapters 8, 9, and 12 are referred to as empirical models because they are built on empirical data with little or no hard-coded rules associated with the underlying physics behind the phenomena studied. The hard-coded model of a system is described by a function $g(\boldsymbol{\theta}, \mathbf{x})$, which depends on the covariates $\mathbf{x} = [x_1\ x_2\ \cdots\ x_{\mathtt{X}}]^\intercal \in \mathbb{R}^{\mathtt{X}}$ as well as its model parameters $\boldsymbol{\theta} = [\theta_1\ \theta_2\ \cdots\ \theta_{\mathtt{P}}]^\intercal$. The observed system responses $\mathcal{D}_y = \{y_1, y_2, \cdots, y_{\mathtt{D}}\}$ depend on covariates $\mathcal{D}_x = \{\mathbf{x}_1, \mathbf{x}_2, \cdots, \mathbf{x}_{\mathtt{D}}\}$. Figure 13.1 presents an example of a model where the parameters $\boldsymbol{\theta}$ describe the boundary conditions and the initial tension in cables. For this example, the covariates $\mathbf{x}$ and $\mathbf{x}^*$ describe the locations where predictions are made, as well as the type of predictions, that is, displacement, rotation, strain, and so on. The vector $\mathbf{x}$ describes covariates for observed locations, and $\mathbf{x}^*$ those for the unobserved locations.
Nomenclature
Observed system responses: $\mathcal{D}_y = \{y_1, y_2, \cdots, y_{\mathtt{D}}\}$
Covariates: $\mathcal{D}_x = \{\mathbf{x}_1, \mathbf{x}_2, \cdots, \mathbf{x}_{\mathtt{D}}\}$
Parameter values: $\boldsymbol{\theta} = [\theta_1\ \theta_2\ \cdots\ \theta_{\mathtt{P}}]^\intercal$
Model predictions: $\mathbf{g}(\boldsymbol{\theta}, \mathbf{x}) = [g(\boldsymbol{\theta}, \mathbf{x}_1)\ \cdots\ g(\boldsymbol{\theta}, \mathbf{x}_{\mathtt{D}})]^\intercal$
Model calibration is not in itself a subfield of machine learning. Nevertheless, in this chapter, we explore how the machine learning methods presented in previous chapters can be employed to address the challenges associated with the probabilistic inference of model parameters for hard-coded models. This application is classified under the umbrella of unsupervised learning because, as we will see, the main task consists in inferring hidden-state variables and parameters that are themselves not observed.
Problem setups There are several typical setups where we want to employ empirical observations in conjunction with hard-coded
models. For the example presented in figure 13.1, if we have prior knowledge for the joint distribution of model parameters $f(\boldsymbol{\theta})$, we can propagate this prior knowledge through the model in order to quantify the uncertainty associated with predictions. This uncertainty propagation can be, for instance, performed using either Monte Carlo sampling (see §6.5) or first-order linearization (see §3.4.2). When the prior knowledge $f(\boldsymbol{\theta})$ is weakly informative, it may lead to a large variability in model predictions. In that situation, it becomes interesting to employ empirical observations to reduce the uncertainty related to the prior knowledge of model parameters. The key with model calibration is that model parameters are typically not directly observable. For instance, in figure 13.1 it is often not possible to directly measure the boundary-condition properties or to measure the cable internal tension. These properties have to be inferred from the observed structural responses $\mathcal{D}_y$. Another key aspect is that we typically build physics-based models $g(\boldsymbol{\theta}, \mathbf{x})$ because they can predict quantities that cannot be observed. For example, we may want to learn about model parameters $\boldsymbol{\theta}$ using observations of the static response of a structure defined by the covariates $\mathbf{x}$ and then employ the model to predict the unobserved responses $g(\boldsymbol{\theta}, \mathbf{x}^*)$, defined by other covariates $\mathbf{x}^*$. The vector $\mathbf{x}^*$ may describe unobserved locations or quantities such as the dynamic behavior instead of the static one employed for parameter estimation. One last possible problem setup is associated with the selection of model classes for describing a phenomenon.
There are three main challenges associated with model calibration: observation errors, prediction errors, and model complexity. The first challenge arises because the available observations $\mathcal{D}_y$ are in most cases contaminated by observation errors. The second is due to the discrepancy between the predictions of hard-coded physics-based models $g(\boldsymbol{\theta}, \mathbf{x})$ and the real system responses; that is, even when parameter values are known, the model remains an approximation of the reality. The third challenge is related to the difficulties associated with choosing a model structure having the right complexity for the task at hand. In §13.1, we explore the impact of these challenges on the least-squares model calibration approach, which is still extensively used in practice. Then, the subsequent sections explore how to address some of these challenges using a hierarchical Bayesian approach combining concepts presented in chapters 6 and 8.
13.1 Least-Squares Model Calibration
! What we will see in §13.1 is an example of what not to do when calibrating a hard-coded physics-based model, as well as why one should avoid it.
This section presents the pitfalls and limitations of the common least-squares deterministic model calibration. With least-squares model calibration, the goal is to find the model parameters $\boldsymbol{\theta}^*$ that minimize the sum of the squares of the differences between predicted and measured values. The difference between predicted and observed values at one observed location $\mathbf{x}_i$ is defined as the residual
$$r(\boldsymbol{\theta}, (\mathbf{x}_i, y_i)) = y_i - g(\boldsymbol{\theta}, \mathbf{x}_i).$$
For the entire data set containing $\mathtt{D}$ data points, the least-squares loss function is defined as
$$J(\boldsymbol{\theta}, \mathcal{D}) = \sum_{i=1}^{\mathtt{D}} r(\boldsymbol{\theta}, (\mathbf{x}_i, y_i))^2 = \| \mathbf{r}(\boldsymbol{\theta}, \mathcal{D}) \|^2.$$
Because of the square exponent in the loss function, its value is positive, $J(\boldsymbol{\theta}, \mathcal{D}) \geq 0$. The task of identifying the optimal parameters $\boldsymbol{\theta}^*$ is formulated as a minimization problem where the goal is to minimize the sum of the squares of the residuals at each observed location,
$$\boldsymbol{\theta}^* = \underbrace{\operatorname*{arg\,min}_{\boldsymbol{\theta}}\ J(\boldsymbol{\theta}, \mathcal{D}) \equiv \operatorname*{arg\,max}_{\boldsymbol{\theta}}\ {-J(\boldsymbol{\theta}, \mathcal{D})}}_{\text{Optimization problem}}.$$
Gradient-based optimization methods such as those presented in chapter 5 can be employed to identify $\boldsymbol{\theta}^*$.
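As a concrete illustration, the sketch below sets up this minimization in Python using scipy.optimize.least_squares. The one-parameter model g, the closed-form cantilever deflection formula, and all numerical values are illustrative assumptions, not taken from the text.

```python
import numpy as np
from scipy.optimize import least_squares

# Hypothetical hard-coded model: deflection (mm) of a cantilever with a
# tip load, g(theta, x) = -P * x^2 * (3L - x) / (6 * E * I), theta = E.
P, L, I = 5e3, 10e3, 6.66e9            # N, mm, mm^4 (assumed values)

def g(E, x):
    """Model prediction (deflection in mm) at positions x (mm); E in MPa."""
    return -P * x**2 * (3 * L - x) / (6 * E * I)

# Synthetic observations (x_i, y_i); illustrative numbers only
x_obs = np.array([5e3, 10e3])          # mm
y_obs = np.array([-4.4, -11.3])        # mm

def residuals(theta):
    """r(theta, (x_i, y_i)) = y_i - g(theta, x_i)."""
    return y_obs - g(theta[0], x_obs)

# Minimize J(theta, D) = sum of squared residuals, starting from E0 = 20 GPa
result = least_squares(residuals, x0=[20e3])      # E in MPa = N/mm^2
E_star = result.x[0]
print(f"E* = {E_star / 1e3:.1f} GPa, J(E*, D) = {2 * result.cost:.3f}")
```

Here result.cost equals half the sum of squared residuals, hence the factor 2 when reporting $J(\boldsymbol{\theta}^*, \mathcal{D})$.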
Figure 13.2: The reference model $\check{g}(\check{E}, x)$ that is employed to generate simulated observations, with parameter values $K = 1.75 \times 10^{11}$ N·mm/rad, $L = 10$ m, $P = 5$ kN, $I = 6.66 \times 10^{9}$ mm$^4$, and $\check{E} = 35$ GPa.
13.1.1 Illustrative Examples
This section presents three examples based on simulated data using the reference beam model $\check{g}(\check{E}, x)$ presented in figure 13.2 along with the values for its parameters. Observations consist in displacement measurements made at positions $x_1 = L/2$ and $x_2 = L$. The first example corresponds to the idealized context where there are no observation or prediction errors because the model employed to infer parameter values is the same as the reference model employed to generate synthetic data. The second example employs a simplified model to infer parameters. Finally, the third example employs an overcomplex model to infer parameters. Note that the last two examples include observation errors. For all three cases, there is a single unknown parameter: the elastic modulus $E$, which characterizes the flexural stiffness of the beam.
Example #1: Exact model without observation errors The first example illustrates an idealized context where there are no observation or prediction errors because the model employed to infer parameter values $g(E, x)$ is the same as the one employed to generate synthetic data $\check{g}(\check{E}, x)$. Figure 13.3 compares the predicted and true deflection curves along with observed values. Here, because of the absence of model or measurement errors, the optimal parameter also coincides with its true value $\theta^* = E^* = \check{E}$, and the predicted deflection obtained using the optimal parameter value $\theta^*$ coincides with the true deflection. Note that the loss function evaluated for $\theta^*$ equals zero. Even if least-squares model calibration worked perfectly in this example, this good performance cannot be generalized to real case studies because in this oversimplified example, there are no model or measurement errors.
(Margin figure, Example #1: reference model $\check{g}(\check{E}, x)$ versus interpretation model $g(E, x)$.)
Figure 13.3: Least-squares model calibration where the interpretation model is identical to the reference model, $\check{g}(\check{E}, x) = g(\check{E}, x)$, and where there are no measurement errors. The left panel compares the predicted response, true deflection, and observations (deflection [mm] versus position [m]); the right panel shows the loss function $J(\theta, \mathcal{D})$ versus the parameter value [GPa], with its minimum at $E^* = \check{E}$.
(Margin figure, Example #2: reference model $\check{g}(\check{E}, x)$ versus interpretation model $g(E, x)$.)
Example #2: Simplified model with observation errors The second example presents a more realistic case, where the model $g(E, x)$ employed to infer parameters contains a simplification in comparison with the real system described by the reference model $\check{g}(\check{E}, x)$. In this case, the simplification consists in omitting the rotational spring that is responsible for the nonzero initial rotation at the support. In addition, observations are affected by observation errors $v_i : V \sim \mathcal{N}(v; 0, 1)$ mm, where the observation model is defined as
$$y_i = g(\check{E}, x_i) + v_i, \quad V_i \perp\!\!\!\perp V_j,\ \forall i \neq j.$$
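For readers who want to reproduce such synthetic data, a minimal sketch follows. It assumes a closed-form stand-in for the reference model of figure 13.2 (a cantilever with tip load $P$ and a rotational support spring $K$); the deflection formula and the random seed are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed reference model: cantilever with tip load P and rotational spring K
# at the support (parameter values from figure 13.2); deflection in mm.
P, L, I, K, E_true = 5e3, 10e3, 6.66e9, 1.75e11, 35e3  # N, mm, mm^4, N*mm/rad, MPa

def g_ref(x):
    bending = -P * x**2 * (3 * L - x) / (6 * E_true * I)  # beam flexure
    rotation = -(P * L / K) * x                            # support rotation * lever arm
    return bending + rotation

x_obs = np.array([5e3, 10e3])                        # positions L/2 and L (mm)
y_obs = g_ref(x_obs) + rng.normal(0.0, 1.0, size=2)  # add v_i ~ N(0, 1 mm)
```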
Figure 13.4 compares the predicted and true deflection curves along with observed values. We see that because of the combination of model simplification and observation errors, we obtain a higher loss for the correct parameter value $\check{E} = 35$ GPa than for the optimal value $E^* = 22$ GPa. As a result, the model predictions are also biased for any prediction location $x$. This second example illustrates how deterministic least-squares model calibration may identify wrong parameter values as well as lead to inaccurate predictions for measured and unmeasured locations.
Figure 13.4: Least-squares model calibration where the interpretation model is simplified in comparison with the reference model, $\check{g}(\check{E}, x) \neq g(\check{E}, x)$, and where there are measurement errors. The left panel compares the predicted response, true deflection, and observations (deflection [mm] versus position [m]); the right panel shows the loss function $J(\theta, \mathcal{D})$ versus the parameter value [GPa], where the minimum at $E^*$ no longer coincides with $\check{E}$.
Example #3: Overcomplex model with observation errors The third example illustrates the pitfalls of performing model calibration using a simplified yet overcomplex interpretation model, that is, a model that contains simplifications yet also contains too many parameters for the problem at hand. Here, overcomplexity is caused by the use of two parameters, $E_1$ and $E_2$, for describing the elastic modulus. Like in the second example, observations are affected by observation errors $v_i : V \sim \mathcal{N}(v; 0, 1)$ mm.
(Margin figure, Example #3: reference model $\check{g}(\check{E}, x)$ versus interpretation model $g(E_1, E_2, x)$.)
Figure 13.5 compares the predicted and true deflection curves along with observed values. Note how the over-parameterization allowed the model to wrongly match both observations. The parameter values identified are biased in comparison with the true value, and the predictions made by the model are also biased. Because in practice the true deflection and parameter values remain unknown, the biggest danger is overfitting, where we could believe that the model is capable of providing accurate predictions because the calibration wrongly appears to be perfect. In the context of least-squares model calibration, this issue can be mitigated using cross-validation (see §8.1.2), where only subsets of observations are employed to train the model, and then the remaining data is employed to test the model's predictive capacity. This procedure would reveal that the interpretation model is poorly defined. In a more general way, we can also use Bayesian model selection (see §6.8) for quantifying the probability of each model class.
Figure 13.5: Least-squares model calibration where the interpretation model is overcomplex in comparison with the reference model, $\check{g}(\check{E}, x) \neq g(\check{E}_1, \check{E}_2, x)$, and where there are measurement errors. The left panel compares the predicted response, true deflection, and observations (deflection [mm] versus position [m]); the right panel shows the loss function versus the parameter values $E_1$ and $E_2$ [GPa] along with the true value $\check{E}$.
13.1.2 Limitations of Deterministic Model Calibration
Deterministic model calibration using methods such as least-squares residual minimization performs well when model and measurement errors are negligible and when data sets are large; the issue is that in practical engineering applications, the errors associated with measurements and especially hard-coded models are not negligible, and data sets are typically small. As illustrated in figure 13.6, for engineering systems, no matter how much effort is devoted to building a hard-coded physics-based model, it will always remain a simplified version of the reality. Therefore, model simplifications introduce prediction errors that need to be considered explicitly as hidden variables to be inferred in addition to model parameters.
Figure 13.6: In complex engineering contexts, no matter how good a hard-coded physics-based model is, it never perfectly represents the reality. (Photo: Tamar Bridge, adapted from Peter Edwards)
13.2 Hierarchical Bayesian Estimation
This section presents a Bayesian hierarchical method for jointly estimating model parameters as well as prediction errors. The justification for this approach is that no matter how good a hard-coded model is, it remains an approximation of the reality. Therefore, in addition to the model parameters, we need to infer prediction errors. There is typically an issue of non-identifiability (see §6.3.4) between model parameters and prediction errors. As we will see, the problem is made identifiable using prior knowledge. In addition, note that because of observation errors, observations $y_i$ are not exact representations of the reality either, that is,
$$\begin{aligned}
\text{model}(\boldsymbol{\theta}) \;&\neq\; \text{reality}\\
\underbrace{\text{model}(\boldsymbol{\theta})}_{\mathbf{g}(\boldsymbol{\theta})} + \underbrace{\text{error}_w}_{\mathbf{w}} \;&=\; \text{reality} \;\neq\; \text{observation}\\
\underbrace{\text{model}(\boldsymbol{\theta})}_{\mathbf{g}(\boldsymbol{\theta})} + \underbrace{\text{error}_w}_{\mathbf{w}} + \underbrace{\text{error}_v}_{\mathbf{v}} \;&=\; \text{reality} + \underbrace{\text{error}_v}_{\mathbf{v}} \;=\; \underbrace{\text{observation}}_{\mathbf{y}}.
\end{aligned}$$
In the previous equation, observations $\mathbf{y} = [y_1\ y_2\ \cdots\ y_{\mathtt{D}}]^\intercal$ are modeled by the sum of model predictions $\mathbf{g}(\boldsymbol{\theta}, \mathbf{x}) \in \mathbb{R}^{\mathtt{D}}$, which are a function of covariates $\mathbf{x} = [\mathbf{x}_1\ \mathbf{x}_2\ \cdots\ \mathbf{x}_{\mathtt{D}}]^\intercal \in \mathbb{R}^{\mathtt{X} \times \mathtt{D}}$ and model parameters $\boldsymbol{\theta} = [\theta_1\ \theta_2\ \cdots\ \theta_{\mathtt{P}}]^\intercal \in \mathbb{R}^{\mathtt{P}}$; prediction errors $\mathbf{w}(\mathbf{x}) = [w(\mathbf{x}_1)\ w(\mathbf{x}_2)\ \cdots\ w(\mathbf{x}_{\mathtt{D}})]^\intercal \in \mathbb{R}^{\mathtt{D}}$, which are also a function of covariates $\mathbf{x}$; and measurement errors $\mathbf{v} = [v_1\ v_2\ \cdots\ v_{\mathtt{D}}]^\intercal \in \mathbb{R}^{\mathtt{D}}$.
13.2.1 Joint Posterior Formulation
The model predictions $\mathbf{g}(\boldsymbol{\theta}, \mathbf{x})$ depend on the known covariates $\mathbf{x}$ describing the prediction locations and unknown model parameters $\boldsymbol{\theta}$ describing physical properties of the system. The observed structural responses are described by
$$\underbrace{\mathbf{y}}_{\text{observations}} = \underbrace{\mathbf{g}(\boldsymbol{\theta}, \mathbf{x})}_{\substack{\text{deterministic predictions from a hard-coded model,}\\ \text{with model parameters } \boldsymbol{\theta} \text{ and covariates } \mathbf{x}}} + \underbrace{\mathbf{w}(\mathbf{x})}_{\text{prediction errors}} + \underbrace{\mathbf{v}}_{\text{measurement errors}}, \qquad (13.1)$$
where measurement errors are assumed to be independent of the covariates $\mathbf{x}$. All the terms in equation 13.1 are considered as deterministic quantities because, for a given set of observations $\mathcal{D} = \{\mathcal{D}_x, \mathcal{D}_y\} = \{(\mathbf{x}_i, y_i), \forall i \in \{1:\mathtt{D}\}\}$, the quantities $\mathbf{x}$, $\mathbf{y}$, $\boldsymbol{\theta}$, $\mathbf{w}(\mathbf{x})$, and $\mathbf{v}$ are not varying, yet the values for $\boldsymbol{\theta}$, $\mathbf{w}(\mathbf{x})$, and $\mathbf{v}$ remain unknown.
The prior knowledge for model parameters is described by the random variable $\boldsymbol{\Theta} \sim f(\boldsymbol{\theta}; \mathbf{z}_p)$, where $\mathbf{z}_p$ are the hyperparameters, that is, parameters of the prior probability density function (PDF). When hyperparameters $\mathbf{z}_p$ are unknown, the random variable $\mathbf{Z}_p$ is described by a hyper-prior $f(\mathbf{z}_p)$ and a hyper-posterior $f(\mathbf{z}_p|\mathcal{D})$, that is, the prior and posterior PDFs of hyperparameters. Unlike the parameter values $\boldsymbol{\theta}$, for which true values may exist, hyperparameters $\mathbf{z}_p$ are parameters of our prior knowledge, for which, by nature, no true value exists. When hyperparameters $\mathbf{z}_p$ are assumed to be fixed constants that are not learned from data, they can be implicitly included in the prior PDF formulation $f(\boldsymbol{\theta}; \mathbf{z}_p) = f(\boldsymbol{\theta})$.
The concept of prior, posterior, hyper-prior, and hyper-posterior not only applies to unknown parameters $\boldsymbol{\theta}$ but also to prediction errors $\mathbf{w}(\mathbf{x})$ and measurement errors $\mathbf{v}$. Note that, unlike for hyperparameters $\mathbf{z}_p$ and $\mathbf{z}_w$, a true value can exist for $\mathbf{z}_v$. When it exists, the true value for $\mathbf{z}_v$ can typically be inferred from the statistical precision of the measuring instruments employed to obtain $\mathbf{y}$. In this section, in order to simplify the notation and to allow focusing the attention on model parameters $\boldsymbol{\theta}$ and prediction errors $\mathbf{w}(\mathbf{x})$, we assume that $\mathbf{z}_p$ and $\mathbf{z}_v$ are known constants.
The formulation for the joint posterior PDF for unknown model parameters $\boldsymbol{\theta}$, prediction errors $\mathbf{w}(\mathbf{x})$, and the hyperparameters $\mathbf{z}_w$ of the prediction errors' prior PDF follows:
$$\underbrace{f(\boldsymbol{\theta}, \mathbf{w}(\mathcal{D}_x), \mathbf{z}_w \,|\, \mathcal{D})}_{\text{posterior}} = \frac{\overbrace{f(\mathcal{D}_y \,|\, \boldsymbol{\theta}, \mathbf{w}(\mathcal{D}_x), \mathcal{D}_x)}^{\text{likelihood}} \cdot \overbrace{f(\boldsymbol{\theta}) \cdot f(\mathbf{w}(\mathcal{D}_x) \,|\, \mathbf{z}_w)}^{\text{prior}} \cdot \overbrace{f(\mathbf{z}_w)}^{\text{hyper-prior}}}{f(\mathcal{D}_y)}. \qquad (13.2)$$
Here, $\boldsymbol{\theta}$ denotes the model parameters, $\mathbf{w}(\mathcal{D}_x)$ the prediction errors at locations $\mathcal{D}_x$, $\mathbf{z}_w$ the hyperparameters for the prediction errors' prior PDF, and $\mathcal{D}$ the data.
Likelihood The formulation for the joint posterior in equation 13.2 allows the explicit quantification of the dependence between model parameters and prediction errors at observed locations. In the case where several sets of model parameters and prediction errors $\{\boldsymbol{\theta}, \mathbf{w}(\mathcal{D}_x)\}$ can equally explain the observations, then the problem is non-identifiable (see §6.3.4) and several sets of values will end up having an equal posterior probability, given that the prior probability of each set is also equal. Here, the likelihood of data $\mathcal{D}_y$ is formulated considering that $\mathbf{g}(\boldsymbol{\theta}, \mathcal{D}_x)$ and $\mathbf{w}(\mathcal{D}_x)$ are unknown constants, so the only aleatory quantities are the observation errors $\mathbf{V} \sim \mathcal{N}(\mathbf{v}; \mathbf{0}, \boldsymbol{\Sigma}_v)$. The formulation of the likelihood is then
$$\begin{aligned}
f(\mathcal{D}_y \,|\, \boldsymbol{\theta}, \mathbf{w}(\mathcal{D}_x), \mathcal{D}_x) &= f\big(\underbrace{\mathbf{g}(\boldsymbol{\theta}, \mathcal{D}_x) + \mathbf{w}(\mathcal{D}_x)}_{\text{constants}} + \underbrace{\mathbf{V}}_{\mathcal{N}(\mathbf{v};\, \mathbf{0},\, \boldsymbol{\Sigma}_v)} = \mathcal{D}_y\big)\\
&= \mathcal{N}\big(\mathbf{y} = \mathcal{D}_y;\ \underbrace{\mathbf{g}(\boldsymbol{\theta}, \mathcal{D}_x) + \mathbf{w}(\mathcal{D}_x)}_{\boldsymbol{\mu}_y},\ \underbrace{\boldsymbol{\Sigma}_v}_{\boldsymbol{\Sigma}_y}\big)\\
&= \frac{\exp\Big({-\tfrac{1}{2}}\big(\mathcal{D}_y - (\mathbf{g}(\boldsymbol{\theta}, \mathcal{D}_x) + \mathbf{w}(\mathcal{D}_x))\big)^\intercal \boldsymbol{\Sigma}_v^{-1} \big(\mathcal{D}_y - (\mathbf{g}(\boldsymbol{\theta}, \mathcal{D}_x) + \mathbf{w}(\mathcal{D}_x))\big)\Big)}{(2\pi)^{\mathtt{D}/2}\sqrt{\det \boldsymbol{\Sigma}_v}}\\
&\propto \exp\Big({-\tfrac{1}{2}}\big(\mathcal{D}_y - (\mathbf{g}(\boldsymbol{\theta}, \mathcal{D}_x) + \mathbf{w}(\mathcal{D}_x))\big)^\intercal \boldsymbol{\Sigma}_v^{-1} \big(\mathcal{D}_y - (\mathbf{g}(\boldsymbol{\theta}, \mathcal{D}_x) + \mathbf{w}(\mathcal{D}_x))\big)\Big)\\
&= \exp\left({-\frac{1}{2}}\sum_{i=1}^{\mathtt{D}} \frac{\big(y_i - (g(\boldsymbol{\theta}, \mathbf{x}_i) + w(\mathbf{x}_i))\big)^2}{\sigma_v^2}\right), \quad \text{for } V_i \perp\!\!\!\perp V_j,\ \forall i \neq j.
\end{aligned}$$
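In code, this likelihood is typically evaluated in log space for numerical stability. A minimal sketch, assuming a generic model function g and i.i.d. observation errors with standard deviation sigma_v (all names are illustrative):

```python
import numpy as np

def log_likelihood(theta, w, x_obs, y_obs, g, sigma_v):
    """log f(D_y | theta, w(D_x), D_x) for i.i.d. Gaussian observation
    errors V_i ~ N(0, sigma_v^2); g(theta, x_obs) returns predictions."""
    mu_y = g(theta, x_obs) + w          # mean: predictions + prediction errors
    r = y_obs - mu_y                    # residuals relative to the mean
    D = len(y_obs)
    return (-0.5 * np.sum(r**2) / sigma_v**2
            - D * np.log(sigma_v) - 0.5 * D * np.log(2 * np.pi))
```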
Prior It is necessary to define a mathematical formulation for the prior PDF of prediction errors, $\mathbf{W} \sim f(\mathbf{w}|\mathbf{z}_w)$. A convenient choice consists in describing the prior PDF for prediction errors using a Gaussian process analogous to the one described in §8.2. With the assumption that our model is unbiased, that is, the prior expected value for prediction errors is $\mathbf{0}$, this Gaussian process is expressed as $\mathbf{W} \sim \mathcal{N}(\mathbf{w}; \mathbf{0}, \boldsymbol{\Sigma}_w)$, where
$$[\boldsymbol{\Sigma}_w]_{ij} = \sigma_w(\mathbf{x}_i) \cdot \sigma_w(\mathbf{x}_j) \cdot \rho(\mathbf{x}_i, \mathbf{x}_j, \boldsymbol{\ell}), \quad \text{and} \quad \sigma_w(\mathbf{x}) = \mathrm{fct}(\mathbf{x}, \mathbf{z}_s),$$
$$\rho(\mathbf{x}_i, \mathbf{x}_j, \boldsymbol{\ell}) = \exp\Big({-\frac{1}{2}}(\mathbf{x}_i - \mathbf{x}_j)^\intercal \boldsymbol{\Sigma}_\ell^{-1} (\mathbf{x}_i - \mathbf{x}_j)\Big), \quad \mathbf{z}_w = [\mathbf{z}_s\ \boldsymbol{\ell}]^\intercal.$$
The prediction-error standard deviation $\sigma_w(\mathbf{x})$ is typically covariate dependent, so that it needs to be represented by a problem-specific function $\mathrm{fct}(\mathbf{x}, \mathbf{z}_s)$, where $\mathbf{z}_s$ are its parameters. The correlation structure can be represented by a square-exponential covariance function, where $\boldsymbol{\Sigma}_\ell$ is a diagonal matrix parameterized by the length-scale parameters $\boldsymbol{\ell} = [\ell_1\ \ell_2\ \cdots\ \ell_{\mathtt{X}}]^\intercal$. $\ell_k$ is the hyperparameter describing how the correlation decreases as the distance
$(x_{k,i} - x_{k,j})^2$ increases. Note that this choice for the correlation structure is not exclusive and others can be employed, as described in §8.2. The hyperparameters regrouped in $\mathbf{z}_w$ need to be learned from data.
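A compact sketch of how such a prior covariance matrix could be assembled for a scalar covariate is shown below; the standard-deviation function $\sigma_w(x) = a \cdot x$ anticipates equation 13.3, and the numerical values and units are illustrative assumptions left to the modeler.

```python
import numpy as np

def prior_cov(x, a, ell):
    """Assemble [Sigma_w]_ij = sigma_w(x_i) * sigma_w(x_j) * rho(x_i, x_j, ell)
    for a scalar covariate, with the assumed choice sigma_w(x) = a * x and a
    square-exponential correlation function."""
    x = np.asarray(x, dtype=float)
    s = a * x                              # covariate-dependent std. deviation
    dx = x[:, None] - x[None, :]           # pairwise distances
    rho = np.exp(-0.5 * dx**2 / ell**2)    # square-exponential correlation
    return np.outer(s, s) * rho

# Illustrative usage: prior covariance of w at the two observed positions
Sigma_w = prior_cov([5.0, 10.0], a=0.5, ell=5.0)
```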
(Margin figure, Example #4: reference model $\check{g}(\check{E}, x)$ versus interpretation model $g(E, x)$, with data $\mathcal{D}_x = \{5, 10\}$ m and $\mathcal{D}_y = \{-4.44, -11.28\}$ mm.)
Figure 13.7: Graphical representation of the prior knowledge for prediction errors arising from the use of a simplified interpretation model (deflection [mm] versus position [m]): (a) comparison of the real and predicted deflection obtained using the true parameter value $\check{E}$; (b) the prior knowledge for prediction errors is described using a Gaussian process.
Example #4: Joint posterior estimation We are revisiting example #2, where the model employed to infer parameters $g(E, x)$ contains simplifications in comparison with the reference model $\check{g}(\check{E}, x)$ describing the real system studied. In this case, the simplification consists in omitting the rotational spring allowing for a nonzero initial rotation at the support. In addition, observations are affected by independent and identically distributed observation errors $v_i : V \sim \mathcal{N}(v; 0, 1)$ mm.
Figure 13.7a presents the true deflection obtained from the reference model as well as from the interpretation model using the true parameter value $\check{E} = 35$ GPa. We can see that for any location other than $x = 0$, there is a systematic prediction error. The prior knowledge for these theoretically unknown prediction errors is modeled using a zero-mean Gaussian process, $f(\mathbf{w}) = \mathcal{N}(\mathbf{w}; \mathbf{0}, \boldsymbol{\Sigma}_w)$, where each term of the covariance matrix is defined as
$$[\boldsymbol{\Sigma}_w]_{ij} = \underbrace{(a \cdot x_i)}_{\sigma_i} \cdot \underbrace{(a \cdot x_j)}_{\sigma_j} \cdot \underbrace{\exp\Big({-\frac{1}{2}}\frac{(x_i - x_j)^2}{\ell^2}\Big)}_{\rho_{ij}}. \qquad (13.3)$$
The prior knowledge for prediction errors is parameterized by $\mathbf{z}_w = [a\ \ell]^\intercal$. Figure 13.7b presents the marginal $\pm\sigma_w$ confidence interval around the predicted deflection. The prior knowledge for each parameter and hyperparameter is, for the elastic modulus, $\mathcal{N}(E; 20, 20^2)$ GPa; for the model-error prior scale factor, $\mathcal{N}(a; 10^{-4}, (5 \times 10^{-4})^2)$; and for the length-scale, $\mathcal{N}(\ell; 5, 50^2)$ m. Note that all of those priors are assumed to be independent from each other, so the joint prior can be described by the product of the marginals. The analytic formulation of the posterior PDF for the unknown model parameter $\theta = E$, the prediction errors $w(5), w(10)$, and the hyperparameters $a, \ell$ is known up to a constant, so that
$$\begin{aligned}
f(E, w(5), w(10), a, \ell \,|\, \mathcal{D}_y) \propto\ & \mathcal{N}(y_1; g(E, 5) + w(5), 1^2)\\
& \cdot\, \mathcal{N}(y_2; g(E, 10) + w(10), 1^2)\\
& \cdot\, \mathcal{N}([w(5)\ w(10)]^\intercal; \mathbf{0}, \boldsymbol{\Sigma}_w)\\
& \cdot\, \mathcal{N}(E; 20, 20^2)\\
& \cdot\, \mathcal{N}(a; 10^{-4}, (5 \times 10^{-4})^2)\\
& \cdot\, \mathcal{N}(\ell; 5, 50^2).
\end{aligned}$$
Using the Metropolis sampling method presented in §7.1, three parallel chains, containing a total of $\mathtt{S} = 10^5$ joint samples, are taken from the unnormalized posterior $f(E, w(5), w(10), a, \ell \,|\, \mathcal{D}_y)$, where each joint sample is described by $\{E, w(5), w(10), a, \ell\}_s$. The first half of each chain is discarded as burn-in samples; the Metropolis acceptance rate is approximately 0.25, and the estimated potential scale reduction (EPSR, i.e., chain stationarity metric; see §7.3.3) is below 1.1 for all parameters and hyperparameters. The Markov chain Monte Carlo (MCMC) posterior samples are presented in figure 13.8, where, when it exists, the true value is also indicated. Note the almost perfect correlation in the posterior between $w(5)$ and $w(10)$. It happens because, as illustrated in figure 13.7, whenever the model either under- or overestimates the displacement, it does so for both locations. Notice that there is also a clear dependence between prediction errors $w_i$ and the elastic modulus $E$, where small (large) values for $E$ are compensated by large positive (negative) prediction errors. These are examples of non-identifiability regularized by prior knowledge, as described in §6.3.4.
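A condensed, illustrative sketch of such a Metropolis sampler is given below. The closed-form interpretation model, the unit conventions, the proposal step sizes, and the positivity constraints are assumptions made for the sake of a runnable example; the priors follow the text, and a single chain is run for brevity (the text uses three).

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed interpretation model: cantilever with tip load, spring omitted;
# deflection (mm) at position x (m), with E in GPa. Illustrative formula only.
P, L, I = 5e3, 10e3, 6.66e9                       # N, mm, mm^4

def g(E, x_m):
    x = 1e3 * np.asarray(x_m, dtype=float)        # m -> mm
    return -P * x**2 * (3 * L - x) / (6 * (1e3 * E) * I)   # E: GPa -> MPa

x_obs = np.array([5.0, 10.0])                     # m
y_obs = np.array([-4.44, -11.28])                 # mm

def log_posterior(E, w, a, ell):
    if E <= 0 or a <= 0 or ell <= 0:
        return -np.inf                            # keep parameters positive
    # Likelihood: y_i ~ N(g(E, x_i) + w_i, 1^2)
    lp = -0.5 * np.sum((y_obs - (g(E, x_obs) + w))**2)
    # GP prior on w, eq. 13.3; assumed units: sigma_w uses x in mm so that
    # a ~ 1e-3 gives mm-scale errors, while the correlation uses x, ell in m
    x_mm = 1e3 * x_obs
    dx = x_obs[:, None] - x_obs[None, :]
    Sw = np.outer(a * x_mm, a * x_mm) * np.exp(-0.5 * dx**2 / ell**2)
    Sw += 1e-9 * np.eye(2)                        # jitter for stability
    lp += -0.5 * w @ np.linalg.solve(Sw, w) - 0.5 * np.linalg.slogdet(Sw)[1]
    # Gaussian priors on E, a, ell (values from the text)
    lp += -0.5 * ((E - 20) / 20)**2
    lp += -0.5 * ((a - 1e-4) / 5e-4)**2
    lp += -0.5 * ((ell - 5) / 50)**2
    return lp

# Metropolis random walk over {E, w(5), w(10), a, ell}
theta = np.array([20.0, 0.0, 0.0, 1e-4, 5.0])     # starting point (assumed)
step = np.array([2.0, 0.5, 0.5, 1e-4, 2.0])       # proposal step sizes (assumed)
samples = []
lp = log_posterior(theta[0], theta[1:3], theta[3], theta[4])
for s in range(100_000):
    prop = theta + step * rng.standard_normal(5)
    lp_prop = log_posterior(prop[0], prop[1:3], prop[3], prop[4])
    if np.log(rng.uniform()) < lp_prop - lp:      # accept/reject
        theta, lp = prop, lp_prop
    samples.append(theta.copy())
samples = np.array(samples)[50_000:]              # discard burn-in half
```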
Figure 13.8: Bivariate scatter plots and histograms describing the marginal posterior PDFs of $E$ [GPa], $a$ [mm], $\ell$ [m], $w_1$ [mm], and $w_2$ [mm]. The true values, where they exist, are also indicated.
13.2.2 Predicting at Unobserved Locations
In practical applications, we are typically interested in predicting the structure's responses $\mathbf{u}^*$ at unobserved locations $\mathbf{x}^*$. Analogous to equation 13.1, the predicted values at unobserved locations are defined as
$$\mathbf{u}^* = \mathbf{g}(E, \mathbf{x}^*) + \mathbf{w}(\mathbf{x}^*).$$
When predicting at unobserved locations, we want to consider the knowledge obtained at observed locations, as described by the posterior samples displayed in figure 13.8. When using MCMC samples (see §7.5), we can approximate the posterior predictive expected values and covariances by following
$$\mathbb{E}[\mathbf{U}^*|\mathcal{D}] = \int \mathbf{u}^* \cdot f(\mathbf{u}^*|\mathcal{D})\, d\mathbf{u}^* \approx \frac{1}{\mathtt{S}}\sum_{s=1}^{\mathtt{S}} \mathbf{u}^*_s \quad \text{(Posterior predictive mean)}$$
$$\operatorname{cov}(\mathbf{U}^*|\mathcal{D}) = \mathbb{E}\big[(\mathbf{U}^* - \mathbb{E}[\mathbf{U}^*|\mathcal{D}])(\mathbf{U}^* - \mathbb{E}[\mathbf{U}^*|\mathcal{D}])^\intercal\big] \approx \frac{1}{\mathtt{S}-1}\sum_{s=1}^{\mathtt{S}} (\mathbf{u}^*_s - \mathbb{E}[\mathbf{U}^*|\mathcal{D}])(\mathbf{u}^*_s - \mathbb{E}[\mathbf{U}^*|\mathcal{D}])^\intercal. \quad \text{(Posterior predictive covariance)}$$
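Given an array of predictive realizations $\mathbf{u}^*_s$ (one row per MCMC sample), these estimators reduce to sample moments; a minimal sketch:

```python
import numpy as np

def predictive_moments(u_samples):
    """u_samples: (S, D*) array of realizations u*_s drawn from f(u*|D).
    Returns the posterior predictive mean and covariance estimates."""
    mean = u_samples.mean(axis=0)           # (1/S) * sum of u*_s
    cov = np.cov(u_samples, rowvar=False)   # uses the 1/(S-1) normalization
    return mean, cov
```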
In order to compute $\mathbb{E}[\mathbf{U}^*|\mathcal{D}]$ and $\operatorname{cov}(\mathbf{U}^*|\mathcal{D})$, we have to generate realizations of $\mathbf{u}^*_s$ from $f(\mathbf{u}^*|\mathcal{D})$ using the MCMC samples $\{E, w(5), w(10), a, \ell\}_s$ obtained from the unnormalized posterior $f(E, w(5), w(10), a, \ell \,|\, \mathcal{D}_y)$. Through that process, we must first generate realizations of the covariance matrices $\{\boldsymbol{\Sigma}_*^s, \boldsymbol{\Sigma}_{*w}^s, \boldsymbol{\Sigma}_w^s\}$ describing the model errors' prior PDF. These realizations are then used to update the posterior mean vector and covariance matrix describing the model errors at prediction locations, so that
$$\{w_s(5),\, w_s(10),\, a_s,\, \ell_s\} \;\rightarrow\; \{\boldsymbol{\Sigma}_*^s,\, \boldsymbol{\Sigma}_{*w}^s,\, \boldsymbol{\Sigma}_w^s\} \;\rightarrow\; \begin{cases} \boldsymbol{\mu}_{*|w}^s = \boldsymbol{\Sigma}_{*w}^s (\boldsymbol{\Sigma}_w^s)^{-1} [w_s(5)\; w_s(10)]^\intercal\\[4pt] \boldsymbol{\Sigma}_{*|w}^s = \boldsymbol{\Sigma}_*^s - \boldsymbol{\Sigma}_{*w}^s (\boldsymbol{\Sigma}_w^s)^{-1} \boldsymbol{\Sigma}_{w*}^s. \end{cases}$$
Realizations of the model errors at prediction locations $\mathbf{w}^*_s$ can be sampled from the multivariate Gaussian PDF defined by $\mathbf{w}^*_s : \mathbf{W}^* \sim \mathcal{N}(\mathbf{w}^*; \boldsymbol{\mu}_{*|w}^s, \boldsymbol{\Sigma}_{*|w}^s)$.
Note: From equation 13.3,
$$[\boldsymbol{\Sigma}_*^s]_{ij} = \underbrace{(a_s \cdot x^*_i)}_{\sigma_i} \cdot \underbrace{(a_s \cdot x^*_j)}_{\sigma_j} \cdot \underbrace{\exp\Big({-\frac{1}{2}}\frac{(x^*_i - x^*_j)^2}{\ell_s^2}\Big)}_{\rho_{ij}},$$
$$[\boldsymbol{\Sigma}_{*w}^s]_{ij} = (a_s \cdot x^*_i) \cdot (a_s \cdot x_j) \cdot \exp\Big({-\frac{1}{2}}\frac{(x^*_i - x_j)^2}{\ell_s^2}\Big),$$
$$[\boldsymbol{\Sigma}_w^s]_{ij} = (a_s \cdot x_i) \cdot (a_s \cdot x_j) \cdot \exp\Big({-\frac{1}{2}}\frac{(x_i - x_j)^2}{\ell_s^2}\Big).$$
Finally, we can obtain a realization of the structure's behavior at predicted locations by summing the realization of model errors $\mathbf{w}^*_s$ with the model predictions evaluated for the model parameter MCMC sample $E_s$,
$$\mathbf{u}^*_s = \mathbf{g}(E_s, \mathbf{x}^*) + \mathbf{w}^*_s.$$
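Sketched in Python, one predictive realization per MCMC sample could be generated as follows; the kernel follows equation 13.3, while the jitter term and the function names are illustrative assumptions.

```python
import numpy as np

def kernel(xa, xb, a, ell):
    """[Sigma]_ij = (a*xa_i) * (a*xb_j) * exp(-0.5*(xa_i - xb_j)^2 / ell^2)."""
    dx = xa[:, None] - xb[None, :]
    return np.outer(a * xa, a * xb) * np.exp(-0.5 * dx**2 / ell**2)

def predictive_realization(E_s, w_s, a_s, ell_s, x_obs, x_star, g, rng):
    """Draw u*_s = g(E_s, x*) + w*_s, with w*_s ~ N(mu*_|w, Sigma*_|w)."""
    S_ww = kernel(x_obs, x_obs, a_s, ell_s) + 1e-9 * np.eye(len(x_obs))
    S_sw = kernel(x_star, x_obs, a_s, ell_s)
    S_ss = kernel(x_star, x_star, a_s, ell_s)
    mu = S_sw @ np.linalg.solve(S_ww, w_s)              # conditional mean
    cov = S_ss - S_sw @ np.linalg.solve(S_ww, S_sw.T)   # conditional covariance
    w_star = rng.multivariate_normal(mu, cov)           # sample model errors
    return g(E_s, x_star) + w_star
```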
Figure 13.9a presents a comparison of the posterior predictive model prediction samples $\mathbf{u}^*_s$ with the deflection of the reference model. Figure 13.9b presents the same comparison, this time using the posterior predictive mean vector and confidence interval. Notice how the confidence interval and the smoothness of the realizations match the true deflection profile.
Figure 13.9: Comparison of the posterior predictive model predictions with the displacement of the reference model (deflection [mm] versus position [m]): (a) posterior predictive samples; (b) posterior predictive mean and confidence interval along with the true deflection and observations.
The scatter plots in figure 13.8 show that significant uncertainty remains regarding the parameter $E$ and the prediction errors $w(5), w(10)$. However, because there is a strong dependence in the joint posterior PDF describing these variables, the posterior uncertainty for the deflection remains limited. It happens because abnormally small (large) values for the elastic modulus $E_s$ result in large (small) negative displacements, which are compensated for by large positive (negative) prediction errors $w_s(5), w_s(10)$. This example illustrates why, when calibrating model parameters, it is also necessary to infer model prediction errors, because parameters and prediction errors are typically dependent on each other.
The most difficult part of such a hierarchical Bayesian estimation procedure is to define an appropriate function for describing the prior knowledge of model prediction errors. This aspect is what limits its practical application to complex case studies where the prediction errors depend not only on the prediction locations but also on the prediction types, for example, rotation, strain, acceleration, and so on. When several formulations for the prior knowledge of prediction errors are available, the probability of each of them can be estimated using Bayesian model selection (see §6.8) or cross-validation (see §8.1.2). Further details about hierarchical Bayesian estimation applied to model calibration can be found in Kennedy and O'Hagan¹ and Brynjarsdóttir and O'Hagan.²
¹ Kennedy, M. C. and A. O'Hagan (2001). Bayesian calibration of computer models. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 63(3), 425–464.
² Brynjarsdóttir, J. and A. O'Hagan (2014). Learning about physical parameters: The importance of model discrepancy. Inverse Problems 30(11), 114007.