j.-a. goulet 146
tion of parameters in the case of the linear regression presented
in §8.1, no closed-form analytic solution exists here to identify
$\mathbf{b}^*$. With linear regression, the derivative of the
log-likelihood (i.e., the loss function) was linear in the parameters,
so that setting $\frac{\partial J(\mathbf{b})}{\partial \mathbf{b}} = 0$
led to an analytic solution for $\mathbf{b}^*$. With logistic
regression, the derivative of the log-likelihood is a nonlinear
function, so we have to resort to an optimization algorithm (see
chapter 5) to identify $\mathbf{b}^*$.
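To make the contrast concrete, the stationarity conditions can be sketched as follows. This is an illustration under assumptions that are not stated in the passage above: observations are coded $y_i \in \{0, 1\}$, $\sigma(\cdot)$ denotes the logistic sigmoid, $\mathbf{x}_i^\intercal$ is the $i$-th row of the model matrix $\mathbf{X}$, and the linear-regression case uses a constant observation-error variance.

```latex
% Assumed setup: y_i \in \{0,1\}, \Pr(y_i = 1 \mid \mathbf{x}_i) = \sigma(\mathbf{x}_i^\intercal \mathbf{b}),
% with \sigma(z) = 1 / (1 + e^{-z}) and D observations.
\begin{align*}
\ln f(\mathcal{D} \mid \mathbf{b})
  &= \sum_{i=1}^{D} \Big[ y_i \ln \sigma(\mathbf{x}_i^\intercal \mathbf{b})
     + (1 - y_i) \ln\big(1 - \sigma(\mathbf{x}_i^\intercal \mathbf{b})\big) \Big] \\
\frac{\partial \ln f(\mathcal{D} \mid \mathbf{b})}{\partial \mathbf{b}}
  &= \sum_{i=1}^{D} \big( y_i - \sigma(\mathbf{x}_i^\intercal \mathbf{b}) \big)\, \mathbf{x}_i
   = \mathbf{X}^\intercal \big( \mathbf{y} - \sigma(\mathbf{X}\mathbf{b}) \big) = \mathbf{0}
   && \text{(logistic: nonlinear in } \mathbf{b}\text{)} \\
\mathbf{X}^\intercal ( \mathbf{y} - \mathbf{X}\mathbf{b} ) &= \mathbf{0}
  \;\Rightarrow\; \mathbf{b}^* = (\mathbf{X}^\intercal \mathbf{X})^{-1} \mathbf{X}^\intercal \mathbf{y}
   && \text{(linear regression: linear in } \mathbf{b}\text{)}
\end{align*}
```

Because $\mathbf{b}$ appears inside $\sigma(\cdot)$ in the logistic case, the condition cannot be rearranged to isolate $\mathbf{b}$, which is why an iterative optimizer is needed.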
Example: Logistic regression Figure 9.10 presents three examples
involving a single covariate $x$ and a linear model
$g(x) = b_1 x + b_0$, where the parameters
$\mathbf{b} = [b_0 \; b_1]^\intercal$ are estimated with respectively
5, 10, and 100 observations. The true parameter values employed to
generate simulated observations are
$\check{\mathbf{b}} = [-4 \;\; 0.08]^\intercal$. The corresponding
functions $g(\mathbf{X}\check{\mathbf{b}})$ and
$\sigma(g(\mathbf{X}\check{\mathbf{b}}))$ are represented by dashed
lines, and those estimated using MLE parameters,
$g(\mathbf{X}\mathbf{b}^*)$ and $\sigma(g(\mathbf{X}\mathbf{b}^*))$,
are represented by solid lines. We can observe that as the number
of observations increases, the classifier converges toward the one
employed to generate the data.
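[Figure 9.10: Example of application of logistic regression.
Horizontal axis: $x$; curves: $g(x)$ and $\sigma(g(x))$; markers:
observations $y_i$.
Panels: (a) $D = 5$, $\mathbf{b}^* = [-3.1 \;\; 0.09]^\intercal$;
(b) $D = 10$, $\mathbf{b}^* = [-4.4 \;\; 0.08]^\intercal$;
(c) $D = 100$, $\mathbf{b}^* = [-3.9 \;\; 0.08]^\intercal$.]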
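The closed-loop simulation in this example can be approximated with a short script. The following is a minimal sketch rather than the procedure used to produce the figure. It assumes that covariates are drawn uniformly on [0, 100], that observations are coded as 0 or 1, that the true parameters are $b_0 = -4$ and $b_1 = 0.08$ (the negative sign of $b_0$ is inferred from the range of the figure's axes), and that Newton-Raphson is an acceptable choice of optimizer. The helper names `simulate` and `fit_logistic_mle` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    # Logistic function; the argument is clipped to avoid overflow in exp.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -35.0, 35.0)))

def simulate(D, b_true=(-4.0, 0.08), seed=0):
    # Assumption: covariates are uniform on [0, 100] and observations
    # are coded y in {0, 1}.
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 100.0, size=D)
    p = sigmoid(b_true[0] + b_true[1] * x)
    y = (rng.uniform(size=D) < p).astype(float)
    return x, y

def fit_logistic_mle(x, y, n_iter=100, damping=1e-6):
    # Newton-Raphson iterations on the Bernoulli log-likelihood.
    X = np.column_stack([np.ones_like(x), x])  # model matrix, columns: 1, x
    b = np.zeros(2)
    for _ in range(n_iter):
        p = sigmoid(X @ b)
        grad = X.T @ (y - p)                         # gradient of the log-likelihood
        hess = (X * (p * (1.0 - p))[:, None]).T @ X  # negative Hessian
        # The small damping term keeps the linear system solvable when
        # the observations are perfectly separable.
        step = np.linalg.solve(hess + damping * np.eye(2), grad)
        b = b + step
        if np.max(np.abs(step)) < 1e-10:
            break
    return b

for D in (5, 10, 100):
    b_hat = fit_logistic_mle(*simulate(D, seed=D))
    print(f"D = {D:3d}: b* = [{b_hat[0]:.2f}, {b_hat[1]:.2f}]")
```

With only 5 or 10 observations, the simulated classes can be perfectly separable, in which case no finite maximum-likelihood estimate exists and the printed values should not be interpreted. The estimates are expected to approach the true values as D grows.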
This case is a trivial one because, in this closed-loop simulation,
the model structure (i.e., as defined by the basis functions in model
matrix $\mathbf{X}$) was a perfect fit for the problem. In practical
cases, we have to select an appropriate set of basis functions
$\phi_j(x_i)$ to suit the problem at hand. As for linear regression,
the selection of basis functions is prone to overfitting, so we have
to employ either Bayesian model selection (see §6.8) or
cross-validation (see §8.1.2) for that purpose.
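As an illustration of the cross-validation route, the sketch below compares candidate sets of basis functions by their average held-out log-likelihood. Everything specific in it is an assumption made for the example: the polynomial basis (`poly_basis`), the rescaling of a covariate assumed to lie in [0, 100], the 0/1 coding of observations, the number of folds, and the damped Newton optimizer.

```python
import numpy as np

def sigmoid(z):
    # Logistic function; the argument is clipped to avoid overflow in exp.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -35.0, 35.0)))

def poly_basis(x, degree):
    # Hypothetical basis functions 1, s, s^2, ..., where s is the
    # covariate rescaled to [-1, 1], assuming x lies in [0, 100].
    s = (x - 50.0) / 50.0
    return np.column_stack([s**j for j in range(degree + 1)])

def fit(X, y, n_iter=100, damping=1e-3):
    # Damped Newton iterations on the Bernoulli log-likelihood.
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X @ b)
        hess = (X * (p * (1.0 - p))[:, None]).T @ X
        b = b + np.linalg.solve(hess + damping * np.eye(X.shape[1]), X.T @ (y - p))
    return b

def mean_log_lik(X, y, b):
    # Average log-likelihood per observation, for y coded in {0, 1}.
    p = np.clip(sigmoid(X @ b), 1e-12, 1.0 - 1e-12)
    return float(np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))

def cross_validate(x, y, degrees=(0, 1, 2, 3), n_folds=5, seed=0):
    # K-fold cross-validation: fit on K-1 folds, score on the held-out fold.
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(x)), n_folds)
    scores = {}
    for degree in degrees:
        X = poly_basis(x, degree)
        fold_scores = []
        for k in range(n_folds):
            test = folds[k]
            train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
            b = fit(X[train], y[train])
            fold_scores.append(mean_log_lik(X[test], y[test], b))
        scores[degree] = float(np.mean(fold_scores))
    return scores

# Simulated observations from a model that is linear in x (assumed values).
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 100.0, size=200)
y = (rng.uniform(size=200) < sigmoid(-4.0 + 0.08 * x)).astype(float)

scores = cross_validate(x, y)
best_degree = max(scores, key=scores.get)
print(scores, best_degree)
```

The set of basis functions with the highest held-out score is retained. Scoring on data excluded from the fit is what penalizes overly flexible sets of basis functions, which would otherwise always appear at least as good on the training data.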
Civil engineering perspectives In the field of transportation
engineering, logistic regression has been extensively employed for
discrete choice modeling³ because of the interpretability of the
model parameters $\mathbf{b}$ in the context of behavioral economics.
However, for most benchmark problems, the predictive capacity of
logistic regression is outperformed by more modern techniques such
as Gaussian process classification and neural networks, which are
presented in the next two sections.
³ Ben-Akiva, M. E. and S. R. Lerman (1985). Discrete choice analysis:
Theory and application to travel demand. MIT Press; and McFadden, D.
(2001). Economic choices. American Economic Review 91(3), 351–378.
9.3 Gaussian Process Classification
Gaussian process classification (GPC) is the extension of Gaussian
process regression (GPR; see §8.2) to classification problems. In the
context of GPR, a function $\mathbb{R}^{\mathtt{X}} \to \mathbb{R}$ is
defined so that it transforms an $\mathtt{X}$-dimensional covariate
domain into a single output $g(\mathbf{x}) \in \mathbb{R}$. In the
context of classification, the system response is
$y \in \{-1, +1\}$.