j.-a. goulet 162
previous case, each cluster is well approximated by a multivariate Normal. Gaussian mixture models may display a poor performance when the clusters do not have ellipsoidal shapes. In such a case, we should resort to more advanced methods such as spectral clustering.³ Further details regarding the derivation and theoretical justification of the EM methods are presented by Murphy⁴ and Bishop.⁵

³ Ng, A., M. Jordan, and Y. Weiss (2002). On spectral clustering: Analysis and an algorithm. In Advances in Neural Information Processing Systems, Volume 14, pp. 849–856. MIT Press.
⁴ Murphy, K. P. (2012). Machine Learning: A Probabilistic Perspective. MIT Press.
⁵ Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Springer.
Estimating the number of clusters  In the previous examples, we have assumed that we knew the number of clusters $K$. In most practical applications, this is not the case and $K$ must be estimated using methods such as cross-validation (see §8.1.2) or Bayesian model selection (see §6.8).
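As an illustration of this model-selection step, the sketch below fits Gaussian mixtures for a range of candidate $K$ and keeps the one with the lowest Bayesian information criterion (BIC), a common approximation to Bayesian model selection. The synthetic data, the candidate range, and the use of scikit-learn's `GaussianMixture` are assumptions for the example, not part of the text.

```python
# Sketch: estimating the number of clusters K via the BIC
# (an approximation to Bayesian model selection).
# Data and candidate range are illustrative assumptions.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Synthetic data: three well-separated Gaussian clusters in [0, 1]^2
X = np.vstack([rng.normal(loc=c, scale=0.05, size=(100, 2))
               for c in ([0.2, 0.2], [0.8, 0.3], [0.5, 0.8])])

# Fit a mixture for each candidate K; keep the K with the lowest BIC
bics = {k: GaussianMixture(n_components=k, random_state=0).fit(X).bic(X)
        for k in range(1, 7)}
K_best = min(bics, key=bics.get)
print(K_best)
```

Because the BIC penalizes the number of parameters, it guards against the trivial improvement in likelihood obtained by adding more clusters.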
10.1.2 K-Means
K-means⁶ is one of the most widespread clustering methods due to its simplicity. It can be seen as a special case of the Gaussian mixture model presented in §10.1.1, where the probability of each cluster is $p(k) = 1/K$, the covariance of each cluster is diagonal, $\Sigma_k = \sigma^2\mathbf{I}$, and we seek to infer the cluster means $\boldsymbol{\mu}_k \in \mathbb{R}^{X}$. With Gaussian mixture models, we were computing the probability that the covariates $\mathbf{x}_i$ belong to the cluster $k$; with K-means, the cluster membership assignment $k_i^*$ for a covariate $\mathbf{x}_i$ is deterministic so that it corresponds to

$$p(k\,|\,\mathbf{x}_i, \boldsymbol{\theta}) = \begin{cases} 1, & \text{if } k = k_i^* \\ 0, & \text{otherwise,} \end{cases}$$

where a covariate $\mathbf{x}_i$ is assigned to the cluster associated with the mean it is the closest to,

$$k_i^* = \arg\min_k \, \|\mathbf{x}_i - \boldsymbol{\mu}_k\|.$$

⁶ Lloyd, S. (1982). Least squares quantization in PCM. IEEE Transactions on Information Theory 28(2), 129–137.
The mean vector for the $k$th cluster is estimated as the average location of the covariates $\mathbf{x}_i$ associated with a cluster $k$ so that

$$\boldsymbol{\mu}_k = \frac{1}{\#\{i : k_i^* = k\}} \sum_{i:\,k_i^* = k} \mathbf{x}_i,$$

where $\#\{i : k_i^* = k\}$ is the number of observed covariates assigned to the cluster $k$.
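The two rules above, the assignment $k_i^* = \arg\min_k \|\mathbf{x}_i - \boldsymbol{\mu}_k\|$ and the mean update, can be alternated until convergence. A minimal NumPy sketch follows; the random initialization, iteration cap, and synthetic data are illustrative assumptions.

```python
import numpy as np

def k_means(X, K, n_iter=100, seed=0):
    """Minimal K-means sketch: alternate the assignment rule
    k*_i = argmin_k ||x_i - mu_k|| and the mean update
    mu_k = average of the x_i assigned to cluster k."""
    rng = np.random.default_rng(seed)
    # Initialize mu_k^(0) by picking K distinct observations at random
    mu = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(n_iter):
        # Assignment step: Euclidean distance from each x_i to each mu_k
        d = np.linalg.norm(X[:, None, :] - mu[None, :, :], axis=2)
        k_star = d.argmin(axis=1)
        # Update step: mean of the covariates assigned to each cluster
        # (an empty cluster keeps its previous mean)
        new_mu = np.array([X[k_star == k].mean(axis=0)
                           if np.any(k_star == k) else mu[k]
                           for k in range(K)])
        if np.allclose(new_mu, mu):  # converged: means stopped moving
            break
        mu = new_mu
    return mu, k_star

# Usage on synthetic 2-D data with two well-separated clusters (illustrative)
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.2, 0.05, size=(50, 2)),
               rng.normal(0.8, 0.05, size=(50, 2))])
mu, k_star = k_means(X, K=2)
```

Note that, like EM for Gaussian mixtures, this procedure only finds a local optimum, so in practice it is common to restart it from several random initializations.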
[Figure 10.6 appears here: two scatter plots of $x_2$ versus $x_1$ on $[0, 1]^2$; (a) real clusters, (b) clusters approximated with K-means, where crosses describe each cluster center.]
Figure 10.6: Example of application of K-means. Notice how some samples were misclassified due to the hypothesis related to the circular shape of each cluster, that is, $\Sigma_k = \sigma^2\mathbf{I}$.
The general procedure consists in initializing $\boldsymbol{\mu}_k^{(0)}$ randomly, iteratively assigning each observation to the cluster it is the closest to, and recomputing the mean of each cluster. Figure 10.6 presents in (a) the true clusters and in (b) the cluster membership assignment computed using K-means, where the mean of each cluster is