4 Probability Distributions

The definition of probability distributions $f_X(x)$ was left aside in chapter 3. This chapter presents the formulation and properties for the probability distributions employed in this book: the Normal distribution for $x \in \mathbb{R}$, the log-normal for $x \in \mathbb{R}^+$, and the Beta for $x \in (0, 1)$.
4.1 Normal Distribution

The most widely employed probability distribution is the Normal, also known as the Gaussian, distribution. In this book, the names Gaussian and Normal are employed interchangeably when describing a probability distribution. This section covers the mathematical foundation for the univariate and multivariate Normal and then details the properties explaining its widespread usage.
4.1.1 Univariate Normal

Univariate Normal: $x \in \mathbb{R} : X \sim \mathcal{N}(x; \mu, \sigma^2)$
[Figure 4.1: Representation of the univariate Normal for $\mu = 0$, $\sigma = 1$. (a) Probability density function (PDF) $f_X(x)$; (b) cumulative distribution function (CDF) $F_X(x)$.]
The probability density function (PDF) for a Normal random variable is defined over the real numbers $x \in \mathbb{R}$. $X \sim \mathcal{N}(x; \mu, \sigma^2)$ is parameterized by its mean $\mu$ and variance $\sigma^2$, so its PDF is
$$f_X(x) = \mathcal{N}(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right).$$
Figure 4.1 presents an example of PDF and cumulative distribution function (CDF) with parameters $\mu = 0$ and $\sigma = 1$. The mode—that is, the most likely value—corresponds to the mean. Changing the mean $\mu$ causes a translation of the distribution. Increasing the standard deviation $\sigma$ causes a proportional increase in the PDF's dispersion. The Normal CDF is presented in figure 4.1b. Its formulation is obtained through integration, where the integral can
be formulated using the error function $\mathrm{erf}(\cdot)$,
$$F_X(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{1}{2}\left(\frac{x'-\mu}{\sigma}\right)^2\right)dx' = \frac{1}{2}\left(1 + \mathrm{erf}\left(\frac{x-\mu}{\sigma\sqrt{2}}\right)\right).$$
$$f_X(x) = \overbrace{\frac{1}{\sqrt{2\pi}\,\sigma}}^{\text{constant}}\ \underbrace{\exp\Biggl(-\frac{1}{2}\overbrace{\Bigl(\underbrace{\frac{x-\mu}{\sigma}}_{\text{linear}}\Bigr)^2}^{\text{quadratic}}\Biggr)}_{\text{exponential}}$$
[Figure 4.2: Illustration of the univariate Normal probability density function formulation for $\mu = 0$, $\sigma = 1$, showing the successive terms $\frac{x-\mu}{\sigma}$, $\left(\frac{x-\mu}{\sigma}\right)^2$, $e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$, and $\frac{1}{\sqrt{2\pi}\,\sigma}e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$.]
Figure 4.2 illustrates the successive steps taken to construct the univariate Normal PDF. Within the innermost parenthesis of the PDF formulation is a linear function $\frac{x-\mu}{\sigma}$, which centers $x$ on the mean $\mu$ and normalizes it with the standard deviation $\sigma$. This first term is then squared, leading to a positive number over all its domain except at the mean, where it is equal to zero. Taking the negative exponential of this second term leads to a bell-shaped curve, where the value equals one ($\exp(0) = 1$) at the mean $x = \mu$ and where there are inflexion points at $\mu \pm \sigma$. At this step, the curve is proportional to the final Normal PDF. Only the normalization constant is missing to ensure that $\int_{-\infty}^{\infty} f(x)\,dx = 1$. The normalization constant is obtained by integrating the exponential term,
$$\int_{-\infty}^{+\infty} \exp\left(-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right)dx = \sqrt{2\pi}\,\sigma. \tag{4.1}$$
Dividing the exponential term by the normalization constant in equation 4.1 results in the final formulation for the Normal PDF. Note that for $x = \mu$, $f(\mu) \neq 1$ because the PDF has been normalized so its integral is one.
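This construction is easy to verify numerically. The sketch below (a minimal example assuming NumPy and SciPy are available) evaluates the unnormalized exponential term, recovers the normalization constant of equation 4.1 by numerical integration, and compares the result against scipy.stats.norm:

```python
import numpy as np
from scipy import stats, integrate
from scipy.special import erf

mu, sigma = 0.0, 1.0

# Unnormalized exponential term of the Normal PDF
expo = lambda x: np.exp(-0.5 * ((x - mu) / sigma) ** 2)

# Normalization constant (equation 4.1): integrates to sqrt(2*pi)*sigma
const, _ = integrate.quad(expo, -np.inf, np.inf)
print(const, np.sqrt(2 * np.pi) * sigma)  # both ~2.5066

# The normalized PDF matches scipy's built-in Normal distribution
x = 1.3
print(expo(x) / const, stats.norm.pdf(x, loc=mu, scale=sigma))

# CDF via the error function, as in the text
print(0.5 * (1 + erf((x - mu) / (sigma * np.sqrt(2)))),
      stats.norm.cdf(x, loc=mu, scale=sigma))
```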
4.1.2 Multivariate Normal
The joint probability density function (PDF) for two Normal random variables $\{X_1, X_2\}$ is given by
$$f_{X_1 X_2}(x_1, x_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}}\exp\left(-\frac{1}{2(1-\rho^2)}\left(\left(\frac{x_1-\mu_1}{\sigma_1}\right)^2 + \left(\frac{x_2-\mu_2}{\sigma_2}\right)^2 - 2\rho\left(\frac{x_1-\mu_1}{\sigma_1}\right)\left(\frac{x_2-\mu_2}{\sigma_2}\right)\right)\right).$$
There are three terms within the parentheses inside the exponential. The first two are analogous to the quadratic terms for the univariate case. The third one includes a new parameter $\rho$ describing the correlation coefficient between $X_1$ and $X_2$. Together, these three terms describe the equation of a 2-D ellipse centered at $[\mu_1\ \mu_2]^\intercal$.
Multivariate Normal: $\mathbf{x} \in \mathbb{R}^n : \boldsymbol{X} \sim \mathcal{N}(\mathbf{x}; \boldsymbol{\mu}_X, \boldsymbol{\Sigma}_X)$
In a more general way, the probability density function for $n$ random variables $\boldsymbol{X} = [X_1\ X_2\ \cdots\ X_n]^\intercal$ is described by $\mathbf{x} \in \mathbb{R}^n : \boldsymbol{X} \sim \mathcal{N}(\mathbf{x}; \boldsymbol{\mu}_X, \boldsymbol{\Sigma}_X)$, where $\boldsymbol{\mu}_X = [\mu_1\ \mu_2\ \cdots\ \mu_n]^\intercal$ is a vector
containing mean values and $\boldsymbol{\Sigma}_X$ is the covariance matrix,
$$\boldsymbol{\Sigma}_X = \mathbf{D}_X \mathbf{R}_X \mathbf{D}_X = \begin{bmatrix} \sigma_1^2 & \rho_{12}\sigma_1\sigma_2 & \cdots & \rho_{1n}\sigma_1\sigma_n \\ & \sigma_2^2 & \cdots & \rho_{2n}\sigma_2\sigma_n \\ & & \ddots & \vdots \\ \text{sym.} & & & \sigma_n^2 \end{bmatrix}_{n \times n}.$$
$\mathbf{D}_X$ is the standard deviation matrix containing the standard deviation of each random variable on its main diagonal, and $\mathbf{R}_X$ is the symmetric (sym.) correlation matrix containing the correlation coefficient for each pair of random variables,

$\mathbf{D}_X$: standard deviation matrix; $\mathbf{R}_X$: correlation matrix; $\boldsymbol{\Sigma}_X$: covariance matrix

$$\mathbf{D}_X = \begin{bmatrix} \sigma_1 & 0 & \cdots & 0 \\ & \sigma_2 & \cdots & 0 \\ & & \ddots & \vdots \\ \text{sym.} & & & \sigma_n \end{bmatrix}, \qquad \mathbf{R}_X = \begin{bmatrix} 1 & \rho_{12} & \cdots & \rho_{1n} \\ & 1 & \cdots & \rho_{2n} \\ & & \ddots & \rho_{(n-1)n} \\ \text{sym.} & & & 1 \end{bmatrix}.$$
Note that a variable is linearly correlated with itself, so the main diagonal terms of the correlation matrix are $[\mathbf{R}_X]_{ii} = 1, \forall i$. The multivariate Normal joint PDF is described by
$$f_{\boldsymbol{X}}(\mathbf{x}) = \frac{1}{(2\pi)^{n/2}(\det \boldsymbol{\Sigma}_X)^{1/2}}\exp\left(-\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu}_X)^\intercal \boldsymbol{\Sigma}_X^{-1}(\mathbf{x} - \boldsymbol{\mu}_X)\right),$$
where the terms inside the exponential describe an $n$-dimensional ellipsoid centered at $\boldsymbol{\mu}_X$. The directions of the principal axes of this ellipsoid are described by the eigenvectors (see §2.4.2) of the covariance matrix $\boldsymbol{\Sigma}_X$, and their lengths by the eigenvalues. Figure 4.3 presents an example of a covariance matrix decomposed into its eigenvectors and eigenvalues. The curves overlaid on the joint PDF describe the marginal PDFs in the eigen space.
[Figure 4.3: Example of bivariate PDF with $\mu_1 = \mu_2 = 0$, $\sigma_1 = \sigma_2 = 1$, and $\rho = 0.8$, for which the covariance matrix is decomposed into its eigenvectors and eigenvalues:
$$\boldsymbol{\Sigma} = \begin{bmatrix} 1 & 0.8 \\ 0.8 & 1 \end{bmatrix}, \quad \mathbf{V} = [\mathbf{v}_1\ \mathbf{v}_2] = \begin{bmatrix} 0.71 & -0.71 \\ 0.71 & 0.71 \end{bmatrix}, \quad \boldsymbol{\lambda} = [1.8\ 0.2]^\intercal.]$$
For the multivariate Normal joint PDF formulation, the term on the left of the exponential is again the normalization constant, which now includes the determinant of the covariance matrix. As presented in §2.4.1, the determinant quantifies how much the covariance matrix $\boldsymbol{\Sigma}_X$ is scaling the space $\mathbf{x}$. Figure 4.4 presents examples of bivariate Normal PDF and CDF with parameters $\mu_1 = 0$, $\sigma_1 = 2$, $\mu_2 = 0$, $\sigma_2 = 1$, and $\rho = 0.6$. For the bivariate CDF, notice how evaluating the upper bound for one variable leads to the marginal CDF, represented by the bold red line, for the other variable.
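The decomposition shown in figure 4.3 can be reproduced numerically. The following sketch (assuming NumPy) computes the eigenvectors and eigenvalues of that covariance matrix and checks the determinant entering the normalization constant:

```python
import numpy as np

# Covariance matrix from figure 4.3: sigma_1 = sigma_2 = 1, rho = 0.8
Sigma = np.array([[1.0, 0.8],
                  [0.8, 1.0]])

# Principal axes of the ellipsoid: eigenvectors (columns of V) and eigenvalues
eigvals, V = np.linalg.eigh(Sigma)
print(eigvals)  # [0.2 1.8]
print(V)        # columns ~ [-0.71, 0.71] and [0.71, 0.71]

# Determinant scaling the normalization constant; equals the eigenvalue product
print(np.linalg.det(Sigma), np.prod(eigvals))  # both 0.36
```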
4.1.3 Properties
A multivariate Normal random variable follows several properties. Here, we insist on six:
1. It is completely defined by its mean vector $\boldsymbol{\mu}_X$ and covariance matrix $\boldsymbol{\Sigma}_X$.
2. Its marginal distributions are also Normal, and the PDF of any marginal is given by $x_i : X_i \sim \mathcal{N}(x_i; [\boldsymbol{\mu}_X]_i, [\boldsymbol{\Sigma}_X]_{ii})$.
3. The absence of correlation implies statistical independence. Note that this is not generally true for other types of random variables (see §3.3.5),
$$\rho_{ij} = 0 \Rightarrow X_i \perp\!\!\!\perp X_j.$$
4. The central limit theorem (CLT) states that, under some conditions, the asymptotic distribution obtained from the normalized sum of independent identically distributed (iid) random variables (normally distributed or not) is Normal. Given $X_i, \forall i \in \{1, \cdots, n\}$, a set of iid random variables with expected value $\mathbb{E}[X_i] = \mu_X$ and finite variance $\sigma_X^2$, the PDF of $Y = \sum_{i=1}^{n} X_i$ approaches $\mathcal{N}(n\mu_X, n\sigma_X^2)$ for $n \to \infty$. More formally, the CLT states that
$$\sqrt{n}\left(\bar{Y}_n - \mu_X\right) \xrightarrow{d} \mathcal{N}(0, \sigma_X^2),$$
where $\bar{Y}_n = Y/n$ is the sample mean and $\xrightarrow{d}$ means converges in distribution. In practice, when observing the outcomes of real-life phenomena, it is common to obtain empirical distributions that are similar to the Normal distribution. The parallel is that these phenomena are often themselves issued from the superposition of several underlying phenomena. This property is key in explaining the widespread usage of the Normal probability distribution.
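As an illustrative sketch (assuming NumPy and SciPy are available), the following simulation shows the sum of iid uniform random variables, which are individually far from Normal, converging toward the limiting Normal distribution:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, n_samples = 50, 100_000

# iid Uniform(0,1) variables: mu_X = 0.5, var_X = 1/12
mu_X, var_X = 0.5, 1.0 / 12.0
X = rng.uniform(0.0, 1.0, size=(n_samples, n))

# The sum Y = sum_i X_i should approach N(n*mu_X, n*var_X)
Y = X.sum(axis=1)
print(Y.mean(), n * mu_X)  # ~25.0
print(Y.var(), n * var_X)  # ~4.17

# Kolmogorov-Smirnov distance to the limiting Normal is small
D, _ = stats.kstest(Y, "norm", args=(n * mu_X, np.sqrt(n * var_X)))
print(D)  # close to 0
```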
[Figure 4.4: Examples of bivariate Normal (a) PDF and (b) CDF for $\mu_1 = \mu_2 = 0$, $\sigma_1 = 2$, $\sigma_2 = 1$, and $\rho = 0.6$.]
5. The output from linear functions of Normal random variables is also Normal. Given $\mathbf{x} : \boldsymbol{X} \sim \mathcal{N}(\mathbf{x}; \boldsymbol{\mu}_X, \boldsymbol{\Sigma}_X)$ and a linear function $\mathbf{y} = \mathbf{A}\mathbf{x} + \mathbf{b}$, the properties of linear transformations described in §3.4.1 allow obtaining
$$\boldsymbol{Y} \sim \mathcal{N}(\mathbf{y}; \mathbf{A}\boldsymbol{\mu}_X + \mathbf{b}, \mathbf{A}\boldsymbol{\Sigma}_X\mathbf{A}^\intercal).$$
Let us consider the simplified case of a linear function $z = x + y$ for two random variables $x : X \sim \mathcal{N}(x; \mu_X, \sigma_X^2)$ and $y : Y \sim \mathcal{N}(y; \mu_Y, \sigma_Y^2)$. Their sum is described by
$$Z \sim \mathcal{N}(z; \underbrace{\mu_X + \mu_Y}_{\mu_Z}, \underbrace{\sigma_X^2 + \sigma_Y^2 + 2\rho_{XY}\sigma_X\sigma_Y}_{\sigma_Z^2}).$$
In the case where both variables are statistically independent, $X \perp\!\!\!\perp Y$, the variance of their sum is equal to the sum of their respective variances. For the general case describing the sum of a set of $n$ correlated Normal random variables $X_i$ such that $\boldsymbol{X} \sim \mathcal{N}(\mathbf{x}; \boldsymbol{\mu}_X, \boldsymbol{\Sigma}_X)$,
$$Z = \sum_{i=1}^{n} X_i \sim \mathcal{N}\left(z; \sum_{i=1}^{n} [\boldsymbol{\mu}_X]_i, \sum_{i=1}^{n}\sum_{j=1}^{n} [\boldsymbol{\Sigma}_X]_{ij}\right). \tag{4.2}$$
As we will see in the next chapters, the usage of linear models is widespread in machine learning because of the analytical tractability of linear functions of Normal random variables.
6. Conditional distributions are Normal. For instance, we can partition an ensemble of $n$ random variables $\boldsymbol{X}$ into two subsets so that
$$\boldsymbol{X} = \begin{bmatrix} \boldsymbol{X}_i \\ \boldsymbol{X}_j \end{bmatrix}, \quad \boldsymbol{\mu} = \begin{bmatrix} \boldsymbol{\mu}_i \\ \boldsymbol{\mu}_j \end{bmatrix}, \quad \boldsymbol{\Sigma} = \begin{bmatrix} \boldsymbol{\Sigma}_i & \boldsymbol{\Sigma}_{ij} \\ \boldsymbol{\Sigma}_{ji} & \boldsymbol{\Sigma}_j \end{bmatrix},$$
where $\boldsymbol{\Sigma}_i$ describes the covariance matrix for the $i$th subset of random variables and $\boldsymbol{\Sigma}_{ij} = \boldsymbol{\Sigma}_{ji}^\intercal$ describes the covariance between the random variables belonging to subsets $i$ and $j$. It is mentioned in §3.3.4 that a conditional probability density function is obtained from the division of a joint PDF by a marginal PDF. The same concept applies to define the conditional PDF of $\boldsymbol{X}_i$ given a vector of observations $\boldsymbol{X}_j = \mathbf{x}_j$,
$$f_{\boldsymbol{X}_i|\mathbf{x}_j}(\mathbf{x}_i|\underbrace{\boldsymbol{X}_j = \mathbf{x}_j}_{\text{observations}}) = \frac{f_{\boldsymbol{X}_i \boldsymbol{X}_j}(\mathbf{x}_i, \mathbf{x}_j)}{f_{\boldsymbol{X}_j}(\mathbf{x}_j)} = \mathcal{N}(\mathbf{x}_i; \boldsymbol{\mu}_{i|j}, \boldsymbol{\Sigma}_{i|j}),$$
where the conditional mean and covariance are
$$\begin{aligned} \boldsymbol{\mu}_{i|j} &= \boldsymbol{\mu}_i + \boldsymbol{\Sigma}_{ij}\boldsymbol{\Sigma}_j^{-1}(\mathbf{x}_j - \boldsymbol{\mu}_j) \\ \boldsymbol{\Sigma}_{i|j} &= \boldsymbol{\Sigma}_i - \boldsymbol{\Sigma}_{ij}\boldsymbol{\Sigma}_j^{-1}\boldsymbol{\Sigma}_{ij}^\intercal. \end{aligned} \tag{4.3}$$
If we simplify this setup for only two random variables $X_1$ and $X_2$, where we want the conditional PDF of $X_1$ given an observation $X_2 = x_2$, then equation 4.3 simplifies to
$$\begin{aligned} \mu_{1|2} &= \mu_1 + \rho\,\sigma_1\frac{x_2 - \mu_2}{\sigma_2} \\ \sigma_{1|2}^2 &= \sigma_1^2(1 - \rho^2). \end{aligned}$$
In the special case where the prior means $\mu_1 = \mu_2 = 0$ and the prior standard deviations $\sigma_1 = \sigma_2 > 0$, the conditional mean simplifies to the observation $x_2$ times the correlation coefficient $\rho$. Note that the conditional variance $\sigma_{1|2}^2$ is independent of the observed value $x_2$; it only depends on the prior variance and the correlation coefficient. For the special case where $\rho = 1$, then $\sigma_{1|2}^2 = 0$.
4.1.4 Example: Conditional Distributions
For the beam example illustrated in figure 4.5, our prior knowledge for the resistance $\{X_1, X_2\}$ of two adjacent beams is
$$\begin{aligned} X_1 &\sim \mathcal{N}(x_1; 500, 150^2)\ [\text{kN}\cdot\text{m}] \\ X_2 &\sim \mathcal{N}(x_2; 500, 150^2)\ [\text{kN}\cdot\text{m}], \end{aligned}$$
and we know that the beam resistances are correlated with $\rho_{12} = 0.8$. Such a correlation could arise because both beams were fabricated with the same process, in the same factory. This prior knowledge is described by the joint bivariate Normal PDF,
[Figure 4.5: Example of dependence between the resistance of beams. (a) Multi-beam bridge span (photo: Archives Radio-Canada); (b) concrete beams.]
$$f_{X_1 X_2}(x_1, x_2) = \mathcal{N}(\mathbf{x}; \boldsymbol{\mu}_X, \boldsymbol{\Sigma}_X), \quad \begin{cases} \boldsymbol{\mu}_X = \begin{bmatrix} 500 \\ 500 \end{bmatrix} \\[6pt] \boldsymbol{\Sigma}_X = \begin{bmatrix} 150^2 & 0.8 \cdot 150^2 \\ 0.8 \cdot 150^2 & 150^2 \end{bmatrix}. \end{cases}$$
If we observe that the resistance of the second beam is $x_2 = 700\ \text{kN}\cdot\text{m}$, we can employ conditional probabilities to estimate the PDF of the strength $X_1$, given the observation $x_2$,
$$f_{X_1|x_2}(x_1|x_2) = \mathcal{N}(x_1; \mu_{1|2}, \sigma_{1|2}^2),$$
where
$$\begin{aligned} \mu_{1|2} &= 500 + 0.8 \cdot 150 \cdot \frac{\overbrace{700}^{\text{observation}} - 500}{150} = 660\ \text{kN}\cdot\text{m} \\ \sigma_{1|2} &= 150\sqrt{1 - 0.8^2} = 90\ \text{kN}\cdot\text{m}. \end{aligned}$$
Figure 4.6 presents the joint and conditional PDFs corresponding to this example. For the joint PDF, the highlighted pink slice corresponding to $x_2 = 700$ is proportional to the conditional probability $f_{X_1|x_2}(x_1|x_2 = 700)$. If we want to obtain the conditional distribution from the joint PDF, we have to divide it by the marginal PDF $f_{X_2}(x_2 = 700)$. This ensures that the conditional PDF for $x_1$ integrates to 1. This example is trivial, yet it sets the foundations for the more advanced models that will be presented in the following chapters.
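The numbers above can be checked with a short sketch (assuming NumPy) that applies the Gaussian conditioning formulas of equation 4.3 to this beam example:

```python
import numpy as np

# Prior knowledge for the two beam resistances [kN·m]
mu = np.array([500.0, 500.0])
s, rho = 150.0, 0.8
Sigma = np.array([[s**2, rho * s**2],
                  [rho * s**2, s**2]])

# Observation of the second beam
x2 = 700.0

# Conditioning (equation 4.3), written out for the scalar case
mu_1_2 = mu[0] + Sigma[0, 1] / Sigma[1, 1] * (x2 - mu[1])
var_1_2 = Sigma[0, 0] - Sigma[0, 1] / Sigma[1, 1] * Sigma[1, 0]

print(mu_1_2)            # 660.0 kN·m
print(np.sqrt(var_1_2))  # 90.0 kN·m
```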
[Figure 4.6: Joint prior PDF $f_{X_1 X_2}(x_1, x_2)$ and conditional PDF $f_{X_1|x_2}(x_1|x_2)$ describing the resistance of beams, with $\mu_{1|2} = 660$ and $\sigma_{1|2} = 90$.]
4.1.5 Example: Sum of Normal Random Variables
[Figure 4.7: Steel cables each made from multiple wires. (This example is adapted from Armen Der Kiureghian's CE229 course at UC Berkeley.)]
Figure 4.7 presents steel cables where each one is made from dozens of individual wires. Let us consider a cable made of 50 steel wires, each having a resistance $x_i : X_i \sim \mathcal{N}(x_i; 10, 3^2)\ \text{kN}$. We use equation 4.2 to compare the cable resistance $X_{\text{cable}} = \sum_{i=1}^{50} X_i$ depending on the correlation coefficient $\rho_{ij}$. With the hypothesis
that $X_i \perp\!\!\!\perp X_j$, $\rho_{ij} = 0$, all nondiagonal terms of the covariance matrix are $[\boldsymbol{\Sigma}_X]_{ij} = 0, \forall i \neq j$, which leads to
$$X_{\text{cable}} \sim \mathcal{N}(x; 50 \times 10\ \text{kN}, \underbrace{50 \times (3\ \text{kN})^2}_{\sigma_{X_{\text{cable}}} = 3\sqrt{50} \approx 21\ \text{kN}}).$$
With the hypothesis $\rho_{ij} = 1$, all terms are $[\boldsymbol{\Sigma}_X]_{ij} = (3\ \text{kN})^2, \forall i, j$, so that
$$X_{\text{cable}} \sim \mathcal{N}(x; 50 \times 10\ \text{kN}, \underbrace{50^2 \times (3\ \text{kN})^2}_{\sigma_{X_{\text{cable}}} = 3\ \text{kN} \times 50 = 150\ \text{kN}}).$$
Figure 4.8 presents the resulting PDFs for the cable resistance, given each hypothesis. These results show that if the uncertainty in the resistance for each wire is independent, there will be some cancellation; some wires will have a resistance above the mean, and some will have a resistance below. The resulting coefficient of variation for $\rho = 0$ is $\delta_{\text{cable}} = \frac{21}{500} \approx 0.04$, which is approximately seven times smaller than $\delta_{\text{wire}} = \frac{3}{10} = 0.3$, the variability associated with each wire. In the opposite case, if the resistance is linearly correlated ($\rho = 1$), the uncertainty adds up as you increase the number of wires, so $\delta_{\text{cable}} = \frac{150}{500} = \delta_{\text{wire}}$.
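A quick sketch (assuming NumPy) reproduces both cases directly from equation 4.2 by summing the entries of the covariance matrix:

```python
import numpy as np

n, mu_wire, s_wire = 50, 10.0, 3.0

for rho in (0.0, 1.0):
    # Covariance matrix of the 50 wire resistances for a given correlation
    Sigma = np.full((n, n), rho * s_wire**2)
    np.fill_diagonal(Sigma, s_wire**2)

    # Equation 4.2: sum of means, and sum of all covariance entries
    mu_cable = n * mu_wire
    s_cable = np.sqrt(Sigma.sum())
    print(rho, mu_cable, s_cable, s_cable / mu_cable)
    # rho=0: sigma ~21 kN, delta ~0.04; rho=1: sigma 150 kN, delta 0.3
```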
[Figure 4.8: Probability density function of the cable resistance depending on the correlation ($\rho = 0$ versus $\rho = 1$) between the strength of each wire.]
4.2 Log-Normal Distribution

The log-normal distribution is obtained by transforming the Normal distribution through the function $\ln x$. Because the logarithm function is only defined for positive values, the domain of the log-normal distribution is $x \in \mathbb{R}^+$.
4.2.1 Univariate Log-Normal

Univariate log-normal: $x \in \mathbb{R}^+ : X \sim \ln\mathcal{N}(x; \lambda, \zeta)$; $x' \in \mathbb{R} : X' = \ln X \sim \mathcal{N}(\ln x; \lambda, \zeta^2)$. Here $\mu_X$ is the mean of $X$; $\lambda$ is the mean of $\ln X$ ($= \mu_{\ln X}$); $\sigma_X^2$ is the variance of $X$; $\zeta^2$ is the variance of $\ln X$ ($= \sigma_{\ln X}^2$).
The random variable $X \sim \ln\mathcal{N}(x; \lambda, \zeta)$ is log-normal if $\ln X \sim \mathcal{N}(\ln x; \lambda, \zeta^2)$ is Normal. Given the transformation function $x' = \ln x$, the change of variable rule presented in §3.4 requires that
$$\overbrace{f_{X'}(x')}^{\mathcal{N}(x'; \lambda, \zeta^2)} dx' = f_X(x)\,dx \quad \Longrightarrow \quad f_{X'}(x')\frac{dx'}{dx} = \underbrace{f_X(x)}_{\ln\mathcal{N}(x; \lambda, \zeta)},$$
where the derivative of $\ln x$ with respect to $x$ is
$$\frac{dx'}{dx} = \frac{d\ln x}{dx} = \frac{1}{x}.$$
Therefore, the analytic formulation for the log-normal PDF is given by the product of the transformation's derivative and the Normal
PDF evaluated for $x' = \ln x$,
$$f_X(x) = \frac{1}{x} \cdot \mathcal{N}(\ln x; \lambda, \zeta^2) = \frac{1}{x} \cdot \frac{1}{\sqrt{2\pi}\,\zeta}\exp\left(-\frac{1}{2}\left(\frac{\ln x - \lambda}{\zeta}\right)^2\right), \quad x > 0.$$
The univariate log-normal PDF is parameterized by the mean ($\mu_{\ln X} = \lambda$) and variance ($\sigma_{\ln X}^2 = \zeta^2$) defined in the log-transformed space ($\ln x$). The mean $\mu_X$ and variance $\sigma_X^2$ of the log-normal random variable can be transformed in the log-space using the relations
$$\begin{aligned} \lambda &= \mu_{\ln X} = \ln \mu_X - \frac{\zeta^2}{2} \\ \zeta &= \sigma_{\ln X} = \sqrt{\ln\left(1 + \left(\frac{\sigma_X}{\mu_X}\right)^2\right)} = \sqrt{\ln(1 + \delta_X^2)}. \end{aligned} \tag{4.4}$$
Note that for $\delta_X < 0.3$, the standard deviation in the log-space is approximately equal to the coefficient of variation in the original space, $\zeta \approx \delta_X$. Figure 4.9 presents an example of log-normal PDF plotted (a) in the original space and (b) in the log-transformed space. The mean and standard deviation are $\{\mu_X = 2, \sigma_X = 1\}$ in the original space and $\{\lambda = 0.58, \zeta = 0.47\}$ in the log-transformed space.
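These parameter transformations are easy to verify numerically. The sketch below (assuming NumPy and SciPy) maps $\{\mu_X, \sigma_X\}$ to $\{\lambda, \zeta\}$ via equation 4.4 and evaluates the log-normal PDF through scipy.stats.lognorm:

```python
import numpy as np
from scipy import stats

mu_X, sigma_X = 2.0, 1.0

# Equation 4.4: parameters in the log-transformed space
zeta = np.sqrt(np.log(1 + (sigma_X / mu_X) ** 2))
lam = np.log(mu_X) - zeta**2 / 2
print(lam, zeta)  # ~0.58, ~0.47

# scipy parameterization: shape s = zeta, scale = exp(lambda)
dist = stats.lognorm(s=zeta, scale=np.exp(lam))
print(dist.mean(), dist.std())  # recovers mu_X = 2 and sigma_X = 1

# Direct evaluation of the PDF formula, matching scipy
x = 1.5
pdf = (1 / x) * stats.norm.pdf(np.log(x), loc=lam, scale=zeta)
print(pdf, dist.pdf(x))
```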
[Figure 4.9: Univariate log-normal probability density function for $\{\mu_X = 2, \sigma_X = 1\}$ and $\{\lambda = 0.58, \zeta = 0.47\}$, plotted (a) in the original space and (b) in the log-transformed space.]
4.2.2 Multivariate Log-Normal

Multivariate log-normal: $\mathbf{x} \in (\mathbb{R}^+)^n : \boldsymbol{X} \sim \ln\mathcal{N}(\mathbf{x}; \boldsymbol{\mu}_{\ln X}, \boldsymbol{\Sigma}_{\ln X})$
$X_1, X_2, \cdots, X_n$ are jointly log-normal if $\ln X_1, \ln X_2, \cdots, \ln X_n$ are jointly Normal. The multivariate log-normal PDF is parameterized by the mean values ($\mu_{\ln X_i} = \lambda_i$), variances ($\sigma_{\ln X_i}^2 = \zeta_i^2$), and correlation coefficients ($\rho_{\ln X_i \ln X_j}$) defined in the log-transformed space. Correlation coefficients in the log-space $\rho_{\ln X_i \ln X_j}$ are related to the correlation coefficients in the original space $\rho_{X_i X_j}$ using the relation
$$\rho_{\ln X_i \ln X_j} = \frac{1}{\zeta_i \zeta_j}\ln(1 + \rho_{X_i X_j}\delta_{X_i}\delta_{X_j}),$$
where $\rho_{\ln X_i \ln X_j} \approx \rho_{X_i X_j}$ for $\delta_{X_i}, \delta_{X_j} \leq 0.3$. The PDF for two random variables $\{X_1, X_2\}$ such that $\{x_1, x_2\} > 0$ is
$$f_{X_1 X_2}(x_1, x_2) = \frac{1}{x_1 x_2\, 2\pi\zeta_1\zeta_2\sqrt{1-\rho_{\ln}^2}}\exp\left(-\frac{1}{2(1-\rho_{\ln}^2)}\left(\left(\frac{\ln x_1 - \lambda_1}{\zeta_1}\right)^2 + \left(\frac{\ln x_2 - \lambda_2}{\zeta_2}\right)^2 - 2\rho_{\ln}\left(\frac{\ln x_1 - \lambda_1}{\zeta_1}\right)\left(\frac{\ln x_2 - \lambda_2}{\zeta_2}\right)\right)\right).$$
Figure 4.10 presents an example of bivariate log-normal PDF with parameters $\mu_1 = \mu_2 = 1.5$, $\sigma_1 = \sigma_2 = 0.5$, and $\rho = 0.9$. The general formulation for the multivariate log-normal PDF is
$$f_{\boldsymbol{X}}(\mathbf{x}) = \ln\mathcal{N}(\mathbf{x}; \boldsymbol{\mu}_{\ln X}, \boldsymbol{\Sigma}_{\ln X}) = \frac{1}{\left(\prod_{i=1}^{n} x_i\right)(2\pi)^{n/2}(\det \boldsymbol{\Sigma}_{\ln X})^{1/2}}\exp\left(-\frac{1}{2}(\ln \mathbf{x} - \boldsymbol{\mu}_{\ln X})^\intercal \boldsymbol{\Sigma}_{\ln X}^{-1}(\ln \mathbf{x} - \boldsymbol{\mu}_{\ln X})\right),$$
where $\boldsymbol{\mu}_{\ln X}$ and $\boldsymbol{\Sigma}_{\ln X}$ are respectively the mean vector and covariance matrix defined in the log-space.
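As a small sketch (assuming NumPy), the relation between original-space and log-space correlation coefficients can be checked directly; note how close the two are for small coefficients of variation:

```python
import numpy as np

def rho_log(rho, delta_i, delta_j):
    """Log-space correlation from original-space correlation and c.o.v."""
    zeta_i = np.sqrt(np.log(1 + delta_i**2))
    zeta_j = np.sqrt(np.log(1 + delta_j**2))
    return np.log(1 + rho * delta_i * delta_j) / (zeta_i * zeta_j)

print(rho_log(0.9, 0.1, 0.1))  # ~0.90: nearly unchanged for delta <= 0.3
print(rho_log(0.9, 1.0, 1.0))  # ~0.93: deviates for large delta
```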
[Figure 4.10: Example of bivariate log-normal probability density function for $\mu_1 = \mu_2 = 1.5$, $\sigma_1 = \sigma_2 = 0.5$, and $\rho = 0.9$.]
4.2.3 Properties
Because the log-normal distribution is obtained through a transformation of the Normal distribution, it inherits several of its properties.

1. It is completely defined by its mean vector $\boldsymbol{\mu}_{\ln X}$ and covariance matrix $\boldsymbol{\Sigma}_{\ln X}$.
2. Its marginal distributions are also log-normal, and the PDF of any marginal is given by $x_i : X_i \sim \ln\mathcal{N}(x_i; [\boldsymbol{\mu}_{\ln X}]_i, [\boldsymbol{\Sigma}_{\ln X}]_{ii})$.
3. The absence of correlation implies statistical independence. Remember that this is not generally true for other types of random variables (see §3.3.5),
$$\rho_{ij} = 0 \Rightarrow X_i \perp\!\!\!\perp X_j.$$
4. Conditional distributions are log-normal, so the PDF of $\boldsymbol{X}_i$ given an observation $\boldsymbol{X}_j = \mathbf{x}_j$ is given by
$$f_{\boldsymbol{X}_i|\mathbf{x}_j}(\mathbf{x}_i|\underbrace{\boldsymbol{X}_j = \mathbf{x}_j}_{\text{observations}}) = \ln\mathcal{N}(\mathbf{x}_i; \boldsymbol{\mu}_{\ln i|j}, \boldsymbol{\Sigma}_{\ln i|j}),$$
where the conditional mean vector and covariance are
$$\begin{aligned} \boldsymbol{\mu}_{\ln i|j} &= \boldsymbol{\mu}_{\ln i} + \boldsymbol{\Sigma}_{\ln ij}\boldsymbol{\Sigma}_{\ln j}^{-1}(\ln \mathbf{x}_j - \boldsymbol{\mu}_{\ln j}) \\ \boldsymbol{\Sigma}_{\ln i|j} &= \boldsymbol{\Sigma}_{\ln i} - \boldsymbol{\Sigma}_{\ln ij}\boldsymbol{\Sigma}_{\ln j}^{-1}\boldsymbol{\Sigma}_{\ln ij}^\intercal. \end{aligned} \tag{4.5}$$
5. The multiplication of jointly log-normal random variables is jointly log-normal, so that for $X \sim \ln\mathcal{N}(x; \lambda_X, \zeta_X)$ and $Y \sim \ln\mathcal{N}(y; \lambda_Y, \zeta_Y)$, where $X \perp\!\!\!\perp Y$,
$$Z = X \cdot Y \sim \ln\mathcal{N}(z; \lambda_Z, \zeta_Z), \quad \lambda_Z = \lambda_X + \lambda_Y, \quad \zeta_Z^2 = \zeta_X^2 + \zeta_Y^2.$$
Because the product of log-normal random variables can be transformed in the sum of Normal random variables, the properties of the central limit theorem presented in §4.1.3 still hold.
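A Monte Carlo sketch (assuming NumPy) can confirm this closure under multiplication for independent log-normal variables:

```python
import numpy as np

rng = np.random.default_rng(1)

# Independent log-normals defined by their log-space parameters
lam_X, zeta_X = 0.5, 0.3
lam_Y, zeta_Y = 1.0, 0.4
X = rng.lognormal(lam_X, zeta_X, size=1_000_000)
Y = rng.lognormal(lam_Y, zeta_Y, size=1_000_000)

# Z = X*Y is log-normal with lam_Z = lam_X + lam_Y, zeta_Z^2 = zeta_X^2 + zeta_Y^2
Z = X * Y
print(np.log(Z).mean(), lam_X + lam_Y)                  # ~1.5
print(np.log(Z).std(), np.sqrt(zeta_X**2 + zeta_Y**2))  # ~0.5
```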
4.3 Beta Distribution

Beta: $x \in (0, 1) : X \sim \mathcal{B}(x; \alpha, \beta)$, with $\mathbb{E}[X] = \frac{\alpha}{\alpha+\beta}$ and $\mathrm{var}[X] = \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$

The Beta distribution is defined over the interval $(0, 1)$. It can be scaled by the transformation $x' = x \cdot (b - a) + a$ to model bounded quantities within any range $(a, b)$. The Beta probability density function (PDF) is defined by
$$f_X(x) = \mathcal{B}(x; \alpha, \beta) = \frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha, \beta)}, \quad \alpha > 0,\ \beta > 0,$$
where $\alpha$ and $\beta$ are the two distribution parameters, and the Beta function $B(\alpha, \beta)$ is the normalization constant so that
$$B(\alpha, \beta) = \int_0^1 x^{\alpha-1}(1-x)^{\beta-1}\,dx.$$
A common application of the Beta PDF is to employ the interval $(0, 1)$ to model the probability density of a probability itself. Let us consider two mutually exclusive and collectively exhaustive events, for example, any event $A$ and its complement $\overline{A}$, $S = \{A, \overline{A}\}$. If the probability that the event $A$ occurs is uncertain, it can be described by a random variable so that
$$\Pr(A) = X, \quad \Pr(\overline{A}) = 1 - X,$$
where $x \in (0, 1) : X \sim \mathcal{B}(x; \alpha, \beta)$. The parameter $\alpha$ can be interpreted as pseudo-counts representing the number of observations of the event $A$, and $\beta$ is the number of observations of the complementary event $\overline{A}$. This relation between pseudo-counts and the Beta distribution, as well as practical applications, are further detailed in chapter 6. Figure 4.11 presents examples of Beta PDFs for three sets of parameters. Note how for $\alpha = \beta = 1$, the Beta distribution is analogous to the Uniform distribution $\mathcal{U}(x; 0, 1)$.
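The sketch below (assuming NumPy and SciPy; the three parameter sets are illustrative choices, not necessarily those plotted in figure 4.11) evaluates Beta PDFs and checks the uniform special case:

```python
import numpy as np
from scipy import stats

x = np.linspace(0.01, 0.99, 5)

# alpha = beta = 1 recovers the Uniform distribution on (0, 1)
print(stats.beta.pdf(x, a=1, b=1))  # all ones

# Illustrative pseudo-count settings: alpha observations of A,
# beta observations of its complement
for a, b in [(2, 2), (5, 2), (0.5, 0.5)]:
    d = stats.beta(a, b)
    print(a, b, d.mean(), a / (a + b))  # mean matches alpha/(alpha+beta)
```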
[Figure 4.11: Three examples of the Beta probability density function evaluated for different sets of parameters $\{\alpha, \beta\}$.]