j.-a. goulet 60
In the case where $y$ is known and $X$ is not, we can reorganize the terms of equation 6.1 in order to obtain Bayes rule,
$$f(x|y) = \frac{f(y|x)\cdot f(x)}{f(y)},$$
which describes the posterior PDF (i.e., our posterior knowledge) of $X$ given that we have observed $y$. Let us consider $X$, a random variable so that $X \sim f(x)$, and given a set of observations $\mathcal{D} = \{y_1, y_2, \cdots, y_{\mathtt{D}}\}$
that are realizations of the random variables $\mathbf{Y} = [Y_1\ Y_2\ \cdots\ Y_{\mathtt{D}}]^{\intercal} \sim f(\mathbf{y})$. Our posterior knowledge for $X$ given the observations $\mathcal{D}$ is described by the conditional PDF
$$\underbrace{f(x|\mathcal{D})}_{\text{posterior}} = \frac{\overbrace{f(\mathcal{D}|x)}^{\text{likelihood}} \cdot \overbrace{f(x)}^{\text{prior}}}{\underbrace{f(\mathcal{D})}_{\text{evidence}}}.$$

Note: From now on, the number of elements in a set or the number of variables in a vector is defined by a typewriter-font upper-case letter; for example, $\mathtt{D}$ is the number of observations in a set $\mathcal{D}$, and $\mathtt{X}$ is the number of variables in the vector $\mathbf{x} = [x_1\ x_2\ \cdots\ x_{\mathtt{X}}]^{\intercal}$.

Note: A set of observations $\mathcal{D} = \{y_1, y_2, \cdots, y_{\mathtt{D}}\}$; e.g., $\mathcal{D} = \{1.2, 3.8, \cdots, 0.22\}$ $(y \in \mathbb{R})$; $\mathcal{D} = \{3, 1, \cdots, 6\}$ $(y \in \mathbb{Z}^{+})$; $\mathcal{D} = \{\text{blue}, \text{blue}, \cdots, \text{red}\}$ $(y \in \{\text{blue}, \text{red}, \text{green}\})$.
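To make the roles of the four terms concrete, the posterior can be evaluated numerically on a grid of candidate values for $x$. The setup below is an illustrative sketch, not from the text: it assumes a Gaussian prior, Gaussian observation noise with a known standard deviation, and conditionally independent observations.

```python
import numpy as np

# Hypothetical setup: X is a scalar with a Gaussian prior, and each
# observation y_i is assumed to be x plus Gaussian noise (std known).
x = np.linspace(-5.0, 5.0, 1001)   # grid of candidate values for x
dx = x[1] - x[0]                   # grid spacing, for numerical integration

def gaussian_pdf(v, mean, std):
    return np.exp(-0.5 * ((v - mean) / std) ** 2) / (std * np.sqrt(2 * np.pi))

prior = gaussian_pdf(x, mean=0.0, std=2.0)   # f(x), assumed N(0, 2^2)
D = [1.2, 3.8, 0.22]                         # observations, margin-note style

# f(D|x) = prod_i f(y_i|x), assuming conditionally independent observations
likelihood = np.ones_like(x)
for y in D:
    likelihood *= gaussian_pdf(y, mean=x, std=1.0)

evidence = np.sum(likelihood * prior) * dx   # f(D), a normalization constant
posterior = likelihood * prior / evidence    # f(x|D), Bayes rule

print(np.sum(posterior) * dx)   # ≈ 1: the posterior integrates to one
```

Because the evidence is computed from the same grid sum, the posterior is normalized by construction; the grid only needs to be wide enough to cover the region where the posterior has appreciable mass.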
Prior
$f(x)$ describes our prior knowledge for the values $x$ that a random variable $X$ can take. The prior knowledge can be expressed in multiple ways. For instance, it can be based on heuristics such as expert opinions. In the case where data is obtained sequentially, the posterior knowledge at time $t-1$ becomes the prior at time $t$. In some cases, it also happens that no prior knowledge is available; then, we should employ a non-informative prior, that is, a prior that reflects an absence of knowledge (see §6.4.1 for further details on non-informative priors).

Note: A uniform prior and a non-informative prior are not the same thing. Some non-informative priors are uniform, but not all uniform priors are non-informative.
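The sequential use of the prior can be sketched numerically: updating one observation at a time, with each posterior reused as the next prior, gives the same result as a single batch update on all the data. The Gaussian prior and noise model here are illustrative assumptions, not from the text.

```python
import numpy as np

x = np.linspace(-5.0, 5.0, 1001)   # grid of candidate values for x
dx = x[1] - x[0]

def gaussian_pdf(v, mean, std):
    return np.exp(-0.5 * ((v - mean) / std) ** 2) / (std * np.sqrt(2 * np.pi))

def update(prior_pdf, y, std=1.0):
    """One Bayesian update on the grid: posterior ∝ f(y|x) · prior."""
    post = gaussian_pdf(y, mean=x, std=std) * prior_pdf
    return post / (np.sum(post) * dx)

prior = gaussian_pdf(x, mean=0.0, std=2.0)   # assumed N(0, 2^2) prior

# Sequential: the posterior at time t-1 becomes the prior at time t.
p = prior
for y in [1.2, 3.8]:
    p = update(p, y)

# Batch: a single update using the product of both likelihoods.
batch = gaussian_pdf(1.2, mean=x, std=1.0) * gaussian_pdf(3.8, mean=x, std=1.0) * prior
batch /= np.sum(batch) * dx

print(np.max(np.abs(p - batch)))   # ≈ 0: both routes give the same posterior
```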
Likelihood
$f(\mathbf{Y} = \mathbf{y}|x) \equiv f(\mathcal{D}|x)$ describes the likelihood, or the conditional probability density, of observing the event $\{\mathbf{Y} = \mathcal{D}\}$, given the values that $x$ can take. Note that in the special case of an exact observation where $\mathcal{D} = y = x$, then
$$f(\mathcal{D}|x) = \begin{cases} 1, & y = x \\ 0, & \text{otherwise.} \end{cases}$$
In such a case, the observations are exact so the prior does not play a role; by observing $y$ you have all the information about $x$ because $y = x$. In the general case where $y: Y = \mathrm{fct}(x)$, that is, $y$ is a realization from a stochastic process which is a function of $x$, $f(\mathcal{D}|x)$ describes the prior probability density of observing $\mathcal{D}$ given a specific set of values for $x$.

Note: Here, the term prior refers to our state of knowledge that has not yet been influenced by the observations in $\mathcal{D}$.
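Under the common additional assumption that the observations are conditionally independent given $x$, the likelihood of a whole set factorizes as $f(\mathcal{D}|x) = \prod_i f(y_i|x)$. Below is a minimal sketch for the categorical example from the margin note; the category probabilities held in `x` are made up for illustration.

```python
# Hypothetical categorical likelihood f(D|x): here x holds the assumed
# probability of each category, and the observations are taken to be
# conditionally independent given x, so f(D|x) is a product over D.
def likelihood(D, x):
    p = 1.0
    for y in D:
        p *= x[y]   # probability of observing category y given x
    return p

D = ["blue", "blue", "red"]                      # margin-note style data
x = {"blue": 0.5, "red": 0.3, "green": 0.2}      # assumed category probabilities

print(likelihood(D, x))   # 0.5 * 0.5 * 0.3 = 0.075
```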
Evidence
$f(\mathbf{Y} = \mathbf{y}) \equiv f(\mathcal{D})$ is called the evidence and consists of a normalization constant ensuring that the posterior PDF integrates