Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

DZone's Guide to

# R: Think Bayes – More Posterior Probability Calculations

· Performance Zone
Free Resource

Comment (0)

Save
{{ articles[0].views | formatCount}} Views

Evolve your approach to Application Performance Monitoring by adopting five best practices that are outlined and explored in this e-book, brought to you in partnership with BMC.

As I mentioned in a post recently, I’ve been reading through Think Bayes and translating some of the examples form Python to R.

After my first post Antonios suggested a more idiomatic way of writing the function in R so I thought I’d give it a try to calculate the probability that combinations of cookies had come from each bowl.

In the simplest case we have this function which takes in the names of the bowls and the likelihood scores:

```f = function(names,likelihoods) {
# Assume each option has an equal prior
priors = rep(1, length(names)) / length(names)

# create a data frame with all info you have
dt = data.frame(names,priors,likelihoods)

# calculate posterior probabilities
dt\$post = dt\$priors*dt\$likelihoods / sum(dt\$priors*dt\$likelihoods)

# specify what you want the function to return
list(names=dt\$names, priors=dt\$priors, likelihoods=dt\$likelihoods, posteriors=dt\$post)
}```

We assume a prior probability of 0.5 for each bowl.

Given the following probabilities of of different cookies being in each bowl…

```mixes = {
'Bowl 1':dict(vanilla=0.75, chocolate=0.25),
'Bowl 2':dict(vanilla=0.5, chocolate=0.5),
}```

…we can simulate taking one vanilla cookie with the following parameters:

```Likelihoods = c(0.75,0.5)
Names = c("Bowl 1", "Bowl 2")
res=f(Names,Likelihoods)

> res\$posteriors[res\$name == "Bowl 1"]
[1] 0.6
> res\$posteriors[res\$name == "Bowl 2"]
[1] 0.4```

If we want to simulate taking 3 vanilla cookies and 1 chocolate one we’d have the following:

```Likelihoods = c((0.75 ** 3) * (0.25 ** 1), (0.5 ** 3) * (0.5 ** 1))
Names = c("Bowl 1", "Bowl 2")
res=f(Names,Likelihoods)

> res\$posteriors[res\$name == "Bowl 1"]
[1] 0.627907
> res\$posteriors[res\$name == "Bowl 2"]
[1] 0.372093```

That’s a bit clunky and the intent of ‘3 vanilla cookies and 1 chocolate’ has been lost. I decided to refactor the code to take in a vector of cookies and calculate the likelihoods internally.

First we need to create a data structure to store the mixes of cookies in each bowl that we defined above. It turns out we can do this using a nested list:

```bowl1Mix = c(0.75, 0.25)
names(bowl1Mix) = c("vanilla", "chocolate")
bowl2Mix = c(0.5, 0.5)
names(bowl2Mix) = c("vanilla", "chocolate")
Mixes = list("Bowl 1" = bowl1Mix, "Bowl 2" = bowl2Mix)

> Mixes
\$`Bowl 1`
vanilla chocolate
0.75      0.25

\$`Bowl 2`
vanilla chocolate
0.5       0.5```

Now let’s tweak our function to take in observations rather than likelihoods and then calculate those likelihoods internally:

```likelihoods = function(names, observations) {
scores = c(1,1)
names(scores) = names

for(name in names) {
for(observation in observations) {
scores[name] = scores[name] *  mixes[[name]][observation]
}
}
return(scores)
}

f = function(names,mixes,observations) {
# Assume each option has an equal prior
priors = rep(1, length(names)) / length(names)

# create a data frame with all info you have
dt = data.frame(names,priors)

dt\$likelihoods = likelihoods(Names, Observations)

# calculate posterior probabilities
dt\$post = dt\$priors*dt\$likelihoods / sum(dt\$priors*dt\$likelihoods)

# specify what you want the function to return
list(names=dt\$names, priors=dt\$priors, likelihoods=dt\$likelihoods, posteriors=dt\$post)
}```

And if we call that function:

```Names = c("Bowl 1", "Bowl 2")

bowl1Mix = c(0.75, 0.25)
names(bowl1Mix) = c("vanilla", "chocolate")
bowl2Mix = c(0.5, 0.5)
names(bowl2Mix) = c("vanilla", "chocolate")
Mixes = list("Bowl 1" = bowl1Mix, "Bowl 2" = bowl2Mix)
Mixes

Observations = c("vanilla", "vanilla", "vanilla", "chocolate")

res=f(Names,Mixes,Observations)

> res\$posteriors[res\$names == "Bowl 1"]
[1] 0.627907

> res\$posteriors[res\$names == "Bowl 2"]
[1] 0.372093```

Exactly the same result as before! #win

Evolve your approach to Application Performance Monitoring by adopting five best practices that are outlined and explored in this e-book, brought to you in partnership with BMC.

Topics:

Comment (0)

Save
{{ articles[0].views | formatCount}} Views

Published at DZone with permission of Mark Needham, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

### {{ parent.tldr }}

{{ parent.urlSource.name }}