Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

R: Think Bayes – More Posterior Probability Calculations

DZone's Guide to

R: Think Bayes – More Posterior Probability Calculations

· Performance Zone
Free Resource

Evolve your approach to Application Performance Monitoring by adopting five best practices that are outlined and explored in this e-book, brought to you in partnership with BMC.

As I mentioned in a post recently, I’ve been reading through Think Bayes and translating some of the examples form Python to R.

After my first post Antonios suggested a more idiomatic way of writing the function in R so I thought I’d give it a try to calculate the probability that combinations of cookies had come from each bowl.

In the simplest case we have this function which takes in the names of the bowls and the likelihood scores:

f = function(names,likelihoods) {
  # Assume each option has an equal prior
  priors = rep(1, length(names)) / length(names)
 
  # create a data frame with all info you have
  dt = data.frame(names,priors,likelihoods)
 
  # calculate posterior probabilities
  dt$post = dt$priors*dt$likelihoods / sum(dt$priors*dt$likelihoods)
 
  # specify what you want the function to return
  list(names=dt$names, priors=dt$priors, likelihoods=dt$likelihoods, posteriors=dt$post)  
}

We assume a prior probability of 0.5 for each bowl.

Given the following probabilities of of different cookies being in each bowl…

mixes = {
  'Bowl 1':dict(vanilla=0.75, chocolate=0.25),
  'Bowl 2':dict(vanilla=0.5, chocolate=0.5),
}

…we can simulate taking one vanilla cookie with the following parameters:

Likelihoods = c(0.75,0.5)
Names = c("Bowl 1", "Bowl 2")
res=f(Names,Likelihoods)
 
> res$posteriors[res$name == "Bowl 1"]
[1] 0.6
> res$posteriors[res$name == "Bowl 2"]
[1] 0.4

If we want to simulate taking 3 vanilla cookies and 1 chocolate one we’d have the following:

Likelihoods = c((0.75 ** 3) * (0.25 ** 1), (0.5 ** 3) * (0.5 ** 1))
Names = c("Bowl 1", "Bowl 2")
res=f(Names,Likelihoods)
 
> res$posteriors[res$name == "Bowl 1"]
[1] 0.627907
> res$posteriors[res$name == "Bowl 2"]
[1] 0.372093

That’s a bit clunky and the intent of ‘3 vanilla cookies and 1 chocolate’ has been lost. I decided to refactor the code to take in a vector of cookies and calculate the likelihoods internally.

First we need to create a data structure to store the mixes of cookies in each bowl that we defined above. It turns out we can do this using a nested list:

bowl1Mix = c(0.75, 0.25)
names(bowl1Mix) = c("vanilla", "chocolate")
bowl2Mix = c(0.5, 0.5)
names(bowl2Mix) = c("vanilla", "chocolate")
Mixes = list("Bowl 1" = bowl1Mix, "Bowl 2" = bowl2Mix)
 
> Mixes
$`Bowl 1`
  vanilla chocolate 
     0.75      0.25 
 
$`Bowl 2`
  vanilla chocolate 
      0.5       0.5

Now let’s tweak our function to take in observations rather than likelihoods and then calculate those likelihoods internally:

likelihoods = function(names, observations) {
  scores = c(1,1)
  names(scores) = names
 
  for(name in names) {
      for(observation in observations) {
        scores[name] = scores[name] *  mixes[[name]][observation]      
      }
    }  
  return(scores)
}
 
f = function(names,mixes,observations) {
  # Assume each option has an equal prior
  priors = rep(1, length(names)) / length(names)
 
  # create a data frame with all info you have
  dt = data.frame(names,priors)
 
  dt$likelihoods = likelihoods(Names, Observations)
 
  # calculate posterior probabilities
  dt$post = dt$priors*dt$likelihoods / sum(dt$priors*dt$likelihoods)
 
  # specify what you want the function to return
  list(names=dt$names, priors=dt$priors, likelihoods=dt$likelihoods, posteriors=dt$post)  
}

And if we call that function:

Names = c("Bowl 1", "Bowl 2")
 
bowl1Mix = c(0.75, 0.25)
names(bowl1Mix) = c("vanilla", "chocolate")
bowl2Mix = c(0.5, 0.5)
names(bowl2Mix) = c("vanilla", "chocolate")
Mixes = list("Bowl 1" = bowl1Mix, "Bowl 2" = bowl2Mix)
Mixes
 
Observations = c("vanilla", "vanilla", "vanilla", "chocolate")
 
res=f(Names,Mixes,Observations)
 
> res$posteriors[res$names == "Bowl 1"]
[1] 0.627907
 
> res$posteriors[res$names == "Bowl 2"]
[1] 0.372093

Exactly the same result as before! #win

Evolve your approach to Application Performance Monitoring by adopting five best practices that are outlined and explored in this e-book, brought to you in partnership with BMC.

Topics:

Published at DZone with permission of Mark Needham, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

THE DZONE NEWSLETTER

Dev Resources & Solutions Straight to Your Inbox

Thanks for subscribing!

Awesome! Check your inbox to verify your email so you can start receiving the latest in tech news and resources.

X

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}