# R: Think Bayes Euro Problem

# R: Think Bayes Euro Problem

Join the DZone community and get the full member experience.

Join For FreeLearn how error monitoring with Sentry closes the gap between the product team and your customers. With Sentry, you can focus on what you do best: building and scaling software that makes your users’ lives better.

I’ve got back to working my way through Think Bayes after a month’s break and started out with the one euro coin problem in Chapter 4:

A statistical statement appeared in “The Guardian” on Friday January 4, 2002:

When spun on edge 250 times, a Belgian one-euro coin came up heads 140 times and tails 110. ‘It looks very suspicious to me,’ said Barry Blight, a statistics lecturer at the London School of Economics. ‘If the coin were unbiased, the chance of getting a result as extreme as that would be less than 7%.’

But do these data give evidence that the coin is biased rather than fair?

We’re going to create a data frame with each row representing the probability that heads shows up that often. We need one row for each value between 0 (no heads) and 100 (all heads) and we’ll start with the assumption that each value can be chosen equally (a uniform prior):

library(dplyr) values = seq(0, 100) scores = rep(1.0 / length(values), length(values)) df = data.frame(score = scores, value = values) > df %>% sample_n(10) score value 60 0.00990099 59 101 0.00990099 100 10 0.00990099 9 41 0.00990099 40 2 0.00990099 1 83 0.00990099 82 44 0.00990099 43 97 0.00990099 96 100 0.00990099 99 12 0.00990099 11

Now we need to feed in our observations. We need to create a vector containing 140 heads and 110 tails. The ‘rep’ function comes in handy here:

observations = c(rep("T", times = 110), rep("H", times = 140)) > observations [1] "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" [29] "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" [57] "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" [85] "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "H" "H" [113] "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" [141] "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" [169] "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" [197] "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" [225] "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H"

Now we need to iterate over each of the observations and update our data frame appropriately.

for(observation in observations) { if(observation == "H") { df = df %>% mutate(score = score * (value / 100.0)) } else { df = df %>% mutate(score = score * (1.0 - (value / 100.0))) } } df = df %>% mutate(weighted = score / sum(score))

Now that we’ve done that we can calculate the maximum likelihood, mean, median and credible interval. We’ll create a ‘percentile’ function to help us out:

percentile = function(df, p) { df %>% filter(cumsum(weighted) > p) %>% head(1) %>% select(value) %>% as.numeric }

And now let’s calculate the values:

# Maximum likelihood > df %>% filter(weighted == max(weighted)) %>% select(value) %>% as.numeric [1] 56 # Mean > df %>% mutate(mean = value * weighted) %>% select(mean) %>% sum [1] 55.95238 # Median > percentile(df, 0.5) [1] 56 # Credible Interval percentage = 90 prob = (1 - percentage / 100.0) / 2 # lower > percentile(df, prob) [1] 51 # upper > percentile(df, 1 - prob) [1] 61

This all wraps up nicely into a function:

euro = function(values, priors, observations) { df = data.frame(score = priors, value = values) for(observation in observations) { if(observation == "H") { df = df %>% mutate(score = score * (value / 100.0)) } else { df = df %>% mutate(score = score * (1.0 - (value / 100.0))) } } return(df %>% mutate(weighted = score / sum(score))) }

which we can call like so:

values = seq(0,100) priors = rep(1.0 / length(values), length(values)) observations = c(rep("T", times = 110), rep("H", times = 140)) df = euro(values, priors, observations)

The next part of the problem requires us to change the prior distribution to be more weighted to values close to 50%. We can tweak the parameters we pass into the function accordingly:

values = seq(0,100) priors = sapply(values, function(x) ifelse(x < 50, x, 100 - x)) priors = priors / sum(priors) observations = c(rep("T", times = 110), rep("H", times = 140)) df = euro(values, priors, observations)

In fact even with the adjusted priors we still end up with the same posterior distribution:

> df %>% filter(weighted == max(weighted)) %>% select(value) %>% as.numeric [1] 56 > df %>% mutate(mean = value * weighted) %>% select(mean) %>% sum [1] 55.7435 > percentile(df, 0.5) [1] 56 > percentile(df, 0.05) [1] 51 > percentile(df, 0.95) [1] 61

What’s the best way to boost the efficiency of your product team and ship with confidence? Check out this ebook to learn how Sentry's real-time error monitoring helps developers stay in their workflow to fix bugs before the user even knows there’s a problem.

Published at DZone with permission of Mark Needham , DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

## {{ parent.title || parent.header.title}}

## {{ parent.tldr }}

## {{ parent.linkDescription }}

{{ parent.urlSource.name }}