Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

DZone's Guide to

# R: Think Bayes Euro Problem

· Performance Zone ·
Free Resource

Comment (1)

Save
{{ articles[0].views | formatCount}} Views

Learn how error monitoring with Sentry closes the gap between the product team and your customers. With Sentry, you can focus on what you do best: building and scaling software that makes your users’ lives better.

I’ve got back to working my way through Think Bayes after a month’s break and started out with the one euro coin problem in Chapter 4:

A statistical statement appeared in “The Guardian” on Friday January 4, 2002:

When spun on edge 250 times, a Belgian one-euro coin came up heads 140 times and tails 110. ‘It looks very suspicious to me,’ said Barry Blight, a statistics lecturer at the London School of Economics. ‘If the coin were unbiased, the chance of getting a result as extreme as that would be less than 7%.’

But do these data give evidence that the coin is biased rather than fair?

We’re going to create a data frame with each row representing the probability that heads shows up that often. We need one row for each value between 0 (no heads) and 100 (all heads) and we’ll start with the assumption that each value can be chosen equally (a uniform prior):

```library(dplyr)

values = seq(0, 100)
scores = rep(1.0 / length(values), length(values))
df = data.frame(score = scores, value = values)

> df %>% sample_n(10)
score value
60  0.00990099    59
101 0.00990099   100
10  0.00990099     9
41  0.00990099    40
2   0.00990099     1
83  0.00990099    82
44  0.00990099    43
97  0.00990099    96
100 0.00990099    99
12  0.00990099    11```

Now we need to feed in our observations. We need to create a vector containing 140 heads and 110 tails. The ‘rep’ function comes in handy here:

```observations = c(rep("T", times = 110), rep("H", times = 140))
> observations
[1] "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T"
[29] "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T"
[57] "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T"
[85] "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "H" "H"
[113] "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H"
[141] "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H"
[169] "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H"
[197] "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H"
[225] "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H" "H"
```

Now we need to iterate over each of the observations and update our data frame appropriately.

```for(observation in observations) {
if(observation == "H") {
df = df %>% mutate(score = score * (value / 100.0))
} else {
df = df %>% mutate(score = score * (1.0 - (value / 100.0)))
}
}

df = df %>% mutate(weighted = score / sum(score))
```

Now that we’ve done that we can calculate the maximum likelihood, mean, median and credible interval. We’ll create a ‘percentile’ function to help us out:

```percentile = function(df, p) {
df %>% filter(cumsum(weighted) > p) %>% head(1) %>% select(value) %>% as.numeric
}```

And now let’s calculate the values:

```# Maximum likelihood
> df %>% filter(weighted == max(weighted)) %>% select(value) %>% as.numeric
[1] 56

# Mean
> df %>% mutate(mean = value * weighted) %>% select(mean) %>% sum
[1] 55.95238

# Median
> percentile(df, 0.5)
[1] 56

# Credible Interval
percentage = 90
prob = (1 - percentage / 100.0) / 2

# lower
> percentile(df, prob)
[1] 51

# upper
> percentile(df, 1 - prob)
[1] 61
```

This all wraps up nicely into a function:

```euro = function(values, priors, observations) {
df = data.frame(score = priors, value = values)

for(observation in observations) {
if(observation == "H") {
df = df %>% mutate(score = score * (value / 100.0))
} else {
df = df %>% mutate(score = score * (1.0 - (value / 100.0)))
}
}

return(df %>% mutate(weighted = score / sum(score)))
}
```

which we can call like so:

```values = seq(0,100)
priors = rep(1.0 / length(values), length(values))
observations = c(rep("T", times = 110), rep("H", times = 140))
df = euro(values, priors, observations)```

The next part of the problem requires us to change the prior distribution to be more weighted to values close to 50%. We can tweak the parameters we pass into the function accordingly:

```values = seq(0,100)
priors = sapply(values, function(x) ifelse(x < 50, x, 100 - x))
priors = priors / sum(priors)
observations = c(rep("T", times = 110), rep("H", times = 140))
df = euro(values, priors, observations)```

In fact even with the adjusted priors we still end up with the same posterior distribution:

```> df %>% filter(weighted == max(weighted)) %>% select(value) %>% as.numeric
[1] 56

> df %>% mutate(mean = value * weighted) %>% select(mean) %>% sum
[1] 55.7435

> percentile(df, 0.5)
[1] 56

> percentile(df, 0.05)
[1] 51

> percentile(df, 0.95)
[1] 61```

What’s the best way to boost the efficiency of your product team and ship with confidence? Check out this ebook to learn how Sentry's real-time error monitoring helps developers stay in their workflow to fix bugs before the user even knows there’s a problem.

Topics:

Comment (1)

Save
{{ articles[0].views | formatCount}} Views

Published at DZone with permission of

Opinions expressed by DZone contributors are their own.

# {{ parent.title || parent.header.title}}

### {{ parent.tldr }}

{{ parent.urlSource.name }}