The beta-binomial model is the "Hello World" example of Bayesian statistics. I would call it a toy model, except it is actually useful. It's not nearly as complicated as most models used in applications, but it illustrates the basics of Bayesian inference. Because it's a conjugate model, the calculations work out trivially.
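
To make "work out trivially" concrete, here is a minimal sketch of the conjugate update: with a `beta(a, b)` prior and binomial data, the posterior is again a beta distribution whose parameters are the prior's plus the observed counts. The helper name `update` is made up for this illustration.

```python
from scipy.stats import beta

def update(prior_a, prior_b, successes, failures):
    """Return the (a, b) parameters of the posterior beta distribution."""
    # Conjugacy: add successes to a and failures to b.
    return prior_a + successes, prior_b + failures

# Start from a beta(3, 7) prior and observe a single success.
post_a, post_b = update(3, 7, successes=1, failures=0)
print(post_a, post_b)               # 4 7
print(beta(post_a, post_b).mean())  # posterior mean, a / (a + b)
```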

For more on the beta-binomial model itself, see A Bayesian View of Amazon Resellers and Functional Folds and Conjugate Models.

I mentioned in a recent post that the Kullback-Leibler divergence from the prior distribution to the posterior distribution is a measure of how much information was gained.

Here's a little Python code for computing this. Enter the *a* and *b* parameters of the prior and the posterior to compute how much information was gained.

```
from numpy import log2
from scipy.integrate import quad
from scipy.stats import beta

def infogain(post_a, post_b, prior_a, prior_b):
    # Densities of the posterior (p) and the prior (q)
    p = beta(post_a, post_b).pdf
    q = beta(prior_a, prior_b).pdf

    # KL divergence in bits, integrated numerically over [0, 1]
    (info, error) = quad(lambda x: p(x) * log2(p(x) / q(x)), 0, 1)
    return info
```

This code works well for medium-sized inputs. It has problems with large inputs because the generic integration routine `quad` needs some help when the beta distributions become more concentrated.
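
One way to sidestep the integration entirely, offered as a sketch rather than as part of the original code, is the closed-form expression for the KL divergence between two beta distributions in terms of the log-beta and digamma functions. The function name `infogain_closed` is made up here. It takes the same arguments as `infogain` and converts from nats to bits at the end.

```python
from math import log
from scipy.special import betaln, digamma

def infogain_closed(post_a, post_b, prior_a, prior_b):
    """KL divergence of beta(post) relative to beta(prior), in bits, no integration."""
    nats = (
        betaln(prior_a, prior_b) - betaln(post_a, post_b)
        + (post_a - prior_a) * digamma(post_a)
        + (post_b - prior_b) * digamma(post_b)
        + (prior_a - post_a + prior_b - post_b) * digamma(post_a + post_b)
    )
    return nats / log(2)  # convert nats to bits

# Small inputs, for comparison with the quad-based version.
print(infogain_closed(4, 7, 3, 7))
print(infogain_closed(3, 8, 3, 7))

# Large, concentrated inputs, where numerical integration tends to struggle.
print(infogain_closed(4000, 7000, 3000, 7000))
```

Because this only evaluates special functions at the parameters, it should not care how concentrated the distributions are.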

You can see that surprising input carries more information. For example, suppose your prior is `beta(3, 7)`. This distribution has a mean of 0.3, so you're expecting more failures than successes. With such a prior, a success changes your mind more than a failure does. You can quantify this by running these two calculations.

```
print( infogain(4, 7, 3, 7) )
print( infogain(3, 8, 3, 7) )
```

The first line shows that a success would change your information by 0.1563 bits, while the second shows that a failure would change it by 0.0297 bits.
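
As a quick sanity check on the "surprise" interpretation, here is a sketch that mirrors the prior. With a `beta(7, 3)` prior the mean is 0.7, so a failure is now the unexpected outcome and should carry more information. The block restates `infogain` (importing `log2` from NumPy) so that it runs on its own.

```python
from numpy import log2
from scipy.integrate import quad
from scipy.stats import beta

def infogain(post_a, post_b, prior_a, prior_b):
    p = beta(post_a, post_b).pdf
    q = beta(prior_a, prior_b).pdf
    info, _ = quad(lambda x: p(x) * log2(p(x) / q(x)), 0, 1)
    return info

# Prior beta(7, 3): successes are expected, failures are surprising.
print(infogain(8, 3, 7, 3))  # one more success (expected outcome)
print(infogain(7, 4, 7, 3))  # one more failure (surprising outcome)
```

By the symmetry of the beta distribution under swapping successes and failures, these two numbers should match the failure and success cases for the `beta(3, 7)` prior, respectively.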
