# Information Theory and Beta-Binomial Bayesian Model

### You've heard of the Hello World example, but Bayesian statistics puts a twist on it with the beta-binomial model. Come learn how it works!

The beta-binomial model is the "Hello World" example of Bayesian statistics. I would call it a toy model, except it is actually useful. It's not nearly as complicated as most models used in application but it illustrates the basics of Bayesian inference. Because it's a conjugate model, the calculations work out trivially.

For more on the beta-binomial model itself, see *A Bayesian View of Amazon Resellers* and *Functional Folds and Conjugate Models*.

I mentioned in a recent post that the Kullback-Leibler divergence from the prior distribution to the posterior distribution is a measure of how much information was gained.
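Concretely, if *p* is the posterior density and *q* is the prior density, the information gained, measured in bits, is

$$D(p \,\|\, q) = \int_0^1 p(x) \log_2 \frac{p(x)}{q(x)} \, dx.$$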

Here's a little Python code for computing this. Enter the *a* and *b* parameters of the prior and the posterior to compute how much information was gained.

```
from scipy.integrate import quad
from scipy.stats import beta
from numpy import log2

def infogain(post_a, post_b, prior_a, prior_b):
    # Information gained, in bits, in moving from the prior beta(prior_a, prior_b)
    # to the posterior beta(post_a, post_b), computed by numerical integration.
    p = beta(post_a, post_b).pdf
    q = beta(prior_a, prior_b).pdf
    (info, error) = quad(lambda x: p(x) * log2(p(x) / q(x)), 0, 1)
    return info
```

This code works well for medium-sized inputs. It has problems with large inputs because the generic integration routine `quad` needs some help when the beta distributions become more concentrated.
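
One workaround, not from the original post, is the closed-form expression for the Kullback-Leibler divergence between two beta distributions, which needs only the log-beta and digamma functions rather than numerical integration. The sketch below is mine (the name `infogain_closed_form` is made up); it assumes SciPy's `betaln` and `digamma`.

```
from math import log

from scipy.special import betaln, digamma

def infogain_closed_form(post_a, post_b, prior_a, prior_b):
    # Closed-form KL divergence, in bits, from the prior beta(prior_a, prior_b)
    # to the posterior beta(post_a, post_b). No numerical integration is needed,
    # so it stays stable even when the distributions are highly concentrated.
    nats = (betaln(prior_a, prior_b) - betaln(post_a, post_b)
            + (post_a - prior_a) * digamma(post_a)
            + (post_b - prior_b) * digamma(post_b)
            + (prior_a - post_a + prior_b - post_b) * digamma(post_a + post_b))
    return nats / log(2)  # convert nats to bits
```

For moderate parameters this agrees with the `quad`-based version above, and it keeps working when the parameters run into the hundreds or thousands.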

You can see that surprising input carries more information. For example, suppose your prior is `beta(3, 7)`. This distribution has a mean of 3/(3+7) = 0.3, so you're expecting more failures than successes. With such a prior, a success changes your mind more than a failure does: because the model is conjugate, observing a success updates the prior to `beta(4, 7)`, while observing a failure updates it to `beta(3, 8)`. You can quantify the difference by running these two calculations.

```
print( infogain(4, 7, 3, 7) )
print( infogain(3, 8, 3, 7) )
```

The first line shows that a success would change your information by 0.1563 bits, while the second shows that a failure would change it by 0.0297 bits.
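
As a quick sanity check (not in the original post), the closed-form version sketched above can reproduce the same two numbers, and it also handles much more concentrated distributions where `quad` would need help; the third call below uses made-up counts of a few hundred observations purely for illustration.

```
print( infogain_closed_form(4, 7, 3, 7) )      # information gained by one success
print( infogain_closed_form(3, 8, 3, 7) )      # information gained by one failure
print( infogain_closed_form(303, 707, 3, 7) )  # hypothetical: 300 successes, 700 failures
```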

Published at DZone with permission of John Cook, DZone MVB. See the original article here.
