# Information Theory and Beta-Binomial Bayesian Model

### You've heard of the Hello World example, but Bayesian statistics puts a twist on it with the beta-binomial model. Come learn how it works!


The beta-binomial model is the "Hello World" example of Bayesian statistics. I would call it a toy model, except it is actually useful. It's not nearly as complicated as most models used in applications, but it illustrates the basics of Bayesian inference. Because it's a conjugate model, the calculations work out trivially.

For more on the beta-binomial model itself, see *A Bayesian View of Amazon Resellers* and *Functional Folds and Conjugate Models*.

I mentioned in a recent post that the Kullback-Leibler divergence from the prior distribution to the posterior distribution is a measure of how much information was gained.
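Concretely, if *q* is the prior density and *p* is the posterior density on [0, 1], the information gained, measured in bits, is the standard Kullback-Leibler integral that the code below computes (written here in LaTeX for reference):

```
D(p \,\|\, q) = \int_0^1 p(x) \, \log_2 \frac{p(x)}{q(x)} \, dx
```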

Here's a little Python code for computing this. Enter the *a* and *b* parameters of the prior and the posterior to compute how much information was gained.

```
from scipy.integrate import quad
from scipy.stats import beta
from numpy import log2

def infogain(post_a, post_b, prior_a, prior_b):
    # Kullback-Leibler divergence, in bits, from the prior beta(prior_a, prior_b)
    # to the posterior beta(post_a, post_b), computed by numerical integration.
    p = beta(post_a, post_b).pdf
    q = beta(prior_a, prior_b).pdf
    (info, error) = quad(lambda x: p(x) * log2(p(x) / q(x)), 0, 1)
    return info
```

This code works well for medium-sized inputs. It has problems with large inputs because the generic integration routine `quad` needs some help when the beta distributions become more concentrated.
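One way around that limitation, sketched below, is to use the closed-form expression for the Kullback-Leibler divergence between two beta distributions in terms of the log-beta and digamma functions, which avoids numerical integration entirely. This is not from the original post; the function name `infogain_exact` is just an illustrative choice.

```
from numpy import log
from scipy.special import betaln, digamma

def infogain_exact(post_a, post_b, prior_a, prior_b):
    # Closed-form KL divergence from the prior beta(prior_a, prior_b) to the
    # posterior beta(post_a, post_b), converted from nats to bits.
    # Stays accurate even when the distributions are highly concentrated.
    kl_nats = (betaln(prior_a, prior_b) - betaln(post_a, post_b)
               + (post_a - prior_a) * digamma(post_a)
               + (post_b - prior_b) * digamma(post_b)
               + (prior_a - post_a + prior_b - post_b) * digamma(post_a + post_b))
    return kl_nats / log(2)
```

For the moderate parameter values used in the examples below, this should agree with `infogain` to several decimal places.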

You can see that surprising input carries more information. For example, suppose your prior is `beta(3, 7)`. This distribution has a mean of 0.3, so you're expecting more failures than successes. With such a prior, a success changes your mind more than a failure does. (With a conjugate beta prior, observing a success adds one to *a* and observing a failure adds one to *b*.) You can quantify this by running these two calculations.

```
print( infogain(4, 7, 3, 7) )
print( infogain(3, 8, 3, 7) )
```

The first line shows that a success would change your information by 0.1563 bits, while the second shows that a failure would change it by 0.0297 bits.


Published at DZone with permission of John Cook, DZone MVB. See the original article here.
