# Information Theory and Beta-Binomial Bayesian Model

You've heard of the Hello World example, but Bayesian statistics puts a twist on it with the beta-binomial model. Come learn how it works!

The beta-binomial model is the "Hello World" example of Bayesian statistics. I would call it a toy model, except it is actually useful. It's not nearly as complicated as most models used in applications, but it illustrates the basics of Bayesian inference. Because it's a conjugate model, the calculations work out trivially.
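To make the conjugacy concrete: with a beta(a, b) prior and binomial data, the posterior is again a beta distribution whose parameters are the prior's parameters plus the observed counts. Here is a minimal sketch of that update. The helper name `update` is hypothetical and chosen only for illustration.

```python
from scipy.stats import beta

def update(prior_a, prior_b, successes, failures):
    """Return the posterior (a, b) for a beta prior after binomial data."""
    # Conjugacy: add successes to a and failures to b.
    return prior_a + successes, prior_b + failures

# A beta(3, 7) prior followed by one observed success.
post_a, post_b = update(3, 7, successes=1, failures=0)
print(post_a, post_b)                # 4 7
print(beta(post_a, post_b).mean())   # posterior mean, 4/11
```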

For more on the beta-binomial model itself, see "A Bayesian View of Amazon Resellers" and "Functional Folds and Conjugate Models."

I mentioned in a recent post that the Kullback-Leibler divergence from the prior distribution to the posterior distribution is a measure of how much information was gained.

Here's a little Python code for computing this. Enter the *a* and *b* parameters of the prior and the posterior to compute how much information was gained.

```
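from numpy import log2  # the top-level scipy.log2 alias is deprecated
from scipy.integrate import quad
from scipy.stats import beta

def infogain(post_a, post_b, prior_a, prior_b):
    # Densities of the posterior (p) and the prior (q)
    p = beta(post_a, post_b).pdf
    q = beta(prior_a, prior_b).pdf
    # KL divergence in bits, integrated numerically over [0, 1]
    (info, error) = quad(lambda x: p(x) * log2(p(x) / q(x)), 0, 1)
    return info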
```

This code works well for medium-sized inputs. It has problems with large inputs because the generic integration routine `quad` needs some help when the beta distributions become more concentrated.
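One way around the integration problem, offered here as a sketch and not as the method used in the original post, is to skip numerical integration entirely. The Kullback-Leibler divergence between two beta distributions has a standard closed form in terms of the log-beta and digamma functions. The function name `infogain_closed` is hypothetical; `betaln` and `digamma` come from `scipy.special`.

```python
from math import log
from scipy.special import betaln, digamma

def infogain_closed(post_a, post_b, prior_a, prior_b):
    """KL divergence from prior to posterior, in bits, with no integration."""
    nats = (
        betaln(prior_a, prior_b) - betaln(post_a, post_b)
        + (post_a - prior_a) * digamma(post_a)
        + (post_b - prior_b) * digamma(post_b)
        + (prior_a - post_a + prior_b - post_b) * digamma(post_a + post_b)
    )
    return nats / log(2)  # convert nats to bits

# Concentrated distributions of the kind that give quad trouble.
print(infogain_closed(4000, 7000, 3000, 7000))
```

Because this version only evaluates special functions, it should behave the same way for large parameters as for small ones.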

You can see that surprising input carries more information. For example, suppose your prior is `beta(3, 7)`. This distribution has a mean of 0.3, so you're expecting more failures than successes. With such a prior, a success changes your mind more than a failure does. You can quantify this by running these two calculations.

```
print(infogain(4, 7, 3, 7))  # prior beta(3, 7) updated by one success
print(infogain(3, 8, 3, 7))  # prior beta(3, 7) updated by one failure
```

The first line shows that a success would change your information by 0.1563 bits, while the second shows that a failure would change it by 0.0297 bits.

Published at DZone with permission of John Cook, DZone MVB. See the original article here.
