
# Information Theory and Beta-Binomial Bayesian Model

You've heard of the Hello World example, but Bayesian statistics puts a twist on it with the beta-binomial model. Come learn how it works!


The beta-binomial model is the "Hello World" example of Bayesian statistics. I would call it a toy model, except it is actually useful. It's not nearly as complicated as most models used in applications, but it illustrates the basics of Bayesian inference. Because it's a conjugate model, the posterior is again a beta distribution and the calculations work out trivially.
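To make the conjugate update concrete, here is a minimal sketch. The helper name `update_beta` and the example counts are illustrative assumptions, not something from the posts referenced here. With a `beta(a, b)` prior and binomial data, the posterior is `beta(a + successes, b + failures)`.

```python
from scipy.stats import beta

def update_beta(prior_a, prior_b, successes, failures):
    """Conjugate update (hypothetical helper): return the posterior's a and b."""
    return prior_a + successes, prior_b + failures

# Illustrative data: uniform prior beta(1, 1), then 7 successes and 3 failures.
post_a, post_b = update_beta(1, 1, successes=7, failures=3)
print(post_a, post_b)               # parameters of the posterior
print(beta(post_a, post_b).mean())  # posterior mean, a / (a + b)
```

No integration is needed for the update itself, which is what makes the model so convenient.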

For more on the beta-binomial model itself, see "A Bayesian View of Amazon Resellers" and "Functional Folds and Conjugate Models."

I mentioned in a recent post that the Kullback-Leibler divergence from the prior distribution to the posterior distribution is a measure of how much information was gained.
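For reference, the quantity can be written out as follows. This is a sketch of the standard definition, with the notation chosen here as an assumption: $p$ is the posterior density and $q$ is the prior density, both on $[0, 1]$, and the logarithm is taken base 2 so that the result is in bits.

$$
D_{\mathrm{KL}}(P \,\|\, Q) = \int_0^1 p(x) \, \log_2 \frac{p(x)}{q(x)} \, dx
$$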

Here's a little Python code for computing this. Enter the a and b parameters of the prior and the posterior to compute how much information was gained.

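```python
from numpy import log2  # use NumPy's log2 rather than importing it from scipy
from scipy.integrate import quad
from scipy.stats import beta

def infogain(post_a, post_b, prior_a, prior_b):
    """Return the information gained, in bits, going from prior to posterior."""
    p = beta(post_a, post_b).pdf    # posterior density
    q = beta(prior_a, prior_b).pdf  # prior density

    # KL divergence: integrate p * log2(p / q) over [0, 1]
    (info, error) = quad(lambda x: p(x) * log2(p(x) / q(x)), 0, 1)
    return info
```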

This code works well for medium-sized inputs. It has problems with large inputs because the generic integration routine `quad` needs some help when the beta distributions become more concentrated.
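One way to sidestep the integration problem, offered here as an alternative sketch rather than the approach used above, is the closed-form expression for the Kullback-Leibler divergence between two beta distributions. It needs only the log-beta and digamma functions from `scipy.special`. The function name `infogain_closed_form` is an assumption for illustration.

```python
from math import log
from scipy.special import betaln, digamma

def infogain_closed_form(post_a, post_b, prior_a, prior_b):
    """KL divergence from a beta prior to a beta posterior, in bits."""
    nats = (
        betaln(prior_a, prior_b) - betaln(post_a, post_b)
        + (post_a - prior_a) * digamma(post_a)
        + (post_b - prior_b) * digamma(post_b)
        + (prior_a - post_a + prior_b - post_b) * digamma(post_a + post_b)
    )
    return nats / log(2)  # convert from nats to bits

# Small parameters: should agree with the quadrature-based infogain.
print(infogain_closed_form(4, 7, 3, 7))
# Large parameters: no numerical integration is involved.
print(infogain_closed_form(4000, 7000, 3000, 7000))
```

Because nothing is integrated numerically, concentrated distributions with large parameters pose no special difficulty for this version.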

You can see that surprising input carries more information. For example, suppose your prior is `beta(3, 7)`. This distribution has a mean of 0.3, so you're expecting more failures than successes. With such a prior, a success changes your mind more than a failure does. You can quantify this by running these two calculations.

```python
print(infogain(4, 7, 3, 7))
print(infogain(3, 8, 3, 7))
```

The first line shows that a success would give you 0.1563 bits of information, while the second shows that a failure would give you only 0.0297 bits.


Topics:
bayesian ,scipy ,information theory ,big data ,data science ,python ,statistics


Published at DZone with permission of
