Over a million developers have joined DZone.

Confidence vs. Credibility Intervals

DZone's Guide to

Confidence vs. Credibility Intervals

· Big Data Zone
Free Resource

Need to build an application around your data? Learn more about dataflow programming for rapid development and greater creativity. 

Tomorrow, for the final lecture of my Mathematical Statistics course, I will try to illustrate - using Monte Carlo simulations - the difference between classical statistics, and the Bayesien approach.

The (simple) way I see it is the following,

  • for frequentists, a probability is a measure of the the frequency of repeated events, so the interpretation is that parameters are fixed (but unknown), and data are random
  • for Bayesians, a probability is a measure of the degree of certainty about values, so the interpretation is that parameters are random and data are fixed

Or to quote Frequentism and Bayesianism: A Python-driven Primer, a Bayesian statistician would say "given our observed data, there is a 95% probability that the true value of http://latex.codecogs.com/gif.latex?\theta falls within the credible region" while a Frequentist statistician would say "there is a 95% probability that when I compute a confidence interval from data of this sort, the true value of http://latex.codecogs.com/gif.latex?\theta will fall within it".

To get more intuition about those quotes, consider a simple problem, with Bernoulli trials, with insurance claims. We want to derive some confidence interval for the probability to claim a loss. There were http://latex.codecogs.com/gif.latex?n = 1047 policies. And 159 claims.

Consider the standard (frequentist) confidence interval. What does it mean that


is the (asymptotic) 95% confidence interval? The way I see it is very simple. Let us generate some samples, of size http://latex.codecogs.com/gif.latex?n, with the same probability as the empirical one, i.e. http://latex.codecogs.com/gif.latex?\widehat{\theta} (which is the meaning of "from data of this sort"). For each sample, compute the confidence interval with the relationship above. It is a 95% confidence interval because in 95% of the scenarios, the empirical value lies in the confidence interval. From a computation point of view, it is the following idea:

> xbar <- 159
> n <- 1047
> ns <- 100
> M=matrix(rbinom(n*ns,size=1,prob=xbar/n),nrow=n)

I generate 100 samples of size http://latex.codecogs.com/gif.latex?n. For each sample, I compute the mean, and the confidence interval, from the previous relationship:

> fIC=function(x) mean(x)+c(-1,1)*1.96*sqrt(mean(x)*(1-mean(x)))/sqrt(n)
> IC=t(apply(M,2,fIC))
> MN=apply(M,2,mean)

Then we plot all those confidence intervals. In red when they do not contain the empirical mean:

> k=(xbar/n<IC[,1])|(xbar/n>IC[,2])
> plot(MN,1:ns,xlim=range(IC),axes=FALSE,
+ xlab="",ylab="",pch=19,cex=.7,
+ col=c("blue","red")[1+k])
> axis(1)
> segments(IC[,1],1:ns,IC[,2],1:
+ ns,col=c("blue","red")[1+k])
> abline(v=xbar/n)

Now, what about the Bayesian credible interval? Assume that the prior distribution for the probability to claim a loss has a http://latex.codecogs.com/gif.latex?\mathcal{B}(\alpha,\beta) distribution. We've seen in the course that, since the Beta distribution is the conjugate of the Bernoulli one, the posterior distribution will also be Beta. More precisely:


Based on that property, the confidence interval is based on quantiles of that (posterior) distribution

> u=seq(.1,.2,length=501)
> v=dbeta(u,1+xbar,1+n-xbar)
> plot(u,v,axes=FALSE,type="l")
> I=u<qbeta(.025,1+xbar,1+n-xbar)
> polygon(c(u[I],rev(u[I])),c(v[I],
+ rep(0,sum(I))),col="red",density=30,border=NA)
> I=u>qbeta(.975,1+xbar,1+n-xbar)
> polygon(c(u[I],rev(u[I])),c(v[I],
+ rep(0,sum(I))),col="red",density=30,border=NA)
> axis(1)

What does it mean, here, that we have a 95% credible interval? Well, this time, we do not draw using the empirical mean, but some possible probability, based on that posterior distribution (given the observations):

> pk <- rbeta(ns,1+xbar,1+n-xbar)

In green, below, we can visualize the histogram of those values:

> hist(pk,prob=TRUE,col="light green",
+ border="white",axes=FALSE,
+ main="",xlab="",ylab="",lwd=3,xlim=c(.12,.18))

And here again, let us generate samples and compute the empirical probabilities:

> M=matrix(rbinom(n*ns,size=1,prob=rep(pk,
+ each=n)),nrow=n)
> MN=apply(M,2,mean)

Here, there is a 95% chance that those empirical means lie in the credible interval, defined using quantiles of the posterior distribution. We can actually visualize all those means: in black the mean used to generate the sample, and then, in blue or red, the averages obtained on those simulated samples:

> abline(v=qbeta(c(.025,.975),1+xbar,1+
+ n-xbar),col="red",lty=2)
> points(pk,seq(1,40,length=ns),pch=19,cex=.7)
> k=(MN<qbeta(.025,1+xbar,1+n-xbar))|
+ (MN>qbeta(.975,1+xbar,1+n-xbar))
> points(MN,seq(1,40,length=ns),
+ pch=19,cex=.7,col=c("blue","red")[1+k])
> segments(MN,seq(1,40,length=ns),
+ pk,seq(1,40,length=ns),col="grey")

More details and example on Bayesian statistics, seen with the eyes of a (probably) not Bayesian statistician in my slides, from my talk in London, last Summer:

Check out the Exaptive data application Studio. Technology agnostic. No glue code. Use what you know and rely on the community for what you don't. Try the community version.


Published at DZone with permission of Arthur Charpentier, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.


Dev Resources & Solutions Straight to Your Inbox

Thanks for subscribing!

Awesome! Check your inbox to verify your email so you can start receiving the latest in tech news and resources.


{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}