
Confidence vs. Credibility Intervals


Tomorrow, for the final lecture of my Mathematical Statistics course, I will try to illustrate, using Monte Carlo simulations, the difference between classical statistics and the Bayesian approach.

The (simple) way I see it is the following:

  • for frequentists, a probability is a measure of the frequency of repeated events, so the interpretation is that parameters are fixed (but unknown) and data are random;
  • for Bayesians, a probability is a measure of the degree of certainty about values, so the interpretation is that parameters are random and data are fixed.

Or, to quote Frequentism and Bayesianism: A Python-driven Primer, a Bayesian statistician would say "given our observed data, there is a 95% probability that the true value of $\theta$ falls within the credible region", while a frequentist statistician would say "there is a 95% probability that when I compute a confidence interval from data of this sort, the true value of $\theta$ will fall within it".

To get more intuition about those quotes, consider a simple problem with Bernoulli trials, coming from insurance claims. We want to derive a confidence interval for the probability of claiming a loss. There were $n=1047$ policies, and 159 claims.
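As a quick numerical sketch (using only the numbers above), the point estimate and the asymptotic 95% interval can be computed directly:

```r
xbar <- 159         # number of claims
n    <- 1047        # number of policies
thetahat <- xbar/n  # empirical claim probability, about 0.152

# asymptotic 95% confidence interval for the claim probability
thetahat + c(-1, 1) * 1.96 * sqrt(thetahat * (1 - thetahat) / n)
```

That interval is roughly (0.130, 0.174); the simulations below will tell us what the "95%" attached to it actually means.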

Consider the standard (frequentist) confidence interval. What does it mean that

$$\widehat{\theta}\pm 1.96\sqrt{\frac{\widehat{\theta}(1-\widehat{\theta})}{n}}$$

is the (asymptotic) 95% confidence interval? The way I see it is very simple. Let us generate some samples, of size $n$, with the same probability as the empirical one, i.e. $\widehat{\theta}$ (which is the meaning of "from data of this sort"). For each sample, compute the confidence interval with the relationship above. It is a 95% confidence interval because, in 95% of the scenarios, the empirical value $\widehat{\theta}$ (which plays the role of the true value here) lies in the confidence interval. From a computational point of view, the idea is the following:

> xbar <- 159
> n <- 1047
> ns <- 100
> M=matrix(rbinom(n*ns,size=1,prob=xbar/n),nrow=n)

I generate 100 samples of size $n$. For each sample, I compute the mean and the confidence interval from the previous relationship:

> fIC=function(x) mean(x)+c(-1,1)*1.96*sqrt(mean(x)*(1-mean(x)))/sqrt(n)
> IC=t(apply(M,2,fIC))
> MN=apply(M,2,mean)

Then we plot all those confidence intervals. In red when they do not contain the empirical mean:

> k=(xbar/n<IC[,1])|(xbar/n>IC[,2])
> plot(MN,1:ns,xlim=range(IC),axes=FALSE,
+ xlab="",ylab="",pch=19,cex=.7,
+ col=c("blue","red")[1+k])
> axis(1)
> segments(IC[,1],1:ns,IC[,2],1:
+ ns,col=c("blue","red")[1+k])
> abline(v=xbar/n)
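With only 100 samples, the proportion of blue intervals is a noisy estimate of the coverage. As a sanity check on the 95% claim, here is a self-contained version of the same experiment (same quantities, regenerated locally) with many more samples, computing the proportion of intervals that contain $\widehat{\theta}$; it should be close to 95%:

```r
set.seed(1)
xbar <- 159; n <- 1047; ns <- 10000

# ns samples of size n, drawn with the empirical probability
M  <- matrix(rbinom(n * ns, size = 1, prob = xbar/n), nrow = n)
MN <- apply(M, 2, mean)

# half-width of each asymptotic 95% interval
half <- 1.96 * sqrt(MN * (1 - MN) / n)

# proportion of intervals containing the empirical value
mean(xbar/n >= MN - half & xbar/n <= MN + half)
```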

Now, what about the Bayesian credible interval? Assume that the prior distribution for the probability of claiming a loss is a $\mathcal{B}(\alpha,\beta)$ distribution. We've seen in the course that, since the Beta distribution is the conjugate prior of the Bernoulli one, the posterior distribution will also be Beta. More precisely, if we observe $\sum x_i$ claims among $n$ policies,

$$\theta\mid\boldsymbol{x}\sim\mathcal{B}\left(\alpha+\sum_{i=1}^n x_i,\ \beta+n-\sum_{i=1}^n x_i\right)$$

Based on that property, the credible interval is obtained from quantiles of that (posterior) distribution (in the code below, a uniform prior is used, i.e. $\alpha=\beta=1$):

> u=seq(.1,.2,length=501)
> v=dbeta(u,1+xbar,1+n-xbar)
> plot(u,v,axes=FALSE,type="l")
> I=u<qbeta(.025,1+xbar,1+n-xbar)
> polygon(c(u[I],rev(u[I])),c(v[I],
+ rep(0,sum(I))),col="red",density=30,border=NA)
> I=u>qbeta(.975,1+xbar,1+n-xbar)
> polygon(c(u[I],rev(u[I])),c(v[I],
+ rep(0,sum(I))),col="red",density=30,border=NA)
> axis(1)
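For comparison with the frequentist interval, the bounds of the 95% credible interval are simply the posterior quantiles (here again with the uniform prior $\alpha=\beta=1$):

```r
xbar <- 159; n <- 1047

# 2.5% and 97.5% quantiles of the Beta(1+159, 1+888) posterior
qbeta(c(.025, .975), 1 + xbar, 1 + n - xbar)
```

With $n$ this large and a nearly flat prior, the two intervals are numerically very close; the difference is in their interpretation.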

What does it mean, here, that we have a 95% credible interval? Well, this time, we do not generate samples using the empirical mean, but using possible probabilities drawn from the posterior distribution (given the observations):

> pk <- rbeta(ns,1+xbar,1+n-xbar)

In green, below, we can visualize the histogram of those values:

> hist(pk,prob=TRUE,col="light green",
+ border="white",axes=FALSE,
+ main="",xlab="",ylab="",lwd=3,xlim=c(.12,.18))

And here again, let us generate samples and compute the empirical probabilities:

> M=matrix(rbinom(n*ns,size=1,prob=rep(pk,
+ each=n)),nrow=n)
> MN=apply(M,2,mean)

Here, there is a 95% chance that the probability used to generate a sample lies in the credible interval, defined using quantiles of the posterior distribution. We can actually visualize all those values: in black, the probability used to generate each sample, and then, in blue or red, the averages obtained on those simulated samples:

> abline(v=qbeta(c(.025,.975),1+xbar,1+
+ n-xbar),col="red",lty=2)
> points(pk,seq(1,40,length=ns),pch=19,cex=.7)
> k=(MN<qbeta(.025,1+xbar,1+n-xbar))|
+ (MN>qbeta(.975,1+xbar,1+n-xbar))
> points(MN,seq(1,40,length=ns),
+ pch=19,cex=.7,col=c("blue","red")[1+k])
> segments(MN,seq(1,40,length=ns),
+ pk,seq(1,40,length=ns),col="grey")
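To make that 95% statement concrete, here is a self-contained check (regenerating the posterior draws used above) of the fraction of posterior draws falling inside the credible interval:

```r
set.seed(1)
xbar <- 159; n <- 1047; ns <- 10000

# draws from the Beta(1+159, 1+888) posterior
pk <- rbeta(ns, 1 + xbar, 1 + n - xbar)

# bounds of the 95% credible interval
q <- qbeta(c(.025, .975), 1 + xbar, 1 + n - xbar)

# fraction of posterior draws inside the interval:
# close to 0.95 by construction
mean(pk >= q[1] & pk <= q[2])
```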

More details and examples on Bayesian statistics, seen through the eyes of a (probably) not Bayesian statistician, can be found in my slides from my talk in London last summer.



Published at DZone with permission of Arthur Charpentier, DZone MVB. See the original article here.

