Tomorrow, for the final lecture of my *Mathematical Statistics* course, I will try to illustrate - using Monte Carlo simulations - the difference between classical statistics, and the Bayesien approach.

The (simple) way I see it is the following,

- for frequentists, a probability is a measure of the the frequency of repeated events, so the interpretation is that
**parameters are fixed (but unknown), and data are random** - for Bayesians, a probability is a measure of the degree of certainty about values, so the interpretation is that
**parameters are random and data are fixed**

Or to quote Frequentism and Bayesianism: A Python-driven Primer, a Bayesian statistician would say "given our observed data, there is a 95% probability that the true value of falls within the credible region" while a Frequentist statistician would say "there is a 95% probability that when I compute a confidence interval from data of this sort, the true value of will fall within it".

To get more intuition about those quotes, consider a simple problem, with Bernoulli trials, with insurance claims. We want to derive some confidence interval for the probability to claim a loss. There were = 1047 policies. And 159 claims.

Consider the standard (frequentist) confidence interval. What does it mean that

is the (asymptotic) 95% confidence interval? The way I see it is very simple. Let us generate some samples, of size , with the same probability as the empirical one, i.e. (which is the meaning of "from data of this sort"). For each sample, compute the confidence interval with the relationship above. It is a 95% confidence interval because in 95% of the scenarios, the empirical value lies in the confidence interval. From a computation point of view, it is the following idea:

> xbar <- 159 > n <- 1047 > ns <- 100 > M=matrix(rbinom(n*ns,size=1,prob=xbar/n),nrow=n)

I generate 100 samples of size . For each sample, I compute the mean, and the confidence interval, from the previous relationship:

> fIC=function(x) mean(x)+c(-1,1)*1.96*sqrt(mean(x)*(1-mean(x)))/sqrt(n) > IC=t(apply(M,2,fIC)) > MN=apply(M,2,mean)

Then we plot all those confidence intervals. In red when they do not contain the empirical mean:

> k=(xbar/n<IC[,1])|(xbar/n>IC[,2]) > plot(MN,1:ns,xlim=range(IC),axes=FALSE, + xlab="",ylab="",pch=19,cex=.7, + col=c("blue","red")[1+k]) > axis(1) > segments(IC[,1],1:ns,IC[,2],1: + ns,col=c("blue","red")[1+k]) > abline(v=xbar/n)

Now, what about the Bayesian credible interval? Assume that the prior distribution for the probability to claim a loss has a distribution. We've seen in the course that, since the Beta distribution is the conjugate of the Bernoulli one, the posterior distribution will also be Beta. More precisely:

Based on that property, the confidence interval is based on quantiles of that (posterior) distribution

> u=seq(.1,.2,length=501) > v=dbeta(u,1+xbar,1+n-xbar) > plot(u,v,axes=FALSE,type="l") > I=u<qbeta(.025,1+xbar,1+n-xbar) > polygon(c(u[I],rev(u[I])),c(v[I], + rep(0,sum(I))),col="red",density=30,border=NA) > I=u>qbeta(.975,1+xbar,1+n-xbar) > polygon(c(u[I],rev(u[I])),c(v[I], + rep(0,sum(I))),col="red",density=30,border=NA) > axis(1)

What does it mean, here, that we have a 95% credible interval? Well, this time, we do not draw using the empirical mean, but some possible probability, based on that posterior distribution (given the observations):

> pk <- rbeta(ns,1+xbar,1+n-xbar)

In green, below, we can visualize the histogram of those values:

> hist(pk,prob=TRUE,col="light green", + border="white",axes=FALSE, + main="",xlab="",ylab="",lwd=3,xlim=c(.12,.18))

And here again, let us generate samples and compute the empirical probabilities:

> M=matrix(rbinom(n*ns,size=1,prob=rep(pk, + each=n)),nrow=n) > MN=apply(M,2,mean)

Here, there is a 95% chance that those empirical means lie in the credible interval, defined using quantiles of the posterior distribution. We can actually visualize all those means: in black the mean used to generate the sample, and then, in blue or red, the averages obtained on those simulated samples:

> abline(v=qbeta(c(.025,.975),1+xbar,1+ + n-xbar),col="red",lty=2) > points(pk,seq(1,40,length=ns),pch=19,cex=.7) > k=(MN<qbeta(.025,1+xbar,1+n-xbar))| + (MN>qbeta(.975,1+xbar,1+n-xbar)) > points(MN,seq(1,40,length=ns), + pch=19,cex=.7,col=c("blue","red")[1+k]) > segments(MN,seq(1,40,length=ns), + pk,seq(1,40,length=ns),col="grey")

More details and example on Bayesian statistics, seen with the eyes of a (probably) not Bayesian statistician in my slides, from my talk in London, last Summer:

## {{ parent.title || parent.header.title}}

## {{ parent.tldr }}

## {{ parent.linkDescription }}

{{ parent.urlSource.name }}