The Statistics of Easter
This morning, there was an interesting post entitled “why does Easter move around so much?” on http://economist.com/blogs/economist-explains/…
In my time series classes, I keep saying that series can exhibit seasonality, but that the seasonal effect can be quite irregular. It is the case for river levels, where snowmelt can have a huge impact, and its timing is irregular. Similarly, chocolate sales (even monthly, or quarterly) depend on Easter. Because Easter can fall either in March or in April, the seasonal pattern is not as regular as that of flower sales, for instance (Valentine’s Day being always on February 14th, as far as I remember). If we look at the word eggs on http://google.com/trends/q=eggs…, we do observe a cycle related to Easter.
The title of the article published by http://economist.com/blogs/economist-explains/… claims that there is a lot of variability in the date of Easter. Let us check! The answer to the question “When is Easter?” can be the following (if we want a short answer): Easter Sunday is the first Sunday after the first full moon after the vernal equinox. For more details, see e.g. http://ortelius.de/east. The algorithm used to compute the date of Easter is online, on http://smart.net/~mmontes/….
> century = year/100
> G = year % 19
> K = (century - 17)/25
> I = (century - century/4 - (century - K)/3 + 19*G + 15) % 30
> I = I - (I/28)*(1 - (I/28)*(29/(I + 1))*((21 - G)/11))
> J = (year + year/4 + I + 2 - century + century/4) % 7
> L = I - J
> EasterMonth = 3 + (L + 40)/44
> EasterDay = L + 28 - 31*(EasterMonth/4)
(Here every division is an integer division, discarding the remainder, and % is the remainder operator.)
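To make these steps concrete, here is a minimal R sketch of the same computation. The names `idiv` and `easter_sketch` are hypothetical helpers introduced for illustration (they do not come from any package), and the sketch assumes Gregorian-calendar years. Integer division is written explicitly, since `/` in R is ordinary division.

```r
# Integer division truncating toward zero, as the pseudocode above assumes
idiv <- function(a, b) trunc(a / b)

# Hypothetical helper: Easter Sunday (Gregorian rule) for a vector of years
easter_sketch <- function(year) {
  century <- idiv(year, 100)
  G <- year %% 19                       # position in the 19-year lunar cycle
  K <- idiv(century - 17, 25)
  I <- (century - idiv(century, 4) - idiv(century - K, 3) + 19 * G + 15) %% 30
  I <- I - idiv(I, 28) * (1 - idiv(I, 28) * idiv(29, I + 1) * idiv(21 - G, 11))
  J <- (year + idiv(year, 4) + I + 2 - century + idiv(century, 4)) %% 7
  L <- I - J
  month <- 3 + idiv(L + 40, 44)         # 3 = March, 4 = April
  day <- L + 28 - 31 * idiv(month, 4)
  as.Date(sprintf("%04d-%02d-%02d", year, month, day))
}

easter_sketch(c(2000, 2013, 2019, 2024))
# "2000-04-23" "2013-03-31" "2019-04-21" "2024-03-31"
```

The function is vectorised, so `easter_sketch(1583:3000)` returns one date per year.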
Actually, this computation is available in some R packages. Here we use the dates of Easter from AD 1000 to AD 3000,
> library(timeDate)
> E=Easter(1000:3000)
> D=as.Date(E)
> table(months(D))/2001

    april     march
0.7651174 0.2348826
(April comes before March in alphabetical order.) If we look at the distribution of the date, counted in days since March 1st, it is the following,
> J=as.numeric(D-as.Date(paste("01/03/",1000:3000,sep=""),"%d/%m/%Y"))
> hist(J,breaks=seq(20,55),col="light green")
And if we look at the autocorrelation function, we can observe that indeed, after 19 years, there is a strong correlation (this could be expected from the term G = year % 19 in the algorithm given previously).
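A minimal sketch of how this autocorrelation function could be computed, rebuilding `J` as above with `timeDate` and using `acf()` from base R. The object name `rho` and the choice of 30 lags are assumptions made for illustration only.

```r
library(timeDate)  # same package as used earlier in the post

years <- 1000:3000
D <- as.Date(Easter(years))
J <- as.numeric(D - as.Date(paste0(years, "-03-01")))  # days since March 1st

# Sample autocorrelations at lags 1 to 30 (the lag-0 value is dropped)
rho <- acf(J, lag.max = 30, plot = FALSE)$acf[-1]

round(rho[19], 2)  # autocorrelation at lag 19
which.max(rho)     # lag with the strongest autocorrelation

barplot(rho, names.arg = 1:30, xlab = "lag (years)", ylab = "autocorrelation")
```

Calling `acf(J)` directly draws the usual correlogram. The values are extracted here only to make the lag-19 term easy to inspect.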
But in order to get a better understanding of the dynamics, we can also look at transition matrices. Define four classes, based on the quartiles of the date,
> Q=quantile(J,seq(0,1,by=.25))
> Q[1]=Q[1]-1   # lower the first break so that the minimum falls in the first class
> C=cut(J,Q)
Then, the one-year transition matrix is (first as counts, then in %)
> k=1; n=length(C)
> B=data.frame(X1=(C[1:(n-k)]),X2=(C[(k+1):n]))
> (T=table(B$X1,B$X2))

          (20,31] (31,39] (39,46] (46,55]
  (20,31]       0       0     265     277
  (31,39]     316       0      13     182
  (39,46]     224     264       0       0
  (46,55]       1     247     211       0
> P=T/apply(T,1,sum)
> round(P*1000)/10

          (20,31] (31,39] (39,46] (46,55]
  (20,31]     0.0     0.0    48.9    51.1
  (31,39]    61.8     0.0     2.5    35.6
  (39,46]    45.9    54.1     0.0     0.0
  (46,55]     0.2    53.8    46.0     0.0
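I.e. if Easter was early in the year (say in March, in the first quartile), then the year after it will be late in the year: in this sample it always moves to the third quartile (49% of the cases) or to the fourth one (51%). This reflects the gap of about 11 days between twelve lunar months and one solar year, which moves the Easter full moon either about 11 days earlier or about 19 days later from one year to the next.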
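The same construction can be wrapped in a small function to look at other lags. This is only a sketch: `transition_matrix` is a hypothetical helper name, and the toy sequence below is made up to show the mechanics, not taken from the Easter data.

```r
# Hypothetical helper: k-step transition matrix of a factor sequence,
# returned as row percentages (each row sums to 100)
transition_matrix <- function(states, k = 1) {
  n <- length(states)
  counts <- table(from = states[1:(n - k)], to = states[(k + 1):n])
  round(100 * prop.table(counts, margin = 1), 1)
}

# Toy sequence that alternates between two states
toy <- factor(c("early", "late", "early", "late", "early"))
transition_matrix(toy, k = 1)  # every "early" is followed by "late", and vice versa
transition_matrix(toy, k = 2)  # two steps later, the state is always the same
```

With the factor `C` defined above, `transition_matrix(C, 1)` should reproduce the percentages shown in the table, and `transition_matrix(C, 19)` gives the 19-year counterpart.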