The Statistics of Easter

This morning, there was an interesting post entitled “Why does Easter move around so much?” on http://economist.com/blogs/economist-explains/…

In my time series classes, I keep saying that series can exhibit seasonality, but that the seasonal effect can be quite irregular. It is the case for river levels, where snowmelt can have a huge impact at irregular times. Similarly, chocolate sales (monthly, or even quarterly) depend on Easter. Because Easter can fall either in March or in April, the seasonal pattern is not as regular as that of flower sales, for instance (Valentine's Day being always on February 14th, as far as I remember). If we look at the word eggs on http://google.com/trends/q=eggs…, we do observe a cycle related to Easter.

The title of the article published by http://economist.com/blogs/economist-explains/… claims that there is a lot of variability in the date of Easter. Let us check! A short answer to the question “When is Easter?” is the following: Easter Sunday is the first Sunday after the first full moon on or after the vernal equinox. For more details, see e.g. http://ortelius.de/east. The algorithm used to compute the date of Easter is online, on http://smart.net/~mmontes/…. In the recipe below, every division is an integer division (the fractional part is dropped) and % denotes the remainder.

> century = year/100
> G = year % 19
> K = (century - 17)/25
> I = (century - century/4 - (century - K)/3 + 19*G + 15) % 30
> I = I - (I/28)*(1 - (I/28)*(29/(I + 1))*((21 - G)/11))
> J = (year + year/4 + I + 2 - century + century/4) % 7
> L = I - J
> EasterMonth = 3 + (L + 40)/44
> EasterDay = L + 28 - 31*(EasterMonth/4)
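
To make the recipe concrete, here is a minimal R sketch of it. The function name easter_date and its data frame output are my own choices, not something taken from the page cited above. I also assume years from 1700 onwards, so that every quotient is non-negative and R's floor division %/% coincides with the integer division of the recipe.

```r
# Easter Sunday (Gregorian calendar), following the integer recipe above.
# Assumption: year >= 1700, so floor division (%/%) matches the recipe's
# truncating integer division in every step.
easter_date <- function(year) {
  century <- year %/% 100
  g <- year %% 19                        # position in the 19-year lunar cycle
  k <- (century - 17) %/% 25
  i <- (century - century %/% 4 - (century - k) %/% 3 + 19 * g + 15) %% 30
  i <- i - (i %/% 28) * (1 - (i %/% 28) * (29 %/% (i + 1)) * ((21 - g) %/% 11))
  j <- (year + year %/% 4 + i + 2 - century + century %/% 4) %% 7
  l <- i - j
  month <- 3 + (l + 40) %/% 44           # 3 = March, 4 = April
  day <- l + 28 - 31 * (month %/% 4)
  data.frame(year = year, month = month, day = day)
}

easter_date(c(2000, 2024, 2025))
# 2000: April 23, 2024: March 31, 2025: April 20
```

The function is vectorised, so it can be called on a whole range of years at once, in the same spirit as the package call used below.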

Actually, this computation is available in some R packages. Here we use the dates of Easter from AD 1000 to AD 3000, as returned by the Easter function of the timeDate package,

> library(timeDate)
> E=Easter(1000:3000)
> D=as.Date(E)
> table(months(D))/2001

    april     march 
0.7651174 0.2348826

(April comes before March in alphabetical order.) So Easter falls in April about 77% of the time. The distribution of the date, counted in days after March 1st, is the following,

> J=as.numeric(D-as.Date(paste("01/03/",1000:3000,sep=""),"%d/%m/%Y"))
> hist(J,breaks=seq(20,55),col="light green")

And if we look at the autocorrelation function, we can observe that there is indeed a strong correlation at a lag of 19 years (which could be anticipated from the algorithm given previously, where the year enters through G = year % 19),

> acf(J)
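
As a rough check of the 19-year effect without any package, the sketch below rebuilds the day offsets from the recipe quoted earlier and reads the autocorrelation at a few lags. The helper easter_offset and the window 1900:2099 are illustrative assumptions of mine (the series J of the article covers 1000:3000 and comes from timeDate), so the values need not match the article's figures.

```r
# Days from March 1 to Easter Sunday, from the recipe quoted earlier
# (year >= 1700 assumed, so %/% matches the recipe's integer division).
easter_offset <- function(year) {
  century <- year %/% 100
  g <- year %% 19
  k <- (century - 17) %/% 25
  i <- (century - century %/% 4 - (century - k) %/% 3 + 19 * g + 15) %% 30
  i <- i - (i %/% 28) * (1 - (i %/% 28) * (29 %/% (i + 1)) * ((21 - g) %/% 11))
  j <- (year + year %/% 4 + i + 2 - century + century %/% 4) %% 7
  i - j + 27                             # March 22 -> 21, April 25 -> 55
}

offsets <- easter_offset(1900:2099)      # illustrative window, not the article's
rho <- acf(offsets, lag.max = 40, plot = FALSE)$acf[-1]   # drop lag 0
round(rho[c(1, 19, 38)], 2)              # autocorrelation at lags 1, 19 and 38
```

Setting plot = FALSE returns the estimates without drawing, which is convenient when only a few lags are of interest.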

But in order to get a better understanding of the dynamics, we can also look at transition matrices. Define four classes of dates from the quartiles of J,

> Q=quantile(J,seq(0,1,by=.25))
> Q[1]=Q[1]-1
> C=cut(J,Q)

Then, the one-year transition matrix is the following (counts first, then row percentages)

> k=1; n=length(C)
> B=data.frame(X1=(C[1:(n-k)]),X2=(C[(k+1):n]))
> (T=table(B$X1,B$X2))

          (20,31] (31,39] (39,46] (46,55]
  (20,31]       0       0     265     277
  (31,39]     316       0      13     182
  (39,46]     224     264       0       0
  (46,55]       1     247     211       0
> P=T/apply(T,1,sum)
> round(P*1000)/10

          (20,31] (31,39] (39,46] (46,55]
  (20,31]     0.0     0.0    48.9    51.1
  (31,39]    61.8     0.0     2.5    35.6
  (39,46]    45.9    54.1     0.0     0.0
  (46,55]     0.2    53.8    46.0     0.0

I.e. if Easter was early one year (say in March, in the first quartile), then the year after it will be late (with about 50% chance in the third quartile, and about 50% chance in the fourth one).
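
The same construction works for any lag, which makes it easy to compare the one-year matrix with, say, the 19-year one. Below is a minimal sketch in base R. The names easter_offset and transition are hypothetical, the offsets are rebuilt from the recipe quoted earlier on the window 1900:2099 (an assumption of mine, the article uses timeDate on 1000:3000), and include.lowest = TRUE replaces the Q[1]-1 adjustment, so class labels and percentages will differ from the tables above.

```r
# Days from March 1 to Easter Sunday, from the recipe quoted earlier
# (year >= 1700 assumed, so %/% matches the recipe's integer division).
easter_offset <- function(year) {
  century <- year %/% 100
  g <- year %% 19
  k <- (century - 17) %/% 25
  i <- (century - century %/% 4 - (century - k) %/% 3 + 19 * g + 15) %% 30
  i <- i - (i %/% 28) * (1 - (i %/% 28) * (29 %/% (i + 1)) * ((21 - g) %/% 11))
  j <- (year + year %/% 4 + i + 2 - century + century %/% 4) %% 7
  i - j + 27
}

# k-step transition matrix between quantile classes, in row percentages.
transition <- function(x, k = 1, probs = seq(0, 1, by = 0.25)) {
  breaks <- quantile(x, probs)
  classes <- cut(x, breaks, include.lowest = TRUE)  # keeps the minimum in class 1
  n <- length(classes)
  counts <- table(from = classes[1:(n - k)], to = classes[(k + 1):n])
  round(100 * prop.table(counts, margin = 1), 1)    # each row sums to about 100
}

offsets <- easter_offset(1900:2099)      # illustrative window, not the article's
transition(offsets, k = 1)               # one year ahead
transition(offsets, k = 19)              # nineteen years ahead
```

If the 19-year correlation is as strong as the autocorrelation function suggests, the second matrix should put most of its mass on the diagonal, while the first one should be close to empty there.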


Published at DZone with permission of Arthur Charpentier, DZone MVB. See the original article here.
