Over a million developers have joined DZone.

Temperatures Series as Random Walks

· Big Data Zone

Learn how you can maximize big data in the cloud with Apache Hadoop. Download this eBook now. Brought to you in partnership with Hortonworks.

Last year, I did mention in a post that unit-root tests are dangerous, because they might lead us to strange models. For instance, in a post, I did obtain that the temperature observed in January 2013, in Montréal, might be considered as a random walk process (or at least an integrated process). The code to extract the data has changed (since the website has been updated), so here, we use

library(RCurl)
library(XML)
options(RCurlOptions = list(useragent = "R"))
HEURE=0:23
extracttemp=function(Y,M,D){
url=paste(
"http://climate.weather.gc.ca/climateData/hourlydata_e.html?timeframe=1&Prov=QC&StationID=5415&Year=",Y,"&Month=",
M,"&Day=",D,sep="")
wp <- getURLContent(url)
doc <- htmlParse(wp, asText = TRUE) 
docName(doc) <- url
tmp <- readHTMLTable(doc)
basejour=data.frame(Year=Y,Month=M,Day=D,
Hour=HEURE,Temp=as.numeric(as.character(data.frame(tmp[2])[,2]))[2:25])
return(basejour)}
B=NULL
for(y in 1955:2013){
for(d in 1:31){
B=rbind(B,extracttemp(y,1,d))}}

Here are all the temperatures observed, and 2013,

plot(B$X,B$Temp,cex=.5,col="light blue",xlab="January, in Montreal",ylab="Temperature (Celsius)")
I=which(B$Year==2013)
lines(B$X[I],B$Temp[I],col="red")

In the previous post, one test only was used, and one year was considered. I was wondering if this behavior was observed only with temperature of 2013 (or not), and how the other tests (mentioned in a previous post too) were performing.

I might need a function, because those tests cannot be used if there is a missing value, even only one. So I did use the value observed one hour before (just to make sure that the tests can be done)

correcty=function(Y){
I=which(is.na(Y))
	if(length(I)==0){Yc=Y}
	if(length(I)>0){Yc=Y;for(i in I) Yc[i]=Yc[i-1]}	
return(Yc)
}

Now, we can compute the p-values, for all the years, and the three different three (keeping in mind that two test if the series is non-stationary, and one if the series is stationary)

DF=matrix(NA,2013-1954,3)
library(urca)
for(y in 1955:2013){
Z=B$Temp[which(B$Year==y)]
	Zc=correcty(Z)
	DF[y-1954,2]=as.numeric(pp.test(Zc)$p.value)	
	DF[y-1954,1]=as.numeric(kpss.test(Zc)$p.value)	
	DF[y-1954,3]=as.numeric(adf.test(Zc)$p.value)
}

Visually, if red means stationary, and blue means non-stationary, we get

DFP=DF
DFP[,1]=DF[,1]<.05
DFP[,2:3]=DF[,2:3]>.05
library(RColorBrewer)
CL=brewer.pal(6, "RdBu")
plot(0:1,0:1,xlim=c(1950,2015),ylim=c(0,3),axes=FALSE,xlab="",ylab="")
axis(1)
text(1952,.5,"KPSS")
text(1952,1.5,"PP")
text(1952,2.5,"ADF")
for(y in 1955:2013){
for(i in 1:3){
polygon(y+c(-1,-1,1,1)/2.2,i-.5+c(-1,1,1,-1)/2.2,col=CL[1+(DFP[y-1954,i]==1)*5],border=NA)}}

Quite frequently, we conclude that the temperature is a random walk. Which does not make sense (from a physical point of view). But again, it might come from the fact that temperature are stationary, but with some fractional behavior (as suggested in the previous post).


Hortonworks DataFlow is an integrated platform that makes data ingestion fast, easy, and secure. Download the white paper now.  Brought to you in partnership with Hortonworks

Topics:

Published at DZone with permission of Arthur Charpentier, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

The best of DZone straight to your inbox.

SEE AN EXAMPLE
Please provide a valid email address.

Thanks for subscribing!

Awesome! Check your inbox to verify your email so you can start receiving the latest in tech news and resources.
Subscribe

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}