Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

Temperatures Series as Random Walks

DZone's Guide to

Temperatures Series as Random Walks

· Big Data Zone ·
Free Resource

Hortonworks Sandbox for HDP and HDF is your chance to get started on learning, developing, testing and trying out new features. Each download comes preconfigured with interactive tutorials, sample data and developments from the Apache community.

Last year, I did mention in a post that unit-root tests are dangerous, because they might lead us to strange models. For instance, in a post, I did obtain that the temperature observed in January 2013, in Montréal, might be considered as a random walk process (or at least an integrated process). The code to extract the data has changed (since the website has been updated), so here, we use

library(RCurl)
library(XML)
options(RCurlOptions = list(useragent = "R"))
HEURE=0:23
extracttemp=function(Y,M,D){
url=paste(
"http://climate.weather.gc.ca/climateData/hourlydata_e.html?timeframe=1&Prov=QC&StationID=5415&Year=",Y,"&Month=",
M,"&Day=",D,sep="")
wp <- getURLContent(url)
doc <- htmlParse(wp, asText = TRUE) 
docName(doc) <- url
tmp <- readHTMLTable(doc)
basejour=data.frame(Year=Y,Month=M,Day=D,
Hour=HEURE,Temp=as.numeric(as.character(data.frame(tmp[2])[,2]))[2:25])
return(basejour)}
B=NULL
for(y in 1955:2013){
for(d in 1:31){
B=rbind(B,extracttemp(y,1,d))}}

Here are all the temperatures observed, and 2013,

plot(B$X,B$Temp,cex=.5,col="light blue",xlab="January, in Montreal",ylab="Temperature (Celsius)")
I=which(B$Year==2013)
lines(B$X[I],B$Temp[I],col="red")

In the previous post, one test only was used, and one year was considered. I was wondering if this behavior was observed only with temperature of 2013 (or not), and how the other tests (mentioned in a previous post too) were performing.

I might need a function, because those tests cannot be used if there is a missing value, even only one. So I did use the value observed one hour before (just to make sure that the tests can be done)

correcty=function(Y){
I=which(is.na(Y))
	if(length(I)==0){Yc=Y}
	if(length(I)>0){Yc=Y;for(i in I) Yc[i]=Yc[i-1]}	
return(Yc)
}

Now, we can compute the p-values, for all the years, and the three different three (keeping in mind that two test if the series is non-stationary, and one if the series is stationary)

DF=matrix(NA,2013-1954,3)
library(urca)
for(y in 1955:2013){
Z=B$Temp[which(B$Year==y)]
	Zc=correcty(Z)
	DF[y-1954,2]=as.numeric(pp.test(Zc)$p.value)	
	DF[y-1954,1]=as.numeric(kpss.test(Zc)$p.value)	
	DF[y-1954,3]=as.numeric(adf.test(Zc)$p.value)
}

Visually, if red means stationary, and blue means non-stationary, we get

DFP=DF
DFP[,1]=DF[,1]<.05
DFP[,2:3]=DF[,2:3]>.05
library(RColorBrewer)
CL=brewer.pal(6, "RdBu")
plot(0:1,0:1,xlim=c(1950,2015),ylim=c(0,3),axes=FALSE,xlab="",ylab="")
axis(1)
text(1952,.5,"KPSS")
text(1952,1.5,"PP")
text(1952,2.5,"ADF")
for(y in 1955:2013){
for(i in 1:3){
polygon(y+c(-1,-1,1,1)/2.2,i-.5+c(-1,1,1,-1)/2.2,col=CL[1+(DFP[y-1954,i]==1)*5],border=NA)}}

Quite frequently, we conclude that the temperature is a random walk. Which does not make sense (from a physical point of view). But again, it might come from the fact that temperature are stationary, but with some fractional behavior (as suggested in the previous post).


Hortonworks Community Connection (HCC) is an online collaboration destination for developers, DevOps, customers and partners to get answers to questions, collaborate on technical articles and share code examples from GitHub.  Join the discussion.

Topics:

Published at DZone with permission of

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}