Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

Temperatures Series as Random Walks

DZone's Guide to

Temperatures Series as Random Walks

· Big Data Zone ·
Free Resource

The open source HPCC Systems platform is a proven, easy to use solution for managing data at scale. Visit our Easy Guide to learn more about this completely free platform, test drive some code in the online Playground, and get started today.

Last year, I did mention in a post that unit-root tests are dangerous, because they might lead us to strange models. For instance, in a post, I did obtain that the temperature observed in January 2013, in Montréal, might be considered as a random walk process (or at least an integrated process). The code to extract the data has changed (since the website has been updated), so here, we use

library(RCurl)
library(XML)
options(RCurlOptions = list(useragent = "R"))
HEURE=0:23
extracttemp=function(Y,M,D){
url=paste(
"http://climate.weather.gc.ca/climateData/hourlydata_e.html?timeframe=1&Prov=QC&StationID=5415&Year=",Y,"&Month=",
M,"&Day=",D,sep="")
wp <- getURLContent(url)
doc <- htmlParse(wp, asText = TRUE) 
docName(doc) <- url
tmp <- readHTMLTable(doc)
basejour=data.frame(Year=Y,Month=M,Day=D,
Hour=HEURE,Temp=as.numeric(as.character(data.frame(tmp[2])[,2]))[2:25])
return(basejour)}
B=NULL
for(y in 1955:2013){
for(d in 1:31){
B=rbind(B,extracttemp(y,1,d))}}

Here are all the temperatures observed, and 2013,

plot(B$X,B$Temp,cex=.5,col="light blue",xlab="January, in Montreal",ylab="Temperature (Celsius)")
I=which(B$Year==2013)
lines(B$X[I],B$Temp[I],col="red")

In the previous post, one test only was used, and one year was considered. I was wondering if this behavior was observed only with temperature of 2013 (or not), and how the other tests (mentioned in a previous post too) were performing.

I might need a function, because those tests cannot be used if there is a missing value, even only one. So I did use the value observed one hour before (just to make sure that the tests can be done)

correcty=function(Y){
I=which(is.na(Y))
	if(length(I)==0){Yc=Y}
	if(length(I)>0){Yc=Y;for(i in I) Yc[i]=Yc[i-1]}	
return(Yc)
}

Now, we can compute the p-values, for all the years, and the three different three (keeping in mind that two test if the series is non-stationary, and one if the series is stationary)

DF=matrix(NA,2013-1954,3)
library(urca)
for(y in 1955:2013){
Z=B$Temp[which(B$Year==y)]
	Zc=correcty(Z)
	DF[y-1954,2]=as.numeric(pp.test(Zc)$p.value)	
	DF[y-1954,1]=as.numeric(kpss.test(Zc)$p.value)	
	DF[y-1954,3]=as.numeric(adf.test(Zc)$p.value)
}

Visually, if red means stationary, and blue means non-stationary, we get

DFP=DF
DFP[,1]=DF[,1]<.05
DFP[,2:3]=DF[,2:3]>.05
library(RColorBrewer)
CL=brewer.pal(6, "RdBu")
plot(0:1,0:1,xlim=c(1950,2015),ylim=c(0,3),axes=FALSE,xlab="",ylab="")
axis(1)
text(1952,.5,"KPSS")
text(1952,1.5,"PP")
text(1952,2.5,"ADF")
for(y in 1955:2013){
for(i in 1:3){
polygon(y+c(-1,-1,1,1)/2.2,i-.5+c(-1,1,1,-1)/2.2,col=CL[1+(DFP[y-1954,i]==1)*5],border=NA)}}

Quite frequently, we conclude that the temperature is a random walk. Which does not make sense (from a physical point of view). But again, it might come from the fact that temperature are stationary, but with some fractional behavior (as suggested in the previous post).


Managing data at scale doesn’t have to be hard. Find out how the completely free, open source HPCC Systems platform makes it easier to update, easier to program, easier to integrate data, and easier to manage clusters. Download and get started today.

Topics:

Published at DZone with permission of

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}