Taking a Random Walk

Arthur Charpentier · Feb. 06, 13

Consider the following time series:

What does this look like? I know, it's a stupid game, but I keep using it in my time series courses. It does look like a random walk, doesn't it? If we use the Phillips-Perron test, yes, it does:

> PP.test(x)

	Phillips-Perron Unit Root Test

data:  x 
Dickey-Fuller = -2.2421, Truncation lag parameter = 6, p-value = 0.4758
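
For comparison, here is a minimal sketch (my addition, not from the original post) simulating a pure random walk of the same length; under the unit-root null, the test should typically fail to reject here too:

> set.seed(1)
> rw <- cumsum(rnorm(759))   # random walk: cumulated Gaussian white noise
> PP.test(rw)                # expect a large p-value (unit root not rejected)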

If we look at the autocorrelation function, we do observe some persistence:

> acf(x,100)

Perhaps this persistence can be related to long range dependence, or to some fractional random walk. A natural idea could be to estimate the Hurst parameter, using for instance Beran's (1992) estimator, based on Whittle (1956), where we assume that the autocorrelation function satisfies

ρ(h) ∼ h^(2H−2) as h → ∞

for some H ∈ (1/2, 1) (the so-called Hurst index). But here, we start to observe unexpected outputs:

> library(longmemo)
> (d <- WhittleEst(x))
'WhittleEst' Whittle estimator for  fractional Gaussian noise ('fGn');	 call:
WhittleEst(x = x)
	  time series of length  n = 759.

H = 0.9899335
coefficients 'eta' =
    Estimate Std. Error z value   Pr(>|z|)
H 0.98993350 0.02468323 40.1055 < 2.22e-16
 <==> d := H - 1/2 = 0.49 (0.025)

 $ vcov       : num [1, 1] 0.000609
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr "H"
  .. ..$ : chr "H"
 $ periodogr.x: num [1:379] 1479.3 1077.3 371.7 287.2 51.2 ...
 $ spec       : num [1:379] 62.5 31.7 21.3 16.1 12.9 ...

Or, more precisely, some unexpected values for the Hurst parameter, which should be in (1/2, 1):

> confint(d)
      2.5 %   97.5 %
H 0.9415553 1.038312

Oops. Perhaps we did miss something, because it looks like there is extremely strong persistence in our time series:

> plot(d)
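
As a sanity check (my addition, not in the original post), one can simulate fractional Gaussian noise with a known Hurst index using longmemo's simFGN0, and verify that WhittleEst recovers an estimate safely inside (0, 1):

> library(longmemo)
> set.seed(1)
> z <- simFGN0(759, H = 0.8)   # fGn of the same length, true H = 0.8
> WhittleEst(z)                # estimated H should come out close to 0.8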

It is probably time to ask where I found that series… To be honest, I borrowed it from a great Canadian website, http://climate.weatheroffice.gc.ca/climatedata/. For instance, if you want the temperature we experienced a few days ago, you can use

> y=2013
> m=1
> d=25
> url=paste("http://climate.weatheroffice.gc.ca/climatedata/hourlydata_e.html?",
+ "timeframe=1&prov=qc&stationid=5415&hlyrange=1953-01-01|2013-02-01",
+ "&year=",y,"&month=",m,"&day=",d,sep="")
> page=scan(url,what="character")
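
From there, extracting the hourly temperatures requires parsing the returned HTML. A rough sketch of one way to do it (my addition; the <td> markup is an assumption about the page layout, and the site has since moved to climate.weather.gc.ca):

> html <- paste(readLines(url), collapse=" ")
> # assumed layout: temperatures sit in cells like <td>-12.3</td>
> cells <- regmatches(html, gregexpr("<td>\\s*-?[0-9]+\\.[0-9]\\s*</td>", html))[[1]]
> temp <- as.numeric(gsub("[^0-9.-]", "", cells))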

Yes, that series is the temperature we experienced in Montréal last month (an hourly time series). On the graph below, you can actually compare it with the temperatures experienced in Januarys over the past 60 years:

So it is not that surprising to see long range dependence models appearing (I wrote a paper on precisely that topic a few years ago). What I found puzzling is that the persistence is large, extremely large. And the problem is that I do not see how we can explain the 'jumps' we observe in that series. For instance, consider the behavior of the series while I was in Europe, before January 20th: within 3 days, the temperature went down from 0°C to -20°C, up from -20°C to 0°C, and then down again from 0°C to -20°C (a nice и, if we use Cyrillic letters). Or how can we explain the oscillating behavior observed the week after, where the temperature went up from -25°C to (almost) +10°C in a few days? Within 10 days, we also observed two 'jumps' (or 'crashes', if we want to use the terminology of financial time series) with a decrease of 25 degrees in less than 24 hours! Obviously, we need to find other classes of models to replicate the kind of behavior we observe in temperatures…
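
To exhibit those jumps numerically, a quick sketch (my addition, assuming x still holds the hourly temperatures): look at the change over 24-hour windows and flag the large ones.

> change24 <- diff(x, lag=24)   # temperature change over a 24-hour window, in °C
> which(change24 < -20)         # hours preceded by a drop of more than 20°C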


Published at DZone with permission of Arthur Charpentier, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
