
R: Seasonal Periods


I get questions about this almost every week. Here is an example from a recent comment on this blog:

I have two large time series data. One is separated by seconds intervals and the other by minutes. The length of each time series is 180 days. I’m using R (3.1.1) for forecasting the data. I’d like to know the value of the “frequency” argument in the ts() function in R, for each data set. Since most of the examples and cases I’ve seen so far are for months or days at the most, it is quite confusing for me when dealing with equally separated seconds or minutes. According to my understanding, the “frequency” argument is the number of observations per season. So what is the “season” in the case of seconds/minutes? My guess is that since there are 86,400 seconds and 1440 minutes a day, these should be the values for the “freq” argument. Is that correct?


The same question was asked on crossvalidated.com.

Yes, the “frequency” is the number of observations per season, that is, the length of one seasonal cycle counted in observations. This is the opposite of the definition of frequency in physics, or in Fourier analysis, where “period” is the length of the cycle and “frequency” is the inverse of the period. When using the ts() function in R, the following values should be used.

Data        frequency
Annual      1
Quarterly   4
Monthly     12
Weekly      52

Actually, there are not 52 weeks in a year, but 365.25/7 = 52.18 on average. However, most functions which use ts objects require an integer frequency.
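As a minimal sketch of the table above, the following base R code builds ts objects with the listed frequencies. The simulated values and the start dates are assumptions made purely for illustration.

```r
# Simulated data; the values and start dates are illustrative assumptions.
set.seed(1)
monthly   <- ts(rnorm(48),  frequency = 12, start = c(2010, 1))  # 4 years of monthly data
quarterly <- ts(rnorm(16),  frequency = 4,  start = c(2010, 1))  # 4 years of quarterly data
weekly    <- ts(rnorm(104), frequency = 52, start = c(2010, 1))  # 2 years, integer approximation 52

frequency(monthly)      # observations per season (here, per year)
cycle(quarterly)[1:6]   # position of each observation within its year
```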

Once the observations are more frequent than weekly, there is usually more than one way of handling the frequency. For example, hourly data might have a daily seasonality (frequency = 24), a weekly seasonality (frequency = 24×7 = 168) and an annual seasonality (frequency = 24×365.25 = 8766). If you want to use a ts object, then you need to decide which of these is the most important.
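To make the choice concrete, here is a small sketch that wraps the same hourly series in two different ts objects, one per candidate season. The data are simulated and the four-week length is an arbitrary assumption.

```r
set.seed(1)
n_hours <- 24 * 7 * 4                   # four weeks of simulated hourly observations
x <- rnorm(n_hours)

daily_ts  <- ts(x, frequency = 24)      # treat one day as the season
weekly_ts <- ts(x, frequency = 24 * 7)  # treat one week as the season

# The same data cover a different number of "seasons" under each choice.
length(x) / frequency(daily_ts)         # number of days
length(x) / frequency(weekly_ts)        # number of weeks
```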

An alternative is to use an msts object (defined in the forecast package), which handles time series with multiple seasonal periods. Then you can specify all the frequencies that might be relevant. It is also flexible enough to handle non-integer frequencies.
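A short sketch of this, assuming the forecast package is installed: daily data with both a weekly period and a non-integer annual period. The series is simulated, and reading the periods back through the "msts" attribute reflects how current versions of the package store them.

```r
library(forecast)  # provides msts()

set.seed(1)
x <- rnorm(365 * 3)   # about three years of simulated daily data (an assumption)

# Weekly and annual periods for daily data; 365.25 is not an integer.
y <- msts(x, seasonal.periods = c(7, 365.25))

class(y)          # an msts object is also a ts object
attr(y, "msts")   # the seasonal periods stored on the object
```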

Data frequencies

              minute    hour     day     week       year
Daily                                       7     365.25
Hourly                            24      168       8766
Half-hourly                       48      336      17532
Minutes                   60    1440    10080     525960
Seconds           60    3600   86400   604800   31557600
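The entries in this table are just the cycle length divided by the sampling interval. The helper below is a hypothetical illustration in base R (seasonal_periods is not a function from any package) that derives them from the interval in seconds, using 365.25 days for the year as in the text.

```r
# Hypothetical helper (not part of any package): seasonal periods, counted in
# observations, for data sampled every `interval_seconds` seconds.
seasonal_periods <- function(interval_seconds) {
  cycle_seconds <- c(
    minute = 60,
    hour   = 3600,
    day    = 86400,
    week   = 7 * 86400,
    year   = 365.25 * 86400
  )
  periods <- cycle_seconds / interval_seconds
  periods[periods > 1]   # a cycle of one observation or fewer is not seasonal
}

seasonal_periods(1)       # one-second data
seasonal_periods(60)      # one-minute data
seasonal_periods(86400)   # daily data
```

For half-hourly data (an interval of 1800 seconds) this sketch also returns an hourly period of 2, which the table leaves out.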

You won’t necessarily want to include all of these frequencies, just the ones that are likely to be present in the data. For example, a natural phenomenon (e.g., sunshine hours) is unlikely to have a weekly period, and if your data are measured at one-minute intervals over a 3-month period, there is no point including an annual frequency.
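One way to automate the second point is to drop any period that the data cannot cover often enough. The sketch below uses a hypothetical helper, usable_periods, and an assumed rule of thumb of at least two full cycles; neither comes from the forecast package, and the right threshold depends on the model you fit.

```r
# Hypothetical helper: keep only the seasonal periods that fit into the data
# at least `min_cycles` times. The default of 2 is an assumed rule of thumb.
usable_periods <- function(n_obs, periods, min_cycles = 2) {
  periods[n_obs >= min_cycles * periods]
}

n_obs   <- 90 * 1440   # about 3 months of one-minute observations
periods <- c(hour = 60, day = 1440, week = 10080, year = 525960)

usable_periods(n_obs, periods)   # the annual period is dropped
```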

For example, the taylor data set from the forecast package contains half-hourly electricity demand data from England and Wales over about 3 months in 2000. It was defined as

taylor <- msts(x, seasonal.periods=c(48,336))

One convenient model for multiple seasonal time series is a TBATS model:

library(forecast)  # provides tbats(), forecast() and the taylor data
taylor.fit <- tbats(taylor)
plot(forecast(taylor.fit))

(Warning: this takes a few minutes.)

If an msts object is used with a function designed for ts objects, the largest seasonal period is used as the “frequency” attribute.
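This can be checked on a simulated series with the same two periods as taylor. The sketch assumes the forecast package is installed, and the six-week length is arbitrary.

```r
library(forecast)  # provides msts()

set.seed(1)
x <- rnorm(48 * 7 * 6)   # six weeks of simulated half-hourly data
y <- msts(x, seasonal.periods = c(48, 336))

frequency(y)      # what functions written for ts objects will see
attr(y, "msts")   # both seasonal periods remain stored on the object
```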



Published at DZone with permission of
