R: Seasonal Periods


I get questions about this almost every week. Here is an example from a recent comment on this blog:

I have two large time series data. One is separated by seconds intervals and the other by minutes. The length of each time series is 180 days. I’m using R (3.1.1) for forecasting the data. I’d like to know the value of the “frequency” argument in the ts() function in R, for each data set. Since most of the examples and cases I’ve seen so far are for months or days at the most, it is quite confusing for me when dealing with equally separated seconds or minutes. According to my understanding, the “frequency” argument is the number of observations per season. So what is the “season” in the case of seconds/minutes? My guess is that since there are 86,400 seconds and 1440 minutes a day, these should be the values for the “freq” argument. Is that correct?

The same question was asked on crossvalidated.com.

Yes, the “frequency” is the number of observations per season. This is the opposite of the definition of frequency in physics, or in Fourier analysis, where “period” is the length of the cycle, and “frequency” is the inverse of period. When using the ts() function in R, the following choices should be used.

Data        frequency
Annual      1
Quarterly   4
Monthly     12
Weekly      52

Actually, there are not 52 weeks in a year, but 365.25 ÷ 7 = 52.18 on average. But most functions which use ts objects require integer frequency.
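As a minimal sketch of the table above, the following builds a monthly series with base R's ts(). The simulated values and the start date are made up purely for illustration.

```r
# Simulated monthly data: 3 years of made-up values (illustration only)
set.seed(1)
x <- rnorm(36)

# frequency = 12 tells R there are 12 observations per seasonal cycle (a year)
y <- ts(x, frequency = 12, start = c(2012, 1))

frequency(y)   # number of observations per season
cycle(y)       # position of each observation within its season (1 to 12)
```

The same pattern applies to the other rows of the table: only the value passed to frequency changes.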

Once the frequency of observations is smaller than a week, then there is usually more than one way of handling the frequency. For example, hourly data might have a daily seasonality (frequency = 24), a weekly seasonality (frequency = 24 × 7 = 168) and an annual seasonality (frequency = 24 × 365.25 = 8766). If you want to use a ts object, then you need to decide which of these is the most important.

An alternative is to use a msts object (defined in the forecast package) which handles multiple seasonality time series. Then you can specify all the frequencies that might be relevant. It is also flexible enough to handle non-integer frequencies.
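A sketch of how this might look, assuming the forecast package is installed. Both series here are simulated for illustration only, and the choice of which periods to declare is an assumption about what the data contain.

```r
library(forecast)

# Simulated hourly data covering 8 weeks (illustration only)
set.seed(1)
n <- 24 * 7 * 8
hour <- seq_len(n)
x <- 10 + sin(2 * pi * hour / 24) + 0.5 * sin(2 * pi * hour / 168) +
  rnorm(n, sd = 0.2)

# Declare both the daily (24) and weekly (168) seasonal periods
y <- msts(x, seasonal.periods = c(24, 168))
attr(y, "msts")   # the seasonal periods stored on the object

# Non-integer periods are allowed, e.g. daily data with weekly and annual cycles
d <- msts(rnorm(365 * 3), seasonal.periods = c(7, 365.25))
attr(d, "msts")
```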

Frequencies (number of observations per cycle):

Data          minute   hour   day     week     year
Daily         -        -      -       7        365.25
Hourly        -        -      24      168      8766
Half-hourly   -        -      48      336      17532
Minutes       -        60     1440    10080    525960
Seconds       60       3600   86400   604800   31557600
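The entries in this table are just ratios: the length of the cycle divided by the length of one observation interval. A small base R sketch of that arithmetic follows; the variable names are made up for illustration.

```r
# Length of one minute, hour, day, week and (average) year, in seconds
seconds_per <- c(minute = 60, hour = 3600, day = 86400,
                 week = 604800, year = 365.25 * 86400)

# Length of one observation interval, in seconds
interval <- c(Daily = 86400, Hourly = 3600, "Half-hourly" = 1800,
              Minutes = 60, Seconds = 1)

# Observations per cycle = cycle length / interval length
freq <- outer(interval, seconds_per, function(i, p) p / i)

# A cycle no longer than the sampling interval is not a seasonal period
freq[freq <= 1] <- NA
freq
```

The result agrees with the table, except that it also reports 2 observations per hour for half-hourly data, a period that is too short to be of much practical use.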

You won’t necessarily want to include all of these frequencies, just the ones that are likely to be present in the data. For example, a natural phenomenon (e.g., sunshine hours) is unlikely to have a weekly period, and if your data are measured in one-minute intervals over a 3 month period, there is no point including an annual frequency.
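One way to screen candidate periods against the length of the series is sketched below. The "at least two full cycles" threshold is an assumed rule of thumb, not something stated in this article, and it says nothing about whether a period is plausible for the phenomenon being measured.

```r
# Candidate seasonal periods for one-minute data
candidates <- c(hour = 60, day = 1440, week = 10080, year = 525960)

# Roughly 3 months of one-minute observations
n <- 90 * 1440

# Assumed rule of thumb (not from the article): keep a period only if the
# series spans at least two full cycles of it
usable <- candidates[n >= 2 * candidates]
usable
```

Here the annual period is dropped because three months of data cover only a fraction of one yearly cycle.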

For example, the taylor data set from the forecast package contains half-hourly electricity demand data from England and Wales over about 3 months in 2000. It was defined as

taylor <- msts(x, seasonal.periods=c(48,336))

One convenient model for multiple seasonal time series is a TBATS model:

taylor.fit <- tbats(taylor)

(Warning: this takes a few minutes.)

If an msts object is used with a function designed for ts objects, the largest seasonal period is used as the “frequency” attribute.
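This can be checked directly. The sketch below assumes the forecast package is installed and rebuilds an msts object from the taylor values, so it does not depend on how the shipped data set is stored.

```r
library(forecast)

# Rebuild the series from its raw values with daily (48) and weekly (336) periods
y <- msts(as.numeric(taylor), seasonal.periods = c(48, 336))

attr(y, "msts")   # both seasonal periods are kept on the object
frequency(y)      # the single value seen by functions written for ts objects
```

Functions that only understand ts objects will therefore treat such a series as having weekly seasonality and ignore the daily pattern.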

Published at DZone with permission of Rob J Hyndman, DZone MVB. See the original article here.

