Fitting Models to Short Time Series

Following my post on fitting models to long time series, I thought I'd tackle the opposite problem, which is more common in business environments.

I often get asked how few data points can be used to fit a time series model. As with almost all sample size questions, there is no easy answer. It depends on the number of model parameters to be estimated and the amount of randomness in the data. The sample size required increases with the number of parameters to be estimated and with the amount of noise in the data.

Using least squares estimation, or some other non-regularized estimation method, it is possible to estimate a model only if you have more observations than parameters. (If you use the LASSO, or some other regularization technique, it is possible to estimate a model with fewer observations than parameters.) However, there is no guarantee that a fitted model will be any good for forecasting, especially when the data are noisy.
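To see the observations-versus-parameters constraint concretely, here is a minimal sketch in base R using simulated data. The variable names and the simulated regression are illustrative assumptions, not part of the original analysis. With 5 observations and 7 parameters, `lm()` can identify only 5 coefficients and reports the rest as `NA`.

```r
# Illustrative only: simulated data, hypothetical variable names.
set.seed(1)
n_obs <- 5
X <- matrix(rnorm(n_obs * 6), nrow = n_obs)  # 6 predictors
y <- rnorm(n_obs)

# Intercept + 6 slopes = 7 parameters, but only 5 observations
fit <- lm(y ~ X)

n_estimated <- sum(!is.na(coef(fit)))  # coefficients least squares could identify
n_dropped   <- sum(is.na(coef(fit)))   # coefficients left as NA
print(c(estimated = n_estimated, dropped = n_dropped))
```

Even the coefficients that are estimated here reproduce the five observations exactly, so the fit says nothing about how well the model would forecast.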

Some textbooks provide rules of thumb giving minimum sample sizes for various time series models. These are misleading and unsubstantiated in theory or practice. Further, they ignore the underlying variability of the data and often overlook the number of parameters to be estimated as well. There is, for example, no justification whatever for the magic number of 30 often given as a minimum for ARIMA modelling.

The only reasonable approach is to first check that there are enough observations to estimate the model, and then to test whether the model performs well out-of-sample. With short series, there is not enough data to allow some observations to be withheld for testing purposes. However, the AIC can be used as a proxy for the one-step forecast out-of-sample MSE (see here). The AIC allows both the number of parameters and the amount of noise to be taken into account.
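As a rough sketch of using the AIC in place of a hold-out test, the following base R code fits three small candidate models to a simulated series of 15 observations and picks the one with the lowest AIC. The simulated series, the candidate list and the variable names are assumptions made for illustration; only `arima()` from the standard `stats` package is used.

```r
# Illustrative only: a simulated short series, not one of the M-competition series.
set.seed(123)
y <- ts(cumsum(rnorm(15)))  # random walk with 15 observations

# Hypothetical candidate set: all models use one difference
candidates <- list(
  "ARIMA(0,1,0)" = c(0, 1, 0),
  "ARIMA(1,1,0)" = c(1, 1, 0),
  "ARIMA(0,1,1)" = c(0, 1, 1)
)

# Fit each candidate by maximum likelihood and record its AIC
aic_values <- sapply(candidates, function(ord) {
  arima(y, order = ord, method = "ML")$aic
})

best_model <- names(which.min(aic_values))
print(round(aic_values, 2))
print(best_model)
```

All three candidates use the same differencing, so their AIC values are computed on the same differenced data and can be compared directly.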

What tends to happen with short series is that the AIC suggests very simple models because anything with more than one or two parameters will produce poor forecasts due to the estimation error. I applied the auto.arima() function from the forecast package in R to all the series from the M-competition with fewer than 20 observations. There were a total of 144 series, of which 32 had models with zero parameters (random walks), 95 had models with one parameter, 15 had models with two parameters and 2 series had models with three parameters. For what it's worth, here is the code.

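library(forecast)  # provides auto.arima()
library(Mcomp)     # assumed source of M1, the M-competition series
# Length of each series; keep those with fewer than 20 observations
n <- unlist(lapply(M1, function(x) length(x$x)))
n <- n[n < 20]
series <- names(n)
nparam <- numeric(length(n))
# Fit an ARIMA model to each short series and count its estimated coefficients
for (i in seq_along(n)) {
  fit <- auto.arima(M1[[series[i]]]$x)
  nparam[i] <- length(fit$coef)
}
table(nparam)  # number of series for each parameter count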

Seasonal models bring their own difficulties because the seasonality usually takes up m-1 degrees of freedom, where m is the seasonal period (e.g., m=12 for monthly data). Fourier terms are one way to reduce the problem, and they are useful whenever the ratio of m to sample size is large. Further comments on seasonality and sample size are in my short Foresight paper with Andrey Kostenko: "Minimum sample size requirements for seasonal forecasting models", although I wrote that for a statistically unsophisticated audience, so there is no mention of the LASSO or AIC as possible solutions.
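The saving in degrees of freedom can be sketched in base R by comparing the two kinds of seasonal regressors for two years of monthly data. The number of Fourier pairs (K = 2) and the variable names are illustrative assumptions. The terms are built by hand here to keep the example self-contained; the forecast package also has a `fourier()` function for this purpose.

```r
# Illustrative only: two years of monthly data, K chosen arbitrarily.
m <- 12          # seasonal period
n_obs <- 24      # number of observations
K <- 2           # number of sine/cosine pairs (assumption)
t_index <- seq_len(n_obs)

# Seasonal dummies: one column per month, minus the baseline month
month <- factor((t_index - 1) %% m + 1)
dummies <- model.matrix(~ month)[, -1]

# Fourier terms: a sine and a cosine column for each harmonic k = 1..K
fourier_terms <- do.call(cbind, lapply(seq_len(K), function(k) {
  cbind(sin(2 * pi * k * t_index / m), cos(2 * pi * k * t_index / m))
}))

n_dummy_params   <- ncol(dummies)        # m - 1 seasonal parameters
n_fourier_params <- ncol(fourier_terms)  # 2 * K seasonal parameters
print(c(dummies = n_dummy_params, fourier = n_fourier_params))
```

With only 24 observations, the dummy approach spends 11 parameters on seasonality while the Fourier approach with K = 2 spends 4, at the cost of assuming a smoother seasonal pattern.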

Published at DZone with permission of Rob J Hyndman, DZone MVB. See the original article here.

