ARIMA Forecasting With SAS


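ARIMA stands for autoregressive integrated moving average. It is also known as the Box-Jenkins model, because the technique was popularized by George Box and Gwilym Jenkins. ARIMA forecasting needs a stationary series, one whose mean, variance, and autocorrelation structure do not change over time. The "integrated" part of the model handles a non-stationary series by differencing it until it becomes stationary.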

The ARIMA procedure analyzes and forecasts equally spaced univariate time series data, transfer function data, and intervention data by using autoregressive integrated moving-average (ARIMA) models.
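For reference, a general ARIMA(p, d, q) model can be written with the backshift operator B, where B Y_t = Y_{t-1}. The form below follows the sign convention of the SAS documentation, in which moving-average coefficients enter with a minus sign; some other texts use a plus sign. Here a_t is white noise, d is the number of differences, and p and q are the autoregressive and moving-average orders.

```latex
(1 - B)^d Y_t = \mu + \frac{\theta(B)}{\phi(B)}\, a_t,
\qquad
\phi(B) = 1 - \phi_1 B - \cdots - \phi_p B^p,
\qquad
\theta(B) = 1 - \theta_1 B - \cdots - \theta_q B^q
```

The (1 - B)^d factor is the differencing step. The AR and MA operators are fitted to the differenced series, which is why that series must be stationary.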

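In SAS, this is done with PROC ARIMA, which follows the three Box-Jenkins stages: identification (the IDENTIFY statement), estimation and diagnostic checking (the ESTIMATE statement), and forecasting (the FORECAST statement).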

Identification Stage

The identification stage computes autocorrelations, inverse autocorrelations, partial autocorrelations, and cross-correlations. You can run stationarity tests to determine whether differencing is necessary. This stage also prints descriptive statistics.

proc arima data=retail;
   /* ACF, IACF, PACF plots and the white-noise check for SALES, up to lag 22 */
   identify var=sales nlag=22;
run;
quit;

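NLAG= sets the number of lags for which the autocorrelations are computed and plotted. It should always be less than the number of observations in your data set. If you omit it, SAS uses 24 or one-quarter of the number of observations, whichever is smaller.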

VAR= names the variable to analyze and forecast (here, sales).
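To show what the ACF panel reports, here is a minimal Python sketch of the textbook sample autocorrelation estimator for lags 1 through nlag. It is not SAS's internal code. The function name `acf` and the sales figures are hypothetical and used only for illustration. The range check reflects the rule that nlag must stay below the number of observations.

```python
def acf(series, nlag):
    """Sample autocorrelations r_1 ... r_nlag (textbook estimator).

    r_k = sum_{t=k}^{n-1} (x_t - m)(x_{t-k} - m) / sum_{t=0}^{n-1} (x_t - m)^2,
    where m is the sample mean and n = len(series).
    """
    n = len(series)
    if not 0 < nlag < n:
        # nlag must be at least 1 and smaller than the number of observations.
        raise ValueError("nlag must be at least 1 and less than len(series)")
    m = sum(series) / n
    dev = [x - m for x in series]
    denom = sum(d * d for d in dev)  # n times the lag-0 autocovariance
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
            for k in range(1, nlag + 1)]


# Hypothetical monthly sales figures, used only to show the call.
sales = [112, 118, 132, 129, 121, 135, 148, 148, 136, 119, 104, 118]
print([round(r, 3) for r in acf(sales, nlag=3)])
```

If the autocorrelations of a series decay only slowly as the lag grows, that is a common visual sign that the series still needs differencing.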

The identify statement produces panels of plots for auto-correlation and trend analysis.

  • Time series plot of the series.

  • Auto-correlation function plot (ACF).

  • Inverse autocorrelation function plot (IACF).

  • Partial autocorrelation function plot (PACF).
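The PACF plot is usually read to choose the autoregressive order p. For an AR(p) process, the partial autocorrelations cut off after lag p. The minimal Python sketch below computes partial autocorrelations from a list of autocorrelations using the Durbin-Levinson recursion. That recursion is one standard method; the sketch does not claim SAS uses it internally, and the name `pacf_from_acf` is hypothetical.

```python
def pacf_from_acf(r):
    """Partial autocorrelations phi_11, phi_22, ... via the Durbin-Levinson recursion.

    r[0] is the lag-1 autocorrelation, r[1] the lag-2 autocorrelation, and so on.
    """
    pacf, prev = [], []  # prev holds the order-(k-1) coefficients phi_{k-1,1..k-1}
    for k in range(1, len(r) + 1):
        if k == 1:
            phi_kk, cur = r[0], [r[0]]
        else:
            num = r[k - 1] - sum(prev[j - 1] * r[k - j - 1] for j in range(1, k))
            den = 1.0 - sum(prev[j - 1] * r[j - 1] for j in range(1, k))
            phi_kk = num / den
            # Update the lower-order coefficients, then append the new last one.
            cur = [prev[j - 1] - phi_kk * prev[k - j - 1] for j in range(1, k)] + [phi_kk]
        pacf.append(phi_kk)
        prev = cur
    return pacf


# Theoretical ACF of an AR(1) process with phi = 0.6 is r_k = 0.6 ** k.
print([round(p, 6) for p in pacf_from_acf([0.6 ** k for k in range(1, 6)])])
```

For this AR(1) input, the first partial autocorrelation equals 0.6 and the rest are zero up to rounding. That is the cut-off pattern the PACF plot is read for.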


Differencing

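A plot of sales shows the level of the series shifting from period to period instead of fluctuating around a constant mean, so the data are non-stationary.

To make the series stationary, difference it. Putting the differencing lag in parentheses after the variable name, as in sales(1), tells PROC ARIMA to analyze the first difference, sales(t) - sales(t-1):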

proc arima data=LIBREF.FORECAST;
   /* sales(1) analyzes the first difference: sales(t) - sales(t-1) */
   identify var=sales(1) nlag=22;
run;
quit;
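Differencing is easy to state in code. The Python sketch below shows what sales(1) computes. The helper name `difference` is hypothetical. Listing several lags in SAS, as in sales(1,12), applies the differences one after another, which here would be `difference(difference(sales, 1), 12)`.

```python
def difference(series, lag=1):
    """Return x_t - x_{t-lag} for t = lag .. n-1; the first `lag` values are lost."""
    if lag < 1:
        raise ValueError("lag must be a positive integer")
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


trend = [3 + 2 * t for t in range(10)]  # a deterministic upward trend
print(difference(trend))                # → [2, 2, 2, 2, 2, 2, 2, 2, 2]
```

The first difference turns the linear trend into a constant. This is how differencing removes a drifting level.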


In the plot of the differenced sales, the level no longer drifts, so the differenced series can be treated as stationary.

White Noise Test

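In this case, the white-noise hypothesis is rejected because the p-values at all lags are at or below 0.05. This means the differenced series still contains autocorrelation that an ARMA model can capture, so it is worth modeling. It does not by itself show that a model fits well. Goodness of fit is judged later by running the same check on the residuals of the fitted model. There, large p-values (white noise not rejected) are what you want.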
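The white-noise check is based on the Ljung-Box Q statistic, which sums squared autocorrelations up to a given lag and compares the total with a chi-square distribution. The Python sketch below shows that calculation. The names `chi2_sf`, `ljung_box`, and `fitted_params` are hypothetical. Reducing the degrees of freedom by p + q when checking model residuals is the usual convention, assumed here.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df,
    using Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    # Start from df = 2 (exp) or df = 1 (erfc), then step up two at a time.
    k, total = (2, math.exp(-x / 2)) if df % 2 == 0 else (1, math.erfc(math.sqrt(x / 2)))
    while k < df:
        total += math.exp((k / 2) * math.log(x / 2) - x / 2 - math.lgamma(k / 2 + 1))
        k += 2
    return total


def ljung_box(series, lag, fitted_params=0):
    """Ljung-Box Q = n(n+2) * sum_{k=1..lag} r_k^2 / (n - k), with lag - fitted_params df.

    Use fitted_params=0 for a raw or differenced series and p + q for model residuals.
    """
    n = len(series)
    df = lag - fitted_params
    if not 0 < lag < n or df < 1:
        raise ValueError("need 0 < lag < len(series) and lag > fitted_params")
    m = sum(series) / n
    dev = [x - m for x in series]
    denom = sum(d * d for d in dev)
    q = 0.0
    for k in range(1, lag + 1):
        r_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        q += r_k * r_k / (n - k)
    q *= n * (n + 2)
    return q, df, chi2_sf(q, df)


# A strongly autocorrelated (alternating) series: white noise is clearly rejected.
print(ljung_box([1.0, -1.0] * 50, lag=6))
```

A p-value at or below 0.05 rejects white noise. On the differenced series, that means there is structure left to model. On residuals, it means the model has missed some of that structure.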

Descriptive Statistics

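The IDENTIFY statement also prints descriptive statistics for the working series, which here is the differenced sales series. These include its mean, standard deviation, number of observations, and the number of observations eliminated by differencing.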
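The key point is that these statistics describe the working series (the differenced values), not the raw sales. The small Python sketch below computes the same quantities for var=sales(1). The function name is hypothetical. Using divisor n for the standard deviation (`statistics.pstdev`) is an assumption of this sketch, not a statement about SAS's exact formula.

```python
import statistics


def working_series_summary(series, lag=1):
    """Summary of the differenced ("working") series.

    The standard deviation uses divisor n (pstdev); this is an assumption.
    """
    working = [series[t] - series[t - lag] for t in range(lag, len(series))]
    return {
        "mean_of_working_series": statistics.fmean(working),
        "standard_deviation": statistics.pstdev(working),
        "number_of_observations": len(working),
        "eliminated_by_differencing": len(series) - len(working),
    }


print(working_series_summary([10, 12, 15, 15, 20]))
```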

Estimation and Diagnostic Stage

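The ESTIMATE statement specifies the ARIMA model to fit to the variable named in the preceding IDENTIFY statement and estimates the parameters of that model.

The ESTIMATE statement also produces diagnostic statistics to help you judge the adequacy of the model. In the example below, p = 1 and q = 1 request one autoregressive term and one moving-average term. Because the IDENTIFY statement uses the first difference Sales(1), the fitted model is an ARIMA(1,1,1).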
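Written out in the SAS sign convention, with W_t denoting the first difference of sales, an ARIMA(1,1,1) model with a mean term has the form below. The ESTIMATE output labels the parameters MU, AR1,1 (phi_1), and MA1,1 (theta_1).

```latex
W_t = (1 - B)\,Y_t = Y_t - Y_{t-1},
\qquad
W_t - \mu = \phi_1 (W_{t-1} - \mu) + a_t - \theta_1 a_{t-1}
```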

proc arima data = LIBREF.FORECAST;
   identify var = Sales(1) nlag = 20;
   /* one AR term (p=1) and one MA term (q=1) on the differenced series: ARIMA(1,1,1) */
   estimate p = 1 q = 1;
run;
quit;
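By default, PROC ARIMA estimates parameters by conditional least squares (METHOD=CLS). It chooses the coefficients that minimize the sum of squared one-step residuals, computed recursively from the data. The Python sketch below illustrates that idea for the ARIMA(1,1,1) model. It uses a crude coarse-to-fine grid search, not SAS's optimizer, and for simplicity it fixes mu at the sample mean instead of estimating it jointly. All function names and the simulated data are hypothetical.

```python
import random


def conditional_ss(w, mu, phi, theta):
    """Sum of squared residuals a_t for W_t - mu = phi*(W_{t-1} - mu) + a_t - theta*a_{t-1},
    conditioning on the first observation and a zero residual before it."""
    total, a_prev = 0.0, 0.0
    for t in range(1, len(w)):
        a_t = (w[t] - mu) - phi * (w[t - 1] - mu) + theta * a_prev
        total += a_t * a_t
        a_prev = a_t
    return total


def _grid(center, half_width, step):
    """Candidate values around `center`, clipped to (-0.99, 0.99)."""
    n = round(half_width / step)
    return sorted({max(-0.99, min(0.99, round(center + i * step, 10)))
                   for i in range(-n, n + 1)})


def fit_arma11_cls(w):
    """Coarse-to-fine grid search for (phi, theta); mu is fixed at the sample mean."""
    mu = sum(w) / len(w)
    phi, theta = 0.0, 0.0
    for half_width, step in ((0.9, 0.1), (0.2, 0.02), (0.04, 0.005)):
        phis, thetas = _grid(phi, half_width, step), _grid(theta, half_width, step)
        _, phi, theta = min((conditional_ss(w, mu, p, q), p, q)
                            for p in phis for q in thetas)
    return mu, phi, theta


def simulate_arma11(n, mu, phi, theta, seed=0, burn_in=200):
    """Hypothetical helper: simulate the differenced series with N(0, 1) shocks."""
    rng = random.Random(seed)
    z_prev, a_prev, out = 0.0, 0.0, []
    for t in range(n + burn_in):
        a_t = rng.gauss(0.0, 1.0)
        z_t = phi * z_prev + a_t - theta * a_prev
        z_prev, a_prev = z_t, a_t
        if t >= burn_in:
            out.append(mu + z_t)
    return out


w = simulate_arma11(2000, mu=5.0, phi=0.6, theta=-0.4, seed=42)
estimates = fit_arma11_cls(w)
print(estimates)  # (mu, phi, theta), close to (5.0, 0.6, -0.4)
```

A real fit would also estimate mu jointly and report standard errors. The sketch only shows what "minimize the conditional sum of squares" means for this model.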

Forecasting Stage

The FORECAST statement is used to forecast future values of the time series and to generate confidence intervals for these forecasts from the ARIMA model produced by the preceding ESTIMATE statement.

proc arima data = LIBREF.FORECAST;
   identify var = Sales(1) nlag = 20;
   estimate p = 1 q = 1;
   /* forecast 12 months ahead from the ARIMA(1,1,1) model and save to RESULTS */
   forecast lead = 12 interval = month id = period out = results;
run;
quit;

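  • LEAD= specifies how many periods ahead to forecast (12 months in this example).

  • ID= names the variable that identifies each observation, typically a SAS date, time, or datetime value.

  • INTERVAL=MONTH indicates that the data are monthly, so the ID values can be extended into the forecast periods.

  • OUT= writes the forecasts to the data set RESULTS, along with their standard errors (STD) and 95% confidence limits (L95 and U95).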
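The Python sketch below shows what the FORECAST statement computes for an ARIMA(1,1,1) model. Forecasts of the differenced series are built recursively and then accumulated onto the last observed level to undo the differencing. Standard errors come from the psi-weights of the integrated model. The function `forecast_arima111` and its arguments are hypothetical. In practice, mu, phi, theta, and the residual variance sigma2 would come from the ESTIMATE output, and a_last would be the last residual. Like the usual Box-Jenkins intervals, it treats the estimated parameters as known.

```python
from statistics import NormalDist


def forecast_arima111(y_last, w_last, a_last, mu, phi, theta, sigma2, lead, alpha=0.05):
    """Forecasts for W_t - mu = phi*(W_{t-1} - mu) + a_t - theta*a_{t-1}, W_t = Y_t - Y_{t-1}.

    y_last, w_last, a_last: last observed level, last difference, last residual.
    Returns (h, forecast, std_error, lower, upper) for h = 1..lead.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    rows = []
    level, w_hat = y_last, None
    psi_w, psi_y, var_sum = 0.0, 0.0, 0.0
    for h in range(1, lead + 1):
        # Forecast of the differenced series; a_last only affects h = 1.
        if h == 1:
            w_hat = mu + phi * (w_last - mu) - theta * a_last
        else:
            w_hat = mu + phi * (w_hat - mu)
        level += w_hat  # undo the differencing by accumulating forecast changes

        # psi-weights of W: psi_0 = 1, psi_1 = phi - theta, psi_j = phi * psi_{j-1}.
        j = h - 1
        psi_w = 1.0 if j == 0 else (phi - theta if j == 1 else phi * psi_w)
        psi_y += psi_w  # psi-weights of the level Y are cumulative sums of those of W
        var_sum += psi_y ** 2
        se = (sigma2 * var_sum) ** 0.5
        rows.append((h, level, se, level - z * se, level + z * se))
    return rows


# Hypothetical parameter values standing in for ESTIMATE output.
for row in forecast_arima111(y_last=250.0, w_last=3.0, a_last=-1.2,
                             mu=2.0, phi=0.5, theta=0.2, sigma2=4.0, lead=3):
    print(row)
```

The returned fields loosely correspond to the FORECAST, STD, L95, and U95 columns of the OUT= data set. Because the model is integrated, the standard errors keep growing with the horizon.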


You now know how to use PROC ARIMA to identify, estimate, and forecast an ARIMA model in SAS.
