What is ARIMA?

Photo by Markus Spiske on Unsplash

Background

In my previous posts we covered the autoregressive (AR) and moving-average (MA) models. However, do you know what is better than these two models? A single model that combines them!

_Autoregressive Integrated Moving Average_, better known as ARIMA, is probably the most widely used time series forecasting model and is a combination of the two aforementioned models. In this article, I want to dive into the theory and framework behind the ARIMA model. Then, we will go through a simple Python walkthrough of carrying out forecasts with ARIMA using the statsmodels package!


Overview

As stated above, ARIMA stands for AutoRegressive Integrated Moving Average and is simply a combination of three components (in reality, only two of them are models):

AutoRegressive (AR):

This is just autoregression, where we forecast future values using a linear combination of the previously observed values:

$$y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \cdots + \phi_p y_{t-p} + \varepsilon_t$$

Here y is the time series we are forecasting at multiple time steps, c is a constant, ϕ are the coefficients of the lags, ε is the error term (often assumed to be normally distributed) and p is the number of lagged components, also known as the order.

If you want to learn more about autoregression, check out my previous post on it here:

How To Forecast Time-Series Using Autoregression

Integrated (I):

The middle part of the ARIMA model is named _integrated_. This is the number of differencing steps (the order d) required to make the time series stationary.

Stationarity is where the time series has a constant mean and variance, meaning the statistical properties of the series do not change through time. Differencing de-trends a time series and tends to make the mean constant. You can apply differencing several times, but often the series is sufficiently stationary after a single differencing step.
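To make this concrete, here is a minimal sketch of first-order differencing using pandas (the series values are made up purely for illustration):

```python
import pandas as pd

# Toy series for illustration only
series = pd.Series([112, 118, 132, 129, 121, 135, 148, 148, 136, 119])

# First difference: y'_t = y_t - y_{t-1}; dropna() removes the NaN at the first step
differenced = series.diff().dropna()

# A second round of differencing (d = 2) would simply be series.diff().diff().dropna()
print(differenced)
```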

It is important to note that this integrated part only makes the mean constant. We need to apply another transformation, such as the _logarithmic or Box-Cox transform_, to generate a constant variance (more on this later).

If you want to learn more about stationarity, check out my previous blog posts about it here:

Time-Series Stationarity Simply Explained

Box-Cox Transform for Time Series

Moving Average (MA):

The last component is the moving average, where you forecast using past forecast errors instead of the actual observed values:

$$y_t = \mu + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + \cdots + \theta_q \varepsilon_{t-q}$$

Here y is the time series we are forecasting at multiple time steps, μ is the mean, θ are the coefficients of the lagged forecast errors, ε are the forecast error terms and q is the number of lagged error components.

If you want to learn more about the moving-average model, check out my previous post on it here:

How To Forecast With Moving Average Models

Final Form

Combining all these components together, we can write the full model as:

$$y'_t = c + \phi_1 y'_{t-1} + \cdots + \phi_p y'_{t-p} + \theta_1 \varepsilon_{t-1} + \cdots + \theta_q \varepsilon_{t-q} + \varepsilon_t$$

Where y' refers to the differenced version of the time series.

This is the full ARIMA equation and is just a linear summation of the three components. The model is usually written in the shorthand ARIMA(p, d, q), where p, d and q refer to the orders of the autoregressive, differencing and moving-average components respectively.

Requirements

As we touched upon earlier, the differencing component is there to help make the time series stationary. This is because the ARIMA model requires the data to be stationary in order to model it adequately. The mean is stabilised through differencing and the variance can be stabilised through the Box-Cox transform, as we mentioned above.

Order Selection

One of the preprocessing steps is to determine the optimal orders (p, d, q) of our ARIMA model. The simplest to find is the order of differencing d, as this can be verified by carrying out a statistical test for stationarity. The most popular is the _Augmented Dickey-Fuller (ADF)_ test, where the null hypothesis is that the time series is not stationary.
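As a quick illustration, here is a minimal sketch of the ADF test using statsmodels, run on a synthetic random walk (which is non-stationary by construction):

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

# A random walk is non-stationary by construction
rng = np.random.default_rng(42)
series = np.cumsum(rng.normal(size=200))

# Null hypothesis of the ADF test: the series is NOT stationary
adf_stat, p_value, *_ = adfuller(series)
print(f"ADF statistic: {adf_stat:.3f}, p-value: {p_value:.3f}")

# If we fail to reject the null (large p-value), difference once and test again
adf_stat, p_value, *_ = adfuller(np.diff(series))
print(f"After differencing: ADF statistic: {adf_stat:.3f}, p-value: {p_value:.3f}")
```

If a single difference yields a small p-value (below 0.05, say), then d = 1 is sufficient.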

The autoregressive and moving-average orders (p, q) can be deduced by analysing the partial [autocorrelation function](/towards-data-science/autocorrelation-for-time-series-analysis-86e68e631f77) (PACF) and autocorrelation function (ACF) respectively. The gist of this method is to plot a correlogram of the various lags/forecast errors of the time series to determine which are statistically significant. If this seems arbitrary at the moment don't worry, we will walk through this process in the Python tutorial later.

Autocorrelation For Time Series Analysis

Partial Autocorrelation for Time Series Analysis

However, a more thorough technique is to simply iterate over all the possible combinations of orders and choose the model with the best score against a metric such as _Akaike's Information Criterion (AIC) or the Bayesian Information Criterion (BIC)_. This is analogous to regular hyperparameter tuning and is definitely the more robust method, but it is of course more computationally expensive.
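Below is a rough sketch of such a grid search with statsmodels; the search bounds max_p, max_d and max_q are arbitrary choices for illustration:

```python
import itertools
import warnings
from statsmodels.tsa.arima.model import ARIMA

def grid_search_arima(series, max_p=3, max_d=2, max_q=3):
    """Return the (p, d, q) order with the lowest AIC on the given series."""
    best_aic, best_order = float("inf"), None
    for p, d, q in itertools.product(range(max_p + 1), range(max_d + 1), range(max_q + 1)):
        try:
            with warnings.catch_warnings():
                warnings.simplefilter("ignore")  # silence convergence warnings
                fit = ARIMA(series, order=(p, d, q)).fit()
        except Exception:
            continue  # some orders fail to converge; skip them
        if fit.aic < best_aic:
            best_aic, best_order = fit.aic, (p, d, q)
    return best_order, best_aic
```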

Estimation

After choosing our orders, we then need to find their optimal corresponding coefficients. This is where the need for stationarity comes in. As declared above, a stationary time series has constant statistical properties such as mean and variance. Therefore, all the data points are part of the same probability distribution, which makes fitting our model easier. Furthermore, forecasts are treated as random variables and will belong to the same probability distribution as the newly generated stationary time series. Overall, stationarity helps make the data in the future be somewhat like the past.

See this Stack Exchange thread for the reasons why stationarity is important for ARIMA.

As the stationary data belongs to some distribution (frequently the normal distribution), we can estimate the coefficients using Maximum Likelihood Estimation (MLE). MLE deduces the optimal values of the coefficients that produce the highest probability of obtaining the observed data. For normally distributed data, MLE gives the same result as carrying out ordinary least squares, so least squares is also frequently used for this exact reason.

Link here for a great explanation of MLE.

ARIMA Python Tutorial

Data

The data we will use in this tutorial is the classic US airline passenger dataset:

Data from Kaggle with a CC0 licence.
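Here is a minimal sketch of loading and plotting the data; the file name AirPassengers.csv and the column names Month and Passengers are assumptions that may differ depending on where you download the dataset from:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Load the monthly airline passenger counts, using the date column as the index
data = pd.read_csv("AirPassengers.csv", index_col="Month", parse_dates=True)

data["Passengers"].plot(figsize=(10, 5))
plt.ylabel("Passengers")
plt.show()
```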

Plot generated by author in Python.

The data is not stationary as there is a strong positive trend and the yearly seasonal fluctuations are increasing through time, hence the variance is increasing. For this modelling task we will be using the statsmodels package, which handily carries out differencing for us and produces a constant mean. However, we still need to apply the Box-Cox transform to retrieve a stabilised variance:
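A minimal sketch of the Box-Cox transform using scipy, continuing with the hypothetical column name from above:

```python
from scipy.stats import boxcox
import matplotlib.pyplot as plt

# Box-Cox transform to stabilise the variance; scipy fits lambda for us
data["Passengers_boxcox"], lam = boxcox(data["Passengers"])
print(f"Fitted Box-Cox lambda: {lam:.3f}")

data["Passengers_boxcox"].plot(figsize=(10, 5))
plt.show()
```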

Plot generated by author in Python.

The seasonal fluctuations are now on a consistent level!

Modelling

In the previous sections, I mentioned how you can find the autoregressive and moving-average orders by plotting the autocorrelation and partial autocorrelation functions. Let's show an example of how you can do it here:
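A minimal sketch of plotting both correlograms with statsmodels, run on the Box-Cox transformed series from above:

```python
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

# Significant lags are the ones that fall outside the shaded confidence region
fig, axes = plt.subplots(2, 1, figsize=(10, 8))
plot_acf(data["Passengers_boxcox"], lags=30, ax=axes[0])
plot_pacf(data["Passengers_boxcox"], lags=30, ax=axes[1], method="ywm")
plt.show()
```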

Plots generated by author in Python.

The blue region signifies where the points are no longer statistically significant, and from the plots we see that the last statistically significant lag for both is ~12. Therefore, we take the orders p and q to both be 12.

Now, let's fit the model using the ARIMA function and generate the forecasts:
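A sketch of fitting the model with statsmodels; the 12-step hold-out split is an assumption for illustration:

```python
from statsmodels.tsa.arima.model import ARIMA

# Hold out the final year of observations as a test set
train, test = data.iloc[:-12], data.iloc[-12:]

# Fit ARIMA on the Box-Cox transformed series with the orders found above
model = ARIMA(train["Passengers_boxcox"], order=(12, 1, 12)).fit()

# Forecast over the hold-out horizon
forecasts = model.forecast(steps=len(test))
```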

Analysis

Plot the forecasts:
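A sketch of plotting the forecasts on the original scale; note that we must invert the Box-Cox transform (using the lambda fitted earlier) before comparing against the raw data:

```python
import matplotlib.pyplot as plt
from scipy.special import inv_boxcox

# Back-transform the forecasts from the Box-Cox scale to passenger counts
forecasts_original = inv_boxcox(forecasts, lam)

plt.figure(figsize=(10, 5))
plt.plot(train.index, train["Passengers"], label="Train")
plt.plot(test.index, test["Passengers"], label="Test")
plt.plot(test.index, forecasts_original, label="Forecast")
plt.ylabel("Passengers")
plt.legend()
plt.show()
```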

Plot generated by author in Python.

The ARIMA model has captured the data very well!

Summary and Further Thoughts

In this article we have discussed one of the most common forecasting models used in practice: ARIMA. This model combines autoregression, differencing and moving-average models into a single univariate model. ARIMA is simple to apply in Python using the statsmodels package, which does a lot of the heavy lifting for you when fitting an ARIMA model.

Full code used in this article can be found at my GitHub here:

Medium-Articles/arima.py at main · egorhowell/Medium-Articles

Another Thing!

I have a free newsletter, Dishing the Data, where I share weekly tips for becoming a better Data Scientist. There is no "fluff" or "clickbait," just pure actionable insights from a practicing Data Scientist.

Dishing The Data | Egor Howell | Substack
