ARIMA Tutorial
0. Foreword
1. Introduction
ARIMA stands for “AutoRegressive Integrated Moving Average.” It is a widely used statistical model for analyzing and forecasting time series data. ARIMA models are particularly valuable for data that exhibits temporal patterns such as trends and autocorrelation.
Here’s what each component of ARIMA means:
AutoRegressive (AR): The “AR” component represents the autoregressive part of the model. This component models the relationship between the current value of the time series and its past values. In other words, it accounts for the influence of previous observations on the current one. The order of the autoregressive component, denoted as “p,” indicates how many past observations are included in the model.
Integrated (I): The “I” component stands for “integrated.” It represents the number of differencing operations needed to make the time series stationary. Stationarity is a key assumption of the AR and MA components. Ordinary differencing removes trends; removing seasonality requires seasonal differencing, which is handled by the seasonal extension of the model (SARIMA). The order of differencing, denoted as “d,” indicates how many times differencing is applied to the data.
Moving Average (MA): The “MA” component represents the moving average part of the model. This component models the relationship between the current value of the time series and past forecast errors. It accounts for the influence of past prediction errors on the current observation. The order of the moving average component, denoted as “q,” indicates how many past forecast errors are included in the model.
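Putting the three components together, an ARIMA(p, d, q) model can be written as a single equation. The symbols below (y'_t, c, \phi_i, \theta_j, \varepsilon_t, B) are notation chosen here for illustration and do not appear elsewhere in this tutorial:
\[y'_t = c + \sum_{i=1}^{p} \phi_i \, y'_{t-i} + \sum_{j=1}^{q} \theta_j \, \varepsilon_{t-j} + \varepsilon_t, \qquad y'_t = (1 - B)^d \, y_t\]
Where:
- y'_t is the series after differencing d times, and B is the backshift operator (B y_t = y_{t-1}).
- \phi_1, ..., \phi_p are the autoregressive coefficients and \theta_1, ..., \theta_q are the moving average coefficients.
- \varepsilon_t is the error (white noise) term at time t, and c is a constant.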
In summary, ARIMA models are characterized by three main parameters: p, d, and q. These parameters are selected based on the characteristics of the time series data being analyzed. The modeling process typically involves:
- Identifying the order of differencing (d) required to make the data stationary.
- Determining the orders of autoregressive (p) and moving average (q) components through techniques like autocorrelation and partial autocorrelation plots.
- Estimating the model parameters by fitting the ARIMA model to the data.
- Evaluating the model’s performance and making forecasts.
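The first step in the list above, choosing d, can be illustrated with a small differencing helper. This is a minimal sketch in plain Python; the function name `difference` and the toy quadratic series are assumptions made for illustration, not part of any library.

```python
def difference(series, d=1):
    """Apply first-order differencing d times and return a new list."""
    values = list(series)
    for _ in range(d):
        # Each pass replaces the series with y[t] - y[t-1].
        values = [b - a for a, b in zip(values, values[1:])]
    return values


# Illustrative data: a quadratic trend, which needs d = 2 to become constant.
quadratic = [t * t for t in range(8)]  # 0, 1, 4, 9, 16, 25, 36, 49
once = difference(quadratic, d=1)      # still trending: [1, 3, 5, 7, 9, 11, 13]
twice = difference(quadratic, d=2)     # constant: [2, 2, 2, 2, 2, 2]
print(once)
print(twice)
```

Each differencing pass shortens the series by one observation. On real data the choice of d is usually supported by a unit-root test such as the augmented Dickey-Fuller test, not by visual inspection alone.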
ARIMA models are versatile and have been successfully applied to a wide range of time series data, including economic and financial data, weather data, and many others. They are a fundamental tool in time series analysis and forecasting.
Data analysis
Autocorrelation
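The sample autocorrelation at lag k measures how strongly the series correlates with a copy of itself shifted by k steps. The sketch below computes it directly from the usual definition (lagged covariance divided by the lag-0 covariance). The function name `sample_acf` and the alternating toy series are assumptions for illustration.

```python
def sample_acf(series, max_lag):
    """Return the sample autocorrelations for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    denom = sum(c * c for c in centered)  # lag-0 sum of squares
    acf = []
    for k in range(max_lag + 1):
        num = sum(centered[t] * centered[t - k] for t in range(k, n))
        acf.append(num / denom)
    return acf


# Illustrative data: a series that flips sign at every step.
alternating = [1.0, -1.0] * 5
acf_values = sample_acf(alternating, max_lag=2)
print(acf_values)  # lag 0 is 1 by construction; lag 1 is strongly negative
```

When reading an autocorrelation plot, a sharp cut-off after lag q is the usual hint for a moving average component of order q.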
Partial Autocorrelation
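The partial autocorrelation at lag k is the correlation between the series and its lag-k copy after the influence of the intermediate lags has been removed. One standard way to obtain it from the autocorrelations is the Durbin-Levinson recursion, sketched below. The function name `pacf_from_acf` is an assumption, and the input is the theoretical autocorrelation of an AR(1) process with coefficient 0.6, chosen only for illustration.

```python
def pacf_from_acf(acf):
    """Durbin-Levinson recursion.

    acf holds autocorrelations for lags 0..K (acf[0] is assumed to be 1).
    Returns the partial autocorrelations for lags 1..K.
    """
    max_lag = len(acf) - 1
    pacf = []
    prev = []  # coefficients of the order k-1 autoregression
    for k in range(1, max_lag + 1):
        num = acf[k] - sum(prev[j] * acf[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(prev[j] * acf[j + 1] for j in range(k - 1))
        phi_kk = num / den
        # Update the lower-order coefficients, then append the new one.
        current = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)]
        current.append(phi_kk)
        pacf.append(phi_kk)
        prev = current
    return pacf


# Theoretical ACF of an AR(1) process with coefficient 0.6: r_k = 0.6 ** k.
ar1_acf = [0.6 ** k for k in range(5)]
pacf_values = pacf_from_acf(ar1_acf)
print(pacf_values)  # about 0.6 at lag 1, about 0 at higher lags
```

The example shows the pattern that makes the partial autocorrelation plot useful for choosing p: for an autoregressive process of order p, the partial autocorrelation is essentially zero beyond lag p.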
Fitting cases
Case 1:
statsmodels
Case 2:
pmdarima
Evaluation indicators
Akaike Information Criterion
The Akaike Information Criterion (AIC) is a measure used in model selection that evaluates a model's goodness of fit while penalizing it for complexity. Among candidate models fitted to the same data, the one with the lowest AIC is preferred.
\[AIC = -2 \cdot \ln(L) + 2k\]
Where:
- “L” represents the maximized value of the likelihood function for the model, which measures how well the model fits the observed data.
- “k” represents the number of parameters (or degrees of freedom) in the model.
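As a worked example of the formula, the sketch below compares two candidate models. The log-likelihood values and parameter counts are hypothetical numbers chosen for illustration, and the function name `aic` is an assumption.

```python
def aic(log_likelihood, k):
    """AIC = -2 ln(L) + 2k, given ln(L) and the number of parameters k."""
    return -2.0 * log_likelihood + 2.0 * k


# Hypothetical candidates fitted to the same data.
aic_small = aic(log_likelihood=-120.0, k=2)  # 244.0
aic_large = aic(log_likelihood=-119.5, k=5)  # 249.0
print(aic_small, aic_large)
```

Here the larger model raises the log-likelihood by only 0.5, which lowers the fit term by 1, while its three extra parameters add 6 to the penalty, so the smaller model has the lower AIC.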
Bayesian Information Criterion
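The Bayesian Information Criterion (BIC) is a statistical criterion used for model selection, similar to the Akaike Information Criterion (AIC), but its penalty for each additional parameter grows with the sample size. It was introduced by Gideon Schwarz and is also known as the Schwarz criterion. As with AIC, lower values are preferred.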
\[BIC = -2 \cdot \ln(L) + k \cdot \ln(n)\]
Where:
- “L” represents the maximized value of the likelihood function for the model, which measures how well the model fits the data.
- “k” represents the number of parameters (or degrees of freedom) in the model.
- “n” represents the sample size (the number of data points).
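The formula can be evaluated directly, as in the sketch below. The log-likelihood, parameter count, and sample sizes are hypothetical values chosen for illustration, and the function name `bic` is an assumption.

```python
import math


def bic(log_likelihood, k, n):
    """BIC = -2 ln(L) + k ln(n), given ln(L), parameter count k, sample size n."""
    return -2.0 * log_likelihood + k * math.log(n)


# Hypothetical model: ln(L) = -120, two parameters, 100 observations.
bic_value = bic(log_likelihood=-120.0, k=2, n=100)
print(bic_value)

# Penalty per parameter for a few sample sizes (AIC always charges 2).
for n in (5, 8, 100, 1000):
    print(n, round(math.log(n), 3))
```

Because ln(n) exceeds 2 once n is at least 8, BIC penalizes each parameter more heavily than AIC for all but very small samples, and so tends to select smaller models.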