Time series analysis is widely used to forecast future values of a series from its past observations. AutoRegressive Integrated Moving Average (ARIMA) models are among the most popular approaches to time series forecasting. In this tutorial, we will learn how to build and evaluate ARIMA models for time series forecasting in Python.
The ARIMA model is a statistical model for analyzing and forecasting time series data. It explicitly accounts for common structures in time series, such as trends and short-term dependence on past values and past errors. This makes it a simple yet powerful method for producing skillful forecasts.
ARIMA stands for AutoRegressive Integrated Moving Average. It combines three key aspects:
- Autoregression (AR): The current observation is modeled as a linear combination of lagged (past) observations. The number of lagged observations included is the lag order, p.
- Integrated (I): The raw observations are differenced (for example, subtracting the previous value from the current one) to make the time series stationary. The number of differencing operations is the degree of differencing, d.
- Moving Average (MA): The current observation is modeled using the residual errors the model made at previous time steps. The number of lagged error terms included is the moving average order, q.
The ARIMA model is defined with the notation ARIMA(p,d,q) where p, d, and q are substituted with integer values to specify the exact model being used.
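For reference, one common way to write an ARIMA(p,d,q) model is shown below. The symbols are our own notation, not taken from the original tutorial: $y'_t$ is the series after $d$ rounds of differencing, $B$ is the backshift operator ($B y_t = y_{t-1}$), $\phi_i$ are the AR coefficients, $\theta_j$ are the MA coefficients, $c$ is an optional constant, and $\varepsilon_t$ is white noise.

$$
y'_t = (1 - B)^d \, y_t, \qquad
y'_t = c + \sum_{i=1}^{p} \phi_i \, y'_{t-i} + \sum_{j=1}^{q} \theta_j \, \varepsilon_{t-j} + \varepsilon_t
$$

For the ARIMA(1,1,0) model used later, this reduces to $y_t - y_{t-1} = c + \phi_1 (y_{t-1} - y_{t-2}) + \varepsilon_t$. Note that statsmodels' `ARIMA` leaves out $c$ by default when $d > 0$.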
Key assumptions when adopting an ARIMA model:
- The time series was generated from an underlying ARIMA process.
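- The orders p, d, and q must be chosen appropriately for the data, for example by inspecting autocorrelation (ACF) and partial autocorrelation (PACF) plots or by comparing information criteria such as AIC.
- The series must be stationary after d rounds of differencing. The Integrated component performs this differencing inside the model, so you pass the raw series and set d rather than differencing it by hand.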
- The residuals should be uncorrelated and normally distributed if the model fits well.
In summary, the ARIMA model provides a structured and configurable approach for modeling time series data for purposes like forecasting. Next we will look at fitting ARIMA models in Python.
In this tutorial, we will use Netflix Stock Data from Kaggle to forecast the Netflix stock price using the ARIMA model.
Data Loading
We will load the stock price dataset, parsing the “Date” column as dates and using it as the index.
import pandas as pd
net_df = pd.read_csv("Netflix_stock_history.csv", index_col="Date", parse_dates=True)
net_df.head(3)
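Because the train/test split later in this tutorial is positional, the rows must be in chronological order. A minimal sketch of loading with a date index and sorting it, using a few hypothetical rows (the values and column set are made up, and deliberately out of order) in place of `Netflix_stock_history.csv`:

```python
import io
import pandas as pd

# Hypothetical sample rows standing in for Netflix_stock_history.csv
csv_text = """Date,Open,High,Low,Close,Volume
2020-01-03,11.0,11.5,10.5,11.2,1000
2020-01-02,10.0,10.5,9.5,10.2,1200
2020-01-06,12.0,12.5,11.5,12.1,900
"""

df = pd.read_csv(io.StringIO(csv_text), index_col="Date", parse_dates=True)
# Sort by date so that positional splits are chronological
df = df.sort_index()
print(df.index.is_monotonic_increasing)  # True
print(df.head(3))
```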
Data Visualization
We can use the pandas ‘plot’ method to visualize how the stock price and trading volume change over time. The closing price rises sharply over the period, and the growth looks roughly exponential.
net_df[["Close","Volume"]].plot(subplots=True, layout=(2,1));
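One way to check whether growth is exponential is to plot the price on a logarithmic y-axis, where exponential growth appears as a straight line. The sketch below uses a synthetic series growing at a constant 0.1% per day (an assumption for illustration, not the Netflix data); with the tutorial's data, the equivalent call would be `net_df["Close"].plot(logy=True)`.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display

# Synthetic series with constant exponential growth (assumption: illustrative data)
dates = pd.date_range("2020-01-01", periods=500, freq="D")
close = pd.Series(10 * np.exp(0.001 * np.arange(500)), index=dates, name="Close")

# On a log y-axis, exponential growth is a straight line
ax = close.plot(logy=True, title="Close (log scale)")

# For pure exponential growth, the daily log-return is constant
log_returns = np.log(close).diff().dropna()
print(round(log_returns.mean(), 6))  # 0.001
```

Taking logs is also a common way to stabilize the variance of a price series before fitting ARIMA, although this tutorial models the raw prices.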
Rolling Forecast ARIMA Model
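We split the dataset chronologically into a training set (the first 90% of rows) and a test set (the last 10%), trained an ARIMA model on the training data, and forecast the first test value.
Fitting a single ARIMA model and forecasting the entire test period in one go gave a poor result: the multi-step forecast was essentially a flat line. Therefore, we switched to a rolling forecast, which refits the model as each new observation arrives.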
Note: The code example is a modified version of the notebook by BOGDAN IVANYUK.
from statsmodels.tsa.arima.model import ARIMA
from sklearn.metrics import mean_squared_error, mean_absolute_error
import math
# chronological 90/10 split: no shuffling for time series
split = int(len(net_df) * 0.9)
train_data, test_data = net_df.iloc[:split], net_df.iloc[split:]
train_arima = train_data['Open']
test_arima = test_data['Open']
# history starts as the training data and grows by one observation per step
history = list(train_arima)
y = test_arima
# make first prediction
predictions = list()
model = ARIMA(history, order=(1,1,0))
model_fit = model.fit()
yhat = model_fit.forecast()[0]
predictions.append(yhat)
# y has a date index, so use .iloc for positional access
history.append(y.iloc[0])
With time series data, each forecast depends on the most recent observations, so a rolling (walk-forward) forecast is a natural way to evaluate a model: after each new observation arrives, we refit the model and forecast the next step.
We keep track of all observations in a list called history, which initially contains the training data and to which each new test observation is appended on every iteration. This way, every one-step forecast uses all the data available up to that point, just as it would in live use.
# rolling forecasts
for i in range(1, len(y)):
    # refit the model on all observations seen so far
    model = ARIMA(history, order=(1,1,0))
    model_fit = model.fit()
    # one-step-ahead forecast
    yhat = model_fit.forecast()[0]
    predictions.append(yhat)
    # add the true observation to the history for the next iteration
    obs = y.iloc[i]
    history.append(obs)
Model Evaluation
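The rolling forecast ARIMA model performs far better than the flat-line forecast from the simple implementation. We evaluate it on the test set with the mean squared error (MSE), mean absolute error (MAE), and root mean squared error (RMSE).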
# report performance
mse = mean_squared_error(y, predictions)
print('MSE: '+str(mse))
mae = mean_absolute_error(y, predictions)
print('MAE: '+str(mae))
rmse = math.sqrt(mean_squared_error(y, predictions))
print('RMSE: '+str(rmse))
MSE: 116.89611817706545
MAE: 7.690948135967959
RMSE: 10.811850821069696
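These metrics are easier to interpret next to a baseline. A one-step ARIMA(1,1,0) forecast stays close to the previous observation, so a natural reference point is the naive persistence forecast ("tomorrow equals today"). The sketch below computes MSE, MAE, and RMSE directly with NumPy and builds that baseline. The helper names `error_metrics` and `persistence_forecast` are hypothetical, and the baseline comparison is our suggestion rather than part of the original tutorial.

```python
import numpy as np

def error_metrics(actual, predicted):
    """Return (MSE, MAE, RMSE) for two equal-length sequences."""
    errors = np.asarray(actual, dtype=float) - np.asarray(predicted, dtype=float)
    mse = np.mean(errors ** 2)
    mae = np.mean(np.abs(errors))
    return mse, mae, np.sqrt(mse)

def persistence_forecast(last_train_value, actual):
    """Naive one-step baseline: each prediction is the previous true value."""
    actual = np.asarray(actual, dtype=float)
    return np.concatenate([[last_train_value], actual[:-1]])

# Hand-checkable example: errors are 0, -1, -2, so MSE = 5/3, MAE = 1, RMSE = sqrt(5/3)
mse, mae, rmse = error_metrics([1, 2, 3], [1, 3, 5])
print(mse, mae, rmse)
print(persistence_forecast(10, [11, 12, 13]))  # [10. 11. 12.]
```

With the tutorial's variables, the baseline would be `baseline = persistence_forecast(train_arima.iloc[-1], y)`, scored with `error_metrics(y, baseline)`. If ARIMA does not beat it, the model adds little beyond the last observed price.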
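Let’s visualize and compare the actual prices with the predicted ones. The predictions closely follow the real stock price. Keep in mind that these are one-step-ahead forecasts, each made with the true price up to the previous day, so close tracking is expected and does not mean the model can forecast far into the future.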
import matplotlib.pyplot as plt
plt.figure(figsize=(16,8))
plt.plot(train_arima.index[-600:], train_arima.tail(600), color="green", label="Train Stock Price")
plt.plot(test_data.index, y, color="red", label="Real Stock Price")
plt.plot(test_data.index, predictions, color="blue", label="Predicted Stock Price")
plt.title('Netflix Stock Price Prediction')
plt.xlabel('Time')
plt.ylabel('Netflix Stock Price')
plt.legend()
plt.grid(True)
plt.savefig('arima_model.pdf')
plt.show()
In this short tutorial, we provided an overview of ARIMA models and how to implement them in Python for time series forecasting. The ARIMA approach provides a flexible and structured way to model time series data that relies on prior observations as well as past prediction errors. If you’re interested in a comprehensive analysis of the ARIMA model and time series analysis, I recommend taking a look at Stock Market Forecasting Using Time Series Analysis.
Abid Ali Awan (@1abidaliawan) is a certified data scientist professional who loves building machine learning models. Currently, he is focusing on content creation and writing technical blogs on machine learning and data science technologies. Abid holds a Master’s degree in Technology Management and a bachelor’s degree in Telecommunication Engineering. His vision is to build an AI product using a graph neural network for students struggling with mental illness.