Put on your comfiest lo-fi, grab an oversized sweater and your favorite hot beverage, and let's get to some Python.
It's that time again in the Northern Hemisphere: a time for apples, pumpkins, and various configurations of cinnamon, nutmeg, ginger, allspice, and cloves. And as the grocery aisles begin to prepare for Halloween, Thanksgiving, and the winter holidays, it's a good time to dust off my statistical modeling skills. So set down your spiced latte and let's make some seasonal, future-oriented models. The complete codebook can be found here.
Hypothesis:
Pumpkin spice's popularity as a Google search term in the US will show strong seasonality, as it is associated with American fall holidays and seasonal dishes.
Null hypothesis:
Using last week's or last year's data as a direct prediction will forecast this week's popularity of the search term "pumpkin spice" at least as well as any model I build.
Data:
Data from the last 5 years of Google Trends, extracted on October 7, 2023. (1)
- Make a naive model where last week's or last year's data is this week's prediction. Specifically, it is not enough for my final model to be accurate or inaccurate in a vacuum; my final model must beat using historical data as a direct prediction.
- A train/test split will give me two sets of data: one for the algorithm to learn from, and one to test how well my algorithm performs.
- The seasonal decomposition will give me a rough idea of how predictable my data is by separating the overall annual trend from seasonal patterns and noise. A smaller noise scale means that more of the data's behavior can be captured by a model.
- A series of statistical tests will determine whether the data is stationary. If the data is not stationary, I will have to take a first difference (i.e., transform the series so each time step shows only the change from the previous time step, which forces the data to become stationary). A sketch of the split and this stationarity check follows this list.
- Make some SARIMA models, using autocorrelation (ACF) plot inferences for the moving average term and partial autocorrelation (PACF) plot inferences for the autoregressive term. SARIMA is a go-to option for time series modeling, and I will try ACF and PACF inference before trying a brute-force approach with Auto ARIMA.
- Try using Auto ARIMA, which loops through many combinations of terms and selects the best one. I want to see whether the parameters it chooses produce a better-performing model than the ones I infer myself.
- Test ETS models, using the seasonal decomposition's evidence about whether the seasonality is additive or multiplicative over time. The ETS family focuses more on seasonality and overall trend than the SARIMA family, which could give me an advantage in capturing the relationship pumpkin spice's popularity has with time.
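To make the first steps of this plan concrete, here is a minimal sketch of the split and the stationarity check, assuming a weekly Google Trends export; the CSV path and column name are illustrative, not from my codebook.

```python
# A minimal sketch of the split and stationarity steps above; the CSV
# path and column name are illustrative assumptions, not from the post.
import pandas as pd
from statsmodels.tsa.stattools import adfuller

df = pd.read_csv("pumpkin_spice_trends.csv", index_col=0, parse_dates=True)
series = df["pumpkin_spice"]

# Chronological train/test split: never shuffle a time series.
split = int(len(series) * 0.8)
train, test = series.iloc[:split], series.iloc[split:]

# Augmented Dickey-Fuller test: a p-value below 0.05 suggests stationarity.
p_value = adfuller(series.dropna())[1]
if p_value >= 0.05:
    # First difference: each step becomes the change from the prior step,
    # which pushes the series toward stationarity.
    series = series.diff().dropna()
```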
Performance measurement KPIs:
- Try using the MAPE score because it is an industry standard in many workplaces and people may be used to it. It’s easy to understand.
- Use the RMSE score because it is more useful here: it is expressed in the same units as the data and penalizes large misses, such as an underestimated seasonal peak.
- Plot predictions against test data and visually verify performance.
As we can see in the graph above, this data shows great potential for seasonal modeling. There is a clear increase in the second half of each year, with a decrease and another increase before falling to our baseline.
However, each year's main peak is higher than the last, except in 2021, which makes sense given the pandemic, when celebrating the season may not have been on people's minds.
Note: These imports appear differently in the notebook itself, because the notebook relies on seasonal_mod.py, which has many of my imports built in.
These are the libraries I used to create the codebook. I opted for statsmodels over scikit-learn for its time series packages; I also like statsmodels better for most linear regression problems.
I don't know about you, but I don't want to write several lines of code every time I make a new model, and then more code to verify it. So instead I made some functions to keep my code DRY and avoid making mistakes.
These three little functions work together, so I just need to run metrics_graph() with y_true and y_preds as inputs, and it will give me a blue line of true data and a red line of predicted data, along with the MAPE and RMSE scores. That saves me time and hassle.
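The helpers themselves aren't shown here, so below is a hedged reconstruction of what they might look like; metrics_graph(), y_true, and y_preds are named above, but the bodies are my own sketch of the described behavior.

```python
# A reconstruction of the three helpers; the bodies are a sketch of the
# behavior described above, not the exact code from the codebook.
import numpy as np
import matplotlib.pyplot as plt

def mape(y_true, y_preds):
    """Mean absolute percentage error, in percent.

    Note: assumes y_true contains no zeros (MAPE divides by the actuals).
    """
    y_true, y_preds = np.asarray(y_true, float), np.asarray(y_preds, float)
    return np.mean(np.abs((y_true - y_preds) / y_true)) * 100

def rmse(y_true, y_preds):
    """Root mean squared error, in the units of the data."""
    y_true, y_preds = np.asarray(y_true, float), np.asarray(y_preds, float)
    return np.sqrt(np.mean((y_true - y_preds) ** 2))

def metrics_graph(y_true, y_preds):
    """Plot actuals in blue and predictions in red, then print both scores."""
    plt.plot(np.arange(len(y_true)), np.asarray(y_true, float),
             color="blue", label="actual")
    plt.plot(np.arange(len(y_preds)), np.asarray(y_preds, float),
             color="red", label="predicted")
    plt.legend()
    plt.show()
    print(f"MAPE: {mape(y_true, y_preds):.2f}%, RMSE: {rmse(y_true, y_preds):.2f}")
```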
Using last year’s data as a benchmark for success:
My background in retail management influenced my decision to test both last week's data and last year's data as direct predictions of this year's data. In retail, we often use data from last season (one unit of time ago) as a direct prediction, for example to secure inventory ahead of Black Friday. Last week's data didn't perform as well as last year's.
Last week’s data predicting this week’s data showed a MAPE score of just over 18, with an RMSE of about 11. In comparison, last year’s data as a direct prediction of this year’s data showed a MAPE score of about 12 with an RMSE of about 7.
Therefore, I chose to compare all the statistical models I built against a naive model using last year's data. This model captured the timing of spikes and declines more accurately than the naive weekly model; however, I still thought I could do better. The next step was to perform a seasonal decomposition.
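As a sketch, both naive benchmarks can be built with simple shifts, assuming weekly data where 52 rows approximate one year; this reuses series and metrics_graph() from the sketches above.

```python
# The two naive benchmarks as simple shifts; 52 weekly rows stand in
# for one year. Reuses series and metrics_graph() from earlier sketches.
naive_weekly = series.shift(1)    # last week's value predicts this week
naive_yearly = series.shift(52)   # last year's value predicts this week

# Score each benchmark only over the span where the shifted values exist.
metrics_graph(series[naive_weekly.notna()], naive_weekly.dropna())
metrics_graph(series[naive_yearly.notna()], naive_yearly.dropna())
```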
The following function helped me run my seasonal decomposition, and I will keep it as reusable code for future modeling projects.
Below is how I used that seasonal decomposition.
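The code below is a sketch of that step, assuming weekly data (period=52); my actual helper lives in the codebook.

```python
# A sketch of the decomposition step, assuming weekly data (period=52).
import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import seasonal_decompose

decomp_add = seasonal_decompose(series, model="additive", period=52)
# Multiplicative decomposition requires strictly positive values.
decomp_mul = seasonal_decompose(series, model="multiplicative", period=52)

# Inspect trend, seasonal, and residual components for leftover patterns.
decomp_add.plot()
decomp_mul.plot()
plt.show()
```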
The additive model left a recurring annual pattern in the residuals, evidence that an additive model could not completely decompose all the recurring patterns. That was a good reason to try a multiplicative model for the annual peaks.
The residuals from the multiplicative decomposition were much more promising: they were far more random and on a much smaller scale, showing that a multiplicative model would capture the data better. The fact that the residuals were so small (on a scale between -1 and 1.5) made the modeling outlook very promising.
But now I wanted a function to run SARIMA models specifically, passing in only the orders. I also wanted to experiment with the c, t, and ct versions of the SARIMA model with those orders, since the seasonal decomposition favored a multiplicative model over an additive one. Passing c, t, or ct to the trend= parameter let me add trend terms to my SARIMA model.
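A sketch of such a helper using statsmodels' SARIMAX is below; the order tuples are placeholders, not the terms I ultimately settled on.

```python
# A sketch of a reusable SARIMA helper; the order tuples below are
# placeholders, not the final terms from the codebook.
from statsmodels.tsa.statespace.sarimax import SARIMAX

def fit_sarima(train, order, seasonal_order, trend=None):
    """Fit a SARIMA model with an optional 'c', 't', or 'ct' trend term."""
    model = SARIMAX(train, order=order, seasonal_order=seasonal_order,
                    trend=trend)
    return model.fit(disp=False)

result = fit_sarima(train, order=(1, 1, 1),
                    seasonal_order=(1, 1, 1, 52), trend="ct")
preds = result.forecast(steps=len(test))
metrics_graph(test, preds)
```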
I'll skip over the part where I examined the ACF and PACF charts, and the part where I tested pmdarima's auto_arima to find the best terms for the SARIMA models. If you're interested in those details, check out my full codebook.
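For completeness, that brute-force search can be sketched like this, assuming the pmdarima package; m=52 marks the annual cycle in weekly data.

```python
# A sketch of the auto_arima search, assuming the pmdarima package;
# m=52 tells it the seasonality repeats every 52 weekly observations.
import pmdarima as pm

auto_model = pm.auto_arima(train, seasonal=True, m=52,
                           stepwise=True, suppress_warnings=True)
print(auto_model.summary())  # reports the best (p,d,q)(P,D,Q,m) it found
```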
My best SARIMA model:
So, my best SARIMA model had a higher MAPE score than my naive model (almost 29 versus almost 12), but a lower RMSE by about one unit (almost 6 versus almost 7). My biggest problem with this model is that it badly underestimated the 2023 peak; there is a good amount of area between the red and blue lines from August to September 2023. There are reasons to like it better or worse than my annual naive model, depending on your opinions about RMSE versus MAPE. However, I wasn't done yet: my final model was definitely better than my naive annual model.
I used an ETS (exponential smoothing) model for my final model, which allowed me to explicitly set the seasonal parameter to a multiplicative approach.
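A sketch of that final model using statsmodels' Holt-Winters ExponentialSmoothing is below; the multiplicative seasonality follows the decomposition, while the additive trend component is an assumption on my part.

```python
# A sketch of the final ETS model; seasonal="mul" matches the
# decomposition, while trend="add" is an assumption on my part.
from statsmodels.tsa.holtwinters import ExponentialSmoothing

# Note: seasonal="mul" requires strictly positive training values.
ets = ExponentialSmoothing(train, trend="add", seasonal="mul",
                           seasonal_periods=52).fit()
ets_preds = ets.forecast(len(test))
metrics_graph(test, ets_preds)
```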
Now you may be thinking, "but this model has a higher MAPE score than the annual naive model." And you would be right, by about 0.3 percentage points. However, I think that is a more than fair trade considering I now have an RMSE of about 4.5 instead of 7. While this model struggles a little more with December 2022 than my best SARIMA model does, it has a smaller area of difference at the larger fall 2023 peak, which matters more to me. You can find that model here.
I'll wait until 10/7/2024, do another data pull, and see how the model holds up against a year of new data.
In short, I was able to reject the null hypothesis: my final model outperformed a naive annual model. I showed that the popularity of pumpkin spice on Google is highly seasonal and can be predicted. Between the naive models, the SARIMA models, and the ETS models, ETS best captured the relationship between time and the popularity of pumpkin spice. The multiplicative relationship implies that pumpkin spice's popularity depends on at least one independent variable besides time, as in time * unknown_independent_var = pumpkin_spice_popularity.
What I learned and future work:
My next step is to use some version of the Meta Graph API to search for "pumpkin spice" in commercial posts. I wonder how that data correlates with my Google Trends data. I also learned that when the seasonal decomposition points toward a multiplicative model, I should reach for an ETS model much earlier in my process.
Additionally, I am interested in automating much of this process. Ideally, I would like to create a Python module where the input is a CSV straight from Google Trends and the output is a usable model, with documentation good enough that a non-technical user could create and test their own predictive models. In cases where a user chooses data that is difficult to predict (i.e., where a naive or random-walk model would be more suitable), I hope to have the module explain that to the user. An application built on that module could then surface seasonality findings across many untested datasets.
Keep an eye out for that app for next year’s pumpkin spice season!
(1) Google Trends, Google, https://www.google.com/trends (data pulled October 7, 2023).