Time series forecasting is a crucial task in various fields of business and science. There are two co-existing approaches to time series forecasting, statistical methods and machine learning methods, and both come with different strengths and limitations. Hybrid methods promise to advance time series forecasting by combining the best aspects of statistics and machine learning. This blog post gives a deeper understanding of the different approaches to forecasting and seeks to give hints on choosing an appropriate algorithm.
Approaches for Time Series Forecasting
Statistical methods such as Holt-Winters or ARIMA have been in use for decades. They stand out for their robustness and flexibility. Moreover, these methods work well when little data is available and can exploit a priori knowledge. However, statistical methods mostly assume linear relationships in the data, which does not necessarily hold for real-world data and can limit forecasting performance.
In the recent past, machine learning methods such as Long Short-Term Memory networks (LSTMs) or Convolutional Neural Networks (CNNs) have been successfully applied to forecasting tasks. Their major advantage is that they do not assume linearity and are capable of approximating almost any function. In addition, machine learning methods can exploit cross-series information to enhance an individual forecast. Despite these strengths, machine learning methods face several limitations. Apart from demanding large amounts of data and long computation times, they have trouble extrapolating beyond the range of the training data.
Although machine learning methods successfully overcome the drawback of ARIMA models in nonlinear relationships, they have produced mixed results for purely linear time series. Thus, neither ARIMA nor neural networks are solely sufficient to model a real-world time series [1].
Why Hybrid?
Hybrid methods promise to advance time series forecasting by combining the best of statistical and machine learning methods.
The fundamental idea is that this combination compensates for the limitations of one approach with the strengths of the other.
For instance, the effectiveness of statistical methods with limited data availability can counteract the extensive data requirements of machine learning methods. Apart from that, the consideration of a priori knowledge can simplify the expected forecasting task and decrease the computational effort. Furthermore, hybrid methods can incorporate cross-learning, a capability that many statistical methods lack. Finally, hybrid methods provide a solution to the dilemma of the assumption of linearity. As real-world time series may be purely linear, purely nonlinear, or often contain a combination of those two patterns, hybrid methods can be effective where traditional approaches reach their limits [2].
State of the Art
Hybrid methods are more than just ensembles! While ensemble methods are a simplistic yet effective combination of forecasts such as averages or weighted averages, hybrid methods follow a different strategy. They can be understood as a sophisticated combination of statistical and machine learning methods that interact with each other.
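To make the contrast concrete, an ensemble in the sense described above only combines forecasts that are already finished. The following is a minimal sketch; the forecast values, the weights, and the function name combine are made up for illustration.

```python
import numpy as np

# Hypothetical forecasts from two independent models for the same three steps.
forecast_stat = np.array([100.0, 102.0, 104.0])  # e.g. from a statistical model
forecast_ml = np.array([110.0, 100.0, 108.0])    # e.g. from a neural network

def combine(forecasts, weights=None):
    """Combine forecasts step by step with a simple or weighted average."""
    forecasts = np.asarray(forecasts, dtype=float)
    if weights is None:
        return forecasts.mean(axis=0)
    return np.average(forecasts, axis=0, weights=np.asarray(weights, dtype=float))

simple_avg = combine([forecast_stat, forecast_ml])
weighted_avg = combine([forecast_stat, forecast_ml], weights=[0.75, 0.25])
```

The models never see each other during training here, which is exactly what separates an ensemble from a hybrid method.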
Research on hybrid methods is not new; the work began almost twenty years ago. The basic concept was first proposed by Zhang, who describes a hybridization of ARIMA and a multilayer perceptron (MLP). The underlying principle is that the MLP learns the deviation of the ARIMA prediction from the actual value and seeks to correct it to obtain a more accurate result. The approach rests on the assumption that a time series is composed of a linear and a nonlinear component. The ARIMA model is fitted to capture the linear component, so the residuals from the linear model account for the nonlinear relationship. The MLP takes the past residuals as input to learn a function that can be used to forecast the deviation of the ARIMA predictions. Finally, the hybrid forecast is obtained by adding the predictions of both models [3].
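The steps above can be written compactly. The notation is chosen here for illustration: L_t and N_t are the linear and nonlinear components, \hat{L}_t is the ARIMA forecast, e_t is its residual, f is the function learned by the MLP, and n is the number of lagged residuals used as input.

```latex
\begin{aligned}
y_t &= L_t + N_t \\
e_t &= y_t - \hat{L}_t \\
\hat{N}_t &= f(e_{t-1}, e_{t-2}, \dots, e_{t-n}) \\
\hat{y}_t &= \hat{L}_t + \hat{N}_t
\end{aligned}
```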
Breakthrough with Slawek Smyl’s ES-RNN Model
Even though various authors advanced the idea of hybrid methods, it took several years until they shook up the world of forecasting again. It was Slawek Smyl’s winning submission to the M4 Competition in 2018 that caused a great stir among forecasters. He combined exponential smoothing (ES) with recurrent neural networks (RNNs) into an ES-RNN model that outperformed every other submission. The core of his model is a stack of dilated LSTMs that produces a trend forecast, which is combined with the exponential smoothing component, a slightly modified Holt-Winters [4]. His success demonstrates the enormous potential of hybrid methods for time series forecasting.
In the first step of the model, the data points of the time series are fed into the Holt-Winters, where each time series is handled differently depending on its frequency. There are three types of models: non-seasonal models for yearly and daily data, single seasonality models for monthly, quarterly, and weekly data, and double seasonality models for hourly data. The Holt-Winters allows for the estimation of the level and seasonality components for all points of the series and forms a vector of per-series parameters.
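The estimation of level and seasonality can be sketched as follows. This is a minimal illustration of multiplicative exponential smoothing without a trend term for a single seasonality. The function name, the fixed smoothing coefficients alpha and gamma, and the initialization are assumptions made for this sketch; in the ES-RNN such per-series parameters are fitted rather than set by hand.

```python
import numpy as np

def level_and_seasonality(y, period, alpha=0.3, gamma=0.1):
    """Return the level for every point and the seasonality factors.

    The seasonality array has `period` extra entries that reach beyond
    the end of the series and can be used for re-seasonalization.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    # Assumed initialization: mean of the first season as level,
    # ratio to that mean as initial seasonality factors.
    first_mean = y[:period].mean()
    season = np.empty(n + period)
    season[:period] = y[:period] / first_mean
    level = np.empty(n)
    prev_level = first_mean
    for t in range(n):
        # Level update on the de-seasonalized observation.
        level[t] = alpha * y[t] / season[t] + (1 - alpha) * prev_level
        # Seasonality update for the same position in the next season.
        season[t + period] = gamma * y[t] / level[t] + (1 - gamma) * season[t]
        prev_level = level[t]
    return level, season

# Synthetic quarterly-style series: constant level 100 with a fixed pattern.
pattern = np.array([0.8, 1.0, 1.2, 1.0])
y = 100.0 * np.tile(pattern, 3)
level, season = level_and_seasonality(y, period=4)
```

On this synthetic series the level stays at 100 and the seasonality factors reproduce the pattern, which is a quick sanity check for the update equations.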
In the following step, the values are de-seasonalized, adaptively normalized, and squashed. De-seasonalization is achieved by dividing the observed value y_i by the respective seasonality component from Holt-Winters. Dividing by the last value of the input window yields an adaptive normalization, which has several advantages, such as preserving more information on the trend’s strength. Taking the logarithm of the result squashes the values, which keeps potential outliers from having an overly disturbing effect on learning.
The resulting vector of preprocessed values is the input for the RNN architecture, which is built from dilated LSTM-based stacks. Dilated LSTMs improve long-term memory performance compared to standard LSTMs. In his winning model, Smyl applied a technique he had introduced a few years earlier: an ensemble of specialists. The key idea is that large datasets from unknown sources can be split into subsets, and individual models for these subsets improve forecasting performance compared to a single global model. Following this, the principle of an ensemble of specialists is to concurrently train several models and force them to specialize on a subset of the dataset [5].
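The three preprocessing operations can be sketched in a few lines. The function name and the example numbers are hypothetical, and taking the last de-seasonalized value of the window as the normalizer is one reading of the description above; a smoothed level at that position would be an alternative.

```python
import numpy as np

def preprocess_window(y_window, seasonality):
    """De-seasonalize, adaptively normalize, and squash one input window."""
    y_window = np.asarray(y_window, dtype=float)
    seasonality = np.asarray(seasonality, dtype=float)
    deseasonalized = y_window / seasonality            # remove the seasonal pattern
    normalized = deseasonalized / deseasonalized[-1]   # scale by the window's last value
    return np.log(normalized)                          # squash to dampen outliers

window = np.array([80.0, 110.0, 144.0, 130.0])
seasonality = np.array([0.8, 1.0, 1.2, 1.0])
x = preprocess_window(window, seasonality)
```

Because every window is scaled by its own last value, the last entry of the result is always zero and the other entries express growth relative to it.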
Finally, the method’s last step reverses the transformation on the trend forecast of the dilated LSTM stacks, that is, it re-seasonalizes and de-normalizes the values to obtain the actual forecast.
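A minimal sketch of this reverse transformation, assuming the network output is the logarithm of a normalized, de-seasonalized value. The function name, the scale value, and the seasonality factors are hypothetical.

```python
import numpy as np

def postprocess_forecast(network_output, scale, future_seasonality):
    """Undo squashing, normalization, and de-seasonalization, in that order."""
    network_output = np.asarray(network_output, dtype=float)
    future_seasonality = np.asarray(future_seasonality, dtype=float)
    return np.exp(network_output) * scale * future_seasonality

# Hypothetical network output for a two-step horizon.
network_output = np.log(np.array([1.0, 1.1]))
forecast = postprocess_forecast(network_output, scale=130.0,
                                future_seasonality=[0.8, 1.0])
```

Here scale stands for the value that was used to normalize the input window, and future_seasonality for the seasonality factors of the forecast horizon.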
The First Step in Forecasting: Understanding the Time Series
Forecasting cannot commence without a thorough understanding of the time series. Thus, it is important to take into account whether the behavior of observations changes over time. This section defines some fundamental concepts of time series analysis and introduces basic terminology.
If the variance of the disturbance term in each observation is constant over time, the time series is considered homoscedastic. On the contrary, heteroscedasticity is defined by the variance not being constant.
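An informal way to get a first impression is to compare the variance across segments of the series. The sketch below is not a formal statistical test; the function name, the synthetic data, and the choice of two segments are assumptions for illustration.

```python
import numpy as np

def variance_ratio(series, n_segments=2):
    """Ratio of the largest to the smallest segment variance.

    A value close to 1 is compatible with constant variance,
    a clearly larger value hints at heteroscedasticity.
    """
    segments = np.array_split(np.asarray(series, dtype=float), n_segments)
    variances = [segment.var(ddof=1) for segment in segments]
    return max(variances) / min(variances)

rng = np.random.default_rng(0)
constant_noise = rng.normal(0.0, 1.0, size=400)
# Noise whose standard deviation grows from 1 to 5 over time.
growing_noise = rng.normal(0.0, 1.0, size=400) * np.linspace(1.0, 5.0, 400)

ratio_constant = variance_ratio(constant_noise)
ratio_growing = variance_ratio(growing_noise)
```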
Stationarity is another important concept in time series analysis and requires the properties of a time series to be independent of the point in time at which they are observed. This implies that the mean and the variance of a stationary time series remain constant and, consequently, time series that exhibit trend or seasonality are not stationary. Homoscedasticity, therefore, is a necessary, but not sufficient criterion for stationarity. Apart from stationary time series, there are trend stationary time series. In this case, the process has a stationary behavior around a trend.
Whereas trend is a non-periodic change in data, seasonality is present when a time series is affected by a periodic change such as the month of the year or the day of the week. Periodic changes that span longer than one year are referred to as cycles.
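The idea of trend stationarity can be illustrated with a synthetic series: once a fitted linear trend is removed, the mean no longer drifts. The data, the helper name half_mean_gap, and the use of a straight line as the trend are assumptions for this sketch.

```python
import numpy as np

rng = np.random.default_rng(42)
t = np.arange(200)
# Synthetic trend stationary series: linear trend plus constant-variance noise.
series = 0.5 * t + rng.normal(0.0, 1.0, size=t.size)

# Fit and remove a linear trend.
slope, intercept = np.polyfit(t, series, deg=1)
detrended = series - (slope * t + intercept)

def half_mean_gap(x):
    """Absolute difference between the means of the first and second half."""
    half = len(x) // 2
    return abs(x[:half].mean() - x[half:].mean())

gap_before = half_mean_gap(series)     # large: the mean drifts with the trend
gap_after = half_mean_gap(detrended)   # small: stationary around the trend
```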
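One common way to look for a seasonal period is the sample autocorrelation, which peaks at lags that match the period. The following sketch uses a synthetic series with a weekly pattern; the function name and the data are made up for illustration.

```python
import numpy as np

def autocorrelation(x, lag):
    """Sample autocorrelation of x at a positive lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))

# Sixteen weeks of synthetic daily data with a period of 7 days.
t = np.arange(7 * 16)
weekly = 10.0 + 3.0 * np.sin(2.0 * np.pi * t / 7.0)

acf = {lag: autocorrelation(weekly, lag) for lag in range(1, 15)}
best_lag = max(acf, key=acf.get)  # lag with the highest autocorrelation
```

On real data with trend or several overlapping seasonalities the peaks are less clear, so the result should be read as a hint rather than a definitive answer.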
Within the analysis, we consider eight well-known time series that differ in terms of stationarity, trend, seasonality, and cycle. Table 1 provides details for each time series and presents the three identified clusters:
Cluster | Time series | Homogeneity of variance | Trend | Seasonality / Cyclicity |
Stationary | Industrial Production Index | Homoscedastic | None | Half year seasonality |
Trend stationary | GBP USD Daily Exchange Rate | Heteroscedastic | None | None |
Trend stationary | Sunspot | Heteroscedastic | None | Cycles of approx. 11 years |
Non-stationary | Airline Passengers | Heteroscedastic | Positive, almost linear | Yearly seasonality |
Non-stationary | Canadian Lynx | Homoscedastic | None | Cycles of approx. 10 years |
Non-stationary | Daily Minimum Temperatures Melbourne | Heteroscedastic | None | Yearly seasonality |
Non-stationary | Daily Total Female Births California | Homoscedastic | None | Weekly seasonality |
Non-stationary | Rossmann Store Sales | Heteroscedastic | None | Multiple seasonalities |
Comparing Statistics, Machine Learning & Hybrid Methods
To compare the different forecasting methods, their performance on the previously introduced time series is assessed. In particular, this blog post presents results for the following methods:
- Statistics: Holt-Winters, ARIMA
- Machine learning: MLP, LSTM
- Hybrid: ES-RNN
More details about the evaluation can be found in my master’s thesis. Table 2 summarizes the respective performance for each test set and indicates the forecasting method’s rank based on the Root Mean Squared Error (RMSE). Furthermore, the improvement over baseline (IOB) provides information about the forecast’s relative improvement compared to either a naive baseline or a forecast using linear regression.
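The two indicators can be sketched as follows. The post does not spell out the formula for the IOB, so the relative RMSE reduction used here is an assumption, as are the function names and the toy numbers.

```python
import numpy as np

def rmse(actual, predicted):
    """Root Mean Squared Error between two equally long sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

def improvement_over_baseline(rmse_model, rmse_baseline):
    """Assumed definition: relative RMSE reduction in percent.

    Positive values mean the model beats the baseline.
    """
    return 100.0 * (rmse_baseline - rmse_model) / rmse_baseline

observed = np.array([10.0, 12.0, 11.0, 13.0, 12.0])
targets = observed[1:]             # values to be forecast one step ahead
naive_forecast = observed[:-1]     # naive baseline: repeat the previous value
model_forecast = np.array([11.0, 11.5, 12.0, 12.5])  # hypothetical model output

iob = improvement_over_baseline(rmse(targets, model_forecast),
                                rmse(targets, naive_forecast))
```

In this toy example the model halves the RMSE of the naive forecast, which corresponds to an IOB of +50%.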
Cluster | Time series | Performance indicator | Holt-Winters | ARIMA | MLP | LSTM | ES-RNN |
Stationary | Industrial Production Index | Rank | 1 | 2 | 4 | 5 | 3 |
Stationary | Industrial Production Index | IOB | +68.75% | +59.16% | +44.39% | -46.62% | +57.11% |
Trend stationary | GBP USD Daily Exchange Rate | Rank | 1 | 2 | 2 | 4 | 5 |
Trend stationary | GBP USD Daily Exchange Rate | IOB | 0.00% | -6.90% | -6.90% | -17.24% | -89.66% |
Trend stationary | Sunspot | Rank | 4 | 1 | 3 | 2 | 5 |
Trend stationary | Sunspot | IOB | -30.15% | +56.21% | +15.13% | +33.18% | -166.33% |
Non-stationary | Airline Passengers | Rank | 1 | 2 | 3 | 5 | 4 |
Non-stationary | Airline Passengers | IOB | +79.30% | +31.82% | +28.32% | -60.53% | +0.30% |
Non-stationary | Canadian Lynx | Rank | 4 | 2 | 1 | 3 | 5 |
Non-stationary | Canadian Lynx | IOB | -47.25% | +46.21% | +48.50% | +19.95% | -130.19% |
Non-stationary | Daily Minimum Temperatures Melbourne | Rank | 2 | 1 | 3 | 4 | 5 |
Non-stationary | Daily Minimum Temperatures Melbourne | IOB | -3.37% | -1.75% | -5.87% | -23.97% | -66.23% |
Non-stationary | Daily Total Female Births California | Rank | 4 | 5 | 1 | 3 | 2 |
Non-stationary | Daily Total Female Births California | IOB | +1.02% | -0.53% | +25.69% | +1.71% | +15.07% |
Non-stationary | Rossmann Store Sales | Rank | 4 | 2 | 1 | 3 | 5 |
Non-stationary | Rossmann Store Sales | IOB | -5.24% | +21.20% | +39.13% | -4.92% | -33.35% |
The results show that statistical methods outperform the machine learning and hybrid methods in several cases. In these particular cases, the periodicity can be precisely specified and the statistical methods can play to their strengths effectively and efficiently.
As stated before, the ES-RNN combines exponential smoothing and recurrent neural networks, more precisely Holt-Winters and LSTM. The results show clearly that the hybrid method benefits above all from the former. Whenever the Holt-Winters delivers a reliable forecast, the ES-RNN’s performance is better than the LSTM’s performance. Only for the Daily Total Female Births California dataset does the ES-RNN outperform both of its constituent parts, which suggests a substantial share of nonlinearity in this dataset. This is further supported by the machine learning methods, which outperform the statistical methods on this dataset.
The following section discusses some selected forecasts in more detail.
Results for Statistical Methods
Figure 3 presents the Holt-Winters forecast for the Airline Passengers dataset. As the graph shows, the Holt-Winters provides a reliable one-step-ahead forecast. This desirable forecasting performance is due to the precise determination of periodicity for this particular dataset.
Whereas the Holt-Winters builds upon exponential smoothing, ARIMA follows a different approach. Its core principle is to describe the autocorrelation of a time series, which means that the periodicity of a time series does not have to be specified beforehand. This can yield promising results if a time series does not exhibit distinct periodicity or has overlapping seasonalities. This is the case for the Rossmann Store Sales dataset, where weekly, monthly, and yearly seasonalities overlap ambiguously. Figure 4 depicts the ARIMA one-step-ahead forecast for this dataset.
Results for Machine Learning Methods
Machine learning methods such as MLPs or LSTMs are known to face difficulties when extrapolating time series that exhibit a trend. Despite this shortcoming, these methods can be a valuable tool when no obvious trend is present and the time series is assumed to be mostly nonlinear. The Sunspot dataset is analyzed in the literature with a focus on nonlinear modeling [3] and exhibits no trend. Consequently, the LSTM provides a promising forecast for this dataset, which is displayed in Figure 5.
If you are interested in further insights on how machine learning methods can be applied successfully to time series forecasting problems, see Constantin’s blog post about a ride-hailing use case.
Results for the Hybrid Method
The success of Slawek Smyl’s hybrid time series forecasting creates high expectations for the performance in the conducted investigation. The results for the eight datasets show that the combination of Holt-Winters and an LSTM is promising when the periodicity of a time series can be precisely specified. The precise specification enables the Holt-Winters to simplify the forecasting task for the LSTM and, consequently, enables the hybrid method to obtain accurate forecasts. As this precise specification is not always feasible, the ES-RNN failed to meet the high expectations for some datasets. For instance, on the Sunspot dataset, where the Holt-Winters was not able to perform well either, the ES-RNN delivered poor results. However, there were datasets such as Daily Total Female Births California, displayed in Figure 6, where the ES-RNN was able to shine and provide a decent forecast.
Summary
So what do these findings mean? First, it is of crucial importance to gain a deep understanding of the time series before conducting a forecast. Furthermore, any forecasting method, be it statistical, machine learning, or hybrid, has its unique strengths and limitations. Therefore, a major part of the success lies in choosing the appropriate method for the forecasting problem.
General assessments are difficult, but there are some rules of thumb to go by. If the time series is assumed to be nonlinear to a large extent, machine learning methods are a promising way to go. The potential of hybrid methods lies in handling the nonlinearity of a time series while incorporating the strengths of statistical methods. In a use case that presents nonlinearity but also exhibits clear trend and seasonality, hybrid methods are an option worth considering. Finally, if the time series can be assumed to be mostly linear, the classical statistical methods play to their strengths in a simple yet effective and efficient manner.
Conclusion
In the domain of time series forecasting, the no free lunch theorem expresses that no existing forecasting method is universally better than all others. Therefore, each forecasting task has to be examined separately, and the most suitable method has to be selected. With this in mind, it becomes clear that hybrid methods do not generally make statistical or machine learning methods obsolete, but can be a valuable tool in the forecaster’s toolbox.
If you found this blog post interesting and would like to learn more about time series forecasting and hybrid methods, you can deep dive here.
References
[1] Khandelwal, I., Adhikari, R., and Verma, G. (2015). Time series forecasting using hybrid ARIMA and ANN models based on DWT decomposition. Procedia Computer Science, 48(1):173–179.
[2] Panigrahi, S. and Behera, H. (2017). A hybrid ETS-ANN model for time series forecasting. Engineering Applications of Artificial Intelligence, 66:49–59.
[3] Zhang, G. P. (2003). Time series forecasting using a hybrid ARIMA and neural network model. Neurocomputing, 50:159–175.
[4] Smyl, S. (2020). A hybrid method of exponential smoothing and recurrent neural networks for time series forecasting. International Journal of Forecasting, 36(1):75–85.
[5] Smyl, S. (2017). Ensemble of specialized neural networks for time series forecasting. Retrieved from https://bit.ly/3fj2owR.