REWE digital:
Demand Forecasting for REWE’s Delivery Service

Since 2015, REWE and inovex have been jointly pursuing various big data initiatives in order to further optimise the major German retailer’s supply chain. These initiatives include holding training courses and implementing data science and big data technologies, as well as the development of applications based on them.


inovex and REWE’s collaboration in the area of supply chain optimisation has focused particularly intensively on REWE’s IT subsidiary, REWE digital. The company is the strategic hub for all the REWE Group’s online activities.
This particular project involved developing demand forecasting for REWE’s delivery service, leveraging big data technologies to enable easy scalability.

Availability vs write-offs – achieving a cost-optimal trade-off

A retailer’s main task is to operate supply chains that are optimally tailored to its customers and to constantly adapt them to changing customer requirements. Two significant key figures for assessing the quality of inventory planning are the availability rate and the write-off rate.

They reflect whether a forecast was too high, meaning that customers bought less than predicted (oversupply, which leads to write-offs), or too low, meaning that goods sold out before demand was fully met (understocking, which lowers availability). Both situations generate costs for the retailer, and reducing one tends to increase the other. The goal of demand forecasting is to minimise the combined cost.

Online vs. brick-and-mortar retail

When it comes to forecasting demand for REWE’s delivery service, there are a number of specific circumstances to consider. Customers make their purchases via the website for future dates: they place their orders in advance and select a later time slot for delivery. Even though most orders are placed at short notice for delivery within the next three days, this gives online retailers a decisive advantage over their brick-and-mortar counterparts: they receive advance notice of a proportion of their future demand.

Retailers in the online business can also learn more about customer requirements than brick-and-mortar retailers. Online customers’ original shopping baskets provide information on what they actually want – even if they ultimately adjust quantities of certain items during the checkout process due to a lack of availability. Since it is hard to obtain this information in a brick-and-mortar store, out-of-stock situations are difficult to identify offline.

Both of these particular features were taken into account when designing the forecast models. As a result, REWE achieves comparatively precise sales forecasts.
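The two signals described above can be illustrated with a minimal sketch. It is not REWE's implementation: the `BasketLine` record, its field names, the sample data, and the simple "floor" rule are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class BasketLine:
    """One item line of an online order (hypothetical record layout)."""
    item_id: str
    delivery_day: str      # ISO date of the chosen delivery slot
    requested_qty: int     # quantity in the original shopping basket
    fulfilled_qty: int     # quantity after adjustments at checkout


def demand_signals(lines, item_id, delivery_day):
    """Return (requested, fulfilled, lost) quantities for one item and day."""
    requested = fulfilled = 0
    for line in lines:
        if line.item_id == item_id and line.delivery_day == delivery_day:
            requested += line.requested_qty
            fulfilled += line.fulfilled_qty
    # The gap is demand that a store checkout would never have recorded.
    return requested, fulfilled, requested - fulfilled


def floor_with_known_orders(forecast, already_ordered):
    """Assumption: orders already placed are a lower bound on demand
    (cancellations are ignored in this sketch)."""
    return max(forecast, already_ordered)


lines = [
    BasketLine("milk-1l", "2024-05-03", requested_qty=4, fulfilled_qty=4),
    BasketLine("milk-1l", "2024-05-03", requested_qty=6, fulfilled_qty=2),
    BasketLine("milk-1l", "2024-05-04", requested_qty=3, fulfilled_qty=3),
]
requested, fulfilled, lost = demand_signals(lines, "milk-1l", "2024-05-03")
adjusted = floor_with_known_orders(forecast=8.0, already_ordered=requested)
```

Training on the requested rather than the fulfilled quantity keeps past out-of-stock situations from biasing the forecast downwards.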

Dynamic, fully automated prediction models

With REWE’s full-line stores carrying approximately 30,000 articles, the retailer’s product range is very extensive. The range held by its delivery warehouses, on the other hand, is slightly limited in comparison, but it still contains all the product groups commonly found in REWE stores. Thus, the delivery service also serves the typical ‘long tail’ of retail distribution, in which a relatively small proportion of items are in very high demand, while a large section of the product range is in moderate to low demand. Individual items also differ greatly in their sales behaviour due to key factors such as sell-through level, seasonal variations, trends, and other influencing factors, including pricing and advertising.

Figure: Differences in the sales behaviour of different products

Bearing in mind these differences, then, it is easy to see that it would make little sense to create a uniform forecasting model for all articles. This hypothesis, which is supported by statistics, led to the development of a whole series of very different models. These models are now dynamically and completely automatically assigned to the items to enable REWE to achieve the most accurate forecast possible for each individual item.
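One common way to assign models to items automatically is to backtest several candidates on each item's recent history and keep the one with the lowest error. The sketch below illustrates that idea only. The three toy models, the mean absolute error criterion, the holdout length, and the sample series are assumptions, not the models or selection logic used in the project.

```python
from statistics import mean


def naive_last(history):
    """Predict the most recent observation."""
    return history[-1]


def moving_average(history, window=4):
    """Predict the mean of the last `window` observations."""
    return mean(history[-window:])


def seasonal_naive(history, season=7):
    """Predict the value observed one season (here: one week) ago."""
    return history[-season] if len(history) >= season else history[-1]


CANDIDATES = {
    "naive_last": naive_last,
    "moving_average": moving_average,
    "seasonal_naive": seasonal_naive,
}


def backtest_error(model, series, holdout=7):
    """Mean absolute one-step-ahead error over the last `holdout` points."""
    errors = []
    for t in range(len(series) - holdout, len(series)):
        errors.append(abs(model(series[:t]) - series[t]))
    return mean(errors)


def assign_models(sales_by_item, holdout=7):
    """Pick, per item, the candidate with the lowest backtest error."""
    assignment = {}
    for item, series in sales_by_item.items():
        scores = {name: backtest_error(model, series, holdout)
                  for name, model in CANDIDATES.items()}
        assignment[item] = min(scores, key=scores.get)
    return assignment


sales_by_item = {
    "weekly": [10, 10, 10, 10, 10, 30, 40] * 4,  # strong weekly pattern
    "flat": [5, 7] * 14,                          # noise around a level
}
assignment = assign_models(sales_by_item)
```

Here the item with a weekly pattern is assigned the seasonal model and the noisy item the moving average. Rerunning the assignment on a schedule is what would make it dynamic.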

In order to control the trade-off between oversupply and understocking costs, the forecast models can estimate any target quantile of demand. This target quantile corresponds to REWE’s service level vis-à-vis the customer, meaning the likelihood that demand for an item is fully met; its complement is the likelihood that a customer finds the item out of stock.
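How a target quantile relates to the two cost types can be sketched with the classic newsvendor rule, in which the cost-optimal quantile is the understocking cost divided by the sum of both costs. This is a textbook relationship rather than a description of the project's models. The unit costs and demand samples are hypothetical, and the pinball loss is shown as the usual way to score a quantile forecast.

```python
import math


def critical_ratio(understock_cost, overstock_cost):
    """Newsvendor rule: cost-optimal target quantile (service level)."""
    return understock_cost / (understock_cost + overstock_cost)


def empirical_quantile(samples, q):
    """Smallest sample v such that at least a fraction q of samples are <= v."""
    ordered = sorted(samples)
    index = max(0, math.ceil(q * len(ordered)) - 1)
    return ordered[index]


def pinball_loss(actual, predicted, q):
    """Quantile loss: under-forecasts weigh q, over-forecasts weigh 1 - q."""
    diff = actual - predicted
    return q * diff if diff >= 0 else (q - 1) * diff


# Hypothetical costs: a missed sale costs three times as much as a write-off.
q = critical_ratio(understock_cost=3.0, overstock_cost=1.0)

# Hypothetical demand scenarios for one item and delivery day.
demand_samples = [12, 15, 9, 20, 14, 11, 17, 13]
order_quantity = empirical_quantile(demand_samples, q)

loss_if_short = pinball_loss(actual=20, predicted=order_quantity, q=q)
loss_if_over = pinball_loss(actual=10, predicted=order_quantity, q=q)
```

With these numbers the target quantile is 0.75, so the quantity is chosen to cover demand in three out of four scenarios, and a shortfall of five units is penalised three times as heavily as a surplus of five.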

Technology stack

The new demand forecasting system was implemented as a distributed system based on Apache Spark in combination with Scikit-Learn and Pandas. Spark allows the system to scale almost linearly whenever new delivery warehouses or items are added. Scikit-Learn in combination with Pandas, a widely used Python stack for machine learning, was used for both prototyping and the production implementation. Moving from prototype to production was therefore easy, as there was no “language gap” to overcome. PySpark provides the interoperability between Spark and the Python stack.

In addition to these core tools, a number of other components from the open-source community were also used.

Agile implementation

The project was run in an agile manner using Scrum as the organisational framework, and the new solution was delivered in several stages. In data science projects, it is particularly difficult to estimate the complexity of a solution at the outset, because the accuracy of prediction models, and therefore their usefulness, cannot be measured until they have been built. From our point of view, therefore, the agile method was decisive in ensuring the project’s success.

From the very outset, the project’s main focuses were customer value and an easy-to-implement solution. The project not only improved the quality of the sales prediction models, but also included their integration into the existing REWE system landscape. Manual intervention options and a comprehensive monitoring solution to oversee the models and their forecasts were also developed.

The first production system was rolled out in one delivery warehouse and covered about 50% of the product range. The system was then rolled out successively to additional delivery warehouses, and coverage was expanded to over 95% of the product range, from everyday necessities and perishable items to fruit and vegetables. Manual intervention by store or logistics employees is required only in exceptional cases.
The accuracy of the new forecasting solution was measured against the aforementioned key figures, availability rate and write-off rate, with availability being a particular priority. The new prediction tools halved the number of unavailable items while maintaining the same write-off rate.

Technology Stack
  • Zeppelin
  • Jupyter
  • Grafana
  • Prometheus
  • Java
  • Scikit-Learn
  • Python
  • R
  • Apache Spark
  • Apache Kafka
  • PySpark
  • Google Cloud Platform
  • Hadoop

Get in touch!

Florian Wilhelm

Head of Data Science, Contact for Data Management & Analytics