Promotion Forecasting: Case Study with a Retail Giant
Discover how a demand forecasting algorithm combining machine learning, NLP, and domain expertise led to an 18% reduction in overstock and shortages across national operations in just one year

Introduction: The Challenge at Hand
In the retail landscape, Auchan, a global leader, faced a critical challenge: mastering the art of promotion forecasting. This is the story of how, using machine learning, NLP, and deep domain knowledge, we achieved a breakthrough, reducing stockouts and overstock by 18% in just one year.
Setting the Scene
Accurately forecasting demand for promotional items at every store running a promotion is a crucial challenge. The objective is clear but complex: aligning supply with constantly changing customer demand to avoid surplus inventory while guaranteeing customer satisfaction.
In Auchan Retail International's data forecasting team, I embarked on a mission. Our target? To craft a forecasting model adaptable across diverse countries with minimal changes. The model, first built for Auchan Ukraine, would later find its way to Romania and France, becoming an integral part of their promotion strategy.
The Forecasting Challenge
I had to provide daily forecasts for all food products in Auchan Ukraine's 22 hypermarkets. These forecasts were split between regular and fresh items and extended up to 55 days ahead. The aim was to predict demand at the store and SKU level for the entire promotional operation.
Data and Granularity
Forecasting sales in a store is challenging in itself; doing it for a wide range of products with unpredictable sales patterns during promotions pushed our methods to their limits.
Our strategy centred on the promotional information that was always available, pricing, display, and dates, since about a third of the products lacked historical promotional data. Our modelling efforts focused on leveraging sales data for products both on the shelf and on promotion, and we designed features specifically for promotions to ensure every aspect was captured.
About Feature Engineering
Both models benefited from a diverse range of features, including common temporal features such as day of week, week, and month, alongside sales aggregates, product and store attributes, and sophisticated target encoding within promotions.
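To make this concrete, here is a minimal sketch of such temporal features and a simple target encoding in pandas. The frame and column names (date, sku, store, sales) are assumptions for illustration, not our production code:

```python
import pandas as pd

# df: daily sales with assumed columns date, sku, store, sales
def add_temporal_features(df: pd.DataFrame) -> pd.DataFrame:
    df = df.copy()
    df["dayofweek"] = df["date"].dt.dayofweek
    df["week"] = df["date"].dt.isocalendar().week.astype(int)
    df["month"] = df["date"].dt.month
    return df

def add_target_encoding(train: pd.DataFrame) -> pd.DataFrame:
    train = train.copy()
    # Mean promotional sales per SKU, computed on the training window only,
    # so no future information leaks into the encoding.
    train["sku_promo_mean"] = train.groupby("sku")["sales"].transform("mean")
    return train
```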
The most important features were, of course, about the promotion itself. We included promotion attributes such as mechanics, discount rate, discount amount, product display during the promotion, the number of weeks since the last promotion, and promotion duration. At the daily level, one of the most important features was promotion momentum, which revealed that the 4th day was outstanding in a 7-day promotion.
Of course, we also included many historical promotion aggregates, along with momentum features and static and dynamic aggregates. These features allowed us to capture the complex dynamics of retail demand effectively.
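As an illustration, here is how the daily promotion-momentum index might be derived from historical promotional sales. The promo_id column and the overall shape of the data are assumptions for the sketch:

```python
import pandas as pd

def add_promo_momentum(hist: pd.DataFrame) -> pd.DataFrame:
    """hist: historical promotional sales with assumed columns
    date, sku, store, promo_id, sales."""
    hist = hist.sort_values(["store", "sku", "promo_id", "date"]).copy()

    # Day index within each promotion (1 = first day of the operation).
    hist["promo_day"] = hist.groupby(["store", "sku", "promo_id"]).cumcount() + 1

    # Momentum index: average sales per promotion day, normalised by the
    # overall promotional mean. A peak at promo_day == 4 is the "4th day of
    # a 7-day promotion" effect mentioned above.
    profile = hist.groupby("promo_day")["sales"].mean()
    momentum = (profile / profile.mean()).rename("promo_momentum").reset_index()

    # Merge back as a feature usable for future promotion days.
    return hist.merge(momentum, on="promo_day", how="left")
```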
A Dual-Model Strategy
I chose a machine learning algorithm rather than a statistical or traditional linear model because of the wide range of products that can be modelled and predicted at once, and because machine learning is particularly good at exploiting external factors such as promotions, events, and prices.
Choosing LightGBM was pivotal due to its efficiency and its ability to handle non-linear, complex features. The direct forecasting approach and the full-feature approach, complemented by tailored cross-validation strategies, helped us manage erratic promotion patterns and maintain stability.
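For context, a minimal LightGBM setup along these lines might look as follows. The synthetic data and hyperparameters are purely illustrative, not the production configuration:

```python
import numpy as np
import lightgbm as lgb

# Illustrative stand-ins for the engineered feature matrix and demand target.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))
y = rng.poisson(lam=5, size=1000).astype(float)
X_train, X_valid = X[:800], X[800:]
y_train, y_valid = y[:800], y[800:]

model = lgb.LGBMRegressor(
    objective="regression",
    n_estimators=1000,
    learning_rate=0.05,
    num_leaves=63,
)
model.fit(
    X_train,
    y_train,
    eval_set=[(X_valid, y_valid)],
    callbacks=[lgb.early_stopping(50)],  # stop once validation stops improving
)
preds = model.predict(X_valid)
```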
The two cross-validation strategies were polar opposites (a sketch follows the list):
- The first model benefited from a reducing-window method, mitigating the influence of old promotions in a setting where consumer promotional behaviour changes rapidly.
- The second model used an expanding-window method.
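One way to read these two schemes in code is a reducing (rolling) window that caps the amount of history per fold, versus an expanding window that keeps accumulating it. This interpretation, and the list of ordered promotion periods, are assumptions for illustration:

```python
def reducing_window_splits(periods, window=6):
    # Training history is capped at the most recent `window` periods, so old
    # promotions drop out as consumer behaviour shifts (first model).
    for i in range(window, len(periods)):
        yield periods[i - window:i], periods[i]

def expanding_window_splits(periods, min_train=4):
    # Training history keeps growing; validation is always the next period
    # (second model).
    for i in range(min_train, len(periods)):
        yield periods[:i], periods[i]
```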
The First Model: Our Robust Baseline
I built a feature-rich tree-based algorithm based on extensive EDA and the observation that the average of past promotions was a strong estimator. The algorithm is trained and driven entirely by promotion sales and attributes (as well as categorical features characterising product, store, promotion, event, etc.).
To understand the model's structure better, think of it as a ‘Feature Full Forecasting Model,' a term I explain in detail in my article (it is derived from the Recency Aggregation Method).
The model aimed to predict total sales for upcoming promotions at specific stores and SKUs. The tree-based algorithm learned from past promotional sales, prices, and attributes within a specific time frame, combined with the attributes of the future promotion.
To improve our first model, we used the ‘Mirror SKU' approach. This involved creating a mirror image of each new product using existing data and reconstructing all necessary features for accurate forecasting.
The Second Model: Addressing the Extremes

The second model uses a ‘Direct Model Forecasting' approach and is more dynamic. Each day, each product and store combination has its own features, and the algorithm is trained to predict the next day's sales depending on whether the product will be on promotion or on the back shelf.
The prediction is based on several factors, including the sales of the nth last day, the average of the nth last promotion day, and the mean of the last four Wednesdays on promotion. The features are tailored to take advantage of the daily granularity, capturing seasonality, events, trends, promotion momentum, and price momentum. This model is used for SKUs with enough sales history.
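A sketch of these lag and same-weekday aggregates in pandas follows; the column names and window sizes are assumptions, and the restriction to promotional days is omitted for brevity:

```python
import pandas as pd

def add_lag_features(df: pd.DataFrame, n: int = 7) -> pd.DataFrame:
    df = df.sort_values(["store", "sku", "date"]).copy()

    # Sales observed n days ago for the same product and store.
    df[f"sales_lag_{n}"] = df.groupby(["store", "sku"])["sales"].shift(n)

    # Mean of the last four same weekdays (e.g. the last four Wednesdays);
    # shift(1) keeps the current day out of its own feature.
    df["same_weekday_mean_4"] = (
        df.groupby(["store", "sku", df["date"].dt.dayofweek])["sales"]
        .transform(lambda s: s.shift(1).rolling(4).mean())
    )
    return df
```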
Selecting the Right Metrics
We worked with demand planners to define the scope of evaluation. This covered the aggregated demand of the entire operation at SKU and store level, excluding operations with no sales over the entire 14-day period.
We used metrics such as WMAPE (Weighted Mean Absolute Percentage Error), bias, and the number of accurate operations at a certain threshold, in conjunction with business metrics such as total overstock and total missing quantities compared with the planners' forecasts. The metrics were computed through a thorough backtest over a complete year to verify that our algorithm is robust in any timeframe and accounts for trends and seasonality.
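For reference, WMAPE and bias as used here reduce to a few lines; this is a generic implementation, not our evaluation pipeline:

```python
import numpy as np

def wmape(actual: np.ndarray, forecast: np.ndarray) -> float:
    # Absolute errors weighted by actual volume: large SKUs dominate the
    # score instead of tiny ones with huge percentage errors.
    return np.abs(actual - forecast).sum() / np.abs(actual).sum()

def bias(actual: np.ndarray, forecast: np.ndarray) -> float:
    # Positive bias = systematic over-forecasting (overstock risk);
    # negative bias = under-forecasting (shortage risk).
    return (forecast - actual).sum() / np.abs(actual).sum()
```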
Data Preprocessing
Promotion is a complex data process for all stakeholders, so data preparation proved essential. This included:
- reconstructing promotions from promotion catalogue datasets
- using EDA to determine actual promotion start dates
- filling data gaps where a promotion ran but no sales were recorded
- refining promo mechanics and addressing pricing anomalies, such as a promotion price above the latest shelf price, which improved the model's accuracy
I also "fine-tuned" the data by correcting promotion date inaccuracies and addressing various shortages revealed through EDA during preprocessing. This step was vital for aligning our forecasts with real-world sales.
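As one concrete example, the "promotion price above latest shelf price" anomaly reduces to a simple comparison once the latest shelf price is joined in. The frames and column names here are assumptions:

```python
import pandas as pd

def flag_price_anomalies(promo: pd.DataFrame, shelf: pd.DataFrame) -> pd.DataFrame:
    """promo: one row per (sku, store, promo_id) with promo_price.
    shelf: daily shelf prices with columns sku, store, date, shelf_price."""
    # Latest known shelf price per product and store (a simplification:
    # ideally the latest price *before* the promotion start).
    latest_shelf = (
        shelf.sort_values("date")
        .groupby(["sku", "store"], as_index=False)
        .last()[["sku", "store", "shelf_price"]]
    )
    promo = promo.merge(latest_shelf, on=["sku", "store"], how="left")

    # A promo price above the latest shelf price is almost certainly a data
    # error; flag it for correction rather than feeding it to the model.
    promo["price_anomaly"] = promo["promo_price"] > promo["shelf_price"]
    return promo
```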
One notable finding was factual shortages with no sales at the beginning of a promotion. We hypothesised that these were delivery issues, while stockouts after a sales acceleration pointed to planners ordering too few units. The easiest way to find these shortages was to track zero sales on high-rotation products: observe their latest sales and verify that no same-day promotion on the same brand and family had cannibalised demand. If all those conditions held, the probability that the product was out of stock due to under-ordering was high.
Another shortage pattern appeared at the end of promotions: for moderate- to high-rotation products, we saw either cumulative zero-sales days or a break in promotion momentum followed by a decline to zero sales, both of which suggested a potential stockout. To find this case easily, reverse the process: find the last day of zero sales or drastic decline within the promotion, look back to locate the momentum change, and compare with the same product on promotion in another store to confirm the hypothesis.
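Here is a rough sketch of the first heuristic (start-of-promotion shortages); the end-of-promotion case reverses the scan in the same spirit. The threshold, column names, and the precomputed cannibalisation flag are illustrative assumptions:

```python
import pandas as pd

def flag_start_shortages(df: pd.DataFrame, min_daily_sales: float = 20.0) -> pd.DataFrame:
    """Flag likely stockouts on the first promotion day.

    Assumed columns: promo_day, sales, recent_avg_sales (pre-promotion
    rotation), cannibalised (another promo on the same brand/family in the
    same store that day).
    """
    first_day = df[df["promo_day"] == 1].copy()
    high_rotation = first_day["recent_avg_sales"] >= min_daily_sales
    no_sales = first_day["sales"] == 0
    # All three conditions together make an under-ordering stockout likely.
    first_day["likely_shortage"] = high_rotation & no_sales & ~first_day["cannibalised"]
    return first_day
```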
Addressing Limited Historical Data
For new products with sparse promotional history, I developed a "mirror SKU" method, significantly enhancing forecast accuracy. The method was designed for transparency and follows this heuristic (a code sketch follows the list):
For each SKU without historical sales (our target):
1. Generate high-quality candidate SKUs based on product sales attributes, selecting from the finest available subset, such as brand × subfamily.
2. Compute the product description distance between the target and each candidate, using an NLP approach such as TF-IDF with cosine distance.
3. Compute the promotion price distance between the target and each candidate, as an absolute difference.
4. Rank the candidates by minimising both metrics from steps 2 and 3.
5. Extract the 3 closest mirror SKUs.
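Steps 2 to 5 can be sketched with scikit-learn as follows, assuming the candidate pre-selection of step 1 has already been done; the function name and the max-normalisation of the two distances are illustrative choices:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_distances

def mirror_skus(target_desc, target_price, cand_descs, cand_prices, k=3):
    # Step 2: product description distance via TF-IDF + cosine distance.
    tfidf = TfidfVectorizer().fit_transform([target_desc] + list(cand_descs))
    desc_dist = cosine_distances(tfidf[0], tfidf[1:]).ravel()

    # Step 3: promotion price distance as an absolute difference.
    price_dist = np.abs(np.asarray(cand_prices, dtype=float) - target_price)

    # Step 4: rank by minimising both (max-normalised) distances.
    score = desc_dist / (desc_dist.max() + 1e-9) + price_dist / (price_dist.max() + 1e-9)

    # Step 5: indices of the k closest mirror SKUs among the candidates.
    return np.argsort(score)[:k]
```

The returned indices point into the candidate list; the historical promotions of those mirror SKUs then feed the feature reconstruction described above.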
Combining that method with cross-learning from the first model defined above resulted in a 25-30% improvement in metrics for new-product forecasting.
Beyond Implementation
After the model's deployment, we provided detailed sales analyses and confidence intervals. These insights empowered demand planners, making decision-making more data-driven.
Collaborating closely with stakeholders was crucial for both adapting the model to retail complexities and customizing it to specific needs. Their insights were invaluable in refining the algorithm for practical application in retail settings.
The Impact
Our model not only outperformed traditional forecasting methods but also marked a 15% improvement over the demand planners' previous forecasts. This led to over 30,000 hours saved annually for planners and an 18% reduction in overstock and shortages for Auchan Ukraine at the national level, resulting in a profit of $100,000.
Conclusion
This journey underscores the transformative power of Data Science in retail forecasting. It's a testament to how targeted, data-driven strategies can lead to substantial operational improvements and efficiency.