XPER: Unveiling the Driving Forces of Predictive Performance
Co-authored with S. Hué, C. Hurlin, and C. Pérignon.
I – From explaining model forecasts to explaining model performance
The trustworthiness and acceptability of sensitive AI systems largely depend on the ability of users to understand the associated models, or at least their forecasts. To lift the veil on opaque AI applications, Explainable AI (XAI) methods such as post-hoc interpretability tools (e.g., SHAP, LIME) are now commonly used, and the insights they generate are widely understood.
Beyond individual forecasts, we show in this article how to identify the drivers of the performance metrics (e.g., AUC, R²) of any classification or regression model using the eXplainable PERformance (XPER) methodology. Being able to identify the driving forces of the statistical or economic performance of a predictive model lies at the very core of modeling and is of great importance for both data scientists and experts basing their decisions on such models. The XPER library outlined below has proven to be an efficient tool to decompose performance metrics into individual feature contributions.
Although they rest on the same mathematical principles, XPER and SHAP are fundamentally different tools with different goals. SHAP pinpoints the features that most influence the model's individual predictions, whereas XPER identifies the features that contribute the most to the performance of the model. The latter analysis can be conducted at the global (model) level or at the local (instance) level. In practice, the feature with the strongest impact on individual forecasts (say, feature A) is not necessarily the one with the strongest impact on performance, because feature A drives individual predictions both when the model is correct and when it makes an error. Consequently, if feature A mainly affects erroneous predictions, it may rank lower with XPER than it does with SHAP.
What is a performance decomposition used for? First, it can enrich any post-hoc interpretability analysis by offering a more comprehensive view of the model's inner workings, allowing a deeper understanding of why the model is, or is not, performing well. Second, XPER can help identify and address heterogeneity concerns: by analyzing individual XPER values, one can pinpoint subsamples in which the features have similar effects on performance, and then estimate a separate model on each subsample to boost predictive performance, as sketched below. Third, XPER can help trace the origin of overfitting, since it identifies features that contribute more to the model's performance in the training sample than in the test sample.
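As an illustration of the heterogeneity use case, the sketch below clusters individual XPER values to reveal groups of instances in which the features affect performance in a similar way. The matrix of individual values is simulated here purely for the sake of the example; in practice it would come from an instance-level XPER decomposition, and the number of clusters is an arbitrary choice.

```python
import numpy as np
from sklearn.cluster import KMeans

# Placeholder for a matrix of individual XPER values: one row per instance,
# one column per feature (simulated here purely for illustration).
rng = np.random.default_rng(0)
xper_individual = rng.normal(size=(500, 3))

# Group instances whose features contribute to performance in a similar way.
segments = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(xper_individual)

# A separate model can then be estimated on each segment.
for s in np.unique(segments):
    mask = segments == s
    print(f"Segment {s}: {mask.sum()} instances, "
          f"mean contributions = {xper_individual[mask].mean(axis=0).round(3)}")
```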
II – XPER values
The XPER framework is a theoretically grounded method based on Shapley values (Shapley, 1953), a decomposition method from coalitional game theory. While Shapley values decompose a payoff among the players of a game, XPER values decompose a performance metric (e.g., AUC, R²) among the features of a model.
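In standard Shapley notation, writing G(S) for the value of the performance metric obtained when only the features in coalition S carry information (the remaining features being averaged out), the contribution of feature j and the resulting decomposition take the following form. This is a sketch of the general principle; the exact way G(S) is estimated in the XPER paper may differ.

```latex
\phi_j \;=\; \sum_{S \subseteq \{1,\dots,p\} \setminus \{j\}}
\frac{|S|!\,(p-|S|-1)!}{p!}
\left[\, G\!\left(S \cup \{j\}\right) - G\!\left(S\right) \right],
\qquad
G(\{1,\dots,p\}) \;=\; \underbrace{G(\varnothing)}_{\text{benchmark } \phi_0} \;+\; \sum_{j=1}^{p} \phi_j .
```

To make this concrete, here is a minimal, self-contained Python sketch that decomposes the AUC of a three-feature classifier with exact Shapley weights. It is not the XPER package itself: out-of-coalition features are neutralized by permuting their columns, a simple proxy for averaging them out, and the data set, model, and hyperparameters are purely illustrative.

```python
import itertools
from math import factorial

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Illustrative three-feature classification problem.
X, y = make_classification(n_samples=2000, n_features=3, n_informative=3,
                           n_redundant=0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
p = X_test.shape[1]

def coalition_auc(S, n_draws=20):
    """AUC when only the features in S carry information: features outside S
    are replaced by random permutations of their own column (a simple proxy
    for averaging them out, not the exact XPER estimator)."""
    aucs = []
    for _ in range(n_draws):
        X_mod = X_test.copy()
        for j in range(p):
            if j not in S:
                X_mod[:, j] = rng.permutation(X_mod[:, j])
        aucs.append(roc_auc_score(y_test, model.predict_proba(X_mod)[:, 1]))
    return float(np.mean(aucs))

# Exact Shapley decomposition of the AUC over the three features.
phi = np.zeros(p)
for j in range(p):
    others = [k for k in range(p) if k != j]
    for size in range(p):
        for S in itertools.combinations(others, size):
            weight = factorial(size) * factorial(p - size - 1) / factorial(p)
            phi[j] += weight * (coalition_auc(set(S) | {j}) - coalition_auc(set(S)))

benchmark = coalition_auc(set())  # performance when no feature is informative
print("feature contributions    :", phi.round(3))
# Benchmark plus contributions recovers the full-model AUC (up to sampling noise).
print("benchmark + contributions:", round(benchmark + phi.sum(), 3))
print("full-model AUC           :",
      round(roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]), 3))
```

With three features, the 2³ coalitions can be enumerated exactly; for larger feature sets, coalitions would typically be sampled instead.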
Suppose that we train a classification model using three features and that its predictive performance is measured with an AUC equal to 0.78. An example of XPER decomposition is the following:

The first XPER value