Neural Basis Models for Interpretability
The widespread application of machine learning and artificial intelligence across domains brings heightened challenges around risk and ethics. As case studies like the criminal recidivism model reported on by ProPublica have shown, machine learning algorithms can be deeply biased, so robust explainability mechanisms are needed to ensure trust and safety when these models are deployed in high-stakes areas.
So, how do we balance interpretability with accuracy and model expressivity? Well, Meta AI researchers have proposed a new approach they dubbed Neural Basis Models (NBMs)[1], a sub-family of generalized additive models that achieve state-of-the-art (SOTA) performance on benchmark datasets while retaining glass-box interpretability.
In this article, I aim to explain the NBM and what makes it a beneficial model. As usual, I encourage everyone to read the original paper.
If you're interested in interpretable machine learning and other aspects of ethical AI, consider checking out some of my other articles and following me!
Background: GAMs
NBM is considered a Generalized Additive Model (GAM). GAMs are inherently interpretable models that learn a shape function for each feature; a prediction is made by evaluating each feature's shape function and summing the results. Because these shape functions are independent, a feature's impact on the prediction can be understood simply by plotting its shape function, making GAMs highly explainable. Interactions between variables are modelled by passing multiple features into the same shape function (usually limited to two features to preserve interpretability), a configuration called a GA2M.
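Concretely, with link function g and D features, a GAM and its pairwise GA2M extension can be written as follows (standard GAM notation; the paper's exact symbols may differ):

```latex
% GAM: one independent shape function f_i per feature x_i
g\big(\mathbb{E}[y \mid x]\big) = \beta_0 + \sum_{i=1}^{D} f_i(x_i)

% GA2M: add pairwise shape functions f_{ij}(x_i, x_j),
% usually capped at two features so the plots stay human-readable
g\big(\mathbb{E}[y \mid x]\big) = \beta_0 + \sum_{i=1}^{D} f_i(x_i) + \sum_{i<j} f_{ij}(x_i, x_j)
```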

The various GAM and GA2M models use different mechanisms for developing these shape functions. The Explainable Boosting Machine (EBM) [2] uses a set of boosted trees trained on each feature, Neural Additive Models (NAMs) [3] use a deep neural network for each feature, and NODE-GAM [4] uses ensembles of oblivious neural trees[6]. I recommend reading the following articles on the EBM and NODE-GAM/NAM for a more detailed explanation of these models.
NBM Approach
Neural Basis Models (NBMs) are a new subfamily of Generalized Additive Models (GAMs) that utilize a basis decomposition of shape functions.

Unlike other GAMs (such as the NAM [3]), which effectively train an independent model per feature to construct its shape function, the NBM architecture relies on a small number of basis functions shared among all features and learned jointly for a given task. What are these basis functions? The Swiss-army knife of function approximation: the deep neural network.
Effectively, a common MLP backbone that takes a single input and outputs B values is trained and applied to every input feature. For each feature, these B outputs are then linearly combined to form that feature's contribution to the prediction, with a different set of combination weights per feature. Another way to think about this architecture is through the lens of encoder-decoder networks: all features share the same encoder (the common MLP backbone), but each has its own decoder (the linear transformation of the encoding). The decoded values for each feature are then summed to produce the final prediction.
This extends naturally to feature interactions: to model pairwise interactions, a second shared MLP that takes two inputs instead of one is applied to each pair of features.
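To make the architecture concrete, here is a minimal PyTorch sketch of the unary case, written from the description above rather than taken from the official facebookresearch/nbm-spam code; the class name `NBMSketch`, layer sizes, and default `num_bases` are my own illustrative choices.

```python
import torch
import torch.nn as nn

class NBMSketch(nn.Module):
    """Minimal Neural Basis Model sketch: one shared MLP backbone maps each
    scalar feature to B basis values; per-feature linear weights combine them."""

    def __init__(self, num_features: int, num_bases: int = 64, hidden: int = 128):
        super().__init__()
        # Shared "encoder": a single MLP applied to every feature independently.
        self.backbone = nn.Sequential(
            nn.Linear(1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, num_bases),
        )
        # Per-feature "decoder": one weight vector of length B per feature.
        self.feature_weights = nn.Parameter(torch.empty(num_features, num_bases))
        nn.init.normal_(self.feature_weights, std=0.02)
        self.bias = nn.Parameter(torch.zeros(1))

    def feature_contributions(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, D). Run every feature value through the shared backbone.
        batch, d = x.shape
        bases = self.backbone(x.reshape(batch * d, 1)).reshape(batch, d, -1)  # (batch, D, B)
        # Combine the B basis outputs with feature-specific weights -> f_i(x_i).
        return (bases * self.feature_weights).sum(dim=-1)  # (batch, D)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Sum the per-feature contributions, exactly like any other GAM.
        return self.feature_contributions(x).sum(dim=-1) + self.bias
```

A pairwise extension would add a second shared backbone whose first layer is `nn.Linear(2, hidden)`, applied to every feature pair with its own weight vector per pair. The `feature_contributions` output is also what you would plot to visualize the learned shape functions.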

One benefit of using a shared MLP backbone rather than a separate MLP per feature is a significantly smaller model: the backbone's parameters are reused by every feature, and each feature adds only its B linear-combination weights. This makes the NBM especially well suited to tasks with extremely high-dimensional data.
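A rough back-of-the-envelope comparison shows why. The sizes below are illustrative assumptions, not the paper's configurations, but the scaling pattern is the point:

```python
# Illustrative parameter count: NAM (one MLP per feature) vs. NBM (one shared
# backbone plus B linear weights per feature). All sizes are assumptions.
D = 10_000            # number of input features
mlp_params = 50_000   # parameters in one small per-feature MLP (assumed)
B = 100               # number of shared basis functions (assumed)

nam_params = D * mlp_params      # 500,000,000: grows with D times the MLP size
nbm_params = mlp_params + D * B  # 1,050,000: only the B weights grow with D

print(f"NAM: {nam_params:,} params, NBM: {nbm_params:,} params "
      f"(~{nam_params / nbm_params:.0f}x smaller)")
```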
Performance and Benefits
To test their architecture, Radenovic et al. (2022) compared NBM to various other models like Linear Regression, the EBM [2], NAM [3], XGBoost[5], and an MLP. Their first evaluation was on a mix of tabular and image datasets.

Overall, the NBM holds its ground, outperforming the other interpretable models across the datasets and even beating the MLP on some of them.
Radenovic et al. (2022) also ran a second evaluation on purely tabular datasets, aimed at a direct comparison with other SOTA GAM models.

This comparison clearly shows the power of the NBM, which beats its competitors on almost every dataset. As mentioned before, the scalability of the NBM is also exceptional: for high-dimensional tasks, the paper reports that an NBM has nearly 70 times fewer parameters than a NAM.

Conclusion
Overall, NBMs are powerful, lightweight models that are inherently interpretable because they are GAMs. That does not make them a silver-bullet solution to high-stakes machine-learning problems, however. There are still significant considerations to take into account when using these models. For one, an inherently interpretable model means very little if the features fed into it are not themselves interpretable.
Additionally, while the size of an NBM scales well compared to a NAM, its interpretability does not: no human can inspect thousands of feature-attribution charts, especially once pairwise interactions are included. This means pre-processing methods such as feature selection are still needed for very large feature spaces, something the authors themselves acknowledge. None of this detracts from the work, though; the NBM remains an incredibly useful model that is relatively easy to implement and tune.
The fact that the model is a GAM is also attractive for machine learning on mobile and other low-power devices: users can train the model once and then deploy the learned feature-attribution (shape) functions instead of the full network, giving extremely fast, memory-light inference without any loss in accuracy.
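As a sketch of what that deployment path could look like, the snippet below precomputes each shape function on a grid and does inference by interpolation. It assumes bounded, continuous tabular features and reuses the hypothetical `NBMSketch` model from the earlier snippet; the grid size and function names are my own choices, not part of the paper.

```python
import numpy as np
import torch

@torch.no_grad()
def export_shape_tables(model, feature_mins, feature_maxs, grid_size=256):
    """Precompute each learned shape function f_i on a fixed grid so inference
    can be done by table lookup, with no neural network on the device."""
    grids = np.stack([np.linspace(lo, hi, grid_size)
                      for lo, hi in zip(feature_mins, feature_maxs)])   # (D, G)
    # Evaluate all features at once: column i of x holds feature i's grid.
    x = torch.tensor(grids.T, dtype=torch.float32)                      # (G, D)
    tables = model.feature_contributions(x).numpy().T                   # (D, G)
    return grids, tables, model.bias.item()

def lookup_predict(x_row, grids, tables, bias):
    """Lightweight inference: interpolate each feature's precomputed table."""
    contribs = [np.interp(x_row[i], grids[i], tables[i]) for i in range(len(x_row))]
    return bias + sum(contribs)
```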
Resources and References
- NBM Code: https://github.com/facebookresearch/nbm-spam
- NBM Open Review: https://openreview.net/forum?id=fpfDusqKZF
- If you are interested in Interpretable Machine Learning or Time Series Forecasting, consider following me: https://medium.com/@upadhyan.
- See my other articles on interpretable machine learning: https://medium.com/@upadhyan/list/interpretable-and-ethical-ai-f6ee1f0b476d
References
[1] Radenovic, F., Dubey, A., & Mahajan, D. (2022). Neural basis models for interpretability. Advances in Neural Information Processing Systems, 35, 8414–8426.
[2] Lou, Y., Caruana, R., Gehrke, J., & Hooker, G. (2013). Accurate intelligible models with pairwise interactions. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 623–631).
[3] Agarwal, R., Melnick, L., Frosst, N., Zhang, X., Lengerich, B., Caruana, R., & Hinton, G. E. (2021). Neural additive models: Interpretable machine learning with neural nets. Advances in Neural Information Processing Systems, 34, 4699–4711.
[4] Chang, C. H., Caruana, R., & Goldenberg, A. (2022). NODE-GAM: Neural generalized additive model for interpretable deep learning. In International Conference on Learning Representations.
[5] Chen, T., & Guestrin, C. (2016). XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 785–794).
[6] Popov, S., Morozov, S., & Babenko, A. (2020). Neural oblivious decision ensembles for deep learning on tabular data. In International Conference on Learning Representations.