Critical Tools for Ethical and Explainable AI


Machine learning models have revolutionized numerous fields by delivering remarkable predictive capabilities. However, as these models become increasingly ubiquitous, ensuring fairness and interpretability has emerged as a critical concern. Building fair and transparent models is an ethical imperative: it fosters trust, helps avoid bias, and mitigates unintended consequences. Fortunately, Python offers a plethora of powerful tools and libraries that empower data scientists and machine learning practitioners to address these challenges head-on. In fact, the sheer variety of tools and resources out there can make it daunting for data scientists and stakeholders to know which ones to use.

This article delves into fairness and interpretability by introducing a carefully curated selection of Python packages encompassing a wide range of interpretability tools. These tools enable researchers, developers, and stakeholders to gain deeper insights into model behaviour, understand the influence of features, and ensure fairness in their machine-learning endeavours.

Disclaimer: I will only focus on three packages, since these three contain the majority of the interpretability and fairness tools anyone may need. However, a list of honourable mentions can be found at the very end of the article.

InterpretML

GitHub: https://github.com/interpretml/interpret

Documentation: https://interpret.ml/docs/getting-started.html

Interpretable models play a pivotal role in machine learning, promoting trust by shedding light on their decision-making mechanisms. This transparency is crucial for regulatory compliance, ethical considerations, and gaining user acceptance. InterpretML [1] is an open-source package developed by Microsoft's research team that incorporates many crucial machine-learning interpretability techniques in one library.

Post-Hoc Explanations

First, InterpretML includes many post-hoc explanation algorithms to shed light on the internals of black-box models. These include:

  • Shapley Additive Explanations (SHAP): A feature importance explanation approach based on game theory.
  • Local Interpretable Model-agnostic Explanations (LIME): A local explanation method that fits a surrogate interpretable model to predict the result of the black-box model.
  • Partial Dependence Plots (PDP): A perturbation-based interpretability method that helps show interactions between features.
  • Morris Sensitivity Analysis: A method for quantifying input variables' influence on a model's output by systematically perturbing the variables and observing the resulting changes in the output (similar in spirit to PDP).

Almost all of the methods above can be found in other libraries, but InterpretML makes it easier for us by combining all of them into one package.
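To give a flavour of the API, below is a minimal sketch that fits a scikit-learn random forest and explains it with the LIME and partial dependence explainers from interpret.blackbox. Treat it as a sketch rather than copy-paste code: the blackbox explainer constructor arguments have changed slightly between interpret versions.

    # Sketch: post-hoc explanations of a black-box model with InterpretML.
    # Note: explainer constructor arguments differ slightly across interpret
    # versions; this follows the recent (model, data) convention.
    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import RandomForestClassifier
    from interpret.blackbox import LimeTabular, PartialDependence
    from interpret import show

    X, y = load_breast_cancer(return_X_y=True, as_frame=True)
    model = RandomForestClassifier(random_state=0).fit(X, y)

    # Local explanations: which features drove these five predictions?
    lime = LimeTabular(model, X)
    show(lime.explain_local(X[:5], y[:5]))

    # Global view: how does the prediction change as a single feature varies?
    pdp = PartialDependence(model, X)
    show(pdp.explain_global())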

Glassbox Models

Besides providing post-hoc explainability, InterpretML also contains a few glass-box (or inherently interpretable) models such as Linear Models, Decision Trees, and Decision Rules.

InterpretML is also the only package that contains the Explainable Boosting Machine (EBM), a tree-based, gradient-boosting Generalized Additive Model. Internally, EBMs generate contribution functions based on the values of individual variables or variable interactions. These functions are then combined for the final prediction, and global explanations can be generated by visualizing the contribution values.

Explanation of priors_count on the COMPAS Dataset. As the number of priors goes up, the model predicts higher recidivism rates (Figure by Author)

EBMs are often as accurate as other boosting models like LightGBM and XGBoost, making them a vital tool in any data scientist's toolbox. Please check Dr. Kubler's article for a full explanation of the EBM.
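As a quick illustration of the glassbox API, here is a minimal sketch of training an EBM and producing the kind of per-feature contribution plot shown above. The CSV path and target column are placeholders standing in for a local copy of the COMPAS data:

    # Sketch: training an Explainable Boosting Machine with InterpretML.
    # The file path and target column below are placeholders.
    import pandas as pd
    from interpret.glassbox import ExplainableBoostingClassifier
    from interpret import show

    df = pd.read_csv("compas.csv")  # hypothetical local copy of the COMPAS data
    X, y = df.drop(columns="two_year_recid"), df["two_year_recid"]

    ebm = ExplainableBoostingClassifier(random_state=0)
    ebm.fit(X, y)

    # Global explanation: one contribution curve per feature (e.g. priors_count)
    show(ebm.explain_global())
    # Local explanation: contribution breakdown for individual predictions
    show(ebm.explain_local(X.head(5), y.head(5)))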

Captum

GitHub: https://github.com/pytorch/captum

Documentation: https://captum.ai/docs/introduction

While InterpretML focuses mainly on "shallow" models, Captum [2] is PyTorch's go-to package for deep learning interpretability. This library contains many post-hoc interpretability algorithms that provide both feature-importance and neuron/layer attributions (a full table can be found below).

Captum Attribution Algorithms organized by explanation focus (Image By Author)
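To get a feel for the API, here is a minimal sketch of Integrated Gradients applied to a toy feed-forward network; the model, input sizes, and target class are arbitrary placeholders:

    # Sketch: feature attributions with Captum's IntegratedGradients on a toy model.
    import torch
    import torch.nn as nn
    from captum.attr import IntegratedGradients

    model = nn.Sequential(nn.Linear(10, 16), nn.ReLU(), nn.Linear(16, 2))
    model.eval()

    inputs = torch.randn(4, 10)            # batch of 4 examples, 10 features each
    baseline = torch.zeros_like(inputs)    # reference input the attributions are measured against

    ig = IntegratedGradients(model)
    # Attribute the class-1 score back to the input features
    attributions, delta = ig.attribute(
        inputs, baselines=baseline, target=1, return_convergence_delta=True
    )
    print(attributions.shape)              # torch.Size([4, 10]): one value per feature per example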

These algorithms help with tabular interpretability, but their use cases extend beyond that. Ever wondered what BERT might be looking at for its predictions? Well, one of the tutorials provided by Captum shows how to use Layer Integrated Gradients to explain question-answer pairs generated by BERT:

Question Answering Interpretability (Image by Author)

Captum can also be used to explain image predictions using algorithms such as Input x Gradient or Layerwise Relevance Propagation:

MNIST Prediction Explanation using Layerwise Relevance Propagation (Image by Author)

Overall, this library is incredibly easy to use and extremely versatile, making it a must-know for any deep learning developer.

AIF360

GitHub: https://github.com/Trusted-AI/AIF360

Documentation: https://aif360.readthedocs.io/en/stable/

While interpretability can go a long way in identifying potential bias in models, some dedicated tools and metrics can measure and, more importantly, mitigate unfairness in datasets and predictive tools. One of these is the AI Fairness 360 toolkit (AIF360) [3], an open-source library developed by IBM for both Python and R. This toolkit covers almost all the fairness and mitigation methods one may need.

Additionally, AIF360 (like Captum) offers a large number of easy-to-follow tutorials on how to use the library.

Datasets

The first extremely useful feature AIF360 provides is a collection of sandbox datasets that are ideal for learning about fairness and interpretability. These include the Adult Census Income, Bank Marketing, COMPAS (criminal recidivism), MEPS (Medical Expenditure Panel Survey, panels 19–21), Law School GPA, and German Credit datasets. All of these are great starting points for examining fairness and systemic bias.

Fairness Metrics

AIF360 also provides a comprehensive set of tools that compute metrics on representation and model performance conditioned on privileged and unprivileged groups. This makes it easy to calculate fairness scores such as Equalized Odds (equal false positive and false negative rates across groups) and Demographic Parity (equal rates of favourable predictions across groups, regardless of the sensitive feature). For example, compute_num_TF_PN can be used to compare confusion-matrix counts between an unprivileged and a privileged group.
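As a rough sketch (assuming the Adult Census Income raw files have already been downloaded to the location AIF360 expects), computing group fairness metrics looks like this:

    # Sketch: dataset-level fairness metrics with AIF360.
    # AdultDataset() expects the raw Adult files on disk and prints download
    # instructions if they are missing.
    from aif360.datasets import AdultDataset
    from aif360.metrics import BinaryLabelDatasetMetric

    data = AdultDataset()                  # 'sex' is a protected attribute (1 = Male, privileged)
    privileged = [{"sex": 1}]
    unprivileged = [{"sex": 0}]

    metric = BinaryLabelDatasetMetric(
        data, unprivileged_groups=unprivileged, privileged_groups=privileged
    )
    # Difference in favourable-outcome rates between the groups (0 means parity)
    print("Statistical parity difference:", metric.statistical_parity_difference())
    # Ratio of favourable-outcome rates (1 means parity)
    print("Disparate impact:", metric.disparate_impact())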

Mitigation Methods

The crowning feature of AIF360 is the large number of mitigation algorithms the library contains. These algorithms can be integrated into a standard machine-learning pipeline without many changes, and many of them are also exposed through a scikit-learn-compatible interface.

The first group of mitigation methods is pre-processing algorithms. These transform input data to help balance the fairness and representation of the data. AIF360 contains four algorithms for this:

  • Disparate Impact Removal: This edits feature values across groups to increase overall fairness and reduce the impact of systemic biases on the dataset.
  • Learning Fair Representations (LFR): This algorithm finds a latent representation of the data that encodes important information while obfuscating information about protected attributes.
  • Optimized Preprocessing: This technique learns a probabilistic transformation that edits the features and labels to ensure group fairness while preserving data fidelity.
  • Reweighting: This algorithm simply reweights the samples to ensure fairness before classification (see the sketch after this list).
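Continuing with the Adult dataset from the previous sketch, reweighting (implemented in AIF360 as the Reweighing class) only takes a couple of lines:

    # Sketch: pre-processing mitigation with AIF360's Reweighing.
    # `data` is the BinaryLabelDataset loaded in the previous snippet.
    from aif360.algorithms.preprocessing import Reweighing

    rw = Reweighing(unprivileged_groups=[{"sex": 0}], privileged_groups=[{"sex": 1}])
    data_transf = rw.fit_transform(data)

    # Features and labels are unchanged; only the instance weights are rescaled
    # so that group membership and the favourable label become independent.
    print(data_transf.instance_weights[:10])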

AIF360 also provides a number of "in-processing" methods that wrap around the training and hyperparameter search process. These include Grid Search Reduction (re-training the model over a grid of trade-offs between performance and a fairness constraint), Adversarial Debiasing (training an adversary that tries to recover the protected attribute from the main model's predictions, which the main model is then penalized for allowing), and others.
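As a rough sketch of the in-processing workflow, Adversarial Debiasing can be trained as follows. AIF360's implementation relies on TensorFlow 1.x-style sessions, and `train`/`test` are assumed to be splits of the dataset used above:

    # Sketch: in-processing mitigation with AIF360's AdversarialDebiasing.
    # Requires TensorFlow; recent installs need the v1 compatibility shim below.
    import tensorflow.compat.v1 as tf
    from aif360.algorithms.inprocessing import AdversarialDebiasing

    tf.disable_eager_execution()
    sess = tf.Session()

    debiased_model = AdversarialDebiasing(
        unprivileged_groups=[{"sex": 0}],
        privileged_groups=[{"sex": 1}],
        scope_name="debiased_classifier",
        sess=sess,
        debias=True,        # set to False to train the same network without the adversary
    )
    debiased_model.fit(train)                    # `train` is a BinaryLabelDataset split
    debiased_pred = debiased_model.predict(test)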

Finally, AIF360 offers multiple post-processing algorithms that take a model's predictions and solve an optimization problem to make those predictions fairer. These include Calibrated Equalized Odds (adjusting predicted labels so that error rates are balanced across groups) and the Reject Option Classifier (flipping uncertain predictions near the decision boundary to give more favourable outcomes to unprivileged groups).
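For instance, Calibrated Equalized Odds can be applied to the scored predictions of any already-trained classifier. In the sketch below, `valid_true` and `valid_pred` are assumed to be BinaryLabelDatasets holding ground-truth labels and model scores for a validation split:

    # Sketch: post-processing with AIF360's Calibrated Equalized Odds.
    from aif360.algorithms.postprocessing import CalibratedEqOddsPostprocessing

    cpp = CalibratedEqOddsPostprocessing(
        unprivileged_groups=[{"sex": 0}],
        privileged_groups=[{"sex": 1}],
        cost_constraint="fnr",    # equalize false negative rates; "fpr" and "weighted" also work
        seed=0,
    )
    cpp.fit(valid_true, valid_pred)           # learn per-group mixing rates
    fair_pred = cpp.predict(valid_pred)       # adjusted labels for the same examples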

Honorable Mentions

The three libraries listed above are incredible and will cover the vast majority of a beginner data scientist's interpretability and fairness needs. However, some other packages and tools deserve an honourable mention:

Interpretability

  • SHAP [4] / LIME [5]: Dedicated implementations of the SHAP and LIME algorithms, respectively, along with related visualizations.
  • ELI5 [6]: This package is similar to InterpretML and shares many of the white-box models and black-box explainers found in the other packages. Unfortunately, the project is no longer maintained.
  • Yellowbrick [7]: This package extends the sklearn API to provide a lot of visualization tools for your model internals.
  • Alibi [8]: This package is similar to InterpretML and ELI5, providing many explainers and white box models.

Fairness

  • Fairlearn [9]: Fairlearn is a library similar to AIF360, providing fairness-promoting tools. This package shares many of the algorithms found in AIF360.
  • Aequitas [10]: Aequitas is a bias audit toolkit that is a library and a web application. Using this tool, you can generate reports on the systemic biases potentially present in your data.
  • FairML [11]: FairML is a library that quantifies the relative significance of a model's inputs and their predictive dependence, which makes it a useful tool for auditing predictive models.

Conclusion

In the end, the collective effort to embrace interpretability and fairness in machine learning will lead us toward a future where AI systems are not only accurate and powerful but also transparent, fair, and trustworthy, ultimately benefiting developers and end users alike. By harnessing the capabilities of these Python packages and embracing a commitment to ethical AI, we can pave the way for a more inclusive and responsible AI-driven world.

Resources and References

  • If you are interested in interpretable machine learning and forecasting, consider giving me a follow: https://medium.com/@upadhyan
  • For other articles on Ethical and Interpretable AI, check out the reading list below:

List: Interpretable and Ethical AI | Curated by Nakul Upadhya | Medium

References

[1] Nori, H., Jenkins, S., Koch, P., & Caruana, R. (2019). InterpretML: A Unified Framework for Machine Learning Interpretability. arXiv preprint arXiv:1909.09223.

[2] Narine Kokhlikyan, Vivek Miglani, Miguel Martin, Edward Wang, Bilal Alsallakh, Jonathan Reynolds, Alexander Melnikov, Natalia Kliushkina, Carlos Araya, Siqi Yan, & Orion Reblitz-Richardson. (2020). Captum: A unified and generic model interpretability library for PyTorch.

[3] Rachel K. E. Bellamy, Kuntal Dey, Michael Hind and Samuel C. Hoffman, Stephanie Houde, Kalapriya Kannan and Pranay Lohia, Jacquelyn Martino, Sameep Mehta and Aleksandra Mojsilovic, Seema Nagar, Karthikeyan Natesan Ramamurthy and John Richards, Diptikalyan Saha, Prasanna Sattigeri and Moninder Singh, Kush R. Varshney, & Yunfeng Zhang. (2018). AI Fairness 360: An Extensible Toolkit for Detecting, Understanding, and Mitigating Unwanted Algorithmic Bias.

[4] Lundberg, S., & Lee, S.I. (2017). A Unified Approach to Interpreting Model Predictions. Advances in Neural Information Processing Systems 30, 4765–4774.

[5] Marco Tulio Ribeiro, Sameer Singh, & Carlos Guestrin (2016). "Why Should I Trust You?": Explaining the Predictions of Any Classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, August 13–17, 2016 (pp. 1135–1144).

[6] TeamHG-Memex (2019). ELI5. GitHub.

[7] Bengfort, B., & Bilbro, R. (2019). Yellowbrick: Visualizing the Scikit-Learn Model Selection Process. The Journal of Open Source Software, 4(35).

[8] Janis Klaise, Arnaud Van Looveren, Giovanni Vacanti, & Alexandru Coca (2021). Alibi Explain: Algorithms for Explaining Machine Learning Models. Journal of Machine Learning Research, 22(181), 1–7.

[9] Bird, S., Dudik, M., Edgar, R., Horn, B., Lutz, R., Milan, V., Sameki, M., Wallach, H., & Walker, K. (2020). Fairlearn: A toolkit for assessing and improving fairness in AI [White paper]. Microsoft.

[10] Saleiro, P., Kuester, B., Stevens, A., Anisfeld, A., Hinkson, L., London, J., & Ghani, R. (2018). Aequitas: A Bias and Fairness Audit Toolkit. arXiv preprint arXiv:1811.

[11] Adebayo, J. A. (2016). FairML: ToolBox for diagnosing bias in predictive modeling (Doctoral dissertation, Massachusetts Institute of Technology).

