Universität Bielefeld

[BA/MA]

Extending Feature Attributions with Feature Interactions

Contact: Fabian Fumagalli

Feature attribution methods such as SHAP [1] or LIME [2] are widely used to explain the predictions of black-box machine learning models without making any assumptions about the model class. However, single-feature attributions cannot quantify the contribution that multiple features achieve jointly. For instance, in language models, explaining a prediction by assigning contributions to individual words is often not meaningful, as multiple words must be considered together. Quantifying the joint contributions of multiple features is known as feature interactions, and existing attribution methods have been extended to capture them [3]. The goal of this thesis is to discover feature interactions in existing deep learning architectures in order to explain their predictions more comprehensively.
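For illustration (not part of the original topic description), the sketch below contrasts single-feature SHAP attributions with pairwise SHAP interaction values on a toy tabular model. It assumes the shap, scikit-learn, and numpy packages; the synthetic dataset, the choice of a gradient boosting regressor, and the use of TreeExplainer for interaction values are illustrative assumptions, not part of the thesis setup.

```python
# Minimal sketch: single-feature attributions vs. pairwise interaction values.
# Assumes `shap`, `scikit-learn`, and `numpy` are installed; data and model are
# toy choices for illustration only.
import numpy as np
import shap
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic regression data and a small tree ensemble.
X, y = make_regression(n_samples=300, n_features=5, n_informative=3, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)

# Single-feature attributions: one value per (sample, feature).
phi = explainer.shap_values(X)                  # shape (n_samples, n_features)

# Pairwise interaction values: one matrix per sample; off-diagonal entries
# quantify the joint contribution of feature pairs, the diagonal the main effects.
phi_int = explainer.shap_interaction_values(X)  # shape (n_samples, n_features, n_features)

# The interaction matrices refine the attributions: summing each row of a
# sample's matrix recovers the corresponding single-feature SHAP value
# (up to numerical tolerance).
print(np.abs(phi - phi_int.sum(axis=2)).max())
```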

Keywords: Explainable AI, Deep Learning, Feature Interactions

Literature

  1. Lundberg, S. M. & Lee, S.-I. A Unified Approach to Interpreting Model Predictions. https://proceedings.neurips.cc/paper/2017/hash/8a20a8621978632d76c43dfd28b67767-Abstract.html

  2. Ribeiro, M. T., Singh, S. & Guestrin, C. ‘Why Should I Trust You?’: Explaining the Predictions of Any Classifier. https://dl.acm.org/doi/abs/10.1145/2939672.2939778

  3. Fumagalli, F. et al. SHAP-IQ: Unified Approximation of Any-Order Shapley Interactions. https://proceedings.neurips.cc/paper_files/paper/2023/hash/264f2e10479c9370972847e96107db7f-Abstract-Conference.html