For AI to be successfully integrated into high-stakes decision-making, it must be fundamentally trustworthy. A key pillar of this trust is a model's ability to signal its own limitations; however, current uncertainty quantification (UQ) methods, such as entropy or predictive variance [3], largely function as "black boxes." While they may correctly flag that a model is hesitant, they offer no insight into why. This lack of transparency forces users to rely on a cryptic numerical score without understanding the semantic root of the model's doubt.
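To make this limitation concrete, the minimal sketch below computes predictive entropy, a standard numerical UQ score. The two inputs are hypothetical and deliberately constructed so that identical scores mask entirely different sources of confusion:

```python
import numpy as np

def predictive_entropy(probs: np.ndarray) -> np.ndarray:
    """Shannon entropy of a softmax distribution, in nats.

    probs: array of shape (batch, num_classes), rows summing to 1.
    """
    eps = 1e-12  # numerical guard against log(0)
    return -np.sum(probs * np.log(probs + eps), axis=-1)

# Two hypothetical predictions with identical entropy but very
# different semantic reasons for the model's hesitation.
p = np.array([
    [0.50, 0.50, 0.00],  # confusion between classes 0 and 1
    [0.50, 0.00, 0.50],  # confusion between classes 0 and 2
])
print(predictive_entropy(p))  # both ~0.693 nats; the score alone cannot distinguish them
```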
This research proposes a paradigm shift from numerical to Semantic Uncertainty. By leveraging Concept Activation Vectors (CAVs) [1,2], we can deconstruct a model's high-dimensional activations into human-understandable concept scores. These scores then serve as inputs to a transparent surrogate Generalized Additive Model (GAM) that reflects the original model's internal concept importances. Within this framework, uncertainty becomes a visible "Concept Competition": the model explains its hesitation by showing how an input simultaneously activates concepts belonging to conflicting classes. This aligns the model's uncertainty with human reasoning and moves us closer to truly reliable, interpretable AI.
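A minimal end-to-end sketch of this pipeline follows, with several assumptions: the activations are synthetic, the concept names ("stripes", "fur", "wheels") and helper functions (`learn_cav`, `concept_scores`) are hypothetical, the CAV step follows the standard linear-probe recipe, and a plain logistic model stands in for the GAM (a library such as pygam could supply per-concept shape functions instead):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# --- Step 1: learn a CAV per concept via a linear probe. ---
# Assumption: `concept_acts` holds hidden-layer activations for positive
# examples of one concept, `random_acts` for random counterexamples.
def learn_cav(concept_acts, random_acts):
    X = np.vstack([concept_acts, random_acts])
    y = np.concatenate([np.ones(len(concept_acts)), np.zeros(len(random_acts))])
    probe = LogisticRegression(max_iter=1000).fit(X, y)
    v = probe.coef_[0]
    return v / np.linalg.norm(v)  # unit normal to the separating hyperplane

d = 64  # hypothetical activation dimensionality
concepts = {
    name: learn_cav(rng.normal(loc=mu, size=(50, d)), rng.normal(size=(50, d)))
    for name, mu in [("stripes", 0.8), ("fur", 0.5), ("wheels", -0.7)]
}

# --- Step 2: project activations onto CAVs to get concept scores. ---
def concept_scores(acts):
    return np.stack([acts @ v for v in concepts.values()], axis=-1)

# --- Step 3: fit a transparent additive surrogate on concept scores. ---
# A linear logistic model is the simplest additive model; a GAM would
# additionally learn a nonlinear shape function per concept.
acts = rng.normal(size=(200, d))
labels = (acts @ list(concepts.values())[0] > 0).astype(int)  # toy task
surrogate = LogisticRegression().fit(concept_scores(acts), labels)

# --- Step 4: read uncertainty as Concept Competition. ---
x = rng.normal(size=(1, d))
contrib = surrogate.coef_[0] * concept_scores(x)[0]  # per-concept contribution
for name, c in zip(concepts, contrib):
    print(f"{name:>8}: {c:+.3f}")
```

In this reading, opposing per-concept contributions of similar magnitude are the signature of Concept Competition: rather than a single opaque score, the surrogate exposes which concepts are pulling the prediction toward conflicting classes.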
Possible Research Paths:
Literature
[1] https://arxiv.org/abs/2211.10154
[2] https://arxiv.org/abs/2306.07304
[3] https://link.springer.com/article/10.1007/s10462-023-10562-9
[4] https://ojs.aaai.org/aimagazine/index.php/aimagazine/article/view/393