For AI to be successfully integrated into high-stakes decision-making, it must be fundamentally trustworthy. A key pillar of this trust is a model's ability to signal its own limitations; however, current uncertainty quantification (UQ) methods, such as entropy or predictive variance [3], largely function as "black boxes." While they may correctly flag that a model is hesitant, they offer no insight into why. This lack of transparency forces users to rely on a cryptic numerical score without understanding the semantic root of the model's doubt.
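To make this limitation concrete, the minimal sketch below computes predictive entropy, a standard numerical UQ score. The two inputs are hypothetical and deliberately constructed so that identical scores mask entirely different sources of confusion:

```python
import numpy as np

def predictive_entropy(probs: np.ndarray) -> np.ndarray:
    """Shannon entropy of a softmax distribution, in nats.

    probs: array of shape (batch, num_classes), rows summing to 1.
    """
    eps = 1e-12  # numerical guard against log(0)
    return -np.sum(probs * np.log(probs + eps), axis=-1)

# Two hypothetical predictions with identical entropy but very
# different semantic reasons for the model's hesitation.
p = np.array([
    [0.50, 0.50, 0.00],  # confusion between classes 0 and 1
    [0.50, 0.00, 0.50],  # confusion between classes 0 and 2
])
print(predictive_entropy(p))  # both ~0.693 nats; the score alone cannot distinguish them
```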
This research proposes a paradigm shift from numerical to Semantic Uncertainty. By leveraging Concept Activation Vectors (CAVs) [1,2], we can deconstruct a model's high-dimensional activations into human-understandable concept scores. These scores then serve as inputs to a transparent surrogate Generalized Additive Model (GAM) that reflects the original model's internal concept importances. Within this framework, uncertainty becomes a visible "Concept Competition": the model explains its hesitation by showing how an input simultaneously activates concepts belonging to conflicting classes. This aligns the model's uncertainty with human reasoning and moves us closer to truly reliable, interpretable AI.
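A minimal end-to-end sketch of this pipeline follows, with several assumptions: the activations are synthetic, the concept names ("stripes", "fur", "wheels") and helper functions (`learn_cav`, `concept_scores`) are hypothetical, the CAV step follows the standard linear-probe recipe, and a plain logistic model stands in for the GAM (a library such as pygam could supply per-concept shape functions instead):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# --- Step 1: learn a CAV per concept via a linear probe. ---
# Assumption: `concept_acts` holds hidden-layer activations for positive
# examples of one concept, `random_acts` for random counterexamples.
def learn_cav(concept_acts, random_acts):
    X = np.vstack([concept_acts, random_acts])
    y = np.concatenate([np.ones(len(concept_acts)), np.zeros(len(random_acts))])
    probe = LogisticRegression(max_iter=1000).fit(X, y)
    v = probe.coef_[0]
    return v / np.linalg.norm(v)  # unit normal to the separating hyperplane

d = 64  # hypothetical activation dimensionality
concepts = {
    name: learn_cav(rng.normal(loc=mu, size=(50, d)), rng.normal(size=(50, d)))
    for name, mu in [("stripes", 0.8), ("fur", 0.5), ("wheels", -0.7)]
}

# --- Step 2: project activations onto CAVs to get concept scores. ---
def concept_scores(acts):
    return np.stack([acts @ v for v in concepts.values()], axis=-1)

# --- Step 3: fit a transparent additive surrogate on concept scores. ---
# A linear logistic model is the simplest additive model; a GAM would
# additionally learn a nonlinear shape function per concept.
acts = rng.normal(size=(200, d))
labels = (acts @ list(concepts.values())[0] > 0).astype(int)  # toy task
surrogate = LogisticRegression().fit(concept_scores(acts), labels)

# --- Step 4: read uncertainty as Concept Competition. ---
x = rng.normal(size=(1, d))
contrib = surrogate.coef_[0] * concept_scores(x)[0]  # per-concept contribution
for name, c in zip(concepts, contrib):
    print(f"{name:>8}: {c:+.3f}")
```

In this reading, opposing per-concept contributions of similar magnitude are the signature of Concept Competition: rather than a single opaque score, the surrogate exposes which concepts are pulling the prediction toward conflicting classes.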
Possible Research Paths:
Literature
[1] https://arxiv.org/abs/2211.10154
[2] https://arxiv.org/abs/2306.07304
[3] https://link.springer.com/article/10.1007/s10462-023-10562-9
[4] https://ojs.aaai.org/aimagazine/index.php/aimagazine/article/view/393