Concept-based explanation techniques [1] are a popular tool for increasing the transparency of deep networks. Many recent approaches learn overcomplete dictionaries of concepts, e.g. via Sparse Autoencoders (SAEs) [2] (and references therein). While these methods are designed to interpret individual data points through the concepts they activate, inspecting and analysing the concept dictionary as a whole is less straightforward. The focus of this work is to investigate hierarchies among these concepts. This can be addressed from different perspectives, including: (i) are there hierarchies among the existing concepts (e.g., are concepts 1 and 2 subsets of concept 3?); (ii) which concepts can be grouped together to form meaningful groups (hierarchical clustering)?
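Perspective (ii) could be prototyped directly on the learned dictionary. The sketch below is a minimal, hypothetical example (not from [1] or [2]): it treats each row of a synthetic matrix as one concept direction (standing in for an SAE decoder column) and applies agglomerative clustering on cosine distances; the two "groups" in the toy data are an assumption made purely for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)

# Hypothetical concept dictionary: 20 unit-norm concept vectors in a
# 64-dimensional activation space, generated as two noisy groups to
# stand in for a learned SAE decoder matrix.
base = rng.normal(size=(2, 64))
concepts = np.vstack([base[i % 2] + 0.1 * rng.normal(size=64) for i in range(20)])
concepts /= np.linalg.norm(concepts, axis=1, keepdims=True)

# Agglomerative (hierarchical) clustering on cosine distance between
# concept directions; the linkage matrix Z encodes the full merge tree.
dist = pdist(concepts, metric="cosine")
Z = linkage(dist, method="average")

# Cutting the tree at 2 clusters recovers the two synthetic groups.
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)
```

The linkage matrix Z can also be rendered as a dendrogram (scipy.cluster.hierarchy.dendrogram), which makes the grouping structure of the dictionary directly inspectable.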
Literature
[1] CRAFT: Concept Recursive Activation FacTorization for Explainability (https://arxiv.org/abs/2211.10154)
[2] Archetypal SAE (https://arxiv.org/abs/2502.12892)