Universität Bielefeld

[BA/MA/Project]

Investigate Hallucinations of LLMs using xAI

Contact: Alexander Schulz

Hallucinations in large language models (LLMs) pose a major challenge in many practical applications. Promising recent work [1,2] presents data sets and detection approaches based on the LLM's hidden-state representations; at the moment, however, these detectors do not perform particularly well.
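
To make the hidden-state detection idea concrete, here is a minimal sketch: extract the hidden state of the last token of each generated answer and train a linear probe to separate hallucinated from faithful answers. The model (gpt2 as a small stand-in), the layer choice, and the toy texts/labels are illustrative assumptions, not taken from [1] or [2].

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

MODEL_NAME = "gpt2"  # stand-in; any open causal LM with accessible hidden states
LAYER = -1           # which hidden layer to probe (last layer here, purely illustrative)

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, output_hidden_states=True)
model.eval()

def hidden_state(text: str) -> torch.Tensor:
    """Hidden state of the final token at the chosen layer."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs)
    return out.hidden_states[LAYER][0, -1]  # shape: (hidden_dim,)

# Toy placeholder data; in a thesis these would come from a data set such as [1] or [2],
# with labels 1 = hallucinated and 0 = faithful.
texts = ["The Eiffel Tower is in Paris.", "The Eiffel Tower is in Berlin."] * 10
labels = [0, 1] * 10

X = torch.stack([hidden_state(t) for t in texts]).numpy()
X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.3, random_state=0)

probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("held-out probe accuracy:", probe.score(X_te, y_te))
```

Which layer (and which token position) carries the most useful signal is itself an open question, cf. [3].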

A goal of a thesis/project in this context would be, depending on the credit points (LPs) of the work, to investigate normalization schemes inspired by [4] and/or to apply explainable AI (xAI) methods in order to analyze the detection performance of these approaches. Examples of the latter are concept-based methods [5,6] and DeepView [7].
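
As an illustration only (a hedged sketch, not the method of [4]): one simple normalization scheme would centre the hidden states of each data source (e.g. each topic or RAG corpus) at its own mean before fitting the probe, so the detector cannot rely on source-specific offsets. The arrays X, y, and sources below are synthetic placeholders; in practice X would be the hidden states extracted above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def center_per_group(X: np.ndarray, groups: np.ndarray) -> np.ndarray:
    """Subtract the per-group mean from each hidden-state vector."""
    Xc = X.astype(float).copy()
    for g in np.unique(groups):
        mask = groups == g
        Xc[mask] -= Xc[mask].mean(axis=0)
    return Xc

# Synthetic placeholders: 200 samples, 16-dim "hidden states", 3 data sources.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))
y = rng.integers(0, 2, size=200)
sources = rng.integers(0, 3, size=200)

Xc = center_per_group(X, sources)
probe = LogisticRegression(max_iter=1000).fit(Xc, y)
```

Whether such a normalization actually improves cross-domain generalization of the probe would be part of the empirical investigation.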

Literature

  1. Ridder, Fabian, and Malte Schilling. “The HalluRAG Dataset: Detecting Closed-Domain Hallucinations in RAG Applications Using an LLM’s Internal States.” https://arxiv.org/abs/2412.17056
  2. Ravichander, Abhilasha, et al. “HALoGEN: Fantastic LLM Hallucinations and Where to Find Them.” https://arxiv.org/abs/2501.08292
  3. Skean, Oscar, et al. “Layer by Layer: Uncovering Hidden Representations in Language Models.” https://arxiv.org/abs/2502.02013
  4. Bürger, Lennart, Fred A. Hamprecht, and Boaz Nadler. “Truth is Universal: Robust Detection of Lies in LLMs.” https://proceedings.neurips.cc/paper_files/paper/2024/hash/f9f54762cbb4fe4dbffdd4f792c31221-Abstract-Conference.html
  5. Fel, Thomas, et al. “CRAFT: Concept Recursive Activation Factorization for Explainability.” https://openaccess.thecvf.com/content/CVPR2023/papers/Fel_CRAFT_Concept_Recursive_Activation_FacTorization_for_Explainability_CVPR_2023_paper.pdf
  6. Parekh, Jayneel, et al. “A Concept-Based Explainability Framework for Large Multimodal Models.” https://proceedings.neurips.cc/paper_files/paper/2024/hash/f4fba41b554f9aaa013c4062a1c40518-Abstract-Conference.html
  7. Schulz, Alexander, Fabian Hinder, and Barbara Hammer. “DeepView: Visualizing Classification Boundaries of Deep Neural Networks as Scatter Plots Using Discriminative Dimensionality Reduction.” https://www.ijcai.org/Proceedings/2020/319