AI · Bullish · arXiv CS AI · 7h ago · 7/10
Weakly Supervised Distillation of Hallucination Signals into Transformer Representations
Researchers developed a weak supervision framework for detecting hallucinations in large language models by distilling grounding signals into transformer representations during training. Using substring matching, sentence embeddings, and LLM judges as weak supervision signals, they built a 15,000-sample dataset and trained five probing classifiers that detect hallucinations from internal activations alone at inference time, removing the need for external verification systems.
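
The two ideas in the summary, weak labeling and probing internal activations, can be illustrated with a minimal sketch. It is not the researchers' implementation: the function names `weak_label`, `train_linear_probe`, and `probe_predict` are hypothetical, only substring matching is shown (sentence embeddings and LLM judges are omitted), the probe is assumed to be plain logistic regression, and random vectors stand in for real transformer hidden states.

```python
import numpy as np


def weak_label(answer, references):
    """Hypothetical weak labeler: 0 = grounded, 1 = likely hallucinated.

    Uses only case-insensitive substring matching against reference answers.
    """
    text = answer.lower()
    return 0 if any(ref.lower() in text for ref in references) else 1


def train_linear_probe(X, y, lr=0.1, steps=500):
    """Fit a logistic-regression probe on activation vectors X (n, d)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted P(hallucinated)
        grad = p - y                            # gradient of the log loss
        w -= lr * (X.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b


def probe_predict(X, w, b):
    """Return 1 where the probe scores an activation as hallucinated."""
    return (X @ w + b > 0).astype(int)


# Synthetic stand-in for hidden states (assumption: the two classes are
# shifted in opposite directions along one axis of activation space).
rng = np.random.default_rng(0)
dim = 16
direction = rng.normal(size=dim)
direction /= np.linalg.norm(direction)
y = rng.integers(0, 2, size=400)
X = rng.normal(size=(400, dim)) + 2.0 * np.outer(2 * y - 1, direction)

# Train on the first 300 samples, evaluate on the remaining 100.
w, b = train_linear_probe(X[:300], y[:300].astype(float))
accuracy = (probe_predict(X[300:], w, b) == y[300:]).mean()
print(f"held-out probe accuracy: {accuracy:.2f}")
```

At inference time only `probe_predict` runs, so no reference answers or judge model are needed; the weak labels are used during training alone.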