The Attribution Blind Spot: Detecting When Language Models Rely on Memory Rather Than Retrieved Context
Researchers identify a critical vulnerability in retrieval-augmented generation systems where language models produce faithful-looking outputs from memory rather than retrieved context, making it impossible to verify source attribution through output analysis alone. They propose Computational Reality Monitoring (CRM), a technique that detects internal representational differences to identify when models rely on pretraining data versus external evidence.