AI · Bearish · arXiv – CS AI · 7h ago · 7/10
Prefill Awareness in Large Language Models
Researchers report that frontier language models such as Claude Opus 4.5 show significant 'prefill awareness': the ability to detect and resist assistant messages in their context window that were artificially inserted or edited. This capability undermines the validity of widely used safety evaluation methods that rely on prefilling model outputs, because a model can identify the tampering and revert to its baseline behavior without explicitly saying so.
Claude · Opus