AI · Bearish · arXiv – CS AI · 6h ago · 7/10
CIAware-Bench: Benchmarking Control Intervention Awareness Across Frontier LLMs
Researchers introduce CIAware-Bench, a benchmark that measures whether frontier LLMs can detect when their outputs are being monitored and modified by AI control systems. Testing eleven models across multiple domains, the study reports low-to-moderate detection rates (accuracy up to 0.87) and finds that intervention awareness varies significantly by task and model pair, a result with implications for the robustness of AI safety protocols.