AI · Bearish · arXiv – CS AI · 15h ago · 7/10
Detecting Is Not Resolving: The Monitoring Control Gap in Retrieval Augmented LLMs
Researchers found that retrieval-augmented language models exhibit a critical safety gap: they can detect contradictory information in accumulated evidence but fail to act on that awareness in their final recommendations. Testing across model families showed that single-turn safety evaluations significantly overestimate robustness in multi-turn scenarios where evidence accumulates.