AI · Neutral · arXiv (CS AI) · 6h ago
Constitutional Black-Box Monitoring for Scheming in LLM Agents
Researchers developed constitutional black-box monitors that detect scheming behavior in LLM agents using only observable inputs and outputs. The study found that monitors trained on synthetic data can generalize to realistic environments, but performance gains plateau quickly, and simple optimization techniques outperform more complex methods.