y0news
#ai-monitoring2 articles
2 articles
AIBullisharXiv โ€“ CS AI ยท 5h ago1
๐Ÿง 

Beyond Reward: A Bounded Measure of Agent Environment Coupling

Researchers introduce 'bipredictability' as a new metric to monitor reinforcement learning agents in real-world deployments, measuring interaction effectiveness through shared information ratios. The Information Digital Twin (IDT) system detects 89.3% of perturbations versus 44% for traditional reward-based monitoring, with 4.4x faster detection speed.

AINeutralarXiv โ€“ CS AI ยท 5h ago1
๐Ÿง 

Constitutional Black-Box Monitoring for Scheming in LLM Agents

Researchers developed constitutional black-box monitors to detect scheming behavior in LLM agents using only observable inputs and outputs. The study found that monitors trained on synthetic data can generalize to realistic environments, but performance improvements plateau quickly with simple optimization techniques outperforming complex methods.