AI · Bearish · arXiv – CS AI · 10h ago · 7/10
Attacking the Trusted Imagination: Oracle-Level Integrity Attacks on Imagine-then-Act World Models
Researchers demonstrate a novel attack on vision-language-action (VLA) systems that targets the 'trusted imagination' component of world-action models rather than the reactive policy itself. By perturbing observations to corrupt the model's predicted latent trajectories, an attacker can mislead downstream consumers of those predictions, such as safety gates and MPC planners, while leaving the base policy's behavior unaffected. The result reveals a critical asymmetry in robustness: a system can look stable at the policy level while the predictions it trusts for safety and planning have been compromised.
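
The asymmetry described above can be illustrated with a deliberately simplified linear toy. This is a sketch under stated assumptions, not the paper's method: the policy is assumed to be a linear map `W_pi`, the world model's "imagined" risk score is assumed to be a linear readout `w_risk`, and the names `null_space_projector` and `craft_perturbation` are hypothetical. The perturbation is confined to the null space of the policy, so the action does not change, while the imagined risk score that a safety gate might rely on is pushed down.

```python
import numpy as np

def null_space_projector(W):
    """Return P = I - W^+ W, the orthogonal projector onto the null space of W."""
    return np.eye(W.shape[1]) - np.linalg.pinv(W) @ W

def craft_perturbation(W_pi, w_risk, eps):
    """Toy attack: lower the imagined risk score without moving the policy output.

    W_pi   : (k, d) linear policy (assumption; real VLA policies are nonlinear)
    w_risk : (d,) linear risk readout of the world model (assumption)
    eps    : L2 budget for the observation perturbation
    """
    P = null_space_projector(W_pi)
    direction = -P @ w_risk          # steepest risk decrease inside null(W_pi)
    norm = np.linalg.norm(direction)
    if norm < 1e-12:                 # risk readout is fully visible to the policy
        return np.zeros_like(w_risk)
    return eps * direction / norm

rng = np.random.default_rng(0)
d = 8
W_pi = rng.normal(size=(2, d))       # hypothetical policy weights
w_risk = rng.normal(size=d)          # hypothetical risk readout
obs = rng.normal(size=d)

delta = craft_perturbation(W_pi, w_risk, eps=0.5)

# How far the action moves versus how far the imagined risk moves.
action_shift = np.linalg.norm(W_pi @ (obs + delta) - W_pi @ obs)
risk_shift = w_risk @ (obs + delta) - w_risk @ obs
print(f"action shift: {action_shift:.2e}, imagined risk shift: {risk_shift:.3f}")
```

In this toy the action shift is zero up to floating-point error, while the imagined risk score moves by a nonzero amount. A defender who monitors only the policy output would see nothing unusual. Real world-action models are nonlinear, so an actual attack would need an iterative, gradient-based search rather than a closed-form projection.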