AI · Neutral · arXiv – CS AI · 14h ago · 7/10
MiraBench: Evaluating Action-Conditioned Reliability in Robotic World Models
MiraBench is a new evaluation framework for robotic world models that prioritizes action-conditioned reliability over visual fidelity. The benchmark shows that current models struggle to follow commanded actions faithfully and exhibit a persistent optimism bias when predicting the outcomes of failure-inducing actions.