From Passive Observer to Active Critic: Reinforcement Learning Elicits Process Reasoning for Robotic Manipulation
Researchers introduce PRIMO R1, a 7B parameter AI framework that transforms video MLLMs from passive observers into active critics for robotic manipulation tasks. The system uses reinforcement learning to achieve 50% better accuracy than specialized baselines and outperforms 72B-scale models, establishing state-of-the-art performance on the RoboFail benchmark.