AI · Neutral · arXiv – CS AI · 14h ago · 6/10
Reinforcement Learning with Robust Rubric Rewards
Researchers introduce RLR³, a reinforcement learning framework that extends reward verification from the task level to the criterion level, enabling multi-criteria supervision for vision-language tasks. The approach uses hybrid verification paths that pair LLM extractors with deterministic verifiers or LLM judges, and the authors report a 4.7-point improvement over baseline models across 15 benchmarks.
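The summary gives no implementation details. What follows is a minimal Python sketch of how criterion-level rubric rewards could work. It assumes that each criterion has its own extractor, and that the extractor's output is routed to either a deterministic verifier or an LLM judge. It also assumes that per-criterion scores are combined as a weighted mean. All names (`Criterion`, `rubric_reward`, `extract_answer`, `extract_reasoning`, `keyword_judge`) are hypothetical. The regex and keyword stubs only stand in for LLM calls; they are not the paper's components.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical types: extractors and judges would be LLM calls in a real system.
Extractor = Callable[[str], Optional[str]]  # response -> evidence for one criterion
Verifier = Callable[[str], bool]            # deterministic, rule-based check on the evidence
Judge = Callable[[str, str], float]         # (criterion, evidence) -> score in [0, 1]


@dataclass
class Criterion:
    description: str
    weight: float
    extractor: Extractor
    verifier: Optional[Verifier] = None  # used when the criterion is checkable by rule
    judge: Optional[Judge] = None        # used otherwise

    def score(self, response: str) -> float:
        evidence = self.extractor(response)
        if evidence is None:  # nothing extracted: treat the criterion as unmet
            return 0.0
        if self.verifier is not None:
            return 1.0 if self.verifier(evidence) else 0.0
        if self.judge is not None:
            # Clamp so a misbehaving judge cannot push the reward out of range
            return min(max(self.judge(self.description, evidence), 0.0), 1.0)
        raise ValueError(f"no verification path for {self.description!r}")


def rubric_reward(response: str, rubric: list[Criterion]) -> float:
    """Weighted mean of per-criterion scores: a scalar reward in [0, 1]."""
    total = sum(c.weight for c in rubric)
    if total <= 0:
        raise ValueError("rubric weights must sum to a positive value")
    return sum(c.weight * c.score(response) for c in rubric) / total


# --- Illustrative stubs standing in for LLM extractors and an LLM judge ---
def extract_answer(response: str) -> Optional[str]:
    m = re.search(r"Answer:\s*(\S+)", response)
    return m.group(1) if m else None


def extract_reasoning(response: str) -> Optional[str]:
    head = response.split("Answer:")[0].strip()
    return head or None


def keyword_judge(criterion: str, evidence: str) -> float:
    return 1.0 if "because" in evidence.lower() else 0.0


rubric = [
    Criterion("final answer is 12", 2.0, extract_answer, verifier=lambda a: a == "12"),
    Criterion("reasoning is justified", 1.0, extract_reasoning, judge=keyword_judge),
]

full = "3 rows of 4 apples give 12 because 3*4=12. Answer: 12"
bare = "Answer: 12"
print(rubric_reward(full, rubric))  # both criteria met
print(rubric_reward(bare, rubric))  # correct answer, no reasoning
```

A task-level check on the final answer alone would score `full` and `bare` identically. The rubric reward separates them: 1.0 versus 2/3, because `bare` fails the reasoning criterion.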