AI · Bullish · arXiv – CS AI · 6h ago · 7/10
CAMEL: Confidence-Gated Reflection for Reward Modeling
Researchers propose CAMEL, a reward modeling framework that makes an efficient single-token preference decision and applies reflection only to low-confidence cases. With 14B parameters, it reaches 82.9% accuracy on benchmarks, outperforming larger 70B models.
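
The gating idea in the summary can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the paper's implementation: the two-way softmax confidence, the 0.8 threshold, and the names `single_token_preference`, `gated_preference`, and `reflect` are all hypothetical, and the summary does not say how CAMEL measures confidence or performs reflection.

```python
import math
from typing import Callable, Tuple


def single_token_preference(logit_a: float, logit_b: float) -> Tuple[str, float]:
    """Pick "A" or "B" from two verdict-token logits.

    Confidence is the two-way softmax probability of the chosen label,
    so it always lies in [0.5, 1.0]. (Assumed measure, not from the source.)
    """
    p_a = 1.0 / (1.0 + math.exp(logit_b - logit_a))
    if p_a >= 0.5:
        return "A", p_a
    return "B", 1.0 - p_a


def gated_preference(
    logit_a: float,
    logit_b: float,
    reflect: Callable[[], str],
    threshold: float = 0.8,  # hypothetical cutoff, chosen for illustration
) -> Tuple[str, bool]:
    """Return (label, reflected).

    The cheap single-token verdict is kept when it is confident enough;
    otherwise the slower `reflect` callback decides.
    """
    label, confidence = single_token_preference(logit_a, logit_b)
    if confidence >= threshold:
        return label, False
    return reflect(), True


# Stand-in for a slower, reasoning-based judgment (hypothetical).
def always_b() -> str:
    return "B"


confident = gated_preference(3.0, 0.0, always_b)  # large logit gap
uncertain = gated_preference(0.2, 0.0, always_b)  # near-tie, falls through to reflect
```

In this sketch only the near-tie pays for the expensive path, which matches the efficiency argument the summary makes for selective reflection.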