AI · Bullish · arXiv – CS AI · 7h ago · 7/10
Verifying Meta-Awareness via Predictive Rewards in Reasoning Models
Researchers introduce MAPR, a meta-awareness framework that enhances reasoning models by having them predict task statistics (length, pass rate, concepts) rather than relying solely on answer verification. The method reportedly achieves an 83.18% accuracy gain on AIME25 and a 13.04% average improvement across mathematics benchmarks, while speeding up training by 1.28x.