AI · Neutral · arXiv – CS AI · 10h ago · 6/10
How You Begin is How You Reason: Driving Exploration in RLVR via Prefix-Tuned Priors
Researchers propose IMAX, a framework that uses trainable prefix tuning to improve exploration in reinforcement learning with verifiable rewards (RLVR) for language model reasoning. The approach addresses entropy collapse by creating diverse reasoning trajectories, achieving Pass@4 accuracy gains of up to 11.60% across multiple model scales.
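For readers unfamiliar with the metric, Pass@4 is the probability that at least one of four sampled answers to a problem is verified correct. The summary does not say how the paper computes it, so the following is only a minimal sketch of the commonly used unbiased pass@k estimator; the function name `pass_at_k` and its parameters are illustrative assumptions, not taken from the paper.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem (illustrative helper, not from the paper).

    n: total number of sampled answers
    c: number of those answers verified correct
    k: attempt budget (k=4 for Pass@4)
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # With fewer than k incorrect samples, every size-k subset
    # must contain at least one correct answer.
    if n - c < k:
        return 1.0
    # 1 minus the probability that a random size-k subset is all incorrect.
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # Example: 8 samples for one problem, 2 of them correct, budget of 4.
    print(pass_at_k(n=8, c=2, k=4))
```

A benchmark-level Pass@4 score would then be the mean of this per-problem estimate over all problems. Because the metric rewards any single success among four attempts, a policy that keeps its sampled reasoning paths diverse tends to score better than one whose samples have collapsed onto a single path.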