AIBullish · arXiv CS AI · 14h ago · 6/10
Low-rank Optimization Trajectories Modeling for LLM RLVR Acceleration
Researchers propose NExt, a nonlinear extrapolation framework that accelerates reinforcement learning with verifiable rewards (RLVR) for large language models by modeling the low-rank trajectories that parameters trace during training. The method cuts computational overhead by roughly 37.5% while remaining compatible with a range of RLVR algorithms, addressing a key bottleneck in scaling LLM training.
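The paper's exact formulation isn't given here, but the general idea it names (fit a low-rank basis to a sequence of parameter checkpoints, then nonlinearly extrapolate the trajectory forward to skip ahead) can be illustrated with a minimal NumPy sketch. Everything below is a hypothetical construction, not the authors' implementation: the function name, the SVD-based basis, and the quadratic extrapolation are all assumptions.

```python
import numpy as np

def extrapolate_low_rank(snapshots, rank=2, horizon=1):
    """Hypothetical sketch of low-rank trajectory extrapolation.

    snapshots: list of parameter arrays (same shape), one per checkpoint.
    Projects the checkpoints onto a low-rank basis, fits a quadratic
    (nonlinear) curve to each coordinate over time, and evaluates it
    `horizon` steps past the last checkpoint.
    """
    # Stack T flattened snapshots into a (T, D) matrix and center it.
    W = np.stack([s.ravel() for s in snapshots])
    mean = W.mean(axis=0)
    U, S, Vt = np.linalg.svd(W - mean, full_matrices=False)
    coeffs = (U * S)[:, :rank]   # (T, rank) trajectory coordinates
    basis = Vt[:rank]            # (rank, D) low-rank directions
    t = np.arange(len(snapshots))
    t_future = len(snapshots) - 1 + horizon
    # Nonlinear (degree-2 polynomial) extrapolation of each coordinate.
    future = np.array([
        np.polyval(np.polyfit(t, coeffs[:, r], deg=2), t_future)
        for r in range(rank)
    ])
    return (mean + future @ basis).reshape(snapshots[0].shape)
```

On a synthetic rank-one trajectory whose coordinate grows quadratically, this sketch recovers the next checkpoint exactly; in a real RLVR loop the predicted parameters would stand in for one or more optimizer steps, which is where the compute savings would come from.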