AIBullisharXiv – CS AI · 7h ago6/10
🧠
Architecture-Aware Reinforcement Learning Makes Sliding-Window Attention Competitive in Math Reasoning
Researchers present SWARR, a two-stage method combining supervised fine-tuning and reinforcement learning to make sliding-window attention (SWA) competitive with standard self-attention for mathematical reasoning tasks. By using RL to adapt model trajectories to SWA's architectural constraints, the approach recovers much of the accuracy lost during conversion while maintaining linear-complexity efficiency benefits.