AI · Bullish · arXiv – CS AI · Mar 27 · 7/10
SWAA: Sliding Window Attention Adaptation for Efficient and Quality Preserving Long Context Processing
Researchers propose SWAA (Sliding Window Attention Adaptation), a toolkit that adapts full-attention large language models to sliding window attention without expensive retraining, enabling efficient long-context processing. It achieves 30-100% speedups for long-context inference while maintaining acceptable quality, using four core strategies that address the mismatch between full-attention training and sliding-window inference.
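
To make the efficiency argument concrete, here is a minimal sketch of a generic causal sliding-window attention mask in plain Python. It is not taken from the SWAA toolkit; the function names `sliding_window_mask` and `attended_pairs` and the toy sizes are assumptions chosen for illustration only.

```python
def sliding_window_mask(seq_len, window):
    """Build a causal sliding-window mask (illustrative helper, not SWAA's API).

    mask[i][j] is True when query position i may attend to key position j,
    i.e. j is not in the future and lies within the last `window` positions.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    return [
        [(j <= i) and (i - j < window) for j in range(seq_len)]
        for i in range(seq_len)
    ]


def attended_pairs(mask):
    """Count the (query, key) pairs the mask allows."""
    return sum(sum(row) for row in mask)


if __name__ == "__main__":
    seq_len, window = 6, 3
    swa = sliding_window_mask(seq_len, window)
    # A window as long as the sequence reduces to ordinary causal attention.
    full = sliding_window_mask(seq_len, seq_len)
    for row in swa:
        print("".join("x" if allowed else "." for allowed in row))
    print("sliding window pairs:", attended_pairs(swa))
    print("full causal pairs:", attended_pairs(full))
```

With full causal attention the number of allowed pairs grows quadratically with sequence length, while with a fixed window it grows roughly linearly, which is where the long-context savings come from. A model trained only with the full mask has never seen the restricted one, which is the kind of training-inference mismatch the summary refers to.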