AIBullish · arXiv — CS AI · 10h ago · 7/10
🧠
SWAA: Sliding Window Attention Adaptation for Efficient and Quality Preserving Long Context Processing
Researchers propose SWAA (Sliding Window Attention Adaptation), a toolkit that enables efficient long-context processing in large language models by adapting full-attention models to sliding window attention without expensive retraining. Through four core strategies that address the training-inference mismatch, the approach achieves 30-100% speedups on long-context inference while keeping output quality at acceptable levels.
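The summary doesn't detail the four adaptation strategies, but the primitive they target, sliding window attention, is easy to illustrate. Below is a minimal mask-based sketch (not the authors' implementation; the function name, tensor shapes, and window size are assumptions): each query attends only to the most recent `window` keys instead of the full causal prefix.

```python
import torch
import torch.nn.functional as F

def sliding_window_attention(q, k, v, window: int):
    """Causal attention restricted to the last `window` keys per query.

    q, k, v: (batch, heads, seq_len, head_dim).
    Illustrative only: a naive masked implementation, not SWAA itself.
    """
    seq_len = q.size(-2)
    scale = q.size(-1) ** -0.5
    scores = torch.matmul(q, k.transpose(-2, -1)) * scale  # (b, h, T, T)

    # Causal + sliding-window mask: query i attends to keys j with
    # i - window < j <= i. Everything else is masked out.
    idx = torch.arange(seq_len)
    dist = idx.view(-1, 1) - idx.view(1, -1)   # dist[i, j] = i - j
    mask = (dist >= 0) & (dist < window)       # (T, T) boolean band
    scores = scores.masked_fill(~mask, float("-inf"))

    return torch.matmul(F.softmax(scores, dim=-1), v)

# Smoke test: window >= T reduces to full causal attention.
b, h, T, d = 1, 2, 16, 8
q, k, v = (torch.randn(b, h, T, d) for _ in range(3))
out_full = sliding_window_attention(q, k, v, window=T)
out_swa = sliding_window_attention(q, k, v, window=4)
print(out_full.shape, out_swa.shape)  # torch.Size([1, 2, 16, 8]) twice
```

Note that this naive version still materializes the full T×T score matrix; the practical speedups come from banded or fused kernels that skip masked blocks entirely, so compute and KV-cache cost grow with the window size rather than the full sequence length. The adaptation challenge the paper addresses is that a model pretrained with full attention was never trained to work under such a band, hence the strategies to close that training-inference gap.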