AI · Neutral · arXiv – CS AI · 9h ago · 6/10
Towards Generalization of Block Attention via Automatic Segmentation and Block Distillation
Researchers introduce SemanticSeg, a large semantic segmentation dataset, and a block distillation framework to improve block attention in long-context language models. The approach trains block-attention students against a frozen full-attention teacher, which the authors present as a more efficient training route, and it targets key challenges in KV cache reuse for applications such as retrieval-augmented generation (RAG).
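To make the two ideas in the summary concrete, here is a minimal sketch in plain Python, not the paper's implementation. It rests on assumptions the summary does not confirm: that block attention lets each token attend only within its own block while the final block (for example, the user query) attends to everything before it, and that distillation minimizes a KL divergence between the teacher's and the student's next-token distributions. The function names `block_attention_mask` and `kl_distillation_loss` are hypothetical.

```python
import math


def block_attention_mask(block_lengths):
    """Return a boolean matrix; mask[i][j] is True when token i may attend to token j.

    Assumption: every block is encoded independently, except the last block,
    which may attend to all earlier tokens. Attention is always causal.
    """
    n = sum(block_lengths)
    # Map each token position to the index of the block that contains it.
    block_of = []
    for b, length in enumerate(block_lengths):
        block_of.extend([b] * length)
    last = len(block_lengths) - 1
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):  # causal: never look ahead
            same_block = block_of[i] == block_of[j]
            in_last_block = block_of[i] == last
            mask[i][j] = same_block or in_last_block
    return mask


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_distillation_loss(teacher_logits, student_logits):
    """KL(teacher || student) for a single next-token distribution.

    Assumption: the teacher is frozen, so only the student's logits would
    receive gradients in a real training loop. Extreme logits that underflow
    a student probability to zero are not handled in this sketch.
    """
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
```

Because tokens in a non-final block never look outside that block, the keys and values computed for a block do not depend on what precedes it, which is what would let a RAG system cache them once per retrieved passage and reuse them across queries. Under the assumptions above, the distillation loss is zero when the student reproduces the teacher's distribution exactly.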