AI · Bullish · arXiv – CS AI · 3h ago · 7/10
How Far Can Disaggregation Go? A Design-Space Exploration of Attention-FFN Disaggregation for Efficient MoE LLM Serving
Researchers present a systematic design-space study of Attention-FFN Disaggregation (AFD), a technique that places the attention layers and the expert (FFN) layers of a Mixture-of-Experts language model on separate GPU groups to make inference serving more efficient. The study reports that AFD sustains 4k tokens/s of throughput on DeepSeek-V3.2 under strict latency constraints where traditional disaggregation approaches fail, and it distills design principles for scaling LLM serving infrastructure.