Beyond Task-Agnostic: Task-Aware Grouping for Communication-Efficient Multi-Task MoE Inference
Researchers propose Task-Aware Coactivation Grouping (TACG), a framework for optimizing Mixture-of-Experts (MoE) model inference across distributed GPUs by grouping experts based on task-specific activation patterns rather than global averages. The approach reduces communication costs by 31.39% while maintaining load balance, addressing a critical efficiency bottleneck in multi-task AI serving.