
NCCL EP: Towards a Unified Expert Parallel Communication API for NCCL

arXiv – CS AI | Amos Goldman, Nimrod Boker, Maayan Sheraizin, Nimrod Admoni, Artem Polyakov, Subhadeep Bhattacharya, Fan Yu, Kai Sun, Georgios Theodorakis, Hsin-Chun Yin, Peter-Jan Gootzen, Aamir Shafi, Assaf Ravid, Salvatore Di Girolamo, Manjunath Gorentla Venkata, Gil Bloch (all NVIDIA Corporation)
AI Summary

Researchers have developed NCCL EP, a new communication library for Mixture-of-Experts (MoE) AI model architectures that improves GPU-initiated communication performance. The library provides unified APIs supporting both low-latency inference and high-throughput training modes, built entirely on NVIDIA's NCCL Device API.

Key Takeaways
  • NCCL EP introduces unified ncclEpDispatch and ncclEpCombine primitives for MoE communication with C and Python interfaces.
  • The library supports Low-Latency mode for small batch inference (1-128 tokens) and High-Throughput mode for large batch training (4096+ tokens).
  • Performance evaluation on H100-based clusters shows competitive results, including integration with vLLM, across multi-node configurations.
  • The solution leverages NVIDIA's topology awareness and optimized GPU-initiated implementation for both intra- and inter-node communications.
  • NCCL EP provides a supported pathway for expert parallelism on current and future NVIDIA platforms.
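To make the dispatch/combine terminology concrete, the sketch below illustrates the semantics these primitives implement in an MoE layer: tokens are routed to per-expert buffers (dispatch), processed, and returned to their original slots (combine). This is a plain-Python illustration of the communication pattern only; the function names and signatures here are hypothetical, and the real ncclEpDispatch/ncclEpCombine are GPU-initiated collectives built on the NCCL Device API, not shown here.

```python
# Illustrative sketch of MoE expert-parallel dispatch/combine semantics.
# Names and signatures are hypothetical, not the NCCL EP API.

def dispatch(tokens, expert_ids, num_experts):
    """Route each token to its assigned expert's buffer (the 'dispatch' phase)."""
    buffers = [[] for _ in range(num_experts)]
    origins = [[] for _ in range(num_experts)]  # source slots, needed for combine
    for slot, (tok, eid) in enumerate(zip(tokens, expert_ids)):
        buffers[eid].append(tok)
        origins[eid].append(slot)
    return buffers, origins

def combine(expert_outputs, origins, num_tokens):
    """Return expert results to each token's original slot (the 'combine' phase)."""
    out = [None] * num_tokens
    for outs, slots in zip(expert_outputs, origins):
        for val, slot in zip(outs, slots):
            out[slot] = val
    return out

# Example: 4 tokens routed across 2 experts; each "expert" doubles its inputs.
tokens = [1.0, 2.0, 3.0, 4.0]
expert_ids = [0, 1, 0, 1]
buffers, origins = dispatch(tokens, expert_ids, num_experts=2)
expert_outputs = [[2 * t for t in buf] for buf in buffers]
result = combine(expert_outputs, origins, num_tokens=len(tokens))
# result == [2.0, 4.0, 6.0, 8.0]
```

In the distributed setting, each expert buffer lives on a different GPU, so dispatch and combine become all-to-all style exchanges; the library's two modes tune this exchange for small inference batches or large training batches.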
Read the original via arXiv – CS AI.