AI · Bullish · arXiv – CS AI · 18h ago · 7/10
FMplex: Model Virtualization for Serving Extensible Foundation Models
FMplex is a new model-serving system that enables multiple downstream tasks to share a single foundation model backbone through virtualization, reducing memory waste and computational costs. The system achieves up to 80% latency reduction compared to traditional spatial partitioning approaches while enabling clusters to host 6x more tasks simultaneously.
🏢 Meta