AI · Bullish · arXiv – CS AI · 14h ago · 7/10
AnyMo: Scaling Any-Modality Conditional Motion Generation with Masked Modeling
Researchers introduce AnyMo, a unified framework for conditional human motion generation that supports arbitrary combinations of modalities (text, speech, music, trajectory). The work is enabled by OmniHuMo, a large-scale dataset of more than 5,000 hours of motion with precisely aligned multimodal annotations, which addresses the scarcity of training data that has been a critical bottleneck in multimodal synthesis.