🧠 AI · 🟢 Bullish · Importance 6/10

OmniCustom: Sync Audio-Video Customization Via Joint Audio-Video Generation Model

arXiv – CS AI | Maomao Li, Zhen Li, Kaipeng Zhang, Guosheng Yin, Zhifeng Li, Dong Xu
🤖 AI Summary

Researchers introduce OmniCustom, a new AI framework that simultaneously customizes both video identity and audio timbre in generated content. The system uses reference images and audio samples to create synchronized audio-video content while allowing users to specify spoken content through text prompts.

Key Takeaways
  • OmniCustom enables synchronized audio-video customization, generating videos that maintain reference image identity while imitating reference audio timbre.
  • The framework uses separate LoRA modules for identity and audio timbre control within a DiT-based architecture.
  • A contrastive learning objective enhances the model's ability to preserve both visual identity and audio characteristics.
  • The system operates in a zero-shot manner, requiring no additional training for new reference inputs.
  • The model was trained on a large-scale, high-quality audio-visual human dataset to achieve superior performance.
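
To make the second and third takeaways concrete, here is a minimal, dependency-free sketch of the two ideas: separate low-rank (LoRA) adapters attached to one frozen layer, and a contrastive loss that pulls an embedding toward a matching reference. It is an illustration under stated assumptions, not the authors' implementation. The names `LoRAAdapter`, `AdaptedLinear`, `info_nce`, and the adapter keys "identity" and "timbre" are hypothetical, and InfoNCE is used as a generic stand-in because the summary does not specify the paper's exact objective.

```python
import math
import random


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class LoRAAdapter:
    """Low-rank update (alpha / rank) * B @ A. B starts at zero, so the
    adapter initially leaves the base layer's output unchanged."""

    def __init__(self, d_in, d_out, rank, alpha, rng):
        self.scale = alpha / rank
        self.A = [[rng.gauss(0.0, 0.02) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def delta(self, x):
        return [self.scale * v for v in matvec(self.B, matvec(self.A, x))]


class AdaptedLinear:
    """A frozen linear layer with independently switchable adapters.
    The adapter names are assumptions made for this sketch."""

    def __init__(self, weight, adapters):
        self.weight = weight      # frozen base weights
        self.adapters = adapters  # e.g. {"identity": ..., "timbre": ...}

    def forward(self, x, active=()):
        y = matvec(self.weight, x)
        for name in active:
            y = [a + b for a, b in zip(y, self.adapters[name].delta(x))]
        return y


def cosine(u, v):
    """Cosine similarity with a small epsilon to avoid division by zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / (norm + 1e-12)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """Generic InfoNCE loss: low when the anchor is closer to the positive
    than to any negative. A stand-in, not the paper's stated objective."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    peak = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_denom = peak + math.log(sum(math.exp(l - peak) for l in logits))
    return log_denom - logits[0]


rng = random.Random(0)
layer = AdaptedLinear(
    weight=[[1.0, 0.0], [0.0, 1.0]],
    adapters={
        "identity": LoRAAdapter(2, 2, rank=1, alpha=1.0, rng=rng),
        "timbre": LoRAAdapter(2, 2, rank=1, alpha=1.0, rng=rng),
    },
)
base_out = layer.forward([1.0, 2.0])
both_out = layer.forward([1.0, 2.0], active=("identity", "timbre"))
```

Keeping the adapters separate means each control signal can be trained or toggled without touching the other or the base weights. In this toy setup a contrastive term would compare, for example, an embedding of the generated face against the reference image's embedding (positive) and other identities (negatives).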