Kaleido: Open-Sourced Multi-Subject Reference Video Generation Model

arXiv – CS AI | Zhenxing Zhang, Jiayan Teng, Zhuoyi Yang, Tiankun Cao, Cheng Wang, Xiaotao Gu, Jie Tang, Dan Guo, Meng Wang
🤖AI Summary

Researchers have introduced Kaleido, an open-source subject-to-video generation model that produces videos consistent with multiple reference images of the target subjects. The framework addresses key limitations of earlier subject-to-video methods through an improved data construction pipeline and a new Reference Rotary Positional Encoding (R-RoPE) technique.

Key Takeaways
  • Kaleido is a new open-source subject-to-video generation framework that maintains consistency across multiple subjects in generated videos.
  • The model introduces Reference Rotary Positional Encoding (R-RoPE) to better integrate multiple reference images without subject confusion.
  • A dedicated data construction pipeline was developed to filter low-quality samples and synthesize diverse training data.
  • The authors report that, across extensive benchmarks, Kaleido significantly outperforms previous methods in consistency, fidelity, and generalization.
  • The research addresses shortcomings of existing subject-to-video (S2V) models, including poor background disentanglement and inconsistency across multiple subjects.
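
The summary does not say how R-RoPE assigns positions, so the following is only a generic sketch of the underlying idea, not Kaleido's implementation. It applies standard 1D rotary positional encoding and, as an assumption made purely for illustration, gives each reference image its own block of position indices that does not overlap the video tokens or the other references. The names `rope_rotate` and `assign_positions` and the `gap` parameter are hypothetical.

```python
import math

def rope_rotate(vec, pos, base=10000.0):
    """Apply standard 1D rotary positional encoding to an even-length vector."""
    d = len(vec)
    assert d % 2 == 0, "RoPE rotates pairs of dimensions"
    out = []
    for i in range(0, d, 2):
        # Each dimension pair rotates by an angle proportional to the position.
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out

def assign_positions(num_video_tokens, ref_token_counts, gap=1):
    """Hypothetical index scheme (an assumption, not taken from the paper):
    video tokens take 0..N-1, and each reference image gets a disjoint
    block of indices placed after the video range."""
    video_pos = list(range(num_video_tokens))
    ref_pos = []
    start = num_video_tokens + gap
    for count in ref_token_counts:
        ref_pos.append(list(range(start, start + count)))
        start += count + gap
    return video_pos, ref_pos

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

if __name__ == "__main__":
    video_pos, ref_pos = assign_positions(4, [2, 3])
    print("video positions:", video_pos)
    print("reference positions:", ref_pos)

    q = [0.5, -1.0, 2.0, 0.25]
    k = [1.5, 0.5, -0.75, 1.0]
    # Attention scores under RoPE depend only on the position difference.
    print(dot(rope_rotate(q, 5), rope_rotate(k, 3)))
    print(dot(rope_rotate(q, 12), rope_rotate(k, 10)))
```

Because every token gets a unique index, no reference token shares a rotary phase with a video token or with a token from another reference image, which is one plausible way to keep subjects from being confused. The two printed dot products agree up to floating-point error, showing the relative-position property that rotary encodings rely on.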