AI · Bullish · arXiv – CS AI · 2d ago · 7/10
🧠Researchers propose a render-free framework for 3D-aware video diffusion models that uses compressed mesh tokens instead of 2D rendered guidance to control human motion in generated videos. By processing 3D geometric information directly alongside video tokens, the approach demonstrates improved performance on motion control tasks while reducing artifacts associated with traditional 2D guidance methods.
AI · Bullish · arXiv – CS AI · May 27 · 7/10
🧠Researchers have developed a bias correction technique for quantizing KV-cache memory in video diffusion models, addressing a fundamental problem where quantization noise causes inflated attention to cached data. The method recovers near-full quality video generation while using 50% less memory than standard approaches, enabling longer video synthesis without sacrificing output quality.
AI · Bullish · arXiv – CS AI · May 9 · 7/10
🧠Researchers introduce EA-WM, an event-aware generative world model that bridges kinematic control and visual perception for robotic systems. By projecting robot actions directly into camera views as structured kinematic-to-visual action fields rather than abstract tokens, the model achieves state-of-the-art performance on the WorldArena benchmark, significantly advancing robot learning and simulation capabilities.
AI · Bullish · arXiv – CS AI · Mar 5 · 7/10
🧠Researchers have developed Phys4D, a pipeline that enhances video diffusion models with physics-consistent 4D world representations through a three-stage training process. The system addresses the tendency of AI-generated videos to exhibit physically implausible dynamics, using pseudo-supervised pretraining, physics-grounded fine-tuning, and reinforcement learning to improve spatiotemporal consistency.
AI · Bullish · arXiv – CS AI · Mar 4 · 6/10
🧠Researchers introduce Frame Guidance, a training-free method for controllable video generation using diffusion models. The technique enables fine-grained control over video generation through frame-level signals like keyframes and style references without requiring expensive fine-tuning of large-scale models.
AI · Bullish · arXiv – CS AI · 2d ago · 6/10
🧠Researchers present a compression pipeline for large video diffusion models that combines few-step distillation with low-bit quantization, enabling efficient deployment without sacrificing visual quality. The approach treats the dual-expert denoising branches separately and reportedly achieves better results than the original model when sampling with 8 to 20 denoising steps.
AI · Bullish · arXiv – CS AI · 6d ago · 6/10
🧠Researchers introduce VideoMLA, a novel approach that reduces KV cache memory requirements in video diffusion models by 92.7% through Multi-Head Latent Attention, enabling longer video generation with improved efficiency. The method challenges conventional assumptions about low-rank approximations in video models and demonstrates comparable quality to existing methods while improving throughput by 23%.
AI · Bullish · arXiv – CS AI · May 27 · 6/10
🧠Researchers present a new quantization method for large video diffusion models that achieves 59.3% memory reduction while maintaining near-baseline quality. The technique addresses challenges in compressing Wan2.2-I2V's mixture-of-experts architecture by using timestep-aware and expert-specific calibration strategies.
AI · Neutral · arXiv – CS AI · Mar 4 · 5/10
🧠Researchers have developed new methods to understand how Video Diffusion Transformers convert motion-related text descriptions into video content. The study introduces GramCol and Interpretable Motion-Attentive Maps (IMAP) to spatially and temporally localize motion concepts in AI-generated videos without requiring gradient calculations.
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10
🧠Researchers propose ANSE, a framework that improves video generation in diffusion models by selecting initial noise seeds based on the model's internal attention patterns. The method uses Bayesian uncertainty quantification to identify seeds that yield better video quality and temporal coherence with minimal computational overhead.