#video-diffusion News & Analysis

10 articles tagged with #video-diffusion. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · 2d ago · 7/10

Towards 3D-Aware Video Diffusion Models: Render-Free Human Motion Control with Mesh Tokenization

Researchers propose a render-free framework for 3D-aware video diffusion models that uses compressed mesh tokens instead of 2D rendered guidance to control human motion in generated videos. By processing 3D geometric information directly alongside video tokens, the approach demonstrates improved performance on motion control tasks while reducing artifacts associated with traditional 2D guidance methods.

AI · Bullish · arXiv – CS AI · May 27 · 7/10

Quantized Keys Steal Attention: Bias Correction for KV-Cache Compression in Video Diffusion

Researchers have developed a bias correction technique for quantizing KV-cache memory in video diffusion models, addressing a fundamental problem where quantization noise causes inflated attention to cached data. The method recovers near-full quality video generation while using 50% less memory than standard approaches, enabling longer video synthesis without sacrificing output quality.

AI · Bullish · arXiv – CS AI · May 9 · 7/10

EA-WM: Event-Aware Generative World Model with Structured Kinematic-to-Visual Action Fields

Researchers introduce EA-WM, an event-aware generative world model that bridges kinematic control and visual perception for robotic systems. By projecting robot actions directly into camera views as structured kinematic-to-visual action fields rather than abstract tokens, the model achieves state-of-the-art performance on the WorldArena benchmark, significantly advancing robot learning and simulation capabilities.

AI · Bullish · arXiv – CS AI · Mar 5 · 7/10

Phys4D: Fine-Grained Physics-Consistent 4D Modeling from Video Diffusion

Researchers have developed Phys4D, a new pipeline that enhances video diffusion models with physics-consistent 4D world representations through a three-stage training process. The system addresses current limitations where AI-generated videos often exhibit physically implausible dynamics, using pseudo-supervised pretraining, physics-grounded fine-tuning, and reinforcement learning to improve spatiotemporal consistency.

AI · Bullish · arXiv – CS AI · Mar 4 · 6/10 · 2

Frame Guidance: Training-Free Guidance for Frame-Level Control in Video Diffusion Models

Researchers introduce Frame Guidance, a training-free method for controllable video generation using diffusion models. The technique enables fine-grained control over video generation through frame-level signals like keyframes and style references without requiring expensive fine-tuning of large-scale models.

AI · Bullish · arXiv – CS AI · 2d ago · 6/10

Collaborative Few-Step Distillation and Low-Bit Quantization for Wan2.2 Dual-Expert Video Diffusion Models

Researchers present a compression pipeline for large video diffusion models that combines few-step distillation with low-bit quantization, enabling efficient deployment without sacrificing visual quality. The approach treats the dual-expert denoising branches separately and achieves better results than the original model at 8-20 inference steps.

AI · Bullish · arXiv – CS AI · 6d ago · 6/10

VideoMLA: Low-Rank Latent KV Cache for Minute-Scale Autoregressive Video Diffusion

Researchers introduce VideoMLA, a novel approach that reduces KV cache memory requirements in video diffusion models by 92.7% through Multi-Head Latent Attention, enabling longer video generation with improved efficiency. The method challenges conventional assumptions about low-rank approximations in video models and demonstrates comparable quality to existing methods while improving throughput by 23%.

AI · Bullish · arXiv – CS AI · May 27 · 6/10

Timestep-Aware SVDQuant-GPTQ for W4A4 Quantization of Wan2.2-I2V

Researchers present a new quantization method for large video diffusion models that achieves 59.3% memory reduction while maintaining near-baseline quality. The technique addresses challenges in compressing Wan2.2-I2V's mixture-of-experts architecture by using timestep-aware and expert-specific calibration strategies.

AI · Neutral · arXiv – CS AI · Mar 4 · 5/10 · 3

Interpretable Motion-Attentive Maps: Spatio-Temporally Localizing Concepts in Video Diffusion Transformers

Researchers have developed new methods to understand how Video Diffusion Transformers convert motion-related text descriptions into video content. The study introduces GramCol and Interpretable Motion-Attentive Maps (IMAP) to spatially and temporally localize motion concepts in AI-generated videos without requiring gradient calculations.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 4

Model Already Knows the Best Noise: Bayesian Active Noise Selection via Attention in Video Diffusion Model

Researchers propose ANSE, a framework that improves video generation in diffusion models by selecting initial noise seeds based on the model's internal attention patterns. The method uses Bayesian uncertainty quantification to identify seeds that yield better video quality and temporal coherence with minimal computational overhead.