173 articles tagged with #diffusion-models. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.
AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
Researchers introduce RAZOR, a new framework for efficiently removing sensitive information from AI models like CLIP and Stable Diffusion without requiring full retraining. The method selectively edits specific layers and attention heads in transformer models to achieve targeted 'unlearning' while preserving overall performance.
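The selective-editing idea can be sketched in a few lines: freeze everything except the attention heads chosen for unlearning, so updates from the forgetting objective cannot touch the rest of the model. This is an illustrative NumPy toy, not RAZOR's actual implementation; the parameter layout, the `targets` set, and the `unlearning_step` helper are all made up for the sketch.

```python
import numpy as np

# Hypothetical toy model: one weight block per (layer, attention head).
rng = np.random.default_rng(0)
params = {f"layer{l}.head{h}": rng.normal(size=(4, 4))
          for l in range(3) for h in range(2)}

# Only the heads selected for unlearning receive gradient updates;
# everything else stays frozen, so overall behavior is preserved.
targets = {"layer1.head0", "layer2.head1"}

def unlearning_step(params, grads, lr=0.1):
    return {k: (w - lr * grads[k] if k in targets else w)
            for k, w in params.items()}

grads = {k: np.ones_like(w) for k, w in params.items()}
new = unlearning_step(params, grads)
frozen_ok = all(np.allclose(new[k], params[k])
                for k in params if k not in targets)
print(frozen_ok)  # non-target weights are untouched
```

The point of the masking is the trade-off the summary describes: the forgetting signal is confined to a few heads, so the rest of the model's behavior is preserved by construction rather than by regularization.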
AI · Neutral · arXiv – CS AI · Mar 17 · 6/10
Researchers conducted the first systematic study on post-training quantization for diffusion large language models (dLLMs), identifying activation outliers as a key challenge for compression. The study evaluated state-of-the-art quantization methods across multiple dimensions to provide insights for efficient dLLM deployment on edge devices.
AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
Researchers introduce Contrastive Noise Optimization, a new method that improves diversity in text-to-image AI generation by optimizing initial noise patterns rather than intermediate outputs. The technique uses contrastive loss to maximize diversity while preserving image quality, achieving superior results across multiple text-to-image model architectures.
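The core mechanic, optimizing the initial noises themselves for mutual diversity, can be sketched without any diffusion model at all: push a batch of seeds apart while softly holding each one near the Gaussian shell ||z||² ≈ d so it remains a valid starting point. The repulsion-plus-shell objective and the step sizes here are illustrative assumptions, not the paper's actual contrastive loss.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 8, 16
z = rng.normal(size=(n, d))          # candidate initial noises

def pairwise_spread(z):
    diff = z[:, None, :] - z[None, :, :]
    return (diff ** 2).sum() / 2     # sum over unordered pairs

before = pairwise_spread(z)
for _ in range(100):
    # gradient of the repulsion term: push each noise away from the batch mean
    g_rep = 2 * (n * z - z.sum(axis=0, keepdims=True))
    # soft prior constraint keeps each sample near the shell ||z||^2 ~= d
    sq = (z ** 2).sum(axis=1, keepdims=True)
    g_prior = 4 * (sq - d) * z
    z = z + 0.001 * g_rep - 0.001 * g_prior
after = pairwise_spread(z)
print(after > before)  # batch diversity increased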
AI · Neutral · arXiv – CS AI · Mar 17 · 6/10
EgoGrasp introduces the first method to reconstruct world-space hand-object interactions from egocentric videos involving open-vocabulary objects. The multi-stage framework combines vision foundation models with body-guided diffusion models to achieve state-of-the-art performance in 3D scene reconstruction and hand pose estimation.
AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
Researchers introduce Agentic Retoucher, a new AI framework that fixes common distortions in text-to-image generation through a three-agent system for perception, reasoning, and correction. The system outperformed existing methods on a new 27K-image dataset, potentially improving the quality and reliability of AI-generated images.
AI · Bullish · arXiv – CS AI · Mar 16 · 6/10
Researchers propose Naïve PAINE, a lightweight system that improves text-to-image generation quality by predicting which initial noise inputs will produce better results before running the full diffusion model. The approach reduces the need for multiple generation cycles to get satisfactory images by pre-selecting higher-quality noise patterns.
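The selection step is simple to picture: score a pool of candidate seeds with a cheap predictor, then spend the expensive diffusion sampling budget only on the winner. In this sketch the "predictor" is a stand-in linear probe with a made-up weight vector; the real PAINE predictor is learned, and nothing here is taken from the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n_candidates = 32, 16
candidates = rng.normal(size=(n_candidates, d))

# Stand-in for the learned quality predictor: a cheap linear probe.
# (The weight vector w is invented for illustration.)
w = rng.normal(size=d)
scores = candidates @ w

# Run the expensive diffusion model only on the best-scoring noise,
# instead of sampling every candidate and picking afterwards.
best = candidates[np.argmax(scores)]
print(best.shape)  # (32,)
```

The saving is exactly the summary's claim: one scoring pass over 16 candidates replaces up to 16 full sampling runs.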
AI · Bullish · arXiv – CS AI · Mar 11 · 6/10
Researchers introduce Latent-DARM, a framework that bridges discrete diffusion language models and autoregressive models to improve multi-agent AI reasoning capabilities. The system achieved significant improvements on reasoning benchmarks, increasing accuracy from 27% to 36% on DART-5 while using less than 2.2% of the token budget of state-of-the-art models.
AI · Neutral · arXiv – CS AI · Mar 11 · 6/10
Researchers developed tunable-complexity priors for generative models (diffusion models, normalizing flows, and variational autoencoders) that can dynamically adjust complexity based on the specific inverse problem. The approach uses nested dropout and demonstrates superior performance across compressed sensing, inpainting, denoising, and phase retrieval tasks compared to fixed-complexity baselines.
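Nested dropout is the ingredient that makes the complexity tunable: during training, a random suffix of the latent dimensions is zeroed, so early dimensions absorb coarse structure and any prefix length forms a usable prior at inference. This is a minimal sketch of the standard nested-dropout operation, not the paper's code; the shapes and helper name are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

def nested_dropout(z, rng):
    # Sample a cutoff k and keep only the first k dimensions.
    # Dimension i is trained whenever k > i, so earlier dims see
    # more gradient signal and encode coarser structure.
    k = rng.integers(1, z.shape[-1] + 1)
    out = z.copy()
    out[..., k:] = 0.0
    return out, k

z = rng.normal(size=(4, 8))
z_drop, k = nested_dropout(z, rng)
print((z_drop[:, k:] == 0).all())  # suffix dims are zeroed
```

At solve time, picking a larger or smaller prefix k is what lets one trained model act as a simpler prior for easy inverse problems and a richer one for hard ones.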
AI · Bullish · arXiv – CS AI · Mar 9 · 6/10
Researchers introduce TempoSyncDiff, a new AI framework that uses distilled diffusion models to generate realistic talking head videos from audio with significantly reduced computational latency. The system addresses key challenges in AI-driven video synthesis including temporal instability, identity drift, and audio-visual alignment while enabling deployment on edge computing devices.
AI · Bullish · arXiv – CS AI · Mar 9 · 6/10
Researchers introduce Place-it-R1, an AI framework that uses Multimodal Large Language Models to insert objects into videos while maintaining physical realism. The system employs Chain-of-Thought reasoning to ensure inserted objects interact naturally with their environment, addressing the gap between visual quality and physical plausibility in video editing.
AI · Neutral · arXiv – CS AI · Mar 9 · 6/10
Researchers have developed ContextBench, a new benchmark for evaluating methods that generate targeted inputs to trigger specific behaviors in language models. The study introduces enhanced Evolutionary Prompt Optimization techniques that better balance effectiveness in activating AI model features while maintaining linguistic fluency.
AI · Bullish · arXiv – CS AI · Mar 5 · 5/10
Researchers have developed a new approach using multiplicative LoRA (Low-Rank Adaptation) weights for neural field representation learning, achieving improved quality in reconstruction, generation, and analysis tasks. The method constrains optimization space through pre-trained base models, creating structured weight representations that outperform existing weight-space methods when used with latent diffusion models.
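One plausible reading of "multiplicative LoRA" is that the low-rank update multiplies the frozen base weight, W = W0(I + BA), rather than adding to it as in standard LoRA (W = W0 + BA). That form is an assumption made for this sketch, not a detail taken from the paper; the shapes and initialization below are likewise illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 16, 2
W0 = rng.normal(size=(d, d))            # frozen pre-trained base weight
B = np.zeros((d, r))                    # low-rank factors; B starts at zero
A = rng.normal(size=(r, d)) * 0.01

# Multiplicative form: the update rescales directions of the base weight
# instead of adding an independent offset, so the adapted weight stays
# in a subspace structured by W0 (assumed form, for illustration).
W = W0 @ (np.eye(d) + B @ A)
print(np.allclose(W, W0))  # with B = 0 the adapter is an exact identity
```

Starting B at zero mirrors standard LoRA practice: the adapter is exactly the identity at initialization, and training only gradually bends the pre-trained weight space.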
AI · Bullish · Hugging Face Blog · Mar 5 · 6/10
The article introduces Modular Diffusers, a new framework for building composable and flexible diffusion model pipelines. This development allows developers to create more modular AI systems by breaking down diffusion processes into reusable components.
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 3
Researchers introduce SHINE, a training-free framework that enables FLUX and other diffusion models to perform high-quality image composition without retraining. The framework addresses complex lighting scenarios like shadows and reflections, achieving state-of-the-art performance on the new ComplexCompo benchmark.
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 3
Sketch2Colab is a new AI system that converts 2D sketches into realistic 3D multi-human animations with precise control over interactions and movements. The technology uses a novel approach combining sketch-driven diffusion with rectified-flow distillation for faster, more stable animation generation than existing methods.
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 4
Researchers developed MAP-Diff, a multi-anchor guided diffusion framework that improves 3D whole-body PET scan denoising by using intermediate-dose scans as trajectory anchors. The method achieves significant improvements in image quality metrics, increasing PSNR from 42.48 dB to 43.71 dB while reducing radiation exposure for patients.
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 4
DragFlow introduces the first framework to leverage FLUX's DiT priors for drag-based image editing, addressing distortion issues that plagued earlier Stable Diffusion-based approaches. The system uses region-based editing with affine transformations instead of point-based supervision, achieving state-of-the-art results on benchmarks.
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 3
Researchers propose EquiReg, a new framework that improves diffusion models for inverse problems like image restoration by keeping sampling trajectories on the data manifold. The method uses equivariance regularization to guide sampling toward symmetry-preserving regions, enabling high-quality reconstructions with fewer sampling steps.
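An equivariance regularizer measures how far a network is from commuting with a symmetry transform: the residual f(T(x)) − T(f(x)) is zero exactly when the function respects the symmetry, and its norm can be penalized during sampling. The toy "denoiser" and the flip symmetry below are invented for illustration; EquiReg's actual operators and penalty are in the paper.

```python
import numpy as np

def denoiser(x):
    # Toy "denoiser": a circular three-tap average, which happens to
    # commute exactly with sequence reversal.
    return 0.5 * (x + np.roll(x, 1) + np.roll(x, -1)) / 1.5

def equi_penalty(f, x, T):
    # Equivariance residual: zero iff f commutes with the symmetry T.
    return np.abs(f(T(x)) - T(f(x))).mean()

rng = np.random.default_rng(0)
x = rng.normal(size=64)
flip = lambda v: v[::-1]
print(equi_penalty(denoiser, x, flip) < 1e-12)  # this f is flip-equivariant
```

In EquiReg's setting the penalty is not zero in general; adding it to the sampling objective is what steers trajectories toward the symmetry-preserving regions the summary mentions.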
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 4
LiftAvatar is a new AI system that enhances 3D avatar animation by completing sparse monocular video observations in kinematic space using expression-controlled video diffusion Transformers. The technology addresses limitations in 3D Gaussian Splatting-based avatars by generating high-quality, temporally coherent facial expressions from single or multiple reference images.
AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 4
Researchers propose FreeAct, a new quantization framework for Large Language Models that improves efficiency by using dynamic transformation matrices for different token types. The method achieves up to 5.3% performance improvement over existing approaches by addressing the memory and computational overhead challenges in LLMs.
AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 8
Researchers introduce State-Action Inpainting Diffuser (SAID), a new AI framework that addresses signal delay challenges in continuous control and reinforcement learning. SAID combines model-based and model-free approaches using a generative formulation that can be applied to both online and offline RL, demonstrating state-of-the-art performance on delayed control benchmarks.
AI · Neutral · arXiv – CS AI · Mar 3 · 7/10 · 6
Researchers introduce StaTS, a new diffusion model for time series forecasting that learns adaptive noise schedules and uses frequency-guided denoising. The model addresses limitations of fixed noise schedules in existing diffusion models by incorporating spectral regularization and data-adaptive scheduling for improved structural preservation.
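"Frequency-guided" denoising starts from a simple observation: a series' spectrum tells you which bands carry real structure and which are mostly noise, so denoising can be weighted per band. The weighting rule below (normalized spectral power) is an illustrative stand-in, not StaTS's actual schedule or regularizer.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(256)
series = np.sin(2 * np.pi * t / 16) + 0.1 * rng.normal(size=t.size)

# Illustrative frequency-guided weighting: denoise less aggressively
# in bands where the data carries real structure.
power = np.abs(np.fft.rfft(series)) ** 2
weight = power / power.sum()          # data-adaptive per-frequency emphasis
dominant = np.argmax(weight)
print(dominant)  # index of the strongest band: the period-16 sine
```

A data-adaptive schedule built this way preserves the dominant periodic component while allowing heavier noise in the flat parts of the spectrum, which is the structural-preservation behavior the summary describes.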
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 3
Researchers introduce SVG, a new latent diffusion model that eliminates the need for variational autoencoders by using self-supervised representations. The approach leverages frozen DINO features to create semantically structured latent spaces, enabling faster training, fewer sampling steps, and better generative quality while maintaining semantic capabilities.
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 4
Researchers have developed DCDP, a Dynamic Closed-Loop Diffusion Policy framework that significantly improves robotic manipulation in dynamic environments. The system achieves 19% better adaptability without retraining while requiring only 5% additional computational overhead through real-time action correction and environmental dynamics integration.
AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 8
Researchers introduce Coupled Discrete Diffusion (CoDD), a breakthrough framework that solves the "factorization barrier" in diffusion language models by enabling parallel token generation without sacrificing coherence. The approach uses a lightweight probabilistic inference layer to model complex joint dependencies while maintaining computational efficiency.