173 articles tagged with #diffusion-models. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.
AIBullisharXiv β CS AI Β· 2d ago7/10
π§ Researchers introduce FS-DFM, a discrete flow-matching model that generates long text 128x faster than standard diffusion models while maintaining quality parity. The breakthrough uses few-step sampling with teacher guidance distillation, achieving in 8 steps what previously required 1,024 evaluations.
π’ Perplexity
AIBullisharXiv β CS AI Β· 2d ago7/10
π§ Researchers introduce PnP-CM, a new method that reformulates consistency models as proximal operators within plug-and-play frameworks for solving inverse problems. The approach achieves high-quality image reconstructions with minimal neural function evaluations (4 NFEs), demonstrating practical efficiency gains over existing consistency model solvers and marking the first application of CMs to MRI data.
AIBullisharXiv β CS AI Β· 2d ago7/10
π§ Researchers introduce Introspective Diffusion Language Models (I-DLM), a new approach that combines the parallel generation speed of diffusion models with the quality of autoregressive models by ensuring models verify their own outputs. I-DLM achieves performance matching conventional large language models while delivering 3x higher throughput, potentially reshaping how AI systems are deployed at scale.
AIBearisharXiv β CS AI Β· 3d ago7/10
π§ Researchers demonstrate a critical vulnerability in diffusion-based language models where safety mechanisms can be bypassed by re-masking committed refusal tokens and injecting affirmative prefixes, achieving 76-82% attack success rates without gradient optimization. The findings reveal that dLLM safety relies on a fragile architectural assumption rather than robust adversarial defenses.
AIBullisharXiv β CS AI Β· 3d ago7/10
π§ Researchers propose Advantage-Guided Diffusion (AGD-MBRL), a novel approach that improves model-based reinforcement learning by using advantage estimates to guide diffusion models during trajectory generation. The method addresses the short-horizon myopia problem in existing diffusion-based world models and demonstrates 2x performance improvements over current baselines on MuJoCo control tasks.
AIBullisharXiv β CS AI Β· 6d ago7/10
π§ DiffSketcher is a novel AI algorithm that generates vector sketches from text prompts by leveraging pre-trained text-to-image diffusion models. The method optimizes BΓ©zier curves using an extended Score Distillation Sampling loss and introduces a stroke initialization strategy based on attention maps, achieving superior results in sketch quality and controllability.
AIBullisharXiv β CS AI Β· 6d ago7/10
π§ Researchers demonstrate a data-efficient fine-tuning method for text-to-video diffusion models that enables new generative controls using sparse, low-quality synthetic data rather than expensive, photorealistic datasets. Counterintuitively, models trained on simple synthetic data outperform those trained on high-fidelity real data, supported by both empirical results and theoretical justification.
AIBullisharXiv β CS AI Β· Apr 77/10
π§ Researchers have developed a method to unlock prompt infilling capabilities in masked diffusion language models by extending full-sequence masking during supervised fine-tuning, rather than the conventional response-only masking. This breakthrough enables models to automatically generate effective prompts that match or exceed manually designed templates, suggesting training practices rather than architectural limitations were the primary constraint.
AIBullisharXiv β CS AI Β· Apr 67/10
π§ Researchers introduce OSCAR, a training-free framework that reduces AI hallucinations in diffusion language models by using cross-chain entropy to detect uncertain token positions during generation. The system runs parallel denoising chains and performs targeted remasking with retrieved evidence to improve factual accuracy without requiring external hallucination classifiers.
AINeutralarXiv β CS AI Β· Mar 277/10
π§ Researchers identified critical security vulnerabilities in Diffusion Large Language Models (dLLMs) that differ from traditional autoregressive LLMs, stemming from their iterative generation process. They developed DiffuGuard, a training-free defense framework that reduces jailbreak attack success rates from 47.9% to 14.7% while maintaining model performance.
AIBullisharXiv β CS AI Β· Mar 277/10
π§ Researchers have published a comprehensive review of Large Language Models for Autonomous Driving (LLM4AD), introducing new benchmarks and conducting real-world experiments on autonomous vehicle platforms. The paper explores how LLMs can enhance perception, decision-making, and motion control in self-driving cars, while identifying key challenges including latency, security, and safety concerns.
AIBearisharXiv β CS AI Β· Mar 267/10
π§ Research reveals that multimodal large language models (MLLMs) pose greater safety risks than diffusion models for image generation, producing more unsafe content and creating images that are harder for detection systems to identify. The enhanced semantic understanding capabilities of MLLMs, while more powerful, enable them to interpret complex prompts that lead to dangerous outputs including fake image synthesis.
AINeutralarXiv β CS AI Β· Mar 267/10
π§ Researchers developed Anti-I2V, a new defense system that protects personal photos from being used to create malicious deepfake videos through image-to-video AI models. The system works across different AI architectures by operating in multiple domains and targeting specific network layers to degrade video generation quality.
AIBullisharXiv β CS AI Β· Mar 177/10
π§ Researchers have developed UniVid, a new pyramid diffusion model that unifies text-to-video and image-to-video generation into a single system. The model uses dual-stream cross-attention mechanisms to process both text prompts and reference images, achieving superior temporal coherence across different video generation tasks.
AIBearisharXiv β CS AI Β· Mar 177/10
π§ New research reveals that despite visual improvements, modern text-to-image models from 2022-2025 perform worse as synthetic training data generators for AI classifiers. The study found that newer models collapse to narrow, aesthetic-focused distributions that lack the diversity needed for effective machine learning training.
AIBullisharXiv β CS AI Β· Mar 177/10
π§ Researchers introduce MARVAL, a distillation framework that accelerates masked auto-regressive diffusion models by compressing inference into a single step while enabling practical reinforcement learning applications. The method achieves 30x speedup on ImageNet with comparable quality, making RL post-training feasible for the first time with these models.
AINeutralarXiv β CS AI Β· Mar 177/10
π§ Researchers introduce Safety-Guided Flow (SGF), a unified probabilistic framework that combines control barrier functions with negative guidance approaches to improve safety in AI-generated content. The framework identifies a critical time window during the denoising process where strong negative guidance is most effective for preventing harmful outputs.
AIBullisharXiv β CS AI Β· Mar 177/10
π§ Researchers propose LESA, a new framework that accelerates Diffusion Transformers (DiTs) by up to 6.25x using learnable predictors and Kolmogorov-Arnold Networks. The method achieves significant speedups while maintaining or improving generation quality in text-to-image and text-to-video synthesis tasks.
AIBullisharXiv β CS AI Β· Mar 167/10
π§ Researchers developed a new reinforcement learning approach for training diffusion language models that uses entropy-guided step selection and stepwise advantages to overcome challenges with sequence-level likelihood calculations. The method achieves state-of-the-art results on coding and logical reasoning benchmarks while being more computationally efficient than existing approaches.
AIBearisharXiv β CS AI Β· Mar 167/10
π§ Researchers have identified a critical vulnerability in image protection systems that use adversarial perturbations to prevent unauthorized AI editing. Two new purification methods can effectively remove these protections, creating a 'purify-once, edit-freely' attack where images become vulnerable to unlimited manipulation.
AIBullisharXiv β CS AI Β· Mar 127/10
π§ Researchers developed ES-dLLM, a training-free inference acceleration framework that speeds up diffusion large language models by selectively skipping tokens in early layers based on importance scoring. The method achieves 5.6x to 16.8x speedup over vanilla implementations while maintaining generation quality, offering a promising alternative to autoregressive models.
π’ Nvidia
AIBearisharXiv β CS AI Β· Mar 117/10
π§ Researchers developed NetDiffuser, a framework that uses diffusion models to generate natural adversarial examples capable of deceiving AI-based network intrusion detection systems. The system achieved up to 29.93% higher attack success rates compared to baseline attacks, highlighting significant vulnerabilities in current deep learning-based security systems.
AIBullisharXiv β CS AI Β· Mar 117/10
π§ Researchers introduce FCDM, a fully convolutional diffusion model based on ConvNeXt architecture that achieves competitive performance with DiT-XL/2 using only 50% of the computational resources. The model demonstrates exceptional training efficiency, requiring 7x fewer training steps and can be trained on just 4 GPUs, reviving convolutional networks as an efficient alternative to Transformer-based diffusion models.
AIBullisharXiv β CS AI Β· Mar 97/10
π§ Researchers introduce PSIVG, a framework that integrates physical simulators into AI video generation to ensure generated videos obey real-world physics like gravity and collision. The system reconstructs 4D scenes from template videos and uses physical simulations to guide video generators toward more realistic motion while maintaining visual quality.
AIBullisharXiv β CS AI Β· Mar 56/10
π§ EgoWorld is a new AI framework that converts third-person camera views into first-person perspectives using 3D data and diffusion models. The technology addresses limitations in current methods and shows strong performance across multiple datasets, with applications in AR, VR, and robotics.