AI · Bullish · arXiv – CS AI · 4d ago · 7/10
🧠Researchers introduce DIDR (Diff-Instruct with Diffused Reward), a reinforcement learning framework that improves one-step text-to-image generation by aligning reward optimization with diffusion dynamics. The method addresses a fundamental mismatch in existing approaches where optimizing for image-space rewards often degrades overall image fidelity, demonstrating superior results compared to current SDXL baselines.
AI · Bullish · arXiv – CS AI · May 12 · 7/10
🧠Researchers introduce Auto-Rubric as Reward (ARR), a framework that replaces opaque scalar reward signals in multimodal AI alignment with explicit, structured criteria-based evaluation. By externalizing a model's implicit preferences into interpretable rubrics before comparison, ARR reduces evaluation bias and enables more reliable human-preference alignment in generative models.
AI · Bullish · arXiv – CS AI · May 12 · 7/10
🧠HyperTransport is a new hypernetwork framework that dramatically accelerates activation steering for text-to-image models by amortizing optimization costs across multiple concepts. Rather than optimizing intervention parameters for each new concept (which takes minutes), the system learns to map CLIP embeddings directly to steering parameters in a single forward pass, achieving a 3600-7000x speedup while matching per-concept baselines on unseen concepts.
AI · Bullish · arXiv – CS AI · May 11 · 7/10
🧠Researchers introduce Flow-OPD, a post-training framework that applies on-policy distillation to Flow Matching text-to-image models, addressing reward sparsity and gradient interference problems. Built on Stable Diffusion 3.5 Medium, the method achieves significant performance gains—GenEval scores improve from 63 to 92 and OCR accuracy from 59 to 94—while maintaining image quality and surpassing individual teacher models.
🧠 Stable Diffusion
AI · Bearish · arXiv – CS AI · May 11 · 7/10
🧠Researchers have developed OrchJail, a fuzzing framework that discovers vulnerabilities in tool-calling text-to-image AI agents by exploiting how multiple benign steps combine into unsafe outputs. Unlike traditional prompt-injection attacks, OrchJail targets the orchestration layer where agents chain tools together, achieving higher attack success rates while evading existing defenses.
AI · Bullish · arXiv – CS AI · May 11 · 7/10
🧠Researchers have developed CASCADE, a novel speculative decoding technique that accelerates autoregressive image generation by up to 3.6x through identifying and exploiting redundancies in neural network representations. The method addresses a critical bottleneck in image synthesis by reducing draft token rejection rates without requiring model retraining, advancing the efficiency of text-to-image AI systems.
AI · Bullish · arXiv – CS AI · May 11 · 7/10
🧠Researchers introduce SCOPE, a framework that addresses the challenge of maintaining semantic commitments throughout the text-to-image generation process by using structured specifications and conditional skill orchestration. The framework achieves significantly higher performance on complex image generation tasks, with a new benchmark (Gen-Arena) and evaluation metric (EGIP) designed to measure commitment-level intent realization.
AI · Neutral · arXiv – CS AI · May 4 · 7/10
🧠Researchers have identified fundamental limitations in how text-to-image diffusion models handle multi-object generation, finding that scene complexity rather than data imbalance is the primary culprit. Through a controlled framework called MOSAIC, they demonstrate that counting objects is particularly difficult in low-data regimes and that compositional generalization collapses when training combinations are systematically excluded.
AI · Bullish · arXiv – CS AI · May 1 · 7/10
🧠Researchers introduce Flow Map Reward Guidance (FMRG), a novel training-free method for guiding generative models toward user-specified objectives using optimal control theory. The approach achieves comparable or superior results to existing baselines while requiring only 3 neural function evaluations, representing a 10x+ speedup over prior methods.
AI · Bullish · arXiv – CS AI · Apr 10 · 7/10
🧠DiffSketcher is a novel AI algorithm that generates vector sketches from text prompts by leveraging pre-trained text-to-image diffusion models. The method optimizes Bézier curves using an extended Score Distillation Sampling loss and introduces a stroke initialization strategy based on attention maps, achieving superior results in sketch quality and controllability.
AI · Bullish · arXiv – CS AI · Mar 17 · 7/10
🧠Researchers propose LESA, a new framework that accelerates Diffusion Transformers (DiTs) by up to 6.25x using learnable predictors and Kolmogorov-Arnold Networks. The method achieves significant speedups while maintaining or improving generation quality in text-to-image and text-to-video synthesis tasks.
AI · Bearish · arXiv – CS AI · Mar 17 · 7/10
🧠New research reveals that despite visual improvements, modern text-to-image models from 2022-2025 perform worse as synthetic training data generators for AI classifiers. The study found that newer models collapse to narrow, aesthetic-focused distributions that lack the diversity needed for effective machine learning training.
AI · Neutral · arXiv – CS AI · Mar 5 · 6/10
🧠Researchers have identified Order-to-Space Bias (OTS) in modern image generation models, where the order in which entities are mentioned in a text prompt incorrectly determines spatial layout and role assignments. The study introduces OTS-Bench to measure this bias and demonstrates that targeted fine-tuning and early-stage interventions can reduce the problem while maintaining generation quality.
AI · Bullish · arXiv – CS AI · Mar 4 · 6/10 · 4
🧠Researchers introduce Conditioned Activation Transport (CAT), a new framework to prevent text-to-image AI models from generating unsafe content while preserving image quality for legitimate prompts. The method uses a geometry-based conditioning mechanism and nonlinear transport maps, validated on Z-Image and Infinity architectures with significantly reduced attack success rates.
AI · Bullish · arXiv – CS AI · Mar 4 · 6/10 · 3
🧠Researchers have developed TikZilla, a new AI model that generates high-quality scientific figures from text descriptions using TikZ code. The model uses a dataset four times larger than previous versions and combines supervised learning with reinforcement learning to achieve performance matching GPT-5 while using much smaller model sizes.
AI · Bearish · arXiv – CS AI · Mar 4 · 7/10 · 3
🧠Researchers have developed SemBD, a new semantic-level backdoor attack against text-to-image diffusion models that achieves 100% success rate while evading current defenses. The attack uses continuous semantic regions as triggers rather than fixed textual patterns, making it significantly harder to detect and defend against.
AI · Bullish · arXiv – CS AI · Mar 4 · 7/10 · 2
🧠Researchers present P-GRAFT, a new method for fine-tuning diffusion models by shaping distributions at intermediate noise levels, showing improved performance on text-to-image generation tasks. The framework achieved an 8.81% relative improvement over the base Stable Diffusion v2 model on popular benchmarks.
AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 5
🧠Researchers developed HierarchicalPrune, a compression framework that reduces large-scale text-to-image diffusion models' memory footprint by 77.5-80.4% and latency by 27.9-38.0% while maintaining image quality. The technique enables billion-parameter AI models to run efficiently on resource-constrained devices through hierarchical pruning and knowledge distillation.
AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 4
🧠Researchers propose a new annealing guidance scheduler that dynamically adjusts guidance scales in diffusion models during image generation, improving both image quality and text prompt alignment. The method enhances text-to-image generation performance without requiring additional memory or computational resources.
AI · Bearish · arXiv – CS AI · Feb 27 · 7/10 · 4
🧠Researchers reveal a critical evaluation bias in text-to-image diffusion models where human preference models favor high guidance scales, leading to inflated performance scores despite poor image quality. The study introduces a new evaluation framework and demonstrates that simply increasing CFG scales can compete with most advanced guidance methods.
AI · Bullish · OpenAI News · Nov 3 · 7/10 · 5
🧠OpenAI has launched the DALL·E API in public beta, allowing developers to integrate the AI image generation technology into their applications. This marks a significant step in making advanced AI image generation capabilities more widely accessible to developers and businesses.
AI · Bullish · OpenAI News · Jan 5 · 7/10 · 7
🧠OpenAI has developed DALL·E, a neural network that generates images from text descriptions. This AI system can create visual content for a wide range of concepts that can be expressed in natural language.
AI · Neutral · arXiv – CS AI · 2d ago · 6/10
🧠Researchers propose Alignment-Guided Score Matching (AGSM), a reward-free post-training method that improves text-to-image alignment in diffusion models by integrating contrastive guidance into the score-matching objective. The approach addresses failure cases like over-counting and repetition in existing methods, achieving 35% improvement in counting accuracy while remaining compatible with major diffusion model architectures.
AI · Neutral · arXiv – CS AI · 2d ago · 6/10
🧠Researchers introduce SafeDIG, a safety steering framework designed to make text-to-image diffusion transformers like FLUX.1 and Stable Diffusion 3.5 resistant to generating harmful content. The method uses sparse autoencoders and adaptive decoding to maintain safety controls across different risk domains while preserving image quality.
🧠 Stable Diffusion
AI · Neutral · arXiv – CS AI · May 12 · 6/10
🧠Researchers propose Cosine-Aware Adaptive Elastic Weight Consolidation (EWC) to improve text-to-image model backdoor attacks while maintaining model fidelity and generalization. The method addresses a fundamental trade-off between attack success and output quality by dynamically adjusting regularization weights based on semantic utility, achieving stronger performance on both in-domain and out-of-domain datasets compared to existing approaches.