44 articles tagged with #text-to-image. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.
AI · Bullish · arXiv – CS AI · 6d ago · 7/10
🧠DiffSketcher is a novel AI algorithm that generates vector sketches from text prompts by leveraging pre-trained text-to-image diffusion models. The method optimizes Bézier curves using an extended Score Distillation Sampling loss and introduces a stroke initialization strategy based on attention maps, achieving superior results in sketch quality and controllability.
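The core mechanic — optimizing curve parameters with a Score Distillation Sampling-style gradient from a frozen denoiser — can be sketched on toy data. Everything here is a hypothetical stand-in: the "rasterizer" and "denoiser" are simple closed-form functions, not the paper's differentiable vector renderer or a real diffusion model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy differentiable "rasterizer": Bezier control points -> flat image vector.
def rasterize(cp):
    return np.tanh(cp).ravel()

# Stand-in for the frozen text-conditioned denoiser: it predicts the noise
# that would move x_t toward a fixed prompt-conditioned target image.
target = np.clip(rng.normal(size=8), -0.9, 0.9)
def eps_hat(x_t, alpha_bar):
    return (x_t - np.sqrt(alpha_bar) * target) / np.sqrt(1.0 - alpha_bar)

def loss(cp):
    return float(np.mean((rasterize(cp) - target) ** 2))

cp = rng.normal(size=(4, 2))          # 4 control points in 2D
loss0 = loss(cp)
for _ in range(300):
    x = rasterize(cp)
    alpha_bar = rng.uniform(0.1, 0.9)  # toy noise level for this step
    eps = rng.normal(size=x.shape)
    x_t = np.sqrt(alpha_bar) * x + np.sqrt(1 - alpha_bar) * eps
    # SDS-style gradient: w(t) * (eps_hat - eps), denoiser Jacobian dropped.
    g_x = (1 - alpha_bar) * (eps_hat(x_t, alpha_bar) - eps)
    # Chain rule through the toy tanh rasterizer.
    g_cp = (g_x * (1 - rasterize(cp) ** 2)).reshape(cp.shape)
    cp -= 0.05 * g_cp
loss1 = loss(cp)
print(loss1 < loss0)
```

The key property illustrated: the gradient flows only through the rasterizer, so the (expensive, frozen) denoiser is used purely as a critic of the noised render.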
AI · Bearish · arXiv – CS AI · Mar 17 · 7/10
🧠New research reveals that despite visual improvements, modern text-to-image models from 2022–2025 perform worse as synthetic training data generators for AI classifiers. The study found that newer models collapse to narrow, aesthetic-focused distributions that lack the diversity needed for effective machine learning training.
AI · Bullish · arXiv – CS AI · Mar 17 · 7/10
🧠Researchers propose LESA, a new framework that accelerates Diffusion Transformers (DiTs) by up to 6.25x using learnable predictors and Kolmogorov-Arnold Networks. The method achieves significant speedups while maintaining or improving generation quality in text-to-image and text-to-video synthesis tasks.
AI · Neutral · arXiv – CS AI · Mar 5 · 6/10
🧠Researchers have identified Order-to-Space Bias (OTS) in modern image generation models, where the order in which entities are mentioned in a text prompt incorrectly determines their spatial layout and role assignment. The study introduces OTS-Bench to measure this bias and demonstrates that targeted fine-tuning and early-stage interventions can reduce the problem while maintaining generation quality.
AI · Bullish · arXiv – CS AI · Mar 4 · 6/10
🧠Researchers have developed TikZilla, a new AI model that generates high-quality scientific figures from text descriptions using TikZ code. The model uses a dataset four times larger than previous versions and combines supervised learning with reinforcement learning to achieve performance matching GPT-5 while using much smaller model sizes.
AI · Bullish · arXiv – CS AI · Mar 4 · 7/10
🧠Researchers present P-GRAFT, a new method for fine-tuning diffusion models by shaping distributions at intermediate noise levels, showing improved performance on text-to-image generation tasks. The framework achieved an 8.81% relative improvement over the base Stable Diffusion v2 model on popular benchmarks.
AI · Bullish · arXiv – CS AI · Mar 4 · 6/10
🧠Researchers introduce Conditioned Activation Transport (CAT), a new framework to prevent text-to-image AI models from generating unsafe content while preserving image quality for legitimate prompts. The method uses a geometry-based conditioning mechanism and nonlinear transport maps, validated on Z-Image and Infinity architectures with significantly reduced attack success rates.
AI · Bearish · arXiv – CS AI · Mar 4 · 7/10
🧠Researchers have developed SemBD, a new semantic-level backdoor attack against text-to-image diffusion models that achieves a 100% success rate while evading current defenses. The attack uses continuous semantic regions as triggers rather than fixed textual patterns, making it significantly harder to detect and defend against.
AI · Bullish · arXiv – CS AI · Mar 3 · 7/10
🧠Researchers developed HierarchicalPrune, a compression framework that reduces large-scale text-to-image diffusion models' memory footprint by 77.5-80.4% and latency by 27.9-38.0% while maintaining image quality. The technique enables billion-parameter AI models to run efficiently on resource-constrained devices through hierarchical pruning and knowledge distillation.
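The general shape of hierarchical pruning — rank whole blocks by an importance proxy, drop the weakest blocks, then magnitude-prune within the survivors — can be sketched on toy weight matrices. The importance proxy (mean absolute weight) and the 50% sparsity level below are illustrative assumptions, not the paper's actual criteria.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stack of weight matrices standing in for diffusion-model blocks.
layers = [rng.normal(scale=s, size=(64, 64)) for s in (1.0, 0.1, 0.8, 0.05, 0.6)]

# Stage 1: rank whole blocks by an importance proxy (here mean |weight|)
# and keep only the top blocks.
importance = [float(np.mean(np.abs(w))) for w in layers]
keep = sorted(range(len(layers)), key=lambda i: importance[i], reverse=True)[:3]

# Stage 2: magnitude-prune the weights inside the surviving blocks.
pruned = []
for i in sorted(keep):
    w = layers[i].copy()
    thresh = np.quantile(np.abs(w), 0.5)   # zero out the smallest 50%
    w[np.abs(w) < thresh] = 0.0
    pruned.append(w)

orig_params = sum(w.size for w in layers)
kept_params = sum(int(np.count_nonzero(w)) for w in pruned)
reduction = 1 - kept_params / orig_params
print(round(reduction, 2))
```

In the paper this is paired with knowledge distillation to recover quality; the sketch only shows where the memory reduction comes from.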
AI · Bullish · arXiv – CS AI · Mar 3 · 7/10
🧠Researchers propose a new annealing guidance scheduler that dynamically adjusts guidance scales in diffusion models during image generation, improving both image quality and text prompt alignment. The method enhances text-to-image generation performance without requiring additional memory or computational resources.
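The underlying mechanism is standard classifier-free guidance with a time-varying scale. The cosine decay below is an illustrative schedule, not necessarily the one the paper learns: strong guidance early (when global layout is decided), decaying toward 1.0 for late, detail-focused steps.

```python
import numpy as np

# Classifier-free guidance combines conditional and unconditional noise
# predictions: eps = eps_uncond + s * (eps_cond - eps_uncond).
def cfg(eps_uncond, eps_cond, s):
    return eps_uncond + s * (eps_cond - eps_uncond)

# Illustrative annealed schedule: cosine decay from s_max to s_min.
def guidance_scale(step, n_steps, s_max=7.5, s_min=1.0):
    frac = step / (n_steps - 1)
    return s_min + 0.5 * (s_max - s_min) * (1 + np.cos(np.pi * frac))

n_steps = 50
scales = [guidance_scale(t, n_steps) for t in range(n_steps)]
print(round(scales[0], 2), round(scales[-1], 2))  # 7.5 1.0
```

Because the schedule only changes a scalar multiplier per step, it adds no memory or extra model evaluations beyond the two passes CFG already requires — consistent with the "no additional resources" claim.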
AI · Bearish · arXiv – CS AI · Feb 27 · 7/10
🧠Researchers reveal a critical evaluation bias in text-to-image diffusion models where human preference models favor high guidance scales, leading to inflated performance scores despite poor image quality. The study introduces a new evaluation framework and demonstrates that simply increasing CFG scales can compete with most advanced guidance methods.
AI · Bullish · OpenAI News · Nov 3 · 7/10
🧠OpenAI has launched the DALL·E API in public beta, allowing developers to integrate the AI image generation technology into their applications. This marks a significant step in making advanced AI image generation capabilities more widely accessible to developers and businesses.
AI · Bullish · OpenAI News · Jan 5 · 7/10
🧠OpenAI has developed DALL·E, a neural network that generates images from text descriptions. This AI system can create visual content for a wide range of concepts that can be expressed in natural language.
AI · Bullish · arXiv – CS AI · 1d ago · 6/10
🧠Researchers introduce PromptEcho, a novel reward construction method for improving text-to-image model training that requires no human annotation or model fine-tuning. By leveraging frozen vision-language models to compute token-level alignment scores, the approach achieves significant performance gains on multiple benchmarks while remaining computationally efficient.
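PromptEcho's exact scoring is not specified in this summary; the sketch below shows the general idea of a token-level alignment reward from frozen embeddings. The `token_embs`/`patch_embs` arrays are hypothetical stand-ins for what a frozen vision-language model would produce for prompt tokens and image patches.

```python
import numpy as np

rng = np.random.default_rng(2)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Stand-ins for frozen VLM embeddings: one vector per prompt token and
# one per image patch (in practice from a CLIP-style encoder).
token_embs = rng.normal(size=(5, 16))   # 5 prompt tokens
patch_embs = rng.normal(size=(9, 16))   # 3x3 grid of image patches

# Token-level alignment: each token is scored by its best-matching patch,
# and the reward is the mean over tokens -- no labels, no fine-tuning.
def token_reward(tokens, patches):
    scores = [max(cosine(t, p) for p in patches) for t in tokens]
    return float(np.mean(scores))

r = token_reward(token_embs, patch_embs)
print(-1.0 <= r <= 1.0)
```

The appeal of this style of reward is that both encoders stay frozen: the score is pure inference, so it can supervise text-to-image training without human annotation.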
AI · Neutral · arXiv – CS AI · 2d ago · 6/10
🧠Researchers introduce GLEaN, a visual explainability method that transforms complex AI bias detection into understandable portrait composites, enabling non-technical audiences to grasp how text-to-image models like Stable Diffusion XL associate occupations and identities with specific demographic characteristics.
AI · Bullish · arXiv – CS AI · 3d ago · 6/10
🧠Researchers introduce VisionFoundry, a synthetic data generation pipeline that uses LLMs and text-to-image models to create targeted training data for vision-language models. The approach addresses VLMs' weakness in visual perception tasks and demonstrates 7-10% improvements on benchmark tests without requiring human annotation or reference images.
AI · Bullish · arXiv – CS AI · Mar 27 · 6/10
🧠Researchers introduce xLARD, a self-correcting framework for text-to-image generation that uses multimodal large language models to provide explainable feedback and improve alignment with complex prompts. The system employs a lightweight corrector that refines latent representations based on structured feedback, addressing challenges in generating images that match fine-grained semantics and spatial relations.
AI · Bullish · arXiv – CS AI · Mar 27 · 6/10
🧠Researchers developed UF-FGTG, a framework that automatically converts novice user prompts into model-preferred prompts for text-to-image AI systems. The system uses a novel Coarse-Fine Granularity Prompts dataset and achieved a 5% improvement across quality metrics compared to existing methods.
AI · Neutral · arXiv – CS AI · Mar 26 · 6/10
🧠Researchers introduce SPARE, a new machine unlearning method for text-to-image diffusion models that efficiently removes unwanted concepts while preserving model performance. The two-stage approach uses parameter localization and self-distillation to achieve selective concept erasure with minimal computational overhead.
AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
🧠Researchers introduce HyCon, a hyperbolic control mechanism for text-to-image models that provides better safety controls by steering generation away from unsafe content. The technique uses hyperbolic representation spaces instead of traditional Euclidean adjustments, achieving state-of-the-art results across multiple safety benchmarks.
AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
🧠Researchers present Centered Reward Distillation (CRD), a new reinforcement learning framework for fine-tuning diffusion models that addresses brittleness issues in existing methods. The approach uses within-prompt centering and drift control techniques to achieve state-of-the-art performance in text-to-image generation while reducing reward hacking and convergence issues.
AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
🧠Researchers introduce Contrastive Noise Optimization, a new method that improves diversity in text-to-image AI generation by optimizing initial noise patterns rather than intermediate outputs. The technique uses contrastive loss to maximize diversity while preserving image quality, achieving superior results across multiple text-to-image model architectures.
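The distinctive move here is optimizing the *initial noise batch* rather than model weights or intermediate latents. A minimal sketch, with two loud assumptions: the feature map is a linear toy stand-in for "features of the image generated from this noise" (the real method differentiates through the sampler), and the finite-difference repulsion is a simple surrogate for the paper's contrastive loss.

```python
import numpy as np

rng = np.random.default_rng(3)

# Linear toy stand-in for "features of the generated image as a function
# of the initial noise".
W = rng.normal(size=(8, 8))
def features(Z):
    return Z @ W.T

Z = rng.normal(size=(4, 8))   # batch of initial noise vectors to optimize

def mean_pairwise_dist(F):
    n = len(F)
    d = [np.linalg.norm(F[i] - F[j]) for i in range(n) for j in range(i + 1, n)]
    return float(np.mean(d))

d0 = mean_pairwise_dist(features(Z))
eps = 1e-4
for _ in range(100):
    base = mean_pairwise_dist(features(Z))
    G = np.zeros_like(Z)
    # Finite-difference ascent on pairwise feature distance (a repulsive
    # surrogate for the contrastive loss), plus a weak pull back toward
    # the standard-normal prior so the noise stays on-distribution.
    for idx in np.ndindex(Z.shape):
        Zp = Z.copy()
        Zp[idx] += eps
        G[idx] = (mean_pairwise_dist(features(Zp)) - base) / eps
    Z += 0.1 * G - 0.01 * Z
d1 = mean_pairwise_dist(features(Z))
print(d1 > d0)
```

The prior-regularization term is what lets diversity rise without sacrificing quality: noise pushed far off the Gaussian prior would fall outside the distribution the diffusion model was trained to denoise.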
AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
🧠Researchers introduce Agentic Retoucher, a new AI framework that fixes common distortions in text-to-image generation through a three-agent system for perception, reasoning, and correction. The system outperformed existing methods on a new 27K-image dataset, potentially improving the quality and reliability of AI-generated images.
AI · Bullish · arXiv – CS AI · Mar 16 · 6/10
🧠Researchers propose Naïve PAINE, a lightweight system that improves text-to-image generation quality by predicting which initial noise inputs will produce better results before running the full diffusion model. The approach reduces the need for multiple generation cycles to get satisfactory images by pre-selecting higher-quality noise patterns.
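The selection pattern itself is simple: sample many cheap noise candidates, score each with a lightweight predictor, and run the expensive diffusion model only on the winner. The linear `predicted_quality` probe and `run_full_diffusion` stub below are hypothetical placeholders, not the paper's actual predictor.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical lightweight scorer: predicts final image quality from the
# initial noise alone (the paper trains such a predictor; here it is a
# toy linear probe).
w = rng.normal(size=64)
def predicted_quality(z):
    return float(w @ z)

# Expensive step we want to run as rarely as possible.
def run_full_diffusion(z):
    return "image_from_best_noise"   # stub for the full sampling loop

# Sample many cheap candidates, run the full model once on the best one.
candidates = [rng.normal(size=64) for _ in range(16)]
best = max(candidates, key=predicted_quality)
image = run_full_diffusion(best)

print(all(predicted_quality(best) >= predicted_quality(z) for z in candidates))
```

Scoring 16 noise vectors costs a few microseconds here, versus dozens of denoiser evaluations per full generation — which is where the claimed reduction in regeneration cycles comes from.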
AI · Neutral · arXiv – CS AI · Mar 9 · 6/10
🧠Researchers have developed BlackMirror, a new framework for detecting backdoored text-to-image AI models in black-box settings. The system identifies semantic deviations between visual patterns and instructions, offering a training-free solution that can be deployed in Model-as-a-Service applications.