y0news

#generative-models News & Analysis

77 articles tagged with #generative-models. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Neutral · arXiv – CS AI · 4d ago · 6/10

DSA-Tokenizer: Disentangled Semantic-Acoustic Tokenization via Flow Matching-based Hierarchical Fusion

Researchers introduce DSA-Tokenizer, a novel speech tokenization system that separates semantic content from acoustic style using distinct optimization paths and Flow Matching decoders. The approach enables discrete Speech LLMs to achieve better disentanglement while supporting efficient voice cloning and high-fidelity speech generation with minimal inference steps.

AI · Neutral · arXiv – CS AI · 4d ago · 6/10

Personalized Generative Models for Contextual Debiasing

Researchers introduce DecoupleGen, a method that uses personalized text-to-image diffusion models to generate training data featuring objects in rare contextual scenarios. This approach addresses a critical limitation in computer vision models that perform better on common object-context combinations, potentially improving recognition accuracy for edge cases without requiring expensive real-world data collection.

AI · Neutral · arXiv – CS AI · 4d ago · 6/10

GICDM: Mitigating Hubness for Reliable Distance-Based Generative Model Evaluation

Researchers introduce GICDM, an improved method for evaluating generative models that corrects for hubness, a distortion in high-dimensional spaces that skews distance-based metrics and nearest-neighbor relationships. The technique builds on classical ICDM and includes multi-scale extensions, demonstrating improved alignment with human assessment across synthetic and real benchmarks.

AI · Neutral · arXiv – CS AI · 4d ago · 6/10

Atom-level Protein Representation Learning Improves Protein Structure Prediction

Researchers introduce TriProRep, a protein representation learning method that jointly models amino acid identity, backbone geometry, and full-atom geometry to improve protein structure prediction. The new approach outperforms sequence-only and prior structure-aware models across multiple benchmarks including homodimer co-folding and monomer structure prediction tasks.

AI · Neutral · arXiv – CS AI · 4d ago · 6/10

Cross-scale Aligned Supervision for Training GANs

Researchers propose CAT (Cross-scale Aligned Transformer), a new GAN training method that addresses the cross-scale trajectory misalignment problem in multi-stage image generation. By adding consistency regularization between intermediate and final outputs, CAT achieves state-of-the-art results on ImageNet-256 with one-step inference, reaching FID-50K of 1.56 after just 60 training epochs.

AI · Neutral · arXiv – CS AI · 4d ago · 6/10

MatFormBench: A Benchmarking Evaluation Framework for Target-Driven Materials Formulation

Researchers introduce MatFormBench, a comprehensive benchmarking framework designed to evaluate inverse design algorithms for materials formulation. It addresses a critical gap in machine learning benchmarks, which previously focused only on forward property prediction. The framework tests 39 diverse algorithms across 1,170 evaluations, revealing that diffusion-based models achieve superior overall performance, while VAE and genetic algorithm approaches excel in specific scenarios.

AI · Neutral · arXiv – CS AI · 4d ago · 6/10

When Eyes Betray AI: Social Gaze Consistency as a Semantic Cue for AI-Generated Image Detection

Researchers introduce Social Gaze Consistency as a novel method to detect AI-generated images by analyzing the coherence of eye direction and head-eye alignment between people. The technique achieves meaningful improvements in detection accuracy across multiple vision models, suggesting that high-level semantic features offer advantages over traditional low-level artifact detection as generative models become more sophisticated.

AI · Neutral · arXiv – CS AI · 4d ago · 6/10

PolyFusionAgent: A Multimodal Foundation Model and Autonomous AI Assistant for Polymer Property Prediction and Inverse Design

Researchers introduce PolyFusionAgent, a multimodal AI framework combining a foundation model (PolyFusion) with an autonomous design agent (PolyAgent) for polymer discovery. The system integrates multiple polymer representations into a shared latent space to predict properties and generate novel structures, while grounding predictions in scientific literature for actionable design decisions.

AI · Neutral · arXiv – CS AI · 4d ago · 6/10

Self-Cascaded Diffusion Models for Arbitrary-Scale Image Super-Resolution

Researchers introduce CasArbi, a self-cascaded diffusion framework that enables arbitrary-scale image super-resolution by decomposing scaling factors into sequential steps rather than handling them simultaneously. The method combines coordinate-conditioned diffusion models with self-consistency guidance to achieve superior scale consistency and outperforms existing approaches on multiple benchmarks.

AI · Neutral · arXiv – CS AI · May 12 · 6/10

Marrying Generative Model of Healthcare Events with Digital Twin of Social Determinants of Health for Disease Reasoning

Researchers develop a generative AI model that integrates social determinants of health (SDoH) with multi-organ sensor data and medical events to improve disease prediction and personalized clinical decision support. Tested on UK Biobank data spanning nearly 500,000 medical histories, the model outperforms existing autoregressive disease prediction systems by explicitly modeling socioeconomic factors alongside imaging and biomarker data.

AI · Neutral · arXiv – CS AI · May 12 · 6/10

SLayerGen: a Crystal Generative Model for all Space and Layer Groups

SLayerGen introduces a generative AI model capable of creating crystal structures constrained to space and layer groups, addressing limitations in existing models that fail to account for diperiodic materials like 2D superconductors and thin film semiconductors. The model combines discrete autoregressive lattice generation, transformer-based sampling, and equivariant diffusion, achieving superior performance on layered material discovery while correcting mathematical inconsistencies in prior diffusion approaches.

AI · Neutral · arXiv – CS AI · May 12 · 6/10

Beyond the False Trade-off: Adaptive EWC for Stealthy and Generalizable T2I Backdoors

Researchers propose Cosine-Aware Adaptive Elastic Weight Consolidation (EWC) to improve text-to-image model backdoor attacks while maintaining model fidelity and generalization. The method addresses a fundamental trade-off between attack success and output quality by dynamically adjusting regularization weights based on semantic utility, achieving stronger performance on both in-domain and out-of-domain datasets compared to existing approaches.

AI · Neutral · arXiv – CS AI · May 12 · 6/10

Deterministic Decomposition of Stochastic Generative Dynamics

Researchers propose Bridge Matching, a novel framework that decomposes stochastic generative model dynamics into deterministic transport and diffusion-induced osmotic effects. This decomposition enables more interpretable and controllable generative sampling by separately parameterizing how probability mass moves versus how stochastic fluctuations affect the process.

AI · Neutral · arXiv – CS AI · May 12 · 6/10

Micro-Defects Expose Macro-Fakes: Detecting AI-Generated Images via Local Distributional Shifts

Researchers propose MDMF, a detection framework that identifies AI-generated images by amplifying micro-scale statistical irregularities rather than relying on global semantic features. The method uses patch-wise analysis and Maximum Mean Discrepancy to distinguish synthetic images from real ones with higher accuracy than existing detectors.
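
For readers unfamiliar with Maximum Mean Discrepancy, the following is a minimal sketch of the generic statistic only, not the MDMF method itself. It assumes each image patch has already been reduced to a small feature vector; the function names, the RBF kernel choice, and the `bandwidth` parameter are illustrative assumptions that do not come from the paper.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def rbf_kernel(a: Vector, b: Vector, bandwidth: float = 1.0) -> float:
    """Gaussian (RBF) kernel between two feature vectors."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd_squared(xs: Sequence[Vector], ys: Sequence[Vector],
                bandwidth: float = 1.0) -> float:
    """Biased (V-statistic) estimate of squared MMD between two samples.

    MMD^2 = E[k(x, x')] + E[k(y, y')] - 2 E[k(x, y)]
    """
    def mean_kernel(us: Sequence[Vector], vs: Sequence[Vector]) -> float:
        total = sum(rbf_kernel(u, v, bandwidth) for u in us for v in vs)
        return total / (len(us) * len(vs))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


# Hypothetical patch features: one reference set and one shifted set.
reference_patches = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
shifted_patches = [[3.0, 3.0], [4.0, 3.0], [3.0, 4.0]]

same_score = mmd_squared(reference_patches, reference_patches)
shift_score = mmd_squared(reference_patches, shifted_patches)
```

Under these assumptions, a score near zero suggests the two sets of patch features were drawn from similar distributions, while a larger score indicates a distributional shift. How MDMF selects patches, extracts features, or thresholds the statistic is not described in the summary above.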

AI · Bullish · arXiv – CS AI · May 12 · 6/10

Kinetic-Optimal Scheduling with Moment Correction for Metric-Induced Discrete Flow Matching in Zero-Shot Text-to-Speech

Researchers introduce GibbsTTS, a new zero-shot text-to-speech system using metric-induced discrete flow matching with kinetic-optimal scheduling and moment correction. The method achieves superior naturalness and speaker similarity compared to existing masked generative models and state-of-the-art TTS systems without requiring hyperparameter tuning.

AI · Bullish · arXiv – CS AI · May 12 · 6/10

Any2Any 3D Diffusion Models with Knowledge Transfer: A Radiotherapy Planning Study

Researchers introduced DiffKT3D, a 3D diffusion model framework that applies knowledge transfer from video diffusion models to radiotherapy dose prediction. The approach achieves state-of-the-art results by reducing prediction error by 7% compared to previous benchmarks while maintaining clinical alignment through reinforcement learning post-training.

AI · Neutral · arXiv – CS AI · May 11 · 6/10

Curated Synthetic Data Doesn't Have to Collapse: A Theoretical Study of Generative Retraining with Pluralistic Preferences

Researchers demonstrate that model collapse during recursive synthetic data retraining can be prevented by curating outputs across multiple reward functions rather than a single objective. The study provides theoretical proof that diverse preference aggregation leads to stable distributions satisfying Nash bargaining solutions, offering a framework for maintaining output diversity in AI training loops.

AI · Neutral · arXiv – CS AI · May 11 · 5/10

Drifting Field Policy: A One-Step Generative Policy via Wasserstein Gradient Flow

Researchers introduce Drifting Field Policy (DFP), a one-step generative policy that uses Wasserstein gradient flow to optimize reinforcement learning without ODE-based approaches. DFP demonstrates state-of-the-art performance on robotic manipulation tasks, suggesting a potential shift in how generative models are applied to control problems.

AI · Neutral · arXiv – CS AI · May 11 · 6/10

On the Tradeoffs of On-Device Generative Models in Federated Predictive Maintenance Systems

Researchers analyze generative models (VAEs, GANs, and Diffusion Models) within federated learning frameworks for predictive maintenance in IoT systems, revealing critical tradeoffs between model performance, communication efficiency, and training stability. The study introduces a taxonomy for partial component sharing that enables personalization while reducing bandwidth demands, with findings suggesting diffusion models may outperform alternatives in heterogeneous, bandwidth-constrained environments.

AI · Neutral · arXiv – CS AI · May 11 · 6/10

Supervised sparse auto-encoders for interpretable and compositional representations

Researchers have developed supervised sparse auto-encoders (SAEs) that improve mechanistic interpretability of neural networks by addressing non-smoothness issues in L1 penalties and aligning learned features with human semantics. Validated on Stable Diffusion 3.5, the method enables compositional generalization and feature-level interventions for semantic image editing without prompt modification.

🧠 Stable Diffusion
AI · Neutral · arXiv – CS AI · May 11 · 6/10

Multimodal synthesis of MRI and tabular data with diffusion in a joint latent space via cross-attention

Researchers have developed a multimodal latent diffusion model that simultaneously synthesizes MRI brain scans and clinical tabular data (age, sex, body measurements) within a shared latent space using cross-attention mechanisms. Tested on over 10,000 participants from the German National Cohort, the system generates anatomically plausible synthetic medical data where image and tabular attributes remain coherently aligned, representing the first successful joint modeling of volumetric medical images with mixed-type clinical data.

AI · Neutral · arXiv – CS AI · May 11 · 6/10

Do Joint Audio-Video Generation Models Understand Physics?

Researchers introduced AV-Phys Bench, a benchmark testing whether joint audio-video generation models truly understand physics or merely generate plausible outputs. Testing seven models across three scene categories, the study found all systems lack robust physical understanding, with performance collapsing on deliberately inconsistent prompts and transition-heavy scenarios.

AI · Neutral · arXiv – CS AI · May 9 · 6/10

Entropy-Regularized Adjoint Matching for Offline RL

Researchers introduce Maximum Entropy Adjoint Matching (ME-AM), a new framework for offline reinforcement learning that combines flow-matching generative policies with entropy regularization to overcome limitations in existing Q-learning approaches. The method addresses popularity bias and support binding issues that prevent agents from discovering high-reward actions in low-density regions, demonstrating competitive performance across continuous control benchmarks.

AI · Bullish · arXiv – CS AI · May 9 · 6/10

Render, Don't Decode: Weight-Space World Models with Latent Structural Disentanglement

Researchers introduce NOVA, a world modeling framework that represents scene state as weights in implicit neural representations (INRs) rather than traditional encoded latent spaces. The approach eliminates decoder bottlenecks, achieves structural disentanglement of scene components, and enables controllable video generation on consumer GPUs with only 40M parameters.

AI · Neutral · arXiv – CS AI · May 7 · 6/10

Detecting Deepfakes via Hamiltonian Dynamics

Researchers propose Hamiltonian Action Anomaly Detection (HAAD), a physics-inspired deepfake detection method that analyzes dynamical stability rather than static patterns. The approach models images as energy states, hypothesizing that authentic images settle in stable, low-energy configurations while deepfakes occupy unstable, high-energy states, demonstrating superior cross-dataset performance.
