AI · Neutral · arXiv – CS AI · 4d ago · 6/10
🧠Researchers introduce DSA-Tokenizer, a novel speech tokenization system that separates semantic content from acoustic style using distinct optimization paths and Flow Matching decoders. The approach enables discrete Speech LLMs to achieve better disentanglement while supporting efficient voice cloning and high-fidelity speech generation with minimal inference steps.
AI · Neutral · arXiv – CS AI · 4d ago · 6/10
🧠Researchers introduce DecoupleGen, a method that uses personalized text-to-image diffusion models to generate training data featuring objects in rare contextual scenarios. This approach addresses a critical limitation in computer vision models that perform better on common object-context combinations, potentially improving recognition accuracy for edge cases without requiring expensive real-world data collection.
AI · Neutral · arXiv – CS AI · 4d ago · 6/10
🧠Researchers introduce GICDM, an improved method for evaluating generative models that corrects the hubness phenomenon—a distortion in high-dimensional spaces that skews distance-based metrics and nearest-neighbor relationships. The technique builds on classical ICDM and includes multi-scale extensions, demonstrating improved alignment with human assessment across synthetic and real benchmarks.
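The hubness phenomenon mentioned above can be made concrete with a small diagnostic. The sketch below is not GICDM or ICDM. It only measures hubness in the usual way, by counting how often each point appears in other points' k-nearest-neighbor lists and taking the skewness of those counts. The function names `k_occurrence` and `skewness`, and the choice of Euclidean distance, are assumptions made for illustration.

```python
import numpy as np

def k_occurrence(points, k):
    """Count how often each point appears in other points' k-nearest-neighbor lists."""
    sq_norms = (points ** 2).sum(axis=1)
    # Pairwise squared Euclidean distances via the Gram matrix.
    sq_dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * points @ points.T
    np.fill_diagonal(sq_dist, np.inf)  # a point is not its own neighbor
    neighbors = np.argsort(sq_dist, axis=1)[:, :k]
    return np.bincount(neighbors.ravel(), minlength=len(points))

def skewness(counts):
    """Third standardized moment; large positive values indicate a few dominant hubs."""
    centered = counts - counts.mean()
    return (centered ** 3).mean() / (centered ** 2).mean() ** 1.5
```

Running `skewness(k_occurrence(x, 5))` on Gaussian samples of increasing dimension typically shows the skew growing with dimension, which is the distortion that hubness-corrected metrics aim to reduce.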
AI · Neutral · arXiv – CS AI · 4d ago · 6/10
🧠Researchers introduce TriProRep, a protein representation learning method that jointly models amino acid identity, backbone geometry, and full-atom geometry to improve protein structure prediction. The new approach outperforms sequence-only and prior structure-aware models across multiple benchmarks including homodimer co-folding and monomer structure prediction tasks.
AI · Neutral · arXiv – CS AI · 4d ago · 6/10
🧠Researchers propose CAT (Cross-scale Aligned Transformer), a new GAN training method that addresses the cross-scale trajectory misalignment problem in multi-stage image generation. By adding consistency regularization between intermediate and final outputs, CAT achieves state-of-the-art results on ImageNet-256 with one-step inference, reaching FID-50K of 1.56 after just 60 training epochs.
AI · Neutral · arXiv – CS AI · 4d ago · 6/10
🧠Researchers introduce MatFormBench, a comprehensive benchmarking framework designed to evaluate inverse design algorithms for materials formulation—addressing a critical gap in machine learning benchmarks that previously focused only on forward property prediction. The framework tests 39 diverse algorithms across 1,170 evaluations, revealing that diffusion-based models achieve superior overall performance, while VAE and genetic algorithm approaches excel in specific scenarios.
AI · Neutral · arXiv – CS AI · 4d ago · 6/10
🧠Researchers introduce Social Gaze Consistency as a novel method to detect AI-generated images by analyzing the coherence of eye direction and head-eye alignment between people. The technique achieves meaningful improvements in detection accuracy across multiple vision models, suggesting that high-level semantic features offer advantages over traditional low-level artifact detection as generative models become more sophisticated.
AI · Neutral · arXiv – CS AI · 4d ago · 6/10
🧠Researchers introduce PolyFusionAgent, a multimodal AI framework combining a foundation model (PolyFusion) with an autonomous design agent (PolyAgent) for polymer discovery. The system integrates multiple polymer representations into a shared latent space to predict properties and generate novel structures, while grounding predictions in scientific literature for actionable design decisions.
AI · Neutral · arXiv – CS AI · 4d ago · 6/10
🧠Researchers introduce CasArbi, a self-cascaded diffusion framework that enables arbitrary-scale image super-resolution by decomposing scaling factors into sequential steps rather than handling them simultaneously. The method combines coordinate-conditioned diffusion models with self-consistency guidance to achieve superior scale consistency and outperforms existing approaches on multiple benchmarks.
AI · Neutral · arXiv – CS AI · May 12 · 6/10
🧠Researchers develop a generative AI model that integrates social determinants of health (SDoH) with multi-organ sensor data and medical events to improve disease prediction and personalized clinical decision support. Tested on UK Biobank data spanning nearly 500,000 medical histories, the model outperforms existing autoregressive disease prediction systems by explicitly modeling socioeconomic factors alongside imaging and biomarker data.
AI · Neutral · arXiv – CS AI · May 12 · 6/10
🧠SLayerGen introduces a generative AI model capable of creating crystal structures constrained to space and layer groups, addressing limitations in existing models that fail to account for diperiodic materials like 2D superconductors and thin film semiconductors. The model combines discrete autoregressive lattice generation, transformer-based sampling, and equivariant diffusion, achieving superior performance on layered material discovery while correcting mathematical inconsistencies in prior diffusion approaches.
AI · Neutral · arXiv – CS AI · May 12 · 6/10
🧠Researchers propose Cosine-Aware Adaptive Elastic Weight Consolidation (EWC) to strengthen backdoor attacks on text-to-image models while preserving model fidelity and generalization. The method addresses a fundamental trade-off between attack success and output quality by dynamically adjusting regularization weights based on semantic utility, achieving stronger performance on both in-domain and out-of-domain datasets compared to existing approaches.
AI · Neutral · arXiv – CS AI · May 12 · 6/10
🧠Researchers propose Bridge Matching, a novel framework that decomposes stochastic generative model dynamics into deterministic transport and diffusion-induced osmotic effects. This decomposition enables more interpretable and controllable generative sampling by separately parameterizing how probability mass moves versus how stochastic fluctuations affect the process.
AI · Neutral · arXiv – CS AI · May 12 · 6/10
🧠Researchers propose MDMF, a detection framework that identifies AI-generated images by amplifying micro-scale statistical irregularities rather than relying on global semantic features. The method uses patch-wise analysis and Maximum Mean Discrepancy to distinguish synthetic images from real ones with higher accuracy than existing detectors.
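As a rough illustration of the statistic named above, the following sketch computes a biased estimate of squared Maximum Mean Discrepancy between two sets of flattened image patches. It is not the MDMF detector. The Gaussian kernel, the fixed `bandwidth`, the non-overlapping patch grid, and the names `rbf_kernel`, `mmd2`, and `extract_patches` are all assumptions chosen for illustration.

```python
import numpy as np

def rbf_kernel(a, b, bandwidth):
    """Gaussian kernel matrix between the rows of a and the rows of b."""
    sq = (a ** 2).sum(axis=1)[:, None] + (b ** 2).sum(axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth ** 2))

def mmd2(x, y, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD between samples x and y."""
    return (rbf_kernel(x, x, bandwidth).mean()
            + rbf_kernel(y, y, bandwidth).mean()
            - 2.0 * rbf_kernel(x, y, bandwidth).mean())

def extract_patches(image, size):
    """Split a 2D grayscale image into non-overlapping size x size patches, flattened."""
    height, width = image.shape
    patches = [
        image[i:i + size, j:j + size].ravel()
        for i in range(0, height - size + 1, size)
        for j in range(0, width - size + 1, size)
    ]
    return np.array(patches)
```

One way to use it would be to compare `extract_patches(test_image, 8)` against patches drawn from known real images: a larger `mmd2` value suggests the patch statistics of the test image differ from the reference set. Any decision threshold would have to be calibrated on data.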
AI · Bullish · arXiv – CS AI · May 12 · 6/10
🧠Researchers introduce GibbsTTS, a new zero-shot text-to-speech system using metric-induced discrete flow matching with kinetic-optimal scheduling and moment correction. The method achieves superior naturalness and speaker similarity compared to existing masked generative models and state-of-the-art TTS systems without requiring hyperparameter tuning.
AI · Bullish · arXiv – CS AI · May 12 · 6/10
🧠Researchers introduce DiffKT3D, a 3D diffusion model framework that transfers knowledge from video diffusion models to radiotherapy dose prediction. The approach achieves state-of-the-art results, reducing prediction error by 7% relative to previous benchmarks while maintaining clinical alignment through reinforcement learning post-training.
AI · Neutral · arXiv – CS AI · May 11 · 6/10
🧠Researchers demonstrate that model collapse during recursive synthetic data retraining can be prevented by curating outputs across multiple reward functions rather than a single objective. The study provides theoretical proof that diverse preference aggregation leads to stable distributions satisfying Nash bargaining solutions, offering a framework for maintaining output diversity in AI training loops.
AI · Neutral · arXiv – CS AI · May 11 · 5/10
🧠Researchers introduce Drifting Field Policy (DFP), a one-step generative policy that uses Wasserstein gradient flow to optimize reinforcement learning without ODE-based approaches. DFP demonstrates state-of-the-art performance on robotic manipulation tasks, suggesting a potential shift in how generative models are applied to control problems.
AI · Neutral · arXiv – CS AI · May 11 · 6/10
🧠Researchers analyze generative models (VAEs, GANs, and Diffusion Models) within federated learning frameworks for predictive maintenance in IoT systems, revealing critical tradeoffs between model performance, communication efficiency, and training stability. The study introduces a taxonomy for partial component sharing that enables personalization while reducing bandwidth demands, with findings suggesting diffusion models may outperform alternatives in heterogeneous, bandwidth-constrained environments.
AI · Neutral · arXiv – CS AI · May 11 · 6/10
🧠Researchers have developed supervised sparse auto-encoders (SAEs) that improve mechanistic interpretability of neural networks by addressing non-smoothness issues in L1 penalties and aligning learned features with human semantics. Validated on Stable Diffusion 3.5, the method enables compositional generalization and feature-level interventions for semantic image editing without prompt modification.
AI · Neutral · arXiv – CS AI · May 11 · 6/10
🧠Researchers have developed a multimodal latent diffusion model that simultaneously synthesizes MRI brain scans and clinical tabular data (age, sex, body measurements) within a shared latent space using cross-attention mechanisms. Tested on over 10,000 participants from the German National Cohort, the system generates anatomically plausible synthetic medical data where image and tabular attributes remain coherently aligned, representing the first successful joint modeling of volumetric medical images with mixed-type clinical data.
AI · Neutral · arXiv – CS AI · May 11 · 6/10
🧠Researchers introduced AV-Phys Bench, a benchmark testing whether joint audio-video generation models truly understand physics or merely generate plausible outputs. Testing seven models across three scene categories, the study found all systems lack robust physical understanding, with performance collapsing on deliberately inconsistent prompts and transition-heavy scenarios.
AI · Neutral · arXiv – CS AI · May 9 · 6/10
🧠Researchers introduce Maximum Entropy Adjoint Matching (ME-AM), a new framework for offline reinforcement learning that combines flow-matching generative policies with entropy regularization to overcome limitations in existing Q-learning approaches. The method addresses popularity bias and support binding issues that prevent agents from discovering high-reward actions in low-density regions, demonstrating competitive performance across continuous control benchmarks.
AI · Bullish · arXiv – CS AI · May 9 · 6/10
🧠Researchers introduce NOVA, a world modeling framework that represents scene state as weights in implicit neural representations (INRs) rather than traditional encoded latent spaces. The approach eliminates decoder bottlenecks, achieves structural disentanglement of scene components, and enables controllable video generation on consumer GPUs with only 40M parameters.
AI · Neutral · arXiv – CS AI · May 7 · 6/10
🧠Researchers propose Hamiltonian Action Anomaly Detection (HAAD), a physics-inspired deepfake detection method that analyzes dynamical stability rather than static patterns. The approach models images as energy states, hypothesizing that authentic images settle in stable, low-energy configurations while deepfakes occupy unstable, high-energy states, demonstrating superior cross-dataset performance.