AI · Bullish · arXiv – CS AI · 2d ago · 7/10
🧠FlowTime introduces a novel 'Continuous Generative Regression' paradigm for watch time prediction in short-video recommender systems, addressing limitations of existing regression, ordinal, and discrete generative approaches. The method uses flow-based personalized priors within a one-step generative VAE to model multimodal user-item interaction patterns while reducing inference latency, demonstrating superior performance in both offline experiments and A/B testing.
AI · Bullish · arXiv – CS AI · May 12 · 7/10
🧠Researchers present a weakly supervised learning approach that combines neural networks with symbolic AI for object-centric reasoning tasks, requiring only 1% of typical labels while outperforming foundation models in domain generalization. The method bridges perception and logical reasoning by using slot-based architectures and VAEs to ground symbolic outputs for frameworks like Inductive Logic Programming.
AI · Bullish · arXiv – CS AI · Mar 4 · 7/10 · 3
🧠Researchers introduce LaDiR (Latent Diffusion Reasoner), a framework that combines continuous latent representations with iterative refinement to strengthen the reasoning of Large Language Models. The system uses a Variational Autoencoder to encode reasoning steps and a latent diffusion model to generate diverse reasoning trajectories in parallel, showing improved accuracy and interpretability on mathematical reasoning benchmarks.
AI · Neutral · arXiv – CS AI · 20h ago · 6/10
🧠SymTRELLIS introduces a method to enforce geometric symmetries in 3D generative models without retraining underlying systems, using learned linear operators on voxel latents and velocity symmetrization during generation. The technique substantially reduces symmetry violations across rotational, reflectional, and polyhedral symmetries compared to existing models like TRELLIS.2 and Hunyuan3D-2.1.
AI · Neutral · arXiv – CS AI · 2d ago · 6/10
🧠Researchers propose a self-supervised framework for monocular depth and pose estimation in endoscopy using a Generative Latent Bank and VAE to improve 3D mapping of the gastrointestinal tract. The method achieves superior performance over existing self-supervised approaches on standard endoscopic datasets without requiring synthetic training data.
AI · Neutral · arXiv – CS AI · 6d ago · 6/10
🧠Researchers demonstrate that VAE-based world models develop organized spatial semantic representations through physical exploration alone, without linguistic input. The geometric structure of the physical world emerges as the primary organizing principle, with prediction performance and semantic alignment improving together across training, suggesting a shared underlying mechanism.
AI · Neutral · arXiv – CS AI · 6d ago · 6/10
🧠A new mathematical primer on arXiv provides a foundational, derivation-focused introduction to generative AI models, systematically connecting PCA, VAEs, diffusion models, normalizing flows, GANs, and energy-based models through coherent mathematical frameworks rather than surveying recent architectures.
AI · Bullish · arXiv – CS AI · May 12 · 6/10
🧠Researchers have identified why diffusion transformers (DiTs) degrade in quality during multi-turn image editing and proposed VAE-LFA, a training-free alignment method that operates in VAE latent space to suppress accumulated semantic drift. The solution works with both white-box and black-box models by aligning low-frequency components across editing rounds while preserving high-frequency details.
AI · Neutral · arXiv – CS AI · Apr 13 · 6/10
🧠Researchers introduce ASPECT, a novel reinforcement learning framework that uses large language models as semantic operators to enable zero-shot transfer learning across novel tasks. By conditioning a text-based VAE on LLM-generated task descriptions, the approach allows agents to reuse policies on structurally similar but previously unseen tasks without discrete category constraints.
AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
🧠Researchers propose a dual-path AI framework combining Variational Autoencoders and Wasserstein GANs for real-time fraud detection in banking systems. The system achieves sub-50ms detection latency while maintaining GDPR compliance through selective explainability mechanisms for high-uncertainty transactions.
AI · Neutral · Lil'Log (Lilian Weng) · Oct 13 · 4/10
🧠This article introduces flow-based deep generative models as a third type of generative AI model that, unlike GANs and VAEs, explicitly learns the probability density function of input data. The piece explains the mathematical challenges in calculating probability density functions due to the intractability of integrating over all possible latent variable values.
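The intractability mentioned here can be made concrete with a short sketch in standard textbook notation. The symbols $x$, $z$, $f$, $p_X$, and $p_Z$ are assumptions chosen for illustration and are not taken from the article. In a latent-variable model the density of $x$ requires an integral over every possible $z$, whereas a flow uses an invertible map $z = f(x)$ so that the density follows from the change-of-variables formula:

```latex
\begin{align}
  % Latent-variable model: the marginal integrates over all latent values z,
  % which is generally intractable.
  p_X(x) &= \int p(x \mid z)\, p_Z(z)\, dz \\
  % Flow-based model: with an invertible, differentiable map z = f(x),
  % the density is given exactly by the change-of-variables formula.
  p_X(x) &= p_Z\bigl(f(x)\bigr)\,
            \left|\det \frac{\partial f(x)}{\partial x}\right|
\end{align}
```

The second expression can be evaluated exactly whenever the Jacobian determinant of $f$ is cheap to compute, which is the property flow architectures are typically designed around.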