AI

21,449 AI articles curated from 50+ sources with AI-powered sentiment analysis, importance scoring, and key takeaways.

AI · Bullish · arXiv – CS AI · Mar 3 · 5/10 · 2

Purrception: Variational Flow Matching for Vector-Quantized Image Generation

Researchers introduce Purrception, a new variational flow matching approach for AI image generation that combines continuous transport dynamics with discrete supervision. The method demonstrates faster training convergence than existing baselines while achieving competitive quality scores on ImageNet-1k 256x256 generation tasks.

AI · Neutral · arXiv – CS AI · Mar 3 · 5/10 · 3

Culture In a Frame: C$^3$B as a Comic-Based Benchmark for Multimodal Culturally Awareness

Researchers introduce C³B (Comics Cross-Cultural Benchmark), a benchmark that tests the cultural awareness of Multimodal Large Language Models (MLLMs) using over 2,000 comic images and 18,000 QA pairs. Testing revealed significant gaps between current MLLMs and human performance, highlighting the need for improved cultural understanding in AI systems.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 4

MENLO: From Preferences to Proficiency -- Evaluating and Modeling Native-like Quality Across 47 Languages

Researchers introduce MENLO, a new framework for evaluating native-like quality in large language model responses across 47 languages. The study reveals significant improvements in multilingual LLM performance through reinforcement learning and fine-tuning, though gaps with human judgment persist.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 4

Distillation of Large Language Models via Concrete Score Matching

Researchers propose Concrete Score Distillation (CSD), a knowledge distillation method for large language models that preserves logit information better than traditional softmax-based approaches. CSD demonstrates consistent performance improvements across multiple models, including GPT-2, OpenLLaMA, and Gemma, while maintaining training stability.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 3

Calibrating Verbalized Confidence with Self-Generated Distractors

Researchers introduce DINCO (Distractor-Normalized Coherence), a method to improve confidence calibration in large language models by using self-generated alternative claims to reduce overconfidence bias. The approach addresses LLM suggestibility issues that cause models to express high confidence on low-accuracy outputs, potentially improving AI safety and trustworthiness.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 2

COMRES-VLM: Coordinated Multi-Robot Exploration and Search using Vision Language Models

Researchers developed COMRES-VLM, a new framework using Vision Language Models to coordinate multiple robots for exploration and object search in indoor environments. The system achieved 10.2% faster exploration and 55.7% higher search efficiency compared to existing methods, while enabling natural language-based human guidance.

AI · Neutral · arXiv – CS AI · Mar 3 · 6/10 · 4

SpinBench: Perspective and Rotation as a Lens on Spatial Reasoning in VLMs

Researchers introduced SpinBench, a new benchmark for evaluating spatial reasoning abilities in vision language models (VLMs), focusing on perspective taking and viewpoint transformations. Testing 43 state-of-the-art VLMs revealed systematic weaknesses, including strong egocentric bias and poor rotational understanding, while humans, at 91.2% accuracy, significantly outperformed the models.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 4

EditReward: A Human-Aligned Reward Model for Instruction-Guided Image Editing

Researchers developed EditReward, a human-aligned reward model for instruction-guided image editing trained on over 200K preference pairs. The model demonstrates superior performance on established benchmarks and can effectively filter high-quality training data, addressing a key bottleneck in open-source image editing models.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 4

EasySteer: A Unified Framework for High-Performance and Extensible LLM Steering

Researchers have developed EasySteer, a unified framework for controlling large language model behavior at inference time that achieves a 10.8-22.3x speedup over existing frameworks. The system offers a modular architecture with pre-computed steering vectors for eight application domains and turns steering from a research technique into a production-ready capability.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 4

AdaBlock-dLLM: Semantic-Aware Diffusion LLM Inference via Adaptive Block Size

Researchers introduce AdaBlock-dLLM, a training-free optimization technique for diffusion-based large language models that adaptively adjusts block sizes during inference based on semantic structure. The method addresses limitations in conventional fixed-block semi-autoregressive decoding, achieving up to 5.3% accuracy improvements under the same throughput budget.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 4

Prompt and Parameter Co-Optimization for Large Language Models

Researchers introduce MetaTuner, a new framework that combines prompt optimization with fine-tuning for Large Language Models, using shared neural networks to discover optimal combinations of prompts and parameters. The approach addresses the discrete-continuous optimization challenge through supervised regularization and demonstrates consistent performance improvements across benchmarks.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 4

Group-Relative REINFORCE Is Secretly an Off-Policy Algorithm: Demystifying Some Myths About GRPO and Its Friends

Researchers demonstrate that Group Relative Policy Optimization (GRPO), traditionally viewed as an on-policy reinforcement learning algorithm, can be reinterpreted as an off-policy algorithm through first-principles analysis. This theoretical breakthrough provides new insights for optimizing reinforcement learning applications in large language models and offers principled approaches for off-policy RL algorithm design.

AI · Neutral · arXiv – CS AI · Mar 3 · 5/10 · 4

SimuHome: A Temporal- and Environment-Aware Benchmark for Smart Home LLM Agents

Researchers introduced SimuHome, a high-fidelity smart home simulator and benchmark with 600 episodes for testing LLM-based smart home agents. The system uses the Matter protocol standard and enables time-accelerated simulation to evaluate how AI agents handle device control, environmental monitoring, and workflow scheduling in smart homes.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 2

Characteristic Root Analysis and Regularization for Linear Time Series Forecasting

Researchers present a systematic study of linear models for time series forecasting, focusing on characteristic roots in temporal dynamics and introducing two regularization strategies (Reduced-Rank Regression and Root Purge) to address noise-induced spurious roots. The work achieves state-of-the-art results by combining classical linear systems theory with modern machine learning techniques.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 3

Motivating Next-Gen Accelerators with Flexible (N:M) Activation Sparsity via Benchmarking Lightweight Post-Training Sparsification Approaches

Researchers present a comprehensive analysis of post-training N:M activation pruning techniques for large language models, demonstrating that activation pruning preserves generative capabilities better than weight pruning. The study establishes hardware-friendly baselines and explores sparsity patterns beyond NVIDIA's standard 2:4, with 8:16 patterns showing superior performance while maintaining implementation feasibility.
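
To make the N:M notation concrete, here is a minimal sketch of magnitude-based N:M pruning on a flat activation vector: in every group of M consecutive values, the N largest by magnitude are kept and the rest are zeroed. The function name `nm_sparsify` and the magnitude criterion are illustrative assumptions, not the selection rules benchmarked in the paper.

```python
def nm_sparsify(values, n, m):
    """Keep the n largest-magnitude entries in each group of m consecutive values.

    Hypothetical helper for illustration only; the study may use other
    selection criteria than plain magnitude.
    """
    if not 0 < n <= m:
        raise ValueError("need 0 < n <= m")
    if len(values) % m != 0:
        raise ValueError("length must be a multiple of m")

    out = []
    for start in range(0, len(values), m):
        group = values[start:start + m]
        # Rank positions by descending magnitude; sorted() is stable,
        # so ties keep the earlier position.
        ranked = sorted(range(m), key=lambda i: -abs(group[i]))
        keep = set(ranked[:n])
        out.extend(v if i in keep else 0.0 for i, v in enumerate(group))
    return out


activations = [0.1, -2.0, 0.5, 3.0, 1.0, 0.0, -0.2, 0.3]
print(nm_sparsify(activations, 2, 4))  # 2:4 pattern: two survivors per group of four
```

A 2:4 and an 8:16 pattern both keep half of the values; the larger group simply gives more freedom in where the surviving values sit.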

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 3

Look Back to Reason Forward: Revisitable Memory for Long-Context LLM Agents

Researchers introduce ReMemR1, a new approach to improve large language models' ability to handle long-context question answering by integrating memory retrieval into the memory update process. The system enables non-linear reasoning through selective callback of historical memories and uses multi-level reward design to strengthen training.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 3

Quantile Advantage Estimation: Stabilizing RLVR for LLM Reasoning

Researchers propose Quantile Advantage Estimation (QAE) to stabilize Reinforcement Learning with Verifiable Rewards (RLVR) for large language model reasoning. The method replaces mean baselines with group-wise K-quantile baselines to prevent entropy collapse and explosion, showing sustained improvements on mathematical reasoning tasks.
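
The difference between a mean baseline and a quantile baseline can be sketched in a few lines. The example below is an illustration under stated assumptions, not the paper's implementation: the function names are hypothetical, the quantile uses a simple lower empirical convention, and any normalization the method may apply is omitted.

```python
import math


def quantile_baseline(rewards, k):
    """Lower empirical k-quantile of one group's rewards (assumed convention)."""
    ordered = sorted(rewards)
    idx = min(len(ordered) - 1, max(0, math.ceil(k * len(ordered)) - 1))
    return ordered[idx]


def quantile_advantages(rewards, k):
    """Advantage of each sampled response relative to the group's k-quantile."""
    baseline = quantile_baseline(rewards, k)
    return [r - baseline for r in rewards]


def mean_advantages(rewards):
    """Advantage relative to the group mean, shown for comparison."""
    mean = sum(rewards) / len(rewards)
    return [r - mean for r in rewards]


hard_prompt = [1, 0, 0, 0]  # one verified-correct response out of four
easy_prompt = [1, 1, 1, 0]  # three verified-correct responses out of four

print(mean_advantages(hard_prompt))           # every response gets a nonzero signal
print(quantile_advantages(hard_prompt, 0.5))  # only the rare success is nonzero
print(quantile_advantages(easy_prompt, 0.5))  # only the remaining failure is nonzero
```

In this sketch, with binary rewards and K = 0.5, the quantile baseline leaves most responses in a group with zero advantage, whereas the mean baseline pushes on every sample.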

AI · Neutral · arXiv – CS AI · Mar 3 · 6/10 · 3

Chasing the Tail: Effective Rubric-based Reward Modeling for Large Language Model Post-Training

Researchers propose rubric-based reward modeling to address reward over-optimization in large language model fine-tuning. The approach focuses on the high-reward tail where models struggle to distinguish excellent responses from merely great ones, using off-policy examples to improve training effectiveness.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 3

Does FLUX Already Know How to Perform Physically Plausible Image Composition?

Researchers introduce SHINE, a training-free framework that enables FLUX and other diffusion models to perform high-quality image composition without retraining. The framework addresses complex lighting scenarios such as shadows and reflections, achieving state-of-the-art performance on the new ComplexCompo benchmark.

AI · Neutral · arXiv – CS AI · Mar 3 · 6/10 · 4

GraphUniverse: Synthetic Graph Generation for Evaluating Inductive Generalization

Researchers introduce GraphUniverse, a new framework for generating synthetic graph families to evaluate how AI models generalize to unseen graph structures. The study reveals that strong performance on single graphs doesn't predict generalization ability, highlighting a critical gap in current graph learning evaluation methods.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 4

Post-training Large Language Models for Diverse High-Quality Responses

Researchers have developed DQO (Diversity Quality Optimization), a new training method that uses determinantal point processes to improve large language models' response diversity while maintaining quality. The approach addresses a key limitation of current reinforcement learning methods that tend to narrow LLM outputs to canonical responses.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 3

Next Visual Granularity Generation

Researchers have introduced Next Visual Granularity (NVG), a new AI image generation framework that creates images by progressively refining visual details from global layout to fine granularity. The approach outperforms existing VAR models on ImageNet, achieving better FID scores and offering fine-grained control over the generation process.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 4

MOON: Generative MLLM-based Multimodal Representation Learning for E-commerce Product Understanding

Researchers propose MOON, the first generative multimodal large language model designed specifically for e-commerce product understanding. The model addresses key challenges in product representation learning through guided Mixture-of-Experts modules and semantic region detection, while introducing a new benchmark dataset for evaluation.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 4

Generating Multi-Table Time Series EHR from Latent Space with Minimal Preprocessing

Researchers have developed RawMed, the first framework to generate synthetic multi-table time-series Electronic Health Records (EHR) that closely resemble raw medical data. The system addresses privacy concerns in healthcare data sharing while maintaining fidelity and utility, outperforming baseline models in validation tests.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 4

FMIP: Joint Continuous-Integer Flow For Mixed-Integer Linear Programming

Researchers have developed FMIP, a new generative AI framework that models both integer and continuous variables simultaneously to solve Mixed-Integer Linear Programming problems more efficiently. The approach reduces the primal gap by 41.34% on average compared to existing baselines and is compatible with various downstream solvers.
