y0news
🧠 AI

13,004 AI articles curated from 50+ sources with AI-powered sentiment analysis, importance scoring, and key takeaways.
AI · Bullish · arXiv – CS AI · Mar 37/106

M-Gaussian: A Magnetic Gaussian Framework for Efficient Multi-Stack MRI Reconstruction

Researchers developed M-Gaussian, a new AI framework that adapts 3D Gaussian Splatting for efficient multi-stack MRI reconstruction. The method achieves 40.31 dB PSNR while being 14 times faster than existing implicit neural representation methods, offering improved balance between quality and computational efficiency.
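
The quality metric quoted here, PSNR, is a direct function of mean squared error. As a reference point, a minimal sketch with made-up signal values (not data from the paper):

```python
import math

def psnr(reference, reconstruction, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equal-length signals."""
    mse = sum((r - x) ** 2 for r, x in zip(reference, reconstruction)) / len(reference)
    if mse == 0:
        return float("inf")
    return 10.0 * math.log10(max_value ** 2 / mse)

# Toy signal with one sample off by 0.1: MSE = 0.0025, PSNR ≈ 26 dB.
clean = [0.2, 0.5, 0.8, 0.1]
noisy = [0.2, 0.5, 0.8, 0.2]
print(round(psnr(clean, noisy), 2))  # → 26.02
```

Each additional 10 dB corresponds to a tenfold reduction in mean squared error, so the 40.31 dB reported above implies a far smaller reconstruction error than this toy case.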

AI · Bullish · arXiv – CS AI · Mar 37/106

Expert Divergence Learning for MoE-based Language Models

Researchers introduce Expert Divergence Learning, a new pre-training strategy for Mixture-of-Experts language models that prevents expert homogenization by encouraging functional specialization. The method uses domain labels to maximize routing distribution differences between data domains, achieving better performance on 15 billion parameter models with minimal computational overhead.
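
The divergence-maximizing idea can be illustrated with a toy routing example. Everything below (expert count, distributions) is hypothetical, and the paper's actual objective may differ:

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete expert-routing distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

# Hypothetical routing over 4 experts for two data domains: a divergence-
# maximizing objective rewards large values here, pushing each domain
# toward its own subset of experts instead of a homogenized average.
routing_code = [0.70, 0.10, 0.10, 0.10]
routing_news = [0.10, 0.70, 0.10, 0.10]
divergence = kl_divergence(routing_code, routing_news)
print(divergence > 0.0)  # → True
```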

AI · Bullish · arXiv – CS AI · Mar 36/107

M3-AD: Reflection-aware Multi-modal, Multi-category, and Multi-dimensional Benchmark and Framework for Industrial Anomaly Detection

Researchers propose M3-AD, a new reflection-aware multimodal framework that improves industrial anomaly detection using large language models. The system includes RA-Monitor technology that enables AI models to self-correct unreliable decisions, outperforming existing open-source and commercial models in zero-shot anomaly detection tasks.

AI · Neutral · arXiv – CS AI · Mar 37/106

MOSAIC: Unveiling the Moral, Social and Individual Dimensions of Large Language Models

Researchers introduce MOSAIC, the first comprehensive benchmark to evaluate moral, social, and individual characteristics of Large Language Models beyond traditional Moral Foundation Theory. The benchmark includes over 600 curated questions and scenarios from nine validated questionnaires and four platform-based games, providing empirical evidence that current evaluation methods are insufficient for assessing AI ethics comprehensively.

AI · Bullish · arXiv – CS AI · Mar 36/108

DeepXiv-SDK: An Agentic Data Interface for Scientific Papers

DeepXiv-SDK introduces a new agentic data interface for scientific papers that enables AI research agents to access and process academic literature more efficiently. The SDK provides structured, budget-aware views of papers and supports progressive access patterns, currently deployed at arXiv scale with free API access.

AI · Bearish · arXiv – CS AI · Mar 36/106

Stochastic Parrots or Singing in Harmony? Testing Five Leading LLMs for their Ability to Replicate a Human Survey with Synthetic Data

Researchers compared human survey responses from 420 Silicon Valley developers with synthetic data from five leading LLMs including ChatGPT, Claude, and Gemini. While AI models produced technically plausible results, they failed to capture counterintuitive insights and only replicated conventional wisdom rather than revealing novel findings.

AI · Neutral · arXiv – CS AI · Mar 37/107

What Is the Geometry of the Alignment Tax?

Researchers present a formal geometric theory for quantifying the alignment tax: the tradeoff between AI safety and capability performance. They derive mathematical frameworks showing how safety-capability conflicts can be measured as angles between representation subspaces, and provide scaling laws for how these tradeoffs evolve with model size.
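
In the simplest one-direction case, the subspace-angle idea reduces to an ordinary angle between vectors. A toy sketch with made-up directions (the paper works with full subspaces, not single vectors):

```python
import math

def subspace_angle_deg(u, v):
    """Angle in degrees between two directions in representation space
    (the one-dimensional case of a principal angle between subspaces)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return math.degrees(math.acos(dot / (norm_u * norm_v)))

# Made-up 3-d directions standing in for safety and capability features:
# 0° would mean no conflict, 90° a fully orthogonal (independent) pair.
safety = [1.0, 0.0, 0.0]
capability = [1.0, 1.0, 0.0]
print(round(subspace_angle_deg(safety, capability), 1))  # → 45.0
```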

AI · Bullish · arXiv – CS AI · Mar 37/107

Attn-QAT: 4-Bit Attention With Quantization-Aware Training

Researchers introduce Attn-QAT, the first systematic approach to 4-bit quantization-aware training for attention mechanisms in AI models. The method enables stable FP4 computation on emerging GPUs and delivers up to 1.5x speedup on RTX 5090 while maintaining model quality across diffusion and language models.
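
The quantize-dequantize step at the heart of quantization-aware training can be sketched in a few lines. This is a generic symmetric fake-quantization sketch with made-up weights, not Attn-QAT's actual FP4 scheme; in real QAT the rounding is paired with a straight-through gradient estimator:

```python
def fake_quantize(values, bits=4):
    """Symmetric quantize-dequantize: round each value onto a small signed
    integer grid, then map back to floats."""
    qmax = 2 ** (bits - 1) - 1                  # 7 for 4-bit signed
    scale = max(abs(v) for v in values) / qmax  # map max magnitude to qmax
    return [round(v / scale) * scale for v in values]

weights = [0.91, -0.33, 0.05, -0.70]
print(fake_quantize(weights))  # only a handful of distinct levels survive
```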

AI · Bullish · arXiv – CS AI · Mar 37/108

Maximizing the Spectral Energy Gain in Sub-1-Bit LLMs via Latent Geometry Alignment

Researchers introduce LittleBit-2, a new framework for extreme compression of large language models that achieves sub-1-bit quantization while maintaining performance comparable to 1-bit baselines. The method uses Internal Latent Rotation and Joint Iterative Quantization to solve geometric alignment issues in binary quantization, establishing new state-of-the-art results on Llama-2 and Llama-3 models.
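
The 1-bit baseline these methods push below can be sketched with textbook sign-times-scale binarization. This shows only the generic baseline, not LittleBit-2's latent-rotation scheme, and the data is made up:

```python
def binarize(row):
    """1-bit baseline: each weight becomes sign(w) * alpha, where alpha is
    the row's mean absolute value (the L1-optimal per-row scale)."""
    alpha = sum(abs(w) for w in row) / len(row)
    return [alpha if w >= 0 else -alpha for w in row]

row = [0.4, -0.2, 0.1, -0.5]
print(binarize(row))  # signs preserved, magnitudes collapsed to one level
```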

AI · Bullish · arXiv – CS AI · Mar 36/108

Reinforcement Learning for Control with Probabilistic Stability Guarantee: A Finite-Sample Approach

Researchers have developed L-REINFORCE, a novel reinforcement learning algorithm that provides probabilistic stability guarantees for control systems using finite data samples. The approach bridges reinforcement learning and control theory by extending classical REINFORCE algorithms with Lyapunov stability methods, demonstrating superior performance in Cartpole simulations.
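
For context, the classical REINFORCE update the paper extends can be sketched on a trivial bandit. This shows only the vanilla policy gradient, none of the Lyapunov machinery; all names and numbers are illustrative:

```python
import math
import random

def reinforce(steps=2000, lr=0.5, seed=0):
    """Classical REINFORCE on a one-parameter Bernoulli policy: action 1
    always pays reward 1 and action 0 pays 0, so the policy-gradient
    updates should drive theta (and hence P(action 1)) upward."""
    rng = random.Random(seed)
    theta = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + math.exp(-theta))   # P(action = 1) = sigmoid(theta)
        action = 1 if rng.random() < p else 0
        reward = float(action)
        grad_log_pi = action - p             # d/dtheta of log pi(action)
        theta += lr * reward * grad_log_pi
    return theta

theta = reinforce()
print(theta > 1.0)  # → True: the policy has learned to prefer action 1
```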

AI · Bullish · arXiv – CS AI · Mar 36/107

ActMem: Bridging the Gap Between Memory Retrieval and Reasoning in LLM Agents

Researchers propose ActMem, a novel memory framework for LLM agents that combines memory retrieval with active causal reasoning to handle complex decision-making scenarios. The framework transforms dialogue history into structured causal graphs and uses counterfactual reasoning to resolve conflicts between past states and current intentions, significantly outperforming existing baselines in memory-dependent tasks.

AI · Neutral · arXiv – CS AI · Mar 37/109

Property-Driven Evaluation of GNN Expressiveness at Scale: Datasets, Framework, and Study

Researchers developed a comprehensive evaluation framework for Graph Neural Networks (GNNs) using formal specification methods, creating 336 new datasets to test GNN expressiveness across 16 fundamental graph properties. The study reveals that no single pooling approach consistently performs well across all properties, with attention-based pooling excelling in generalization while second-order pooling provides better sensitivity.
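
The pooling tradeoff the study describes is easy to see in miniature. A toy sketch with hypothetical node features (simple readouts, not the attention-based or second-order pooling the study evaluates):

```python
def mean_pool(node_feats):
    return [sum(col) / len(col) for col in zip(*node_feats)]

def sum_pool(node_feats):
    return [sum(col) for col in zip(*node_feats)]

# Two graphs whose nodes carry identical features but differ in size:
# mean pooling cannot distinguish them, sum pooling can, so each readout
# is sensitive to different graph properties.
g_small = [[1.0, 0.0]] * 2   # 2 nodes
g_large = [[1.0, 0.0]] * 5   # 5 nodes
print(mean_pool(g_small) == mean_pool(g_large))  # → True
print(sum_pool(g_small) == sum_pool(g_large))    # → False
```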

AI · Neutral · arXiv – CS AI · Mar 37/106

StaTS: Spectral Trajectory Schedule Learning for Adaptive Time Series Forecasting with Frequency Guided Denoiser

Researchers introduce StaTS, a new diffusion model for time series forecasting that learns adaptive noise schedules and uses frequency-guided denoising. The model addresses limitations of fixed noise schedules in existing diffusion models by incorporating spectral regularization and data-adaptive scheduling for improved structural preservation.
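
For contrast with the learned schedules described above, here is the standard fixed cosine schedule that such adaptive methods set out to replace: a textbook sketch, not StaTS itself.

```python
import math

def cosine_alpha_bar(t, T, s=0.008):
    """alpha_bar(t): the cumulative signal level of the standard fixed
    cosine schedule, decaying from 1 (clean data) toward 0 (pure noise)."""
    f = lambda u: math.cos((u / T + s) / (1 + s) * math.pi / 2) ** 2
    return f(t) / f(0)

T = 1000
levels = [cosine_alpha_bar(t, T) for t in range(0, T + 1, 250)]
print(all(a > b for a, b in zip(levels, levels[1:])))  # → True (monotone decay)
```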

AI · Bullish · arXiv – CS AI · Mar 37/107

CARE: Confounder-Aware Aggregation for Reliable LLM Evaluation

Researchers introduce CARE, a new framework for improving LLM evaluation by addressing correlated errors in AI judge ensembles. The method separates true quality signals from confounding factors like verbosity and style preferences, achieving up to 26.8% error reduction across 12 benchmarks.
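
Separating a quality signal from a confounder like verbosity can be illustrated with a one-variable least-squares adjustment. A toy sketch with hypothetical scores, not CARE's actual estimator:

```python
def debias(scores, confounder):
    """Remove the linear effect of a confounder (e.g. response length)
    from judge scores: fit a one-variable least-squares slope, keep the
    residuals re-centered on the original mean score."""
    n = len(scores)
    mean_s = sum(scores) / n
    mean_c = sum(confounder) / n
    cov = sum((c - mean_c) * (s - mean_s) for c, s in zip(confounder, scores))
    var = sum((c - mean_c) ** 2 for c in confounder)
    slope = cov / var
    return [s - slope * (c - mean_c) for s, c in zip(scores, confounder)]

# Hypothetical judge scores that track response length one-for-one:
# after adjustment the verbosity effect vanishes and all scores agree.
lengths = [100, 200, 300, 400]
scores = [6.0, 7.0, 8.0, 9.0]
print(debias(scores, lengths))  # ≈ [7.5, 7.5, 7.5, 7.5]
```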

AI · Bullish · arXiv – CS AI · Mar 37/107

Conformal Policy Control

Researchers have developed a conformal policy control method that enables AI agents to safely explore new behaviors while maintaining strict safety constraints. The approach uses safe reference policies as probabilistic regulators to determine how aggressively new policies can act, providing finite-sample guarantees without requiring specific model assumptions or hyperparameter tuning.
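
The finite-sample guarantee rests on the standard conformal quantile construction. A generic sketch (the scores and the way the threshold would be used are hypothetical, not the paper's exact procedure):

```python
import math

def conformal_threshold(calibration_scores, alpha=0.1):
    """Finite-sample conformal quantile: the ceil((n+1)*(1-alpha))-th
    smallest of n calibration scores upper-bounds a fresh exchangeable
    score with probability at least 1 - alpha, with no model assumptions."""
    n = len(calibration_scores)
    rank = math.ceil((n + 1) * (1 - alpha))  # requires n large enough that rank <= n
    return sorted(calibration_scores)[rank - 1]

# Hypothetical per-step risk scores collected under a safe reference policy.
scores = [0.11, 0.25, 0.03, 0.42, 0.18, 0.30, 0.07, 0.22, 0.15, 0.38]
print(conformal_threshold(scores, alpha=0.2))  # → 0.38
```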

AI · Bullish · arXiv – CS AI · Mar 37/108

Nano-EmoX: Unifying Multimodal Emotional Intelligence from Perception to Empathy

Researchers have developed Nano-EmoX, a compact 2.2B parameter multimodal language model that unifies emotional intelligence tasks across perception, understanding, and interaction levels. The model achieves state-of-the-art performance on six core affective tasks using a novel curriculum-based training framework called P2E (Perception-to-Empathy).

AI · Bullish · arXiv – CS AI · Mar 37/107

Tool Verification for Test-Time Reinforcement Learning

Researchers introduce T³RL (Tool-Verification for Test-Time Reinforcement Learning), a new method that improves self-evolving AI reasoning models by using external tool verification to prevent incorrect learning from biased consensus. The approach shows significant improvements on mathematical problem-solving tasks, with larger gains on harder problems.

AI · Neutral · arXiv – CS AI · Mar 36/107

Pencil Puzzle Bench: A Benchmark for Multi-Step Verifiable Reasoning

Researchers introduced Pencil Puzzle Bench, a new framework for evaluating large language model reasoning capabilities using constraint-satisfaction problems. The benchmark tested 51 models across 300 puzzles, revealing significant performance improvements through increased reasoning effort and iterative verification processes.
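
The verifiable-reasoning angle comes from the fact that pencil-puzzle solutions can be checked mechanically. A generic Latin-square checker as a sketch, not the benchmark's actual harness:

```python
def is_valid_latin_square(grid):
    """Constraint check for an n x n Latin square: every row and every
    column must contain each of the symbols 1..n exactly once."""
    n = len(grid)
    symbols = set(range(1, n + 1))
    rows_ok = all(set(row) == symbols for row in grid)
    cols_ok = all(set(col) == symbols for col in zip(*grid))
    return rows_ok and cols_ok

solved = [[1, 2, 3],
          [2, 3, 1],
          [3, 1, 2]]
broken = [[1, 2, 3],
          [2, 3, 1],
          [3, 2, 1]]  # middle column repeats the symbol 2
print(is_valid_latin_square(solved), is_valid_latin_square(broken))  # → True False
```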

AI · Bullish · arXiv – CS AI · Mar 37/108

Breaking the Factorization Barrier in Diffusion Language Models

Researchers introduce Coupled Discrete Diffusion (CoDD), a breakthrough framework that solves the "factorization barrier" in diffusion language models by enabling parallel token generation without sacrificing coherence. The approach uses a lightweight probabilistic inference layer to model complex joint dependencies while maintaining computational efficiency.

AI · Neutral · arXiv – CS AI · Mar 37/107

Personalization Increases Affective Alignment but Has Role-Dependent Effects on Epistemic Independence in LLMs

Research reveals that personalization in Large Language Models increases emotional validation but has complex effects on how models maintain their positions depending on their assigned role. When acting as advisors, personalized LLMs show greater independence, but as social peers, they become more susceptible to abandoning their positions when challenged.

AI · Bullish · arXiv – CS AI · Mar 36/106

OpenRad: a Curated Repository of Open-access AI models for Radiology

Researchers created OpenRad, a curated repository containing approximately 1,700 open-access AI models for radiology. The platform aggregates scattered radiology AI research into a standardized, searchable database that includes model weights, interactive applications, and spans all imaging modalities and radiology subspecialties.

AI · Bullish · arXiv – CS AI · Mar 37/107

What Papers Don't Tell You: Recovering Tacit Knowledge for Automated Paper Reproduction

Researchers propose a graph-based agent framework for automated paper reproduction that recovers the tacit knowledge academic papers leave implicit. It closes the gap to official implementations to within 10.04%, improving over baselines by 24.68% across 40 recent papers.

AI · Bullish · arXiv – CS AI · Mar 36/107

CoVe: Training Interactive Tool-Use Agents via Constraint-Guided Verification

Researchers introduce CoVe, a framework for training interactive tool-use AI agents that uses constraint-guided verification to generate high-quality training data. The compact CoVe-4B model achieves competitive performance with models 17 times larger on benchmark tests, with the team open-sourcing code, models, and 12K training trajectories.

AI · Neutral · arXiv – CS AI · Mar 36/105

LiveCultureBench: a Multi-Agent, Multi-Cultural Benchmark for Large Language Models in Dynamic Social Simulations

Researchers introduce LiveCultureBench, a new benchmark that evaluates large language models as autonomous agents in simulated social environments, testing both task completion and adherence to cultural norms. The benchmark uses a multi-cultural town simulation to assess cross-cultural robustness and the balance between effectiveness and cultural sensitivity in LLM agents.

AI · Bullish · arXiv – CS AI · Mar 37/108

FT-Dojo: Towards Autonomous LLM Fine-Tuning with Language Agents

Researchers introduce FT-Dojo, an interactive environment for studying autonomous LLM fine-tuning, along with FT-Agent, an AI system that can automatically fine-tune language models without human intervention. The system achieved best performance on 10 out of 13 tasks across five domains, demonstrating the potential for fully automated machine learning workflows while revealing current limitations in AI reasoning capabilities.
