🧠

AI

21,472 AI articles curated from 50+ sources with AI-powered sentiment analysis, importance scoring, and key takeaways.

AI · Bullish · arXiv – CS AI · Mar 26/1014

Trust Region Masking for Long-Horizon LLM Reinforcement Learning

Researchers propose Trust Region Masking (TRM) to address off-policy mismatch problems in Large Language Model reinforcement learning pipelines. The method provides the first non-vacuous monotonic improvement guarantees for long-horizon LLM-RL tasks by masking entire sequences that violate trust region constraints.

AI · Neutral · arXiv – CS AI · Mar 27/1018

Moral Susceptibility and Robustness under Persona Role-Play in Large Language Models

Researchers analyzed how large language models express moral judgments when prompted to role-play different personas. The study found that Claude models are most morally robust, while larger models within families tend to be more susceptible to moral shifts through persona conditioning.

AI · Neutral · arXiv – CS AI · Mar 27/1022

Adversarial Fine-tuning in Offline-to-Online Reinforcement Learning for Robust Robot Control

Researchers developed an offline-to-online reinforcement learning framework that improves robot control robustness through adversarial fine-tuning. The method trains policies on clean datasets then applies action perturbations during fine-tuning to build resilience against actuator faults and environmental uncertainties.

AI · Bullish · arXiv – CS AI · Mar 27/1019

Thompson Sampling via Fine-Tuning of LLMs

Researchers developed ToSFiT (Thompson Sampling via Fine-Tuning), a new Bayesian optimization method that uses fine-tuned large language models to improve search efficiency in complex discrete spaces. The approach eliminates computational bottlenecks by directly parameterizing reward probabilities and demonstrates superior performance across diverse applications including protein search and quantum circuit design.

AI · Bullish · arXiv – CS AI · Mar 27/1014

Less is More: Lean yet Powerful Vision-Language Model for Autonomous Driving

Researchers introduce Max-V1, a novel vision-language model framework that treats autonomous driving as a language problem, predicting trajectories from camera input. The model achieved over 30% performance improvement on the nuScenes dataset and demonstrates strong cross-vehicle adaptability.

AI · Bearish · arXiv – CS AI · Mar 26/1017

CMT-Benchmark: A Benchmark for Condensed Matter Theory Built by Expert Researchers

Researchers created CMT-Benchmark, a new dataset of 50 expert-level condensed matter theory problems, to evaluate large language models' capabilities in advanced scientific research. The best-performing model (GPT-5) solved only 30% of the problems, and the average across 17 models was just 11.4%, highlighting significant gaps in current AI's physical reasoning abilities.

AI · Bullish · arXiv – CS AI · Mar 27/1014

VoiceBridge: General Speech Restoration with One-step Latent Bridge Models

VoiceBridge is a new AI model that restores high-quality 48kHz speech from various types of audio distortion in a single step. The model uses a latent bridge approach with an energy-preserving variational autoencoder and a transformer architecture to handle multiple speech restoration tasks simultaneously.

AI · Bullish · arXiv – CS AI · Mar 27/1014

Carré du champ flow matching: better quality-generalisation tradeoff in generative models

Researchers introduce Carré du champ flow matching (CDC-FM), a new generative AI model that improves the quality-generalization tradeoff by using geometry-aware noise instead of standard uniform noise. The method shows significant improvements in data-scarce scenarios and non-uniformly sampled datasets, particularly relevant for AI applications in scientific domains.

AI · Bullish · arXiv – CS AI · Mar 26/1019

BEV-VLM: Trajectory Planning via Unified BEV Abstraction

Researchers introduced BEV-VLM, a new autonomous driving trajectory planning system that combines Vision-Language Models with Bird's-Eye View maps from camera and LiDAR data. The approach achieved 53.1% better planning accuracy and complete collision avoidance compared to vision-only methods on the nuScenes dataset.

AI · Bullish · arXiv – CS AI · Mar 27/1022

Scaling Generalist Data-Analytic Agents

Researchers introduce DataMind, a new training framework for building open-source data-analytic AI agents that can handle complex, multi-step data analysis tasks. The DataMind-14B model achieves state-of-the-art performance with 71.16% average score, outperforming proprietary models like DeepSeek-V3.1 and GPT-5 on data analysis benchmarks.

AI · Bullish · arXiv – CS AI · Mar 27/1020

MobileLLM-R1: Exploring the Limits of Sub-Billion Language Model Reasoners with Open Training Recipes

Researchers developed MobileLLM-R1, a sub-billion-parameter AI model that demonstrates strong reasoning capabilities using only 2T tokens of high-quality data instead of massive 10T+ token datasets. The 950M-parameter model achieves superior performance on reasoning benchmarks compared to larger competitors while training on only 11.7% of the tokens used for Qwen3.

AI · Bullish · arXiv – CS AI · Mar 26/1018

LIA: Supervised Fine-Tuning of Large Language Models for Automatic Issue Assignment

Researchers developed LIA, a supervised fine-tuning approach using DeepSeek-R1-Distill-Llama-8B to automatically assign software issues to developers. The system achieved up to 187.8% improvement over the base model and 211.2% better performance than existing methods in developer recommendation accuracy.

AI · Bearish · arXiv – CS AI · Mar 26/1015

The False Promise of Zero-Shot Super-Resolution in Machine-Learned Operators

Research reveals that machine-learned operators (MLOs) fail at zero-shot super-resolution, unable to accurately perform inference at resolutions different from their training data. The study identifies key limitations in frequency extrapolation and resolution interpolation, proposing a multi-resolution training protocol as a solution.

AI · Bullish · arXiv – CS AI · Mar 27/1014

Alignment through Meta-Weighted Online Sampling: Bridging the Gap between Data Generation and Preference Optimization

Researchers propose MetaAPO, a new framework for aligning large language models with human preferences that dynamically balances online and offline training data. The method uses a meta-learner to evaluate when on-policy sampling is beneficial, resulting in better performance while reducing online annotation costs by 42%.

AI · Bullish · arXiv – CS AI · Mar 27/1016

Activation Function Design Sustains Plasticity in Continual Learning

Researchers demonstrate that activation function design is crucial for maintaining neural network plasticity in continual learning scenarios. They introduce two new activation functions (Smooth-Leaky and Randomized Smooth-Leaky) that help prevent models from losing their ability to adapt to new tasks over time.

AI · Bullish · arXiv – CS AI · Mar 26/1016

Context and Diversity Matter: The Emergence of In-Context Learning in World Models

Researchers investigate in-context learning (ICL) in world models, identifying two core mechanisms - environment recognition and environment learning - that enable AI systems to adapt to new configurations. The study provides theoretical error bounds and empirical evidence showing that diverse environments and long context windows are crucial for developing self-adapting world models.

AI · Bullish · arXiv – CS AI · Mar 26/1014

Latent Self-Consistency for Reliable Majority-Set Selection in Short- and Long-Answer Reasoning

Researchers introduce Latent Self-Consistency (LSC), a new method for improving Large Language Model output reliability across both short and long-form reasoning tasks. LSC uses learnable token embeddings to select semantically consistent responses with only 0.9% computational overhead, outperforming existing consistency methods like Self-Consistency and Universal Self-Consistency.

AI · Neutral · arXiv – CS AI · Mar 27/1019

Once4All: Skeleton-Guided SMT Solver Fuzzing with LLM-Synthesized Generators

Researchers developed Once4All, an LLM-assisted fuzzing framework for testing SMT solvers that addresses syntax validity issues and computational overhead. The system found 43 confirmed bugs in leading solvers Z3 and cvc5, with 40 already fixed by developers.

AI · Neutral · arXiv – CS AI · Mar 27/1018

LumiMAS: A Comprehensive Framework for Real-Time Monitoring and Enhanced Observability in Multi-Agent Systems

Researchers have developed LumiMAS, a comprehensive framework for monitoring and detecting failures in multi-agent systems that incorporate large language models. The framework features three layers: monitoring and logging, anomaly detection, and anomaly explanation with root cause analysis, addressing the unique challenges of observing entire multi-agent systems rather than individual agents.

AI · Neutral · arXiv – CS AI · Mar 27/1010

Veritas: Generalizable Deepfake Detection via Pattern-Aware Reasoning

Researchers introduce Veritas, a multi-modal large language model designed for deepfake detection that uses pattern-aware reasoning to mimic human forensic processes. The system addresses real-world challenges through the HydraFake dataset and achieves significant improvements in detecting unseen forgeries across different domains.

AI · Bullish · arXiv – CS AI · Mar 26/1014

Actor-Critic for Continuous Action Chunks: A Reinforcement Learning Framework for Long-Horizon Robotic Manipulation with Sparse Reward

Researchers introduced AC3 (Actor-Critic for Continuous Chunks), a new reinforcement learning framework that addresses challenges in long-horizon robotic manipulation tasks with sparse rewards. The system uses continuous action chunks with stabilization mechanisms and achieved superior performance on 25 benchmark tasks using minimal demonstrations.

AI · Bullish · arXiv – CS AI · Mar 27/1022

Beyond Naïve Prompting: Strategies for Improved Context-aided Forecasting with LLMs

Researchers introduce a framework of four strategies to improve large language models' performance in context-aided forecasting, covering diagnostics, accuracy, and efficiency. The study reveals an 'Execution Gap', where models understand the context but fail to apply that reasoning, and reports 25-50% performance improvements along with cost-effective adaptive routing.

AI · Bullish · arXiv – CS AI · Mar 26/1013

Draw-In-Mind: Rebalancing Designer-Painter Roles in Unified Multimodal Models Benefits Image Editing

Researchers introduce Draw-In-Mind (DIM), a new approach to multimodal AI models that improves image editing by better balancing responsibilities between understanding and generation modules. The DIM-4.6B model achieves state-of-the-art performance on image editing benchmarks despite having fewer parameters than competing models.

AI · Bullish · arXiv – CS AI · Mar 26/1015

OM2P: Offline Multi-Agent Mean-Flow Policy

Researchers propose OM2P, a new offline multi-agent reinforcement learning algorithm that achieves efficient one-step action sampling using mean-flow models. The approach delivers up to 3.8x reduction in GPU memory usage and 10.8x speed-up in training time compared to existing diffusion and flow-based models.

AI · Bullish · arXiv – CS AI · Mar 26/1011

Less is More: AMBER-AFNO -- a New Benchmark for Lightweight 3D Medical Image Segmentation

Researchers developed AMBER-AFNO, a new lightweight architecture for 3D medical image segmentation that replaces traditional attention mechanisms with Adaptive Fourier Neural Operators. The model achieves state-of-the-art results on medical datasets while maintaining linear memory scaling and quasi-linear computational complexity.
