#arxiv-research News & Analysis

38 articles tagged with #arxiv-research. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

38 articles

AINeutralarXiv – CS AI · 1d ago7/10

🧠

Evaluating Relational Reasoning in LLMs with REL

Researchers introduce REL, a benchmark framework that evaluates relational reasoning in large language models by measuring Relational Complexity (RC)—the number of entities that must be simultaneously bound to apply a relation. The study reveals that frontier LLMs consistently degrade in performance as RC increases, exposing a fundamental limitation in higher-arity reasoning that persists even with increased compute and in-context learning.

AIBullisharXiv – CS AI · Apr 77/10

🧠

Readable Minds: Emergent Theory-of-Mind-Like Behavior in LLM Poker Agents

Research published on arXiv demonstrates that large language models playing poker can develop sophisticated Theory of Mind capabilities when equipped with persistent memory, progressing to advanced levels of opponent modeling and strategic deception. The study found memory is necessary and sufficient for this emergent behavior, while domain expertise enhances but doesn't gate ToM development.

🧠 GPT-4

AINeutralarXiv – CS AI · Apr 77/10

🧠

Causality Laundering: Denial-Feedback Leakage in Tool-Calling LLM Agents

Researchers have identified a new security vulnerability called 'causality laundering' in AI tool-calling systems, where attackers can extract private information by learning from system denials and using that knowledge in subsequent tool calls. They developed the Agentic Reference Monitor (ARM) system to detect and prevent these attacks through enhanced provenance tracking.

AIBullisharXiv – CS AI · Mar 277/10

🧠

Decidable By Construction: Design-Time Verification for Trustworthy AI

Researchers propose a framework for verifying AI model properties at design time rather than after deployment, using algebraic constraints over finitely generated abelian groups. The approach eliminates computational overhead of post-hoc verification by building trustworthiness into the model architecture from the start.

AIBullisharXiv – CS AI · Mar 177/10

🧠

RelayCaching: Accelerating LLM Collaboration via Decoding KV Cache Reuse

Researchers introduce RelayCaching, a training-free method that accelerates multi-agent LLM systems by reusing KV cache data from previous agents to eliminate redundant computation. The technique achieves over 80% cache reuse and reduces time-to-first-token by up to 4.7x while maintaining accuracy across mathematical reasoning, knowledge tasks, and code generation.

AIBearisharXiv – CS AI · Mar 167/10

🧠

Diagnosing Retrieval Bias Under Multiple In-Context Knowledge Updates in Large Language Models

Researchers identify a significant bias in Large Language Models when processing multiple updates to the same factual information within context. The study reveals that LLMs struggle to accurately retrieve the most recent version of updated facts, with performance degrading as the number of updates increases, similar to memory interference patterns observed in cognitive psychology.

AIBullisharXiv – CS AI · Mar 127/10

🧠

Optimal Expert-Attention Allocation in Mixture-of-Experts: A Scalable Law for Dynamic Model Design

Researchers have developed a new scaling law for Mixture-of-Experts (MoE) models that optimizes compute allocation between expert and attention layers. The study extends the Chinchilla scaling law by introducing an optimal ratio formula that follows a power-law relationship with total compute and model sparsity.

AIBearisharXiv – CS AI · Mar 46/103

🧠

SpatialText: A Pure-Text Cognitive Benchmark for Spatial Understanding in Large Language Models

Researchers introduce SpatialText, a diagnostic framework to test whether large language models can truly reason about spatial relationships or merely rely on linguistic patterns. The study reveals that current AI models fail at egocentric perspective reasoning despite proficiency in basic spatial fact retrieval.

AIBullisharXiv – CS AI · Mar 47/103

🧠

Type-Aware Retrieval-Augmented Generation with Dependency Closure for Solver-Executable Industrial Optimization Modeling

Researchers developed a type-aware retrieval-augmented generation (RAG) method that translates natural language requirements into solver-executable optimization code for industrial applications. The method uses a typed knowledge base and dependency closure to ensure code executability, successfully validated on battery production optimization and job scheduling tasks where conventional RAG approaches failed.

AIBullisharXiv – CS AI · Mar 37/102

🧠

Sparse Shift Autoencoders for Identifying Concepts from Large Language Model Activations

Researchers introduce Sparse Shift Autoencoders (SSAEs), a new method for improving large language model interpretability by learning sparse representations of differences between embeddings rather than the embeddings themselves. This approach addresses the identifiability problem in current sparse autoencoder techniques, potentially enabling more precise control over specific AI behaviors without unintended side effects.

AIBullisharXiv – CS AI · Mar 37/102

🧠

Towards Safe Reasoning in Large Reasoning Models via Corrective Intervention

Researchers propose Intervened Preference Optimization (IPO) to address safety issues in Large Reasoning Models, where chain-of-thought reasoning contains harmful content even when final responses appear safe. The method achieves over 30% reduction in harmfulness while maintaining reasoning performance.

AIBullisharXiv – CS AI · Mar 37/103

🧠

DRPO: Efficient Reasoning via Decoupled Reward Policy Optimization

Researchers propose Decoupled Reward Policy Optimization (DRPO), a new framework that reduces computational costs in large reasoning models by 77% while maintaining performance. The method addresses the 'overthinking' problem where AI models generate unnecessarily long reasoning for simple questions, achieving significant efficiency gains over existing approaches.

AINeutralarXiv – CS AI · 1d ago6/10

🧠

LLM as Attention-Informed NTM and Topic Modeling as long-input Generation: Interpretability and long-Context Capability

Researchers propose a novel framework treating Large Language Models as attention-informed Neural Topic Models, enabling interpretable topic extraction from documents. The approach combines white-box interpretability analysis with black-box long-context LLM capabilities, demonstrating competitive performance on topic modeling tasks while maintaining semantic clarity.

AIBullisharXiv – CS AI · 2d ago6/10

🧠

Skill-SD: Skill-Conditioned Self-Distillation for Multi-turn LLM Agents

Researchers introduce Skill-SD, a novel training framework for multi-turn LLM agents that improves sample efficiency by converting successful agent trajectories into dynamic natural language skills that condition a teacher model. The approach combines reinforcement learning with self-distillation and achieves significant performance improvements over baseline methods on benchmark tasks.

AIBullisharXiv – CS AI · Mar 276/10

🧠

Instruction Following by Principled Boosting Attention of Large Language Models

Researchers developed InstABoost, a new method to improve instruction following in large language models by boosting attention to instruction tokens without retraining. The technique addresses reliability issues where LLMs violate constraints under long contexts or conflicting user inputs, achieving better performance than existing methods across 15 tasks.

AINeutralarXiv – CS AI · Mar 266/10

🧠

The Diminishing Returns of Early-Exit Decoding in Modern LLMs

Research shows that newer LLMs have diminishing effectiveness for early-exit decoding techniques due to improved architectures that reduce layer redundancy. The study finds that dense transformers outperform Mixture-of-Experts models for early-exit, with larger models (20B+ parameters) and base pretrained models showing the highest early-exit potential.

AIBullisharXiv – CS AI · Mar 176/10

🧠

Distilling Reasoning Without Knowledge: A Framework for Reliable LLMs

Researchers propose a new framework for large language models that separates planning from factual retrieval to improve reliability in fact-seeking question answering. The modular approach uses a lightweight student planner trained via teacher-student learning to generate structured reasoning steps, showing improved accuracy and speed on challenging benchmarks.

AIBullisharXiv – CS AI · Mar 116/10

🧠

AutoAgent: Evolving Cognition and Elastic Memory Orchestration for Adaptive Agents

Researchers introduce AutoAgent, a self-evolving multi-agent framework that combines evolving cognition, contextual decision-making, and elastic memory orchestration to enable adaptive autonomous agents. The system continuously learns from experience without external retraining and shows improved performance across retrieval, tool-use, and collaborative tasks compared to static baselines.

AINeutralarXiv – CS AI · Mar 55/10

🧠

Zono-Conformal Prediction: Zonotope-Based Uncertainty Quantification for Regression and Classification Tasks

Researchers introduce zono-conformal prediction, a new uncertainty quantification method for machine learning that uses zonotope-based prediction sets instead of traditional intervals. The approach is more computationally efficient and less conservative than existing conformal prediction methods while maintaining statistical coverage guarantees for both regression and classification tasks.

AIBullisharXiv – CS AI · Mar 37/108

🧠

AI Runtime Infrastructure

Researchers introduce AI Runtime Infrastructure, a new execution layer that sits between AI models and applications to optimize agent performance in real-time. This infrastructure actively monitors and intervenes in agent behavior during execution to improve task success, efficiency, and safety across long-running workflows.

AIBullisharXiv – CS AI · Mar 37/107

🧠

QuickGrasp: Responsive Video-Language Querying Service via Accelerated Tokenization and Edge-Augmented Inference

Researchers propose QuickGrasp, a video-language querying system that combines local processing with edge computing to achieve both fast response times and high accuracy. The system achieves up to 12.8x reduction in response delay while maintaining the accuracy of large video-language models through accelerated tokenization and adaptive edge augmentation.

AIBullisharXiv – CS AI · Mar 37/106

🧠

General Proximal Flow Networks

Researchers introduce General Proximal Flow Networks (GPFNs), a generalization of Bayesian Flow Networks that allows for arbitrary divergence functions instead of fixed Kullback-Leibler divergence. The framework enables iterative generative modeling with improved generation quality when divergence functions are adapted to underlying data geometry.

$LINK

AIBullisharXiv – CS AI · Mar 36/105

🧠

AMDS: Attack-Aware Multi-Stage Defense System for Network Intrusion Detection with Two-Stage Adaptive Weight Learning

Researchers developed AMDS, an attack-aware multi-stage defense system for network intrusion detection that uses adaptive weight learning to counter adversarial attacks. The system achieved 94.2% AUC and improved classification accuracy by 4.5 percentage points over existing adversarially trained ensembles by learning attack-specific detection strategies.

$CRV

AIBullisharXiv – CS AI · Mar 36/1010

🧠

ClinCoT: Clinical-Aware Visual Chain-of-Thought for Medical Vision Language Models

Researchers propose ClinCoT, a new framework for medical AI that improves Visual Language Models by grounding reasoning in specific visual regions rather than just text. The approach reduces factual hallucinations in medical AI systems by using visual chain-of-thought reasoning with clinically relevant image regions.

AINeutralarXiv – CS AI · Mar 37/108

🧠

Align and Filter: Improving Performance in Asynchronous On-Policy RL

Researchers propose a new method called total Variation-based Advantage aligned Constrained policy Optimization to address policy lag issues in distributed reinforcement learning systems. The approach aims to improve performance when scaling on-policy learning algorithms by mitigating the mismatch between behavior and learning policies during high-frequency updates.

Page 1 of 2Next →