y0news

#arxiv News & Analysis

408 articles tagged with #arxiv. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Neutral · arXiv – CS AI · Mar 11 · 7/10

From Data Statistics to Feature Geometry: How Correlations Shape Superposition

Researchers introduce Bag-of-Words Superposition (BOWS) to study how neural networks arrange features in superposition when using realistic correlated data. The study reveals that interference between features can be constructive rather than just noise, leading to semantic clusters and cyclical structures observed in language models.

AI · Bullish · arXiv – CS AI · Mar 11 · 7/10

The Missing Memory Hierarchy: Demand Paging for LLM Context Windows

Researchers developed Pichay, a demand paging system that treats LLM context windows like computer memory with hierarchical caching. The system reduces context consumption by up to 93% in production by evicting stale content and managing memory more efficiently, addressing fundamental scalability issues in AI systems.
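
The general idea of demand paging applied to a context window can be illustrated with a toy example. The sketch below is not Pichay's implementation: the `PagedContext` class, the whitespace token count, and the least-recently-used eviction policy are all assumptions chosen for illustration.

```python
from collections import OrderedDict


class PagedContext:
    """Toy context store: recently used chunks stay resident, the rest are evicted."""

    def __init__(self, budget_tokens):
        self.budget_tokens = budget_tokens
        self.resident = OrderedDict()  # chunk_id -> (text, tokens), oldest first
        self.backing_store = {}        # evicted chunks, retrievable on demand

    def _used(self):
        return sum(tokens for _, tokens in self.resident.values())

    def put(self, chunk_id, text):
        tokens = len(text.split())     # crude whitespace token count (assumption)
        self.resident[chunk_id] = (text, tokens)
        self.resident.move_to_end(chunk_id)
        self._evict()

    def get(self, chunk_id):
        if chunk_id not in self.resident:
            # "Page fault": bring the chunk back from the backing store.
            text, _ = self.backing_store.pop(chunk_id)
            self.put(chunk_id, text)
        else:
            self.resident.move_to_end(chunk_id)
        return self.resident[chunk_id][0]

    def _evict(self):
        # Evict least recently used chunks until within budget,
        # always keeping the most recent chunk resident.
        while self._used() > self.budget_tokens and len(self.resident) > 1:
            old_id, entry = self.resident.popitem(last=False)
            self.backing_store[old_id] = entry
```

In this sketch, stale chunks leave the resident set but remain recoverable, which is the property that lets a paging scheme shrink the active context without losing information.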

AI · Bullish · arXiv – CS AI · Mar 9 · 7/10

DataChef: Cooking Up Optimal Data Recipes for LLM Adaptation via Reinforcement Learning

Researchers introduce DataChef-32B, an AI system that uses reinforcement learning to automatically generate optimal data processing recipes for training large language models. The system eliminates the need for manual data curation by automatically designing complete data pipelines, achieving performance comparable to human experts across six benchmark tasks.

AI · Bullish · arXiv – CS AI · Mar 9 · 7/10

BEVLM: Distilling Semantic Knowledge from LLMs into Bird's-Eye View Representations

Researchers introduce BEVLM, a framework that integrates Large Language Models with Bird's-Eye View representations for autonomous driving. The approach improves LLM reasoning accuracy in cross-view driving scenarios by 46% and enhances end-to-end driving performance by 29% in safety-critical situations.

AI · Neutral · arXiv – CS AI · Mar 9 · 7/10

Reasoning Models Struggle to Control their Chains of Thought

Researchers found that AI reasoning models struggle to control their chain-of-thought (CoT) outputs, with Claude Sonnet 4.5 able to control its CoT only 2.7% of the time versus 61.9% for final outputs. This limitation suggests CoT monitoring remains viable for detecting AI misbehavior, though the underlying mechanisms are poorly understood.

🧠 Claude 🧠 Sonnet
AI · Bullish · arXiv – CS AI · Mar 6 · 7/10

SkillNet: Create, Evaluate, and Connect AI Skills

Researchers introduce SkillNet, an open infrastructure for creating, evaluating, and organizing AI skills at scale, aimed at the problem of AI agents repeatedly rediscovering solutions. The system includes over 200,000 skills and demonstrates a 40% improvement in agent performance while reducing execution steps by 30% across multiple testing environments.

AI · Bullish · arXiv – CS AI · Mar 6 · 7/10

CONE: Embeddings for Complex Numerical Data Preserving Unit and Variable Semantics

Researchers introduce CONE, a hybrid transformer encoder model that improves numerical reasoning in AI by creating embeddings that preserve the semantics of numbers, ranges, and units. The model achieves an 87.28% F1 score on the DROP dataset, representing a 9.37% improvement over existing state-of-the-art models across web, medical, finance, and government domains.

AI · Neutral · arXiv – CS AI · Mar 5 · 6/10

Belief-Sim: Towards Belief-Driven Simulation of Demographic Misinformation Susceptibility

Researchers introduce BeliefSim, a framework that uses Large Language Models to simulate how different demographic groups are susceptible to misinformation based on their underlying beliefs. The system achieves up to 92% accuracy in predicting misinformation susceptibility by incorporating psychology-informed belief profiles.

AI · Bullish · arXiv – CS AI · Mar 5 · 7/10

HumanLM: Simulating Users with State Alignment Beats Response Imitation

Researchers introduce HumanLM, a novel AI training framework that creates user simulators by aligning psychological states rather than just imitating response patterns. The system achieved a 16.3% improvement in alignment scores across six datasets with 26k users and 216k responses, demonstrating superior ability to simulate real human behavior.

AI · Bullish · arXiv – CS AI · Mar 5 · 7/10

Kaleido: Open-Sourced Multi-Subject Reference Video Generation Model

Researchers have introduced Kaleido, an open-source AI model for generating consistent videos from multiple reference images of subjects. The framework addresses key limitations in subject-to-video generation through improved data construction and a novel Reference Rotary Positional Encoding technique.

AI · Neutral · arXiv – CS AI · Mar 5 · 7/10

Upholding Epistemic Agency: A Brouwerian Assertibility Constraint for Responsible AI

Researchers propose a Brouwerian assertibility constraint for AI systems that requires them to provide publicly inspectable certificates of entitlement before making claims in high-stakes domains. The framework introduces a three-status interface (Asserted, Denied, Undetermined) to preserve human epistemic agency when AI systems participate in public justification processes.
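
A three-status interface of this kind can be sketched in a few lines. The example below is only an illustration under stated assumptions: the names `Status`, `Verdict`, and `judge`, the string-valued certificate, and the choice to return Undetermined when certificates conflict are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    ASSERTED = "Asserted"
    DENIED = "Denied"
    UNDETERMINED = "Undetermined"


@dataclass(frozen=True)
class Verdict:
    claim: str
    status: Status
    # Hypothetical: a reference to publicly inspectable evidence.
    certificate: Optional[str] = None


def judge(claim, cert_for=None, cert_against=None):
    """Assert or deny only with a certificate; otherwise stay Undetermined."""
    if cert_for is not None and cert_against is None:
        return Verdict(claim, Status.ASSERTED, cert_for)
    if cert_against is not None and cert_for is None:
        return Verdict(claim, Status.DENIED, cert_against)
    # No certificate, or conflicting certificates (assumed policy).
    return Verdict(claim, Status.UNDETERMINED)
```

The point of the sketch is that the absence of a certificate maps to Undetermined instead of defaulting to a denial, so the system never makes a claim it cannot back with inspectable evidence.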

AI · Neutral · arXiv – CS AI · Mar 5 · 7/10

InEdit-Bench: Benchmarking Intermediate Logical Pathways for Intelligent Image Editing Models

Researchers introduced InEdit-Bench, the first evaluation benchmark specifically designed to test image editing models' ability to reason through intermediate logical pathways in multi-step visual transformations. Testing 14 representative models revealed significant shortcomings in handling complex scenarios requiring dynamic reasoning and procedural understanding.

AI · Bullish · arXiv – CS AI · Mar 5 · 7/10

Architectural Proprioception in State Space Models: Thermodynamic Training Induces Anticipatory Halt Detection

Researchers introduce the Probability Navigation Architecture (PNA) framework that trains State Space Models with thermodynamic principles, discovering that SSMs develop 'architectural proprioception' - the ability to predict when to stop computation based on internal state entropy. This breakthrough shows SSMs can achieve computational self-awareness while Transformers cannot, with significant implications for efficient AI inference systems.

AI · Bullish · arXiv – CS AI · Mar 5 · 6/10

PlugMem: A Task-Agnostic Plugin Memory Module for LLM Agents

Researchers propose PlugMem, a task-agnostic plugin memory module for LLM agents that structures episodic memories into knowledge-centric graphs for efficient retrieval. The system consistently outperforms existing memory designs across multiple benchmarks while maintaining transferability between different tasks.

AI · Bullish · arXiv – CS AI · Mar 5 · 6/10

Toward Reasoning on the Boundary: A Mixup-based Approach for Graph Anomaly Detection

Researchers introduce ANOMIX, a new framework that improves graph neural network anomaly detection by generating hard negative samples through mixup techniques. The method addresses the limitation of existing GNN-based detection systems that struggle with subtle boundary anomalies by creating more robust decision boundaries.

AI · Bullish · arXiv – CS AI · Mar 5 · 6/10

AriadneMem: Threading the Maze of Lifelong Memory for LLM Agents

Researchers have developed AriadneMem, a new memory system for long-horizon LLM agents that addresses challenges in maintaining accurate memory under fixed context budgets. The system uses a two-phase pipeline with entropy-aware gating and conflict-aware coarsening to improve multi-hop reasoning while reducing runtime by 77.8% and using only 497 context tokens.

🧠 GPT-4
AI · Bearish · arXiv – CS AI · Mar 5 · 6/10

Preference Leakage: A Contamination Problem in LLM-as-a-judge

Researchers have identified 'preference leakage,' a contamination problem in LLM-as-a-judge systems where evaluator models show bias toward related data generator models. The study found this bias occurs when judge and generator LLMs share relationships like being the same model, having inheritance connections, or belonging to the same model family.

AI · Bullish · arXiv – CS AI · Mar 5 · 6/10

A Rubric-Supervised Critic from Sparse Real-World Outcomes

Researchers propose a new framework called Critic Rubrics to bridge the gap between academic coding agent benchmarks and real-world applications. The system learns from sparse, noisy human interaction data using 24 behavioral features and shows significant improvements in code generation tasks, including a 15.9% improvement in reranking performance on SWE-bench.

AI · Neutral · arXiv – CS AI · Mar 5 · 6/10

Towards Personalized Deep Research: Benchmarks and Evaluations

Researchers introduce PDR-Bench, the first benchmark for evaluating personalization in Deep Research Agents (DRAs), featuring 250 realistic user-task queries across 10 domains. The benchmark uses a new PQR Evaluation Framework to measure personalization alignment, content quality, and factual reliability in AI research assistants.

AI · Bullish · arXiv – CS AI · Mar 5 · 7/10

Safety Guardrails for LLM-Enabled Robots

Researchers developed RoboGuard, a two-stage safety architecture to protect LLM-enabled robots from harmful behaviors caused by AI hallucinations and adversarial attacks. The system reduced unsafe plan execution from over 92% to below 3% in testing while maintaining performance on safe operations.

AI · Bullish · arXiv – CS AI · Mar 5 · 7/10

What Does Flow Matching Bring To TD Learning?

Researchers demonstrate that flow matching improves reinforcement learning through enhanced TD learning mechanisms rather than distributional modeling. The approach achieves 2x better final performance and 5x improved sample efficiency compared to standard critics by enabling test-time error recovery and more plastic feature learning.

AI · Bullish · arXiv – CS AI · Mar 5 · 7/10

Farther the Shift, Sparser the Representation: Analyzing OOD Mechanisms in LLMs

Researchers discovered that Large Language Models become increasingly sparse in their internal representations when handling more difficult or out-of-distribution tasks. This sparsity mechanism appears to be an adaptive response that helps stabilize reasoning under challenging conditions, leading to the development of a new learning strategy called Sparsity-Guided Curriculum In-Context Learning (SG-ICL).

AI · Neutral · arXiv – CS AI · Mar 5 · 7/10

Emotion-Gradient Metacognitive RSI (Part I): Theoretical Foundations and Single-Agent Architecture

Researchers introduce the Emotion-Gradient Metacognitive Recursive Self-Improvement (EG-MRSI) framework, a theoretical architecture for AI systems that can safely modify their own learning algorithms. The framework integrates metacognition, emotion-based motivation, and self-modification with formal safety constraints, representing foundational research toward safe artificial general intelligence.