#arxiv News & Analysis

Content tagged #arxiv focuses on preprint research from the arXiv repository, primarily covering computer science and artificial intelligence topics. Over the past 30 days, six articles have been indexed, with recent discussions centering on large language models including GPT-4 and Llama. Sentiment across these articles is entirely neutral, and the bullish share is down 58.6 percentage points compared to the prior 90 days. The tag frequently overlaps with #machine-learning, #research, and #ai-research discussions. Blockchain and cryptocurrency tickers like NEAR, LINK, and COMP have appeared alongside #arxiv content in recent coverage. Browse the articles below to explore what's currently being discussed in academic AI research.

sentiment · last 30d (6 articles) · -58.6pp bullish vs prior 90d

Top sources: arXiv – CS AI · 406

Often co-tagged with: #machine-learning #research #ai-research #llm #reinforcement-learning #computer-vision

Most-discussed entities: GPT-4 · 6, Llama · 4, Hugging Face · 1, Claude · 1, Nvidia · 1

447 articles

AI · Neutral · arXiv – CS AI · May 29 · 6/10

Test Time Training for Supervised Causal Learning

Researchers propose Test-Time Training for Supervised Causal Learning (TTT-SCL), a framework addressing critical limitations in causal discovery by generating test-specific training sets. The approach significantly narrows the performance gap between synthetic benchmarks and real-world applications while enhancing robustness to distribution shifts.

AI · Neutral · arXiv – CS AI · May 29 · 6/10

From Meta-Thought to Execution: Cognitively Aligned Post-Training for Generalizable and Reliable LLM Reasoning

Researchers propose a cognitively-inspired post-training framework for large language models that separates abstract reasoning from problem-specific execution, mirroring how humans actually think. The approach, combining Chain-of-Meta-Thought supervised learning with Confidence-Calibrated Reinforcement Learning, achieves 2-3% performance improvements across benchmarks while improving generalization and robustness.

AI · Neutral · arXiv – CS AI · May 28 · 6/10

The Computational Boundary of Inference: Capability Internalization, Training, and the Turing Jump

A new computability theory paper proves that finite internal self-modification in AI systems cannot exceed their existing computational layer, while qualitatively stronger capabilities require access to a higher computational level (the Turing jump). This formally separates recursive self-improvement narratives into within-layer iteration versus genuine capability ascent, constraining theoretical claims about AI recursive self-improvement.

AI · Neutral · arXiv – CS AI · May 28 · 5/10

Eliot: Interactively Exploring Fast-Changing Scientific Literature Trends with Online Data and Learning

Researchers present Eliot, an interactive system for exploring evolving scientific literature trends across rapidly changing fields like Large Language Models and Automated Planning. The tool retrieves arXiv papers at query time, clusters them into thematic groups, and visualizes publication patterns over time, with evaluations showing 85% accuracy in meaningful cluster labeling across eight research domains.

AI · Neutral · arXiv – CS AI · May 27 · 6/10

Graph is a Substrate Across Data Modalities

Researchers propose G-Substrate, a novel graph framework that treats graph structures as persistent substrates across multiple data modalities and tasks rather than isolated, task-specific constructs. The approach uses unified structural schemas and role-based training to enable graph representations to accumulate knowledge across heterogeneous domains, demonstrating superior performance compared to traditional isolated and multi-task learning methods.

AI · Bearish · Ars Technica – AI · May 15 · 6/10

Send the arXiv AI-generated slop, get a yearlong vacation from submissions

arXiv, the preprint repository for scientific papers, has implemented a policy banning AI-generated content submissions, with violators facing year-long submission bans. A moderator announced the enforcement on social media, signaling the platform's effort to maintain research integrity amid growing concerns about low-quality AI-generated submissions flooding academic repositories.

AI · Neutral · arXiv – CS AI · May 12 · 6/10

ProactBench: Beyond What The User Asked For

ProactBench introduces a new evaluation framework for large language models that measures conversational proactivity—the ability to infer and act on users' implicit needs rather than just responding to explicit requests. The benchmark decomposes this ability into three types (Emergent, Critical, and Recovery) and tests 16 frontier models across 198 curated dialogues, revealing that Recovery tasks are particularly difficult and poorly predicted by existing benchmarks.

AI · Neutral · arXiv – CS AI · May 12 · 6/10

Shaping Schema via Language Representation as the Next Frontier for LLM Intelligence Expanding

A new arXiv paper argues that optimizing how language represents tasks—rather than scaling model size—is crucial for advancing LLM intelligence. The research demonstrates that deliberate language representation design can yield substantial performance improvements without modifying model parameters, supported by controlled experiments showing how different linguistic framings of identical tasks trigger different internal feature activations.

AI · Neutral · arXiv – CS AI · May 12 · 6/10

A Geometric Perspective on Next-Token Prediction in Large Language Models: Three Emerging Phases

Researchers have developed a geometric framework for understanding how large language models process information across their layers, identifying three distinct phases in next-token prediction: Seeding Multiplexing, Hoisting Overriding, and Focal Convergence. The study reveals that model depth primarily increases capacity for candidate disambiguation rather than adding fundamentally new computational stages.

AI · Neutral · arXiv – CS AI · May 11 · 6/10

AGWM: Affordance-Grounded World Models for Environments with Compositional Prerequisites

Researchers propose AGWM (Affordance-Grounded World Models), a machine learning framework that improves how AI agents understand which actions are executable in dynamic environments by explicitly tracking prerequisite dependencies. The approach addresses a fundamental limitation in conventional world models that fail to account for how actions reshape the availability of future actions, reducing multi-step prediction errors and improving generalization.

AI · Neutral · arXiv – CS AI · May 11 · 6/10

ARMOR: An Agentic Framework for Reaction Feasibility Prediction via Adaptive Utility-aware Multi-tool Reasoning

Researchers introduce ARMOR, an agentic framework that improves chemical reaction feasibility prediction by intelligently combining multiple AI tools rather than relying on single models. The system uses hierarchical tool organization and memory-augmented reasoning to resolve conflicting predictions, demonstrating significant performance gains especially when different tools disagree on outcomes.

AI · Neutral · arXiv – CS AI · May 9 · 6/10

MASPO: Joint Prompt Optimization for LLM-based Multi-Agent Systems

Researchers introduce MASPO, a framework that automatically optimizes prompts across multi-agent LLM systems by evaluating how well each agent's outputs enable downstream success rather than in isolation. The approach uses evolutionary beam search to navigate prompt spaces and achieves 2.9% average accuracy improvements over existing methods across six diverse tasks.

AI · Neutral · arXiv – CS AI · May 9 · 6/10

Generalised Linear Models in Deep Bayesian RL with Learnable Basis Functions

Researchers introduce GLiBRL, a novel deep Bayesian reinforcement learning method that combines generalized linear models with learnable basis functions to improve task generalization. The approach achieves fully tractable Bayesian inference over task parameters and demonstrates up to 1.8x performance improvements over existing meta-RL methods on benchmark tasks.

AI · Neutral · arXiv – CS AI · Apr 15 · 6/10

Memory as Metabolism: A Design for Companion Knowledge Systems

A new research paper proposes a governance framework for personal AI memory systems designed to function as 'companion' knowledge wikis that mirror user knowledge while compensating for epistemic failures like entrenchment and evidence suppression. The work addresses an emerging 2026 landscape of memory architectures for large language models through five operational mechanisms (TRIAGE, DECAY, CONTEXTUALIZE, CONSOLIDATE, AUDIT) aimed at preventing user-coupled drift in single-user knowledge systems.

AI · Neutral · arXiv – CS AI · Apr 15 · 6/10

PrivacyReasoner: Can LLM Emulate a Human-like Privacy Mind?

Researchers introduce PrivacyReasoner, an LLM-based agent architecture that reconstructs individual privacy perspectives from online comment history to predict how specific people would perceive data practices. The system outperforms baseline models in predicting privacy concerns across AI, e-commerce, and healthcare domains by contextually activating relevant privacy beliefs.

AI · Neutral · arXiv – CS AI · Apr 14 · 6/10

MADQRL: Distributed Quantum Reinforcement Learning Framework for Multi-Agent Environments

Researchers propose MADQRL, a distributed quantum reinforcement learning framework that enables multiple agents to learn independently across high-dimensional environments. The approach demonstrates ~10% improvement over classical distribution strategies and ~5% gains versus traditional policy representation models, addressing computational constraints of current quantum hardware in multi-agent settings.

AI · Neutral · arXiv – CS AI · Apr 10 · 6/10

Neural Computers

Researchers propose Neural Computers (NCs), a new computing paradigm where AI models function as executable runtime environments rather than static predictors. The work demonstrates early NC prototypes using video models that process instructions and user actions to generate screen frames, establishing foundational I/O primitives while identifying significant challenges toward achieving general-purpose Completely Neural Computers (CNCs).

AI · Neutral · arXiv – CS AI · Apr 10 · 6/10

Commander-GPT: Dividing and Routing for Multimodal Sarcasm Detection

Researchers introduce Commander-GPT, a modular framework that orchestrates multiple specialized AI agents for multimodal sarcasm detection rather than relying on a single LLM. The system achieves 4.4-11.7% F1 score improvements over existing baselines on standard benchmarks, demonstrating that task decomposition and intelligent routing can overcome LLM limitations in understanding sarcasm.

🧠 GPT-4 · 🧠 Gemini

AI · Bearish · arXiv – CS AI · Apr 7 · 6/10

Plausibility as Commonsense Reasoning: Humans Succeed, Large Language Models Do not

A new study reveals that large language models fail to integrate world knowledge with syntactic structure for ambiguity resolution in the same way humans do. Researchers tested Turkish language models on relative-clause attachment ambiguities and found that while humans reliably use plausibility to guide interpretation, LLMs show weak, unstable, or reversed responses to the same plausibility cues.

AI · Bullish · arXiv – CS AI · Apr 7 · 6/10

Optimizing Service Operations via LLM-Powered Multi-Agent Simulation

Researchers introduce an LLM-powered multi-agent simulation framework for optimizing service operations by modeling human behavior through AI agents. The method uses prompts to embed design choices and extracts outcomes from LLM responses to create a controlled Markov chain model, showing superior performance in supply chain and contest design applications.

AI · Neutral · arXiv – CS AI · Apr 7 · 6/10

Selective Forgetting for Large Reasoning Models

Researchers propose a new framework for 'selective forgetting' in Large Reasoning Models (LRMs) that can remove sensitive information from AI training data while preserving general reasoning capabilities. The method uses retrieval-augmented generation to identify and replace problematic reasoning segments with benign placeholders, addressing privacy and copyright concerns in AI systems.

AI · Bullish · arXiv – CS AI · Apr 7 · 6/10

Profile-Then-Reason: Bounded Semantic Complexity for Tool-Augmented Language Agents

Researchers introduce Profile-Then-Reason (PTR), a new framework for tool-using AI language agents that reduces computational overhead by pre-planning workflows rather than recomputing after each step. The approach caps language model calls at two to three and shows superior performance in 16 of 24 test configurations compared to reactive execution methods.

AI · Bullish · arXiv – CS AI · Apr 7 · 6/10

InferenceEvolve: Towards Automated Causal Effect Estimators through Self-Evolving AI

Researchers introduce InferenceEvolve, an AI framework using large language models to automatically discover and refine causal inference methods. The system outperformed 58 human submissions in a recent competition and demonstrates how AI can optimize complex scientific programs through evolutionary approaches.

AI · Neutral · arXiv – CS AI · Apr 7 · 6/10

Implementing surrogate goals for safer bargaining in LLM-based agents

Researchers developed methods to implement 'surrogate goals' in LLM-based agents to reduce bargaining risks by deflecting threats away from what principals care about. The study tested four approaches, including prompting, fine-tuning, and scaffolding, and found that scaffolding and fine-tuning methods outperformed simple prompting for implementing desired threat response behaviors.

AI · Bullish · arXiv – CS AI · Apr 7 · 6/10

ANX: Protocol-First Design for AI Agent Interaction with a Supporting 3EX Decoupled Architecture

ANX is a new protocol-first framework designed for AI agent interaction, featuring a 3EX decoupled architecture that reduces token consumption by up to 66% compared to existing methods. The open-source protocol addresses security and efficiency issues in current AI agent implementations through agent-native design and integrated CLI, Skill, and MCP components.

🧠 GPT-4
