Models, papers, tools. 20,246 articles with AI-powered sentiment analysis and key takeaways.
AI · Neutral · arXiv – CS AI · Mar 17 · 6/10
🧠Researchers propose the Content Creation with Spillovers (CCS) model to capture the positive spillovers GenAI and LLMs create when one creator's content can be reused by others, which can undermine individual incentives to create. They introduce Provisional Allocation mechanisms that guarantee equilibrium existence and develop approximation algorithms to maximize social welfare in content creation ecosystems.
AI · Neutral · arXiv – CS AI · Mar 17 · 6/10
🧠Researchers introduce AgentProcessBench, the first benchmark for evaluating step-level effectiveness in AI tool-using agents, comprising 1,000 trajectories and 8,509 human-labeled annotations. The benchmark reveals that current AI models struggle to distinguish neutral from erroneous actions in tool execution, and that process-level signals can significantly enhance test-time performance.
AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
🧠Researchers introduce ArgEval, a new framework that enhances Large Language Model decision-making through structured argumentation and global contestability. Unlike previous approaches limited to binary choices and local corrections, ArgEval maps entire decision spaces and builds reusable argumentation frameworks that can be globally modified to prevent repeated mistakes.
AI · Neutral · arXiv – CS AI · Mar 17 · 6/10
🧠Research reveals that Large Language Models struggle with dynamic Theory of Mind tasks, particularly tracking how others' beliefs change over time. While LLMs can infer current beliefs effectively, they fail to maintain and retrieve prior belief states after updates occur, showing patterns consistent with human cognitive biases.
AI · Neutral · arXiv – CS AI · Mar 17 · 6/10
🧠Researchers introduce Gradient Atoms, an unsupervised method that decomposes AI model training gradients to discover interpretable behaviors without requiring predefined queries. The technique can identify model behaviors like refusal patterns and arithmetic capabilities, while also serving as effective steering vectors to control model outputs.
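The decomposition this summary gestures at can be sketched in a few lines. Everything below is an assumption for illustration: synthetic "gradients" stand in for real per-example training gradients, and a plain truncated SVD stands in for the paper's unsupervised decomposition.

```python
import numpy as np

# Toy sketch of the "gradient atoms" idea as summarized above: stack
# per-example gradients into a matrix, factor out a small set of shared
# directions ("atoms"), and reuse an atom as a steering vector. The SVD,
# the fabricated data, and all dimensions are illustrative assumptions.
rng = np.random.default_rng(0)
n_examples, dim, n_atoms = 200, 64, 4

# Synthetic "gradients": each example mixes a few ground-truth directions.
true_atoms = rng.normal(size=(n_atoms, dim))
coeffs = rng.exponential(size=(n_examples, n_atoms))
grads = coeffs @ true_atoms + 0.01 * rng.normal(size=(n_examples, dim))

# Decompose: top right-singular vectors are candidate behaviour directions.
_, _, Vt = np.linalg.svd(grads, full_matrices=False)
atoms = Vt[:n_atoms]                      # (n_atoms, dim), orthonormal rows

# An atom can then act as a steering vector: nudge a hidden state along it
# to amplify (or, with a negative scale, suppress) the behaviour it encodes.
hidden = rng.normal(size=dim)
steered = hidden + 2.0 * atoms[0]
```

The appeal of the approach, as described, is that the atoms are found without predefined queries; the SVD here is only the simplest possible stand-in for that unsupervised step.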
AI · Bearish · arXiv – CS AI · Mar 17 · 6/10
🧠Researchers introduced BrainBench, a new benchmark revealing significant gaps in commonsense reasoning among leading LLMs. Even the best model (Claude Opus 4.6) achieved only 80.3% accuracy on 100 brainteaser questions, while GPT-4o scored just 39.7%, exposing fundamental reasoning deficits across frontier AI models.
AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
🧠Researchers introduce OpenHospital, a new interactive arena designed to develop and benchmark Large Language Model-based Collective Intelligence through physician-patient agent interactions. The platform uses a data-in-agent-self paradigm to rapidly enhance AI agent capabilities while providing evaluation metrics for medical proficiency and system efficiency.
AI · Neutral · arXiv – CS AI · Mar 17 · 6/10
🧠Researchers introduce VTC-Bench, a comprehensive benchmark for evaluating multimodal AI models' ability to use visual tools for complex tasks. The benchmark reveals significant limitations in current models, with leading model Gemini-3.0-Pro achieving only 51% accuracy on multi-tool visual reasoning tasks.
AI · Neutral · arXiv – CS AI · Mar 17 · 6/10
🧠Researchers have introduced Prompt Readiness Levels (PRL), a nine-level maturity framework for evaluating and governing AI prompt assets in production environments. The system includes a multidimensional scoring method (PRS) designed to ensure prompt engineering meets operational, safety, and compliance standards across organizations.
AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
🧠Researchers introduced NS-Mem, a neuro-symbolic memory framework that combines neural representations with symbolic structures to improve multimodal AI agent reasoning. The system achieved 4.35% average improvement in reasoning accuracy over pure neural systems, with up to 12.5% gains on constrained reasoning tasks.
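A neuro-symbolic memory of the general shape this summary describes can be sketched minimally: a "neural" side (vector similarity) ranks candidates, while a symbolic side (subject-relation-object facts) enforces hard constraints. The memory entries, embeddings, and `recall` API below are invented for illustration and do not reflect NS-Mem's actual design.

```python
import math

# Toy neuro-symbolic memory: each entry pairs an embedding (neural side)
# with a symbolic fact triple. All contents are fabricated placeholders.
memory = [
    {"text": "the red block is on the table", "vec": [1.0, 0.2, 0.0],
     "fact": ("red_block", "on", "table")},
    {"text": "the blue block is in the box", "vec": [0.9, 0.3, 0.1],
     "fact": ("blue_block", "in", "box")},
]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def recall(query_vec, constraint=None):
    """Neural ranking, filtered by an optional symbolic (relation, object) constraint."""
    candidates = memory
    if constraint is not None:
        rel, obj = constraint
        candidates = [m for m in candidates
                      if m["fact"][1] == rel and m["fact"][2] == obj]
    return max(candidates, key=lambda m: cosine(m["vec"], query_vec))

# Pure vector search returns the closest embedding...
hit = recall([1.0, 0.2, 0.0])
# ...but a symbolic constraint ("in the box") overrides mere similarity.
constrained = recall([1.0, 0.2, 0.0], constraint=("in", "box"))
```

The point of the hybrid, per the summary's "constrained reasoning" gains, is exactly this: similarity alone can return a plausible-but-wrong memory, while the symbolic filter guarantees the constraint is respected.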
AI · Neutral · arXiv – CS AI · Mar 17 · 6/10
🧠Researchers have developed PMAx, an autonomous AI framework that democratizes process mining by allowing business users to analyze organizational workflows through natural language queries. The system uses a multi-agent architecture with local execution to ensure data privacy and mathematical accuracy while eliminating the need for specialized technical expertise.
AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
🧠Researchers propose a new AI learning architecture inspired by human and animal cognition that integrates observational learning and active behavior learning. The framework includes a meta-control system that switches between learning modes, addressing current limitations in autonomous AI learning.
AI · Neutral · arXiv – CS AI · Mar 17 · 6/10
🧠Researchers developed an information-theoretic framework to explain 'Aha moments' in large language models during reasoning tasks. The study reveals that strong reasoning performance stems from uncertainty externalization rather than specific tokens, decomposing LLM reasoning into procedural information and epistemic verbalization.
AI · Bearish · arXiv – CS AI · Mar 17 · 6/10
🧠Researchers propose a priority graph model to understand conflicts in LLM alignment, revealing that unified stable alignment is challenging due to context-dependent inconsistencies. The study identifies 'priority hacking' as a vulnerability where adversaries can manipulate safety alignments, and suggests runtime verification mechanisms as a potential solution.
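The core instability the summary describes can be made concrete with a small sketch: model alignment rules as nodes and "A outranks B" as a directed edge; if context-dependent edges combine into a cycle, no single stable priority ordering exists. The rule names and edges below are illustrative assumptions, not the paper's actual graph.

```python
# Toy priority graph: an edge (a, b) means rule a outranks rule b in some
# context. A cycle means no consistent global ordering of priorities exists.
def has_priority_cycle(edges):
    """Detect a cycle in a directed priority graph via iterative DFS."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, [])
    WHITE, GRAY, BLACK = 0, 1, 2           # unvisited / on stack / done
    color = {n: WHITE for n in graph}
    for start in graph:
        if color[start] != WHITE:
            continue
        stack = [(start, iter(graph[start]))]
        color[start] = GRAY
        while stack:
            node, it = stack[-1]
            for nxt in it:
                if color[nxt] == GRAY:     # back edge -> cycle
                    return True
                if color[nxt] == WHITE:
                    color[nxt] = GRAY
                    stack.append((nxt, iter(graph[nxt])))
                    break
            else:
                color[node] = BLACK
                stack.pop()
    return False

# Context 1: safety outranks helpfulness outranks user instructions.
stable = [("safety", "helpfulness"), ("helpfulness", "user")]

# Context 2: an adversarial prompt adds a rule making user instructions
# outrank safety ("priority hacking"), closing a cycle.
hacked = stable + [("user", "safety")]

print(has_priority_cycle(stable))  # False
print(has_priority_cycle(hacked))  # True
```

A runtime check of this kind is one plausible reading of the "runtime verification mechanisms" the summary mentions: reject or flag any context whose effective priority graph becomes cyclic.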
AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
🧠Researchers propose a new computational concept for modeling the human psyche as an operating system for artificial general intelligence. The approach treats the psyche as a decision-making system that operates in a state space including needs, sensations, and actions to optimize goal achievement while minimizing risks.
AI · Bearish · arXiv – CS AI · Mar 17 · 6/10
🧠A new study reveals that standard algorithmic metrics used to evaluate AI counterfactual explanations poorly correlate with human perceptions of explanation quality. The research found weak and dataset-dependent relationships between technical metrics and user judgments, highlighting fundamental limitations in current AI explainability evaluation methods.
AI · Neutral · arXiv – CS AI · Mar 17 · 6/10
🧠Researchers propose autonomous editorial systems that use AI to continuously process, analyze, and organize large volumes of news and information. The system treats stories as persistent state that evolves over time through automated updates and enrichment, while maintaining human oversight and traceability.
AI · Bearish · arXiv – CS AI · Mar 17 · 6/10
🧠A research paper examines how AI-generated visual content is transforming society's relationship with reality and representation, intensifying visual media's dominance in shaping public consciousness. An experiment in Bolzano, Italy revealed people's strong preference for visually striking AI-generated urban development scenarios over practical solutions, highlighting how AI accelerates image commodification and deepens societal alienation.
AI · Neutral · arXiv – CS AI · Mar 17 · 6/10
🧠Researchers discovered that transformer language models process factual information through rotational dynamics rather than magnitude changes, actively suppressing incorrect answers instead of passively failing. This geometric pattern only emerges in models above 1.6B parameters, suggesting a phase transition in factual processing capabilities.
AI · Neutral · arXiv – CS AI · Mar 17 · 6/10
🧠Researchers introduce a structural taxonomy and unified evaluation framework for Audio Large Language Models (ALLMs) to assess fairness, safety, and security. The study reveals systematic differences in how ALLMs handle audio versus text inputs, with FSS behavior closely tied to acoustic information integration methods.
AI · Neutral · arXiv – CS AI · Mar 17 · 6/10
🧠Researchers developed a framework to assess the public summaries of AI training data required by Article 53(1)(d) of the EU AI Act, evaluating their transparency and usefulness for stakeholder rights enforcement. The study analyzed 5 public summaries from GPAI model providers as of January 2026, creating guidelines for compliance and a public resource website.
AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
🧠Researchers introduce Truncated-Reasoning Self-Distillation (TRSD), a post-training method that enables AI language models to maintain accuracy while using shorter reasoning traces. The technique reduces computational costs by training models to produce correct answers from partial reasoning, achieving significant inference-time efficiency gains without sacrificing performance.
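The data-construction side of this idea is easy to sketch: take (question, reasoning trace, answer) triples and build training targets where the model must produce the correct answer from only a prefix of its own trace. The field names, truncation ratio, and example below are illustrative assumptions, not TRSD's actual recipe.

```python
# Toy sketch of building self-distillation examples from truncated
# reasoning traces, in the spirit of the summary above. The 50% keep
# ratio and the dict schema are invented for illustration.
def truncate_trace(trace_steps, keep_ratio=0.5):
    """Keep only the first `keep_ratio` fraction of reasoning steps (at least one)."""
    keep = max(1, int(len(trace_steps) * keep_ratio))
    return trace_steps[:keep]

def build_distillation_example(question, trace_steps, answer, keep_ratio=0.5):
    prefix = truncate_trace(trace_steps, keep_ratio)
    return {
        "input": question + "\n" + "\n".join(prefix),
        "target": answer,  # train to emit the answer from partial reasoning
    }

trace = ["Step 1: 12 * 3 = 36", "Step 2: 36 + 4 = 40",
         "Step 3: 40 / 2 = 20", "Step 4: So the result is 20."]
ex = build_distillation_example("Compute (12*3 + 4) / 2.", trace, "20")
```

Training on such pairs is what would let the model keep its accuracy while emitting shorter traces at inference time, which is where the summary's efficiency gains come from.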
AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
🧠Researchers developed PREBA, a retrieval-augmented framework that uses PCA-weighted retrieval and Bayesian averaging to improve surgical duration prediction accuracy by up to 40% using large language models. The system grounds LLM predictions in institution-specific clinical data without requiring computationally intensive training, achieving performance competitive with supervised machine learning methods.
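The two ingredients the summary names, PCA-weighted retrieval and Bayesian averaging, can be sketched together. All data below is fabricated, the LLM's point estimate is a hard-coded placeholder, and the precision-weighted combination is only one simple reading of "Bayesian averaging", not PREBA's actual formulation.

```python
import numpy as np

# Toy sketch: weight case features by variance explained (PCA), retrieve
# similar past surgeries under that weighting, then fuse the retrieved
# mean duration with an (assumed) LLM estimate via precision weighting.
rng = np.random.default_rng(1)

# Fabricated past cases: feature vectors and observed durations (minutes).
X = rng.normal(size=(100, 5))
durations = 60 + 10 * X[:, 0] + rng.normal(scale=5, size=100)

# PCA-derived feature weights: each feature's variance contribution.
Xc = X - X.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
weights = (eigvecs ** 2) @ eigvals       # diagonal of the covariance matrix

def retrieve(query, k=10):
    """k nearest past cases under the PCA-weighted distance."""
    d = np.sqrt(((X - query) ** 2 * weights).sum(axis=1))
    return np.argsort(d)[:k]

idx = retrieve(X[0] + 0.01)
retrieved_mean = durations[idx].mean()
retrieved_var = durations[idx].var() + 1e-6

llm_estimate, llm_var = 75.0, 100.0      # placeholder LLM output + uncertainty

# Precision-weighted (inverse-variance) average of the two sources.
prec_r, prec_l = 1.0 / retrieved_var, 1.0 / llm_var
prediction = (prec_r * retrieved_mean + prec_l * llm_estimate) / (prec_r + prec_l)
```

The precision weighting is what "grounds" the LLM estimate in institution-specific data: when retrieved cases agree closely (low variance), they dominate the final prediction.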
AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
🧠Researchers introduce SPLARE, a new method that uses sparse autoencoders (SAEs) to improve learned sparse retrieval in language models. The technique outperforms existing vocabulary-based approaches in multilingual and out-of-domain settings, with SPLARE-7B achieving top results on multilingual retrieval benchmarks.
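The general shape of SAE-based sparse retrieval can be sketched without any training: an encoder with a negative bias and ReLU maps dense embeddings to high-dimensional non-negative sparse codes, and retrieval scores are dot products between codes. The random weights below merely stand in for a trained SAE; nothing here reflects SPLARE's actual training or architecture.

```python
import numpy as np

# Illustrative sketch of sparse-code retrieval. Random encoder weights
# substitute for a trained sparse autoencoder; dimensions are assumptions.
rng = np.random.default_rng(2)
d_model, d_sparse = 32, 256
W_enc = rng.normal(size=(d_model, d_sparse)) / np.sqrt(d_model)
bias = -0.5                               # negative bias + ReLU => sparse codes

def encode_sparse(x):
    return np.maximum(x @ W_enc + bias, 0.0)

docs = rng.normal(size=(50, d_model))
doc_codes = np.stack([encode_sparse(d) for d in docs])
doc_codes_n = doc_codes / (np.linalg.norm(doc_codes, axis=1, keepdims=True) + 1e-9)

query = docs[7] + 0.05 * rng.normal(size=d_model)   # near-duplicate of doc 7
q_code = encode_sparse(query)

scores = doc_codes_n @ q_code             # cosine-style sparse matching
best = int(np.argmax(scores))
sparsity = float((doc_codes > 0).mean())  # fraction of active code dimensions
```

Because the codes are sparse and non-negative, they can be served from an inverted index like classic lexical retrieval, which is the practical draw of this family of methods over dense-vector search.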
AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
🧠Researchers propose FedTreeLoRA, a new framework for privacy-preserving fine-tuning of large language models that addresses both statistical and functional heterogeneity across federated learning clients. The method uses tree-structured aggregation to allow layer-wise specialization while maintaining shared consensus on foundational layers, significantly outperforming existing personalized federated learning approaches.
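The tree-structured aggregation described above can be sketched as a two-level averaging scheme: lower ("foundational") adapters are averaged across all clients, while upper adapters are averaged only within a client's cluster. The client groupings, the layer split, and the vector shapes are all invented for illustration, not FedTreeLoRA's actual tree.

```python
import numpy as np

# Toy two-level (tree) aggregation: global root for shared lower layers,
# per-cluster internal nodes for specialised upper layers. All values
# are fabricated stand-ins for LoRA-style adapter updates.
rng = np.random.default_rng(3)
n_clients, dim = 6, 4
clusters = {"A": [0, 1, 2], "B": [3, 4, 5]}   # assumed client grouping

lower = [rng.normal(size=dim) for _ in range(n_clients)]  # per-client lower adapters
upper = [rng.normal(size=dim) for _ in range(n_clients)]  # per-client upper adapters

# Root of the tree: global average of the shared lower layers.
global_lower = np.mean(lower, axis=0)

# Internal nodes: per-cluster averages of the specialised upper layers.
cluster_upper = {c: np.mean([upper[i] for i in ids], axis=0)
                 for c, ids in clusters.items()}

def client_model(i):
    """A client's next-round adapters: global lower + its cluster's upper."""
    cluster = next(c for c, ids in clusters.items() if i in ids)
    return global_lower, cluster_upper[cluster]

low0, up0 = client_model(0)   # client in cluster A
low5, up5 = client_model(5)   # client in cluster B
print(np.allclose(low0, low5))  # True: lower layers shared globally
print(np.allclose(up0, up5))    # False: upper layers differ by cluster
```

This is the "shared consensus on foundational layers, layer-wise specialization above" split in miniature: clients in different clusters agree at the root but diverge at the leaves.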