Models, papers, tools. 20,246 articles with AI-powered sentiment analysis and key takeaways.
AI · Neutral · arXiv – CS AI · Mar 17 · 6/10
🧠Researchers propose the Content Creation with Spillovers (CCS) model to capture the positive spillovers GenAI and LLMs create when one creator's content can be reused by others, which can undermine individual incentives to create. They introduce Provisional Allocation mechanisms that guarantee equilibrium existence and develop approximation algorithms to maximize social welfare in content creation ecosystems.
AI · Neutral · arXiv – CS AI · Mar 17 · 6/10
🧠Researchers introduce AgentProcessBench, the first benchmark for evaluating step-level effectiveness in AI tool-using agents, comprising 1,000 trajectories and 8,509 human-labeled annotations. The benchmark reveals that current AI models struggle to distinguish neutral from erroneous actions in tool execution, and that process-level signals can significantly enhance test-time performance.
AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
🧠Researchers introduce ArgEval, a new framework that enhances Large Language Model decision-making through structured argumentation and global contestability. Unlike previous approaches limited to binary choices and local corrections, ArgEval maps entire decision spaces and builds reusable argumentation frameworks that can be globally modified to prevent repeated mistakes.
AI · Neutral · arXiv – CS AI · Mar 17 · 6/10
🧠Research reveals that Large Language Models struggle with dynamic Theory of Mind tasks, particularly tracking how others' beliefs change over time. While LLMs can infer current beliefs effectively, they fail to maintain and retrieve prior belief states after updates occur, showing patterns consistent with human cognitive biases.
AI · Neutral · arXiv – CS AI · Mar 17 · 6/10
🧠Researchers introduce Gradient Atoms, an unsupervised method that decomposes AI model training gradients to discover interpretable behaviors without requiring predefined queries. The technique can identify model behaviors like refusal patterns and arithmetic capabilities, while also serving as effective steering vectors to control model outputs.
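The decomposition this summary gestures at can be sketched in a few lines. Everything below is an assumption for illustration: synthetic "gradients" stand in for real per-example training gradients, and a plain truncated SVD stands in for the paper's unsupervised decomposition.

```python
import numpy as np

# Toy sketch of the "gradient atoms" idea as summarized above: stack
# per-example gradients into a matrix, factor out a small set of shared
# directions ("atoms"), and reuse an atom as a steering vector. The SVD,
# the fabricated data, and all dimensions are illustrative assumptions.
rng = np.random.default_rng(0)
n_examples, dim, n_atoms = 200, 64, 4

# Synthetic "gradients": each example mixes a few ground-truth directions.
true_atoms = rng.normal(size=(n_atoms, dim))
coeffs = rng.exponential(size=(n_examples, n_atoms))
grads = coeffs @ true_atoms + 0.01 * rng.normal(size=(n_examples, dim))

# Decompose: top right-singular vectors are candidate behaviour directions.
_, _, Vt = np.linalg.svd(grads, full_matrices=False)
atoms = Vt[:n_atoms]                      # (n_atoms, dim), orthonormal rows

# An atom can then act as a steering vector: nudge a hidden state along it
# to amplify (or, with a negative scale, suppress) the behaviour it encodes.
hidden = rng.normal(size=dim)
steered = hidden + 2.0 * atoms[0]
```

The appeal of the approach, as described, is that the atoms are found without predefined queries; the SVD here is only the simplest possible stand-in for that unsupervised step.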
AI · Bearish · arXiv – CS AI · Mar 17 · 6/10
🧠Researchers introduced BrainBench, a new benchmark revealing significant gaps in commonsense reasoning among leading LLMs. Even the best model (Claude Opus 4.6) achieved only 80.3% accuracy on 100 brainteaser questions, while GPT-4o scored just 39.7%, exposing fundamental reasoning deficits across frontier AI models.
AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
🧠Researchers introduce OpenHospital, a new interactive arena designed to develop and benchmark Large Language Model-based Collective Intelligence through physician-patient agent interactions. The platform uses a data-in-agent-self paradigm to rapidly enhance AI agent capabilities while providing evaluation metrics for medical proficiency and system efficiency.
AI · Neutral · arXiv – CS AI · Mar 17 · 6/10
🧠Researchers introduce VTC-Bench, a comprehensive benchmark for evaluating multimodal AI models' ability to use visual tools for complex tasks. The benchmark reveals significant limitations in current models, with leading model Gemini-3.0-Pro achieving only 51% accuracy on multi-tool visual reasoning tasks.
AI · Neutral · arXiv – CS AI · Mar 17 · 6/10
🧠Researchers have introduced Prompt Readiness Levels (PRL), a nine-level maturity framework for evaluating and governing AI prompt assets in production environments. The system includes a multidimensional scoring method (PRS) designed to ensure prompt engineering meets operational, safety, and compliance standards across organizations.
AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
🧠Researchers introduced NS-Mem, a neuro-symbolic memory framework that combines neural representations with symbolic structures to improve multimodal AI agent reasoning. The system achieved 4.35% average improvement in reasoning accuracy over pure neural systems, with up to 12.5% gains on constrained reasoning tasks.
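A neuro-symbolic memory of the general shape this summary describes can be sketched minimally: a "neural" side (vector similarity) ranks candidates, while a symbolic side (subject-relation-object facts) enforces hard constraints. The memory entries, embeddings, and `recall` API below are invented for illustration and do not reflect NS-Mem's actual design.

```python
import math

# Toy neuro-symbolic memory: each entry pairs an embedding (neural side)
# with a symbolic fact triple. All contents are fabricated placeholders.
memory = [
    {"text": "the red block is on the table", "vec": [1.0, 0.2, 0.0],
     "fact": ("red_block", "on", "table")},
    {"text": "the blue block is in the box", "vec": [0.9, 0.3, 0.1],
     "fact": ("blue_block", "in", "box")},
]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def recall(query_vec, constraint=None):
    """Neural ranking, filtered by an optional symbolic (relation, object) constraint."""
    candidates = memory
    if constraint is not None:
        rel, obj = constraint
        candidates = [m for m in candidates
                      if m["fact"][1] == rel and m["fact"][2] == obj]
    return max(candidates, key=lambda m: cosine(m["vec"], query_vec))

# Pure vector search returns the closest embedding...
hit = recall([1.0, 0.2, 0.0])
# ...but a symbolic constraint ("in the box") overrides mere similarity.
constrained = recall([1.0, 0.2, 0.0], constraint=("in", "box"))
```

The point of the hybrid, per the summary's "constrained reasoning" gains, is exactly this: similarity alone can return a plausible-but-wrong memory, while the symbolic filter guarantees the constraint is respected.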
AI · Neutral · arXiv – CS AI · Mar 17 · 6/10
🧠Researchers have developed PMAx, an autonomous AI framework that democratizes process mining by allowing business users to analyze organizational workflows through natural language queries. The system uses a multi-agent architecture with local execution to ensure data privacy and mathematical accuracy while eliminating the need for specialized technical expertise.
AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
🧠Researchers propose a new AI learning architecture inspired by human and animal cognition that integrates observational learning and active behavior learning. The framework includes a meta-control system that switches between learning modes, addressing current limitations in autonomous AI learning.
AI · Neutral · arXiv – CS AI · Mar 17 · 6/10
🧠Researchers developed an information-theoretic framework to explain 'Aha moments' in large language models during reasoning tasks. The study reveals that strong reasoning performance stems from uncertainty externalization rather than specific tokens, decomposing LLM reasoning into procedural information and epistemic verbalization.
AI · Bearish · arXiv – CS AI · Mar 17 · 6/10
🧠Researchers propose a priority graph model to understand conflicts in LLM alignment, revealing that unified stable alignment is challenging due to context-dependent inconsistencies. The study identifies 'priority hacking' as a vulnerability where adversaries can manipulate safety alignments, and suggests runtime verification mechanisms as a potential solution.
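The core instability the summary describes can be made concrete with a small sketch: model alignment rules as nodes and "A outranks B" as a directed edge; if context-dependent edges combine into a cycle, no single stable priority ordering exists. The rule names and edges below are illustrative assumptions, not the paper's actual graph.

```python
# Toy priority graph: an edge (a, b) means rule a outranks rule b in some
# context. A cycle means no consistent global ordering of priorities exists.
def has_priority_cycle(edges):
    """Detect a cycle in a directed priority graph via iterative DFS."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, [])
    WHITE, GRAY, BLACK = 0, 1, 2           # unvisited / on stack / done
    color = {n: WHITE for n in graph}
    for start in graph:
        if color[start] != WHITE:
            continue
        stack = [(start, iter(graph[start]))]
        color[start] = GRAY
        while stack:
            node, it = stack[-1]
            for nxt in it:
                if color[nxt] == GRAY:     # back edge -> cycle
                    return True
                if color[nxt] == WHITE:
                    color[nxt] = GRAY
                    stack.append((nxt, iter(graph[nxt])))
                    break
            else:
                color[node] = BLACK
                stack.pop()
    return False

# Context 1: safety outranks helpfulness outranks user instructions.
stable = [("safety", "helpfulness"), ("helpfulness", "user")]

# Context 2: an adversarial prompt adds a rule making user instructions
# outrank safety ("priority hacking"), closing a cycle.
hacked = stable + [("user", "safety")]

print(has_priority_cycle(stable))  # False
print(has_priority_cycle(hacked))  # True
```

A runtime check of this kind is one plausible reading of the "runtime verification mechanisms" the summary mentions: reject or flag any context whose effective priority graph becomes cyclic.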
AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
🧠Researchers propose a new computational concept for modeling the human psyche as an operating system for artificial general intelligence. The approach treats the psyche as a decision-making system that operates in a state space including needs, sensations, and actions to optimize goal achievement while minimizing risks.
AI · Bearish · arXiv – CS AI · Mar 17 · 6/10
🧠A new study reveals that standard algorithmic metrics used to evaluate AI counterfactual explanations poorly correlate with human perceptions of explanation quality. The research found weak and dataset-dependent relationships between technical metrics and user judgments, highlighting fundamental limitations in current AI explainability evaluation methods.
AI · Neutral · arXiv – CS AI · Mar 17 · 6/10
🧠Researchers propose autonomous editorial systems that use AI to continuously process, analyze, and organize large volumes of news and information. The system treats stories as persistent state that evolves over time through automated updates and enrichment, while maintaining human oversight and traceability.
AI · Bearish · arXiv – CS AI · Mar 17 · 6/10
🧠A research paper examines how AI-generated visual content is transforming society's relationship with reality and representation, intensifying visual media's dominance in shaping public consciousness. An experiment in Bolzano, Italy revealed people's strong preference for visually striking AI-generated urban development scenarios over practical solutions, highlighting how AI accelerates image commodification and deepens societal alienation.
AI · Neutral · arXiv – CS AI · Mar 17 · 6/10
🧠Researchers discovered that transformer language models process factual information through rotational dynamics rather than magnitude changes, actively suppressing incorrect answers instead of passively failing. This geometric pattern only emerges in models above 1.6B parameters, suggesting a phase transition in factual processing capabilities.
AI · Neutral · arXiv – CS AI · Mar 17 · 6/10
🧠Researchers introduce a structural taxonomy and unified evaluation framework for Audio Large Language Models (ALLMs) to assess fairness, safety, and security. The study reveals systematic differences in how ALLMs handle audio versus text inputs, with FSS behavior closely tied to acoustic information integration methods.
AI · Neutral · arXiv – CS AI · Mar 17 · 6/10
🧠Researchers developed a framework to assess the public summaries of AI training data required by Article 53(1)(d) of the EU AI Act, evaluating their transparency and usefulness for stakeholder rights enforcement. The study analyzed 5 public summaries from GPAI model providers as of January 2026, creating guidelines for compliance and a public resource website.
AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
🧠Researchers introduce Truncated-Reasoning Self-Distillation (TRSD), a post-training method that enables AI language models to maintain accuracy while using shorter reasoning traces. The technique reduces computational costs by training models to produce correct answers from partial reasoning, achieving significant inference-time efficiency gains without sacrificing performance.
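The data-construction side of this idea is easy to sketch: take (question, reasoning trace, answer) triples and build training targets where the model must produce the correct answer from only a prefix of its own trace. The field names, truncation ratio, and example below are illustrative assumptions, not TRSD's actual recipe.

```python
# Toy sketch of building self-distillation examples from truncated
# reasoning traces, in the spirit of the summary above. The 50% keep
# ratio and the dict schema are invented for illustration.
def truncate_trace(trace_steps, keep_ratio=0.5):
    """Keep only the first `keep_ratio` fraction of reasoning steps (at least one)."""
    keep = max(1, int(len(trace_steps) * keep_ratio))
    return trace_steps[:keep]

def build_distillation_example(question, trace_steps, answer, keep_ratio=0.5):
    prefix = truncate_trace(trace_steps, keep_ratio)
    return {
        "input": question + "\n" + "\n".join(prefix),
        "target": answer,  # train to emit the answer from partial reasoning
    }

trace = ["Step 1: 12 * 3 = 36", "Step 2: 36 + 4 = 40",
         "Step 3: 40 / 2 = 20", "Step 4: So the result is 20."]
ex = build_distillation_example("Compute (12*3 + 4) / 2.", trace, "20")
```

Training on such pairs is what would let the model keep its accuracy while emitting shorter traces at inference time, which is where the summary's efficiency gains come from.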
AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
🧠Researchers developed PREBA, a retrieval-augmented framework that uses PCA-weighted retrieval and Bayesian averaging to improve surgical duration prediction accuracy by up to 40% using large language models. The system grounds LLM predictions in institution-specific clinical data without requiring computationally intensive training, achieving performance competitive with supervised machine learning methods.
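The two ingredients the summary names, PCA-weighted retrieval and Bayesian averaging, can be sketched together. All data below is fabricated, the LLM's point estimate is a hard-coded placeholder, and the precision-weighted combination is only one simple reading of "Bayesian averaging", not PREBA's actual formulation.

```python
import numpy as np

# Toy sketch: weight case features by variance explained (PCA), retrieve
# similar past surgeries under that weighting, then fuse the retrieved
# mean duration with an (assumed) LLM estimate via precision weighting.
rng = np.random.default_rng(1)

# Fabricated past cases: feature vectors and observed durations (minutes).
X = rng.normal(size=(100, 5))
durations = 60 + 10 * X[:, 0] + rng.normal(scale=5, size=100)

# PCA-derived feature weights: each feature's variance contribution.
Xc = X - X.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
weights = (eigvecs ** 2) @ eigvals       # diagonal of the covariance matrix

def retrieve(query, k=10):
    """k nearest past cases under the PCA-weighted distance."""
    d = np.sqrt(((X - query) ** 2 * weights).sum(axis=1))
    return np.argsort(d)[:k]

idx = retrieve(X[0] + 0.01)
retrieved_mean = durations[idx].mean()
retrieved_var = durations[idx].var() + 1e-6

llm_estimate, llm_var = 75.0, 100.0      # placeholder LLM output + uncertainty

# Precision-weighted (inverse-variance) average of the two sources.
prec_r, prec_l = 1.0 / retrieved_var, 1.0 / llm_var
prediction = (prec_r * retrieved_mean + prec_l * llm_estimate) / (prec_r + prec_l)
```

The precision weighting is what "grounds" the LLM estimate in institution-specific data: when retrieved cases agree closely (low variance), they dominate the final prediction.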
AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
🧠Researchers introduce SPLARE, a new method that uses sparse autoencoders (SAEs) to improve learned sparse retrieval in language models. The technique outperforms existing vocabulary-based approaches in multilingual and out-of-domain settings, with SPLARE-7B achieving top results on multilingual retrieval benchmarks.
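The general shape of SAE-based sparse retrieval can be sketched without any training: an encoder with a negative bias and ReLU maps dense embeddings to high-dimensional non-negative sparse codes, and retrieval scores are dot products between codes. The random weights below merely stand in for a trained SAE; nothing here reflects SPLARE's actual training or architecture.

```python
import numpy as np

# Illustrative sketch of sparse-code retrieval. Random encoder weights
# substitute for a trained sparse autoencoder; dimensions are assumptions.
rng = np.random.default_rng(2)
d_model, d_sparse = 32, 256
W_enc = rng.normal(size=(d_model, d_sparse)) / np.sqrt(d_model)
bias = -0.5                               # negative bias + ReLU => sparse codes

def encode_sparse(x):
    return np.maximum(x @ W_enc + bias, 0.0)

docs = rng.normal(size=(50, d_model))
doc_codes = np.stack([encode_sparse(d) for d in docs])
doc_codes_n = doc_codes / (np.linalg.norm(doc_codes, axis=1, keepdims=True) + 1e-9)

query = docs[7] + 0.05 * rng.normal(size=d_model)   # near-duplicate of doc 7
q_code = encode_sparse(query)

scores = doc_codes_n @ q_code             # cosine-style sparse matching
best = int(np.argmax(scores))
sparsity = float((doc_codes > 0).mean())  # fraction of active code dimensions
```

Because the codes are sparse and non-negative, they can be served from an inverted index like classic lexical retrieval, which is the practical draw of this family of methods over dense-vector search.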
AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
🧠Researchers propose FedTreeLoRA, a new framework for privacy-preserving fine-tuning of large language models that addresses both statistical and functional heterogeneity across federated learning clients. The method uses tree-structured aggregation to allow layer-wise specialization while maintaining shared consensus on foundational layers, significantly outperforming existing personalized federated learning approaches.
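The tree-structured aggregation described above can be sketched as a two-level averaging scheme: lower ("foundational") adapters are averaged across all clients, while upper adapters are averaged only within a client's cluster. The client groupings, the layer split, and the vector shapes are all invented for illustration, not FedTreeLoRA's actual tree.

```python
import numpy as np

# Toy two-level (tree) aggregation: global root for shared lower layers,
# per-cluster internal nodes for specialised upper layers. All values
# are fabricated stand-ins for LoRA-style adapter updates.
rng = np.random.default_rng(3)
n_clients, dim = 6, 4
clusters = {"A": [0, 1, 2], "B": [3, 4, 5]}   # assumed client grouping

lower = [rng.normal(size=dim) for _ in range(n_clients)]  # per-client lower adapters
upper = [rng.normal(size=dim) for _ in range(n_clients)]  # per-client upper adapters

# Root of the tree: global average of the shared lower layers.
global_lower = np.mean(lower, axis=0)

# Internal nodes: per-cluster averages of the specialised upper layers.
cluster_upper = {c: np.mean([upper[i] for i in ids], axis=0)
                 for c, ids in clusters.items()}

def client_model(i):
    """A client's next-round adapters: global lower + its cluster's upper."""
    cluster = next(c for c, ids in clusters.items() if i in ids)
    return global_lower, cluster_upper[cluster]

low0, up0 = client_model(0)   # client in cluster A
low5, up5 = client_model(5)   # client in cluster B
print(np.allclose(low0, low5))  # True: lower layers shared globally
print(np.allclose(up0, up5))    # False: upper layers differ by cluster
```

This is the "shared consensus on foundational layers, layer-wise specialization above" split in miniature: clients in different clusters agree at the root but diverge at the leaves.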