Models, papers, tools. 19,058 articles with AI-powered sentiment analysis and key takeaways.
AI · Neutral · arXiv – CS AI · Mar 17 · 6/10
🧠 Researchers introduce VTC-Bench, a comprehensive benchmark for evaluating multimodal AI models' ability to use visual tools for complex tasks. The benchmark reveals significant limitations in current models, with leading model Gemini-3.0-Pro achieving only 51% accuracy on multi-tool visual reasoning tasks.
🧠 Gemini
AI · Neutral · arXiv – CS AI · Mar 17 · 6/10
🧠 Researchers have introduced Prompt Readiness Levels (PRL), a nine-level maturity framework for evaluating and governing AI prompt assets in production environments. The system includes a multidimensional scoring method (PRS) designed to ensure prompt engineering meets operational, safety, and compliance standards across organizations.
AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
🧠 Researchers introduced NS-Mem, a neuro-symbolic memory framework that combines neural representations with symbolic structures to improve multimodal AI agent reasoning. The system achieved a 4.35% average improvement in reasoning accuracy over pure neural systems, with up to 12.5% gains on constrained reasoning tasks.
AI · Neutral · arXiv – CS AI · Mar 17 · 6/10
🧠 Researchers have developed PMAx, an autonomous AI framework that democratizes process mining by allowing business users to analyze organizational workflows through natural language queries. The system uses a multi-agent architecture with local execution to ensure data privacy and mathematical accuracy while eliminating the need for specialized technical expertise.
AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
🧠 Researchers propose a new AI learning architecture inspired by human and animal cognition that integrates observational learning and active behavior learning. The framework includes a meta-control system that switches between learning modes, addressing current limitations in autonomous AI learning.
AI · Neutral · arXiv – CS AI · Mar 17 · 6/10
🧠 Researchers developed an information-theoretic framework to explain 'Aha moments' in large language models during reasoning tasks. The study reveals that strong reasoning performance stems from uncertainty externalization rather than specific tokens, decomposing LLM reasoning into procedural information and epistemic verbalization.
AI · Bearish · arXiv – CS AI · Mar 17 · 6/10
🧠 Researchers propose a priority graph model to understand conflicts in LLM alignment, revealing that unified stable alignment is challenging due to context-dependent inconsistencies. The study identifies 'priority hacking' as a vulnerability where adversaries can manipulate safety alignments, and suggests runtime verification mechanisms as a potential solution.
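The summary gives only the idea, but the intuition is easy to illustrate: if alignment rules carry context-dependent priorities ("A outranks B"), merging priorities from different contexts can produce a cycle, meaning no stable total ordering exists. A minimal sketch, where `has_priority_cycle` and the rule names are illustrative inventions rather than the paper's formalism:

```python
# Toy priority graph: directed edges mean "first rule outranks second".
# A cycle in the merged graph means no consistent global ranking exists.

def has_priority_cycle(edges):
    """DFS-based cycle detection over a directed priority graph."""
    graph = {}
    for hi, lo in edges:
        graph.setdefault(hi, []).append(lo)
        graph.setdefault(lo, [])
    state = {node: 0 for node in graph}  # 0 = unseen, 1 = on stack, 2 = done

    def dfs(node):
        state[node] = 1
        for nxt in graph[node]:
            if state[nxt] == 1 or (state[nxt] == 0 and dfs(nxt)):
                return True
        state[node] = 2
        return False

    return any(state[n] == 0 and dfs(n) for n in graph)

# Three contexts each assert a locally reasonable priority; merged, they cycle,
# so there is no single stable alignment ordering to fall back on.
merged = [("safety", "helpfulness"),
          ("helpfulness", "honesty"),
          ("honesty", "safety")]
```

An adversary who can inject one edge into such a graph ("priority hacking") can break an otherwise acyclic ordering, which is why the paper reportedly points toward runtime verification.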
AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
🧠 Researchers propose a new computational concept for modeling the human psyche as an operating system for artificial general intelligence. The approach treats the psyche as a decision-making system that operates in a state space including needs, sensations, and actions to optimize goal achievement while minimizing risks.
AI · Bearish · arXiv – CS AI · Mar 17 · 6/10
🧠 A new study reveals that standard algorithmic metrics used to evaluate AI counterfactual explanations correlate poorly with human perceptions of explanation quality. The research found weak and dataset-dependent relationships between technical metrics and user judgments, highlighting fundamental limitations in current AI explainability evaluation methods.
AI · Neutral · arXiv – CS AI · Mar 17 · 6/10
🧠 Researchers propose autonomous editorial systems that use AI to continuously process, analyze, and organize large volumes of news and information. The system treats stories as persistent state that evolves over time through automated updates and enrichment, while maintaining human oversight and traceability.
AI · Bearish · arXiv – CS AI · Mar 17 · 6/10
🧠 A research paper examines how AI-generated visual content is transforming society's relationship with reality and representation, intensifying visual media's dominance in shaping public consciousness. An experiment in Bolzano, Italy revealed people's strong preference for visually striking AI-generated urban development scenarios over practical solutions, highlighting how AI accelerates image commodification and deepens societal alienation.
AI · Neutral · arXiv – CS AI · Mar 17 · 6/10
🧠 Researchers discovered that transformer language models process factual information through rotational dynamics rather than magnitude changes, actively suppressing incorrect answers instead of passively failing. This geometric pattern only emerges in models above 1.6B parameters, suggesting a phase transition in factual processing capabilities.
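The distinction the summary draws — rotation versus magnitude change in hidden states — can be made concrete with a toy decomposition. The helper name and the two-dimensional example below are illustrative assumptions, not the paper's actual analysis pipeline:

```python
import math

def direction_vs_magnitude_change(before, after):
    """Split the change between two hidden-state vectors into a rotational
    part (angle between them, in radians) and a magnitude part (norm ratio)."""
    dot = sum(a * b for a, b in zip(before, after))
    n1 = math.sqrt(sum(a * a for a in before))
    n2 = math.sqrt(sum(b * b for b in after))
    cos = max(-1.0, min(1.0, dot / (n1 * n2)))  # clamp against float error
    return math.acos(cos), n2 / n1

# A "pure rotation" update: the vector's length is preserved while its
# direction turns 90 degrees -- large angle, norm ratio of exactly 1.
angle, ratio = direction_vs_magnitude_change([1.0, 0.0], [0.0, 1.0])
```

Under the paper's claim, factual updates in large models would look like this second kind of change (angle moves, ratio stays near 1) rather than a scaling of the same direction.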
AI · Neutral · arXiv – CS AI · Mar 17 · 6/10
🧠 Researchers introduce a structural taxonomy and unified evaluation framework for Audio Large Language Models (ALLMs) to assess fairness, safety, and security (FSS). The study reveals systematic differences in how ALLMs handle audio versus text inputs, with FSS behavior closely tied to acoustic information integration methods.
AI · Neutral · arXiv – CS AI · Mar 17 · 6/10
🧠 Researchers developed a framework to assess the public summaries of AI training data required by Article 53(1)(d) of the EU AI Act, evaluating their transparency and usefulness for stakeholder rights enforcement. The study analyzed five public summaries from GPAI model providers as of January 2026, creating guidelines for compliance and a public resource website.
AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
🧠 Researchers introduce Truncated-Reasoning Self-Distillation (TRSD), a post-training method that enables AI language models to maintain accuracy while using shorter reasoning traces. The technique reduces computational costs by training models to produce correct answers from partial reasoning, achieving significant inference-time efficiency gains without sacrificing performance.
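The core idea — training a model to reach the same final answer from progressively shorter prefixes of its own reasoning trace — can be sketched as a data-preparation step. This is a minimal illustration under assumptions; `make_truncated_pairs`, the prompt template, and the truncation fractions are all invented for the example, not taken from the paper:

```python
def make_truncated_pairs(question, reasoning_steps, answer,
                         keep_fractions=(1.0, 0.5, 0.25)):
    """Build (prompt, target) training pairs in which the model must emit
    the same final answer from shorter and shorter reasoning prefixes."""
    pairs = []
    for frac in keep_fractions:
        k = max(1, int(len(reasoning_steps) * frac))  # keep at least one step
        prefix = " ".join(reasoning_steps[:k])
        prompt = f"{question}\nReasoning so far: {prefix}\nAnswer:"
        pairs.append((prompt, answer))
    return pairs

pairs = make_truncated_pairs(
    "What is 12 * 7?",
    ["Compute 10*7 = 70.", "Compute 2*7 = 14.", "Add: 70 + 14 = 84."],
    "84",
)
```

Fine-tuning on such pairs would push the model to commit to the correct answer earlier, which is one plausible reading of how TRSD trades trace length for inference cost.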
AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
🧠 Researchers developed PREBA, a retrieval-augmented framework that uses PCA-weighted retrieval and Bayesian averaging to improve surgical duration prediction accuracy by up to 40% using large language models. The system grounds LLM predictions in institution-specific clinical data without requiring computationally intensive training, achieving performance competitive with supervised machine learning methods.
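The "Bayesian averaging" half of the pipeline can be illustrated with the standard precision-weighted combination of two estimates; the summary does not specify PREBA's exact update, so the function below is a conventional stand-in, and all variable names and numbers are hypothetical:

```python
def bayesian_average(llm_pred, llm_var, retrieved, retrieved_var):
    """Fuse an LLM point estimate with the mean of retrieved historical
    durations, weighting each source by its inverse variance (precision)."""
    r_mean = sum(retrieved) / len(retrieved)
    w_llm = 1.0 / llm_var
    w_ret = 1.0 / retrieved_var
    return (w_llm * llm_pred + w_ret * r_mean) / (w_llm + w_ret)

# The LLM guesses 120 minutes, but similar past cases at this institution
# averaged 90 minutes with a much tighter spread, so the fused estimate
# is pulled toward the local data.
est = bayesian_average(120.0, llm_var=400.0,
                       retrieved=[85.0, 95.0, 90.0], retrieved_var=100.0)
```

Grounding the prior in retrieved institution-specific cases is what lets a system like this compete with supervised models without any gradient training.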
AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
🧠 Researchers introduce SPLARE, a new method that uses sparse autoencoders (SAEs) to improve learned sparse retrieval in language models. The technique outperforms existing vocabulary-based approaches in multilingual and out-of-domain settings, with SPLARE-7B achieving top results on multilingual retrieval benchmarks.
AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
🧠 Researchers propose FedTreeLoRA, a new framework for privacy-preserving fine-tuning of large language models that addresses both statistical and functional heterogeneity across federated learning clients. The method uses tree-structured aggregation to allow layer-wise specialization while maintaining shared consensus on foundational layers, significantly outperforming existing personalized federated learning approaches.
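The tree-structured aggregation idea — foundational layers averaged globally, specialized layers averaged only within a subgroup (branch) — can be sketched with scalars standing in for weight tensors. Everything here (`tree_aggregate`, the layer names, the group structure) is an illustrative assumption about the mechanism, not the paper's algorithm:

```python
def avg(vals):
    return sum(vals) / len(vals)

def tree_aggregate(client_updates, groups, shared_layers):
    """Average `shared_layers` across every client (the tree's root);
    average all remaining layers only within each subgroup (its branches)."""
    all_clients = [c for members in groups.values() for c in members]
    global_part = {
        layer: avg([client_updates[c][layer] for c in all_clients])
        for layer in shared_layers
    }
    per_group = {}
    for g, members in groups.items():
        branch_layers = set(client_updates[members[0]]) - set(shared_layers)
        branch = {
            layer: avg([client_updates[c][layer] for c in members])
            for layer in branch_layers
        }
        per_group[g] = {**global_part, **branch}
    return per_group

# Scalars stand in for LoRA weight tensors; "embed" is shared, "head" is not.
updates = {
    "a": {"embed": 1.0, "head": 2.0},
    "b": {"embed": 3.0, "head": 4.0},
    "c": {"embed": 5.0, "head": 6.0},
}
models = tree_aggregate(updates, {"g1": ["a", "b"], "g2": ["c"]},
                        shared_layers=["embed"])
```

Every group ends up with the same consensus "embed" value but its own "head", which is the layer-wise specialization the summary describes.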
AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
🧠 Researchers propose a new framework that uses LLMs as code generators rather than per-instance evaluators for high-stakes decision-making, creating interpretable and reproducible AI systems. The approach generates executable decision logic once instead of querying LLMs for each prediction, demonstrated through venture capital founder screening with competitive performance while maintaining full transparency.
AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
🧠 Researchers introduce Pragma-VL, a new alignment algorithm for Multimodal Large Language Models that balances safety and helpfulness by improving visual risk perception and using contextual arbitration. The method outperforms existing baselines by 5-20% on multimodal safety benchmarks while maintaining general AI capabilities in mathematics and reasoning.
AI · Neutral · arXiv – CS AI · Mar 17 · 6/10
🧠 Research reveals that LLM query rewriting in RAG systems shows highly domain-dependent performance, degrading retrieval effectiveness by 9% in financial domains while improving it by 5.1% in scientific contexts. The study identifies that effectiveness depends on whether rewriting improves or worsens lexical alignment between queries and domain-specific terminology.
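The lexical-alignment mechanism the study points to can be demonstrated with a crude token-overlap proxy. The function, the finance vocabulary, and the example queries below are all invented for illustration; the paper's actual metric is not given in this summary:

```python
def lexical_alignment(query, domain_terms):
    """Fraction of query tokens found in a domain vocabulary -- a crude
    proxy for whether a rewrite helps or hurts sparse retrieval."""
    tokens = query.lower().split()
    return sum(t in domain_terms for t in tokens) / len(tokens)

finance_terms = {"ebitda", "amortization", "q3", "guidance"}
original = "What was Q3 EBITDA guidance"
rewritten = "What were the third quarter earnings expectations"
```

Here a fluent-sounding rewrite replaces the jargon the index actually contains, dropping overlap from 3/5 to 0, which is one way rewriting can degrade retrieval in terminology-heavy domains like finance while helping in domains where paraphrase adds useful terms.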
AI · Neutral · arXiv – CS AI · Mar 17 · 6/10
🧠 Researchers propose Evi-DA, an evidence-based technique that improves how large language models predict population response distributions across different cultures and domains. The method uses World Values Survey data and reinforcement learning to achieve up to 44% improvement in accuracy compared to existing approaches.
AI · Neutral · arXiv – CS AI · Mar 17 · 6/10
🧠 Researchers introduce FL-I2MoE, a new Mixture-of-Experts layer for multimodal Transformers that explicitly identifies synergistic and redundant cross-modal feature interactions. The method provides more interpretable explanations for how different data modalities contribute to AI decision-making compared to existing approaches.
AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
🧠 Researchers developed LUMINA, a new Graph Convolutional Network architecture that improves AI-driven diagnosis of neurodevelopmental disorders using fMRI brain data. The system achieved 84.66% accuracy for ADHD and 88.41% for autism spectrum disorder detection by addressing traditional GCN limitations in capturing neural connection dynamics.
AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
🧠 Researchers introduce PolyGLU, a new transformer architecture that enables dynamic routing among multiple activation functions, mimicking biological neural diversity. The 597M-parameter PolychromaticLM model shows emergent specialization patterns and achieves strong performance despite training on significantly fewer tokens than comparable models.
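"Dynamic routing among multiple activation functions" admits a simple reading: a learned router softly mixes the outputs of a small activation bank per unit. The summary does not specify PolyGLU's gating, so the sketch below (the activation bank, `routed_activation`, and the logits) is a plausible toy interpretation, not the published architecture:

```python
import math

# A small bank of candidate nonlinearities for the router to choose among.
ACTIVATIONS = {
    "relu": lambda x: max(0.0, x),
    "tanh": math.tanh,
    "silu": lambda x: x / (1.0 + math.exp(-x)),
}

def softmax(logits):
    m = max(logits)  # subtract max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

def routed_activation(x, router_logits):
    """Softly mix the activation bank's outputs with router-assigned
    weights, letting each unit specialize toward one nonlinearity."""
    weights = softmax(router_logits)
    outs = [f(x) for f in ACTIVATIONS.values()]
    return sum(w * o for w, o in zip(weights, outs))

# The router strongly prefers relu, so a negative input is almost fully
# suppressed, with only a small leak from the other two activations.
y = routed_activation(-1.0, [5.0, 0.0, 0.0])
```

"Emergent specialization" in the summary would then correspond to router weights collapsing toward different bank entries in different units as training proceeds.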