y0news
AnalyticsDigestsSourcesTopicsRSSAICrypto
🤖All30,133🧠AI12,869⛓️Crypto10,905💎DeFi1,122🤖AI × Crypto558📰General4,679
🧠

AI

12,869 AI articles curated from 50+ sources with AI-powered sentiment analysis, importance scoring, and key takeaways.

12869 articles
AINeutralarXiv – CS AI · Mar 176/10
🧠

PMAx: An Agentic Framework for AI-Driven Process Mining

Researchers have developed PMAx, an autonomous AI framework that democratizes process mining by allowing business users to analyze organizational workflows through natural language queries. The system uses a multi-agent architecture with local execution to ensure data privacy and mathematical accuracy while eliminating the need for specialized technical expertise.

AIBearisharXiv – CS AI · Mar 176/10
🧠

Do Metrics for Counterfactual Explanations Align with User Perception?

A new study reveals that standard algorithmic metrics used to evaluate AI counterfactual explanations poorly correlate with human perceptions of explanation quality. The research found weak and dataset-dependent relationships between technical metrics and user judgments, highlighting fundamental limitations in current AI explainability evaluation methods.

AIBullisharXiv – CS AI · Mar 176/10
🧠

FedTreeLoRA: Reconciling Statistical and Functional Heterogeneity in Federated LoRA Fine-Tuning

Researchers propose FedTreeLoRA, a new framework for privacy-preserving fine-tuning of large language models that addresses both statistical and functional heterogeneity across federated learning clients. The method uses tree-structured aggregation to allow layer-wise specialization while maintaining shared consensus on foundational layers, significantly outperforming existing personalized federated learning approaches.

AIBullisharXiv – CS AI · Mar 176/10
🧠

OpenHospital: A Thing-in-itself Arena for Evolving and Benchmarking LLM-based Collective Intelligence

Researchers introduce OpenHospital, a new interactive arena designed to develop and benchmark Large Language Model-based Collective Intelligence through physician-patient agent interactions. The platform uses a data-in-agent-self paradigm to rapidly enhance AI agent capabilities while providing evaluation metrics for medical proficiency and system efficiency.

AINeutralarXiv – CS AI · Mar 176/10
🧠

VTC-Bench: Evaluating Agentic Multimodal Models via Compositional Visual Tool Chaining

Researchers introduce VTC-Bench, a comprehensive benchmark for evaluating multimodal AI models' ability to use visual tools for complex tasks. The benchmark reveals significant limitations in current models, with leading model Gemini-3.0-Pro achieving only 51% accuracy on multi-tool visual reasoning tasks.

🧠 Gemini
AINeutralarXiv – CS AI · Mar 176/10
🧠

Gradient Atoms: Unsupervised Discovery, Attribution and Steering of Model Behaviors via Sparse Decomposition of Training Gradients

Researchers introduce Gradient Atoms, an unsupervised method that decomposes AI model training gradients to discover interpretable behaviors without requiring predefined queries. The technique can identify model behaviors like refusal patterns and arithmetic capabilities, while also serving as effective steering vectors to control model outputs.

AIBearisharXiv – CS AI · Mar 176/10
🧠

BrainBench: Exposing the Commonsense Reasoning Gap in Large Language Models

Researchers introduced BrainBench, a new benchmark revealing significant gaps in commonsense reasoning among leading LLMs. Even the best model (Claude Opus 4.6) achieved only 80.3% accuracy on 100 brainteaser questions, while GPT-4o scored just 39.7%, exposing fundamental reasoning deficits across frontier AI models.

🧠 GPT-4🧠 Claude🧠 Opus
AIBullisharXiv – CS AI · Mar 176/10
🧠

Argumentation for Explainable and Globally Contestable Decision Support with LLMs

Researchers introduce ArgEval, a new framework that enhances Large Language Model decision-making through structured argumentation and global contestability. Unlike previous approaches limited to binary choices and local corrections, ArgEval maps entire decision spaces and builds reusable argumentation frameworks that can be globally modified to prevent repeated mistakes.

AINeutralarXiv – CS AI · Mar 176/10
🧠

Dynamic Theory of Mind as a Temporal Memory Problem: Evidence from Large Language Models

Research reveals that Large Language Models struggle with dynamic Theory of Mind tasks, particularly tracking how others' beliefs change over time. While LLMs can infer current beliefs effectively, they fail to maintain and retrieve prior belief states after updates occur, showing patterns consistent with human cognitive biases.

AINeutralarXiv – CS AI · Mar 176/10
🧠

AgentProcessBench: Diagnosing Step-Level Process Quality in Tool-Using Agents

Researchers introduce AgentProcessBench, the first benchmark for evaluating step-level effectiveness in AI tool-using agents, comprising 1,000 trajectories and 8,509 human-labeled annotations. The benchmark reveals that current AI models struggle with distinguishing neutral and erroneous actions in tool execution, and that process-level signals can significantly enhance test-time performance.

AINeutralarXiv – CS AI · Mar 176/10
🧠

Why Do LLM-based Web Agents Fail? A Hierarchical Planning Perspective

Researchers propose a hierarchical planning framework to analyze why LLM-based web agents fail at complex navigation tasks. The study reveals that while structured PDDL plans outperform natural language plans, low-level execution and perceptual grounding remain the primary bottlenecks rather than high-level reasoning.

AINeutralarXiv – CS AI · Mar 176/10
🧠

Relationship-Aware Safety Unlearning for Multimodal LLMs

Researchers propose a new framework for improving safety in multimodal AI models by targeting unsafe relationships between objects rather than removing entire concepts. The approach uses parameter-efficient edits to suppress dangerous combinations while preserving benign uses of the same objects and relations.

AIBullisharXiv – CS AI · Mar 176/10
🧠

GRPO and Reflection Reward for Mathematical Reasoning in Large Language Models

Researchers propose GRPO (Group Relative Policy Optimization) combined with reflection reward mechanisms to enhance mathematical reasoning in large language models. The four-stage framework encourages self-reflective capabilities during training and demonstrates state-of-the-art performance over existing methods like supervised fine-tuning and LoRA.

AIBullisharXiv – CS AI · Mar 176/10
🧠

EviAgent: Evidence-Driven Agent for Radiology Report Generation

Researchers introduce EviAgent, a new AI system for automated radiology report generation that provides transparent, evidence-driven analysis. The system addresses key limitations of current medical AI models by offering traceable decision-making and integrating external domain knowledge, outperforming existing specialized medical models in testing.

AINeutralarXiv – CS AI · Mar 176/10
🧠

Supervised Fine-Tuning versus Reinforcement Learning: A Study of Post-Training Methods for Large Language Models

A comprehensive research study examines the relationship between Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) methods for improving Large Language Models after pre-training. The research identifies emerging trends toward hybrid post-training approaches that combine both methods, analyzing applications from 2023-2025 to establish when each method is most effective.

AIBullisharXiv – CS AI · Mar 176/10
🧠

From Refusal Tokens to Refusal Control: Discovering and Steering Category-Specific Refusal Directions

Researchers developed a method to control AI safety refusal behavior using categorical refusal tokens in Llama 3 8B, enabling fine-grained control over when models refuse harmful versus benign requests. The technique uses steering vectors that can be applied during inference without additional training, improving both safety and reducing over-refusal of harmless prompts.

🧠 Llama
AIBullisharXiv – CS AI · Mar 176/10
🧠

DOVA: Deliberation-First Multi-Agent Orchestration for Autonomous Research Automation

Researchers introduce DOVA (Deep Orchestrated Versatile Agent), a multi-agent AI platform that improves research automation through deliberation-first orchestration and hybrid collaborative reasoning. The system reduces inference costs by 40-60% on simple tasks while maintaining deep reasoning capabilities for complex research requiring multi-source synthesis.

AINeutralarXiv – CS AI · Mar 176/10
🧠

MESD: Detecting and Mitigating Procedural Bias in Intersectional Groups

Researchers propose MESD (Multi-category Explanation Stability Disparity), a new metric to detect procedural bias in AI models across intersectional groups. They also introduce UEF framework that balances utility, explanation quality, and fairness in machine learning systems.

AIBullisharXiv – CS AI · Mar 176/10
🧠

Distilling Deep Reinforcement Learning into Interpretable Fuzzy Rules: An Explainable AI Framework

Researchers developed a Hierarchical Takagi-Sugeno-Kang Fuzzy Classifier System that converts opaque deep reinforcement learning agents into human-readable IF-THEN rules, achieving 81.48% fidelity in tests. The framework addresses the critical explainability problem in AI systems used for safety-critical applications by providing interpretable rules that humans can verify and understand.

AIBullisharXiv – CS AI · Mar 176/10
🧠

Think First, Diffuse Fast: Improving Diffusion Language Model Reasoning via Autoregressive Plan Conditioning

Researchers developed plan conditioning, a training-free method that significantly improves diffusion language model reasoning by prepending short natural-language plans from autoregressive models. The technique improved performance by 11.6 percentage points on math problems and 12.8 points on coding tasks, bringing diffusion models to competitive levels with autoregressive models.

🧠 Llama
AINeutralarXiv – CS AI · Mar 176/10
🧠

The AI Fiction Paradox

A new research paper identifies the 'AI-Fiction Paradox' - AI models desperately need fiction for training data but struggle to generate quality fiction themselves. The paper outlines three core challenges: narrative causation requiring temporal paradoxes, informational revaluation that conflicts with current attention mechanisms, and multi-scale emotional architecture that current AI cannot orchestrate effectively.

AINeutralarXiv – CS AI · Mar 176/10
🧠

Contests with Spillovers: Incentivizing Content Creation with GenAI

Researchers propose the Content Creation with Spillovers (CCS) model to address how GenAI and LLMs create positive spillovers where creators' content can be reused by others, potentially undermining individual incentives. They introduce Provisional Allocation mechanisms to guarantee equilibrium existence and develop approximation algorithms to maximize social welfare in content creation ecosystems.

← PrevPage 200 of 515Next →
Filters
Sentiment
Importance
Sort
Stay Updated
Everything combined