y0news

#ai-agents News & Analysis

449 articles tagged with #ai-agents. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · Mar 27 · 6/10
🧠

ElephantBroker: A Knowledge-Grounded Cognitive Runtime for Trustworthy AI Agents

Researchers have introduced ElephantBroker, an open-source cognitive runtime system that combines knowledge graphs with vector storage to create more trustworthy AI agents with verifiable memory. The system implements comprehensive safety measures, evidence verification, and multi-organizational access controls for enterprise AI deployments.

AI · Bullish · arXiv – CS AI · Mar 27 · 6/10
🧠

TRAJEVAL: Decomposing Code Agent Trajectories for Fine-Grained Diagnosis

Researchers introduce TRAJEVAL, a diagnostic framework that breaks down AI code agent performance into three stages (search, read, edit) to identify specific failure points rather than just binary pass/fail outcomes. The framework analyzed 16,758 trajectories and found that real-time feedback based on trajectory signals improved state-of-the-art models by 2.2-4.6 percentage points while reducing costs by 20-31%.

🧠 GPT-5
AI · Bullish · arXiv – CS AI · Mar 27 · 6/10
🧠

Experiential Reflective Learning for Self-Improving LLM Agents

Researchers introduce Experiential Reflective Learning (ERL), a framework that enables AI agents to improve performance by learning from past experiences and generating transferable heuristics. The method shows a 7.8% improvement in success rates on the Gaia2 benchmark compared to baseline approaches.

AI · Bullish · arXiv – CS AI · Mar 27 · 6/10
🧠

AIP: Agent Identity Protocol for Verifiable Delegation Across MCP and A2A

Researchers introduce Agent Identity Protocol (AIP) with Invocation-Bound Capability Tokens (IBCTs) to address the lack of authentication in AI agent communications over the Model Context Protocol (MCP) and Agent-to-Agent (A2A) protocols. The protocol achieved a 100% attack rejection rate in testing, with a minimal performance overhead of 0.086% in real deployments.

🧠 Gemini
AI × Crypto · Bearish · Blockonomi · Mar 26 · 6/10
🤖

Dragonfly’s Haseeb Qureshi Warns Agentic Payments Are Not Ready for Mass Adoption

Dragonfly's Haseeb Qureshi warns that AI agent payments are not ready for mainstream use, comparing current AI agents to the primitive 1964 computer mouse. He highlights that OpenClaw remains buggy for financial tasks and the x402 protocol processes only $1 million daily, indicating the market is still in early experimental stages.

AI · Neutral · arXiv – CS AI · Mar 26 · 6/10
🧠

The Stochastic Gap: A Markovian Framework for Pre-Deployment Reliability and Oversight-Cost Auditing in Agentic Artificial Intelligence

Researchers developed a Markovian framework to measure reliability and oversight costs for AI agents in organizational workflows before deployment. Testing on enterprise procurement data showed that workflows appearing reliable at the state level can have substantial decision-making blind spots when refined with contextual information.

AI · Bullish · arXiv – CS AI · Mar 26 · 6/10
🧠

LensWalk: Agentic Video Understanding by Planning How You See in Videos

Researchers introduced LensWalk, an agentic AI framework that enables Large Language Models to actively control their visual observation of videos through dynamic temporal sampling. The system uses a reason-plan-observe loop to progressively gather evidence, achieving 5% accuracy improvements on challenging video benchmarks without requiring model fine-tuning.

AI · Neutral · The Register – AI · Mar 25 · 6/10
🧠

Oracle: AI agents can reason, decide and act - liability question remains

Oracle highlights that AI agents are advancing in their ability to reason, make decisions and take autonomous actions, but significant questions remain about legal liability and responsibility when these systems operate independently. This development represents a crucial inflection point for AI adoption in enterprise and financial applications.

AI × Crypto · Neutral · Ars Technica – AI · Mar 17 · 6/10
🤖

How World ID wants to put a unique human identity on every AI agent

World ID is proposing to use iris-scan-backed tokens to tie each AI agent to a unique human identity. The system aims to prevent AI agent swarms from overwhelming online systems by ensuring each agent has a verified human identity.

AI · Bullish · AI News · Mar 17 · 6/10
🧠

Trustpilot partners with AI companies as traditional search declines

Trustpilot is pursuing partnerships with large eCommerce companies as AI-driven shopping grows, with CEO Adrian Blair noting that AI agents need comprehensive business information to make effective consumer decisions. The move comes as traditional search methods decline and AI systems require more structured data sources.

AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
🧠

Beyond Local Code Optimization: Multi-Agent Reasoning for Software System Optimization

Researchers introduced a multi-agent AI framework for whole-system software optimization that goes beyond local code improvements to analyze entire microservice architectures. The system uses coordinated agents for summarization, analysis, optimization, and verification, achieving 36.58% throughput improvement and 27.81% response time reduction in proof-of-concept testing.

AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
🧠

Universe Routing: Why Self-Evolving Agents Need Epistemic Control

Researchers propose a 'universe routing' solution for AI agents that struggle to choose appropriate reasoning frameworks when faced with different types of questions. The study shows that hard routing to specialized solvers is 7x faster than soft mixing approaches, with a 465M-parameter router achieving superior generalization and zero forgetting in continual learning scenarios.

🏢 Meta
AI · Neutral · arXiv – CS AI · Mar 17 · 6/10
🧠

NetArena: Dynamic Benchmarks for AI Agents in Network Automation

NetArena introduces a dynamic benchmarking framework for evaluating AI agents in network automation tasks, addressing limitations of static benchmarks through runtime query generation and network emulator integration. The framework reveals that AI agents achieve only 13-38% performance on realistic network queries. It also significantly improves the statistical reliability of the evaluation, reducing confidence-interval overlap from 85% to 0%.

AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
🧠

Imagine-then-Plan: Agent Learning from Adaptive Lookahead with World Models

Researchers introduce Imagine-then-Plan (ITP), a new AI framework that enables agents to learn through adaptive lookahead imagination using world models. The system allows AI agents to simulate multi-step future scenarios and adjust planning horizons dynamically, significantly outperforming existing methods in benchmark tests.

AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
🧠

DOVA: Deliberation-First Multi-Agent Orchestration for Autonomous Research Automation

Researchers introduce DOVA (Deep Orchestrated Versatile Agent), a multi-agent AI platform that improves research automation through deliberation-first orchestration and hybrid collaborative reasoning. The system reduces inference costs by 40-60% on simple tasks while maintaining deep reasoning capabilities for complex research requiring multi-source synthesis.

AI · Neutral · arXiv – CS AI · Mar 17 · 6/10
🧠

AgentProcessBench: Diagnosing Step-Level Process Quality in Tool-Using Agents

Researchers introduce AgentProcessBench, the first benchmark for evaluating step-level effectiveness in AI tool-using agents, comprising 1,000 trajectories and 8,509 human-labeled annotations. The benchmark reveals that current AI models struggle to distinguish neutral actions from erroneous ones in tool execution, and that process-level signals can significantly enhance test-time performance.

AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
🧠

Advancing Multimodal Agent Reasoning with Long-Term Neuro-Symbolic Memory

Researchers introduced NS-Mem, a neuro-symbolic memory framework that combines neural representations with symbolic structures to improve multimodal AI agent reasoning. The system achieved a 4.35% average improvement in reasoning accuracy over pure neural systems, with gains of up to 12.5% on constrained reasoning tasks.

AI · Neutral · arXiv – CS AI · Mar 17 · 6/10
🧠

PMAx: An Agentic Framework for AI-Driven Process Mining

Researchers have developed PMAx, an autonomous AI framework that democratizes process mining by allowing business users to analyze organizational workflows through natural language queries. The system uses a multi-agent architecture with local execution to ensure data privacy and mathematical accuracy while eliminating the need for specialized technical expertise.

AI · Neutral · arXiv – CS AI · Mar 17 · 6/10
🧠

Bridging Protocol and Production: Design Patterns for Deploying AI Agents with Model Context Protocol

Researchers identify three critical gaps in the Model Context Protocol (MCP) that prevent AI agents from operating safely at production scale, despite MCP having over 10,000 active servers and 97 million monthly SDK downloads. Drawing on enterprise deployment experience, the paper proposes three new mechanisms to supply what is missing: identity propagation, adaptive tool budgeting, and structured error semantics.
