y0news
AnalyticsDigestsSourcesTopicsRSSAICrypto

#agent-reliability News & Analysis

5 articles tagged with #agent-reliability. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

5 articles
AINeutralarXiv โ€“ CS AI ยท 1d ago7/10
๐Ÿง 

The Long-Horizon Task Mirage? Diagnosing Where and Why Agentic Systems Break

Researchers introduce HORIZON, a diagnostic benchmark for identifying and analyzing why large language model agents fail at long-horizon tasks requiring extended action sequences. By evaluating state-of-the-art models across multiple domains and proposing an LLM-as-a-Judge attribution pipeline, the study provides systematic methodology for understanding agent limitations and improving reliability.

๐Ÿง  GPT-5๐Ÿง  Claude
AIBullisharXiv โ€“ CS AI ยท 2d ago7/10
๐Ÿง 

Disambiguation-Centric Finetuning Makes Enterprise Tool-Calling LLMs More Realistic and Less Risky

Researchers introduce DiaFORGE, a three-stage framework for training LLMs to reliably invoke enterprise APIs by focusing on disambiguation between similar tools and underspecified arguments. Fine-tuned models achieved 27-49 percentage points higher tool-invocation success than GPT-4o and Claude-3.5-Sonnet, with an open corpus of 5,000 production-grade API specifications released for further research.

๐Ÿง  GPT-4๐Ÿง  Claude
AIBullisharXiv โ€“ CS AI ยท Mar 56/10
๐Ÿง 

A Dual-Helix Governance Approach Towards Reliable Agentic AI for WebGIS Development

Researchers propose a dual-helix governance framework to address AI agent reliability issues in WebGIS development, implementing a 3-track architecture that achieved 51% reduction in code complexity. The framework uses knowledge graphs and self-learning cycles to overcome LLM limitations like context constraints and instruction failures.

AINeutralarXiv โ€“ CS AI ยท 2d ago6/10
๐Ÿง 

ClawVM: Harness-Managed Virtual Memory for Stateful Tool-Using LLM Agents

ClawVM is a virtual memory management system designed for stateful LLM agents that addresses critical failures in current context window management. The system implements typed pages, multi-resolution representations, and validated writeback protocols to ensure deterministic state residency and durability, adding minimal computational overhead.

AIBullisharXiv โ€“ CS AI ยท Mar 27/1011
๐Ÿง 

Foundation World Models for Agents that Learn, Verify, and Adapt Reliably Beyond Static Environments

Researchers propose a new framework for foundation world models that enables autonomous agents to learn, verify, and adapt reliably in dynamic environments. The approach combines reinforcement learning with formal verification and adaptive abstraction to create agents that can synthesize verifiable programs and maintain correctness while adapting to novel conditions.