y0news

#failure-analysis News & Analysis

6 articles tagged with #failure-analysis. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Neutral · arXiv – CS AI · 1d ago · 7/10

The Long-Horizon Task Mirage? Diagnosing Where and Why Agentic Systems Break

Researchers introduce HORIZON, a diagnostic benchmark for identifying and analyzing why large language model agents fail at long-horizon tasks requiring extended action sequences. By evaluating state-of-the-art models across multiple domains and proposing an LLM-as-a-Judge attribution pipeline, the study provides a systematic methodology for understanding agent limitations and improving reliability.

GPT-5 · Claude
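The attribution step in the summary above can be sketched roughly as follows. This is a hypothetical illustration only: the failure-mode taxonomy, function names, and mock judge are invented for this sketch and are not taken from the HORIZON paper, which is not detailed in the summary.

```python
# Hypothetical sketch of an LLM-as-a-Judge failure-attribution pass.
# All names and the taxonomy below are invented for illustration.

FAILURE_MODES = ["planning_error", "tool_misuse", "state_tracking_loss", "goal_drift"]

def build_judge_prompt(task: str, trajectory: list[str]) -> str:
    """Format a long-horizon agent trajectory into a single attribution query."""
    steps = "\n".join(f"Step {i + 1}: {s}" for i, s in enumerate(trajectory))
    return (
        f"Task: {task}\n{steps}\n"
        f"Which failure mode best explains the breakdown? "
        f"Answer with exactly one of: {', '.join(FAILURE_MODES)}."
    )

def attribute_failure(judge, task: str, trajectory: list[str]) -> str:
    """Ask the judge model for a label, then validate it against the taxonomy."""
    label = judge(build_judge_prompt(task, trajectory)).strip().lower()
    return label if label in FAILURE_MODES else "unattributed"

# Stand-in judge for illustration; a real pipeline would call an LLM API here.
mock_judge = lambda prompt: "state_tracking_loss"
print(attribute_failure(mock_judge, "Book a 3-leg trip",
                        ["search flights", "forget chosen dates", "rebook leg 1"]))
# → state_tracking_loss
```

Constraining the judge to a fixed label set, and falling back to "unattributed" on anything else, is what makes the attribution quantifiable: labels can be counted and compared across models and domains.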
AI · Bullish · arXiv – CS AI · 6d ago · 6/10

KITE: Keyframe-Indexed Tokenized Evidence for VLM-Based Robot Failure Analysis

KITE is a training-free system that converts long robot execution videos into compact, interpretable tokens for vision-language models to analyze robot failures. The approach combines keyframe extraction, open-vocabulary detection, and bird's-eye-view spatial representations to enable failure detection, identification, localization, and correction without requiring model fine-tuning.
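The keyframe-extraction stage mentioned above can be sketched with a simple change-detection rule: keep a frame only when it differs enough from the last kept frame. This is a minimal sketch under that assumption; KITE's actual selection criterion, features, and thresholds are not specified in the summary.

```python
# Minimal keyframe-selection sketch (illustrative, not KITE's actual method).
# `frames` is a list of equal-length feature vectors, one per video frame.

def select_keyframes(frames, threshold=0.3):
    """Return indices of frames whose mean absolute difference from the
    most recently kept frame exceeds `threshold`."""
    def dist(a, b):
        return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

    if not frames:
        return []
    kept = [0]  # always keep the first frame as the initial reference
    for i in range(1, len(frames)):
        if dist(frames[i], frames[kept[-1]]) > threshold:
            kept.append(i)
    return kept

frames = [[0.0, 0.0], [0.05, 0.0], [1.0, 1.0], [1.0, 1.1]]
print(select_keyframes(frames))  # → [0, 2]
```

The payoff of this kind of filtering is token budget: a vision-language model sees a handful of informative frames instead of an entire execution video.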

AI · Neutral · arXiv – CS AI · Mar 9 · 6/10

Probing Visual Concepts in Lightweight Vision-Language Models for Automated Driving

Researchers analyzed Vision-Language Models (VLMs) used in automated driving to understand why they fail on simple visual tasks. They identified two failure modes: perceptual failure, where the visual information isn't encoded at all, and cognitive failure, where the information is present but not properly aligned with language semantics.
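The distinction between the two failure modes suggests a simple decision rule: if a probe on the visual features can't recover the concept, the failure is perceptual; if the probe succeeds but the model's language output still fails, it is cognitive. The sketch below captures that rule only; the thresholds and function name are invented, and the paper's actual probing methodology is not given in the summary.

```python
# Hypothetical decision rule separating perceptual from cognitive failures.
# probe_accuracy: accuracy of a classifier trained on the VLM's visual features.
# vlm_accuracy: accuracy of the full model's language-side answers.
# The 0.05 margin above chance is an arbitrary choice for this sketch.

def diagnose(probe_accuracy, vlm_accuracy, chance=0.5, margin=0.05):
    if probe_accuracy <= chance + margin:
        return "perceptual failure"  # concept not encoded in visual features
    if vlm_accuracy <= chance + margin:
        return "cognitive failure"   # encoded, but not surfaced in language
    return "no failure"

print(diagnose(probe_accuracy=0.52, vlm_accuracy=0.51))  # → perceptual failure
print(diagnose(probe_accuracy=0.91, vlm_accuracy=0.53))  # → cognitive failure
```

The useful property of this framing is that the two failure modes call for different fixes: perceptual failures point at the vision encoder, cognitive failures at the vision-language alignment.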

AI · Neutral · arXiv – CS AI · Mar 2 · 7/10

Demystifying the Lifecycle of Failures in Platform-Orchestrated Agentic Workflows

Researchers present AgentFail, a dataset of 307 real-world failure cases from agentic workflow platforms, analyzing how multi-agent AI systems fail and how they can be repaired. The study reveals that failures in these low-code, platform-orchestrated AI workflows propagate differently than in traditional software, making them harder to diagnose and fix.

AI · Bullish · Synced Review · Jun 16 · 6/10

Researchers from PSU and Duke introduce “Multi-Agent Systems Automated Failure Attribution”

Researchers from Pennsylvania State University and Duke University have introduced automated failure attribution for multi-agent systems, a methodology that transforms the complex process of identifying system failures and their causes into a quantifiable and analyzable problem. This development could significantly improve the debugging and accountability processes in multi-agent AI system development.