#trajectory-analysis News & Analysis

12 articles tagged with #trajectory-analysis. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Neutral · arXiv – CS AI · May 27 · 7/10

Beyond Final Answers: Auditing Trajectory-Level Hallucinations in Multi-Agent Industrial Workflows

Researchers introduce Trajel, a dataset and evaluation framework for detecting hallucinations in multi-step LLM agent workflows, revealing that existing benchmarks miss intermediate failures. The framework defines five hallucination types and shows that trajectory-level detection outperforms traditional post-hoc verification, highlighting critical gaps in current AI safety evaluation methodologies.

AI · Bullish · arXiv – CS AI · May 12 · 7/10

Evidence Over Plans: Online Trajectory Verification for Skill Distillation

Researchers introduce SPARK, a framework that verifies AI agent skills through direct environment interaction rather than relying on pre-written plans. The Posterior Distillation Index (PDI) metric ensures skills are grounded in actual task evidence, producing student models that match or exceed human-written skills while reducing inference costs by up to 1,000x.

AI · Neutral · arXiv – CS AI · Mar 9 · 7/10

From Features to Actions: Explainability in Traditional and Agentic AI Systems

Researchers demonstrate that traditional explainable AI methods designed for static predictions fail when applied to agentic AI systems that make sequential decisions over time. The study shows attribution-based explanations work well for static tasks but trace-based diagnostics are needed to understand failures in multi-step AI agent behaviors.

AI · Neutral · arXiv – CS AI · Jun 19 · 6/10

Automating SKILL.md Generation for Computer-Using Agents via Interaction Trajectory Mining

Researchers developed a three-stage pipeline to automatically extract skill libraries from computer-using agent interaction data, achieving high readability (95% purity on labeled benchmarks) but failing to improve downstream policy performance across domains. The study reveals that while trajectory mining can expose interpretable skill structure, current technical limitations prevent reliable cross-domain transfer improvements.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

Think Before You Act: Intention-Guided Reasoning for LLM-Based Location Prediction

Researchers propose IntentPOI, a two-stage AI framework that improves next-location prediction by inferring user intentions before selecting specific points of interest. The method outperforms existing approaches by decoupling intention reasoning from location selection, addressing limitations in current LLM-based prediction systems.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

Projecting the Emerging Mindset of SWE Agent by Launching a Wild Code Understanding Journey

Researchers introduce Ada, a systematic framework for observing how software engineering agents navigate real codebases through tool-mediated exploration. By analyzing 408 trajectories across multiple models and repositories, the study develops observation methods that reveal agent decision-making patterns—including navigation choices, evidence selection, and stopping criteria—without reducing behavior to raw metrics or speculation.

AI · Neutral · arXiv – CS AI · Jun 2 · 6/10

Where Do Deep-Research Agents Go Wrong? Span-Level Error Localization in Agent Trajectories

Researchers introduce TELBench, a benchmark for identifying errors in deep-research AI agent trajectories, and propose DRIFT, a claim-centric auditing framework that improves error localization accuracy by up to 30 percentage points. The work addresses a critical gap in AI evaluation by moving beyond final-answer assessment to analyze intermediate steps in agent reasoning.

AI · Neutral · arXiv – CS AI · May 27 · 6/10

AgentAtlas: Beyond Outcome Leaderboards for LLM Agents

AgentAtlas introduces a comprehensive diagnostic framework for evaluating LLM agents beyond simple success/failure metrics, proposing a six-state control-decision taxonomy and trajectory-failure vocabulary to expose behavioral patterns hidden by outcome-only leaderboards. The research demonstrates that evaluation methodology significantly impacts apparent model performance rankings.

AI · Neutral · arXiv – CS AI · May 11 · 5/10

Online Goal Recognition using Path Signature and Dynamic Time Warping

Researchers introduce a novel online goal recognition method using path signatures and dynamic time warping to efficiently encode and compare continuous trajectory data. The approach demonstrates superior predictive accuracy and planning efficiency compared to existing state-of-the-art methods while maintaining competitive offline performance.

AI · Bullish · arXiv – CS AI · Mar 12 · 6/10

Trajectory-Informed Memory Generation for Self-Improving Agent Systems

Researchers introduce a new framework for AI agent systems that automatically extracts learnings from execution trajectories to improve future performance. The system uses four components including trajectory analysis and contextual memory retrieval, achieving up to 14.3 percentage point improvements in task completion on benchmarks.

AI · Bullish · arXiv – CS AI · Mar 2 · 6/10 · 17

VISTA: Knowledge-Driven Vessel Trajectory Imputation with Repair Provenance

Researchers introduce VISTA, a framework for vessel trajectory imputation that uses knowledge-driven LLM reasoning to repair incomplete maritime tracking data. The system provides "repair provenance" (documented reasoning behind data repairs) and achieves 5-91% accuracy improvements over existing methods while reducing inference time by 51-93%.

AI · Neutral · arXiv – CS AI · Mar 3 · 5/10 · 7

SIGMAS: Second-Order Interaction-based Grouping for Overlapping Multi-Agent Swarms

Researchers introduce SIGMAS, a self-supervised AI framework for identifying group structures in multi-agent swarms like drone fleets without ground-truth supervision. The system uses second-order interactions to infer latent group memberships from agent trajectories, demonstrating robust performance across diverse synthetic swarm scenarios.