#agent-planning News & Analysis

4 articles tagged with #agent-planning. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Neutral · arXiv – CS AI · Jun 5 · 6/10

From Risk Classification to Action Plan Remediation: A Guardrail Feedback Driven Framework for LLM Agents

Researchers introduce TRIAD, a guardrail framework for LLM agents that uses iterative feedback to guide safer behavior rather than simply blocking risky tasks. The guardrail returns one of three verdicts (proceed, refuse, or update with structured guidance), which reduces the attack success rate to 10.42% while preserving utility on benign tasks.

AI · Neutral · arXiv – CS AI · Jun 1 · 6/10

PatchWorld: Gradient-Free Optimization of Executable World Models

Researchers introduce PatchWorld, a gradient-free framework that converts offline trajectories into executable Python world models for AI agents operating in partially observable environments. The method achieves 76.4% success on planning tasks without requiring LLM calls during prediction, while revealing a fundamental tradeoff between observation accuracy and decision-making utility in executable world models.

AI · Neutral · arXiv – CS AI · May 28 · 6/10

Do Agents Think Deeper? A Mechanistic Investigation of Layer-Wise Dynamics in Sequential Planning

Researchers conducted a mechanistic analysis of how large language models allocate computational depth when operating as autonomous agents performing multi-turn planning and tool use. The study reveals that agents progressively recruit deeper layers as task complexity increases, contrasting with prior findings that LLMs underutilize depth in single-turn tasks, suggesting adaptive depth allocation emerges in sequential reasoning scenarios.

AI · Neutral · arXiv – CS AI · May 7 · 6/10

CreativityBench: Evaluating Agent Creative Reasoning via Affordance-Based Tool Repurposing

Researchers introduce CreativityBench, a benchmark with 4K entities and 150K+ affordance annotations to evaluate how well large language models can creatively repurpose tools by reasoning about their properties rather than canonical uses. Evaluations across 10 state-of-the-art LLMs reveal significant limitations: models struggle to identify correct parts, affordances, and physical mechanisms needed for non-obvious solutions, with performance gains from scaling and reasoning strategies like Chain-of-Thought proving limited.