#procedural-knowledge News & Analysis

12 articles tagged with #procedural-knowledge. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.


AI · Bullish · arXiv – CS AI · Jun 19 · 7/10

Multi-Agent Transactive Memory

Researchers propose Multi-Agent Transactive Memory (MATM), a framework enabling decentralized LLM agents to share and retrieve trajectories (recorded problem-solving paths) from a shared repository. Experiments in interactive environments demonstrate that agents retrieving stored trajectories improve task performance and efficiency without requiring coordination or joint training.

AI · Bullish · arXiv – CS AI · Jun 9 · 7/10

Anything2Skill: Compiling External Knowledge into Reusable Skills for Agents

Researchers introduce Anything2Skill, a framework that converts external knowledge sources into reusable, executable skills for AI agents. By combining skill extraction with retrieval-augmented generation, the system achieves 98.85% success on command-line tasks and 94.10% on GitHub operations, significantly outperforming RAG-only approaches.

AI · Bullish · arXiv – CS AI · Jun 2 · 7/10

Language-Native Materials Processing Design by Lightly Structured Text Database and Reasoning Large Language Model

Researchers have developed an AI framework that transforms materials synthesis procedures from unstructured narrative text into actionable, computable knowledge using large language models and structured databases. The system successfully optimized boron nitride nanosheet synthesis in three iterations, demonstrating AI's potential to accelerate complex materials discovery beyond traditional trial-and-error approaches.

AI · Bullish · arXiv – CS AI · May 29 · 7/10

GRASP: Gated Regression-Aware Skill Proposer for Self-Improving LLM Agents

Researchers introduce GRASP, a method for improving large language model agents through controlled skill library updates that prevent performance regression. Tested across five base models on clinical benchmarks, GRASP delivers large gains, including a rise from 40.6% to 88.8% on MedAgentBench, while maintaining stability and outperforming existing self-improvement approaches.

GPT-4 · GPT-5 · Gemini

AI · Bullish · arXiv – CS AI · May 12 · 7/10

MIND-Skill: Quality-Guaranteed Skill Generation via Multi-Agent Induction and Deduction

Researchers introduce MIND-Skill, an automated framework that generates reusable skills for LLM-powered AI agents by analyzing successful task trajectories. The system uses dual agents with quality-control mechanisms to create generalizable, documented procedures that enable autonomous systems to handle complex, multi-step problems without manual human expertise.

AI · Neutral · arXiv – CS AI · Jun 11 · 6/10

SkillJuror: Measuring How Agent Skill Organization Changes Runtime Behavior

Researchers introduce SkillJuror, a framework measuring how LLM agent skill organization affects runtime behavior independent of content. Testing Progressive Disclosure, a hierarchical skill structure, against flat baselines shows that agents access 3.26x more resources and achieve 4.1% higher verification rates, indicating that the way procedural knowledge is presented meaningfully influences agent reasoning patterns.

AI · Bullish · arXiv – CS AI · Jun 8 · 6/10

Workflow-to-Skill: Skill Creation via Routing-Workflow-Semantics-Attachments Decomposition

Researchers introduce W2S, a framework for automatically constructing high-quality skills for large language model agents by decomposing execution traces into workflow structures, semantics, and attachments. The approach outperforms traditional summarization methods by 10.5%, demonstrating that treating traces as executable specifications rather than text yields more reliable agent behavior.

AI · Bullish · arXiv – CS AI · Jun 2 · 6/10

SkillRevise: Improving LLM-Authored Agent Skills via Trace-Conditioned Skill Revision

Researchers introduce SkillRevise, a framework that automatically refines LLM agent skills through execution-grounded iteration, improving task success rates from 36% to 62% on benchmarks. The approach addresses the cold-start problem in agent development by diagnosing defects from execution traces and applying targeted repairs, while demonstrating strong cross-model transferability.

AI · Neutral · arXiv – CS AI · Jun 2 · 6/10

MMG2Skill: Can Agents Distill In-the-Wild Guides into Self-Evolving Skills?

Researchers introduce MMG2Skill, a framework that converts unstructured web guides into executable skills for AI agents, with a new benchmark for evaluation. The system improves agent performance by 12.8 to 25.3 percentage points across multiple domains by structuring knowledge, conditioning vision-language models on refined skills, and iteratively improving them from agent trajectories.

AI · Neutral · arXiv – CS AI · May 27 · 5/10

Managing Uncertainty in LLM-Generated Procedural Knowledge for Virtual Laboratory Planning

Researchers present a framework for managing uncertainty in language model-generated laboratory procedures for virtual educational environments. The system uses structured domain representations and LLM outputs to extract, validate, and repair procedural steps, addressing common LLM failures like missing actions, incorrect sequencing, and logical incompatibilities.

AI · Bullish · arXiv – CS AI · May 12 · 6/10

EmbodiSkill: Skill-Aware Reflection for Self-Evolving Embodied Agents

EmbodiSkill introduces a training-free framework enabling embodied AI agents to autonomously improve their skills through reflection on task execution trajectories. By distinguishing between skill deficiencies and execution lapses, the system allows frozen language models to achieve significantly higher task success rates, with a Qwen 3.5-27B model reaching 93.28% success on the ALFWorld benchmark.

GPT-5

AI · Neutral · arXiv – CS AI · Mar 16 · 6/10

SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks

SkillsBench introduces a new benchmark to evaluate Agent Skills, structured packages of procedural knowledge that enhance LLM agents. Testing across 86 tasks and 11 domains shows that curated Skills improve performance by 16.2 percentage points on average, while self-generated Skills provide no benefit.