#agent-skills News & Analysis

14 articles tagged with #agent-skills. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · Jun 11 · 7/10

Runtime Skill Audit: Targeted Runtime Probing for Agent Skill Security

Researchers introduced Runtime Skill Audit (RSA), a dynamic analysis method that detects malicious behavior in LLM agent skills by testing them under targeted runtime conditions rather than relying on static code review. RSA achieved 90% accuracy in identifying harmful skills and maintained effectiveness against evolving attacks where static methods failed, addressing a critical security gap in agent-based AI systems.

AI · Bullish · arXiv – CS AI · Jun 4 · 7/10

AIP: A Graph Representation for Learning and Governing Agent Skills

Researchers introduce the Agent Instruction Protocol (AIP), a graph-based framework that structures AI agent skills as executable directed graphs instead of free-form prose. Testing on real agent tasks shows significant performance improvements, with Claude Sonnet's task completion rate increasing from 53% to 67%, while enabling more precise skill debugging and improvement through schema validation and node-level diagnostics.

🧠 Claude
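As a rough illustration of what schema validation with node-level diagnostics could look like for a skill expressed as a directed graph, here is a minimal sketch. The graph layout, the `entry` and `next` fields, the node names, and the `validate_skill` helper are all hypothetical and are not taken from the AIP paper.

```python
# Hypothetical skill graph: the field names ("entry", "nodes", "next") and the
# node names are illustrative assumptions, not the AIP schema from the paper.
skill = {
    "entry": "read_request",
    "nodes": {
        "read_request": {"next": ["run_query"]},
        "run_query": {"next": ["summarize", "report_error"]},
        "summarize": {"next": []},
        "report_error": {"next": []},
    },
}


def validate_skill(graph: dict) -> list[str]:
    """Return node-level problems; an empty list means the graph passed."""
    nodes = graph.get("nodes", {})
    entry = graph.get("entry")
    if entry not in nodes:
        return [f"entry node {entry!r} is not defined"]

    problems = []
    # Every edge must point at a node that exists.
    for name, node in nodes.items():
        for target in node.get("next", []):
            if target not in nodes:
                problems.append(f"{name}: edge to undefined node {target!r}")

    # Depth-first walk from the entry node to find unreachable nodes.
    seen, stack = set(), [entry]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(t for t in nodes[current].get("next", []) if t in nodes)

    for name in nodes:
        if name not in seen:
            problems.append(f"{name}: unreachable from entry")
    return problems


problems = validate_skill(skill)
```

Because each message names the offending node, a failed check points at one step of the skill rather than at the skill as a whole, which is the kind of diagnostic free-form prose cannot easily provide.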

AI · Bullish · arXiv – CS AI · Jun 2 · 7/10

ANDES: Agent Native Data Evolving Synthesis Tool for Autonomous Instruction Alignment

Researchers introduce ANDES, a framework that enables AI agents to autonomously generate high-quality training data for LLM alignment by abstracting complex data-gathering tasks into a manageable agent skill. The system uses a self-evolving World Tree routing mechanism to help agents navigate noisy web environments and achieve state-of-the-art performance on alignment benchmarks despite computational constraints.

AI · Neutral · arXiv – CS AI · Jun 2 · 7/10

ClawHub Security Signals: When VirusTotal, Static Analysis, and SkillSpector Disagree

Researchers released ClawHub Security Signals, a dataset of 67,453 AI agent skills analyzed by three security scanners, revealing significant disagreement among detection methods. Only 0.69% of skills were flagged by all three scanners, indicating that single-scanner verdicts are insufficient for securing AI agent ecosystems and that layered security governance is needed instead.

🏢 Nvidia
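To make the scanner-disagreement figure concrete, the sketch below tallies how many of three scanners flagged each skill and converts the reported 0.69% into an approximate skill count. The scanner keys, the `sample` verdicts, and the helper names are hypothetical and do not come from the dataset; only the 67,453 total and the 0.69% share are from the summary above.

```python
from collections import Counter

# Hypothetical scanner keys; True means that scanner flagged the skill.
SCANNERS = ("virustotal", "static_analysis", "skillspector")


def agreement_level(verdicts: dict[str, bool]) -> int:
    """Return how many of the three scanners flagged the skill (0 to 3)."""
    return sum(bool(verdicts[name]) for name in SCANNERS)


def summarize(skills: list[dict[str, bool]]) -> Counter:
    """Count skills by the number of scanners that flagged them."""
    return Counter(agreement_level(v) for v in skills)


# Made-up verdicts for four skills, purely for illustration.
sample = [
    {"virustotal": True, "static_analysis": True, "skillspector": True},
    {"virustotal": False, "static_analysis": True, "skillspector": False},
    {"virustotal": False, "static_analysis": False, "skillspector": True},
    {"virustotal": False, "static_analysis": False, "skillspector": False},
]
counts = summarize(sample)

# The reported 0.69% of 67,453 skills is roughly 465 skills
# (approximate, because the percentage is rounded).
unanimous_in_dataset = round(0.0069 * 67_453)
```

A layered policy would treat the 1-of-3 and 2-of-3 buckets as candidates for further review rather than trusting any single scanner's verdict.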

AI · Bearish · arXiv – CS AI · May 28 · 7/10

Technical Report: Exploring the Emerging Threats of the Agent Skill Ecosystem

Researchers identified 76 confirmed malicious AI agent skills across major marketplaces, with 13.4% of 3,984 analyzed skills containing critical security vulnerabilities. The findings highlight urgent risks as AI agents gain access to sensitive credentials and systems, with malicious payloads still publicly available on platforms like clawhub.ai.

AI · Bullish · arXiv – CS AI · Apr 20 · 7/10

Bilevel Optimization of Agent Skills via Monte Carlo Tree Search

Researchers propose a bilevel optimization framework using Monte Carlo Tree Search to systematically improve LLM agent skills—structured collections of instructions, tools, and resources. The framework optimizes both skill structure and component content simultaneously, demonstrating performance improvements on Operations Research tasks and addressing a previously unsolved challenge in agent design optimization.

AI · Bearish · arXiv – CS AI · Apr 6 · 7/10

Towards Secure Agent Skills: Architecture, Threat Taxonomy, and Security Analysis

Researchers conducted the first comprehensive security analysis of Agent Skills, an emerging standard for LLM-based agents to acquire domain expertise. The study identified significant structural vulnerabilities across the framework's lifecycle, including lack of data-instruction boundaries and insufficient security review processes.

AI × Crypto · Bullish · Blockonomi · Apr 4 · 7/10

Solana Foundation Launches Agent Skills to Connect AI Tools With On-Chain Operations

Solana Foundation launched Agent Skills, a developer toolkit that enables one-line integration of AI tools with blockchain operations. The platform features over 60 community-built skills with prebuilt security and compatibility components, supporting DeFi, payments, and infrastructure functions across major platforms like JupiterExchange and Raydium.

$SOL

AI · Neutral · arXiv – CS AI · Jun 23 · 6/10

Harnessing Agent Skills: Architectural Patterns and a Reference Architecture for Skill-Mediated LLM Agents

Researchers present a formal architectural framework for managing LLM agent skills—reusable behavioral components that agents dynamically select and execute. The paper catalogs ten architectural patterns organized into four responsibility layers (Supply Chain, Mediation, Execution Control, Evidence & Feedback) and provides a reference architecture validated across eight systems, establishing a standardized approach for skill governance in agent-based AI applications.

AI · Neutral · arXiv – CS AI · May 4 · 6/10

Semia: Auditing Agent Skills via Constraint-Guided Representation Synthesis

Semia is a static auditor for LLM-driven agent skills that uses constraint-guided synthesis to analyze security risks in hybrid code-and-prose configurations. Testing 13,728 real-world skills from public marketplaces, Semia identified critical semantic vulnerabilities in over half and achieved 97.7% recall, significantly outperforming existing security tools.

AI · Neutral · arXiv – CS AI · May 4 · 6/10

Skills as Verifiable Artifacts: A Trust Schema and a Biconditional Correctness Criterion for Human-in-the-Loop Agent Runtimes

Researchers propose a trust framework for AI agent skills—reusable code packages that extend language models—treating them as untrusted by default until verified. The approach introduces verification levels, capability gates, and correctness criteria to enable sustainable human-in-the-loop oversight without operational bottlenecks.

AI · Neutral · arXiv – CS AI · Mar 16 · 6/10

SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks

SkillsBench is a new benchmark for evaluating Agent Skills, structured packages of procedural knowledge that enhance LLM agents. Across 86 tasks in 11 domains, curated Skills improve performance by 16.2 percentage points on average, while self-generated Skills provide no benefit.

AI · Bullish · Hugging Face Blog · Mar 6 · 6/10

Conversational LLM Evaluations in Minutes with NVIDIA NeMo Evaluator Agent Skills

NVIDIA has released NeMo Evaluator Agent Skills, a tool for evaluating conversational large language models in minutes. It streamlines testing and validation for LLM applications, potentially accelerating AI development workflows.

🏢 Nvidia

AI · Neutral · arXiv – CS AI · Mar 3 · 7/106

Formal Analysis and Supply Chain Security for Agentic AI Skills

Researchers developed SkillFortify, the first formal analysis framework for securing AI agent skill supply chains, addressing critical vulnerabilities exposed by attacks such as ClawHavoc, which involved over 1,200 malicious skills. The framework achieved a 96.95% F1 score with 100% precision (zero false positives) in detecting malicious AI agent skills.
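The reported F1 score and precision together pin down the recall, which the summary does not state. The sketch below solves the standard definition F1 = 2PR / (P + R) for R; the `recall_from_f1` helper is an illustrative name, and the resulting recall is derived from the two reported figures rather than quoted from the paper.

```python
def recall_from_f1(f1: float, precision: float) -> float:
    """Solve F1 = 2PR / (P + R) for recall R, given F1 and precision P."""
    denominator = 2 * precision - f1
    if denominator <= 0:
        raise ValueError("no valid recall exists for these F1 and precision values")
    return f1 * precision / denominator


# Reported figures: F1 = 96.95%, precision = 100%.
implied_recall = recall_from_f1(0.9695, 1.0)
print(f"implied recall: {implied_recall:.1%}")
```

Under these figures the implied recall is about 94%, meaning that although nothing benign was flagged, roughly 6% of malicious skills would have gone undetected.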