🧠 AI · 🔴 Bearish · Importance 7/10 · Actionable

When Safe Skills Collide: Measuring Compositional Risk in Agent Skill Ecosystems

arXiv – CS AI|Su Wang, Pin Qian, Yihang Chen, Junxian You, Xiaoyuan Wang, Xiaochong Jiang, Lifei Liu, Haoran Yu, Jingzhou Xu|June 2, 2026 at 04:00 AM

🤖 AI Summary

Researchers present SkillReact, a framework for measuring compositional safety risks in LLM agent skill ecosystems. They find that 18.2% of individually safe skill pairs create genuine safety vulnerabilities when combined, risks that per-skill scanning alone misses. Testing on 211,575 skill pairs from ClawHub reveals model-dependent execution risk: smaller models such as Haiku are more likely to execute unsafe tool chains than larger models such as Sonnet.
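The pair count above equals the number of unordered pairs among 651 items. This suggests, though the summary does not state it, that every pair in a 651-skill set was considered. A quick check, where `n_skills = 651` is an inferred assumption and the tiny registry is purely hypothetical:

```python
from itertools import combinations
from math import comb

# Assumption: 651 is inferred from the pair count, not reported in the summary.
n_skills = 651
n_pairs = comb(n_skills, 2)  # unordered pairs: n * (n - 1) / 2
print(n_pairs)  # 211575

# The same enumeration on a tiny, hypothetical registry of skill names.
tiny_registry = ["read_file", "send_email", "run_shell"]
pairs = list(combinations(tiny_registry, 2))
print(len(pairs))  # 3
```

Because the number of pairs grows quadratically with registry size, exhaustive pairwise review gets expensive quickly, which is one reason automated screening matters here.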

Analysis

The research addresses a critical blind spot in AI safety: individual component safety doesn't guarantee system safety when components interact. While security auditing has long focused on isolated modules, LLM agents operate as compositional systems where skill combinations create emergent behaviors. This study's finding that roughly 14,000 genuine risk memberships exist in a single registry despite per-skill scanning represents a substantial undetected vulnerability class.

This work builds on growing concerns about agent autonomy and tool-use safety as AI systems gain broader capabilities. The field has primarily emphasized individual guardrails, but compositional vulnerabilities reveal structural limitations in current approaches. The SkillReact framework's three-component methodology (static analysis, human-adjudicated validation, and dynamic harness testing) provides a replicable measurement approach that other registries could adopt.

The findings carry implications for AI deployment practices. The variation across host models (Haiku executing full chains, Opus stopping partway, Sonnet refusing) shows that system safety depends on host-model design choices, not just on the installed components. This creates a coordination problem: skill developers, registry maintainers, and model providers each control different safety levers, and their incentives are not necessarily aligned.
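The three behaviors described above (full execution, stopping partway, refusing) can be made concrete as a labeling step over tool-call traces. The sketch below is illustrative only: the `Outcome` enum, `classify_run`, the chain, and the traces are all hypothetical and are not taken from the SkillReact harness.

```python
from enum import Enum


class Outcome(Enum):
    REFUSED = "refused"   # no step of the unsafe chain was executed
    PARTIAL = "partial"   # the chain was started but not completed
    FULL = "full"         # every step of the chain was executed in order


def classify_run(planned_chain, executed_calls):
    """Label a run by how far the observed calls follow the planned chain, in order."""
    if not planned_chain:
        raise ValueError("planned_chain must contain at least one step")
    matched = 0
    for call in executed_calls:
        # Advance only when the next expected step of the chain is observed.
        if matched < len(planned_chain) and call == planned_chain[matched]:
            matched += 1
    if matched == 0:
        return Outcome.REFUSED
    if matched < len(planned_chain):
        return Outcome.PARTIAL
    return Outcome.FULL


# Hypothetical unsafe chain and hypothetical traces from three unnamed models.
chain = ["read_secrets", "http_post"]
traces = {
    "model_a": ["read_secrets", "http_post"],
    "model_b": ["read_secrets"],
    "model_c": [],
}
labels = {name: classify_run(chain, calls) for name, calls in traces.items()}
```

Under these assumptions, `labels` maps `model_a` to `Outcome.FULL`, `model_b` to `Outcome.PARTIAL`, and `model_c` to `Outcome.REFUSED`.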

Developers and organizations deploying agent systems should be prepared for similar compositional risk profiles in other skill ecosystems, although the study measured a single registry. The research suggests that install-time composition checks and capability isolation should be treated as critical infrastructure, not optional hardening. As agent systems proliferate in production environments, compositional risk assessment will likely become a regulatory and operational requirement alongside traditional security auditing.
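One way to picture an install-time composition check is sketched below. It assumes each skill declares a set of capability tags and that the registry keeps a list of capability combinations it considers risky. The `Skill` class, the tag names, and `RISKY_COMBINATIONS` are hypothetical illustrations, not the paper's method or any registry's actual API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Skill:
    name: str
    capabilities: frozenset


# Hypothetical policy: capability combinations that warrant review when co-installed.
RISKY_COMBINATIONS = [
    frozenset({"read_local_files", "network_egress"}),
    frozenset({"read_credentials", "run_shell"}),
]


def composition_findings(installed, candidate):
    """Return (candidate, other, combo) tuples where a risky combination
    is reachable only through the pair, not through either skill alone."""
    findings = []
    for other in installed:
        combined = candidate.capabilities | other.capabilities
        for combo in RISKY_COMBINATIONS:
            covered_by_pair = combo <= combined
            covered_alone = combo <= candidate.capabilities or combo <= other.capabilities
            # A per-skill scan would already see covered_alone cases;
            # the compositional case is the one it misses.
            if covered_by_pair and not covered_alone:
                findings.append((candidate.name, other.name, tuple(sorted(combo))))
    return findings


installed = [Skill("notes_reader", frozenset({"read_local_files"}))]
candidate = Skill("webhook_sender", frozenset({"network_egress"}))
findings = composition_findings(installed, candidate)
```

Here `findings` holds one entry pairing `webhook_sender` with `notes_reader`, even though each skill passes a per-skill check on its own. A static check like this only establishes what is reachable. Whether a given host model actually carries out the chain would still need dynamic testing.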

Key Takeaways

→ 18.2% of individually safe skill pairs create real compositional safety risks, with roughly 14,000 genuine risk memberships in one registry that per-skill scanning did not flag
→ Host-model size and design strongly influence whether unsafe skill combinations execute, with smaller models more likely to carry out the unsafe chain
→ Per-skill scanning misses compositional vulnerabilities by construction, so new install-time validation frameworks are needed
→ The combined capabilities of installed skills determine what is reachable, while the host model's disposition determines whether the tool chain actually executes
→ Compositional safety requires coordination across skill developers, registries, and model providers, whose incentives are currently misaligned

#agent-safety #llm-security #compositional-risk #tool-use #ai-systems #vulnerability-detection #skill-ecosystems #model-behavior

Read Original → via arXiv – CS AI
