SkillSieve: A Hierarchical Triage Framework for Detecting Malicious AI Agent Skills
Researchers introduced SkillSieve, a three-layer detection framework that identifies malicious AI agent skills in OpenClaw's ClawHub marketplace, where 13-26% of over 13,000 skills contain security vulnerabilities. The system combines regex/AST scanning, LLM-based analysis with parallel sub-tasks, and multi-LLM voting to achieve 0.800 F1 score at $0.006 per skill, significantly outperforming existing detection methods.
SkillSieve addresses a critical security gap in AI agent marketplaces where traditional static analysis tools fail to detect sophisticated attacks hidden in natural language documentation. OpenClaw's ClawHub hosts thousands of community-contributed skills, but current security measures miss obfuscated payloads and prompt injection vectors embedded in instructional files. This vulnerability affects developers who incorporate these skills into production systems and end-users whose data may be exposed through compromised agents.
The framework's hierarchical approach is a practical answer to the detection problem. Lightweight regex and AST checks feed an XGBoost score that filters benign skills early, letting SkillSieve clear roughly 86% of non-malicious submissions in under 40ms without any API calls. Suspicious items advance to specialized LLM analysis along four distinct vectors (intent alignment, permission justification, behavior analysis, and consistency checks) rather than a single broad evaluation that can miss nuanced attacks. A final jury of independently voting LLMs resolves disagreements through debate protocols, reducing false positives.
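The layer-1 triage step can be sketched as follows. This is a minimal illustration, not SkillSieve's implementation: the signal names, patterns, weights, and threshold are invented, and a fixed weighted sum stands in for the XGBoost classifier the system actually trains over its regex/AST features.

```python
import re

# Hypothetical layer-1 triage: cheap regex-style signals are scored, and only
# suspicious skills escalate to the expensive LLM layers. SkillSieve scores
# such features with XGBoost; a fixed weighted sum stands in for it here.
SIGNALS = {
    "shell_exec": (re.compile(r"subprocess|os\.system|eval\("), 0.5),
    "network_io": (re.compile(r"requests\.|urllib|socket\."), 0.3),
    "obfuscation": (re.compile(r"base64|exec\(|chr\("), 0.4),
}

def triage_score(skill_text: str) -> float:
    """Sum the weights of every signal whose pattern fires on the skill text."""
    return sum(w for pat, w in SIGNALS.values() if pat.search(skill_text))

def route(skill_text: str, threshold: float = 0.5) -> str:
    """Fast-path clearly benign skills; escalate the rest to LLM analysis."""
    return "benign" if triage_score(skill_text) < threshold else "escalate"
```

Because the benign fast path never touches an LLM, the bulk of marketplace submissions can be cleared at regex speed, which is what keeps the reported per-skill cost near zero.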
For the AI infrastructure ecosystem, this work validates that hybrid detection combining symbolic and neural approaches outperforms either method alone. The 0.800 F1 score versus ClawVet's 0.421 demonstrates meaningful improvement in real-world conditions. Developers integrating third-party skills face pressure to implement enhanced vetting, while marketplace operators like OpenClaw may adopt such frameworks to improve platform trust. The open-sourced methodology enables broader adoption across other agent ecosystems.
- SkillSieve achieves nearly 2x the malicious-skill detection of existing tools (F1: 0.800 vs. 0.421) at roughly $0.006 per scan
- The three-layer framework efficiently processes 13,000+ real marketplace skills while keeping false positive rates low
- Hierarchical triage reduces computational overhead by filtering benign code early, before expensive LLM analysis
- Multi-LLM voting with debate protocols improves detection accuracy on adversarial evasion samples
- Open-sourced code and benchmark data enable industry-wide adoption for securing AI agent marketplaces
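The multi-LLM jury mentioned above might be structured roughly like this. The verdict labels, the idea of a single debate round, and the `debate_round` callback are assumptions for illustration, not the paper's exact protocol.

```python
from collections import Counter

# Hypothetical layer-3 jury: independent per-model verdicts are combined by
# majority vote; on disagreement, an optional debate round (supplied as a
# callback that re-polls models after they see each other's reasoning)
# runs before the final tally.
def jury_verdict(verdicts: dict[str, str], debate_round=None) -> str:
    """Combine per-model verdicts ('malicious' / 'benign') into one call."""
    tally = Counter(verdicts.values())
    if len(tally) == 1:  # unanimous panel: no debate needed
        return next(iter(tally))
    if debate_round is not None:  # dissent triggers one debate round
        verdicts = debate_round(verdicts)
        tally = Counter(verdicts.values())
    return tally.most_common(1)[0][0]
```

Running the debate only on split panels keeps the extra API calls confined to the genuinely ambiguous cases, which is where adversarial evasion samples tend to land.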