SkillSieve: A Hierarchical Triage Framework for Detecting Malicious AI Agent Skills
Researchers introduced SkillSieve, a three-layer detection framework that identifies malicious AI agent skills in OpenClaw's ClawHub marketplace, where 13-26% of over 13,000 skills contain security vulnerabilities. The system combines regex/AST scanning, LLM-based analysis with parallel sub-tasks, and multi-LLM voting to achieve 0.800 F1 score at $0.006 per skill, significantly outperforming existing detection methods.
SkillSieve addresses a critical security gap in AI agent marketplaces where traditional static analysis tools fail to detect sophisticated attacks hidden in natural language documentation. OpenClaw's ClawHub hosts thousands of community-contributed skills, but current security measures miss obfuscated payloads and prompt injection vectors embedded in instructional files. This vulnerability affects developers who incorporate these skills into production systems and end-users whose data may be exposed through compromised agents.
The framework's hierarchical approach is a practical answer to the detection problem. Lightweight regex and AST checks feed an XGBoost score that filters benign skills early, letting SkillSieve clear roughly 86% of non-malicious submissions in under 40ms without any API calls. Suspicious items advance to specialized LLM analysis along four distinct vectors (intent alignment, permission justification, behavior analysis, and consistency checks) rather than a single broad evaluation that can miss nuanced attacks. A final jury of independently voting LLMs resolves disagreements through debate protocols, reducing false positives.
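The layer-1 triage step can be sketched as follows. This is a minimal illustration, not SkillSieve's implementation: the signal names, patterns, weights, and threshold are invented, and a fixed weighted sum stands in for the XGBoost classifier the system actually trains over its regex/AST features.

```python
import re

# Hypothetical layer-1 triage: cheap regex-style signals are scored, and only
# suspicious skills escalate to the expensive LLM layers. SkillSieve scores
# such features with XGBoost; a fixed weighted sum stands in for it here.
SIGNALS = {
    "shell_exec": (re.compile(r"subprocess|os\.system|eval\("), 0.5),
    "network_io": (re.compile(r"requests\.|urllib|socket\."), 0.3),
    "obfuscation": (re.compile(r"base64|exec\(|chr\("), 0.4),
}

def triage_score(skill_text: str) -> float:
    """Sum the weights of every signal whose pattern fires on the skill text."""
    return sum(w for pat, w in SIGNALS.values() if pat.search(skill_text))

def route(skill_text: str, threshold: float = 0.5) -> str:
    """Fast-path clearly benign skills; escalate the rest to LLM analysis."""
    return "benign" if triage_score(skill_text) < threshold else "escalate"
```

Because the benign fast path never touches an LLM, the bulk of marketplace submissions can be cleared at regex speed, which is what keeps the reported per-skill cost near zero.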
For the AI infrastructure ecosystem, this work validates that hybrid detection combining symbolic and neural approaches outperforms either method alone. The 0.800 F1 score versus ClawVet's 0.421 demonstrates meaningful improvement in real-world conditions. Developers integrating third-party skills face pressure to implement enhanced vetting, while marketplace operators like OpenClaw may adopt such frameworks to improve platform trust. The open-sourced methodology enables broader adoption across other agent ecosystems.
- SkillSieve achieves nearly 2x the malicious-skill detection of existing tools (F1: 0.800 vs. 0.421) at roughly $0.006 per scan
- The three-layer framework efficiently processes 13,000+ real marketplace skills while keeping false positive rates low
- Hierarchical triage reduces computational overhead by filtering benign code early, before expensive LLM analysis
- Multi-LLM voting with debate protocols improves detection accuracy on adversarial evasion samples
- Open-sourced code and benchmark data enable industry-wide adoption for securing AI agent marketplaces
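The multi-LLM jury mentioned above might be structured roughly like this. The verdict labels, the idea of a single debate round, and the `debate_round` callback are assumptions for illustration, not the paper's exact protocol.

```python
from collections import Counter

# Hypothetical layer-3 jury: independent per-model verdicts are combined by
# majority vote; on disagreement, an optional debate round (supplied as a
# callback that re-polls models after they see each other's reasoning)
# runs before the final tally.
def jury_verdict(verdicts: dict[str, str], debate_round=None) -> str:
    """Combine per-model verdicts ('malicious' / 'benign') into one call."""
    tally = Counter(verdicts.values())
    if len(tally) == 1:  # unanimous panel: no debate needed
        return next(iter(tally))
    if debate_round is not None:  # dissent triggers one debate round
        verdicts = debate_round(verdicts)
        tally = Counter(verdicts.values())
    return tally.most_common(1)[0][0]
```

Running the debate only on split panels keeps the extra API calls confined to the genuinely ambiguous cases, which is where adversarial evasion samples tend to land.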