SkillTrojan: Backdoor Attacks on Skill-Based Agent Systems
Researchers have identified SkillTrojan, a novel backdoor attack targeting skill-based agent systems by embedding malicious logic within reusable skills rather than model parameters. The attack leverages skill composition to execute attacker-defined payloads with up to 97.2% success rates while maintaining clean task performance, revealing critical security gaps in AI agent architectures.
SkillTrojan represents a paradigm shift in AI security threats by targeting the compositional architecture of skill-based agent systems rather than traditional attack vectors. As AI agents increasingly rely on modular, reusable skills to handle complex tasks, this research exposes how attackers can weaponize the system's fundamental design principle—skill composition—against itself. The attack partitions encrypted payloads across multiple seemingly benign skills, activating only when specific trigger conditions are met, making detection extraordinarily difficult.
This vulnerability emerges as skill-based architectures gain adoption for their scalability and modularity benefits. Systems like those powering code-based agents and enterprise applications increasingly depend on integrating third-party skills, creating an expanding attack surface. Unlike traditional model poisoning or data contamination attacks, SkillTrojan operates at the skill level, making it potentially harder to detect through conventional security audits that focus on model weights or training data integrity.
The broader implications are significant for developers and enterprises deploying skill-based agent systems, particularly in sensitive domains like healthcare (as demonstrated by the EHR SQL evaluation). The ability to maintain 89.3% clean accuracy while achieving 97.2% attack success rates means compromised systems could operate effectively in normal conditions while executing hidden malicious objectives. This dual-operation capability poses substantial risks for enterprises relying on agent systems for critical decision-making or data access.
Moving forward, the research motivates development of defenses specifically addressing skill composition and execution patterns. Organizations deploying skill-based agents should implement rigorous verification of skill origins, monitoring of skill invocation sequences, and architectural changes that limit potential payload reconstruction from skill interactions.
- →SkillTrojan attacks embed malicious logic in reusable skills, achieving 97.2% attack success rates while maintaining normal performance
- →The attack exploits skill composition architecture by partitioning encrypted payloads across multiple benign-looking skill invocations
- →Current skill-based agent security models lack defenses against composition-level attacks, representing a critical architectural blind spot
- →The researchers released 3,000+ backdoored skills to enable systematic evaluation and defense development
- →Enterprises deploying skill-based agents in sensitive domains face elevated risks without new compositional-aware security measures