🧠 AI⚪ NeutralImportance 7/10Actionable

Defenses & Enablers For Skill Injection Attacks on Terminal Based Agents

arXiv – CS AI|Yoshinari Fujinuma, Varun Gangal, Traian Rebedea, Makesh Narasimhan Sreedhar, Prasoon Varshney, Rebecca Qian, Anand Kannappan|June 2, 2026 at 04:00 AM

🤖AI Summary

Researchers demonstrate that LLM-based terminal agents face significant security risks from skill injection attacks, where malicious instructions embedded in reusable skill files can compromise system integrity. Guardian-based defenses—both static and dynamic intermediary agents—reduce attack success rates by over 50%, though dynamic guardians prove more robust against sophisticated attack reframing attempts.

Analysis

This research addresses a critical vulnerability in the emerging architecture of LLM agents that rely on modular, reusable skill libraries. As AI systems become more autonomous and integrated into production environments, the attack surface expands beyond traditional model vulnerabilities to include supply-chain-like compromises through manipulated skill files. The study reveals that static defenses, while improving security, can be circumvented through prompt reframing techniques that preserve malicious intent while altering phrasing—pushing attack success rates to 81.4% in undefended scenarios.

The guardian-based approach represents a paradigm shift in agent security architecture, introducing an intermediary verification layer that operates either at build time or dynamically during execution. The dynamic guardian's superior performance (reducing ASR to 18.6% even against reframed attacks) demonstrates that real-time monitoring and mediation outperform pre-computed defenses. This finding mirrors security principles from other domains, suggesting that active oversight mechanisms adapt better to adversarial innovation than static rule sets.

For developers building production AI systems, these results carry immediate implications. Organizations deploying LLM agents in terminal environments face a choice between performance overhead (dynamic guardians) and residual vulnerability (static approaches). The research validates that no single defense is perfect, but strategic layering significantly reduces risk. The 81.4% baseline attack success rate without defenses suggests that skill injection is not merely theoretical—it represents a practical threat to systems in development today.

Future work should explore whether these defense mechanisms scale to enterprise-grade skill repositories with thousands of files and whether attackers can develop adversarial techniques that defeat dynamic guardians through obfuscation or temporal attack vectors.

Key Takeaways

→LLM agents using reusable skill files face skill injection attacks with up to 81.4% success rates without defenses
→Dynamic guardian agents reduce attack success rates to 18.6%, proving more effective than static pre-rewriting approaches
→Attack reframing that preserves malicious intent while changing phrasing can circumvent non-guardian defenses
→Real-time mediation of skill file access provides robust defense against sophisticated adversarial attacks
→Guardian-based defenses maintain task utility while cutting baseline attack success rates by over 50%