POISE: Position-Aware Undetectable Skill Injection on LLM Agents
Researchers introduce POISE, a skill-poisoning attack against LLM agents that achieves an 89.3% success rate by embedding malicious triggers into skill instructions in ways that evade both automated detection and human inspection. The attack sidesteps the reliability-stealth trade-off that limits existing injection methods, and it shows that current security defenses struggle to distinguish poisoned skills from legitimate ones because of their high false-positive rates.
POISE represents a significant advance in adversarial attacks against language model agents, exploiting a critical weakness in how agent skills are validated and executed. The attack compresses a malicious payload into a single, contextually blended instruction positioned within the skill body, so that it reads as a natural prerequisite or setup step rather than an explicit command. This approach sidesteps the reliability-stealth trade-off: injections in the YAML header are easy to spot on inspection, while naive injections in the skill body arouse the agent's suspicion. POISE avoids both failure modes through position-aware placement and context-aware generation.
The research reveals a systemic weakness in current defense mechanisms. Static security scanners, when run against legitimate skills that contain privileged tool operations, produce false-positive rates exceeding 74% across multiple judges and benchmarks. POISE exploits this signal-to-noise problem by blending poisoned variants into the noise floor: only 5.6% of poisoned skills gain new high-risk alerts compared to their clean baselines. This creates a fundamental asymmetry, in which defenders must catch every poisoned skill while attackers need only slip past overwhelmed detection systems.
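The two rates quoted above can be made concrete with a small calculation. The following is a minimal sketch, not the paper's evaluation code: `ScanResult`, `false_positive_rate`, and `new_alert_rate` are hypothetical names, and the alert labels are invented toy data. It assumes a scanner that returns a set of high-risk alert identifiers per skill, and that each poisoned variant shares a `skill_id` with its clean baseline.

```python
from dataclasses import dataclass
from typing import FrozenSet, Sequence


@dataclass(frozen=True)
class ScanResult:
    """Output of a hypothetical static scanner for one skill."""
    skill_id: str
    high_risk_alerts: FrozenSet[str]


def false_positive_rate(clean_results: Sequence[ScanResult]) -> float:
    """Share of known-legitimate skills flagged with at least one high-risk alert."""
    if not clean_results:
        raise ValueError("clean_results must not be empty")
    flagged = sum(1 for r in clean_results if r.high_risk_alerts)
    return flagged / len(clean_results)


def new_alert_rate(clean_results: Sequence[ScanResult],
                   poisoned_results: Sequence[ScanResult]) -> float:
    """Share of poisoned variants raising an alert absent from their clean baseline."""
    if not poisoned_results:
        raise ValueError("poisoned_results must not be empty")
    baseline = {r.skill_id: r.high_risk_alerts for r in clean_results}
    gained = sum(
        1 for r in poisoned_results
        if r.high_risk_alerts - baseline.get(r.skill_id, frozenset())
    )
    return gained / len(poisoned_results)


# Toy data (assumed, not from the paper): three of four clean skills are flagged,
# and only one poisoned variant adds an alert its baseline did not have.
clean = [
    ScanResult("a", frozenset({"shell"})),
    ScanResult("b", frozenset({"network"})),
    ScanResult("c", frozenset({"filesystem"})),
    ScanResult("d", frozenset()),
]
poisoned = [
    ScanResult("a", frozenset({"shell"})),
    ScanResult("b", frozenset({"network", "exfiltration"})),
    ScanResult("c", frozenset({"filesystem"})),
    ScanResult("d", frozenset()),
]

print(false_positive_rate(clean))       # 0.75
print(new_alert_rate(clean, poisoned))  # 0.25
```

When the first number is high and the second is low, as in the reported figures, an alert on its own says little about whether a skill has been tampered with: most alerts come from legitimate skills, and most poisoned variants look no worse than their baselines.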
For the AI security community, POISE underscores that open skill formats buy extensibility at the cost of security. The 28-point improvement over baseline body-injection attacks suggests that attack sophistication is outpacing defensive capabilities. Organizations deploying LLM agents with skill systems face a critical challenge: current detection tools are too noisy to be reliable, yet perfectly accurate verification may be computationally prohibitive. The findings suggest that agent architecture design, including skill sandboxing, capability restrictions, and behavioral verification, requires immediate reassessment before agent systems are widely deployed in high-stakes environments.
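To illustrate what a capability restriction might look like in practice, here is a minimal sketch of a per-skill tool allowlist. It is an assumption-laden illustration, not a defense proposed or evaluated in the research: `SkillSandbox`, `CapabilityError`, and the tool names are all hypothetical, and nothing here is claimed to stop POISE.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List


class CapabilityError(PermissionError):
    """Raised when a skill requests a tool outside its declared allowlist."""


@dataclass
class SkillSandbox:
    """Hypothetical gate that an agent runtime consults before every tool call."""
    skill_name: str
    allowed_tools: FrozenSet[str]
    denied_calls: List[str] = field(default_factory=list)

    def authorize(self, tool_name: str) -> str:
        """Return the tool name if permitted; otherwise record and reject the call."""
        if tool_name not in self.allowed_tools:
            # Keep a record of denied requests so they can be reviewed later.
            self.denied_calls.append(tool_name)
            raise CapabilityError(
                f"skill {self.skill_name!r} may not call {tool_name!r}"
            )
        return tool_name


# Usage with assumed names: a summarizer skill that should only read files.
sandbox = SkillSandbox("pdf-summarizer", frozenset({"read_file"}))
sandbox.authorize("read_file")  # permitted
try:
    sandbox.authorize("shell_exec")  # outside the allowlist
except CapabilityError as err:
    print(err)
```

The design choice is that the check depends on what the skill is permitted to do rather than on how its instructions read, so it does not rely on a scanner telling a blended instruction apart from a legitimate setup step. The log of denied calls is a crude form of behavioral record that a reviewer could inspect.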
- POISE achieves 89.3% attack success by embedding triggers into skill instructions while evading detection through context-aware blending
- Current LLM security scanners have 74.6% false-positive rates on legitimate skills, making them ineffective filters for poisoned variants
- Position-aware payload placement exploits the gap between automated detectors' sensitivity and their limited specificity
- Existing skill-injection methods face a reliability-stealth trade-off, which POISE sidesteps
- Agent skill security requires architectural redesign beyond static scanning, including sandboxing and behavioral verification